2009-07-05 22:20:52

by Michael S. Zick

[permalink] [raw]
Subject: Null Pointer BUG in uhci_hcd

Anyone with suggestions about this one?

root@cb01:~# dmesg
uhci_hcd: USB Universal Host Controller Interface driver
uhci_hcd 0000:00:10.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, low) -> IRQ 10
uhci_hcd 0000:00:10.0: UHCI Host Controller
uhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 1
uhci_hcd 0000:00:10.0: irq 10, io base 0x00004400
usb usb1: New USB device found, idVendor=1d6b, idProduct=0001
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: UHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.30-ce1200v-09185 uhci_hcd
usb usb1: SerialNumber: 0000:00:10.0
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
uhci_hcd 0000:00:10.1: PCI INT B -> Link[LNKB] -> GSI 11 (level, low) -> IRQ 11
uhci_hcd 0000:00:10.1: UHCI Host Controller
uhci_hcd 0000:00:10.1: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:10.1: HCRESET not completed yet!
uhci_hcd 0000:00:10.1: irq 11, io base 0x00004420
usb usb2: New USB device found, idVendor=1d6b, idProduct=0001
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: UHCI Host Controller
usb usb2: Manufacturer: Linux 2.6.30-ce1200v-09185 uhci_hcd
usb usb2: SerialNumber: 0000:00:10.1
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 0 ports detected
uhci_hcd 0000:00:10.2: PCI INT C -> Link[LNKC] -> GSI 10 (level, low) -> IRQ 10
uhci_hcd 0000:00:10.2: UHCI Host Controller
uhci_hcd 0000:00:10.2: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:10.2: HCRESET not completed yet!
uhci_hcd 0000:00:10.2: irq 10, io base 0x00004440
usb usb3: New USB device found, idVendor=1d6b, idProduct=0001
usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb3: Product: UHCI Host Controller
usb usb3: Manufacturer: Linux 2.6.30-ce1200v-09185 uhci_hcd
usb usb3: SerialNumber: 0000:00:10.2
uhci_hcd 0000:00:10.2: host system error, PCI problems?
uhci_hcd 0000:00:10.2: host controller process error, something bad happened!
uhci_hcd 0000:00:10.2: host controller halted, very bad!
uhci_hcd 0000:00:10.2: HCRESET not completed yet!
uhci_hcd 0000:00:10.2: HC died; cleaning up
uhci_hcd 0000:00:10.1: host system error, PCI problems?
uhci_hcd 0000:00:10.1: host controller process error, something bad happened!
uhci_hcd 0000:00:10.1: host controller halted, very bad!
uhci_hcd 0000:00:10.1: HCRESET not completed yet!
uhci_hcd 0000:00:10.1: HC died; cleaning up
usb usb3: configuration #1 chosen from 1 choice
usb usb3: can't set config #1, error -108
uhci_hcd 0000:00:10.2: HC died; cleaning up
BUG: unable to handle kernel NULL pointer dereference at 00000050
IP: [<c03ae8b1>] usb_kick_khubd+0x21/0x40
*pde = 00000000
Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
last sysfs file: /sys/devices/pci0000:00/0000:00:10.0/usb1/bmAttributes
Modules linked in: uhci_hcd(+) joydev i2c_viapro rtl8187 [last unloaded: ehci_hcd]

Pid: 3562, comm: modprobe Tainted: G W (2.6.30-ce1200v-09185 #5) Everex gBook Series
EIP: 0060:[<c03ae8b1>] EFLAGS: 00010046 CPU: 0
EIP is at usb_kick_khubd+0x21/0x40
EAX: 00000000 EBX: 00000282 ECX: d90db39b EDX: 00000000
ESI: d9e15000 EDI: 00000000 EBP: d9e15000 ESP: d0749d60
DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
Process modprobe (pid: 3562, ti=d0748000 task=d9e1e370 task.ti=d0748000)
Stack:
d90db39b c03b071c c06b0840 dc7e31a2 db473bf0 d90db39b 00000000 00004440
c03b1515 c06b08ec dc7e31a2 db473bf0 0000000a c0694a4c 00004440 00000000
00000080 0000000a c048c054 d0749dbf d63fe800 db473bf0 db49f85c d90db39b
Call Trace:
[<c03b071c>] ? usb_hc_died+0x8c/0xb0
[<c03b1515>] ? usb_add_hcd+0x595/0x600
[<c048c054>] ? pcibios_set_master+0x34/0xd0
[<c03bf485>] ? usb_hcd_pci_probe+0x1c5/0x2d0
[<c01f728a>] ? sysfs_do_create_link+0xca/0x150
[<c02a283a>] ? local_pci_probe+0x1a/0x40
[<c02a39ca>] ? pci_device_probe+0x7a/0xa0
[<c034653c>] ? driver_probe_device+0x8c/0x170
[<c02a2aea>] ? pci_match_device+0xca/0x110
[<c03466ab>] ? __driver_attach+0x8b/0xa0
[<c0346620>] ? __driver_attach+0x0/0xa0
[<c0345988>] ? bus_for_each_dev+0x48/0x80
[<c02a2860>] ? pci_device_shutdown+0x0/0x40
[<c0346372>] ? driver_attach+0x22/0x40
[<c0346620>] ? __driver_attach+0x0/0xa0
[<c034615b>] ? bus_add_driver+0x1db/0x250
[<c02a2860>] ? pci_device_shutdown+0x0/0x40
[<c0346b14>] ? driver_register+0x64/0x130
[<dc7d5000>] ? uhci_hcd_init+0x0/0xe1 [uhci_hcd]
[<c02a3cb2>] ? __pci_register_driver+0x42/0xc0
[<dc7d5000>] ? uhci_hcd_init+0x0/0xe1 [uhci_hcd]
[<dc7d508f>] ? uhci_hcd_init+0x8f/0xe1 [uhci_hcd]
[<c010112e>] ? do_one_initcall+0x3e/0x1a0
[<c015ba72>] ? sys_init_module+0xb2/0x1f0
[<c010313b>] ? sysenter_do_call+0x12/0x26
Code: 74 26 00 8d bc 27 00 00 00 00 83 ec 04 65 8b 15 14 00 00 00 89 14 24 31 d2 8b 14 24 65 33 15 14 00 00 00 75 12 8b 80 20 01 00 00 <8b> 40 50 8b 40 64 59 e9 a3 d7 ff ff e8 2e ba d7 ff 8d b4 26 00
EIP: [<c03ae8b1>] usb_kick_khubd+0x21/0x40 SS:ESP 0068:d0749d60
CR2: 0000000000000050
---[ end trace 15ef77c17c582a97 ]---
note: modprobe[3562] exited with preempt_count 1
root@cb01:~#


2009-07-07 09:13:18

by Jiri Kosina

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd


[ adding linux-usb to CC ]

On Sun, 5 Jul 2009, Michael S. Zick wrote:

> Anyone with suggestions about this one?
>
> root@cb01:~# dmesg
> uhci_hcd: USB Universal Host Controller Interface driver
> uhci_hcd 0000:00:10.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, low) -> IRQ 10
> uhci_hcd 0000:00:10.0: UHCI Host Controller
> uhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 1
> uhci_hcd 0000:00:10.0: irq 10, io base 0x00004400
> usb usb1: New USB device found, idVendor=1d6b, idProduct=0001
> usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
> usb usb1: Product: UHCI Host Controller
> usb usb1: Manufacturer: Linux 2.6.30-ce1200v-09185 uhci_hcd
> usb usb1: SerialNumber: 0000:00:10.0
> usb usb1: configuration #1 chosen from 1 choice
> hub 1-0:1.0: USB hub found
> hub 1-0:1.0: 2 ports detected
> uhci_hcd 0000:00:10.1: PCI INT B -> Link[LNKB] -> GSI 11 (level, low) -> IRQ 11
> uhci_hcd 0000:00:10.1: UHCI Host Controller
> uhci_hcd 0000:00:10.1: new USB bus registered, assigned bus number 2
> uhci_hcd 0000:00:10.1: HCRESET not completed yet!
> uhci_hcd 0000:00:10.1: irq 11, io base 0x00004420
> usb usb2: New USB device found, idVendor=1d6b, idProduct=0001
> usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
> usb usb2: Product: UHCI Host Controller
> usb usb2: Manufacturer: Linux 2.6.30-ce1200v-09185 uhci_hcd
> usb usb2: SerialNumber: 0000:00:10.1
> usb usb2: configuration #1 chosen from 1 choice
> hub 2-0:1.0: USB hub found
> hub 2-0:1.0: 0 ports detected
> uhci_hcd 0000:00:10.2: PCI INT C -> Link[LNKC] -> GSI 10 (level, low) -> IRQ 10
> uhci_hcd 0000:00:10.2: UHCI Host Controller
> uhci_hcd 0000:00:10.2: new USB bus registered, assigned bus number 3
> uhci_hcd 0000:00:10.2: HCRESET not completed yet!
> uhci_hcd 0000:00:10.2: irq 10, io base 0x00004440
> usb usb3: New USB device found, idVendor=1d6b, idProduct=0001
> usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
> usb usb3: Product: UHCI Host Controller
> usb usb3: Manufacturer: Linux 2.6.30-ce1200v-09185 uhci_hcd
> usb usb3: SerialNumber: 0000:00:10.2
> uhci_hcd 0000:00:10.2: host system error, PCI problems?
> uhci_hcd 0000:00:10.2: host controller process error, something bad happened!
> uhci_hcd 0000:00:10.2: host controller halted, very bad!
> uhci_hcd 0000:00:10.2: HCRESET not completed yet!
> uhci_hcd 0000:00:10.2: HC died; cleaning up
> uhci_hcd 0000:00:10.1: host system error, PCI problems?
> uhci_hcd 0000:00:10.1: host controller process error, something bad happened!
> uhci_hcd 0000:00:10.1: host controller halted, very bad!
> uhci_hcd 0000:00:10.1: HCRESET not completed yet!
> uhci_hcd 0000:00:10.1: HC died; cleaning up
> usb usb3: configuration #1 chosen from 1 choice
> usb usb3: can't set config #1, error -108
> uhci_hcd 0000:00:10.2: HC died; cleaning up
> BUG: unable to handle kernel NULL pointer dereference at 00000050
> IP: [<c03ae8b1>] usb_kick_khubd+0x21/0x40
> *pde = 00000000
> Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> last sysfs file: /sys/devices/pci0000:00/0000:00:10.0/usb1/bmAttributes
> Modules linked in: uhci_hcd(+) joydev i2c_viapro rtl8187 [last unloaded: ehci_hcd]
>
> Pid: 3562, comm: modprobe Tainted: G W (2.6.30-ce1200v-09185 #5) Everex gBook Series
> EIP: 0060:[<c03ae8b1>] EFLAGS: 00010046 CPU: 0
> EIP is at usb_kick_khubd+0x21/0x40
> EAX: 00000000 EBX: 00000282 ECX: d90db39b EDX: 00000000
> ESI: d9e15000 EDI: 00000000 EBP: d9e15000 ESP: d0749d60
> DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> Process modprobe (pid: 3562, ti=d0748000 task=d9e1e370 task.ti=d0748000)
> Stack:
> d90db39b c03b071c c06b0840 dc7e31a2 db473bf0 d90db39b 00000000 00004440
> c03b1515 c06b08ec dc7e31a2 db473bf0 0000000a c0694a4c 00004440 00000000
> 00000080 0000000a c048c054 d0749dbf d63fe800 db473bf0 db49f85c d90db39b
> Call Trace:
> [<c03b071c>] ? usb_hc_died+0x8c/0xb0
> [<c03b1515>] ? usb_add_hcd+0x595/0x600
> [<c048c054>] ? pcibios_set_master+0x34/0xd0
> [<c03bf485>] ? usb_hcd_pci_probe+0x1c5/0x2d0
> [<c01f728a>] ? sysfs_do_create_link+0xca/0x150
> [<c02a283a>] ? local_pci_probe+0x1a/0x40
> [<c02a39ca>] ? pci_device_probe+0x7a/0xa0
> [<c034653c>] ? driver_probe_device+0x8c/0x170
> [<c02a2aea>] ? pci_match_device+0xca/0x110
> [<c03466ab>] ? __driver_attach+0x8b/0xa0
> [<c0346620>] ? __driver_attach+0x0/0xa0
> [<c0345988>] ? bus_for_each_dev+0x48/0x80
> [<c02a2860>] ? pci_device_shutdown+0x0/0x40
> [<c0346372>] ? driver_attach+0x22/0x40
> [<c0346620>] ? __driver_attach+0x0/0xa0
> [<c034615b>] ? bus_add_driver+0x1db/0x250
> [<c02a2860>] ? pci_device_shutdown+0x0/0x40
> [<c0346b14>] ? driver_register+0x64/0x130
> [<dc7d5000>] ? uhci_hcd_init+0x0/0xe1 [uhci_hcd]
> [<c02a3cb2>] ? __pci_register_driver+0x42/0xc0
> [<dc7d5000>] ? uhci_hcd_init+0x0/0xe1 [uhci_hcd]
> [<dc7d508f>] ? uhci_hcd_init+0x8f/0xe1 [uhci_hcd]
> [<c010112e>] ? do_one_initcall+0x3e/0x1a0
> [<c015ba72>] ? sys_init_module+0xb2/0x1f0
> [<c010313b>] ? sysenter_do_call+0x12/0x26
> Code: 74 26 00 8d bc 27 00 00 00 00 83 ec 04 65 8b 15 14 00 00 00 89 14 24 31 d2 8b 14 24 65 33 15 14 00 00 00 75 12 8b 80 20 01 00 00 <8b> 40 50 8b 40 64 59 e9 a3 d7 ff ff e8 2e ba d7 ff 8d b4 26 00
> EIP: [<c03ae8b1>] usb_kick_khubd+0x21/0x40 SS:ESP 0068:d0749d60
> CR2: 0000000000000050
> ---[ end trace 15ef77c17c582a97 ]---
> note: modprobe[3562] exited with preempt_count 1
> root@cb01:~#
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
Jiri Kosina
SUSE Labs

2009-07-07 14:08:17

by Oliver Neukum

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

Am Dienstag, 7. Juli 2009 11:13:01 schrieb Jiri Kosina:
> [ adding linux-usb to CC ]
>
> On Sun, 5 Jul 2009, Michael S. Zick wrote:
> > Anyone with suggestions about this one?

Looks like we should test for presence before we switch off
autosuspend for khubd. Please test this diagnostic patch if this is
repeatable.

Regards
Oliver

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 2af3b4f..c5b3929 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -371,11 +371,16 @@ static void kick_khubd(struct usb_hub *hub)
{
unsigned long flags;

- /* Suppress autosuspend until khubd runs */
- to_usb_interface(hub->intfdev)->pm_usage_cnt = 1;
+ if (!hub) {
+ BUG();
+ return;
+ }

spin_lock_irqsave(&hub_event_lock, flags);
if (!hub->disconnected && list_empty(&hub->event_list)) {
+ /* Suppress autosuspend until khubd runs */
+ to_usb_interface(hub->intfdev)->pm_usage_cnt = 1;
+
list_add_tail(&hub->event_list, &hub_event_list);
wake_up(&khubd_wait);
}
@@ -384,6 +389,11 @@ static void kick_khubd(struct usb_hub *hub)

void usb_kick_khubd(struct usb_device *hdev)
{
+ if (!hdev) {
+ BUG();
+ return;
+ }
+
/* FIXME: What if hdev isn't bound to the hub driver? */
kick_khubd(hdev_to_hub(hdev));
}

2009-07-07 14:32:58

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Tue July 7 2009, Oliver Neukum wrote:
> Am Dienstag, 7. Juli 2009 11:13:01 schrieb Jiri Kosina:
> > [ adding linux-usb to CC ]
> >
> > On Sun, 5 Jul 2009, Michael S. Zick wrote:
> > > Anyone with suggestions about this one?
>
> Looks like we should test for presence before we switch off
> autosuspend for khubd. Please test this diagnostic patch if this is
> repeatable.
>

Super! Will give it a try in todays build/testing and get back.

Note: If I understand the directions with this debug_vmalloc option,
this dump was caused by reference to a vmalloc'd page that had since
been freed. I have a copy of the dmesg at boot and the /sys/kernel/debug
list of PTE entries - - if that would help to know what cr2=0..50 was
pointing at.

Mike
> Regards
> Oliver
>
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index 2af3b4f..c5b3929 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -371,11 +371,16 @@ static void kick_khubd(struct usb_hub *hub)
> {
> unsigned long flags;
>
> - /* Suppress autosuspend until khubd runs */
> - to_usb_interface(hub->intfdev)->pm_usage_cnt = 1;
> + if (!hub) {
> + BUG();
> + return;
> + }
>
> spin_lock_irqsave(&hub_event_lock, flags);
> if (!hub->disconnected && list_empty(&hub->event_list)) {
> + /* Suppress autosuspend until khubd runs */
> + to_usb_interface(hub->intfdev)->pm_usage_cnt = 1;
> +
> list_add_tail(&hub->event_list, &hub_event_list);
> wake_up(&khubd_wait);
> }
> @@ -384,6 +389,11 @@ static void kick_khubd(struct usb_hub *hub)
>
> void usb_kick_khubd(struct usb_device *hdev)
> {
> + if (!hdev) {
> + BUG();
> + return;
> + }
> +
> /* FIXME: What if hdev isn't bound to the hub driver? */
> kick_khubd(hdev_to_hub(hdev));
> }
>
>
>

2009-07-07 15:10:43

by Alan Stern

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Tue, 7 Jul 2009, Oliver Neukum wrote:

> Am Dienstag, 7. Juli 2009 11:13:01 schrieb Jiri Kosina:
> > [ adding linux-usb to CC ]
> >
> > On Sun, 5 Jul 2009, Michael S. Zick wrote:
> > > Anyone with suggestions about this one?
>
> Looks like we should test for presence before we switch off
> autosuspend for khubd. Please test this diagnostic patch if this is
> repeatable.

That patch won't fix the problem. This issue is not that hdev is NULL;
the problem is that the hub driver isn't bound to hdev and as a result,
the hdev_to_hub routine fails.

This patch may work better.

Alan Stern


Index: usb-2.6/drivers/usb/core/hub.c
===================================================================
--- usb-2.6.orig/drivers/usb/core/hub.c
+++ usb-2.6/drivers/usb/core/hub.c
@@ -163,8 +163,10 @@ static inline char *portspeed(int portst
}

/* Note that hdev or one of its children must be locked! */
-static inline struct usb_hub *hdev_to_hub(struct usb_device *hdev)
+static struct usb_hub *hdev_to_hub(struct usb_device *hdev)
{
+ if (!hdev || !hdev->actconfig)
+ return NULL;
return usb_get_intfdata(hdev->actconfig->interface[0]);
}

@@ -385,8 +387,10 @@ static void kick_khubd(struct usb_hub *h

void usb_kick_khubd(struct usb_device *hdev)
{
- /* FIXME: What if hdev isn't bound to the hub driver? */
- kick_khubd(hdev_to_hub(hdev));
+ struct usb_hub *hub = hdev_to_hub(hdev);
+
+ if (hub)
+ kick_khubd(hub);
}


2009-07-07 15:24:32

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Tue July 7 2009, Alan Stern wrote:
> On Tue, 7 Jul 2009, Oliver Neukum wrote:
>
> > Am Dienstag, 7. Juli 2009 11:13:01 schrieb Jiri Kosina:
> > > [ adding linux-usb to CC ]
> > >
> > > On Sun, 5 Jul 2009, Michael S. Zick wrote:
> > > > Anyone with suggestions about this one?
> >
> > Looks like we should test for presence before we switch off
> > autosuspend for khubd. Please test this diagnostic patch if this is
> > repeatable.
>
> That patch won't fix the problem. This issue is not that hdev is NULL;
> the problem is that the hub driver isn't bound to hdev and as a result,
> the hdev_to_hub routine fails.
>
> This patch may work better.
>

Got it.
Anything that gives me something other than a dead machine
with a glowing power-on light is forward progress.

Will give this patch a try today and report back.

At the moment am running the machine without either USB driver loaded.
Just to see if they are contributing to the problems.

Mike
> Alan Stern
>
>
> Index: usb-2.6/drivers/usb/core/hub.c
> ===================================================================
> --- usb-2.6.orig/drivers/usb/core/hub.c
> +++ usb-2.6/drivers/usb/core/hub.c
> @@ -163,8 +163,10 @@ static inline char *portspeed(int portst
> }
>
> /* Note that hdev or one of its children must be locked! */
> -static inline struct usb_hub *hdev_to_hub(struct usb_device *hdev)
> +static struct usb_hub *hdev_to_hub(struct usb_device *hdev)
> {
> + if (!hdev || !hdev->actconfig)
> + return NULL;
> return usb_get_intfdata(hdev->actconfig->interface[0]);
> }
>
> @@ -385,8 +387,10 @@ static void kick_khubd(struct usb_hub *h
>
> void usb_kick_khubd(struct usb_device *hdev)
> {
> - /* FIXME: What if hdev isn't bound to the hub driver? */
> - kick_khubd(hdev_to_hub(hdev));
> + struct usb_hub *hub = hdev_to_hub(hdev);
> +
> + if (hub)
> + kick_khubd(hub);
> }
>
>
>
>
>

2009-07-07 15:31:59

by Alan Stern

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Tue, 7 Jul 2009, Michael S. Zick wrote:

> Got it.
> Anything that gives me something other than a dead machine
> with a glowing power-on light is forward progress.
>
> Will give this patch a try today and report back.
>
> At the moment am running the machine without either USB driver loaded.
> Just to see if they are contributing to the problems.

It looks like there's a serious problem in your UHCI hardware.
According to the log you posted, it's not working at all.

Of course, that's no excuse for an oops.

Alan Stern

2009-07-07 15:43:49

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Tue July 7 2009, Alan Stern wrote:
> On Tue, 7 Jul 2009, Michael S. Zick wrote:
>
> > Got it.
> > Anything that gives me something other than a dead machine
> > with a glowing power-on light is forward progress.
> >
> > Will give this patch a try today and report back.
> >
> > At the moment am running the machine without either USB driver loaded.
> > Just to see if they are contributing to the problems.
>
> It looks like there's a serious problem in your UHCI hardware.
> According to the log you posted, it's not working at all.
>

I am using the x86-generic drivers - -

There are VIA-device specific drivers in other arch trees,
(arm?, powerpc?) - I haven't compared them to the x86-generic
to see if they had to do something specific for 1106:3038, 1106:3104

Any idea if those drivers are "long dead" or still in use?

Mike
> Of course, that's no excuse for an oops.
>
> Alan Stern
>
>
>

2009-07-07 17:28:20

by Alan Stern

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Tue, 7 Jul 2009, Michael S. Zick wrote:

> > It looks like there's a serious problem in your UHCI hardware.
> > According to the log you posted, it's not working at all.
> >
>
> I am using the x86-generic drivers - -

Not a problem with the drivers, a problem in the hardware.

> There are VIA-device specific drivers in other arch trees,
> (arm?, powerpc?) - I haven't compared them to the x86-generic
> to see if they had to do something specific for 1106:3038, 1106:3104
>
> Any idea if those drivers are "long dead" or still in use?

I don't know what drivers you're talking about.

Alan Stern

2009-07-07 17:57:00

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Tue July 7 2009, Alan Stern wrote:
> On Tue, 7 Jul 2009, Michael S. Zick wrote:
>
> > > It looks like there's a serious problem in your UHCI hardware.
> > > According to the log you posted, it's not working at all.
> > >
> >
> > I am using the x86-generic drivers - -
>
> Not a problem with the drivers, a problem in the hardware.
>

Or hardware that works as the manufacturer intended, but differently
than expected. Same difference as "broken" when compared to "standard".

>
> > There are VIA-device specific drivers in other arch trees,
> > (arm?, powerpc?) - I haven't compared them to the x86-generic
> > to see if they had to do something specific for 1106:3038, 1106:3104
> >
> > Any idea if those drivers are "long dead" or still in use?
>
> I don't know what drivers you're talking about.
>

Let me see if I can find it again;
1106:3038 translates to: PCI_DEVICE_ID_VIA_82C586_2
1106:3104 translates to: PCI_DEVICE_ID_VIA_8235_USB_2

The first appears in (this device is the UHCI hub in the CX700 chipset):
arch/powerpc/platforms/85xx/mpc85xx_cds.c: case PCI_DEVICE_ID_VIA_82C586_2:
arch/mips/pci/fixup-lm2e.c:DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_2,
include/linux/pci_ids.h:#define PCI_DEVICE_ID_VIA_82C586_2 0x3038

The second appears in:
include/linux/pci_ids.h:#define PCI_DEVICE_ID_VIA_8235_USB_2 0x3104
<Hmm... Dead? Not used? Never used? Source padding?>

Not including any sources that may be hardcoded rather than use the defines.

Mike
> Alan Stern
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

2009-07-07 19:21:20

by Alan Stern

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Tue, 7 Jul 2009, Michael S. Zick wrote:

> On Tue July 7 2009, Alan Stern wrote:
> > On Tue, 7 Jul 2009, Michael S. Zick wrote:
> >
> > > > It looks like there's a serious problem in your UHCI hardware.
> > > > According to the log you posted, it's not working at all.
> > > >
> > >
> > > I am using the x86-generic drivers - -
> >
> > Not a problem with the drivers, a problem in the hardware.
> >
>
> Or hardware that works as the manufacturer intended, but differently
> than expected. Same difference as "broken" when compared to "standard".

No, hardware that doesn't work at all. As in "all reads return
0xffffffff" -- that's just a guess but something like it would account
for what you saw.

For more information enable CONFIG_USB_DEBUG and CONFIG_DEBUG_FS, then
see what's sitting in the usb/uhci/* files in debugfs.

> Let me see if I can find it again;
> 1106:3038 translates to: PCI_DEVICE_ID_VIA_82C586_2
> 1106:3104 translates to: PCI_DEVICE_ID_VIA_8235_USB_2
>
> The first appears in (this device is the UHCI hub in the CX700 chipset):
> arch/powerpc/platforms/85xx/mpc85xx_cds.c: case PCI_DEVICE_ID_VIA_82C586_2:
> arch/mips/pci/fixup-lm2e.c:DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_2,
> include/linux/pci_ids.h:#define PCI_DEVICE_ID_VIA_82C586_2 0x3038

Those are simple interrupt routing fixups. Not related.

> The second appears in:
> include/linux/pci_ids.h:#define PCI_DEVICE_ID_VIA_8235_USB_2 0x3104
> <Hmm... Dead? Not used? Never used? Source padding?>

Probably never used.

Alan Stern

2009-07-07 19:51:21

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Tue July 7 2009, Alan Stern wrote:
> On Tue, 7 Jul 2009, Michael S. Zick wrote:
>
> > On Tue July 7 2009, Alan Stern wrote:
> > > On Tue, 7 Jul 2009, Michael S. Zick wrote:
> > >
> > > > > It looks like there's a serious problem in your UHCI hardware.
> > > > > According to the log you posted, it's not working at all.
> > > > >
> > > >
> > > > I am using the x86-generic drivers - -
> > >
> > > Not a problem with the drivers, a problem in the hardware.
> > >
> >
> > Or hardware that works as the manufacturer intended, but differently
> > than expected. Same difference as "broken" when compared to "standard".
>
> No, hardware that doesn't work at all. As in "all reads return
> 0xffffffff" -- that's just a guess but something like it would account
> for what you saw.
>
> For more information enable CONFIG_USB_DEBUG and CONFIG_DEBUG_FS, then
> see what's sitting in the usb/uhci/* files in debugfs.
>

Will add that to my list - -
I need to do that to find/fix that error path message flood loop in ehci anyway.

Mike
> > Let me see if I can find it again;
> > 1106:3038 translates to: PCI_DEVICE_ID_VIA_82C586_2
> > 1106:3104 translates to: PCI_DEVICE_ID_VIA_8235_USB_2
> >
> > The first appears in (this device is the UHCI hub in the CX700 chipset):
> > arch/powerpc/platforms/85xx/mpc85xx_cds.c: case PCI_DEVICE_ID_VIA_82C586_2:
> > arch/mips/pci/fixup-lm2e.c:DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_2,
> > include/linux/pci_ids.h:#define PCI_DEVICE_ID_VIA_82C586_2 0x3038
>
> Those are simple interrupt routing fixups. Not related.
>
> > The second appears in:
> > include/linux/pci_ids.h:#define PCI_DEVICE_ID_VIA_8235_USB_2 0x3104
> > <Hmm... Dead? Not used? Never used? Source padding?>
>
> Probably never used.
>
> Alan Stern
>
>
>

2009-07-07 20:00:52

by Alan Stern

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Tue, 7 Jul 2009, Michael S. Zick wrote:

> Will add that to my list - -
> I need to do that to find/fix that error path message flood loop in ehci anyway.

What EHCI message flood?

Alan Stern

2009-07-07 20:52:22

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Tue July 7 2009, Alan Stern wrote:
> On Tue, 7 Jul 2009, Michael S. Zick wrote:
>
> > Will add that to my list - -
> > I need to do that to find/fix that error path message flood loop in ehci anyway.
>
> What EHCI message flood?
>

Give me a chance to test what you have already put on my plate.
It isn't very interesting - when the driver shuts down a hub
for "problems found" (I can't be more specific at the moment)
It will try to immediately after resume the hub;
Which leads to it taking the hub out of service;
Which leads to it trying to resume the hub;
etc. etc. etc.
That message flood in the error recovery path.

Set: ignore_oc=1 on the command line and that will stop the flood
at the first message - but only because it hard-locks the kernel with irq's disabled.

On this machine, the "over current" sense is a false indication -
something is not within the expected range.

Thank goodness, no one has asked me to get this kernel certified
for life-support use (not possible, the VIA silicon isn't certified either). ;)

Mike
> Alan Stern
>
>
>

2009-07-07 21:24:54

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Tue July 7 2009, Michael S. Zick wrote:
> On Tue July 7 2009, Alan Stern wrote:
> > On Tue, 7 Jul 2009, Michael S. Zick wrote:
> >
> > > Will add that to my list - -
> > > I need to do that to find/fix that error path message flood loop in ehci anyway.
> >
> > What EHCI message flood?
> >
>

Here is an example of the message paths taken - I have another 27Mbytes of the same
if this snippet isn't enough.

Jun 30 10:25:48 cb01 kernel: usb 1-1: new high speed USB device using ehci_hcd and a
ddress 5
Jun 30 10:25:48 cb01 kernel: usb 1-1: New USB device found, idVendor=0930, idProduct
=6545
Jun 30 10:25:48 cb01 kernel: usb 1-1: New USB device strings: Mfr=1, Product=2, Seri
alNumber=3
Jun 30 10:25:48 cb01 kernel: usb 1-1: Product: USB Flash Memory
Jun 30 10:25:48 cb01 kernel: usb 1-1: Manufacturer:
Jun 30 10:25:48 cb01 kernel: usb 1-1: SerialNumber: 5B8204000008
Jun 30 10:25:48 cb01 kernel: usb 1-1: configuration #1 chosen from 1 choice
Jun 30 10:25:48 cb01 kernel: scsi3 : SCSI emulation for USB Mass Storage devices
Jun 30 10:25:48 cb01 kernel: usb-storage: device found at 5
Jun 30 10:25:48 cb01 kernel: usb-storage: waiting for device to settle before scanni
ng
Jun 30 10:25:50 cb01 kernel: usb 1-1: USB disconnect, address 5
Jun 30 10:38:00 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:02 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:02 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:02 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:04 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:04 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:04 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:06 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:06 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:06 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:08 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:08 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:08 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:10 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:10 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:10 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:12 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:12 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:12 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:14 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:14 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:14 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:16 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:16 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:16 cb01 kernel: hub 2-0:1.0: over-current change on port 1

Jun 30 10:38:16 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:18 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:18 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:18 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:20 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:20 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:20 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:22 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:22 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:22 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:24 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:24 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:24 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:26 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:26 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:26 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:28 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:28 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:28 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:30 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:30 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:30 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:31 cb01 kernel: sd 2:0:0:0: Device offlined - not ready after error recovery
Jun 30 10:38:32 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:32 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:32 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:34 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:34 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: over-current change on port 1
Jun 30 10:38:34 cb01 kernel: ehci_hcd 0000:00:10.4: port 1 resume error -19
Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: over-current change on port 2
Jun 30 10:38:34 cb01 kernel: ehci_hcd 0000:00:10.4: port 2 resume error -19
Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: connect-debounce failed, port 2 disabled
Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: over-current change on port 3
Jun 30 10:38:34 cb01 kernel: ehci_hcd 0000:00:10.4: port 3 resume error -19
Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: connect-debounce failed, port 3 disabled
Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: over-current change on port 4
Jun 30 10:38:34 cb01 kernel: ehci_hcd 0000:00:10.4: port 4 resume error -19
Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: connect-debounce failed, port 4 disabled
Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: over-current change on port 5
Jun 30 10:38:35 cb01 kernel: ehci_hcd 0000:00:10.4: port 6 resume error -19
Jun 30 10:38:35 cb01 kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Jun 30 10:38:35 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:36 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:36 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:37 cb01 kernel: hub 1-0:1.0: over-current change on port 1
Jun 30 10:38:37 cb01 kernel: ehci_hcd 0000:00:10.4: port 1 resume error -19
Jun 30 10:38:37 cb01 kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Jun 30 10:38:37 cb01 kernel: hub 1-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:37 cb01 kernel: hub 1-0:1.0: over-current change on port 2
Jun 30 10:38:37 cb01 kernel: ehci_hcd 0000:00:10.4: port 2 resume error -19
Jun 30 10:38:37 cb01 kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Jun 30 10:38:37 cb01 kernel: hub 1-0:1.0: connect-debounce failed, port 2 disabled
Jun 30 10:38:37 cb01 kernel: hub 1-0:1.0: over-current change on port 3
Jun 30 10:38:37 cb01 kernel: ehci_hcd 0000:00:10.4: port 3 resume error -19
Jun 30 10:38:37 cb01 kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Jun 30 10:38:37 cb01 kernel: hub 1-0:1.0: connect-debounce failed, port 3 disabled
Jun 30 10:38:37 cb01 kernel: hub 1-0:1.0: over-current change on port 4
Jun 30 10:38:37 cb01 kernel: ehci_hcd 0000:00:10.4: port 4 resume error -19
Jun 30 10:38:37 cb01 kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Jun 30 10:38:37 cb01 kernel: ehci_hcd 0000:00:10.4: port 5 resume error -19
Jun 30 10:38:37 cb01 kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Jun 30 10:38:37 cb01 kernel: hub 1-0:1.0: over-current change on port 6
Jun 30 10:38:37 cb01 kernel: usb 1-6: USB disconnect, address 4
Jun 30 10:38:37 cb01 kernel: ehci_hcd 0000:00:10.4: port 6 resume error -19
Jun 30 10:38:37 cb01 kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Jun 30 10:38:37 cb01 kernel: hub 2-0:1.0: over-current change on port 1
Jun 30 10:38:39 cb01 kernel: __ratelimit: 2 callbacks suppressed
Jun 30 10:38:39 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
Jun 30 10:38:39 cb01 kernel: hub 2-0:1.0: over-current change on port 2
Jun 30 10:38:39 cb01 kernel: hub 1-0:1.0: over-current change on port 1
Jun 30 10:38:39 cb01 kernel: ehci_hcd 0000:00:10.4: port 1 resume error -19
Jun 30 10:38:39 cb01 kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
Jun 30 10:38:39 cb01 kernel: hub 1-0:1.0: connect-debounce failed, port 1 disabled


> Give me a chance to test what you have already put on my plate.
>

Mike
> It isn't very interesting - when the driver shuts down a hub
> for "problems found" (I can't be more specific at the moment)
> It will try to immediately after resume the hub;
> Which leads to it taking the hub out of service;
> Which leads to it trying to resume the hub;
> etc. etc. etc.
> That message flood in the error recovery path.
>
> Set: ignore_oc=1 on the command line and that will stop the flood
> at the first message - but only because it hard-locks the kernel with irq's disabled.
>
> On this machine, the "over current" sense is a false indication -
> something is not within the expected range.
>
> Thank goodness, no one has asked me to get this kernel certified
> for life-support use (not possible, the VIA silicon isn't certified either). ;)
>
> Mike
> > Alan Stern
> >
> >
> >
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

2009-07-08 14:43:59

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Tue July 7 2009, Alan Stern wrote:
> On Tue, 7 Jul 2009, Michael S. Zick wrote:
>
> > On Tue July 7 2009, Alan Stern wrote:
> > > On Tue, 7 Jul 2009, Michael S. Zick wrote:
> > >
> > > > > It looks like there's a serious problem in your UHCI hardware.
> > > > > According to the log you posted, it's not working at all.
> > > > >
> > > >
> > > > I am using the x86-generic drivers - -
> > >
> > > Not a problem with the drivers, a problem in the hardware.
> > >
> >
> > Or hardware that works as the manufacturer intended, but differently
> > than expected. Same difference as "broken" when compared to "standard".
>
> No, hardware that doesn't work at all. As in "all reads return
> 0xffffffff" -- that's just a guess but something like it would account
> for what you saw.
>

Scratches head, and other parts. . .

Hmm... Already known about this chipset/cpu combination; there is
something about its hardware timing that is a "primary problem" - -

In addition to the suggested diagnostic and suggest fix - will look
into finding if that is a problem here, also.
It may just be a hardware timing issue - the processor is hitting the
hub before the hub has its act together from the last action.

Most likely will take most of the day - unless something jumps out
and says "I'm broke" during the testing. ;)

Mike
> For more information enable CONFIG_USB_DEBUG and CONFIG_DEBUG_FS, then
> see what's sitting in the usb/uhci/* files in debugfs.
>
> > Let me see if I can find it again;
> > 1106:3038 translates to: PCI_DEVICE_ID_VIA_82C586_2
> > 1106:3104 translates to: PCI_DEVICE_ID_VIA_8235_USB_2
> >
> > The first appears in (this device is the UHCI hub in the CX700 chipset):
> > arch/powerpc/platforms/85xx/mpc85xx_cds.c: case PCI_DEVICE_ID_VIA_82C586_2:
> > arch/mips/pci/fixup-lm2e.c:DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_2,
> > include/linux/pci_ids.h:#define PCI_DEVICE_ID_VIA_82C586_2 0x3038
>
> Those are simple interrupt routing fixups. Not related.
>
> > The second appears in:
> > include/linux/pci_ids.h:#define PCI_DEVICE_ID_VIA_8235_USB_2 0x3104
> > <Hmm... Dead? Not used? Never used? Source padding?>
>
> Probably never used.
>
> Alan Stern
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

2009-07-08 16:30:30

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Wed July 8 2009, Michael S. Zick wrote:
> On Tue July 7 2009, Alan Stern wrote:
> > On Tue, 7 Jul 2009, Michael S. Zick wrote:
> >
> > > On Tue July 7 2009, Alan Stern wrote:
> > > > On Tue, 7 Jul 2009, Michael S. Zick wrote:
> > > >
> > > > > > It looks like there's a serious problem in your UHCI hardware.
> > > > > > According to the log you posted, it's not working at all.
> > > > > >
> > > > >
> > > > > I am using the x86-generic drivers - -
> > > >
> > > > Not a problem with the drivers, a problem in the hardware.
> > > >
> > >
> > > Or hardware that works as the manufacturer intended, but differently
> > > than expected. Same difference as "broken" when compared to "standard".
> >
> > No, hardware that doesn't work at all. As in "all reads return
> > 0xffffffff" -- that's just a guess but something like it would account
> > for what you saw.
> >
>
> Scratches head, and other parts. . .
>
> Hmm... Already known about this chipset/cpu combination; there is
> something about its hardware timing that is a "primary problem" - -
>
> In addition to the suggested diagnostic and suggest fix - will look
> into finding if that is a problem here, also.
> It may just be a hardware timing issue - the processor is hitting the
> hub before the hub has its act together from the last action.
>
> Most likely will take most of the day - unless something jumps out
> and says "I'm broke" during the testing. ;)
>

But, then again, we might just get lucky:
ehci_hcd 0000:00:10.4: Unlink after no-IRQ? Controller is probably using the wrong IRQ.
usb 1-5: khubd timed out on ep0in len=18/64

dmesg from start of ehci load (no external devices connected):

ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci_hcd: block sizes: qh 128 qtd 96 itd 160 sitd 96
ehci_hcd 0000:00:10.4: PCI INT D -> Link[LNKD] -> GSI 9 (level, low) -> IRQ 9
ehci_hcd 0000:00:10.4: setting latency timer to 64
ehci_hcd 0000:00:10.4: EHCI Host Controller
drivers/usb/core/inode.c: creating file 'devices'
drivers/usb/core/inode.c: creating file '001'
ehci_hcd 0000:00:10.4: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:10.4: reset hcs_params 0x103206 dbg=1 cc=3 pcc=2 ordered !ppc ports=6
ehci_hcd 0000:00:10.4: reset hcc_params 6872 thresh 7 uframes 256/512/1024
ehci_hcd 0000:00:10.4: reset command 080022 (park)=0 ithresh=8 Async period=1024 Reset HALT
ehci_hcd 0000:00:10.4: debug port 1
ehci_hcd 0000:00:10.4: MWI active
ehci_hcd 0000:00:10.4: supports USB remote wakeup
ehci_hcd 0000:00:10.4: irq 9, io mem 0xd1400000
ehci_hcd 0000:00:10.4: reset command 080002 (park)=0 ithresh=8 period=1024 Reset HALT
ehci_hcd 0000:00:10.4: init command 010009 (park)=0 ithresh=1 period=256 RUN
ehci_hcd 0000:00:10.4: USB 2.0 started, EHCI 1.00
usb usb1: default language 0x0409
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: EHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.30-ce1200v-09189 ehci_hcd
usb usb1: SerialNumber: 0000:00:10.4
usb usb1: uevent
usb usb1: usb_probe_device
usb usb1: configuration #1 chosen from 1 choice
usb usb1: adding 1-0:1.0 (config #1, interface 0)
usb 1-0:1.0: uevent
hub 1-0:1.0: usb_probe_interface
hub 1-0:1.0: usb_probe_interface - got id
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 6 ports detected
hub 1-0:1.0: standalone hub
hub 1-0:1.0: no power switching (usb 1.0)
hub 1-0:1.0: individual port over-current protection
hub 1-0:1.0: power on to power good time: 20ms
hub 1-0:1.0: local power source is good
hub 1-0:1.0: trying to enable port power on non-switchable hub
drivers/usb/core/inode.c: creating file '001'
uhci_hcd: USB Universal Host Controller Interface driver
uhci_hcd 0000:00:10.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, low) -> IRQ 10
uhci_hcd 0000:00:10.0: setting latency timer to 64
uhci_hcd 0000:00:10.0: UHCI Host Controller
drivers/usb/core/inode.c: creating file '002'
uhci_hcd 0000:00:10.0: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:10.0: detected 2 ports
uhci_hcd 0000:00:10.0: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:10.0: Performing full reset
uhci_hcd 0000:00:10.0: supports USB remote wakeup
uhci_hcd 0000:00:10.0: irq 10, io base 0x00004400
usb usb2: default language 0x0409
usb usb2: New USB device found, idVendor=1d6b, idProduct=0001
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: UHCI Host Controller
usb usb2: Manufacturer: Linux 2.6.30-ce1200v-09189 uhci_hcd
usb usb2: SerialNumber: 0000:00:10.0
usb usb2: uevent
usb usb2: usb_probe_device
usb usb2: configuration #1 chosen from 1 choice
usb usb2: adding 2-0:1.0 (config #1, interface 0)
usb 2-0:1.0: uevent
hub 2-0:1.0: usb_probe_interface
hub 2-0:1.0: usb_probe_interface - got id
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
hub 2-0:1.0: standalone hub
hub 2-0:1.0: no power switching (usb 1.0)
hub 2-0:1.0: individual port over-current protection
hub 2-0:1.0: power on to power good time: 2ms
hub 2-0:1.0: local power source is good
hub 2-0:1.0: trying to enable port power on non-switchable hub
drivers/usb/core/inode.c: creating file '001'
uhci_hcd 0000:00:10.1: PCI INT B -> Link[LNKB] -> GSI 11 (level, low) -> IRQ 11
uhci_hcd 0000:00:10.1: setting latency timer to 64
uhci_hcd 0000:00:10.1: UHCI Host Controller
drivers/usb/core/inode.c: creating file '003'
uhci_hcd 0000:00:10.1: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:10.1: detected 2 ports
uhci_hcd 0000:00:10.1: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:10.1: Performing full reset
uhci_hcd 0000:00:10.1: supports USB remote wakeup
uhci_hcd 0000:00:10.1: irq 11, io base 0x00004420
usb usb3: default language 0x0409
usb usb3: New USB device found, idVendor=1d6b, idProduct=0001
usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb3: Product: UHCI Host Controller
usb usb3: Manufacturer: Linux 2.6.30-ce1200v-09189 uhci_hcd
usb usb3: SerialNumber: 0000:00:10.1
usb usb3: uevent
usb usb3: usb_probe_device
usb usb3: configuration #1 chosen from 1 choice
usb usb3: adding 3-0:1.0 (config #1, interface 0)
usb 3-0:1.0: uevent
hub 3-0:1.0: usb_probe_interface
hub 3-0:1.0: usb_probe_interface - got id
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
hub 3-0:1.0: standalone hub
hub 3-0:1.0: no power switching (usb 1.0)
hub 3-0:1.0: individual port over-current protection
hub 3-0:1.0: power on to power good time: 2ms
hub 3-0:1.0: local power source is good
hub 3-0:1.0: trying to enable port power on non-switchable hub
drivers/usb/core/inode.c: creating file '001'
uhci_hcd 0000:00:10.2: PCI INT C -> Link[LNKC] -> GSI 10 (level, low) -> IRQ 10
uhci_hcd 0000:00:10.2: setting latency timer to 64
uhci_hcd 0000:00:10.2: UHCI Host Controller
drivers/usb/core/inode.c: creating file '004'
uhci_hcd 0000:00:10.2: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:10.2: detected 2 ports
uhci_hcd 0000:00:10.2: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:10.2: Performing full reset
uhci_hcd 0000:00:10.2: supports USB remote wakeup
uhci_hcd 0000:00:10.2: irq 10, io base 0x00004440
usb usb4: default language 0x0409
usb usb4: New USB device found, idVendor=1d6b, idProduct=0001
usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb4: Product: UHCI Host Controller
usb usb4: Manufacturer: Linux 2.6.30-ce1200v-09189 uhci_hcd
usb usb4: SerialNumber: 0000:00:10.2
usb usb4: uevent
usb usb4: usb_probe_device
usb usb4: configuration #1 chosen from 1 choice
usb usb4: adding 4-0:1.0 (config #1, interface 0)
usb 4-0:1.0: uevent
hub 4-0:1.0: usb_probe_interface
hub 4-0:1.0: usb_probe_interface - got id
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
hub 4-0:1.0: standalone hub
hub 4-0:1.0: no power switching (usb 1.0)
hub 4-0:1.0: individual port over-current protection
hub 4-0:1.0: power on to power good time: 2ms
hub 4-0:1.0: local power source is good
hub 4-0:1.0: trying to enable port power on non-switchable hub
drivers/usb/core/inode.c: creating file '001'
ehci_hcd 0000:00:10.4: GetStatus port 5 status 001803 POWER sig=j CSC CONNECT
hub 1-0:1.0: port 5: status 0501 change 0001
ehci_hcd 0000:00:10.4: GetStatus port 6 status 001803 POWER sig=j CSC CONNECT
hub 1-0:1.0: port 6: status 0501 change 0001
usbcore: registered new interface driver rtl8187
uhci_hcd 0000:00:10.0: port 1 portsc 008a,00
uhci_hcd 0000:00:10.0: port 2 portsc 008a,00
uhci_hcd 0000:00:10.1: port 1 portsc 008a,00
uhci_hcd 0000:00:10.1: port 2 portsc 008a,00
uhci_hcd 0000:00:10.2: port 1 portsc 009a,00
uhci_hcd 0000:00:10.2: port 2 portsc 009a,00
hub 1-0:1.0: state 7 ports 6 chg 0060 evt 0000
hub 1-0:1.0: port 5, status 0501, change 0000, 480 Mb/s
ehci_hcd 0000:00:10.4: port 5 high speed
ehci_hcd 0000:00:10.4: GetStatus port 5 status 001005 POWER sig=se0 PE CONNECT
usb 1-5: new high speed USB device using ehci_hcd and address 2
usb usb2: uevent
usb 2-0:1.0: uevent
usb usb3: uevent
usb 3-0:1.0: uevent
usb usb4: uevent
usb 4-0:1.0: uevent
usb usb1: uevent
usb 1-0:1.0: uevent
usb usb2: suspend_rh (auto-stop)
usb usb3: suspend_rh (auto-stop)
usb usb4: suspend_rh (auto-stop)
PM: Starting manual resume from disk
PM: Resume from partition 8:2
PM: Checking hibernation image.
PM: Resume from disk failed.
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with writeback data mode.
ehci_hcd 0000:00:10.4: Unlink after no-IRQ? Controller is probably using the wrong IRQ.
usb 1-5: khubd timed out on ep0in len=18/64
ehci_hcd 0000:00:10.4: port 5 high speed
ehci_hcd 0000:00:10.4: GetStatus port 5 status 001005 POWER sig=se0 PE CONNECT
udev: starting version 141
usb usb2: uevent
usb 2-0:1.0: uevent
usb usb3: uevent
usb 3-0:1.0: uevent
udev: starting version 141
usb usb2: uevent
usb 2-0:1.0: uevent
usb usb3: uevent
usb 3-0:1.0: uevent
usb usb4: uevent
usb 4-0:1.0: uevent
usb usb1: uevent
usb 1-0:1.0: uevent
udev: renamed network interface eth0 to eth1
usb 1-5: khubd timed out on ep0out len=0/0
eth1: link up, 100Mbps, full-duplex, lpa 0x45E1
Adding 1004052k swap on /dev/sda2. Priority:4 extents:1 across:1004052k
usb 1-5: khubd timed out on ep0out len=0/0
usb 1-5: device not accepting address 2, error -110
ehci_hcd 0000:00:10.4: port 5 high speed
ehci_hcd 0000:00:10.4: GetStatus port 5 status 001005 POWER sig=se0 PE CONNECT
usb 1-5: new high speed USB device using ehci_hcd and address 3
EXT3 FS on sda1, internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda4, internal journal
EXT3-fs: mounted filesystem with writeback data mode.
usb 1-5: khubd timed out on ep0in len=18/64
ehci_hcd 0000:00:10.4: port 5 high speed
ehci_hcd 0000:00:10.4: GetStatus port 5 status 001005 POWER sig=se0 PE CONNECT
eth1: no IPv6 routers present
usb 1-5: khubd timed out on ep0out len=0/0
usb 1-5: khubd timed out on ep0out len=0/0
usb 1-5: device not accepting address 3, error -110
ehci_hcd 0000:00:10.4: port 5 high speed
ehci_hcd 0000:00:10.4: GetStatus port 5 status 001005 POWER sig=se0 PE CONNECT
usb 1-5: new high speed USB device using ehci_hcd and address 4
usb usb2: uevent
usb 2-0:1.0: uevent
usb usb3: uevent
usb 3-0:1.0: uevent
usb usb4: uevent
usb 4-0:1.0: uevent
usb usb1: uevent
usb 1-0:1.0: uevent
usb 1-5: khubd timed out on ep0out len=0/0
usb 1-5: khubd timed out on ep0out len=0/0
usb 1-5: device not accepting address 4, error -110
ehci_hcd 0000:00:10.4: port 5 high speed
ehci_hcd 0000:00:10.4: GetStatus port 5 status 001005 POWER sig=se0 PE CONNECT
usb 1-5: new high speed USB device using ehci_hcd and address 5


> Mike
> > For more information enable CONFIG_USB_DEBUG and CONFIG_DEBUG_FS, then
> > see what's sitting in the usb/uhci/* files in debugfs.
> >

Anything in particular to look for/at?

Mike
> > > Let me see if I can find it again;
> > > 1106:3038 translates to: PCI_DEVICE_ID_VIA_82C586_2
> > > 1106:3104 translates to: PCI_DEVICE_ID_VIA_8235_USB_2
> > >
> > > The first appears in (this device is the UHCI hub in the CX700 chipset):
> > > arch/powerpc/platforms/85xx/mpc85xx_cds.c: case PCI_DEVICE_ID_VIA_82C586_2:
> > > arch/mips/pci/fixup-lm2e.c:DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_VIA, PCI_DEVICE_ID_VIA_82C586_2,
> > > include/linux/pci_ids.h:#define PCI_DEVICE_ID_VIA_82C586_2 0x3038
> >
> > Those are simple interrupt routing fixups. Not related.
> >
> > > The second appears in:
> > > include/linux/pci_ids.h:#define PCI_DEVICE_ID_VIA_8235_USB_2 0x3104
> > > <Hmm... Dead? Not used? Never used? Source padding?>
> >
> > Probably never used.
> >
> > Alan Stern
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
> >
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

2009-07-08 16:38:54

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Wed July 8 2009, Michael S. Zick wrote:
> On Wed July 8 2009, Michael S. Zick wrote:
> > On Tue July 7 2009, Alan Stern wrote:
> > > On Tue, 7 Jul 2009, Michael S. Zick wrote:
> > >
> > > > On Tue July 7 2009, Alan Stern wrote:
> > > > > On Tue, 7 Jul 2009, Michael S. Zick wrote:
> > > > >
> > > > > > > It looks like there's a serious problem in your UHCI hardware.
> > > > > > > According to the log you posted, it's not working at all.
> > > > > > >
> > > > > >
> > > > > > I am using the x86-generic drivers - -
> > > > >
> > > > > Not a problem with the drivers, a problem in the hardware.
> > > > >
> > > >
> > > > Or hardware that works as the manufacturer intended, but differently
> > > > than expected. Same difference as "broken" when compared to "standard".
> > >
> > > No, hardware that doesn't work at all. As in "all reads return
> > > 0xffffffff" -- that's just a guess but something like it would account
> > > for what you saw.
> > >

I tried to combine the addition of the diagnostic print (BUG())
statements and the potential fix - -
This is what I put into the code for today's testing:

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index be86ae3..d686f1d 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -160,8 +160,10 @@ static inline char *portspeed(int portstatus)
}

/* Note that hdev or one of its children must be locked! */
-static inline struct usb_hub *hdev_to_hub(struct usb_device *hdev)
+static struct usb_hub *hdev_to_hub(struct usb_device *hdev)
{
+ if (!hdev || !hdev->actconfig)
+ return NULL;
return usb_get_intfdata(hdev->actconfig->interface[0]);
}

@@ -369,11 +371,16 @@ static void kick_khubd(struct usb_hub *hub)
{
unsigned long flags;

- /* Suppress autosuspend until khubd runs */
- to_usb_interface(hub->intfdev)->pm_usage_cnt = 1;
+ if (!hub) {
+ BUG();
+ return;
+ }

spin_lock_irqsave(&hub_event_lock, flags);
if (!hub->disconnected && list_empty(&hub->event_list)) {
+ /* Suppress autosuspend until khubd runs */
+ to_usb_interface(hub->intfdev)->pm_usage_cnt = 1;
+
list_add_tail(&hub->event_list, &hub_event_list);
wake_up(&khubd_wait);
}
@@ -382,8 +389,19 @@ static void kick_khubd(struct usb_hub *hub)

void usb_kick_khubd(struct usb_device *hdev)
{
- /* FIXME: What if hdev isn't bound to the hub driver? */
- kick_khubd(hdev_to_hub(hdev));
+ struct usb_hub *hub;
+
+ if (!hdev) {
+ BUG();
+ return;
+ }
+ hub = hdev_to_hub(hdev);
+ if (hub) {
+ kick_khubd(hub);
+ } else {
+ BUG();
+ return;
+ }
}

Mike

2009-07-08 19:55:14

by Alan Stern

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Tue, 7 Jul 2009, Michael S. Zick wrote:

> Here is an example of the message paths taken - I have another 27Mbytes of the same
> if this snippet isn't enough.

> Jun 30 10:38:00 cb01 kernel: hub 2-0:1.0: over-current change on port 1
> Jun 30 10:38:02 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
> Jun 30 10:38:02 cb01 kernel: hub 2-0:1.0: over-current change on port 2
> Jun 30 10:38:02 cb01 kernel: hub 2-0:1.0: over-current change on port 1
> Jun 30 10:38:04 cb01 kernel: hub 2-0:1.0: connect-debounce failed, port 1 disabled
> Jun 30 10:38:04 cb01 kernel: hub 2-0:1.0: over-current change on port 2

This is typical of the errors on bus 2. ehci-hcd receives
status-change interrupts because of the over-current-change flag. The
hub driver tries to debounce port 1 (presumably because a device is
connected there) but gives up after two seconds. Then the cycle
repeats.

> Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: over-current change on port 1
> Jun 30 10:38:34 cb01 kernel: ehci_hcd 0000:00:10.4: port 1 resume error -19
> Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
> Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: connect-debounce failed, port 1 disabled
> Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: over-current change on port 2
> Jun 30 10:38:34 cb01 kernel: ehci_hcd 0000:00:10.4: port 2 resume error -19
> Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: hub_port_status failed (err = -32)
> Jun 30 10:38:34 cb01 kernel: hub 1-0:1.0: connect-debounce failed, port 2 disabled

This is typical of the problems on bus 1. Again we see
over-current-change flags, and this time there's the "resume error
-19". The only reason that code can appear is if a read returns
0xffffffff, i.e., ~(u32)0. See handshake() in ehci-hcd.c, and see the
call to handshake in ehci-hub.c shortly before the line containing
"resume error".

It's like I said, your hardware really isn't working properly.

Alan Stern

2009-07-08 19:57:36

by Alan Stern

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Wed, 8 Jul 2009, Michael S. Zick wrote:

> But, then again, we might just get lucky:
> ehci_hcd 0000:00:10.4: Unlink after no-IRQ? Controller is probably using the wrong IRQ.
> usb 1-5: khubd timed out on ep0in len=18/64

Like I've been telling you all along, the hardware isn't working. I
suspect it's worse than a simple interrupt-routing mistake.

> > > For more information enable CONFIG_USB_DEBUG and CONFIG_DEBUG_FS, then
> > > see what's sitting in the usb/uhci/* files in debugfs.
> > >
>
> Anything in particular to look for/at?

Just post the contents of those files.

Alan Stern

2009-07-08 20:22:47

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Wed July 8 2009, Alan Stern wrote:
> On Wed, 8 Jul 2009, Michael S. Zick wrote:
>
> > But, then again, we might just get lucky:
> > ehci_hcd 0000:00:10.4: Unlink after no-IRQ? Controller is probably using the wrong IRQ.
> > usb 1-5: khubd timed out on ep0in len=18/64
>
> Like I've been telling you all along, the hardware isn't working.
>

That sounds very fragile.

> I suspect it's worse than a simple interrupt-routing mistake.
>

I would not object to your removing that one mistake - that is one less
to contend with.

> > > > For more information enable CONFIG_USB_DEBUG and CONFIG_DEBUG_FS, then
> > > > see what's sitting in the usb/uhci/* files in debugfs.
> > > >
> >

Fours up-time - nothing written to dmesg since boot -
Just the usual hard-lockup.

Also - something in the combined fix/diagnostic patch
disable any sensing of external usb events.
So I couldn't very well poke devices at it. ;)

> > Anything in particular to look for/at?
>
> Just post the contents of those files.
>

Will capture that information during this afternoon's testing.

Mike
> Alan Stern
>
>
>

2009-07-08 21:02:59

by Alan Stern

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Wed, 8 Jul 2009, Michael S. Zick wrote:

> > Like I've been telling you all along, the hardware isn't working.
> >
>
> That sounds very fragile.

I'm not sure how to interpret that sentence. If you mean that your
system's hardware design isn't good, I agree. If you mean that the
kernel shouldn't produce a lot of output when faced with broken
hardware, that's not so clear. What else should it do (bearing in mind
that the kernel can't tell the hardware is broken)?

> > I suspect it's worse than a simple interrupt-routing mistake.
> >
>
> I would not object to your removing that one mistake - that is one less
> to contend with.

I didn't say there was an interrupt-routing mistake; I said it was
_worse_ than an interrupt-routing mistake.

> > > > > For more information enable CONFIG_USB_DEBUG and CONFIG_DEBUG_FS, then
> > > > > see what's sitting in the usb/uhci/* files in debugfs.
> > > > >
> > >
>
> Fours up-time - nothing written to dmesg since boot -
> Just the usual hard-lockup.
>
> Also - something in the combined fix/diagnostic patch
> disable any sensing of external usb events.
> So I couldn't very well poke devices at it. ;)

You should take out those BUG statements. If they ever trigger, they
will certainly lock up your system.

Alan Stern

2009-07-08 22:31:50

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Wed July 8 2009, Alan Stern wrote:
> On Wed, 8 Jul 2009, Michael S. Zick wrote:
>
> > > Like I've been telling you all along, the hardware isn't working.
> > >
> >
> > That sounds very fragile.
>
> I'm not sure how to interpret that sentence. If you mean that your
> system's hardware design isn't good, I agree. If you mean that the
> kernel shouldn't produce a lot of output when faced with broken
> hardware, that's not so clear. What else should it do (bearing in mind
> that the kernel can't tell the hardware is broken)?
>

It is unlikely that VIA Tech. will recall the CX700 chipset.

So being able to take a device off-line (like the driver claims it is doing)
and *leave* it off-line - until told to "try again" - that would be an
improvement.

The current process of filling up the /var/log directory until the machine
chokes is a rather fragile sort of response to a hot-plugged device, good or bad.

> > > I suspect it's worse than a simple interrupt-routing mistake.
> > >
> >
> > I would not object to your removing that one mistake - that is one less
> > to contend with.
>
> I didn't say there was an interrupt-routing mistake; I said it was
> _worse_ than an interrupt-routing mistake.
>

Never claimed you did - the driver made that claim.
But still, it would be nice to get rid of the interrupt-routing mistake.

I suppose waiting for the OLPC project to find and fix the problem is
a viable alternative also.
You will not be the first to give up on this C7-M/CX700 combination. ;)

> > > > > > For more information enable CONFIG_USB_DEBUG and CONFIG_DEBUG_FS, then
> > > > > > see what's sitting in the usb/uhci/* files in debugfs.
> > > > > >
> > > >
> >
> > Fours up-time - nothing written to dmesg since boot -
> > Just the usual hard-lockup.
> >
> > Also - something in the combined fix/diagnostic patch
> > disable any sensing of external usb events.
> > So I couldn't very well poke devices at it. ;)
>
> You should take out those BUG statements. If they ever trigger, they
> will certainly lock up your system.
>

Although I would expect that they not do so silently.

I take them out and just in time before starting the next
round of testing.

Mike
> Alan Stern
>
>
>

2009-07-08 23:56:25

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Wed July 8 2009, Alan Stern wrote:
> On Wed, 8 Jul 2009, Michael S. Zick wrote:
>
> > But, then again, we might just get lucky:
> > ehci_hcd 0000:00:10.4: Unlink after no-IRQ? Controller is probably using the wrong IRQ.
> > usb 1-5: khubd timed out on ep0in len=18/64
>
> Like I've been telling you all along, the hardware isn't working. I
> suspect it's worse than a simple interrupt-routing mistake.
>
> > > > For more information enable CONFIG_USB_DEBUG and CONFIG_DEBUG_FS, then
> > > > see what's sitting in the usb/uhci/* files in debugfs.
> > > >
> >
> > Anything in particular to look for/at?
>
> Just post the contents of those files.
>
Done:

root@cb01:/sys/kernel/debug/uhci# cat *
Root-hub state: suspended FSBR: 0
HC status
usbcmd = 0048 Maxp32 CF EGSM
usbstat = 0020 HCHalted
usbint = 0002
usbfrnum = (1)4a8
flbaseadd = 19c7f000
sof = 40
stat1 = 0480 OverCurrent
stat2 = 0480 OverCurrent
Most recent frame: 52a (298) Last ISO frame: 52a (298)
Periodic load table
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Total: 0, #INT: 0, #ISO: 0
Root-hub state: suspended FSBR: 0
HC status
usbcmd = 0048 Maxp32 CF EGSM
usbstat = 0020 HCHalted
usbint = 0002
usbfrnum = (1)4a0
flbaseadd = 19c63000
sof = 40
stat1 = 0480 OverCurrent
stat2 = 0480 OverCurrent
Most recent frame: 528 (296) Last ISO frame: 528 (296)
Periodic load table
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Total: 0, #INT: 0, #ISO: 0
Root-hub state: running FSBR: 0
HC status
usbcmd = 00c1 Maxp64 CF RS
usbstat = 0000
usbint = 000f
usbfrnum = (0)c1c
flbaseadd = 19c9c000
sof = 40
stat1 = 0495 OverCurrent Enabled Connected
stat2 = 0495 OverCurrent Enabled Connected
Most recent frame: 56aee (750) Last ISO frame: 56aee (750)
Periodic load table
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Total: 17, #INT: 1, #ISO: 0

And, at no extra cost:

root@cb01:/sys/kernel/debug/ehci/0000:00:10.4# cat *
size = 256
bus pci, device 0000:00:10.4
EHCI Host Controller
EHCI 1.00, hcd state 4
ownership 01000001 linux
SMI sts/enable 0xe00f0000
structural params 0x00103206
capability params 0x00006872
status 100f Halt FLR PCD ERR INT
command 010008 (park)=0 ithresh=1 period=256 HALT
intrenable 37 IAA FATAL PCD ERR INT
uframe 14db
port 1 status 701000 POWER sig=se0
port 2 status 701000 POWER sig=se0
port 3 status 701000 POWER sig=se0
port 4 status 701000 POWER sig=se0
port 5 status 703802 POWER OWNER sig=j CSC
port 6 status 703802 POWER OWNER sig=j CSC
irq normal 0 err 0 reclaim 0 (lost 20)
complete 0 unlink 20

Mike
> Alan Stern
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

2009-07-09 00:00:19

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Wed July 8 2009, Michael S. Zick wrote:
> On Wed July 8 2009, Alan Stern wrote:
> > On Wed, 8 Jul 2009, Michael S. Zick wrote:
> >
> > > But, then again, we might just get lucky:
> > > ehci_hcd 0000:00:10.4: Unlink after no-IRQ? Controller is probably using the wrong IRQ.
> > > usb 1-5: khubd timed out on ep0in len=18/64
> >
> > Like I've been telling you all along, the hardware isn't working. I
> > suspect it's worse than a simple interrupt-routing mistake.
> >
> > > > > For more information enable CONFIG_USB_DEBUG and CONFIG_DEBUG_FS, then
> > > > > see what's sitting in the usb/uhci/* files in debugfs.
> > > > >
> > >
> > > Anything in particular to look for/at?
> >
> > Just post the contents of those files.
> >
> Done:
>

All "BUG" statements out - -
Still does not notice plugging in an external device.
Nothing, zip, nada in the dmesg as a result of plug/unplug.

Mike
> root@cb01:/sys/kernel/debug/uhci# cat *
> Root-hub state: suspended FSBR: 0
> HC status
> usbcmd = 0048 Maxp32 CF EGSM
> usbstat = 0020 HCHalted
> usbint = 0002
> usbfrnum = (1)4a8
> flbaseadd = 19c7f000
> sof = 40
> stat1 = 0480 OverCurrent
> stat2 = 0480 OverCurrent
> Most recent frame: 52a (298) Last ISO frame: 52a (298)
> Periodic load table
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> Total: 0, #INT: 0, #ISO: 0
> Root-hub state: suspended FSBR: 0
> HC status
> usbcmd = 0048 Maxp32 CF EGSM
> usbstat = 0020 HCHalted
> usbint = 0002
> usbfrnum = (1)4a0
> flbaseadd = 19c63000
> sof = 40
> stat1 = 0480 OverCurrent
> stat2 = 0480 OverCurrent
> Most recent frame: 528 (296) Last ISO frame: 528 (296)
> Periodic load table
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> Total: 0, #INT: 0, #ISO: 0
> Root-hub state: running FSBR: 0
> HC status
> usbcmd = 00c1 Maxp64 CF RS
> usbstat = 0000
> usbint = 000f
> usbfrnum = (0)c1c
> flbaseadd = 19c9c000
> sof = 40
> stat1 = 0495 OverCurrent Enabled Connected
> stat2 = 0495 OverCurrent Enabled Connected
> Most recent frame: 56aee (750) Last ISO frame: 56aee (750)
> Periodic load table
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 17 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> Total: 17, #INT: 1, #ISO: 0
>
> And, at no extra cost:
>
> root@cb01:/sys/kernel/debug/ehci/0000:00:10.4# cat *
> size = 256
> bus pci, device 0000:00:10.4
> EHCI Host Controller
> EHCI 1.00, hcd state 4
> ownership 01000001 linux
> SMI sts/enable 0xe00f0000
> structural params 0x00103206
> capability params 0x00006872
> status 100f Halt FLR PCD ERR INT
> command 010008 (park)=0 ithresh=1 period=256 HALT
> intrenable 37 IAA FATAL PCD ERR INT
> uframe 14db
> port 1 status 701000 POWER sig=se0
> port 2 status 701000 POWER sig=se0
> port 3 status 701000 POWER sig=se0
> port 4 status 701000 POWER sig=se0
> port 5 status 703802 POWER OWNER sig=j CSC
> port 6 status 703802 POWER OWNER sig=j CSC
> irq normal 0 err 0 reclaim 0 (lost 20)
> complete 0 unlink 20
>
> Mike
> > Alan Stern
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
> >
>
>
>

2009-07-09 14:19:13

by Alan Stern

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Wed, 8 Jul 2009, Michael S. Zick wrote:

> It is unlikely that VIA Tech. will recall the CX700 chipset.
>
> So being able to take a device off-line (like the driver claims it is doing)
> and *leave* it off-line - until told to "try again" - that would be an
> improvement.

Sorry, you lost me there. In all the logs you have posted, I can find
only one line where the kernel claims to be taking a device offline:

> Jun 30 10:38:31 cb01 kernel: sd 2:0:0:0: Device offlined - not ready after error recovery

And in that case it _did_ leave the device offline. So what are you
concerned about?


> The current process of filling up the /var/log directory until the machine
> chokes is a rather fragile sort of response to a hot-plugged device, good or bad.

It isn't a response to a hot-plugged device; it's a response to broken
hardware. If your hardware was working properly you could hot-plug
and hot-unplug devices 'till you turned blue in the face, without
filling up the /var/log directory.


> > > > I suspect it's worse than a simple interrupt-routing mistake.
> > > >
> > >
> > > I would not object to your removing that one mistake - that is one less
> > > to contend with.
> >
> > I didn't say there was an interrupt-routing mistake; I said it was
> > _worse_ than an interrupt-routing mistake.
> >
>
> Never claimed you did - the driver made that claim.
> But still, it would be nice to get rid of the interrupt-routing mistake.

How can you get rid of an interrupt-routine mistake if there is no such
mistake in the first place?

Not that I'm claiming there is no such mistake -- the logs you have
provided aren't clear in this respect. So that's the first issue to
address: Determine whether the interrupts are or aren't being routed
correctly.

To that end, you should try doing some more directed testing.

Start with a nice cold boot, with no USB devices plugged in. Copy the
dmesg log and clear the kernel's log buffer. And just to get as much
information as possible, start a process copying usbmon's 0u file
(you'll have to enable CONFIG_USB_MON if it isn't already enabled).

Then plug in a high-speed device. When everything settles down, copy
the dmesg buffer again and also get a copy of the
/sys/kernel/debug/ehci/0000:00:10.4/registers file. Those, together
with the usbmon trace, will provide a good starting point.

Assuming something goes wrong, of course. If everything works okay,
you'll have to keep trying similar experiments (plugging and unplugging
devices) until something breaks.

Alan Stern

2009-07-09 14:20:38

by Alan Stern

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Wed, 8 Jul 2009, Michael S. Zick wrote:

> > > > > For more information enable CONFIG_USB_DEBUG and CONFIG_DEBUG_FS, then
> > > > > see what's sitting in the usb/uhci/* files in debugfs.
> > > > >
> > >
> > > Anything in particular to look for/at?
> >
> > Just post the contents of those files.
> >
> Done:
>
> root@cb01:/sys/kernel/debug/uhci# cat *
> Root-hub state: suspended FSBR: 0
> HC status
> usbcmd = 0048 Maxp32 CF EGSM
> usbstat = 0020 HCHalted
> usbint = 0002
> usbfrnum = (1)4a8
> flbaseadd = 19c7f000
> sof = 40
> stat1 = 0480 OverCurrent
> stat2 = 0480 OverCurrent
> Most recent frame: 52a (298) Last ISO frame: 52a (298)
> Periodic load table
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> Total: 0, #INT: 0, #ISO: 0
> Root-hub state: suspended FSBR: 0
> HC status
> usbcmd = 0048 Maxp32 CF EGSM
> usbstat = 0020 HCHalted
> usbint = 0002
> usbfrnum = (1)4a0
> flbaseadd = 19c63000
> sof = 40
> stat1 = 0480 OverCurrent
> stat2 = 0480 OverCurrent
> Most recent frame: 528 (296) Last ISO frame: 528 (296)
> Periodic load table
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> Total: 0, #INT: 0, #ISO: 0
> Root-hub state: running FSBR: 0
> HC status
> usbcmd = 00c1 Maxp64 CF RS
> usbstat = 0000
> usbint = 000f
> usbfrnum = (0)c1c
> flbaseadd = 19c9c000
> sof = 40
> stat1 = 0495 OverCurrent Enabled Connected
> stat2 = 0495 OverCurrent Enabled Connected
> Most recent frame: 56aee (750) Last ISO frame: 56aee (750)
> Periodic load table
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 17 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> Total: 17, #INT: 1, #ISO: 0
>
> And, at no extra cost:
>
> root@cb01:/sys/kernel/debug/ehci/0000:00:10.4# cat *
> size = 256
> bus pci, device 0000:00:10.4
> EHCI Host Controller
> EHCI 1.00, hcd state 4
> ownership 01000001 linux
> SMI sts/enable 0xe00f0000
> structural params 0x00103206
> capability params 0x00006872
> status 100f Halt FLR PCD ERR INT
> command 010008 (park)=0 ithresh=1 period=256 HALT
> intrenable 37 IAA FATAL PCD ERR INT
> uframe 14db
> port 1 status 701000 POWER sig=se0
> port 2 status 701000 POWER sig=se0
> port 3 status 701000 POWER sig=se0
> port 4 status 701000 POWER sig=se0
> port 5 status 703802 POWER OWNER sig=j CSC
> port 6 status 703802 POWER OWNER sig=j CSC
> irq normal 0 err 0 reclaim 0 (lost 20)
> complete 0 unlink 20

Those files all look perfectly normal. You made these copies at a time
when everything was working, right? It will be more interesting to see
them at a time when something isn't working.

Alan Stern

2009-07-09 14:47:28

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Thu July 9 2009, Alan Stern wrote:
> On Wed, 8 Jul 2009, Michael S. Zick wrote:
>
> > > > > > For more information enable CONFIG_USB_DEBUG and CONFIG_DEBUG_FS, then
> > > > > > see what's sitting in the usb/uhci/* files in debugfs.
> > > > > >
> > > >
> > > > Anything in particular to look for/at?
> > >
> > > Just post the contents of those files.
> > >
> > Done:
> >
> > root@cb01:/sys/kernel/debug/uhci# cat *
> > Root-hub state: suspended FSBR: 0
> > HC status
> > usbcmd = 0048 Maxp32 CF EGSM
> > usbstat = 0020 HCHalted
> > usbint = 0002
> > usbfrnum = (1)4a8
> > flbaseadd = 19c7f000
> > sof = 40
> > stat1 = 0480 OverCurrent
> > stat2 = 0480 OverCurrent
> > Most recent frame: 52a (298) Last ISO frame: 52a (298)
> > Periodic load table
> > 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0
> > Total: 0, #INT: 0, #ISO: 0
> > Root-hub state: suspended FSBR: 0
> > HC status
> > usbcmd = 0048 Maxp32 CF EGSM
> > usbstat = 0020 HCHalted
> > usbint = 0002
> > usbfrnum = (1)4a0
> > flbaseadd = 19c63000
> > sof = 40
> > stat1 = 0480 OverCurrent
> > stat2 = 0480 OverCurrent
> > Most recent frame: 528 (296) Last ISO frame: 528 (296)
> > Periodic load table
> > 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0
> > Total: 0, #INT: 0, #ISO: 0
> > Root-hub state: running FSBR: 0
> > HC status
> > usbcmd = 00c1 Maxp64 CF RS
> > usbstat = 0000
> > usbint = 000f
> > usbfrnum = (0)c1c
> > flbaseadd = 19c9c000
> > sof = 40
> > stat1 = 0495 OverCurrent Enabled Connected
> > stat2 = 0495 OverCurrent Enabled Connected
> > Most recent frame: 56aee (750) Last ISO frame: 56aee (750)
> > Periodic load table
> > 0 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0
> > 17 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0 0
> > Total: 17, #INT: 1, #ISO: 0
> >
> > And, at no extra cost:
> >
> > root@cb01:/sys/kernel/debug/ehci/0000:00:10.4# cat *
> > size = 256
> > bus pci, device 0000:00:10.4
> > EHCI Host Controller
> > EHCI 1.00, hcd state 4
> > ownership 01000001 linux
> > SMI sts/enable 0xe00f0000
> > structural params 0x00103206
> > capability params 0x00006872
> > status 100f Halt FLR PCD ERR INT
> > command 010008 (park)=0 ithresh=1 period=256 HALT
> > intrenable 37 IAA FATAL PCD ERR INT
> > uframe 14db
> > port 1 status 701000 POWER sig=se0
> > port 2 status 701000 POWER sig=se0
> > port 3 status 701000 POWER sig=se0
> > port 4 status 701000 POWER sig=se0
> > port 5 status 703802 POWER OWNER sig=j CSC
> > port 6 status 703802 POWER OWNER sig=j CSC
> > irq normal 0 err 0 reclaim 0 (lost 20)
> > complete 0 unlink 20
>
> Those files all look perfectly normal. You made these copies at a time
> when everything was working, right? It will be more interesting to see
> them at a time when something isn't working.
>

First data point posted only - that was: "with patched driver" vs "no driver" -
Note what problems are outside of this investigation.

In case you missed it - this is the CX700 integrated chipset in a NetBook.
No all USB devices are removable in the sense of being physically impossible.

The documentation for the CX700 has been released from NDA and is available
at the VIA Linux portal - just in case you don't already have your copy.

More testing today - including trying some of the many kernel options available
for dealing with funky irq, bios, and acpi implementations.

On the subject of irq's:

root@cb01:~# biosdecode
# biosdecode 2.9
SMBIOS 2.4 present.
Structure Table Length: 1556 bytes
Structure Table Address: 0x000DC010
Number Of Structures: 47
Maximum Structure Size: 119 bytes
BIOS32 Service Directory present.
Revision: 0
Calling Interface Address: 0x000FDD64
ACPI 1.0 present.
OEM Identifier: PTLTD
RSD Table 32-bit Address: 0x1BEE5663
PNP BIOS 1.0 present.
Event Notification: Not Supported
Real Mode 16-bit Code Address: E923:768E
Real Mode 16-bit Data Address: 0040:0000
16-bit Protected Mode Code Address: 0x00015FCF
16-bit Protected Mode Data Address: 0x00000400
PCI Interrupt Routing 1.0 present.
Router ID: 00:11.0
Exclusive IRQs: None
Compatible Router: 1106:8324
Slot Entry 1: ID 00:00, on-board
Slot Entry 2: ID 00:11, on-board
Slot Entry 3: ID 00:10, on-board
Slot Entry 4: ID 00:0f, on-board
Slot Entry 5: ID 04:09, slot number 9
Slot Entry 6: ID 00:01, on-board
Slot Entry 7: ID 01:00, on-board
Slot Entry 8: ID 00:13, on-board
Slot Entry 9: ID 02:01, on-board
Slot Entry 10: ID 00:13, on-board
Slot Entry 11: ID 03:03, slot number 49
Slot Entry 12: ID 03:04, slot number 50
Slot Entry 13: ID 03:05, slot number 51
Slot Entry 14: ID 03:06, slot number 52
Slot Entry 15: ID 03:07, slot number 53

Whatever a VIA 8324 router happens to be - - -

All four of the different brands of "Nano-book reference design"
machines have the same, or very similar, BIOS (all the VIA demo
board BIOS). That interrupt table is the same in all machines.
Only one of the manufacturer's even bothered to change the serial
number of the machine (1234567890) - they are that "similar".

Mike

> Alan Stern
>
>
>

2009-07-09 15:09:04

by Alan Stern

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Thu, 9 Jul 2009, Michael S. Zick wrote:

> In case you missed it - this is the CX700 integrated chipset in a NetBook.
> No all USB devices are removable in the sense of being physically impossible.

Do the best you can.

> The documentation for the CX700 has been released from NDA and is available
> at the VIA Linux portal - just in case you don't already have your copy.
>
> More testing today - including trying some of the many kernel options available
> for dealing with funky irq, bios, and acpi implementations.
>
> On the subject of irq's:
>
> root@cb01:~# biosdecode
> # biosdecode 2.9
> SMBIOS 2.4 present.
> Structure Table Length: 1556 bytes
> Structure Table Address: 0x000DC010
> Number Of Structures: 47
> Maximum Structure Size: 119 bytes
> BIOS32 Service Directory present.
> Revision: 0
> Calling Interface Address: 0x000FDD64
> ACPI 1.0 present.
> OEM Identifier: PTLTD
> RSD Table 32-bit Address: 0x1BEE5663
> PNP BIOS 1.0 present.
> Event Notification: Not Supported
> Real Mode 16-bit Code Address: E923:768E
> Real Mode 16-bit Data Address: 0040:0000
> 16-bit Protected Mode Code Address: 0x00015FCF
> 16-bit Protected Mode Data Address: 0x00000400
> PCI Interrupt Routing 1.0 present.
> Router ID: 00:11.0
> Exclusive IRQs: None
> Compatible Router: 1106:8324
> Slot Entry 1: ID 00:00, on-board
> Slot Entry 2: ID 00:11, on-board
> Slot Entry 3: ID 00:10, on-board
> Slot Entry 4: ID 00:0f, on-board
> Slot Entry 5: ID 04:09, slot number 9
> Slot Entry 6: ID 00:01, on-board
> Slot Entry 7: ID 01:00, on-board
> Slot Entry 8: ID 00:13, on-board
> Slot Entry 9: ID 02:01, on-board
> Slot Entry 10: ID 00:13, on-board
> Slot Entry 11: ID 03:03, slot number 49
> Slot Entry 12: ID 03:04, slot number 50
> Slot Entry 13: ID 03:05, slot number 51
> Slot Entry 14: ID 03:06, slot number 52
> Slot Entry 15: ID 03:07, slot number 53

Means nothing to me. I'm no ACPI or chipset expert.

> Whatever a VIA 8324 router happens to be - - -
>
> All four of the different brands of "Nano-book reference design"
> machines have the same, or very similar, BIOS (all the VIA demo
> board BIOS). That interrupt table is the same in all machines.
> Only one of the manufacturer's even bothered to change the serial
> number of the machine (1234567890) - they are that "similar".

Alan Stern

2009-07-09 16:07:45

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Thu July 9 2009, Alan Stern wrote:
> On Thu, 9 Jul 2009, Michael S. Zick wrote:
>

Going to have to mark this thread as "suspended" -
Have a 1Mbyte per minute memory leak that needs finding first. ;)

Mike
> > In case you missed it - this is the CX700 integrated chipset in a NetBook.
> > No all USB devices are removable in the sense of being physically impossible.
>
> Do the best you can.
>
> > The documentation for the CX700 has been released from NDA and is available
> > at the VIA Linux portal - just in case you don't already have your copy.
> >
> > More testing today - including trying some of the many kernel options available
> > for dealing with funky irq, bios, and acpi implementations.
> >
> > On the subject of irq's:
> >
> > root@cb01:~# biosdecode
> > # biosdecode 2.9
> > SMBIOS 2.4 present.
> > Structure Table Length: 1556 bytes
> > Structure Table Address: 0x000DC010
> > Number Of Structures: 47
> > Maximum Structure Size: 119 bytes
> > BIOS32 Service Directory present.
> > Revision: 0
> > Calling Interface Address: 0x000FDD64
> > ACPI 1.0 present.
> > OEM Identifier: PTLTD
> > RSD Table 32-bit Address: 0x1BEE5663
> > PNP BIOS 1.0 present.
> > Event Notification: Not Supported
> > Real Mode 16-bit Code Address: E923:768E
> > Real Mode 16-bit Data Address: 0040:0000
> > 16-bit Protected Mode Code Address: 0x00015FCF
> > 16-bit Protected Mode Data Address: 0x00000400
> > PCI Interrupt Routing 1.0 present.
> > Router ID: 00:11.0
> > Exclusive IRQs: None
> > Compatible Router: 1106:8324
> > Slot Entry 1: ID 00:00, on-board
> > Slot Entry 2: ID 00:11, on-board
> > Slot Entry 3: ID 00:10, on-board
> > Slot Entry 4: ID 00:0f, on-board
> > Slot Entry 5: ID 04:09, slot number 9
> > Slot Entry 6: ID 00:01, on-board
> > Slot Entry 7: ID 01:00, on-board
> > Slot Entry 8: ID 00:13, on-board
> > Slot Entry 9: ID 02:01, on-board
> > Slot Entry 10: ID 00:13, on-board
> > Slot Entry 11: ID 03:03, slot number 49
> > Slot Entry 12: ID 03:04, slot number 50
> > Slot Entry 13: ID 03:05, slot number 51
> > Slot Entry 14: ID 03:06, slot number 52
> > Slot Entry 15: ID 03:07, slot number 53
>
> Means nothing to me. I'm no ACPI or chipset expert.
>
> > Whatever a VIA 8324 router happens to be - - -
> >
> > All four of the different brands of "Nano-book reference design"
> > machines have the same, or very similar, BIOS (all the VIA demo
> > board BIOS). That interrupt table is the same in all machines.
> > Only one of the manufacturer's even bothered to change the serial
> > number of the machine (1234567890) - they are that "similar".
>
> Alan Stern
>
>
>

2009-07-09 20:59:16

by Michael S. Zick

[permalink] [raw]
Subject: Re: Null Pointer BUG in uhci_hcd

On Thu July 9 2009, Michael S. Zick wrote:
> On Thu July 9 2009, Alan Stern wrote:
> > On Thu, 9 Jul 2009, Michael S. Zick wrote:
> >
>
> Going to have to mark this thread as "suspended" -
> Have a 1Mbyte per minute memory leak that needs finding first. ;)
>

Hmmm... acpi=noirq stops the memory leak. (???)
That's the easy part - now to figure out why. ;)

Then we can go back to this thread; see if there was a real driver problem
or this thread was just demonstrating an artifact of the above <whatever>.

Mike

> Mike
> > > In case you missed it - this is the CX700 integrated chipset in a NetBook.
> > > No all USB devices are removable in the sense of being physically impossible.
> >
> > Do the best you can.
> >
> > > The documentation for the CX700 has been released from NDA and is available
> > > at the VIA Linux portal - just in case you don't already have your copy.
> > >
> > > More testing today - including trying some of the many kernel options available
> > > for dealing with funky irq, bios, and acpi implementations.
> > >
> > > On the subject of irq's:
> > >
> > > root@cb01:~# biosdecode
> > > # biosdecode 2.9
> > > SMBIOS 2.4 present.
> > > Structure Table Length: 1556 bytes
> > > Structure Table Address: 0x000DC010
> > > Number Of Structures: 47
> > > Maximum Structure Size: 119 bytes
> > > BIOS32 Service Directory present.
> > > Revision: 0
> > > Calling Interface Address: 0x000FDD64
> > > ACPI 1.0 present.
> > > OEM Identifier: PTLTD
> > > RSD Table 32-bit Address: 0x1BEE5663
> > > PNP BIOS 1.0 present.
> > > Event Notification: Not Supported
> > > Real Mode 16-bit Code Address: E923:768E
> > > Real Mode 16-bit Data Address: 0040:0000
> > > 16-bit Protected Mode Code Address: 0x00015FCF
> > > 16-bit Protected Mode Data Address: 0x00000400
> > > PCI Interrupt Routing 1.0 present.
> > > Router ID: 00:11.0
> > > Exclusive IRQs: None
> > > Compatible Router: 1106:8324
> > > Slot Entry 1: ID 00:00, on-board
> > > Slot Entry 2: ID 00:11, on-board
> > > Slot Entry 3: ID 00:10, on-board
> > > Slot Entry 4: ID 00:0f, on-board
> > > Slot Entry 5: ID 04:09, slot number 9
> > > Slot Entry 6: ID 00:01, on-board
> > > Slot Entry 7: ID 01:00, on-board
> > > Slot Entry 8: ID 00:13, on-board
> > > Slot Entry 9: ID 02:01, on-board
> > > Slot Entry 10: ID 00:13, on-board
> > > Slot Entry 11: ID 03:03, slot number 49
> > > Slot Entry 12: ID 03:04, slot number 50
> > > Slot Entry 13: ID 03:05, slot number 51
> > > Slot Entry 14: ID 03:06, slot number 52
> > > Slot Entry 15: ID 03:07, slot number 53
> >
> > Means nothing to me. I'm no ACPI or chipset expert.
> >
> > > Whatever a VIA 8324 router happens to be - - -
> > >
> > > All four of the different brands of "Nano-book reference design"
> > > machines have the same, or very similar, BIOS (all the VIA demo
> > > board BIOS). That interrupt table is the same in all machines.
> > > Only one of the manufacturer's even bothered to change the serial
> > > number of the machine (1234567890) - they are that "similar".
> >
> > Alan Stern
> >
> >
> >
>
>
>

2009-07-11 19:08:12

by Michael S. Zick

[permalink] [raw]
Subject: uhci_hcd fails to off-line device/hub on error condition

On Wed July 8 2009, Michael S. Zick wrote:
> On Wed July 8 2009, Michael S. Zick wrote:
> > On Wed July 8 2009, Michael S. Zick wrote:
> > > On Tue July 7 2009, Alan Stern wrote:
> > > > On Tue, 7 Jul 2009, Michael S. Zick wrote:
> > > >
> > > > > On Tue July 7 2009, Alan Stern wrote:
> > > > > > On Tue, 7 Jul 2009, Michael S. Zick wrote:
> > > > > >
> > > > > > > > It looks like there's a serious problem in your UHCI hardware.
> > > > > > > > According to the log you posted, it's not working at all.
> > > > > > > >
> > > > > > >
> > > > > > > I am using the x86-generic drivers - -
> > > > > >
> > > > > > Not a problem with the drivers, a problem in the hardware.
> > > > > >
> > > > >
> > > > > Or hardware that works as the manufacturer intended, but differently
> > > > > than expected. Same difference as "broken" when compared to "standard".
> > > >
> > > > No, hardware that doesn't work at all. As in "all reads return
> > > > 0xffffffff" -- that's just a guess but something like it would account
> > > > for what you saw.
> > > >
>
> I tried to combine the addition of the diagnostic print (BUG())
> statements and the potential fix - -
> This is what I put into the code for today's testing:
>
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index be86ae3..d686f1d 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -160,8 +160,10 @@ static inline char *portspeed(int portstatus)
> }
>
> /* Note that hdev or one of its children must be locked! */
> -static inline struct usb_hub *hdev_to_hub(struct usb_device *hdev)
> +static struct usb_hub *hdev_to_hub(struct usb_device *hdev)
> {
> + if (!hdev || !hdev->actconfig)
> + return NULL;
> return usb_get_intfdata(hdev->actconfig->interface[0]);
> }
>
> @@ -369,11 +371,16 @@ static void kick_khubd(struct usb_hub *hub)
> {
> unsigned long flags;
>
> - /* Suppress autosuspend until khubd runs */
> - to_usb_interface(hub->intfdev)->pm_usage_cnt = 1;
> + if (!hub) {
> + BUG();
> + return;
> + }
>
> spin_lock_irqsave(&hub_event_lock, flags);
> if (!hub->disconnected && list_empty(&hub->event_list)) {
> + /* Suppress autosuspend until khubd runs */
> + to_usb_interface(hub->intfdev)->pm_usage_cnt = 1;
> +
> list_add_tail(&hub->event_list, &hub_event_list);
> wake_up(&khubd_wait);
> }
> @@ -382,8 +389,19 @@ static void kick_khubd(struct usb_hub *hub)
>
> void usb_kick_khubd(struct usb_device *hdev)
> {
> - /* FIXME: What if hdev isn't bound to the hub driver? */
> - kick_khubd(hdev_to_hub(hdev));
> + struct usb_hub *hub;
> +
> + if (!hdev) {
> + BUG();
> + return;
> + }
> + hub = hdev_to_hub(hdev);
> + if (hub) {
> + kick_khubd(hub);
> + } else {
> + BUG();
> + return;
> + }
> }
>

- - Megabytes of the same snipped - -
uhci_hcd 0000:00:10.2: host controller process error, something bad happened!
uhci_hcd 0000:00:10.1: host system error, PCI problems?
uhci_hcd 0000:00:10.1: host controller process error, something bad happened!
uhci_hcd 0000:00:10.2: host system error, PCI problems?
uhci_hcd 0000:00:10.2: host controller process error, something bad happened!

- - driver can detect unexpected behavior sufficent to post the messages - -

uhci_hcd 0000:00:10.1: host system error, PCI problems?
uhci_hcd 0000:00:10.1: host controller process error, something bad happened!

- - but never does anything about it - other than bitch - -
- - insert external action: rmmod uhci_hcd here - -

uhci_hcd 0000:00:10.2: remove, state 1
uhci_hcd 0000:00:10.2: roothub graceful disconnect
usb usb4: USB disconnect, address 1
usb usb4: unregistering device
usb usb4: usb_disable_device nuking all URBs
usb usb4: unregistering interface 4-0:1.0
uhci_hcd 0000:00:10.2: host system error, PCI problems?
uhci_hcd 0000:00:10.2: host controller process error, something bad happened!
uhci_hcd 0000:00:10.1: host system error, PCI problems?
uhci_hcd 0000:00:10.1: host controller process error, something bad happened!
uhci_hcd 0000:00:10.1: host system error, PCI problems?
uhci_hcd 0000:00:10.1: host controller process error, something bad happened!
uhci_hcd 0000:00:10.2: host system error, PCI problems?
uhci_hcd 0000:00:10.2: host controller process error, something bad happened!
uhci_hcd 0000:00:10.1: host system error, PCI problems?
uhci_hcd 0000:00:10.1: host controller process error, something bad happened!
uhci_hcd 0000:00:10.2: host system error, PCI problems?
uhci_hcd 0000:00:10.2: host controller process error, something bad happened!
uhci_hcd 0000:00:10.1: host system error, PCI problems?
uhci_hcd 0000:00:10.1: host controller process error, something bad happened!
uhci_hcd 0000:00:10.2: host system error, PCI problems?
uhci_hcd 0000:00:10.2: host controller process error, something bad happened!
uhci_hcd 0000:00:10.1: host system error, PCI problems?
uhci_hcd 0000:00:10.1: host controller process error, something bad happened!
usb 4-0:1.0: uevent
usb usb4: uevent
uhci_hcd 0000:00:10.2: USB bus 4 deregistered
uhci_hcd 0000:00:10.1: remove, state 1
uhci_hcd 0000:00:10.1: roothub graceful disconnect
usb usb3: USB disconnect, address 1
usb usb3: unregistering device
usb usb3: usb_disable_device nuking all URBs
usb usb3: unregistering interface 3-0:1.0
usb 3-0:1.0: uevent
usb usb3: uevent
uhci_hcd 0000:00:10.1: USB bus 3 deregistered
uhci_hcd 0000:00:10.0: remove, state 0
uhci_hcd 0000:00:10.0: roothub graceful disconnect
usb usb2: USB disconnect, address 1
usb usb2: unregistering device
usb usb2: usb_disable_device nuking all URBs
usb usb2: unregistering interface 2-0:1.0
usb 2-0:1.0: uevent
usb usb2: uevent
uhci_hcd 0000:00:10.0: USB bus 2 deregistered

- - end event - - (with apologies to China Syndrome)

Mike
> Mike
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

2009-07-12 15:16:42

by Alan Stern

[permalink] [raw]
Subject: Re: uhci_hcd fails to off-line device/hub on error condition

On Sat, 11 Jul 2009, Michael S. Zick wrote:

> - - Megabytes of the same snipped - -
> uhci_hcd 0000:00:10.2: host controller process error, something bad happened!
> uhci_hcd 0000:00:10.1: host system error, PCI problems?
> uhci_hcd 0000:00:10.1: host controller process error, something bad happened!
> uhci_hcd 0000:00:10.2: host system error, PCI problems?
> uhci_hcd 0000:00:10.2: host controller process error, something bad happened!
>
> - - driver can detect unexpected behavior sufficent to post the messages - -
>
> uhci_hcd 0000:00:10.1: host system error, PCI problems?
> uhci_hcd 0000:00:10.1: host controller process error, something bad happened!
>
> - - but never does anything about it - other than bitch - -

Yep. The problem identified by those messages is supposed to be very
rare -- they mean that the PCI bus is overloaded and consequently the
USB controller wasn't able to transfer the data it needed sufficiently
rapidly. (Except in your case, where they really mean that something
is wrong with the hardware. But how can the kernel be expected to
realize this?)

The driver doesn't do anything about it because there's nothing to be
done. Bus overloads are very rare and quickly go away by themselves.
It isn't a fatal error.

Now, I suppose we could add a test for badly misbehaving hardware.
You can do it yourself: In drivers/usb/host/uhci-hcd.c, find the big
"if" statement in uhci_irq() that tests for any unexpected bits in
the "status" variable. Try printing out the value of status; if it
turns out to be 0xffff then we can detect your problem and take the
controller offline.

Alan Stern