2001-04-03 18:46:38

by Miles Lane

Subject: Contacts within AMD? AMD-756 USB host-controller blacklisted due to erratum #4.

Running 2.4.2-ac28, I get the following error:

usb-ohci.c: 00:07.4 (Advanced Micro Devices [AMD] AMD-756 [Viper] USB): blacklisted, erratum #4

David Brownell recently added this check to the usb-ohci driver
since no one has gotten information from AMD for the workaround,
which is rumored to exist, for this bug.

Do any of you have contacts within AMD who might be able to
get an explanation of the workaround to David Brownell?

The bug is that the NDP (number of downstream ports) value
read from the AMD-756 is sometimes bogus. The following
examples, collected before the chip
was blacklisted, show the failure. As you can see, the bogus
value given varies. Rereading NDP seems to give a valid value.
I am not really clear why we don't simply read the value twice
whenever the host-controller is detected to be an AMD-756.

Mar 4 17:20:52 aerie kernel: usb-ohci.c: bogus NDP=128 for OHCI usb-00:07.4
Mar 4 17:20:52 aerie kernel: usb-ohci.c: rereads as NDP=4

Mar 4 17:50:29 aerie kernel: usb-ohci.c: bogus NDP=245 for OHCI usb-00:07.4
Mar 4 17:50:29 aerie kernel: usb-ohci.c: rereads as NDP=4

Mar 6 21:11:07 aerie kernel: usb-ohci.c: bogus NDP=210 for OHCI usb-00:07.4
Mar 6 21:11:07 aerie kernel: usb-ohci.c: rereads as NDP=4

Thanks,
Miles
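
A minimal C sketch of the read-twice idea raised in the message above, assuming NDP is the low byte of the root hub descriptor register and that OHCI allows between 1 and 15 root-hub ports. The names read_ndp_checked, reg_read_fn, fake_reg and fake_read are hypothetical and are not taken from usb-ohci.c; a real driver would read a memory-mapped register rather than call through a function pointer. As the replies point out, a check like this guards only one path and cannot catch a wrong value that happens to fall in range.

```c
#include <stdint.h>

#define OHCI_MAX_ROOT_PORTS 15u   /* OHCI allows at most 15 root-hub ports */
#define RH_A_NDP_MASK       0xffu /* NDP: low byte of the root hub descriptor */

/* Hypothetical register accessor; stands in for an MMIO read. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/*
 * Read NDP, rereading up to max_tries times while the value is
 * implausible. Returns the port count, or -1 if every read was bogus.
 */
int read_ndp_checked(reg_read_fn read_rh_a, void *ctx, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		uint32_t ndp = read_rh_a(ctx) & RH_A_NDP_MASK;

		if (ndp >= 1 && ndp <= OHCI_MAX_ROOT_PORTS)
			return (int)ndp;
	}
	return -1;
}

/* Test double: replays a fixed sequence of register values. */
struct fake_reg {
	const uint32_t *values;
	int count;
	int pos;
};

uint32_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;
	uint32_t v = r->values[r->pos];

	/* Stay on the last value once the sequence is exhausted. */
	if (r->pos < r->count - 1)
		r->pos++;
	return v;
}
```

Feeding it the sequence from the first log excerpt (128, then 4) yields 4 on the second read; a register that only ever returns 245 makes it give up with -1 instead of passing the bogus count on.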


2001-04-03 19:09:27

by Alan

Subject: Re: Contacts within AMD? AMD-756 USB host-controller blacklisted due to

> David Brownell recently added this check to the usb-ohci driver
> since no one has gotten information from AMD for the workaround,
> which is rumored to exist, for this bug.
>
> Do any of you have contacts within AMD who might be able to
> get an explanation of the workaround to David Brownell?

We are working on that currently via the Red Hat contact.

> value given varies. Rereading NDP seems to give a valid value.
> I am not really clear why we don't simply read the value twice
> whenever the host-controller is detected to be an AMD-756.

Because we don't know the full scope of the problem yet.


2001-04-04 19:13:57

by Thomas Dodd

Subject: Re: Contacts within AMD? AMD-756 USB host-controller blacklisted due to

Alan Cox wrote:
>
> > David Brownell recently added this check to the usb-ohci driver
> > since no one has gotten information from AMD for the workaround,
> > which is rumored to exist, for this bug.
> >
> > Do any of you have contacts within AMD who might be able to
> > get an explanation of the workaround to David Brownell?
>
> We are working on that currently via the Red Hat contact.
>
> > value given varies. Rereading NDP seems to give a valid value.
> > I am not really clear why we don't simply read the value twice
> > whenever the host-controller is detected to be an AMD-756.
>
> Because we don't know the full scope of the problem yet.

Exactly how many bug reports has this caused?
What kind of problems?

I know I had trouble once, but it was a CONFIG problem
with the 2.4.2ac series and the extra DEBUG options.

-Thomas

2001-04-04 19:54:30

by Miles Lane

Subject: Re: Contacts within AMD? AMD-756 USB host-controller blacklisted due to

Thomas Dodd wrote:

> Alan Cox wrote:
>
>>> David Brownell recently added this check to the usb-ohci driver
>>> since no one has gotten information from AMD for the workaround,
>>> which is rumored to exist, for this bug.
>>>
>>> Do any of you have contacts within AMD who might be able to
>>> get an explanation of the workaround to David Brownell?
>>
>> We are working on that currently via the Red Hat contact.
>>
>>
>>> value given varies. Rereading NDP seems to give a valid value.
>>> I am not really clear why we don't simply read the value twice
>>> whenever the host-controller is detected to be an AMD-756.
>>
>> Because we don't know the full scope of the problem yet.
>
>
> Exactly how many bug reports has this caused?
> What kind of problems?
>
> I know I had trouble once, but it was a CONFIG problem
> with the 2.4.2ac series and the extra DEBUG options.

I think probably everyone who has an AMD-756 has reported
this error. At least, I've not seen any messages from
people saying, "I have an AMD-756 and have never seen this
error." Most of the time, when the error occurs, it seems
pretty benign. That is, I haven't noticed it crashing USB
device connections, causing data corruption or OOPSen.
Some folks _have_ reported OOPSen, though, that seemed to
be triggered by the erratum #4 hardware bug. I think I
may have had one of these a long time ago.

I believe David has found that there definitely are code
paths where this hardware bug can cause failures of various
sorts and that's why the AMD-756 has been blacklisted.
I don't believe these failure code paths have anything to
do with specific debugging configurations.

David/Alan, please correct me if I've got this all wrong.

Thanks,
Miles

2001-04-04 22:20:37

by Joachim 'roh' Steiger

Subject: Re: Contacts within AMD? AMD-756 USB host-controller blacklisted due to

i would like to help to track down this problem
i'm using a gigabyte 7IXE revision 1.1
kernel is 2.4.1

lspci output for usb:
00:07.4 USB Controller: Advanced Micro Devices [AMD] AMD-756 [Viper] USB (rev 06) (prog-if 10 [OHCI])
Flags: bus master, medium devsel, latency 16, IRQ 11
Memory at efffc000 (32-bit, non-prefetchable) [size=4K]


On Wed, 4 Apr 2001, Miles Lane wrote:
> Thomas Dodd wrote:
>
> > Alan Cox wrote:
> >
> >> Because we don't know the full scope of the problem yet.
> > Exactly how many bug reports has this caused?
> > What kind of problems?

here i only have this kernel message showing up in my logfiles about
once a day:

Apr 4 14:47:15 campari kernel: usb-ohci.c: bogus NDP=204 for OHCI usb-00:07.4
Apr 4 14:47:15 campari kernel: usb-ohci.c: rereads as NDP=4

> error." Most of the time, when the error occurs, it seems
> pretty benign. That is, I haven't noticed it crashing USB
> device connections, causing data corruption or OOPSen.
> Some folks _have_ reported OOPSen, though, that seemed to
> be triggered by the erratum #4 hardware bug. I think I
> may have had one of these a long time ago.

as you see it's revision 6
i've had no other problems with usb for now and use this
idVendor 0x046d Logitech Inc.
idProduct 0xc00c
usb-wheelmouse all the time

i've never seen this kernel or the previous one (2.4.0test8) oops
and it runs perfectly stable here

> I believe David has found that there definitely are code
> paths where this hardware bug can cause failures of various
> sorts and that's why the AMD-756 has been blacklisted.

since it didn't cause any trouble here i would not like to have the
complete AMD-756 blacklisted in the ohci-driver
possibly only some revisions are that bad

please correct me if i'm wrong, i just don't want to blacklist a complete
chipset series

roh
--
Joachim 'roh' Steiger mailto:[email protected]
Convergence Integrated Media GmbH http://www.convergence.de
Rosenthaler Str. 51 fon: +49(0)30-72 62 06 77
10178 Berlin, Germany fax: +49(0)30-72 62 06 55

2001-04-04 22:40:52

by Miles Lane

Subject: Re: Contacts within AMD? AMD-756 USB host-controller blacklisted due to

Joachim 'roh' Steiger wrote:

> i would like to help to track down this problem
> i'm using a gigabyte 7IXE revision 1.1
> kernel is 2.4.1
>
> lspci output for usb:
> 00:07.4 USB Controller: Advanced Micro Devices [AMD] AMD-756 [Viper] USB (rev 06) (prog-if 10 [OHCI])
> Flags: bus master, medium devsel, latency 16, IRQ 11
> Memory at efffc000 (32-bit, non-prefetchable) [size=4K]
>
>
> On Wed, 4 Apr 2001, Miles Lane wrote:
>
>> Thomas Dodd wrote:
>>
>>
>>> Alan Cox wrote:
>>>
>>>
>>>> Because we don't know the full scope of the problem yet.
>>>
>>> Exactly how many bug reports has this caused?
>>> What kind of problems?
>>
>
> here i only have this kernel message showing up in my logfiles about
> once a day:
>
> Apr 4 14:47:15 campari kernel: usb-ohci.c: bogus NDP=204 for OHCI usb-00:07.4
> Apr 4 14:47:15 campari kernel: usb-ohci.c: rereads as NDP=4
>
>
>> error." Most of the time, when the error occurs, it seems
>> pretty benign. That is, I haven't noticed it crashing USB
>> device connections, causing data corruption or OOPSen.
>> Some folks _have_ reported OOPSen, though, that seemed to
>> be triggered by the erratum #4 hardware bug. I think I
>> may have had one of these a long time ago.
>
>
> as you see it's revision 6
> i've had no other problems with usb for now and use this
> idVendor 0x046d Logitech Inc.
> idProduct 0xc00c
> usb-wheelmouse all the time
>
> i've never seen this kernel or the previous one (2.4.0test8) oops
> and it runs perfectly stable here
>
>
>> I believe David has found that there definitely are code
>> paths where this hardware bug can cause failures of various
>> sorts and that's why the AMD-756 has been blacklisted.
>
>
> since it didn't cause any trouble here i would not like to have the
> complete AMD-756 blacklisted in the ohci-driver
> possibly only some revisions are that bad
>
> please correct me if i'm wrong, i just don't want to blacklist a complete
> chipset series

Hi Joachim,

Personally, I agree with you, but I can also understand David's
desire to avoid wasting time chasing phantom bugs that only
show up due to this broken hardware. If it turns out that
there is actually a well-defined workaround that AMD will
tell us about, it shouldn't take too long before we have a
real fix and the AMD-756 can be taken off of the blacklist.

My guess is that there are specific drivers for which this
hardware bug causes problems. You probably just aren't
using the *right* drivers. :-)

Luckily, USB add-on cards are pretty cheap, so I suppose you
could just put a new host-controller in your test machine
for a month or two until David and Alan get this sorted out
with AMD. Think of it this way, you'll have more hardware
configurations to test with, so get a UHCI or EHCI card.
Woohoo! (Only half kidding)

Miles

2001-04-04 23:01:55

by David Brownell

Subject: Re: Contacts within AMD? AMD-756 USB host-controller blacklisted due to

> Apr 4 14:47:15 campari kernel: usb-ohci.c: bogus NDP=204 for OHCI usb-00:07.4
> Apr 4 14:47:15 campari kernel: usb-ohci.c: rereads as NDP=4

Means that your system would have oopsed if it hadn't
tested for the bogus register read (NDP). That's only one
path; other bogus reads (which could also oops) on other
paths are undetected. Slightly less-bogus reads on that
particular path may not be detected, and can still oops.

> please correct me if i'm wrong, i just don't want to blacklist a complete
> chipset series

Then feel free to develop and submit a better fix. That'd
be more practical if AMD's workaround were public. As I
understand it, the bulk of the production chips have this
erratum. More power to RedHat getting info from AMD.
Meanwhile, this patch improves robustness.

- Dave
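
To illustrate the point made in the message above, here is a toy C model (not driver code) of a path that uses NDP as a loop bound over per-port status words. It assumes a 15-entry array and uses bit 0 of each word as the "device connected" flag; MAX_ROOT_PORTS, PORT_CONNECTED and count_connected are names chosen for this sketch only. The range check mirrors the kind of test that produces the "bogus NDP" log line.

```c
#include <stdint.h>

#define MAX_ROOT_PORTS 15   /* OHCI upper bound on root-hub ports */
#define PORT_CONNECTED 0x1u /* bit 0 of a port status word: device attached */

/*
 * Count attached devices on the first ndp ports.
 * Returns -1 when ndp is out of range, so the caller never indexes
 * past the end of the status array.
 */
int count_connected(const uint32_t status[MAX_ROOT_PORTS], unsigned ndp)
{
	if (ndp > MAX_ROOT_PORTS)
		return -1; /* e.g. NDP=204: refuse rather than overrun */

	int n = 0;

	for (unsigned i = 0; i < ndp; i++)
		if (status[i] & PORT_CONNECTED)
			n++;
	return n;
}
```

On a controller with four real ports, a read of 204 is rejected by the range check. A bogus but in-range value such as 9 passes it, and the loop then examines five slots that do not correspond to real ports, which is the "slightly less-bogus" case the check cannot detect.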


2001-04-04 23:48:06

by Ryan Butler

Subject: Re: Contacts within AMD? AMD-756 USB host-controller blacklisted due to

Miles Lane wrote:

>
>
> Personally, I agree with you, but I can also understand David's
> desire to avoid wasting time chasing phantom bugs that only
> show up due to this broken hardware. If it turns out that
> there is actually a well-defined workaround that AMD will
> tell us about, it shouldn't take too long before we have a
> real fix and the AMD-756 can be taken off of the blacklist.
>
> My guess is that there are specific drivers for which this
> hardware bug causes problems. You probably just aren't
> using the *right* drivers. :-)
>
00:07.4 USB Controller: Advanced Micro Devices [AMD] AMD-756 [Viper] USB (rev 06) (prog-if 10 [OHCI])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 80 max, 16 set, cache line size 08
Interrupt: pin D routed to IRQ 10
Region 0: Memory at efffd000 (32-bit, non-prefetchable) [size=4K]


This is also a rev 06 chip, on a MSI K7Pro (Slot A) board.

The only item I use with it is a Creative Webcam III and I have yet to
see this error with any kernel version in the 2.4.x series.

So I think you might be right about certain drivers exposing the
hardware flaw.

As for not blacklisting the whole chipset, it's probably the smart thing
to do, and for those who really really want to use the driver anyhow, a
simple diff between the usb-ohci.c files between 2.4.2 and 2.4.3 shows
how to remove the blacklist.

Ryan Butler
ADI Internet Solutions
[email protected]

2001-04-05 00:09:58

by Thomas Dodd

Subject: Re: Contacts within AMD? AMD-756 USB host-controller blacklisted due to

diff -u --new-file --recursive linux-2.4.3-ac2.orig/drivers/usb/Config.in linux-2.4.3-ac2/drivers/usb/Config.in
--- linux-2.4.3-ac2.orig/drivers/usb/Config.in Wed Apr 4 15:23:13 2001
+++ linux-2.4.3-ac2/drivers/usb/Config.in Wed Apr 4 16:13:52 2001
@@ -24,6 +24,9 @@
    dep_tristate ' UHCI Alternate Driver (JE) support' CONFIG_USB_UHCI_ALT $CONFIG_USB
 fi
 dep_tristate ' OHCI (Compaq, iMacs, OPTi, SiS, ALi, ...) support' CONFIG_USB_OHCI $CONFIG_USB
+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+   bool ' AMD-756 OHCI support (DANGEROUS)(EXPERIMENTAL)' CONFIG_AMD_OHCI_OK
+fi

 comment 'USB Device Class drivers'
 dep_tristate ' USB Audio support' CONFIG_USB_AUDIO $CONFIG_USB $CONFIG_SOUND
diff -u --new-file --recursive linux-2.4.3-ac2.orig/drivers/usb/usb-ohci.c linux-2.4.3-ac2/drivers/usb/usb-ohci.c
--- linux-2.4.3-ac2.orig/drivers/usb/usb-ohci.c Wed Apr 4 15:23:15 2001
+++ linux-2.4.3-ac2/drivers/usb/usb-ohci.c Wed Apr 4 16:18:01 2001
@@ -2332,13 +2332,14 @@
 	unsigned long mem_resource, mem_len;
 	void *mem_base;

+#ifndef CONFIG_AMD_OHCI_OK
 	/* blacklisted hardware? */
 	if (id->driver_data) {
 		info ("%s (%s): %s", dev->slot_name,
 			dev->name, (char *) id->driver_data);
 		return -ENODEV;
 	}
-
+#endif
 	if (pci_enable_device(dev) < 0)
 		return -ENODEV;

@@ -2508,6 +2509,7 @@
 	 * AMD-756 [Viper] USB has a serious erratum when used with
 	 * lowspeed devices like mice; oopses have been seen. The
 	 * vendor workaround needs an NDA ... for now, blacklist it.
+	 * Use CONFIG_AMD_OHCI_OK to try anyway.
 	 */
 	vendor: 0x1022,
 	device: 0x740c,


Attachments:
AMD-USB.patch (1.73 kB)

2001-04-05 00:26:31

by Alan

Subject: Re: Contacts within AMD? AMD-756 USB host-controller blacklisted due to

> Compromise?
>
> This patch makes it a config option to enable the AMD-756.
> It's marked DANGEROUS and EXPERIMENTAL, and is only
> available if CONFIG_EXPERIMENTAL is set.

Since we expect to get errata docs very soon I'm not that worried. As an
implementation I'd rather have a module option of 'ignore_blacklist' or similar
so that it is a runtime choice.

2001-04-05 20:31:55

by Thomas Dodd

Subject: Re: Contacts within AMD? AMD-756 USB host-controller blacklisted due to

diff -u --new-file --recursive linux-2.4.3-ac2.orig/drivers/usb/usb-ohci.c linux-2.4.3-ac2/drivers/usb/usb-ohci.c
--- linux-2.4.3-ac2.orig/drivers/usb/usb-ohci.c Wed Apr 4 15:23:15 2001
+++ linux-2.4.3-ac2/drivers/usb/usb-ohci.c Thu Apr 5 14:02:08 2001
@@ -92,6 +92,10 @@
 static LIST_HEAD (ohci_hcd_list);
 static spinlock_t usb_ed_lock = SPIN_LOCK_UNLOCKED;

+static int overrideBlacklist = 0;
+MODULE_PARM(overrideBlacklist, "i");
+MODULE_PARM_DESC(overrideBlacklist, "override blacklisted controllers");
+
 /*-------------------------------------------------------------------------*
  * URB support functions
  *-------------------------------------------------------------------------*/
@@ -2333,12 +2337,13 @@
 	void *mem_base;

 	/* blacklisted hardware? */
-	if (id->driver_data) {
-		info ("%s (%s): %s", dev->slot_name,
+	if (overrideBlacklist != 1){
+		if (id->driver_data) {
+			info ("%s (%s): %s", dev->slot_name,
 			dev->name, (char *) id->driver_data);
 		return -ENODEV;
+		}
 	}
-
 	if (pci_enable_device(dev) < 0)
 		return -ENODEV;


Attachments:
USB.patch (1.04 kB)
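
The runtime check in the patch above reduces to a small predicate. The sketch below restates it as a standalone C function so its behaviour is easy to see; probe_rejects is a hypothetical name, and blacklist_reason stands in for id->driver_data. It is an illustration of the patch's condition, not code from the driver.

```c
#include <stddef.h>

/*
 * Returns 1 when the probe should bail out with -ENODEV, 0 otherwise.
 * Mirrors the patch's condition: only the exact value 1 lifts the
 * blacklist; any other value (including 2 or -1) leaves it enforced.
 */
int probe_rejects(const char *blacklist_reason, int override_blacklist)
{
	if (override_blacklist != 1 && blacklist_reason != NULL)
		return 1;
	return 0;
}
```

Since the parameter is declared with MODULE_PARM as an integer, it would presumably be set at module load time, for example `insmod usb-ohci overrideBlacklist=1`. Testing for any non-zero value instead of exactly 1 would be a more forgiving variant.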