2013-03-26 14:21:01

by Peter Hurley

[permalink] [raw]
Subject: [PATCH v2] firewire: Fix 'failed to read phy reg' on FW643 rev8

With the LSI FW643 rev 8 [1], the first commanded bus reset at
the conclusion of ohci_enable() has been observed to fail with
the following messages:

[ 4.884015] firewire_ohci 0000:01:00.0: failed to read phy reg
....
[ 5.684012] firewire_ohci 0000:01:00.0: failed to read phy reg

With drivers/firewire/ohci.c instrumented, the error condition [2]
indicates the PHY arbitration state machine has timed out prior to
enabling PHY LCtrl.

Furthermore, instrumenting ohci_enable() shows that LPS has been
enabled within 1 ms.

Test LPS latching every 1 ms rather than every 50ms.

[1] lspci -v

01:00.0 FireWire (IEEE 1394): LSI Corporation FW643 [TrueFire] PCIe 1394b Controller (rev 08) (prog-if 10 [OHCI])
Subsystem: LSI Corporation FW643 [TrueFire] PCIe 1394b Controller
Flags: bus master, fast devsel, latency 0, IRQ 92
Memory at fbeff000 (64-bit, non-prefetchable) [size=4K]
Capabilities: [44] Power Management version 3
Capabilities: [4c] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [60] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [170] Device Serial Number 08-14-43-82-00-00-41-fc
Kernel driver in use: firewire_ohci
Kernel modules: firewire-ohci

[2] instrumented WARNING in read_phy_reg()

[ 4.576010] ------------[ cut here ]------------
[ 4.576035] WARNING: at ./drivers/firewire/ohci.c:570 read_phy_reg+0x93/0xe0 [firewire_ohci]()
[ 4.576050] Hardware name: Precision WorkStation T5400
[ 4.576058] failed to read phy reg:1 (phy(5) @ config enhance:19)
[ 4.576068] Modules linked in: hid_logitech_dj hid_generic(+) usbhid <...snip...>
[ 4.576140] Pid: 61, comm: kworker/2:1 Not tainted 3.8.0-2+fwtest-xeon #2+fwtest
[ 4.576149] Call Trace:
[ 4.576160] [<ffffffff8105468f>] warn_slowpath_common+0x7f/0xc0
[ 4.576168] [<ffffffff81054786>] warn_slowpath_fmt+0x46/0x50
[ 4.576178] [<ffffffffa00caca3>] read_phy_reg+0x93/0xe0 [firewire_ohci]
[ 4.576188] [<ffffffffa00cae19>] ohci_read_phy_reg+0x39/0x60 [firewire_ohci]
[ 4.576203] [<ffffffffa00731ff>] fw_send_phy_config+0xbf/0xe0 [firewire_core]
[ 4.576214] [<ffffffffa006b2d6>] br_work+0x46/0xb0 [firewire_core]
[ 4.576225] [<ffffffff81071e0c>] process_one_work+0x13c/0x500
[ 4.576238] [<ffffffffa006b290>] ? fw_card_initialize+0x180/0x180 [firewire_core]
[ 4.576248] [<ffffffff810737ed>] worker_thread+0x16d/0x470
[ 4.576257] [<ffffffff81073680>] ? busy_worker_rebind_fn+0x100/0x100
[ 4.576266] [<ffffffff8107d160>] kthread+0xc0/0xd0
[ 4.576275] [<ffffffff816a0000>] ? pcpu_dump_alloc_info+0x1cb/0x2c4
[ 4.576284] [<ffffffff8107d0a0>] ? kthread_create_on_node+0x130/0x130
[ 4.576297] [<ffffffff816b2f6c>] ret_from_fork+0x7c/0xb0
[ 4.576305] [<ffffffff8107d0a0>] ? kthread_create_on_node+0x130/0x130
[ 4.576313] ---[ end trace cbc940994b300302 ]---

Reported-by: Tim Jordan <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
---
drivers/firewire/ohci.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
index 45912e6..044ace3 100644
--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -2268,8 +2268,8 @@ static int ohci_enable(struct fw_card *card,
OHCI1394_HCControl_postedWriteEnable);
flush_writes(ohci);

- for (lps = 0, i = 0; !lps && i < 3; i++) {
- msleep(50);
+ for (lps = 0, i = 0; !lps && i < 150; i++) {
+ msleep(1);
lps = reg_read(ohci, OHCI1394_HCControlSet) &
OHCI1394_HCControl_LPS;
}
--
1.8.1.2


2013-04-28 12:16:15

by Stefan Richter

[permalink] [raw]
Subject: Re: [PATCH v2] firewire: Fix 'failed to read phy reg' on FW643 rev8

On Mar 26 Peter Hurley wrote:
> --- a/drivers/firewire/ohci.c
> +++ b/drivers/firewire/ohci.c
> @@ -2268,8 +2268,8 @@ static int ohci_enable(struct fw_card *card,
> OHCI1394_HCControl_postedWriteEnable);
> flush_writes(ohci);
>
> - for (lps = 0, i = 0; !lps && i < 3; i++) {
> - msleep(50);
> + for (lps = 0, i = 0; !lps && i < 150; i++) {
> + msleep(1);
> lps = reg_read(ohci, OHCI1394_HCControlSet) &
> OHCI1394_HCControl_LPS;
> }

This duplicates a mistake (?) that we already did a few times elsewhere in
firewire-ohci: msleep(1) may put the thread to sleep for much, much longer
than 1 millisecond. I will propose a patch soon.
--
Stefan Richter
-=====-===-= -=-- ===--
http://arcgraph.de/sr/

2013-04-28 21:07:27

by Stefan Richter

[permalink] [raw]
Subject: Re: [PATCH v2] firewire: Fix 'failed to read phy reg' on FW643 rev8

On Mar 26 Peter Hurley wrote:
> --- a/drivers/firewire/ohci.c
> +++ b/drivers/firewire/ohci.c
> @@ -2268,8 +2268,8 @@ static int ohci_enable(struct fw_card *card,
> OHCI1394_HCControl_postedWriteEnable);
> flush_writes(ohci);
>
> - for (lps = 0, i = 0; !lps && i < 3; i++) {
> - msleep(50);
> + for (lps = 0, i = 0; !lps && i < 150; i++) {
> + msleep(1);
> lps = reg_read(ohci, OHCI1394_HCControlSet) &
> OHCI1394_HCControl_LPS;
> }

Unfortunately, this introduced a regression with a CardBus card based on TI
TSB82AA2 [104c:8025] (rev 01) + TSB81BA3(A) [080028:831304]... actually
two cards from different brands with PCI subsystem IDs [9710:6500] and
[104c:8025] respectively, but probably same hardware under the covers.

Apr 28 20:49:43 stein kernel: pcmcia_socket pcmcia_socket0: pccard: CardBus card inserted into slot 0
Apr 28 20:49:43 stein kernel: pci 0000:0d:00.0: [104c:8025] type 00 class 0x0c0010
Apr 28 20:49:43 stein kernel: pci 0000:0d:00.0: reg 10: [mem 0x00000000-0x000007ff]
Apr 28 20:49:43 stein kernel: pci 0000:0d:00.0: reg 14: [mem 0x00000000-0x00003fff]
Apr 28 20:49:43 stein kernel: pci 0000:0d:00.0: supports D1 D2
Apr 28 20:49:43 stein kernel: pci 0000:0d:00.0: PME# supported from D0 D1 D2 D3hot
Apr 28 20:49:43 stein kernel: pci 0000:0d:00.0: BAR 1: assigned [mem 0xfb800000-0xfb803fff]
Apr 28 20:49:43 stein kernel: pci 0000:0d:00.0: BAR 0: assigned [mem 0xfb804000-0xfb8047ff]
Apr 28 20:49:43 stein kernel: firewire_ohci 0000:0d:00.0: enabling device (0000 -> 0002)
Apr 28 20:49:43 stein kernel: firewire_ohci 0000:0d:00.0: setting latency timer to 64
Apr 28 20:49:43 stein kernel: firewire_ohci 0000:0d:00.0: failed to read phy reg
Apr 28 20:49:43 stein kernel: firewire_ohci: probe of 0000:0d:00.0 failed with error -16

The same with your patch for more verbose logging:

Apr 28 19:01:59 stein kernel: pcmcia_socket pcmcia_socket0: pccard: CardBus card inserted into slot 0
Apr 28 19:01:59 stein kernel: pci 0000:0d:00.0: [104c:8025] type 00 class 0x0c0010
Apr 28 19:01:59 stein kernel: pci 0000:0d:00.0: reg 10: [mem 0x00000000-0x000007ff]
Apr 28 19:01:59 stein kernel: pci 0000:0d:00.0: reg 14: [mem 0x00000000-0x00003fff]
Apr 28 19:01:59 stein kernel: pci 0000:0d:00.0: reg 18: [mem 0x00000000-0x000007ff]
Apr 28 19:01:59 stein kernel: pci 0000:0d:00.0: supports D1 D2
Apr 28 19:01:59 stein kernel: pci 0000:0d:00.0: PME# supported from D0 D1 D2 D3hot
Apr 28 19:01:59 stein kernel: pci 0000:0d:00.0: BAR 1: assigned [mem 0xfb800000-0xfb803fff]
Apr 28 19:01:59 stein kernel: pci 0000:0d:00.0: BAR 0: assigned [mem 0xfb804000-0xfb8047ff]
Apr 28 19:01:59 stein kernel: pci 0000:0d:00.0: BAR 2: assigned [mem 0xfb804800-0xfb804fff]
Apr 28 19:01:59 stein kernel: firewire_ohci 0000:0d:00.0: enabling device (0000 -> 0002)
Apr 28 19:01:59 stein kernel: firewire_ohci 0000:0d:00.0: setting latency timer to 64
Apr 28 19:01:59 stein kernel: firewire_ohci 0000:0d:00.0: failed to read phy reg 2
Apr 28 19:01:59 stein kernel: Pid: 1117, comm: pccardd Not tainted 3.8.0-rc7 #3
Apr 28 19:01:59 stein kernel: Call Trace:
Apr 28 19:01:59 stein kernel: [<ffffffffa0179e1d>] read_phy_reg+0x7b/0x90 [firewire_ohci]
Apr 28 19:01:59 stein kernel: [<ffffffffa0179fbb>] ohci_enable+0xb5/0x569 [firewire_ohci]
Apr 28 19:01:59 stein kernel: [<ffffffffa016993f>] fw_card_add+0x46/0x8d [firewire_core]
Apr 28 19:01:59 stein kernel: [<ffffffffa0178832>] pci_probe+0x556/0x6b7 [firewire_ohci]
Apr 28 19:01:59 stein kernel: [<ffffffff810fd14a>] ? sysfs_do_create_link+0x155/0x1a7
Apr 28 19:01:59 stein kernel: [<ffffffff81192cfe>] pci_device_probe+0x5a/0x8d
Apr 28 19:01:59 stein kernel: [<ffffffff81298418>] ? driver_sysfs_add+0x6b/0x91
Apr 28 19:01:59 stein kernel: [<ffffffff8129869a>] driver_probe_device+0xa5/0x1b3
Apr 28 19:01:59 stein kernel: [<ffffffff81298857>] __device_attach+0x35/0x3a
Apr 28 19:01:59 stein kernel: [<ffffffff81298822>] ? __driver_attach+0x7a/0x7a
Apr 28 19:01:59 stein kernel: [<ffffffff81296f5b>] bus_for_each_drv+0x51/0x87
Apr 28 19:01:59 stein kernel: [<ffffffff812985c3>] device_attach+0x72/0x88
Apr 28 19:01:59 stein kernel: [<ffffffff81297c4e>] bus_probe_device+0x2d/0x98
Apr 28 19:01:59 stein kernel: [<ffffffff81296505>] device_add+0x3d8/0x54d
Apr 28 19:01:59 stein kernel: [<ffffffff8118cc85>] pci_bus_add_device+0x32/0x58
Apr 28 19:01:59 stein kernel: [<ffffffff8118ce68>] pci_bus_add_devices+0x29/0xed
Apr 28 19:01:59 stein kernel: [<ffffffffa007a410>] cb_alloc+0xba/0xcaa [pcmcia_core]
Apr 28 19:01:59 stein kernel: [<ffffffffa0079a18>] socket_insert+0xa1/0xe5 [pcmcia_core]
Apr 28 19:01:59 stein kernel: [<ffffffffa0079c70>] pccardd+0x1a8/0x440 [pcmcia_core]
Apr 28 19:01:59 stein kernel: [<ffffffffa0079ac8>] ? pcmcia_get_socket+0x21/0x21 [pcmcia_core]
Apr 28 19:01:59 stein kernel: [<ffffffff8103f898>] kthread+0xb5/0xbd
Apr 28 19:01:59 stein kernel: [<ffffffff8103f7e3>] ? __kthread_parkme+0x67/0x67
Apr 28 19:01:59 stein kernel: [<ffffffff8138d8ac>] ret_from_fork+0x7c/0xb0
Apr 28 19:01:59 stein kernel: [<ffffffff8103f7e3>] ? __kthread_parkme+0x67/0x67
Apr 28 19:01:59 stein kernel: firewire_ohci: probe of 0000:0d:00.0 failed with error -16

At the moment, these two CardBus cards are my only ones with discrete phy.
All my other controllers have got an integrated link and phy.

I could try a PCI card with TSB82AA2 + TSB81BA3(A) in the next few days,
furthermore a PCI card with ALi M5271 + TI TSB41AB3 (which did not work
with firewire-ohci when I tried it last a long time ago,
https://bugzilla.kernel.org/show_bug.cgi?id=10935).

The following fixup worked for the said two CardBus cards during quite a
few tries...

--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -2268,8 +2268,11 @@ static int ohci_enable(struct fw_card *c
OHCI1394_HCControl_postedWriteEnable);
flush_writes(ohci);

+ if (ohci->quirks & QUIRK_TI_SLLZ059)
+ usleep_range(4000, 4200);
+
for (lps = 0, i = 0; !lps && i < 150; i++) {
- msleep(1);
+ usleep_range(1000, 1200);
lps = reg_read(ohci, OHCI1394_HCControlSet) &
OHCI1394_HCControl_LPS;
}

...whereas usleep_range(3800, 4000) exhibited a notable percentage of
failures. Even lower ranges let the failure occur more often or always.

However, when I rewrote the fixup to...

--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -2280,6 +2280,7 @@ static int ohci_enable(struct fw_card *c
}

if (ohci->quirks & QUIRK_TI_SLLZ059) {
+ msleep(10);
ret = probe_tsb41ba3d(ohci);
if (ret < 0)
return ret;

...I still got one failure among several successful tries. So I
changed msleep(10) to msleep(50) and started a two tests to call it a
day but suddenly noticed that I got one /dev/fw* too few. The reason
was that my onboard JMicron JMB381 [197b:2380] with internal phy
[001b8c:038100] had at one point started to fail with

Apr 28 22:05:17 stein kernel: firewire_ohci 0000:0a:00.0: failed to read phy reg
Apr 28 22:05:17 stein kernel: firewire_ohci: probe of 0000:0a:00.0 failed with error -16

too. And unlike the TSB82AA2 cards, unloading firewire-ohci and
reloading a good version of firewire-ohci did not cure the JMB381;
I had to reboot in order to bring it back.

There is a small possibility that the reduction of sleep time before
the first phy reg access wasn't the real cause for the JMB381 to act up.
There is so much more which can upset this little POS. However, this
failure started during a time when I had nothing connected to the JMB381
at all; I was only doing the unload/reload cycles with firewire-ohci in
order to debug the TSB82AA2 cards.

So I believe we should white-list the shorter wait after LPS write
only for Agere/LSI cards. I will be replying with a respective patch
shortly.
--
Stefan Richter
-=====-===-= -=-- ===--
http://arcgraph.de/sr/

2013-04-28 21:24:29

by Stefan Richter

[permalink] [raw]
Subject: [PATCH v3] firewire: Fix 'failed to read phy reg' on FW643 rev8

From: Peter Hurley <[email protected]>

With the LSI FW643 rev 8 [1], the first commanded bus reset at
the conclusion of ohci_enable() has been observed to fail with
the following messages:

[ 4.884015] firewire_ohci 0000:01:00.0: failed to read phy reg
....
[ 5.684012] firewire_ohci 0000:01:00.0: failed to read phy reg

With drivers/firewire/ohci.c instrumented, the error condition [2]
indicates the PHY arbitration state machine has timed out prior to
enabling PHY LCtrl.

Furthermore, instrumenting ohci_enable() shows that LPS has been
enabled within 1 ms.

Test LPS latching every 1 ms rather than every 50ms.

[1] lspci -v

01:00.0 FireWire (IEEE 1394): LSI Corporation FW643 [TrueFire] PCIe 1394b Controller (rev 08) (prog-if 10 [OHCI])
Subsystem: LSI Corporation FW643 [TrueFire] PCIe 1394b Controller
Flags: bus master, fast devsel, latency 0, IRQ 92
Memory at fbeff000 (64-bit, non-prefetchable) [size=4K]
Capabilities: [44] Power Management version 3
Capabilities: [4c] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [60] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [170] Device Serial Number 08-14-43-82-00-00-41-fc
Kernel driver in use: firewire_ohci
Kernel modules: firewire-ohci

[2] instrumented WARNING in read_phy_reg()

[ 4.576010] ------------[ cut here ]------------
[ 4.576035] WARNING: at ./drivers/firewire/ohci.c:570 read_phy_reg+0x93/0xe0 [firewire_ohci]()
[ 4.576050] Hardware name: Precision WorkStation T5400
[ 4.576058] failed to read phy reg:1 (phy(5) @ config enhance:19)
[ 4.576068] Modules linked in: hid_logitech_dj hid_generic(+) usbhid <...snip...>
[ 4.576140] Pid: 61, comm: kworker/2:1 Not tainted 3.8.0-2+fwtest-xeon #2+fwtest
[ 4.576149] Call Trace:
[ 4.576160] [<ffffffff8105468f>] warn_slowpath_common+0x7f/0xc0
[ 4.576168] [<ffffffff81054786>] warn_slowpath_fmt+0x46/0x50
[ 4.576178] [<ffffffffa00caca3>] read_phy_reg+0x93/0xe0 [firewire_ohci]
[ 4.576188] [<ffffffffa00cae19>] ohci_read_phy_reg+0x39/0x60 [firewire_ohci]
[ 4.576203] [<ffffffffa00731ff>] fw_send_phy_config+0xbf/0xe0 [firewire_core]
[ 4.576214] [<ffffffffa006b2d6>] br_work+0x46/0xb0 [firewire_core]
[ 4.576225] [<ffffffff81071e0c>] process_one_work+0x13c/0x500
[ 4.576238] [<ffffffffa006b290>] ? fw_card_initialize+0x180/0x180 [firewire_core]
[ 4.576248] [<ffffffff810737ed>] worker_thread+0x16d/0x470
[ 4.576257] [<ffffffff81073680>] ? busy_worker_rebind_fn+0x100/0x100
[ 4.576266] [<ffffffff8107d160>] kthread+0xc0/0xd0
[ 4.576275] [<ffffffff816a0000>] ? pcpu_dump_alloc_info+0x1cb/0x2c4
[ 4.576284] [<ffffffff8107d0a0>] ? kthread_create_on_node+0x130/0x130
[ 4.576297] [<ffffffff816b2f6c>] ret_from_fork+0x7c/0xb0
[ 4.576305] [<ffffffff8107d0a0>] ? kthread_create_on_node+0x130/0x130
[ 4.576313] ---[ end trace cbc940994b300302 ]---

[Stefan R: Peter also reports a change of behavior with LSI FW323.
Before the patch, there would often occur a lock transaction failure
during firewire-core startup:
[ 6.056022] firewire_core 0000:07:06.0: BM lock failed (timeout), making local node (ffc0) root
This failure no longer happens after the patch, without an obvious
reason for the failure or the fix.]

[Stefan R: Added quirk flag, quirk table entry, and comment.]

Reported-by: Tim Jordan <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
Signed-off-by: Stefan Richter <[email protected]>
---
drivers/firewire/ohci.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -284,6 +284,7 @@ static char ohci_driver_name[] = KBUILD_
#define QUIRK_NO_MSI 16
#define QUIRK_TI_SLLZ059 32
#define QUIRK_IR_WAKE 64
+#define QUIRK_PHY_LCTRL_TIMEOUT 128

/* In case of multiple matches in ohci_quirks[], only the first one is used. */
static const struct {
@@ -296,7 +297,10 @@ static const struct {
QUIRK_BE_HEADERS},

{PCI_VENDOR_ID_ATT, PCI_DEVICE_ID_AGERE_FW643, 6,
- QUIRK_NO_MSI},
+ QUIRK_PHY_LCTRL_TIMEOUT | QUIRK_NO_MSI},
+
+ {PCI_VENDOR_ID_ATT, PCI_ANY_ID, PCI_ANY_ID,
+ QUIRK_PHY_LCTRL_TIMEOUT},

{PCI_VENDOR_ID_CREATIVE, PCI_DEVICE_ID_CREATIVE_SB1394, PCI_ANY_ID,
QUIRK_RESET_PACKET},
@@ -343,6 +347,7 @@ MODULE_PARM_DESC(quirks, "Chip quirks (d
", disable MSI = " __stringify(QUIRK_NO_MSI)
", TI SLLZ059 erratum = " __stringify(QUIRK_TI_SLLZ059)
", IR wake unreliable = " __stringify(QUIRK_IR_WAKE)
+ ", phy LCtrl timeout = " __stringify(QUIRK_PHY_LCTRL_TIMEOUT)
")");

#define OHCI_PARAM_DEBUG_AT_AR 1
@@ -2293,14 +2298,25 @@ static int ohci_enable(struct fw_card *c
* will lock up the machine. Wait 50msec to make sure we have
* full link enabled. However, with some cards (well, at least
* a JMicron PCIe card), we have to try again sometimes.
+ *
+ * TI TSB82AA2 + TSB81BA3(A) cards signal LPS enabled early but
+ * cannot actually use the phy at that time. These need tens of
+ * millisecods pause between LPS write and first phy access too.
+ *
+ * But do not wait for 50msec on Agere/LSI cards. Their phy
+ * arbitration state machine may time out during such a long wait.
*/
+
reg_write(ohci, OHCI1394_HCControlSet,
OHCI1394_HCControl_LPS |
OHCI1394_HCControl_postedWriteEnable);
flush_writes(ohci);

- for (lps = 0, i = 0; !lps && i < 3; i++) {
+ if (!(ohci->quirks & QUIRK_PHY_LCTRL_TIMEOUT))
msleep(50);
+
+ for (lps = 0, i = 0; !lps && i < 150; i++) {
+ msleep(1);
lps = reg_read(ohci, OHCI1394_HCControlSet) &
OHCI1394_HCControl_LPS;
}


--
Stefan Richter
-=====-===-= -=-- ===--
http://arcgraph.de/sr/