2013-03-21 03:29:14

by Frank Rowand

Subject: [BUG] bisected: PandaBoard smsc95xx ethernet driver error from USB timeout

Hi All,

Not quite sure where the problem is (USB, OMAP, smsc95xx driver, other???),
so casting the nets wide...

The PandaBoard frequently fails to boot with an eth0 error when mounting
the root file system via NFS (ethernet driver fails due to a USB timeout;
no ethernet means NFS won't work). A typical set of error messages is:

[ 3.264373] smsc95xx 1-1.1:1.0: usb_probe_interface
[ 3.269500] smsc95xx 1-1.1:1.0: usb_probe_interface - got id
[ 3.275543] smsc95xx v1.0.4
[ 8.078674] smsc95xx 1-1.1:1.0: eth0: register 'smsc95xx' at usb-ehci-omap.0-1.1, smsc95xx USB 2.0 Ethernet, 82:b9:1d:fa:67:0d
[ 8.091003] hub 1-1:1.0: state 7 ports 5 chg 0000 evt 0002
[ 13.509918] usb 1-1.1: swapper/0 timed out on ep0out len=0/4
[ 13.515869] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000108
[ 13.523559] smsc95xx 1-1.1:1.0: eth0: Failed to write ADDRL: -110
[ 13.529998] IP-Config: Failed to open eth0

I have bisected this to:

commit 18aafe64d75d0e27dae206cacf4171e4e485d285
Author: Alan Stern <[email protected]>
Date: Wed Jul 11 11:23:04 2012 -0400

USB: EHCI: use hrtimer for the I/O watchdog

Note that to compile this version of the kernel, an additional fix must
also be applied:

commit ba5952e0711b14d8d4fe172671f8aa6091ace3ee
Author: Ming Lei <[email protected]>
Date: Fri Jul 13 17:25:24 2012 +0800

USB: ehci-omap: fix compile failure(v1)

The symptom can be worked around by retrying the USB access if a timeout
occurs. This is clearly _not_ the fix, just a hack that I used to
investigate the problem:

http://article.gmane.org/gmane.linux.rt.user/9773
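
For illustration only, a retry of this kind can be sketched in user-space C as below. This is not the hack from the link above: `reg_write_fn`, `retry_on_timeout()` and `flaky_write()` are hypothetical names, and the only detail taken from the report is the error code, -ETIMEDOUT, which is the -110 in the log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical signature for one register write:
 * returns 0 on success or a negative errno on failure. */
typedef int (*reg_write_fn)(void *ctx, unsigned int index, unsigned int data);

/*
 * Call op() up to max_tries times, retrying only when it reports
 * -ETIMEDOUT. Success or any other error ends the loop at once.
 * Returns the result of the last attempt.
 */
static int retry_on_timeout(reg_write_fn op, void *ctx,
                            unsigned int index, unsigned int data,
                            int max_tries)
{
    int ret = -ETIMEDOUT;
    int i;

    assert(op != NULL && max_tries > 0);

    for (i = 0; i < max_tries; i++) {
        ret = op(ctx, index, data);
        if (ret != -ETIMEDOUT)
            break;
    }
    return ret;
}

/* Demo stub (hypothetical): times out *ctx more times, then succeeds. */
static int flaky_write(void *ctx, unsigned int index, unsigned int data)
{
    int *timeouts_left = ctx;

    (void)index;
    (void)data;
    if (*timeouts_left > 0) {
        (*timeouts_left)--;
        return -ETIMEDOUT;
    }
    return 0;
}
```

With the stub set to time out twice, `retry_on_timeout(flaky_write, &n, 0x108, 0, 5)` succeeds on the third attempt. A device that never answers still ends in -ETIMEDOUT, which is why a retry only hides the symptom.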

My kernel configuration is:

arch/arm/configs/omap2plus_defconfig

plus to get the ethernet driver I add:

CONFIG_USB_EHCI_HCD
CONFIG_USB_NET_SMSC95XX

I found the problem on 3.6.11, but have not replicated it on 3.9-rcX
yet because my config fails to build on 3.9-rc1 and 3.9-rc2. I'll try
to work on that issue tomorrow.


2013-03-21 09:00:56

by Ming Lei

Subject: Re: [BUG] bisected: PandaBoard smsc95xx ethernet driver error from USB timeout

Hi Frank,

On Thu, Mar 21, 2013 at 11:29 AM, Frank Rowand <[email protected]> wrote:
>
> I found the problem on 3.6.11, but have not replicated it on 3.9-rcX
> yet because my config fails to build on 3.9-rc1 and 3.9-rc2. I'll try
> to work on that issue tomorrow.

I run upstream kernels on a PandaBoard A1 frequently and have not
seen this failure before. Maybe the problem is config dependent.

If you can share your config file, I'd like to run the test too.


Thanks,
--
Ming Lei

2013-03-21 14:41:49

by Alan Stern

Subject: Re: [BUG] bisected: PandaBoard smsc95xx ethernet driver error from USB timeout

On Wed, 20 Mar 2013, Frank Rowand wrote:

> Hi All,
>
> Not quite sure where the problem is (USB, OMAP, smsc95xx driver, other???),
> so casting the nets wide...
>
> The PandaBoard frequently fails to boot with an eth0 error when mounting
> the root file system via NFS (ethernet driver fails due to a USB timeout;
> no ethernet means NFS won't work). A typical set of error messages is:
>
> [ 3.264373] smsc95xx 1-1.1:1.0: usb_probe_interface
> [ 3.269500] smsc95xx 1-1.1:1.0: usb_probe_interface - got id
> [ 3.275543] smsc95xx v1.0.4
> [ 8.078674] smsc95xx 1-1.1:1.0: eth0: register 'smsc95xx' at usb-ehci-omap.0-1.1, smsc95xx USB 2.0 Ethernet, 82:b9:1d:fa:67:0d
> [ 8.091003] hub 1-1:1.0: state 7 ports 5 chg 0000 evt 0002
> [ 13.509918] usb 1-1.1: swapper/0 timed out on ep0out len=0/4
> [ 13.515869] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000108
> [ 13.523559] smsc95xx 1-1.1:1.0: eth0: Failed to write ADDRL: -110
> [ 13.529998] IP-Config: Failed to open eth0
>
> I have bisected this to:
>
> commit 18aafe64d75d0e27dae206cacf4171e4e485d285
> Author: Alan Stern <[email protected]>
> Date: Wed Jul 11 11:23:04 2012 -0400
>
> USB: EHCI: use hrtimer for the I/O watchdog

I don't understand how that commit could cause a timeout unless there
are at least two other bugs present in your system.

> Note that to compile this version of the kernel, an additional fix must
> also be applied:
>
> commit ba5952e0711b14d8d4fe172671f8aa6091ace3ee
> Author: Ming Lei <[email protected]>
> Date: Fri Jul 13 17:25:24 2012 +0800
>
> USB: ehci-omap: fix compile failure(v1)
>
> The symptom can be worked around by retrying the USB access if a timeout
> occurs. This is clearly _not_ the fix, just a hack that I used to
> investigate the problem:
>
> http://article.gmane.org/gmane.linux.rt.user/9773
>
> My kernel configuration is:
>
> arch/arm/configs/omap2plus_defconfig
>
> plus to get the ethernet driver I add:
>
> CONFIG_USB_EHCI_HCD
> CONFIG_USB_NET_SMSC95XX
>
> I found the problem on 3.6.11, but have not replicated it on 3.9-rcX
> yet because my config fails to build on 3.9-rc1 and 3.9-rc2. I'll try
> to work on that issue tomorrow.

Let me know how it works out.

Alan Stern

2013-03-21 20:07:00

by Frank Rowand

Subject: Re: [BUG] bisected: PandaBoard smsc95xx ethernet driver error from USB timeout

On 03/21/13 07:41, Alan Stern wrote:
> On Wed, 20 Mar 2013, Frank Rowand wrote:
>
>> Hi All,
>>
>> Not quite sure where the problem is (USB, OMAP, smsc95xx driver, other???),
>> so casting the nets wide...
>>
>> The PandaBoard frequently fails to boot with an eth0 error when mounting
>> the root file system via NFS (ethernet driver fails due to a USB timeout;
>> no ethernet means NFS won't work). A typical set of error messages is:
>>
>> [ 3.264373] smsc95xx 1-1.1:1.0: usb_probe_interface
>> [ 3.269500] smsc95xx 1-1.1:1.0: usb_probe_interface - got id
>> [ 3.275543] smsc95xx v1.0.4
>> [ 8.078674] smsc95xx 1-1.1:1.0: eth0: register 'smsc95xx' at usb-ehci-omap.0-1.1, smsc95xx USB 2.0 Ethernet, 82:b9:1d:fa:67:0d
>> [ 8.091003] hub 1-1:1.0: state 7 ports 5 chg 0000 evt 0002
>> [ 13.509918] usb 1-1.1: swapper/0 timed out on ep0out len=0/4
>> [ 13.515869] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000108
>> [ 13.523559] smsc95xx 1-1.1:1.0: eth0: Failed to write ADDRL: -110
>> [ 13.529998] IP-Config: Failed to open eth0
>>
>> I have bisected this to:
>>
>> commit 18aafe64d75d0e27dae206cacf4171e4e485d285
>> Author: Alan Stern <[email protected]>
>> Date: Wed Jul 11 11:23:04 2012 -0400
>>
>> USB: EHCI: use hrtimer for the I/O watchdog
>
> I don't understand how that commit could cause a timeout unless there
> are at least two other bugs present in your system.

Yes, I would not be at all surprised if this commit merely exposes a
problem in other code. That is why I included so many different people
and subsystems on the email distribution list.

< snip >

-Frank

2013-03-21 20:26:35

by Frank Rowand

Subject: Re: [BUG] bisected: PandaBoard smsc95xx ethernet driver error from USB timeout

On 03/21/13 02:00, Ming Lei wrote:
> Hi Frank,
>
> On Thu, Mar 21, 2013 at 11:29 AM, Frank Rowand <[email protected]> wrote:
>>
>> I found the problem on 3.6.11, but have not replicated it on 3.9-rcX
>> yet because my config fails to build on 3.9-rc1 and 3.9-rc2. I'll try
>> to work on that issue tomorrow.
>
> I run upstream kernels on a PandaBoard A1 frequently and have not
> seen this failure before. Maybe the problem is config dependent.
>
> If you can share your config file, I'd like to run the test too.

I will do a separate reply with the actual config at the point where
the bisect completed.

I create the config for each commit during the bisect with scripts
that do the equivalent of:

make omap2plus_defconfig

make menuconfig

# this allows USB thumb drive
# Device Drivers -> USB support -> EHCI HCD (USB 2.0) support
CONFIG_USB_EHCI_HCD=y

# ethernet device
# Device Drivers -> Network device support -> USB Network Adapters ->
# Multi-purpose USB Networking Framework ->
# SMSC LAN95XX based USB 2.0 10/100 ethernet devices
CONFIG_USB_NET_SMSC95XX=y


Some more random information that may be helpful....

----------
$ cat /proc/cmdline
ip=192.168.1.85:192.168.1.1:192.168.1.1:255.255.255.0:panda nfsroot=192.168.1.1:/a/target/panda root=/dev/nfs ip=dhcp mem=463M console=ttyO2,115200n8 debug earlyprintk


----------
The percentage of boots that show the problem varies quite a bit between
the kernel versions that I tried during my bisect. For my first attempt
at bisecting, I decided a version was good if it booted 12 times. That
bisect failed for various reasons. For my second attempt at bisecting,
I decided a version was good if it booted 18 times.
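
To put numbers on that threshold: if a bad kernel fails each boot independently with probability p, it passes N boots in a row with probability (1 - p)^N. The sketch below computes that value, plus the N needed to push the chance of a false "good" below a chosen limit. The independence assumption, the function names and any value of p are assumptions for illustration, not measured properties of this board.

```c
#include <assert.h>

/*
 * Probability that a kernel which fails each boot with probability p
 * survives n consecutive boots, assuming the boots are independent.
 */
static double prob_false_good(double p, int n)
{
    double q = 1.0;
    int i;

    assert(p >= 0.0 && p <= 1.0 && n >= 0);

    for (i = 0; i < n; i++)
        q *= 1.0 - p;
    return q;
}

/*
 * Smallest number of clean boots n for which a bad kernel with
 * per-boot failure probability p is mislabeled "good" with
 * probability at most limit.
 */
static int boots_needed(double p, double limit)
{
    double q = 1.0;
    int n = 0;

    assert(p > 0.0 && p <= 1.0 && limit > 0.0 && limit < 1.0);

    while (q > limit) {
        q *= 1.0 - p;
        n++;
    }
    return n;
}
```

For example, with an assumed 10% per-boot failure rate, 12 clean boots still leave about a 28% chance of marking a bad kernel as good, and 18 boots about 15%.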


----------
There are some timeout messages that I am not positive are symptoms of
the problem. With these messages, the smsc95xx driver initialization is
successful, so the ethernet device is available. For the first bisect
attempt, I did not treat these messages as errors. For the second bisect
attempt I treated these messages as errors. A typical example of the
timeout message is:

[ 9.537811] hub 1-1:1.0: state 7 ports 5 chg 0000 evt 0002
[ 17.056701] usb 1-1.1: swapper/0 timed out on ep0out len=0/4
[ 17.062652] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000108
[ 17.070343] smsc95xx 1-1.1:1.0: eth0: Failed to write ADDRL: -110
[ 17.076751] IP-Config: Failed to open eth0

The mention of swapper is not relevant; it just happens to be the
current process when the timeout occurs.

I have only seen these timeout messages in the boot log, so they may not
be a very visible symptom. They also _might_ be unrelated to the problem,
but my gut feel is that they are related.


----------
The problem manifests as a timeout from at least two different locations
in drivers/net/usb/smsc95xx.c:

static int smsc95xx_set_mac_address(struct usbnet *dev)
{
...
	ret = smsc95xx_write_reg(dev, ADDRL, addr_lo);
	if (ret < 0) {
		netdev_warn(dev->net, "Failed to write ADDRL: %d\n", ret);
		return ret;
	}

static int smsc95xx_reset(struct usbnet *dev)
{
...
	write_buf = PM_CTL_PHY_RST_;
	ret = smsc95xx_write_reg(dev, PM_CTRL, write_buf);
	if (ret < 0) {
		netdev_warn(dev->net, "Failed to write PM_CTRL: %d\n", ret);
		return ret;
	}

There may be additional locations. These are just two that I captured when
debugging. Some of the other smsc95xx_write_reg() calls in smsc95xx_reset()
are protected with checks for timeout, with up to 100 retries. I don't know
whether more timeout checks, or a longer timeout, would be a solution or just
an incorrect way of papering over the real problem -- this is not an area of
expertise for me.
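
A minimal sketch of such a bounded check, assuming the protected writes set a self-clearing bit that is then read back until it clears. The names `reg_read_fn`, `poll_until_clear()` and `fake_read()` are hypothetical and not taken from smsc95xx.c; only the limit of 100 comes from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature for one register read:
 * returns 0 on success or a negative errno on failure. */
typedef int (*reg_read_fn)(void *ctx, uint32_t *val);

/*
 * Read the register up to max_polls times until none of the bits in
 * mask are set. Returns 0 once the bits clear, the read error if a
 * read fails, or -ETIMEDOUT if the bits are still set after
 * max_polls reads.
 */
static int poll_until_clear(reg_read_fn rd, void *ctx,
                            uint32_t mask, int max_polls)
{
    uint32_t val = 0;
    int i, ret;

    assert(rd != NULL && max_polls > 0);

    for (i = 0; i < max_polls; i++) {
        ret = rd(ctx, &val);
        if (ret < 0)
            return ret;     /* the transfer itself failed */
        if (!(val & mask))
            return 0;       /* bit cleared: operation finished */
        /* a real driver would sleep here between polls */
    }
    return -ETIMEDOUT;
}

/* Demo stub (hypothetical): bit 0 reads as set for *ctx more reads. */
static int fake_read(void *ctx, uint32_t *val)
{
    int *busy_reads = ctx;

    if (*busy_reads > 0) {
        (*busy_reads)--;
        *val = 0x1;
    } else {
        *val = 0;
    }
    return 0;
}
```

Note that a loop like this bounds how long the driver waits for the device to finish; it does not retry a USB transfer that has itself timed out, so the failing writes shown above would still return an error.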


Thanks,

Frank

2013-03-21 20:29:55

by Frank Rowand

Subject: Re: [BUG] bisected: PandaBoard smsc95xx ethernet driver error from USB timeout

On 03/21/13 02:00, Ming Lei wrote:
> Hi Frank,
>
> On Thu, Mar 21, 2013 at 11:29 AM, Frank Rowand <[email protected]> wrote:
>>
>> I found the problem on 3.6.11, but have not replicated it on 3.9-rcX
>> yet because my config fails to build on 3.9-rc1 and 3.9-rc2. I'll try
>> to work on that issue tomorrow.
>
> I run upstream kernels on a PandaBoard A1 frequently and have not
> seen this failure before. Maybe the problem is config dependent.
>
> If you can share your config file, I'd like to run the test too.

Config attached...

Thanks,

Frank


Attachments:
config_panda_18aafe6 (68.17 kB)

2013-03-21 20:33:46

by Frank Rowand

Subject: Re: [BUG] bisected: PandaBoard smsc95xx ethernet driver error from USB timeout

On 03/21/13 13:25, Frank Rowand wrote:
> On 03/21/13 02:00, Ming Lei wrote:

< snip >

> ----------
> There are some timeout messages that I am not positive are symptoms of
> the problem. With these messages, the smsc95xx driver initialization is
> successful, so the ethernet device is available. For the first bisect
> attempt, I did not treat these messages as errors. For the second bisect
> attempt I treated these messages as errors. A typical example of the
> timeout message is:
>
> [ 9.537811] hub 1-1:1.0: state 7 ports 5 chg 0000 evt 0002
> [ 17.056701] usb 1-1.1: swapper/0 timed out on ep0out len=0/4
> [ 17.062652] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000108
> [ 17.070343] smsc95xx 1-1.1:1.0: eth0: Failed to write ADDRL: -110
> [ 17.076751] IP-Config: Failed to open eth0

Oops, pasted the wrong example. This is an example of a timeout, but the
driver still works, and the system boots:

[ 6.072357] smsc95xx 1-1.1:1.0: eth0: register 'smsc95xx' at usb-ehci-omap.0-1.1, smsc95xx USB 2.0 Ethernet, fa:50:73:02:79:67
[ 6.084655] hub 1-1:1.0: state 7 ports 5 chg 0000 evt 0002
[ 11.220672] usb 1-1.1: swapper/0 timed out on ep0out len=4/4
[ 18.183441] usb 1-1.1: link qh8-0001/dc8d6640 start 2 [1/0 us]
[ 19.822296] IP-Config: Complete:

>
> The mention of swapper is not relevant; it just happens to be the
> current process when the timeout occurs.
>
> I have only seen these timeout messages in the boot log, so they may not
> be a very visible symptom. They also _might_ be unrelated to the problem,
> but my gut feel is that they are related.

< snip >

-Frank

2013-03-22 02:45:33

by Frank Rowand

Subject: Re: [BUG] bisected: PandaBoard smsc95xx ethernet driver error from USB timeout

On 03/21/13 07:41, Alan Stern wrote:
> On Wed, 20 Mar 2013, Frank Rowand wrote:
>
>> Hi All,
>>
>> Not quite sure where the problem is (USB, OMAP, smsc95xx driver, other???),
>> so casting the nets wide...
>>
>> The PandaBoard frequently fails to boot with an eth0 error when mounting
>> the root file system via NFS (ethernet driver fails due to a USB timeout;
>> no ethernet means NFS won't work). A typical set of error messages is:
>>
>> [ 3.264373] smsc95xx 1-1.1:1.0: usb_probe_interface
>> [ 3.269500] smsc95xx 1-1.1:1.0: usb_probe_interface - got id
>> [ 3.275543] smsc95xx v1.0.4
>> [ 8.078674] smsc95xx 1-1.1:1.0: eth0: register 'smsc95xx' at usb-ehci-omap.0-1.1, smsc95xx USB 2.0 Ethernet, 82:b9:1d:fa:67:0d
>> [ 8.091003] hub 1-1:1.0: state 7 ports 5 chg 0000 evt 0002
>> [ 13.509918] usb 1-1.1: swapper/0 timed out on ep0out len=0/4
>> [ 13.515869] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000108
>> [ 13.523559] smsc95xx 1-1.1:1.0: eth0: Failed to write ADDRL: -110
>> [ 13.529998] IP-Config: Failed to open eth0
>>
>> I have bisected this to:
>>
>> commit 18aafe64d75d0e27dae206cacf4171e4e485d285
>> Author: Alan Stern <[email protected]>
>> Date: Wed Jul 11 11:23:04 2012 -0400
>>
>> USB: EHCI: use hrtimer for the I/O watchdog
>
> I don't understand how that commit could cause a timeout unless there
> are at least two other bugs present in your system.
>
>> Note that to compile this version of the kernel, an additional fix must
>> also be applied:
>>
>> commit ba5952e0711b14d8d4fe172671f8aa6091ace3ee
>> Author: Ming Lei <[email protected]>
>> Date: Fri Jul 13 17:25:24 2012 +0800
>>
>> USB: ehci-omap: fix compile failure(v1)
>>
>> The symptom can be worked around by retrying the USB access if a timeout
>> occurs. This is clearly _not_ the fix, just a hack that I used to
>> investigate the problem:
>>
>> http://article.gmane.org/gmane.linux.rt.user/9773
>>
>> My kernel configuration is:
>>
>> arch/arm/configs/omap2plus_defconfig
>>
>> plus to get the ethernet driver I add:
>>
>> CONFIG_USB_EHCI_HCD
>> CONFIG_USB_NET_SMSC95XX
>>
>> I found the problem on 3.6.11, but have not replicated it on 3.9-rcX
>> yet because my config fails to build on 3.9-rc1 and 3.9-rc2. I'll try
>> to work on that issue tomorrow.
>
> Let me know how it works out.

My PandaBoard builds fail on 3.9-rcX due to ARM multiplatform issues.
Either there is something I need to change about the way I build it,
or it is broken (that is a side issue). My simple expedient was to
hack around multiplatform, and just make it build (patch below if
anyone else wants a _temporary_ hack).

The problem appears not to be present in 3.9-rc3. In older kernel versions,
it took at most 18 boots to see the problem. For 3.9-rc3 I booted 42
times without seeing the problem.
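
As a rough check on how much 42 clean boots prove: assuming boots fail independently with probability p, the largest p that still leaves at least a 5% chance of 42 clean boots in a row is a 95% upper bound on the failure rate. A minimal sketch, with hypothetical function names and a 5% level chosen only for illustration:

```c
#include <assert.h>

/* Probability of n clean boots in a row at per-boot failure rate p. */
static double p_all_clean(double p, int n)
{
    double q = 1.0;
    int i;

    for (i = 0; i < n; i++)
        q *= 1.0 - p;
    return q;
}

/*
 * Largest per-boot failure rate that is still consistent with n clean
 * boots at significance level alpha, found by bisection on p.
 * p_all_clean() decreases as p grows, so the bound is the p at which
 * it drops to alpha.
 */
static double max_failure_rate(int n, double alpha)
{
    double lo = 0.0, hi = 1.0;
    int i;

    assert(n > 0 && alpha > 0.0 && alpha < 1.0);

    for (i = 0; i < 60; i++) {
        double mid = 0.5 * (lo + hi);

        if (p_all_clean(mid, n) > alpha)
            lo = mid;   /* still plausible: the bound lies higher */
        else
            hi = mid;   /* ruled out: the bound lies lower */
    }
    return hi;
}
```

`max_failure_rate(42, 0.05)` comes out just under 0.07, so 42 clean boots rule out failure rates above roughly 7% but cannot exclude a rarer failure.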

The problem occurs at least up through 3.8. I'll try to reverse bisect
between 3.8 and 3.9-rc3 to see when the problem disappeared (I'm running
short of time, so no promises for a near term result).

-Frank


This patch is a _temporary_ hack, not fit for man or beast. Avert
your eyes, do not apply to any respectable repository!

---
arch/arm/Kconfig | 2 1 + 1 - 0 !
arch/arm/Makefile | 2 2 + 0 - 0 !
2 files changed, 3 insertions(+), 1 deletion(-)

Index: b/arch/arm/Kconfig
===================================================================
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1013,7 +1013,7 @@ config ARCH_MULTI_V7
 	bool "ARMv7 based platforms (Cortex-A, PJ4, Krait)"
 	default y
 	select ARCH_MULTI_V6_V7
-	select ARCH_VEXPRESS
+	select ARCH_VEXPRESS if !ARCH_OMAP2PLUS
 	select CPU_V7
 
 config ARCH_MULTI_V6_V7
Index: b/arch/arm/Makefile
===================================================================
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -227,8 +227,10 @@ else
 MACHINE :=
 endif
 ifeq ($(CONFIG_ARCH_MULTIPLATFORM),y)
+ifneq ($(CONFIG_ARCH_OMAP2PLUS),y)
 MACHINE :=
 endif
+endif
 
 machdirs := $(patsubst %,arch/arm/mach-%/,$(machine-y))
 platdirs := $(patsubst %,arch/arm/plat-%/,$(plat-y))

2013-03-22 08:42:45

by Roger Quadros

Subject: Re: [BUG] bisected: PandaBoard smsc95xx ethernet driver error from USB timeout

Hi Frank,

On 03/22/2013 04:45 AM, Frank Rowand wrote:
> On 03/21/13 07:41, Alan Stern wrote:
>> On Wed, 20 Mar 2013, Frank Rowand wrote:
>>
>>> Hi All,
>>>
>>> Not quite sure where the problem is (USB, OMAP, smsc95xx driver, other???),
>>> so casting the nets wide...
>>>
>>> The PandaBoard frequently fails to boot with an eth0 error when mounting
>>> the root file system via NFS (ethernet driver fails due to a USB timeout;
>>> no ethernet means NFS won't work). A typical set of error messages is:
>>>
>>> [ 3.264373] smsc95xx 1-1.1:1.0: usb_probe_interface
>>> [ 3.269500] smsc95xx 1-1.1:1.0: usb_probe_interface - got id
>>> [ 3.275543] smsc95xx v1.0.4
>>> [ 8.078674] smsc95xx 1-1.1:1.0: eth0: register 'smsc95xx' at usb-ehci-omap.0-1.1, smsc95xx USB 2.0 Ethernet, 82:b9:1d:fa:67:0d
>>> [ 8.091003] hub 1-1:1.0: state 7 ports 5 chg 0000 evt 0002
>>> [ 13.509918] usb 1-1.1: swapper/0 timed out on ep0out len=0/4
>>> [ 13.515869] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000108
>>> [ 13.523559] smsc95xx 1-1.1:1.0: eth0: Failed to write ADDRL: -110
>>> [ 13.529998] IP-Config: Failed to open eth0
>>>
>>> I have bisected this to:
>>>
>>> commit 18aafe64d75d0e27dae206cacf4171e4e485d285
>>> Author: Alan Stern <[email protected]>
>>> Date: Wed Jul 11 11:23:04 2012 -0400
>>>
>>> USB: EHCI: use hrtimer for the I/O watchdog
>>
>> I don't understand how that commit could cause a timeout unless there
>> are at least two other bugs present in your system.
>>
>>> Note that to compile this version of the kernel, an additional fix must
>>> also be applied:
>>>
>>> commit ba5952e0711b14d8d4fe172671f8aa6091ace3ee
>>> Author: Ming Lei <[email protected]>
>>> Date: Fri Jul 13 17:25:24 2012 +0800
>>>
>>> USB: ehci-omap: fix compile failure(v1)
>>>
>>> The symptom can be worked around by retrying the USB access if a timeout
>>> occurs. This is clearly _not_ the fix, just a hack that I used to
>>> investigate the problem:
>>>
>>> http://article.gmane.org/gmane.linux.rt.user/9773
>>>
>>> My kernel configuration is:
>>>
>>> arch/arm/configs/omap2plus_defconfig
>>>
>>> plus to get the ethernet driver I add:
>>>
>>> CONFIG_USB_EHCI_HCD
>>> CONFIG_USB_NET_SMSC95XX
>>>
>>> I found the problem on 3.6.11, but have not replicated it on 3.9-rcX
>>> yet because my config fails to build on 3.9-rc1 and 3.9-rc2. I'll try
>>> to work on that issue tomorrow.
>>
>> Let me know how it works out.
>
> My PandaBoard builds fail on 3.9-rcX due to ARM multiplatform issues.
> Either there is something I need to change about the way I build it,
> or it is broken (that is a side issue). My simple expedient was to
> hack around multiplatform, and just make it build (patch below if
> anyone else wants a _temporary_ hack).

This is a known issue and will be resolved the proper way in 3.10.
For 3.9 you could also use the temporary fix posted here:

http://thread.gmane.org/gmane.linux.usb.general/82693/

>
> The problem appears not to be present in 3.9-rc3. In older kernel versions,
> it took at most 18 boots to see the problem. For 3.9-rc3 I booted 42
> times without seeing the problem.

This is good to hear.

>
> The problem occurs at least up through 3.8. I'll try to reverse bisect
> between 3.8 and 3.9-rc3 to see when the problem disappeared (I'm running
> short of time, so no promises for a near term result).

Thanks for the tests. A lot of OMAP EHCI related cleanups and fixes [1]
went into 3.9. It would be interesting to know what fixed it.

[1] - https://lkml.org/lkml/2013/1/23/155

cheers,
-roger

2013-03-22 10:03:37

by Mats Liljegren

Subject: Re: [BUG] bisected: PandaBoard smsc95xx ethernet driver error from USB timeout

Frank Rowand wrote:
> On 03/21/13 07:41, Alan Stern wrote:
> > On Wed, 20 Mar 2013, Frank Rowand wrote:
> >
> >> Hi All,
> >>
> >> Not quite sure where the problem is (USB, OMAP, smsc95xx driver, other???),
> >> so casting the nets wide...
> >>
> >> The PandaBoard frequently fails to boot with an eth0 error when mounting
> >> the root file system via NFS (ethernet driver fails due to a USB timeout;
> >> no ethernet means NFS won't work). A typical set of error messages is:
> >>
> >> [ 3.264373] smsc95xx 1-1.1:1.0: usb_probe_interface
> >> [ 3.269500] smsc95xx 1-1.1:1.0: usb_probe_interface - got id
> >> [ 3.275543] smsc95xx v1.0.4
> >> [ 8.078674] smsc95xx 1-1.1:1.0: eth0: register 'smsc95xx' at usb-ehci-omap.0-1.1, smsc95xx USB 2.0 Ethernet, 82:b9:1d:fa:67:0d
> >> [ 8.091003] hub 1-1:1.0: state 7 ports 5 chg 0000 evt 0002
> >> [ 13.509918] usb 1-1.1: swapper/0 timed out on ep0out len=0/4
> >> [ 13.515869] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000108
> >> [ 13.523559] smsc95xx 1-1.1:1.0: eth0: Failed to write ADDRL: -110
> >> [ 13.529998] IP-Config: Failed to open eth0
> >>
> >> I have bisected this to:
> >>
> >> commit 18aafe64d75d0e27dae206cacf4171e4e485d285
> >> Author: Alan Stern <[email protected]>
> >> Date: Wed Jul 11 11:23:04 2012 -0400
> >>
> >> USB: EHCI: use hrtimer for the I/O watchdog
> >
> > I don't understand how that commit could cause a timeout unless there
> > are at least two other bugs present in your system.
> >
> >> Note that to compile this version of the kernel, an additional fix must
> >> also be applied:
> >>
> >> commit ba5952e0711b14d8d4fe172671f8aa6091ace3ee
> >> Author: Ming Lei <[email protected]>
> >> Date: Fri Jul 13 17:25:24 2012 +0800
> >>
> >> USB: ehci-omap: fix compile failure(v1)
> >>
> >> The symptom can be worked around by retrying the USB access if a timeout
> >> occurs. This is clearly _not_ the fix, just a hack that I used to
> >> investigate the problem:
> >>
> >> http://article.gmane.org/gmane.linux.rt.user/9773
> >>
> >> My kernel configuration is:
> >>
> >> arch/arm/configs/omap2plus_defconfig
> >>
> >> plus to get the ethernet driver I add:
> >>
> >> CONFIG_USB_EHCI_HCD
> >> CONFIG_USB_NET_SMSC95XX
> >>
> >> I found the problem on 3.6.11, but have not replicated it on 3.9-rcX
> >> yet because my config fails to build on 3.9-rc1 and 3.9-rc2. I'll try
> >> to work on that issue tomorrow.
> >
> > Let me know how it works out.
>
> My PandaBoard builds fail on 3.9-rcX due to ARM multiplatform issues.
> Either there is something I need to change about the way I build it,
> or it is broken (that is a side issue). My simple expedient was to
> hack around multiplatform, and just make it build (patch below if
> anyone else wants a _temporary_ hack).

I have built 3.9-rc2 for the PandaBoard ES, and the only problem I have seen is
that you need to add "LOADADDR=0x80008000" when building the uImage target.

-- Mats

2013-03-22 18:24:46

by Frank Rowand

Subject: Re: [BUG] bisected: PandaBoard smsc95xx ethernet driver error from USB timeout

On 03/22/13 03:03, Mats Liljegren wrote:
> Frank Rowand wrote:
>> On 03/21/13 07:41, Alan Stern wrote:
>>> On Wed, 20 Mar 2013, Frank Rowand wrote:
>>>
>>>> Hi All,
>>>>
>>>> Not quite sure where the problem is (USB, OMAP, smsc95xx driver, other???),
>>>> so casting the nets wide...
>>>>
>>>> The PandaBoard frequently fails to boot with an eth0 error when mounting
>>>> the root file system via NFS (ethernet driver fails due to a USB timeout;
>>>> no ethernet means NFS won't work). A typical set of error messages is:
>>>>
>>>> [ 3.264373] smsc95xx 1-1.1:1.0: usb_probe_interface
>>>> [ 3.269500] smsc95xx 1-1.1:1.0: usb_probe_interface - got id
>>>> [ 3.275543] smsc95xx v1.0.4
>>>> [ 8.078674] smsc95xx 1-1.1:1.0: eth0: register 'smsc95xx' at usb-ehci-omap.0-1.1, smsc95xx USB 2.0 Ethernet, 82:b9:1d:fa:67:0d
>>>> [ 8.091003] hub 1-1:1.0: state 7 ports 5 chg 0000 evt 0002
>>>> [ 13.509918] usb 1-1.1: swapper/0 timed out on ep0out len=0/4
>>>> [ 13.515869] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000108
>>>> [ 13.523559] smsc95xx 1-1.1:1.0: eth0: Failed to write ADDRL: -110
>>>> [ 13.529998] IP-Config: Failed to open eth0
>>>>
>>>> I have bisected this to:
>>>>
>>>> commit 18aafe64d75d0e27dae206cacf4171e4e485d285
>>>> Author: Alan Stern <[email protected]>
>>>> Date: Wed Jul 11 11:23:04 2012 -0400
>>>>
>>>> USB: EHCI: use hrtimer for the I/O watchdog
>>>
>>> I don't understand how that commit could cause a timeout unless there
>>> are at least two other bugs present in your system.
>>>
>>>> Note that to compile this version of the kernel, an additional fix must
>>>> also be applied:
>>>>
>>>> commit ba5952e0711b14d8d4fe172671f8aa6091ace3ee
>>>> Author: Ming Lei <[email protected]>
>>>> Date: Fri Jul 13 17:25:24 2012 +0800
>>>>
>>>> USB: ehci-omap: fix compile failure(v1)
>>>>
>>>> The symptom can be worked around by retrying the USB access if a timeout
>>>> occurs. This is clearly _not_ the fix, just a hack that I used to
>>>> investigate the problem:
>>>>
>>>> http://article.gmane.org/gmane.linux.rt.user/9773
>>>>
>>>> My kernel configuration is:
>>>>
>>>> arch/arm/configs/omap2plus_defconfig
>>>>
>>>> plus to get the ethernet driver I add:
>>>>
>>>> CONFIG_USB_EHCI_HCD
>>>> CONFIG_USB_NET_SMSC95XX
>>>>
>>>> I found the problem on 3.6.11, but have not replicated it on 3.9-rcX
>>>> yet because my config fails to build on 3.9-rc1 and 3.9-rc2. I'll try
>>>> to work on that issue tomorrow.
>>>
>>> Let me know how it works out.
>>
>> My PandaBoard builds fail on 3.9-rcX due to ARM multiplatform issues.
>> Either there is something I need to change about the way I build it,
>> or it is broken (that is a side issue). My simple expedient was to
>> hack around multiplatform, and just make it build (patch below if
>> anyone else wants a _temporary_ hack).
>
> I have built 3.9-rc2 for the PandaBoard ES, and the only problem I have seen is
> that you need to add "LOADADDR=0x80008000" when building the uImage target.

Yes, that is essentially what my hack patch does. The result of my patch
is that arch/arm/boot/Makefile is invoked with MACHINE="arch/arm/mach-omap2",
so that the "include $(srctree)/$(MACHINE)/Makefile.boot" at the top of the
makefile pulls in the proper values for the addresses.

-Frank

2013-03-24 02:17:40

by Ming Lei

Subject: Re: [BUG] bisected: PandaBoard smsc95xx ethernet driver error from USB timeout

On Fri, Mar 22, 2013 at 4:28 AM, Frank Rowand <[email protected]> wrote:
>> I run upstream kernels on a PandaBoard A1 frequently and have not
>> seen this failure before. Maybe the problem is config dependent.
>>
>> If you can share your config file, I'd like to run the test too.

I do not see the problem with 3.9-rc2-20130314 on my PandaBoard A1,
but I only tested booting from MMC, not from NFS.


Thanks,
--
Ming Lei