2013-03-27 21:18:14

by Joseph A. Millikan

Subject: 3.8.4 kernel


We use Mint 11 32-bit, and we've noticed that with any kernel beyond 3.6.11,
the Ethernet port stops responding to pings after 15 minutes and Samba
clients cease to receive data in real time. I hope this is the correct
place to report this, as we received a nasty, sullen response when we
posted to the Ubuntu folks (which we will NEVER do again).

If you aren't the proper party to whom we should direct this report,
please disregard. We were just trying to help the developers in case
there is an undiscovered issue with the Ethernet ports on the Lenovo G770
laptops that our court uses with Mint 11.

Thank you for your attention to this report.


2013-03-27 21:46:47

by Bjorn Helgaas

Subject: Re: 3.8.4 kernel

On Wed, Mar 27, 2013 at 3:18 PM, Joseph A. Millikan
<[email protected]> wrote:
>
> We use Mint 11 32-bit and we've noticed with any kernel beyond 3.6.11, the
> Ethernet port stops responding to pings after 15 minutes and samba clients
> cease to receive data in realtime. I hope this is the correct place to
> report this as we received a nasty, sullen response when posted with the
> Ubuntu folks (which we will NEVER do again.)
>
> If you aren't the proper party to whom we should direct this report, please
> disregard. We were just trying to help developers if there is an
> undiscovered issue with Ethernet ports on Lenovo G770 laptops which our
> court uses with Mint 11.

What specific kernels have you tried? It'd be surprising if such an
egregious issue went unnoticed for long, but if recent kernels like
3.8 or 3.9-rc are still broken, there's likely something we need to
fix. Can you collect complete dmesg logs from 3.6.11 and the oldest
broken kernel you've found? You might also run "watch cat
/proc/interrupts" in a window off to the side and see if the NIC
interrupt count stops increasing after 15 minutes.
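For anyone who would rather log this than watch it by eye, here is a small C sketch of the same check. It assumes the usual /proc/interrupts layout (an "NN:" label, one count per CPU, then the chip and device names) and sums the counts on every line that mentions a given name. The function names are made up for illustration, "eth0" is only an assumed interface name, and matching by plain substring is a simplification (for example, "eth0" would also match "eth01").

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: sum the per-CPU counts on one /proc/interrupts
 * line such as " 44:     123456      789   PCI-MSI-edge      eth0".
 * Returns 1 and stores the sum in *total if the text after the ':'
 * contains `dev` (plain substring match), otherwise returns 0.
 */
static int irq_line_total(const char *line, const char *dev,
                          unsigned long long *total)
{
    const char *p = strchr(line, ':');
    if (p == NULL || strstr(p, dev) == NULL)
        return 0;

    unsigned long long sum = 0;
    p++;                                /* step past the ':' after the IRQ label */
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;                      /* reached the chip/device names */
        char *end;
        sum += strtoull(p, &end, 10);   /* one per-CPU counter */
        p = end;
    }
    *total = sum;
    return 1;
}

/*
 * Hypothetical helper: total interrupts over every line of `f` that
 * mentions `dev`. Assumes each line fits in the 1024-byte buffer.
 */
static unsigned long long irq_total_for(FILE *f, const char *dev)
{
    char line[1024];
    unsigned long long total = 0, n;

    while (fgets(line, sizeof line, f) != NULL)
        if (irq_line_total(line, dev, &n))
            total += n;
    return total;
}
```

Calling irq_total_for() on a stream opened from /proc/interrupts once a minute and printing the result should show whether the NIC's total stops growing around the 15-minute mark.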

Bjorn

2013-04-26 22:33:35

by Bjorn Helgaas

Subject: Re: 3.8.4 kernel

On Wed, Mar 27, 2013 at 3:46 PM, Bjorn Helgaas <[email protected]> wrote:
> On Wed, Mar 27, 2013 at 3:18 PM, Joseph A. Millikan
> <[email protected]> wrote:
>>
>> We use Mint 11 32-bit and we've noticed with any kernel beyond 3.6.11, the
>> Ethernet port stops responding to pings after 15 minutes and samba clients
>> cease to receive data in realtime. I hope this is the correct place to
>> report this as we received a nasty, sullen response when posted with the
>> Ubuntu folks (which we will NEVER do again.)
>>
>> If you aren't the proper party to whom we should direct this report, please
>> disregard. We were just trying to help developers if there is an
>> undiscovered issue with Ethernet ports on Lenovo G770 laptops which our
>> court uses with Mint 11.
>
> What specific kernels have you tried? It'd be surprising if such an
> egregious issue went unnoticed for long, but if recent kernels like
> 3.8 or 3.9-rc are still broken, there's likely something we need to
> fix. Can you collect complete dmesg logs from 3.6.11 and the oldest
> broken kernel you've found? You might also run "watch cat
> /proc/interrupts" in a window off to the side and see if the NIC
> interrupt count stops increasing after 15 minutes.

Did this ever get resolved?

2013-05-07 01:55:22

by Bjorn Helgaas

Subject: Re: 3.8.4 kernel

[+cc Xiong, Cloud, netdev since this looks like an atl1c issue]

On Fri, Apr 26, 2013 at 3:33 PM, Bjorn Helgaas <[email protected]> wrote:
> On Wed, Mar 27, 2013 at 3:46 PM, Bjorn Helgaas <[email protected]> wrote:
>> On Wed, Mar 27, 2013 at 3:18 PM, Joseph A. Millikan
>> <[email protected]> wrote:
>>>
>>> We use Mint 11 32-bit and we've noticed with any kernel beyond 3.6.11, the
>>> Ethernet port stops responding to pings after 15 minutes and samba clients
>>> cease to receive data in realtime. I hope this is the correct place to
>>> report this as we received a nasty, sullen response when posted with the
>>> Ubuntu folks (which we will NEVER do again.)
>>>
>>> If you aren't the proper party to whom we should direct this report, please
>>> disregard. We were just trying to help developers if there is an
>>> undiscovered issue with Ethernet ports on Lenovo G770 laptops which our
>>> court uses with Mint 11.
>>
>> What specific kernels have you tried? It'd be surprising if such an
>> egregious issue went unnoticed for long, but if recent kernels like
>> 3.8 or 3.9-rc are still broken, there's likely something we need to
>> fix. Can you collect complete dmesg logs from 3.6.11 and the oldest
>> broken kernel you've found? You might also run "watch cat
>> /proc/interrupts" in a window off to the side and see if the NIC
>> interrupt count stops increasing after 15 minutes.
>
> Did this ever get resolved?

I opened https://bugzilla.kernel.org/show_bug.cgi?id=57681 to keep
track of this and attached the dmesg logs you collected. If you have
a chance, could you also collect and attach the output of "lspci -vv"?
Any kernel is fine for this.

I don't see anything obvious wrong, at least from the PCI side. Maybe
the atl1c guys will have some ideas.

Bjorn

2013-05-07 02:52:02

by Huang, Xiong

Subject: RE: 3.8.4 kernel

> >
> > Did this ever get resolved?
>
> I opened https://bugzilla.kernel.org/show_bug.cgi?id=57681 to keep track of
> this and attached the dmesg logs you collected. If you have a chance, could you
> also collect and attach the output of "lspci -vv"
> (any kernel is fine for this).
>
> I don't see anything obvious wrong, at least from the PCI side. Maybe the atl1c
> guys will have some ideas.
>

Hi Bjorn and all,
This issue is probably the same as bug https://bugzilla.kernel.org/show_bug.cgi?id=54021.
I didn't find any abnormal info in the log, and the PHY link is stable as well :(
Could it be related to NetworkManager?

BR.
Xiong

2013-05-07 16:20:40

by Bjorn Helgaas

Subject: Re: 3.8.4 kernel

[+cc Eric because he made a change (69b08f62e17) that apparently
exposes driver bugs]

On Mon, May 6, 2013 at 7:51 PM, Huang, Xiong <[email protected]> wrote:
>> >
>> > Did this ever get resolved?
>>
>> I opened https://bugzilla.kernel.org/show_bug.cgi?id=57681 to keep track of
>> this and attached the dmesg logs you collected. If you have a chance, could you
>> also collect and attach the output of "lspci -vv"
>> (any kernel is fine for this).
>>
>> I don't see anything obvious wrong, at least from the PCI side. Maybe the atl1c
>> guys will have some ideas.
>>
>
> Hi Bjorn and All
> This issue should be same as bug https://bugzilla.kernel.org/show_bug.cgi?id=54021
> I didn't find any abnormal info from the log and the PHY link is stable as well :(
> Is it related to the network manager ?

I looked at bug #54021 and it does look similar. It doesn't look like
it has been resolved.

http://forums.gentoo.org/viewtopic-t-949168-highlight-.html is another
report that looks very similar. BigE there has a Lenovo G570 that has
both wireless and wired networking. If BigE disables wireless with a
hardware switch before booting, it seems to avoid the atl1c wired
networking issue. That's not a fix, of course, but it might be a clue
and a temporary workaround until we have a real solution.

Joseph, if there's no hardware switch for the wireless on your G770,
you can probably still take wireless out of the picture by removing or
renaming the bcma module (look in
/lib/modules/.../kernel/drivers/bcma/), then rebooting.
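After the reboot, one way to confirm that bcma really stayed unloaded is to look for it in /proc/modules. Below is a minimal C sketch, assuming the standard one-module-per-line format with the module name as the first field; the helper name is hypothetical. Keep in mind that a bcma built into the kernel, rather than as a module, would not show up in /proc/modules at all.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if a module called `name` is listed in
 * `f`, which is expected to hold /proc/modules content (one module per
 * line, name first, followed by a space), otherwise 0.
 */
static int module_loaded(FILE *f, const char *name)
{
    char line[512];
    size_t len = strlen(name);

    while (fgets(line, sizeof line, f) != NULL)
        /* exact match on the first field, so "bcm" does not match "bcma" */
        if (strncmp(line, name, len) == 0 && line[len] == ' ')
            return 1;
    return 0;
}
```

Passing a stream opened on /proc/modules and the name "bcma" should return 0 once the renamed module is no longer being loaded.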

Xiong, do you have a specific network manager-related test that Joseph
could perform? I don't want to burden Joseph with a lot of debugging
because he's currently happy with 3.6.11 and testing is pretty
disruptive in his environment.

Bjorn

2013-05-07 17:12:00

by Eric Dumazet

Subject: Re: 3.8.4 kernel

On Tue, 2013-05-07 at 09:20 -0700, Bjorn Helgaas wrote:
> [+cc Eric because he made a change (69b08f62e17) that apparently
> exposes driver bugs]
>
> On Mon, May 6, 2013 at 7:51 PM, Huang, Xiong <[email protected]> wrote:
> >> >
> >> > Did this ever get resolved?
> >>
> >> I opened https://bugzilla.kernel.org/show_bug.cgi?id=57681 to keep track of
> >> this and attached the dmesg logs you collected. If you have a chance, could you
> >> also collect and attach the output of "lspci -vv"
> >> (any kernel is fine for this).
> >>
> >> I don't see anything obvious wrong, at least from the PCI side. Maybe the atl1c
> >> guys will have some ideas.
> >>
> >
> > Hi Bjorn and All
> > This issue should be same as bug https://bugzilla.kernel.org/show_bug.cgi?id=54021
> > I didn't find any abnormal info from the log and the PHY link is stable as well :(
> > Is it related to the network manager ?
>
> I looked at bug #54021 and it does look similar. It doesn't look like
> it has been resolved.
>
> http://forums.gentoo.org/viewtopic-t-949168-highlight-.html is another
> report that looks very similar. BigE there has a Lenovo G570 that has
> both wireless and wired networking. If BigE disables wireless with a
> hardware switch before booting, it seems to avoid the atl1c wired
> networking issue. That's not a fix, of course, but it might be a clue
> and a temporary workaround until we have a real solution.
>
> Joseph, if there's no hardware switch for the wireless on your G770,
> you can probably still take wireless out of the picture by removing or
> renaming the bcma module (look in
> /lib/modules/.../kernel/drivers/bcma/), then rebooting.
>
> Xiong, do you have a specific network manager-related test that Joseph
> could perform? I don't want to burden Joseph with a lot of debugging
> because he's currently happy with 3.6.11 and testing is pretty
> disruptive in his environment.

drivers/net/wireless/brcm80211/brcmsmac/dma.c contains this suspect code
in dma64_getnextrxp():

dma_addr_t pa;
...
pa = le32_to_cpu(di->rxd64[i].addrlow) - di->dataoffsetlow;
/* clear this packet from the descriptor ring */
dma_unmap_single(di->dmadev, pa, di->rxbufsize, DMA_FROM_DEVICE);


How it can possibly work, I honestly have no idea.

I suggest enabling CONFIG_DMA_API_DEBUG.


2013-05-07 18:25:04

by Huang, Xiong

Subject: RE: 3.8.4 kernel

>
> Joseph, if there's no hardware switch for the wireless on your G770, you can
> probably still take wireless out of the picture by removing or renaming the
> bcma module (look in /lib/modules/.../kernel/drivers/bcma/), then rebooting.
>
> Xiong, do you have a specific network manager-related test that Joseph could
> perform? I don't want to burden Joseph with a lot of debugging because he's
> currently happy with 3.6.11 and testing is pretty disruptive in his environment.
>

There is a strange behavior with this issue: if only one of them (Ethernet or WLAN) is enabled, everything looks OK.
But once both are enabled, the issue arises.

And Misha Labjuk said:
------->
Comment #38 From Misha Labjuk 2013-03-10 21:10:15
02:00.0 Ethernet controller: Qualcomm Atheros AR8152 v2.0 Fast Ethernet (rev c1)

If configured with iproute2 (ip addr add...), it works stably.
If NetworkManager is running (just running, not managing), Ethernet dies after 1-5
min. After "mii-tool -r eth0" it works until the next freeze.
<------

Unfortunately, I have no idea about NetworkManager :(

BR.
Xiong