2013-05-04 20:56:12

by John

Subject: 3.9.0 dmesg reports that my NIC is hanging

After upgrading to the official Arch Linux 3.9-2 kernel package, dmesg reports that my NIC is hanging:

[    5.955720] e1000e 0000:00:19.0 eno1: changing MTU from 1500 to 4000
[    8.464507] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
  TDH                  <0>
  TDT                  <2>
  next_to_use          <2>
  next_to_clean        <0>
buffer_info[next_to_clean]:
  time_stamp           <fffea787>
  next_to_watch        <0>
  jiffies              <fffeaa30>
  next_to_watch.status <0>
MAC Status             <40080080>
PHY Status             <7949>
PHY 1000BASE-T Status  <0>
PHY Extended Status    <3000>
PCI Status             <10>

Not too sure what else to post. I am not subscribed to lkml so please cc my email in your reply.


Link to complete dmesg: http://pastebin.com/zRBajGrY
Seems similar to: bugzilla.redhat.com/show_bug.cgi?id=785806
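
For readers decoding the hang report above, here is a minimal Python sketch that parses the "name <value>" lines and derives two numbers: how many jiffies have passed since the stalled buffer was time-stamped, and how many descriptors sit between the head (TDH) and tail (TDT) pointers. It assumes every value in the dump is hexadecimal, that jiffies can be compared modulo 2^32, and that the TX ring holds 256 descriptors. The ring size, the function names, and the `hz` argument are illustrative assumptions, not values taken from this report.

```python
import re

# The register dump from the report, with the dmesg prefix lines left out.
HANG_DUMP = """\
  TDH                  <0>
  TDT                  <2>
  next_to_use          <2>
  next_to_clean        <0>
buffer_info[next_to_clean]:
  time_stamp           <fffea787>
  next_to_watch        <0>
  jiffies              <fffeaa30>
  next_to_watch.status <0>
MAC Status             <40080080>
PHY Status             <7949>
PHY 1000BASE-T Status  <0>
PHY Extended Status    <3000>
PCI Status             <10>
"""


def parse_hang_dump(text):
    """Return {field name: int} for every 'name <hex>' line; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*<([0-9a-fA-F]+)>\s*$", line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def jiffies_since(now, then, bits=32):
    """Wrap-safe difference between two jiffies samples (assumes a 32-bit window)."""
    return (now - then) % (1 << bits)


def pending_descriptors(head, tail, ring_size=256):
    """Descriptors queued by the driver but not yet consumed (ring_size is assumed)."""
    return (tail - head) % ring_size


def stalled_seconds(fields, hz):
    """Convert the stall age to seconds for a given CONFIG_HZ (hz must be supplied)."""
    return jiffies_since(fields["jiffies"], fields["time_stamp"]) / hz


if __name__ == "__main__":
    f = parse_hang_dump(HANG_DUMP)
    print("jiffies since time_stamp:", jiffies_since(f["jiffies"], f["time_stamp"]))
    print("descriptors pending:", pending_descriptors(f["TDH"], f["TDT"]))
```

For this dump the time-stamp gap is 0xfffeaa30 - 0xfffea787 = 681 jiffies, and two descriptors are queued while the head pointer is still at 0. Converting 681 jiffies to seconds requires the kernel's CONFIG_HZ, which the report does not state.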


2013-05-06 17:30:09

by Bjorn Helgaas

Subject: Re: 3.9.0 dmesg reports that my NIC is hanging

[+cc Jeff, e1000-devel (from MAINTAINERS)]

On Sat, May 4, 2013 at 1:56 PM, John <[email protected]> wrote:
> After upgrading to the official Arch Linux 3.9-2 kernel package, dmesg reports that my NIC is hanging:
>
> [ 5.955720] e1000e 0000:00:19.0 eno1: changing MTU from 1500 to 4000
> [ 8.464507] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
> TDH <0>
> TDT <2>
> next_to_use <2>
> next_to_clean <0>
> buffer_info[next_to_clean]:
> time_stamp <fffea787>
> next_to_watch <0>
> jiffies <fffeaa30>
> next_to_watch.status <0>
> MAC Status <40080080>
> PHY Status <7949>
> PHY 1000BASE-T Status <0>
> PHY Extended Status <3000>
> PCI Status <10>
>
> Not too sure what else to post. I am not subscribed to lkml so please cc my email in your reply.
>
>
> Link to complete dmesg: http://pastebin.com/zRBajGrY
> Seems similar to: bugzilla.redhat.com/show_bug.cgi?id=785806

It sounds like this is a regression, so it might be useful to know
what the newest working kernel was, and maybe a dmesg log from it as
well, though I don't see any obvious clues in the 3.9.0-2-ARCH dmesg
you collected.

Bjorn

2013-05-07 21:07:31

by John

Subject: Re: 3.9.0 dmesg reports that my NIC is hanging





----- Original Message -----
> From: Bjorn Helgaas <[email protected]>
> To: John <[email protected]>
> Cc: lkml <[email protected]>; Jeff Kirsher <[email protected]>; "[email protected]" <[email protected]>
> Sent: Monday, May 6, 2013 1:29 PM
> Subject: Re: 3.9.0 dmesg reports that my NIC is hanging
>
> [+cc Jeff, e1000-devel (from MAINTAINERS)]
>
> On Sat, May 4, 2013 at 1:56 PM, John <[email protected]> wrote:
>> After upgrading to the official Arch Linux 3.9-2 kernel package, dmesg reports that my NIC is hanging:
>>
>> [    5.955720] e1000e 0000:00:19.0 eno1: changing MTU from 1500 to 4000
>> [    8.464507] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
>>   TDH                  <0>
>>   TDT                  <2>
>>   next_to_use          <2>
>>   next_to_clean        <0>
>> buffer_info[next_to_clean]:
>>   time_stamp           <fffea787>
>>   next_to_watch        <0>
>>   jiffies              <fffeaa30>
>>   next_to_watch.status <0>
>> MAC Status             <40080080>
>> PHY Status             <7949>
>> PHY 1000BASE-T Status  <0>
>> PHY Extended Status    <3000>
>> PCI Status             <10>
>>
>> Not too sure what else to post. I am not subscribed to lkml so please cc my email in your reply.
>>
>>
>> Link to complete dmesg: http://pastebin.com/zRBajGrY
>> Seems similar to: bugzilla.redhat.com/show_bug.cgi?id=785806
>
> It sounds like this is a regression, so it might be useful to know
> what the newest working kernel was, and maybe a dmesg log from it as
> well, though I don't see any obvious clues in the 3.9.0-2-ARCH dmesg
> you collected.
>
> Bjorn


Thank you for the reply, Bjorn. 3.8.11-1-ARCH works just fine for me. Here is the dmesg from 3.8.11-1-ARCH per your request: http://pastebin.com/cUHwrQfq

2013-08-22 17:45:19

by Bjorn Helgaas

Subject: Re: 3.9.0 dmesg reports that my NIC is hanging

On Tue, May 7, 2013 at 3:07 PM, John <[email protected]> wrote:
>
>
>
>
> ----- Original Message -----
>> From: Bjorn Helgaas <[email protected]>
>> To: John <[email protected]>
>> Cc: lkml <[email protected]>; Jeff Kirsher <[email protected]>; "[email protected]" <[email protected]>
>> Sent: Monday, May 6, 2013 1:29 PM
>> Subject: Re: 3.9.0 dmesg reports that my NIC is hanging
>>
>> [+cc Jeff, e1000-devel (from MAINTAINERS)]
>>
>> On Sat, May 4, 2013 at 1:56 PM, John <[email protected]> wrote:
>>> After upgrading to the official Arch Linux 3.9-2 kernel package, dmesg reports that my NIC is hanging:
>>>
>>> [ 5.955720] e1000e 0000:00:19.0 eno1: changing MTU from 1500 to 4000
>>> [ 8.464507] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
>>> TDH <0>
>>> TDT <2>
>>> next_to_use <2>
>>> next_to_clean <0>
>>> buffer_info[next_to_clean]:
>>> time_stamp <fffea787>
>>> next_to_watch <0>
>>> jiffies <fffeaa30>
>>> next_to_watch.status <0>
>>> MAC Status <40080080>
>>> PHY Status <7949>
>>> PHY 1000BASE-T Status <0>
>>> PHY Extended Status <3000>
>>> PCI Status <10>
>>>
>>> Not too sure what else to post. I am not subscribed to lkml so please cc my email in your reply.
>>>
>>>
>>> Link to complete dmesg: http://pastebin.com/zRBajGrY
>>> Seems similar to: bugzilla.redhat.com/show_bug.cgi?id=785806
>>
>> It sounds like this is a regression, so it might be useful to know
>> what the newest working kernel was, and maybe a dmesg log from it as
>> well, though I don't see any obvious clues in the 3.9.0-2-ARCH dmesg
>> you collected.
>>
>> Bjorn
>
>
> Thank you for the reply, Bjorn. 3.8.11-1-ARCH works just fine for me. Here is the dmesg from 3.8.11-1-ARCH per your request: http://pastebin.com/cUHwrQfq

Sorry this thread died. Did this ever get resolved?

If not, can you collect "lspci -vv" output for the whole system on
both the working kernel and the failing one?
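
One possible way to compare the two requested captures is sketched below in Python. It assumes the "lspci -vv" output from each kernel has been saved to a text file. The file names `lspci-3.8.11.txt` and `lspci-3.9.0.txt` and the helper names are hypothetical. `capture_lspci` simply runs `lspci -vv`, which typically needs root to show the full capability details.

```python
import difflib
import subprocess
import sys


def capture_lspci():
    """Run 'lspci -vv' on the currently booted kernel and return its output as text."""
    result = subprocess.run(
        ["lspci", "-vv"], capture_output=True, text=True, check=True
    )
    return result.stdout


def diff_captures(good_text, bad_text, good_name="working", bad_name="failing"):
    """Return unified-diff lines between two lspci captures (empty list if identical)."""
    return list(
        difflib.unified_diff(
            good_text.splitlines(keepends=True),
            bad_text.splitlines(keepends=True),
            fromfile=good_name,
            tofile=bad_name,
        )
    )


if __name__ == "__main__":
    # Usage (hypothetical file names):
    #   python lspci_diff.py lspci-3.8.11.txt lspci-3.9.0.txt
    good_path, bad_path = sys.argv[1], sys.argv[2]
    with open(good_path) as good, open(bad_path) as bad:
        sys.stdout.writelines(
            diff_captures(good.read(), bad.read(), good_path, bad_path)
        )
```

Any changed lines for the 00:19.0 device, or for bridges above it, would be the first place to look. Which fields matter here is not established in this thread.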

There are reports of similar symptoms at [1] and [2]. I can't tell
yet if you're seeing the same problem, but for [1], booting with
"pci=pcie_bus_peer2peer" was a workaround.

Bjorn

[1] http://lkml.kernel.org/r/[email protected] [2012-11-08]
[2] http://lkml.kernel.org/r/[email protected] [2012-07-09]