2015-02-23 14:57:19

by Ian Kumlien

Subject: [igb] AER timeout - resend.

Sending this to both netdev and kernel since I don't know if it's the
driver or the PCIe AER that does something odd - the machine was
stable before 3.19 and PCIe AER.

Everything started out as I first sent to linux nics () intel:
------

And today I had some issues and wondered why things were broken; I was met with:

[950016.366477] pcieport 0000:00:04.0: AER: Uncorrected (Non-Fatal) error received: id=0500
[950016.366495] igb 0000:05:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0500(Requester ID)
[950016.366502] igb 0000:05:00.0: device [8086:1521] error status/mask=00004000/00000000
[950016.366509] igb 0000:05:00.0: [14] Completion Timeout
[950016.366519] igb 0000:05:00.0: broadcast error_detected message
[950016.379742] br0: port 1(enp5s0f0) entered disabled state
[950016.488213] igb 0000:05:00.0: broadcast slot_reset message
[950016.588014] igb 0000:05:00.0: broadcast resume message
[950016.752654] igb 0000:05:00.0: AER: Device recovery successful
[950019.817249] igb 0000:05:00.1 enp5s0f1: igb: enp5s0f1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[950020.699773] igb 0000:05:00.0 enp5s0f0: igb: enp5s0f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[950020.701485] br0: port 1(enp5s0f0) entered forwarding state
[950020.701504] br0: port 1(enp5s0f0) entered forwarding state
[976152.448092] ata5: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
[976152.448100] ata5: irq_stat 0x00400040, connection status changed
[976152.448107] ata5: SError: { HostInt PHYRdyChg 10B8B DevExch }
[976152.448117] ata5: hard resetting link
[976152.448134] ata6: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
[976152.448140] ata6: irq_stat 0x00400040, connection status changed
[976152.448147] ata6: SError: { HostInt PHYRdyChg 10B8B DevExch }
[976152.448155] ata6: hard resetting link
[976153.171195] ata6: SATA link down (SStatus 0 SControl 300)
[976158.174058] ata6: hard resetting link
[976158.174110] ata5: SATA link down (SStatus 0 SControl 300)
[976163.176997] ata5: hard resetting link
[976163.480133] ata6: SATA link down (SStatus 0 SControl 300)
[976163.480147] ata6: limiting SATA link speed to 1.5 Gbps
[976168.483028] ata6: hard resetting link
[976168.483095] ata5: SATA link down (SStatus 0 SControl 300)
[976168.483108] ata5: limiting SATA link speed to 1.5 Gbps
[976173.485907] ata5: hard resetting link
[976173.789066] ata6: SATA link down (SStatus 0 SControl 310)
[976173.789080] ata6.00: disabled
[976173.791066] ata6: EH complete
[976173.791078] ata5: SATA link down (SStatus 0 SControl 310)
[976173.791085] ata6.00: detaching (SCSI 5:0:0:0)
[976173.791090] ata5.00: disabled
[976173.794073] ata5: EH complete
[976173.794100] ata5.00: detaching (SCSI 4:0:0:0)
[976173.794968] sd 5:0:0:0: [sdb] Synchronizing SCSI cache
[976173.795073] sd 5:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
[976173.795080] sd 5:0:0:0: [sdb] Stopping disk
[976173.795108] sd 5:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00
[976173.797180] sd 4:0:0:0: [sda] Synchronizing SCSI cache
[976173.797254] sd 4:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=0x04 driverbyte=0x00
[976173.797261] sd 4:0:0:0: [sda] Stopping disk
[976173.797285] sd 4:0:0:0: [sda] Start/Stop Unit failed: Result: hostbyte=0x04 driverbyte=0x00

So two out of two disks just failed and aren't replying anymore?

Seven hours after an AER event, this machine's Intel SSDs, which are
idle, just fail to respond? ;)
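
As a quick sanity check on the "seven hours" figure, the gap can be
computed from the two kernel timestamps in the log above. This is only
an illustrative sketch: the constant and function names are made up
here, and the two values are copied from the pcieport AER line and the
first ata5 exception line.

```python
# dmesg timestamps are seconds since boot; both values come from the log above.
AER_EVENT = 950016.366477   # pcieport "AER: Uncorrected (Non-Fatal)" line
ATA_EVENT = 976152.448092   # first "ata5: exception" line


def elapsed_hours(start: float, end: float) -> float:
    """Return the gap between two dmesg timestamps, in hours."""
    return (end - start) / 3600.0


gap = elapsed_hours(AER_EVENT, ATA_EVENT)
print(f"{gap:.2f} hours between the AER event and the first ATA exception")
```

That works out to a little over seven hours, which matches the estimate.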

Anyway, I will reboot it when I get home - any ideas or suggestions
are more than welcome.
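
For reference, the "error status/mask=00004000/00000000" value in the
log can be decoded by hand: 0x00004000 has only bit 14 set, which is
what the kernel reports as "[14] Completion Timeout". Below is a
minimal sketch, assuming the standard bit layout of the PCIe AER
Uncorrectable Error Status register. The table and the helper name are
illustrative only and are not taken from the kernel source.

```python
from typing import List

# Assumed bit positions in the AER Uncorrectable Error Status register.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}


def decode_status(status: int) -> List[str]:
    """Return "[bit] name" for every bit set in a 32-bit status value."""
    names = []
    for bit in range(32):
        if status & (1 << bit):
            names.append(f"[{bit}] {UNCORRECTABLE_BITS.get(bit, 'unknown')}")
    return names


# Status value copied from the igb 0000:05:00.0 line in the log.
print(decode_status(0x00004000))
```

A mask of 00000000 means none of these error types were masked on the
device at the time.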


2015-07-01 18:02:48

by Shaun Ruffell

Subject: Re: [igb] AER timeout - resend.

On Mon, Feb 23, 2015 at 03:56:56PM +0100, Ian Kumlien wrote:
> Sending this to both netdev and kernel since I don't know if it's the
> driver or the PCIe AER that does something odd - the machine was
> stable before 3.19 and PCIe AER.
>
> Everything started out as I first sent to linux nics () intel:
> ------
>
> And today I had some issues and wondered why things were broken; I was met with:
>
> [950016.366477] pcieport 0000:00:04.0: AER: Uncorrected (Non-Fatal)
> error received: id=0500
> [950016.366495] igb 0000:05:00.0: PCIe Bus Error: severity=Uncorrected
> (Non-Fatal), type=Transaction Layer, id=0500(Requester ID)
> [950016.366502] igb 0000:05:00.0: device [8086:1521] error
> status/mask=00004000/00000000
> [950016.366509] igb 0000:05:00.0: [14] Completion Timeout
> [950016.366519] igb 0000:05:00.0: broadcast error_detected message
> [950016.379742] br0: port 1(enp5s0f0) entered disabled state
> [950016.488213] igb 0000:05:00.0: broadcast slot_reset message
> [950016.588014] igb 0000:05:00.0: broadcast resume message
> [950016.752654] igb 0000:05:00.0: AER: Device recovery successful
> [950019.817249] igb 0000:05:00.1 enp5s0f1: igb: enp5s0f1 NIC Link is
> Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> [950020.699773] igb 0000:05:00.0 enp5s0f0: igb: enp5s0f0 NIC Link is
> Up 1000 Mbps Full Duplex, Flow Control: RX
> [950020.701485] br0: port 1(enp5s0f0) entered forwarding state
> [950020.701504] br0: port 1(enp5s0f0) entered forwarding state
> [976152.448092] ata5: exception Emask 0x50 SAct 0x0 SErr 0x4090800
> action 0xe frozen
> [976152.448100] ata5: irq_stat 0x00400040, connection status changed
> [976152.448107] ata5: SError: { HostInt PHYRdyChg 10B8B DevExch }
> [976152.448117] ata5: hard resetting link
> [976152.448134] ata6: exception Emask 0x50 SAct 0x0 SErr 0x4090800
> action 0xe frozen
> [976152.448140] ata6: irq_stat 0x00400040, connection status changed
> [976152.448147] ata6: SError: { HostInt PHYRdyChg 10B8B DevExch }
> [976152.448155] ata6: hard resetting link
> [976153.171195] ata6: SATA link down (SStatus 0 SControl 300)
> [976158.174058] ata6: hard resetting link
> [976158.174110] ata5: SATA link down (SStatus 0 SControl 300)
> [976163.176997] ata5: hard resetting link
> [976163.480133] ata6: SATA link down (SStatus 0 SControl 300)
> [976163.480147] ata6: limiting SATA link speed to 1.5 Gbps
> [976168.483028] ata6: hard resetting link
> [976168.483095] ata5: SATA link down (SStatus 0 SControl 300)
> [976168.483108] ata5: limiting SATA link speed to 1.5 Gbps
> [976173.485907] ata5: hard resetting link
> [976173.789066] ata6: SATA link down (SStatus 0 SControl 310)
> [976173.789080] ata6.00: disabled
> [976173.791066] ata6: EH complete
> [976173.791078] ata5: SATA link down (SStatus 0 SControl 310)
> [976173.791085] ata6.00: detaching (SCSI 5:0:0:0)
> [976173.791090] ata5.00: disabled
> [976173.794073] ata5: EH complete
> [976173.794100] ata5.00: detaching (SCSI 4:0:0:0)
> [976173.794968] sd 5:0:0:0: [sdb] Synchronizing SCSI cache
> [976173.795073] sd 5:0:0:0: [sdb] Synchronize Cache(10) failed:
> Result: hostbyte=0x04 driverbyte=0x00
> [976173.795080] sd 5:0:0:0: [sdb] Stopping disk
> [976173.795108] sd 5:0:0:0: [sdb] Start/Stop Unit failed: Result:
> hostbyte=0x04 driverbyte=0x00
> [976173.797180] sd 4:0:0:0: [sda] Synchronizing SCSI cache
> [976173.797254] sd 4:0:0:0: [sda] Synchronize Cache(10) failed:
> Result: hostbyte=0x04 driverbyte=0x00
> [976173.797261] sd 4:0:0:0: [sda] Stopping disk
> [976173.797285] sd 4:0:0:0: [sda] Start/Stop Unit failed: Result:
> hostbyte=0x04 driverbyte=0x00
>
> So two out of two disks just failed and aren't replying anymore?
>
> Seven hours after an AER event, this machine's Intel SSDs, which are
> idle, just fail to respond? ;)
>
> Anyway, I will reboot it when I get home - any ideas or suggestions
> are more than welcome.

Hi Ian,

Did you ever find a resolution to this? I'm seeing something very
similar: a customer upgrades to 3.19 and then gets AER errors and the
links are brought down, whereas 3.10 works fine.

Thanks,
Shaun

2015-07-01 21:34:16

by Shaun Ruffell

Subject: Re: [igb] AER timeout - resend.

On Wed, Jul 01, 2015 at 11:18:36PM +0200, Ian Kumlien wrote:
> It was actually fixed with a BIOS upgrade from Super Micro (in this case),
> so I'd investigate that first... =)

Hmm...interesting. Thanks for the reply!