2014-01-30 19:08:20

by Zoltan Kiss

[permalink] [raw]
Subject: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

Hi,

I've experienced some queue timeout problems mentioned in the subject
with igb and bnx2 cards. I haven't seen them on other cards so far. I'm
using XenServer with 3.10 Dom0 kernel (however igb were already updated
to latest version), and there are Windows guests sending data through
these cards. I noticed these problems in XenRT test runs, and I know
that they usually mean some lost interrupt problem or other hardware
error, but in my case they started to appear more often, and they are
likely connected to my netback grant mapping patches. These patches
causing skb's with huge (~64kb) linear buffers to appear more often.
The reason for that is an old problem in the ring protocol: originally
the maximum amount of slots were linked to MAX_SKB_FRAGS, as every slot
ended up as a frag of the skb. When this value were changed, netback had
to cope with the situation by coalescing the packets into fewer frags.
My patch series take a different approach: the leftover slots (pages)
were assigned to a new skb's frags, and that skb were stashed to the
frag_list of the first one. Then, before sending it off to the stack it
calls skb = skb_copy_expand(skb, 0, 0, GFP_ATOMIC, __GFP_NOWARN), which
basically creates a new skb and copied all the data into it. As far as I
understood, it put everything into the linear buffer, which can amount
to 64KB at most. The original skb are freed then, and this new one were
sent to the stack.
I suspect that this is the problem as it only happens when guests send
too much slots. Does anyone familiar with these drivers have seen such
issue before? (when these kind of skb's get stucked in the queue)

Regards,

Zoltan Kiss


2014-01-30 20:34:12

by Michael Chan

[permalink] [raw]
Subject: Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

On Thu, 2014-01-30 at 19:08 +0000, Zoltan Kiss wrote:
> I've experienced some queue timeout problems mentioned in the subject
> with igb and bnx2 cards.

Please provide the full tx timeout dmesg. bnx2 dumps some diagnostic
information during tx timeout that may be useful. Thanks.

2014-01-31 13:29:11

by Zoltan Kiss

[permalink] [raw]
Subject: Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

On 30/01/14 21:34, Michael Chan wrote:
> On Thu, 2014-01-30 at 19:08 +0000, Zoltan Kiss wrote:
>> I've experienced some queue timeout problems mentioned in the subject
>> with igb and bnx2 cards.
> Please provide the full tx timeout dmesg. bnx2 dumps some diagnostic
> information during tx timeout that may be useful. Thanks.
Hi,

Here is some:

[ 5417.275463] ------------[ cut here ]------------
[ 5417.275472] WARNING: at net/sched/sch_generic.c:255
dev_watchdog+0x156/0x1f0()
[ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out
[ 5417.275476] Modules linked in: tun nfsv3 nfs_acl rpcsec_gss_krb5
auth_rpcgss oid_registry nfsv4 nfs fscache lockd sunrpc ipv6
openvswitch(O) ipt_REJECT nf_conntrack_ipv
rack xt_tcpudp iptable_filter ip_tables x_tables nls_utf8 isofs
dm_multipath scsi_dh dm_mod dcdbas coretemp microcode psmouse serio_raw
lpc_ich mfd_core hid_generic ehci_p
sg hed bnx2 usbhid hid sr_mod cdrom sd_mod pata_acpi ata_generic
ata_piix libata uhci_hcd mptsas mptscsih mptbase scsi_transport_sas scsi_mod
[ 5417.275517] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G O
3.10.11-0.xs1.8.50.170.377582 #1
[ 5417.275518] Hardware name: Dell Inc. PowerEdge R710/00W9X3, BIOS
1.2.6 07/17/2009
[ 5417.275520] 000000ff f008be08 c1488c53 f008be30 c1046664 c1658a88
f008be5c 000000ff
[ 5417.275525] c13fc146 c13fc146 ee96a000 00000002 00137d44 f008be48
c1046723 00000009
[ 5417.275530] f008be40 c1658a88 f008be5c f008be80 c13fc146 c16556e1
000000ff c1658a88
[ 5417.275535] Call Trace:
[ 5417.275539] [<c1488c53>] dump_stack+0x16/0x1b
[ 5417.275544] [<c1046664>] warn_slowpath_common+0x64/0x80
[ 5417.275546] [<c13fc146>] ? dev_watchdog+0x156/0x1f0
[ 5417.275549] [<c13fc146>] ? dev_watchdog+0x156/0x1f0
[ 5417.275551] [<c1046723>] warn_slowpath_fmt+0x33/0x40
[ 5417.275554] [<c13fc146>] dev_watchdog+0x156/0x1f0
[ 5417.275559] [<c10549ce>] call_timer_fn+0x3e/0xf0
[ 5417.275563] [<c107293e>] ? finish_task_switch+0x4e/0xb0
[ 5417.275565] [<c13fbff0>] ? __netdev_watchdog_up+0x60/0x60
[ 5417.275568] [<c1055c1b>] run_timer_softirq+0x1ab/0x210
[ 5417.275571] [<c13fbff0>] ? __netdev_watchdog_up+0x60/0x60
[ 5417.275574] [<c104e3f4>] __do_softirq+0xc4/0x200
[ 5417.275577] [<c1493547>] ? xen_do_upcall+0x7/0xc
[ 5417.275579] [<c104e550>] run_ksoftirqd+0x20/0x50
[ 5417.275582] [<c106f182>] smpboot_thread_fn+0x142/0x150
[ 5417.275586] [<c1067a2b>] kthread+0x9b/0xa0
[ 5417.275589] [<c106f040>] ? smpboot_create_threads+0x60/0x60
[ 5417.275591] [<c1070000>] ? cpu_rt_runtime_read+0x40/0x80
[ 5417.275594] [<c1492f77>] ret_from_kernel_thread+0x1b/0x28
[ 5417.275596] [<c1067990>] ? kthread_freezable_should_stop+0x60/0x60
[ 5417.275599] ---[ end trace 691f572d388226ca ]---
[ 5417.275602] bnx2 0000:01:00.1 eth1: <--- start FTQ dump --->
[ 5417.275622] bnx2 0000:01:00.1 eth1: RV2P_PFTQ_CTL 00010000
[ 5417.275629] bnx2 0000:01:00.1 eth1: RV2P_TFTQ_CTL 00020000
[ 5417.275636] bnx2 0000:01:00.1 eth1: RV2P_MFTQ_CTL 00004000
[ 5417.275643] bnx2 0000:01:00.1 eth1: TBDR_FTQ_CTL 00004002
[ 5417.275650] bnx2 0000:01:00.1 eth1: TDMA_FTQ_CTL 00010002
[ 5417.275657] bnx2 0000:01:00.1 eth1: TXP_FTQ_CTL 00010000
[ 5417.275663] bnx2 0000:01:00.1 eth1: TXP_FTQ_CTL 00010000
[ 5417.275670] bnx2 0000:01:00.1 eth1: TPAT_FTQ_CTL 00010000
[ 5417.275677] bnx2 0000:01:00.1 eth1: RXP_CFTQ_CTL 00008000
[ 5417.275684] bnx2 0000:01:00.1 eth1: RXP_FTQ_CTL 00100000
[ 5417.275690] bnx2 0000:01:00.1 eth1: COM_COMXQ_FTQ_CTL 00010000
[ 5417.275698] bnx2 0000:01:00.1 eth1: COM_COMTQ_FTQ_CTL 00020000
[ 5417.275705] bnx2 0000:01:00.1 eth1: COM_COMQ_FTQ_CTL 00010000
[ 5417.275712] bnx2 0000:01:00.1 eth1: CP_CPQ_FTQ_CTL 00004000
[ 5417.275718] bnx2 0000:01:00.1 eth1: CPU states:
[ 5417.275730] bnx2 0000:01:00.1 eth1: 045000 mode b84c state 80001000
evt_mask 500 pc 8001284 pc 8001284 instr 1440fffc
[ 5417.275746] bnx2 0000:01:00.1 eth1: 085000 mode b84c state 80005000
evt_mask 500 pc 8000a54 pc 8000a5c instr 10400016
[ 5417.275785] bnx2 0000:01:00.1 eth1: 0c5000 mode b84c state 80001000
evt_mask 500 pc 8004c20 pc 8004c20 instr 32050003
[ 5417.275801] bnx2 0000:01:00.1 eth1: 105000 mode b8cc state 80000000
evt_mask 500 pc 8000a8c pc 8000a94 instr 8c420020
[ 5417.275817] bnx2 0000:01:00.1 eth1: 145000 mode b880 state 80000000
evt_mask 500 pc 8000ab0 pc 800d1e8 instr 27bd0020
[ 5417.275834] bnx2 0000:01:00.1 eth1: 185000 mode b8cc state 80000000
evt_mask 500 pc 8000cb0 pc 8000930 instr 8ce800e8
[ 5417.275845] bnx2 0000:01:00.1 eth1: <--- end FTQ dump --->
[ 5417.275851] bnx2 0000:01:00.1 eth1: <--- start TBDC dump --->
[ 5417.275858] bnx2 0000:01:00.1 eth1: TBDC free cnt: 32
[ 5417.275864] bnx2 0000:01:00.1 eth1: LINE CID BIDX CMD VALIDS
[ 5417.275875] bnx2 0000:01:00.1 eth1: 00 001080 17c8 00 [0]
[ 5417.275886] bnx2 0000:01:00.1 eth1: 01 001080 17e0 00 [0]
[ 5417.275897] bnx2 0000:01:00.1 eth1: 02 001080 17e8 00 [0]
[ 5417.275907] bnx2 0000:01:00.1 eth1: 03 001080 17f8 00 [0]
[ 5417.275918] bnx2 0000:01:00.1 eth1: 04 001080 1800 00 [0]
[ 5417.275929] bnx2 0000:01:00.1 eth1: 05 001080 17d0 00 [0]
[ 5417.275940] bnx2 0000:01:00.1 eth1: 06 001080 17d8 00 [0]
[ 5417.275951] bnx2 0000:01:00.1 eth1: 07 001080 17f0 00 [0]
[ 5417.275961] bnx2 0000:01:00.1 eth1: 08 001080 1620 00 [0]
[ 5417.275972] bnx2 0000:01:00.1 eth1: 09 17de00 fbf8 78 [0]
[ 5417.275983] bnx2 0000:01:00.1 eth1: 0a 1bbf80 fef8 9f [0]
[ 5417.275994] bnx2 0000:01:00.1 eth1: 0b 1d2d80 f7f8 7f [0]
[ 5417.276005] bnx2 0000:01:00.1 eth1: 0c 148f00 f7b8 88 [0]
[ 5417.276016] bnx2 0000:01:00.1 eth1: 0d 16af80 f7d0 75 [0]
[ 5417.276026] bnx2 0000:01:00.1 eth1: 0e 1adf80 bfb0 26 [0]
[ 5417.276037] bnx2 0000:01:00.1 eth1: 0f 1ebf80 dd68 3c [0]
[ 5417.276048] bnx2 0000:01:00.1 eth1: 10 1cf700 d1f0 fc [0]
[ 5417.276059] bnx2 0000:01:00.1 eth1: 11 1cdc00 fbf0 7d [0]
[ 5417.276069] bnx2 0000:01:00.1 eth1: 12 15c900 f7f8 ef [0]
[ 5417.276081] bnx2 0000:01:00.1 eth1: 13 17cf00 d7d8 3f [0]
[ 5417.276093] bnx2 0000:01:00.1 eth1: 14 1ecf80 ffb0 b7 [0]
[ 5417.276107] bnx2 0000:01:00.1 eth1: 15 1cbd80 f3e8 bf [0]
[ 5417.276119] bnx2 0000:01:00.1 eth1: 16 179b80 d7f8 d7 [0]
[ 5417.276130] bnx2 0000:01:00.1 eth1: 17 1fdf00 f3e8 7e [0]
[ 5417.276141] bnx2 0000:01:00.1 eth1: 18 1f9780 b578 af [0]
[ 5417.276152] bnx2 0000:01:00.1 eth1: 19 1d7d80 fef0 ff [0]
[ 5417.276163] bnx2 0000:01:00.1 eth1: 1a 1d9e80 5fe8 d7 [0]
[ 5417.276174] bnx2 0000:01:00.1 eth1: 1b 1fff80 ebf8 f8 [0]
[ 5417.276186] bnx2 0000:01:00.1 eth1: 1c 1fbd80 f7d8 7f [0]
[ 5417.276200] bnx2 0000:01:00.1 eth1: 1d 16da80 2ef8 ff [0]
[ 5417.276211] bnx2 0000:01:00.1 eth1: 1e 1f9b80 bf50 8e [0]
[ 5417.276224] bnx2 0000:01:00.1 eth1: 1f 1bdf00 faf8 75 [0]
[ 5417.276231] bnx2 0000:01:00.1 eth1: <--- end TBDC dump --->
[ 5417.276246] bnx2 0000:01:00.1 eth1: DEBUG: intr_sem[0] PCI_CMD[00100406]
[ 5417.276258] bnx2 0000:01:00.1 eth1: DEBUG: PCI_PM[19002008]
PCI_MISC_CFG[92000088]
[ 5417.276269] bnx2 0000:01:00.1 eth1: DEBUG: EMAC_TX_STATUS[00000008]
EMAC_RX_STATUS[00000000]
[ 5417.276280] bnx2 0000:01:00.1 eth1: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
[ 5417.276288] bnx2 0000:01:00.1 eth1: DEBUG:
HC_STATS_INTERRUPT_STATUS[01fb0004]
[ 5417.276298] bnx2 0000:01:00.1 eth1: DEBUG: PBA[00000000]
[ 5417.276304] bnx2 0000:01:00.1 eth1: <--- start MCP states dump --->
[ 5417.276314] bnx2 0000:01:00.1 eth1: DEBUG: MCP_STATE_P0[0003610e]
MCP_STATE_P1[0003610e]
[ 5417.276326] bnx2 0000:01:00.1 eth1: DEBUG: MCP mode[0000b880]
state[80000000] evt_mask[00000500]
[ 5417.276339] bnx2 0000:01:00.1 eth1: DEBUG: pc[0800d7b8] pc[08000cdc]
instr[00041880]
[ 5417.276349] bnx2 0000:01:00.1 eth1: DEBUG: shmem states:
[ 5417.276358] bnx2 0000:01:00.1 eth1: DEBUG: drv_mb[0d000004]
fw_mb[00000004] link_status[0000006f]
[ 5417.276369] drv_pulse_mb[00001485]
[ 5417.276373] bnx2 0000:01:00.1 eth1: DEBUG:
dev_info_signature[44564903] reset_type[01005254]
[ 5417.276383] condition[0003610e]
[ 5417.276389] bnx2 0000:01:00.1 eth1: DEBUG: 000001c0: 01005254
42530000 0003610e 00000000
[ 5417.276402] bnx2 0000:01:00.1 eth1: DEBUG: 000003cc: 44444444
44444444 44444444 00000a28
[ 5417.276416] bnx2 0000:01:00.1 eth1: DEBUG: 000003dc: 0004ffff
00000000 00000000 00000000
[ 5417.276430] bnx2 0000:01:00.1 eth1: DEBUG: 000003ec: 00000000
00000000 00000000 00000000
[ 5417.276440] bnx2 0000:01:00.1 eth1: DEBUG: 0x3fc[0000ffff]

2014-01-31 18:56:22

by Wei Liu

[permalink] [raw]
Subject: Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

On Thu, Jan 30, 2014 at 07:08:11PM +0000, Zoltan Kiss wrote:
> Hi,
>
> I've experienced some queue timeout problems mentioned in the
> subject with igb and bnx2 cards. I haven't seen them on other cards
> so far. I'm using XenServer with 3.10 Dom0 kernel (however igb were
> already updated to latest version), and there are Windows guests
> sending data through these cards. I noticed these problems in XenRT
> test runs, and I know that they usually mean some lost interrupt
> problem or other hardware error, but in my case they started to
> appear more often, and they are likely connected to my netback grant
> mapping patches. These patches causing skb's with huge (~64kb)
> linear buffers to appear more often.
> The reason for that is an old problem in the ring protocol:
> originally the maximum amount of slots were linked to MAX_SKB_FRAGS,
> as every slot ended up as a frag of the skb. When this value were
> changed, netback had to cope with the situation by coalescing the
> packets into fewer frags.
> My patch series take a different approach: the leftover slots
> (pages) were assigned to a new skb's frags, and that skb were
> stashed to the frag_list of the first one. Then, before sending it
> off to the stack it calls skb = skb_copy_expand(skb, 0, 0,
> GFP_ATOMIC, __GFP_NOWARN), which basically creates a new skb and
> copied all the data into it. As far as I understood, it put
> everything into the linear buffer, which can amount to 64KB at most.
> The original skb are freed then, and this new one were sent to the
> stack.

Just my two cents, if it is this case, you can try to call
skb_copy_expand on every SKB netback receives to manually create SKBs
with ~64KB linear buffer to see how it goes...

Wei.

> I suspect that this is the problem as it only happens when guests
> send too much slots. Does anyone familiar with these drivers have
> seen such issue before? (when these kind of skb's get stucked in the
> queue)
>
> Regards,
>
> Zoltan Kiss
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-02-04 19:47:55

by Michael Chan

[permalink] [raw]
Subject: Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote:
> [ 5417.275472] WARNING: at net/sched/sch_generic.c:255
> dev_watchdog+0x156/0x1f0()
> [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out

The dump shows an internal IRQ pending on MSIX vector 2 which matches
the the queue number that is timing out. I don't know what happened to
the MSIX and why the driver is not seeing it. Do you see an IRQ error
message from the kernel a few seconds before the tx timeout message?

2014-02-04 21:33:21

by Zoltan Kiss

[permalink] [raw]
Subject: Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

On 31/01/14 18:56, Wei Liu wrote:
> On Thu, Jan 30, 2014 at 07:08:11PM +0000, Zoltan Kiss wrote:
>> Hi,
>>
>> I've experienced some queue timeout problems mentioned in the
>> subject with igb and bnx2 cards. I haven't seen them on other cards
>> so far. I'm using XenServer with 3.10 Dom0 kernel (however igb were
>> already updated to latest version), and there are Windows guests
>> sending data through these cards. I noticed these problems in XenRT
>> test runs, and I know that they usually mean some lost interrupt
>> problem or other hardware error, but in my case they started to
>> appear more often, and they are likely connected to my netback grant
>> mapping patches. These patches causing skb's with huge (~64kb)
>> linear buffers to appear more often.
>> The reason for that is an old problem in the ring protocol:
>> originally the maximum amount of slots were linked to MAX_SKB_FRAGS,
>> as every slot ended up as a frag of the skb. When this value were
>> changed, netback had to cope with the situation by coalescing the
>> packets into fewer frags.
>> My patch series take a different approach: the leftover slots
>> (pages) were assigned to a new skb's frags, and that skb were
>> stashed to the frag_list of the first one. Then, before sending it
>> off to the stack it calls skb = skb_copy_expand(skb, 0, 0,
>> GFP_ATOMIC, __GFP_NOWARN), which basically creates a new skb and
>> copied all the data into it. As far as I understood, it put
>> everything into the linear buffer, which can amount to 64KB at most.
>> The original skb are freed then, and this new one were sent to the
>> stack.
>
> Just my two cents, if it is this case, you can try to call
> skb_copy_expand on every SKB netback receives to manually create SKBs
> with ~64KB linear buffer to see how it goes...

I've tried it, and it did break everything in a similar way, so that's a
strong clue that the problem lies here. I've rewrote that part of my
patches to do less modification, based on Malcolm's idea: netback pulls
the first frag into linear buffer, then moves a frag from the frag_list
skb into the first one. That seems to help, but so far I have only one
relevant test result, I'm waiting for more results.

Zoli

2014-02-05 20:24:11

by Zoltan Kiss

[permalink] [raw]
Subject: Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

On 04/02/14 19:47, Michael Chan wrote:
> On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote:
>> [ 5417.275472] WARNING: at net/sched/sch_generic.c:255
>> dev_watchdog+0x156/0x1f0()
>> [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out
>
> The dump shows an internal IRQ pending on MSIX vector 2 which matches
> the the queue number that is timing out. I don't know what happened to
> the MSIX and why the driver is not seeing it. Do you see an IRQ error
> message from the kernel a few seconds before the tx timeout message?

I haven't seen any IRQ related error message. Note, this is on Xen
4.3.1. Now I have new results with a reworked version of the patch,
unfortunately it still has this issue. Here is a bnx2 dump, lspci output
and some Xen debug output (MSI and interrupt bindings, I have more if
needed).

[82099.288743] bnx2 0000:02:00.0 eth0: <--- start FTQ dump --->
[82099.288767] bnx2 0000:02:00.0 eth0: RV2P_PFTQ_CTL 00010002
[82099.288779] bnx2 0000:02:00.0 eth0: RV2P_TFTQ_CTL 00020000
[82099.288790] bnx2 0000:02:00.0 eth0: RV2P_MFTQ_CTL 00004000
[82099.288801] bnx2 0000:02:00.0 eth0: TBDR_FTQ_CTL 00404002
[82099.288812] bnx2 0000:02:00.0 eth0: TDMA_FTQ_CTL 00010002
[82099.288823] bnx2 0000:02:00.0 eth0: TXP_FTQ_CTL 00810002
[82099.288834] bnx2 0000:02:00.0 eth0: TXP_FTQ_CTL 01010002
[82099.288845] bnx2 0000:02:00.0 eth0: TPAT_FTQ_CTL 00010002
[82099.288878] bnx2 0000:02:00.0 eth0: RXP_CFTQ_CTL 00008000
[82099.288889] bnx2 0000:02:00.0 eth0: RXP_FTQ_CTL 00100002
[82099.288899] bnx2 0000:02:00.0 eth0: COM_COMXQ_FTQ_CTL 00010000
[82099.288911] bnx2 0000:02:00.0 eth0: COM_COMTQ_FTQ_CTL 00020000
[82099.288923] bnx2 0000:02:00.0 eth0: COM_COMQ_FTQ_CTL 00010000
[82099.288934] bnx2 0000:02:00.0 eth0: CP_CPQ_FTQ_CTL 00004000
[82099.288944] bnx2 0000:02:00.0 eth0: CPU states:
[82099.288960] bnx2 0000:02:00.0 eth0: 045000 mode b84c state 80005000
evt_mask 500 pc 8001284 pc 8000cb8 instr 35690100
[82099.288984] bnx2 0000:02:00.0 eth0: 085000 mode b84c state 80001000
evt_mask 500 pc 8000a58 pc 8000a4c instr 38420001
[82099.289007] bnx2 0000:02:00.0 eth0: 0c5000 mode b84c state 80001000
evt_mask 500 pc 8004c14 pc 8004c14 instr 32050003
[82099.289030] bnx2 0000:02:00.0 eth0: 105000 mode b8cc state 80000000
evt_mask 500 pc 8000a94 pc 8000a94 instr 8c420020
[82099.289063] bnx2 0000:02:00.0 eth0: 145000 mode b880 state 80000000
evt_mask 500 pc 800d244 pc 8008aac instr 8c460000
[82099.289087] bnx2 0000:02:00.0 eth0: 185000 mode b8cc state 80000000
evt_mask 500 pc 8000c6c pc 8000c6c instr 3c056000
[82099.289103] bnx2 0000:02:00.0 eth0: <--- end FTQ dump --->
[82099.289112] bnx2 0000:02:00.0 eth0: <--- start TBDC dump --->
[82099.289124] bnx2 0000:02:00.0 eth0: TBDC free cnt: 31
[82099.289133] bnx2 0000:02:00.0 eth0: LINE CID BIDX CMD VALIDS
[82099.289148] bnx2 0000:02:00.0 eth0: 00 000800 a3b8 00 [1]
[82099.289163] bnx2 0000:02:00.0 eth0: 01 001100 1b58 00 [0]
[82099.289178] bnx2 0000:02:00.0 eth0: 02 000800 a390 00 [0]
[82099.289193] bnx2 0000:02:00.0 eth0: 03 000800 a370 00 [0]
[82099.289217] bnx2 0000:02:00.0 eth0: 04 000800 a378 00 [0]
[82099.289232] bnx2 0000:02:00.0 eth0: 05 000800 a388 00 [0]
[82099.289247] bnx2 0000:02:00.0 eth0: 06 000800 a398 00 [0]
[82099.289262] bnx2 0000:02:00.0 eth0: 07 000800 a3a8 00 [0]
[82099.289277] bnx2 0000:02:00.0 eth0: 08 000800 a3b0 00 [0]
[82099.289291] bnx2 0000:02:00.0 eth0: 09 000800 a3b8 00 [0]
[82099.289306] bnx2 0000:02:00.0 eth0: 0a 000800 8c10 00 [0]
[82099.289321] bnx2 0000:02:00.0 eth0: 0b 000800 eaf0 00 [0]
[82099.289336] bnx2 0000:02:00.0 eth0: 0c 000800 eaf8 00 [0]
[82099.289351] bnx2 0000:02:00.0 eth0: 0d 001100 5e60 00 [0]
[82099.289365] bnx2 0000:02:00.0 eth0: 0e 001100 5e68 00 [0]
[82099.289380] bnx2 0000:02:00.0 eth0: 0f 001100 5e70 00 [0]
[82099.289395] bnx2 0000:02:00.0 eth0: 10 001100 5e88 00 [0]
[82099.289410] bnx2 0000:02:00.0 eth0: 11 001100 5e90 00 [0]
[82099.289425] bnx2 0000:02:00.0 eth0: 12 001100 5ee8 00 [0]
[82099.289440] bnx2 0000:02:00.0 eth0: 13 001100 5ef8 00 [0]
[82099.289454] bnx2 0000:02:00.0 eth0: 14 001100 5e00 00 [0]
[82099.289470] bnx2 0000:02:00.0 eth0: 15 001100 5a20 00 [0]
[82099.289485] bnx2 0000:02:00.0 eth0: 16 001100 59a8 00 [0]
[82099.289499] bnx2 0000:02:00.0 eth0: 17 001100 59b0 00 [0]
[82099.289514] bnx2 0000:02:00.0 eth0: 18 001100 59b8 00 [0]
[82099.289529] bnx2 0000:02:00.0 eth0: 19 001100 5a28 00 [0]
[82099.289544] bnx2 0000:02:00.0 eth0: 1a 001100 5a30 00 [0]
[82099.289559] bnx2 0000:02:00.0 eth0: 1b 000800 8c58 00 [0]
[82099.289573] bnx2 0000:02:00.0 eth0: 1c 000800 8c60 00 [0]
[82099.289588] bnx2 0000:02:00.0 eth0: 1d 055e80 dca8 fb [0]
[82099.289603] bnx2 0000:02:00.0 eth0: 1e 1cf780 f7b8 af [0]
[82099.289618] bnx2 0000:02:00.0 eth0: 1f 1dff80 efe0 bf [0]
[82099.289629] bnx2 0000:02:00.0 eth0: <--- end TBDC dump --->
[82099.289644] bnx2 0000:02:00.0 eth0: DEBUG: intr_sem[0] PCI_CMD[00100406]
[82099.289661] bnx2 0000:02:00.0 eth0: DEBUG: PCI_PM[19002008]
PCI_MISC_CFG[92000088]
[82099.289676] bnx2 0000:02:00.0 eth0: DEBUG: EMAC_TX_STATUS[0000000e]
EMAC_RX_STATUS[00000000]
[82099.289691] bnx2 0000:02:00.0 eth0: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
[82099.289703] bnx2 0000:02:00.0 eth0: DEBUG:
HC_STATS_INTERRUPT_STATUS[01ee0000]
[82099.289716] bnx2 0000:02:00.0 eth0: DEBUG: PBA[00000000]
[82099.289726] bnx2 0000:02:00.0 eth0: <--- start MCP states dump --->
[82099.289739] bnx2 0000:02:00.0 eth0: DEBUG: MCP_STATE_P0[0003610e]
MCP_STATE_P1[0003610e]
[82099.289756] bnx2 0000:02:00.0 eth0: DEBUG: MCP mode[0000b880]
state[80008000] evt_mask[00000500]
[82099.289773] bnx2 0000:02:00.0 eth0: DEBUG: pc[0800b110] pc[0800aff0]
instr[afbf0048]
[82099.289787] bnx2 0000:02:00.0 eth0: DEBUG: shmem states:
[82099.289800] bnx2 0000:02:00.0 eth0: DEBUG: drv_mb[0d000005]
fw_mb[00000005] link_status[0010026f]
[82099.289820] bnx2 0000:02:00.0 eth0: DEBUG:
dev_info_signature[44564903] reset_type[01005254]
[82099.289842] bnx2 0000:02:00.0 eth0: DEBUG: 000001c0: 01005254
42530088 0003610e 00000000
[82099.289860] bnx2 0000:02:00.0 eth0: DEBUG: 000003cc: 44444444
44444444 44444444 00000a28
[82099.289879] bnx2 0000:02:00.0 eth0: DEBUG: 000003dc: 000cffff
00000000 ffff0000 00000000
[82099.289897] bnx2 0000:02:00.0 eth0: DEBUG: 000003ec: 00000000
00000000 00000000 00000000
[82099.289911] bnx2 0000:02:00.0 eth0: DEBUG: 0x3fc[0000ffff]
[82099.289921] bnx2 0000:02:00.0 eth0: <--- end MCP states dump --->

lspci -s 02:00.0 -vv
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S
Gigabit Ethernet (rev 20)
Subsystem: Dell Device 045f
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 32
Region 0: Memory at c8000000 (64-bit, non-prefetchable) [size=32M]
Expansion ROM at c6000000 [disabled] [size=128K]
Capabilities: [48] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data
Product Name: Broadcom NetXtreme II Ethernet Controller
Read-only fields:
[PN] Part number: BCM95709C0
[EC] Engineering changes: 220197-2
[SN] Serial number: 0123456789
[MN] Manufacture ID: 31 30 32 38
[V0] Vendor specific: 6.2.14
[RV] Reserved: checksum good, 22 byte(s) reserved
End
Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [a0] MSI-X: Enable+ Count=9 Masked-
Vector table: BAR=0 offset=0000c000
PBA: BAR=0 offset=0000e000
Capabilities: [ac] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Latency L0 <2us,
L1 <2us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive-
BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not
Supported
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF
Disabled
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance-
ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-,
EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Device Serial Number b8-ac-6f-ff-fe-b4-17-20
Capabilities: [110 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF-
MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+
MalfTLP+ ECRC+ UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [150 v1] Power Budgeting <?>
Capabilities: [160 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Kernel driver in use: bnx2
Kernel modules: bnx2



(XEN) [2014-02-05 20:15:20] MSI information:
(XEN) [2014-02-05 20:15:20] IOMMU 56 vec=28 fixed edge assert
phys cpu dest=00000020 mask=1/0/?
(XEN) [2014-02-05 20:15:20] MSI 57 vec=d0 fixed edge assert
phys cpu dest=00000020 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI 58 vec=d8 fixed edge assert
phys cpu dest=00000020 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI 59 vec=21 fixed edge assert
phys cpu dest=00000020 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI 60 vec=29 fixed edge assert
phys cpu dest=00000020 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI 61 vec=31 fixed edge assert
phys cpu dest=00000020 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI 62 vec=39 fixed edge assert
phys cpu dest=00000020 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 63 vec=62 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 64 vec=d7 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 65 vec=ba fixed edge assert
phys cpu dest=00000000 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 66 vec=92 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 67 vec=3a fixed edge assert
phys cpu dest=00000021 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 68 vec=b8 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 69 vec=2a fixed edge assert
phys cpu dest=00000020 mask=1/1/1
(XEN) [2014-02-05 20:15:20] MSI-X 70 vec=33 fixed edge assert
phys cpu dest=00000021 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 71 vec=c2 fixed edge assert
phys cpu dest=00000000 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 72 vec=9a fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 73 vec=d0 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 74 vec=da fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 75 vec=b2 fixed edge assert
phys cpu dest=00000020 mask=1/1/1
(XEN) [2014-02-05 20:15:20] MSI-X 76 vec=6b fixed edge assert
phys cpu dest=00000021 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 77 vec=68 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 78 vec=b2 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 79 vec=53 fixed edge assert
phys cpu dest=00000021 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 80 vec=78 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 81 vec=4b fixed edge assert
phys cpu dest=00000020 mask=1/1/1
(XEN) [2014-02-05 20:15:20] MSI-X 82 vec=a7 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 83 vec=63 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 84 vec=6f fixed edge assert
phys cpu dest=00000000 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 85 vec=5b fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 86 vec=99 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 87 vec=a3 fixed edge assert
phys cpu dest=00000020 mask=1/1/1
(XEN) [2014-02-05 20:15:20] MSI-X 88 vec=73 fixed edge assert
phys cpu dest=00000000 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 89 vec=58 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 90 vec=aa fixed edge assert
phys cpu dest=00000000 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 91 vec=38 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 92 vec=8f fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 93 vec=3c fixed edge assert
phys cpu dest=00000020 mask=1/1/1
(XEN) [2014-02-05 20:15:20] MSI-X 94 vec=3b fixed edge assert
phys cpu dest=00000021 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 95 vec=ca fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 96 vec=a8 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 97 vec=32 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 98 vec=23 fixed edge assert
phys cpu dest=00000000 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 99 vec=94 fixed edge assert
phys cpu dest=00000020 mask=1/1/1
(XEN) [2014-02-05 20:15:20] MSI-X 100 vec=7b fixed edge assert
phys cpu dest=00000021 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 101 vec=60 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 102 vec=a2 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 103 vec=4b fixed edge assert
phys cpu dest=00000021 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 104 vec=a1 fixed edge assert
phys cpu dest=00000021 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 105 vec=2d fixed edge assert
phys cpu dest=00000020 mask=1/1/1
(XEN) [2014-02-05 20:15:20] MSI-X 106 vec=43 fixed edge assert
phys cpu dest=00000000 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 107 vec=d2 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 108 vec=61 fixed edge assert
phys cpu dest=00000000 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 109 vec=21 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 110 vec=2b fixed edge assert
phys cpu dest=00000000 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 111 vec=85 fixed edge assert
phys cpu dest=00000020 mask=1/1/1
(XEN) [2014-02-05 20:15:22] Guest interrupt information:
(XEN) [2014-02-05 20:15:22] IRQ: 0 affinity:00000001 vec:f0
type=IO-APIC-edge status=00000000 timer_interrupt+0/0x18f
(XEN) [2014-02-05 20:15:22] IRQ: 1 affinity:00000001 vec:30
type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 3 affinity:00000001 vec:38
type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 4 affinity:00000001 vec:f1
type=IO-APIC-edge status=00000000 ns16550_interrupt+0/0x73
(XEN) [2014-02-05 20:15:22] IRQ: 5 affinity:00000001 vec:40
type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 6 affinity:00000001 vec:48
type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 7 affinity:00000001 vec:50
type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 8 affinity:00000001 vec:58
type=IO-APIC-edge status=00000030 in-flight=0 domain-list=0: 8(---),
(XEN) [2014-02-05 20:15:22] IRQ: 9 affinity:00000001 vec:60
type=IO-APIC-level status=00000030 in-flight=0 domain-list=0: 9(---),
(XEN) [2014-02-05 20:15:22] IRQ: 10 affinity:00000001 vec:68
type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 11 affinity:00000001 vec:70
type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 12 affinity:00000001 vec:78
type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 13 affinity:0000ffff vec:88
type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 14 affinity:00000001 vec:90
type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 15 affinity:00000001 vec:98
type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 17 affinity:00000010 vec:59
type=IO-APIC-level status=00000030 in-flight=0 domain-list=0: 17(---),
(XEN) [2014-02-05 20:15:22] IRQ: 18 affinity:00000010 vec:61
type=IO-APIC-level status=00000030 in-flight=0 domain-list=0: 18(---),
(XEN) [2014-02-05 20:15:22] IRQ: 19 affinity:00000010 vec:b9
type=IO-APIC-level status=00000030 in-flight=0 domain-list=0: 19(---),
(XEN) [2014-02-05 20:15:22] IRQ: 20 affinity:00000010 vec:71
type=IO-APIC-level status=00000030 in-flight=0 domain-list=0: 20(---),
(XEN) [2014-02-05 20:15:22] IRQ: 21 affinity:00000400 vec:6d
type=IO-APIC-level status=00000030 in-flight=0 domain-list=0: 21(---),
(XEN) [2014-02-05 20:15:22] IRQ: 32 affinity:0000ffff vec:79
type=IO-APIC-level status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 33 affinity:0000ffff vec:89
type=IO-APIC-level status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 36 affinity:0000ffff vec:41
type=IO-APIC-level status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 38 affinity:0000ffff vec:99
type=IO-APIC-level status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 39 affinity:0000ffff vec:a9
type=IO-APIC-level status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 42 affinity:0000ffff vec:81
type=IO-APIC-level status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 43 affinity:0000ffff vec:91
type=IO-APIC-level status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 45 affinity:0000ffff vec:a1
type=IO-APIC-level status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 47 affinity:0000ffff vec:b1
type=IO-APIC-level status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 53 affinity:0000ffff vec:a0
type=IO-APIC-level status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 56 affinity:0000ffff vec:28
type=DMA_MSI status=00000000 iommu_page_fault+0/0x12
(XEN) [2014-02-05 20:15:22] IRQ: 57 affinity:00000001 vec:d0
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:311(---),
(XEN) [2014-02-05 20:15:22] IRQ: 58 affinity:00000001 vec:d8
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:310(---),
(XEN) [2014-02-05 20:15:22] IRQ: 59 affinity:00000001 vec:21
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:309(---),
(XEN) [2014-02-05 20:15:22] IRQ: 60 affinity:00000001 vec:29
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:308(---),
(XEN) [2014-02-05 20:15:22] IRQ: 61 affinity:00000001 vec:31
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:307(---),
(XEN) [2014-02-05 20:15:22] IRQ: 62 affinity:00000001 vec:39
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:306(---),
(XEN) [2014-02-05 20:15:22] IRQ: 63 affinity:00000004 vec:62
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:305(---),
(XEN) [2014-02-05 20:15:22] IRQ: 64 affinity:00000004 vec:d7
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:304(---),
(XEN) [2014-02-05 20:15:22] IRQ: 65 affinity:00000100 vec:ba
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:303(---),
(XEN) [2014-02-05 20:15:22] IRQ: 66 affinity:00000004 vec:92
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:302(---),
(XEN) [2014-02-05 20:15:22] IRQ: 67 affinity:00000002 vec:3a
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:301(---),
(XEN) [2014-02-05 20:15:22] IRQ: 68 affinity:00000004 vec:b8
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:300(---),
(XEN) [2014-02-05 20:15:22] IRQ: 69 affinity:00000001 vec:2a
type=PCI-MSI/-X status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 70 affinity:00000002 vec:33
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:298(---),
(XEN) [2014-02-05 20:15:22] IRQ: 71 affinity:00000100 vec:c2
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:297(---),
(XEN) [2014-02-05 20:15:22] IRQ: 72 affinity:00000004 vec:9a
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:296(---),
(XEN) [2014-02-05 20:15:22] IRQ: 73 affinity:00000004 vec:d0
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:295(---),
(XEN) [2014-02-05 20:15:22] IRQ: 74 affinity:00000004 vec:da
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:294(---),
(XEN) [2014-02-05 20:15:22] IRQ: 75 affinity:00000001 vec:b2
type=PCI-MSI/-X status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 76 affinity:00000002 vec:6b
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:292(---),
(XEN) [2014-02-05 20:15:22] IRQ: 77 affinity:00000004 vec:68
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:291(---),
(XEN) [2014-02-05 20:15:22] IRQ: 78 affinity:00000004 vec:b2
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:290(---),
(XEN) [2014-02-05 20:15:22] IRQ: 79 affinity:00000002 vec:53
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:289(---),
(XEN) [2014-02-05 20:15:22] IRQ: 80 affinity:00000004 vec:78
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:288(---),
(XEN) [2014-02-05 20:15:22] IRQ: 81 affinity:00000001 vec:4b
type=PCI-MSI/-X status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 82 affinity:00000004 vec:a7
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:286(---),
(XEN) [2014-02-05 20:15:22] IRQ: 83 affinity:00000004 vec:63
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:285(---),
(XEN) [2014-02-05 20:15:22] IRQ: 84 affinity:00000100 vec:6f
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:284(---),
(XEN) [2014-02-05 20:15:22] IRQ: 85 affinity:00000004 vec:5b
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:283(---),
(XEN) [2014-02-05 20:15:22] IRQ: 86 affinity:00000004 vec:99
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:282(---),
(XEN) [2014-02-05 20:15:22] IRQ: 87 affinity:00000001 vec:a3
type=PCI-MSI/-X status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 88 affinity:00000100 vec:73
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:280(---),
(XEN) [2014-02-05 20:15:22] IRQ: 89 affinity:00000004 vec:58
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:279(---),
(XEN) [2014-02-05 20:15:22] IRQ: 90 affinity:00000100 vec:aa
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:278(---),
(XEN) [2014-02-05 20:15:22] IRQ: 91 affinity:00000004 vec:38
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:277(---),
(XEN) [2014-02-05 20:15:22] IRQ: 92 affinity:00000004 vec:8f
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:276(---),
(XEN) [2014-02-05 20:15:22] IRQ: 93 affinity:00000001 vec:3c
type=PCI-MSI/-X status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 94 affinity:00000002 vec:3b
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:274(---),
(XEN) [2014-02-05 20:15:22] IRQ: 95 affinity:00000004 vec:ca
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:273(---),
(XEN) [2014-02-05 20:15:22] IRQ: 96 affinity:00000004 vec:a8
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:272(---),
(XEN) [2014-02-05 20:15:22] IRQ: 97 affinity:00000004 vec:32
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:271(---),
(XEN) [2014-02-05 20:15:22] IRQ: 98 affinity:00000100 vec:23
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:270(---),
(XEN) [2014-02-05 20:15:22] IRQ: 99 affinity:00000001 vec:94
type=PCI-MSI/-X status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 100 affinity:00000002 vec:7b
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:268(---),
(XEN) [2014-02-05 20:15:22] IRQ: 101 affinity:00000004 vec:60
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:267(---),
(XEN) [2014-02-05 20:15:22] IRQ: 102 affinity:00000004 vec:a2
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:266(---),
(XEN) [2014-02-05 20:15:22] IRQ: 103 affinity:00000002 vec:4b
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:265(---),
(XEN) [2014-02-05 20:15:22] IRQ: 104 affinity:00000002 vec:a1
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:264(---),
(XEN) [2014-02-05 20:15:22] IRQ: 105 affinity:00000001 vec:2d
type=PCI-MSI/-X status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IRQ: 106 affinity:00000100 vec:43
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:262(---),
(XEN) [2014-02-05 20:15:22] IRQ: 107 affinity:00000004 vec:d2
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:261(---),
(XEN) [2014-02-05 20:15:22] IRQ: 108 affinity:00000100 vec:61
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:260(---),
(XEN) [2014-02-05 20:15:22] IRQ: 109 affinity:00000004 vec:21
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:259(---),
(XEN) [2014-02-05 20:15:22] IRQ: 110 affinity:00000100 vec:2b
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:258(---),
(XEN) [2014-02-05 20:15:22] IRQ: 111 affinity:00000001 vec:85
type=PCI-MSI/-X status=00000002 mapped, unbound
(XEN) [2014-02-05 20:15:22] IO-APIC interrupt information:
(XEN) [2014-02-05 20:15:22] IRQ 0 Vec240:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 2: vec=f0
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 1 Vec 48:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 1: vec=30
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 3 Vec 56:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 3: vec=38
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 4 Vec241:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 4: vec=f1
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 5 Vec 64:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 5: vec=40
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 6 Vec 72:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 6: vec=48
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 7 Vec 80:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 7: vec=50
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 8 Vec 88:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 8: vec=58
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 9 Vec 96:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 9: vec=60
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=L mask=0 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 10 Vec104:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 10: vec=68
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 11 Vec112:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 11: vec=70
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 12 Vec120:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 12: vec=78
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 13 Vec136:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 13: vec=88
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=1 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 14 Vec144:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 14: vec=90
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 15 Vec152:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 15: vec=98
delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 17 Vec 89:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 17: vec=59
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=0 dest_id:50
(XEN) [2014-02-05 20:15:22] IRQ 18 Vec 97:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 18: vec=61
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=0 dest_id:50
(XEN) [2014-02-05 20:15:22] IRQ 19 Vec185:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 19: vec=b9
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=0 dest_id:50
(XEN) [2014-02-05 20:15:22] IRQ 20 Vec113:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 20: vec=71
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=0 dest_id:50
(XEN) [2014-02-05 20:15:22] IRQ 21 Vec109:
(XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 21: vec=6d
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=0 dest_id:2
(XEN) [2014-02-05 20:15:22] IRQ 32 Vec121:
(XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 0: vec=79
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 33 Vec137:
(XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 1: vec=89
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 36 Vec 65:
(XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 4: vec=41
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 38 Vec153:
(XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 6: vec=99
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 39 Vec169:
(XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 7: vec=a9
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 42 Vec129:
(XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 10: vec=81
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 43 Vec145:
(XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 11: vec=91
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 45 Vec161:
(XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 13: vec=a1
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 47 Vec177:
(XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 15: vec=b1
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32
(XEN) [2014-02-05 20:15:22] IRQ 53 Vec160:
(XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 21: vec=a0
delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32

2014-02-05 20:28:01

by Zoltan Kiss

[permalink] [raw]
Subject: Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

On 05/02/14 20:23, Zoltan Kiss wrote:
> On 04/02/14 19:47, Michael Chan wrote:
>> On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote:
>>> [ 5417.275472] WARNING: at net/sched/sch_generic.c:255
>>> dev_watchdog+0x156/0x1f0()
>>> [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out
>>
>> The dump shows an internal IRQ pending on MSIX vector 2 which matches
>> the the queue number that is timing out. I don't know what happened to
>> the MSIX and why the driver is not seeing it. Do you see an IRQ error
>> message from the kernel a few seconds before the tx timeout message?
>
> I haven't seen any IRQ related error message. Note, this is on Xen
> 4.3.1. Now I have new results with a reworked version of the patch,
> unfortunately it still has this issue. Here is a bnx2 dump, lspci output
> and some Xen debug output (MSI and interrupt bindings, I have more if
> needed).

And here is the watchdog message and the first dump, if it matters:

[10118.282007] ------------[ cut here ]------------
[10118.282018] WARNING: at net/sched/sch_generic.c:255
dev_watchdog+0x156/0x1f0()
[10118.282021] NETDEV WATCHDOG: eth0 (bnx2): transmit queue 0 timed out
[10118.282024] Modules linked in: tun nfsv3 nfs_acl rpcsec_gss_krb5
auth_rpcgss oid_registry nfsv4 nfs fscache lockd sunrpc ipv6 openvswitch(O)
frag_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables
x_tables sr_mod cdrom nls_utf8 isofs dm_multipath scsi_dh dm_mod
usb_storage
lk_helper cryptd lrw aes_i586 xts gf128mul coretemp microcode
hid_generic lpc_ich mfd_core ehci_pci ehci_hcd i7core_edac edac_core
bnx2 sg hed u
scsi_transport_sas raid_class scsi_mod
[10118.282083] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G O
3.10.11-0.xs1.8.50.175.377583 #1
[10118.282086] Hardware name: Dell Inc. PowerEdge M710HD/05GGXD, BIOS
2.0.0 01/31/2011
[10118.282089] 000000ff ee0a5dd0 c1488cd3 ee0a5df8 c1046664 c1658a88
ee0a5e24 000000ff
[10118.282097] c13fc1c6 c13fc1c6 ec778000 00000000 00256a1c ee0a5e10
c1046723 00000009
[10118.282104] ee0a5e08 c1658a88 ee0a5e24 ee0a5e48 c13fc1c6 c16556e1
000000ff c1658a88
[10118.282112] Call Trace:
[10118.282118] [<c1488cd3>] dump_stack+0x16/0x1b
[10118.282125] [<c1046664>] warn_slowpath_common+0x64/0x80
[10118.282129] [<c13fc1c6>] ? dev_watchdog+0x156/0x1f0
[10118.282133] [<c13fc1c6>] ? dev_watchdog+0x156/0x1f0
[10118.282137] [<c1046723>] warn_slowpath_fmt+0x33/0x40
[10118.282141] [<c13fc1c6>] dev_watchdog+0x156/0x1f0
[10118.282149] [<c10549ce>] call_timer_fn+0x3e/0xf0
[10118.282155] [<c10013a7>] ? xen_hypercall_sched_op+0x7/0x20
[10118.282159] [<c13fc070>] ? __netdev_watchdog_up+0x60/0x60
[10118.282164] [<c1055c1b>] run_timer_softirq+0x1ab/0x210
[10118.282169] [<c10be4fd>] ? irq_get_irq_data+0xd/0x10
[10118.282176] [<c130fb2d>] ? info_for_irq+0xd/0x20
[10118.282180] [<c13fc070>] ? __netdev_watchdog_up+0x60/0x60
[10118.282184] [<c104e3f4>] __do_softirq+0xc4/0x200
[10118.282189] [<c1312316>] ? evtchn_fifo_handle_events+0xf6/0x120
[10118.282193] [<c104e5bd>] irq_exit+0x3d/0x90
[10118.282198] [<c130fe55>] xen_evtchn_do_upcall+0x25/0x40
[10118.282203] [<c14935c7>] xen_do_upcall+0x7/0xc
[10118.282207] [<c10013a7>] ? xen_hypercall_sched_op+0x7/0x20
[10118.282213] [<c1007f12>] ? xen_safe_halt+0x12/0x20
[10118.282218] [<c1015eff>] default_idle+0x3f/0xb0
[10118.282222] [<c1015a17>] arch_cpu_idle+0x17/0x30
[10118.282229] [<c108f591>] cpu_startup_entry+0x141/0x1f0
[10118.282234] [<c147d11b>] cpu_bringup_and_idle+0x12/0x14
[10118.282237] ---[ end trace 25ed24391f6c7acd ]---
[10118.282242] bnx2 0000:02:00.0 eth0: <--- start FTQ dump --->
[10118.282267] bnx2 0000:02:00.0 eth0: RV2P_PFTQ_CTL 00010000
[10118.282277] bnx2 0000:02:00.0 eth0: RV2P_TFTQ_CTL 00020000
[10118.282288] bnx2 0000:02:00.0 eth0: RV2P_MFTQ_CTL 00004000
[10118.282298] bnx2 0000:02:00.0 eth0: TBDR_FTQ_CTL 00004002
[10118.282309] bnx2 0000:02:00.0 eth0: TDMA_FTQ_CTL 00010002
[10118.282319] bnx2 0000:02:00.0 eth0: TXP_FTQ_CTL 01810002
[10118.282330] bnx2 0000:02:00.0 eth0: TXP_FTQ_CTL 01810002
[10118.282340] bnx2 0000:02:00.0 eth0: TPAT_FTQ_CTL 00010002
[10118.282372] bnx2 0000:02:00.0 eth0: RXP_CFTQ_CTL 00008000
[10118.282383] bnx2 0000:02:00.0 eth0: RXP_FTQ_CTL 00100000
[10118.282392] bnx2 0000:02:00.0 eth0: COM_COMXQ_FTQ_CTL 00010000
[10118.282403] bnx2 0000:02:00.0 eth0: COM_COMTQ_FTQ_CTL 00020000
[10118.282414] bnx2 0000:02:00.0 eth0: COM_COMQ_FTQ_CTL 00010000
[10118.282425] bnx2 0000:02:00.0 eth0: CP_CPQ_FTQ_CTL 00004000
[10118.282434] bnx2 0000:02:00.0 eth0: CPU states:
[10118.282449] bnx2 0000:02:00.0 eth0: 045000 mode b84c state 80001000
evt_mask 500 pc 8000844 pc 80012bc instr a0e00012
[10118.282471] bnx2 0000:02:00.0 eth0: 085000 mode b84c state 80001000
evt_mask 500 pc 8000a50 pc 8000ac4 instr 38420001
[10118.282493] bnx2 0000:02:00.0 eth0: 0c5000 mode b84c state 80001000
evt_mask 500 pc 8004c14 pc 8004c18 instr 32070001
[10118.282515] bnx2 0000:02:00.0 eth0: 105000 mode b8cc state 80000000
evt_mask 500 pc 8000a9c pc 8000b28 instr 8c530000
[10118.282537] bnx2 0000:02:00.0 eth0: 145000 mode b880 state 80000000
evt_mask 500 pc 800d1a8 pc 800af74 instr 441010a
[10118.282560] bnx2 0000:02:00.0 eth0: 185000 mode b8cc state 80000000
evt_mask 500 pc 8000918 pc 8000928 instr 8f870048
[10118.282577] bnx2 0000:02:00.0 eth0: <--- end FTQ dump --->
[10118.282586] bnx2 0000:02:00.0 eth0: <--- start TBDC dump --->
[10118.282597] bnx2 0000:02:00.0 eth0: TBDC free cnt: 31
[10118.282607] bnx2 0000:02:00.0 eth0: LINE CID BIDX CMD VALIDS
[10118.282622] bnx2 0000:02:00.0 eth0: 00 000800 a3b8 00 [1]
[10118.282637] bnx2 0000:02:00.0 eth0: 01 001100 1b58 00 [0]
[10118.282652] bnx2 0000:02:00.0 eth0: 02 000800 a390 00 [0]
[10118.282667] bnx2 0000:02:00.0 eth0: 03 000800 a370 00 [0]
[10118.282682] bnx2 0000:02:00.0 eth0: 04 000800 a378 00 [0]
[10118.282696] bnx2 0000:02:00.0 eth0: 05 000800 a388 00 [0]
[10118.282711] bnx2 0000:02:00.0 eth0: 06 000800 a398 00 [0]
[10118.282726] bnx2 0000:02:00.0 eth0: 07 000800 a3a8 00 [0]
[10118.282741] bnx2 0000:02:00.0 eth0: 08 000800 a3b0 00 [0]
[10118.282756] bnx2 0000:02:00.0 eth0: 09 000800 a3b8 00 [0]
[10118.282771] bnx2 0000:02:00.0 eth0: 0a 000800 8c10 00 [0]
[10118.282785] bnx2 0000:02:00.0 eth0: 0b 000800 eaf0 00 [0]
[10118.282800] bnx2 0000:02:00.0 eth0: 0c 000800 eaf8 00 [0]
[10118.282815] bnx2 0000:02:00.0 eth0: 0d 001100 5e60 00 [0]
[10118.282830] bnx2 0000:02:00.0 eth0: 0e 001100 5e68 00 [0]
[10118.282845] bnx2 0000:02:00.0 eth0: 0f 001100 5e70 00 [0]
[10118.282860] bnx2 0000:02:00.0 eth0: 10 001100 5e88 00 [0]
[10118.282875] bnx2 0000:02:00.0 eth0: 11 001100 5e90 00 [0]
[10118.282890] bnx2 0000:02:00.0 eth0: 12 001100 5ee8 00 [0]
[10118.282905] bnx2 0000:02:00.0 eth0: 13 001100 5ef8 00 [0]
[10118.282920] bnx2 0000:02:00.0 eth0: 14 001100 5e00 00 [0]
[10118.282935] bnx2 0000:02:00.0 eth0: 15 001100 5a20 00 [0]
[10118.282950] bnx2 0000:02:00.0 eth0: 16 001100 59a8 00 [0]
[10118.282964] bnx2 0000:02:00.0 eth0: 17 001100 59b0 00 [0]
[10118.282979] bnx2 0000:02:00.0 eth0: 18 001100 59b8 00 [0]
[10118.282994] bnx2 0000:02:00.0 eth0: 19 001100 5a28 00 [0]
[10118.283009] bnx2 0000:02:00.0 eth0: 1a 001100 5a30 00 [0]
[10118.283024] bnx2 0000:02:00.0 eth0: 1b 000800 8c58 00 [0]
[10118.283038] bnx2 0000:02:00.0 eth0: 1c 000800 8c60 00 [0]
[10118.283053] bnx2 0000:02:00.0 eth0: 1d 055e80 dca8 fb [0]
[10118.283068] bnx2 0000:02:00.0 eth0: 1e 1cf780 f7b8 af [0]
[10118.283083] bnx2 0000:02:00.0 eth0: 1f 1dff80 efe0 bf [0]
[10118.283094] bnx2 0000:02:00.0 eth0: <--- end TBDC dump --->
[10118.283111] bnx2 0000:02:00.0 eth0: DEBUG: intr_sem[0] PCI_CMD[00100406]
[10118.283128] bnx2 0000:02:00.0 eth0: DEBUG: PCI_PM[19002008]
PCI_MISC_CFG[92000088]
[10118.283143] bnx2 0000:02:00.0 eth0: DEBUG: EMAC_TX_STATUS[00000008]
EMAC_RX_STATUS[00000000]
[10118.283159] bnx2 0000:02:00.0 eth0: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
[10118.283170] bnx2 0000:02:00.0 eth0: DEBUG:
HC_STATS_INTERRUPT_STATUS[01fe0001]
[10118.283184] bnx2 0000:02:00.0 eth0: DEBUG: PBA[00000000]
[10118.283194] bnx2 0000:02:00.0 eth0: <--- start MCP states dump --->
[10118.283207] bnx2 0000:02:00.0 eth0: DEBUG: MCP_STATE_P0[0003610e]
MCP_STATE_P1[0003610e]
[10118.283224] bnx2 0000:02:00.0 eth0: DEBUG: MCP mode[0000b880]
state[80004000] evt_mask[00000500]
[10118.283242] bnx2 0000:02:00.0 eth0: DEBUG: pc[0800d244] pc[0800b0ac]
instr[00000000]
[10118.283255] bnx2 0000:02:00.0 eth0: DEBUG: shmem states:
[10118.283268] bnx2 0000:02:00.0 eth0: DEBUG: drv_mb[0d000005]
fw_mb[00000005] link_status[0010026f]
[10118.283283] drv_pulse_mb[00002768]
[10118.283288] bnx2 0000:02:00.0 eth0: DEBUG:
dev_info_signature[44564903] reset_type[01005254]
[10118.283302] condition[0003610e]
[10118.283310] bnx2 0000:02:00.0 eth0: DEBUG: 000001c0: 01005254
42530088 0003610e 00000000
[10118.283328] bnx2 0000:02:00.0 eth0: DEBUG: 000003cc: 44444444
44444444 44444444 00000a28
[10118.283346] bnx2 0000:02:00.0 eth0: DEBUG: 000003dc: 000cffff
00000000 ffff0000 00000000
[10118.283364] bnx2 0000:02:00.0 eth0: DEBUG: 000003ec: 00000000
00000000 00000000 00000000
[10118.283379] bnx2 0000:02:00.0 eth0: DEBUG: 0x3fc[0000ffff]
[10118.283389] bnx2 0000:02:00.0 eth0: <--- end MCP states dump --->

2014-02-05 20:44:08

by Andrew Cooper

[permalink] [raw]
Subject: Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

On 05/02/2014 20:23, Zoltan Kiss wrote:
> On 04/02/14 19:47, Michael Chan wrote:
>> On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote:
>>> [ 5417.275472] WARNING: at net/sched/sch_generic.c:255
>>> dev_watchdog+0x156/0x1f0()
>>> [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out
>>
>> The dump shows an internal IRQ pending on MSIX vector 2 which matches
>> the the queue number that is timing out. I don't know what happened to
>> the MSIX and why the driver is not seeing it. Do you see an IRQ error
>> message from the kernel a few seconds before the tx timeout message?
>
> I haven't seen any IRQ related error message. Note, this is on Xen
> 4.3.1. Now I have new results with a reworked version of the patch,
> unfortunately it still has this issue. Here is a bnx2 dump, lspci
> output and some Xen debug output (MSI and interrupt bindings, I have
> more if needed).

You need debug-keys 'Q' as well to map between the PCI devices and Xen IRQs

~Andrew

2014-02-06 09:58:34

by Zoltan Kiss

[permalink] [raw]
Subject: Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

On 05/02/14 20:43, Andrew Cooper wrote:
> On 05/02/2014 20:23, Zoltan Kiss wrote:
>> On 04/02/14 19:47, Michael Chan wrote:
>>> On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote:
>>>> [ 5417.275472] WARNING: at net/sched/sch_generic.c:255
>>>> dev_watchdog+0x156/0x1f0()
>>>> [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out
>>>
>>> The dump shows an internal IRQ pending on MSIX vector 2 which matches
>>> the the queue number that is timing out. I don't know what happened to
>>> the MSIX and why the driver is not seeing it. Do you see an IRQ error
>>> message from the kernel a few seconds before the tx timeout message?
>>
>> I haven't seen any IRQ related error message. Note, this is on Xen
>> 4.3.1. Now I have new results with a reworked version of the patch,
>> unfortunately it still has this issue. Here is a bnx2 dump, lspci
>> output and some Xen debug output (MSI and interrupt bindings, I have
>> more if needed).
>
> You need debug-keys 'Q' as well to map between the PCI devices and Xen IRQs
>
> ~Andrew
>

I could have it after reboot:

(XEN) [2014-02-06 09:44:34] 0000:02:00.0 - dom 0 - MSIs < 64 65 66 67
68 69 >

So the relevant MSI informations:

(XEN) [2014-02-05 20:15:20] MSI-X 64 vec=d7 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 65 vec=ba fixed edge assert
phys cpu dest=00000000 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 66 vec=92 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 67 vec=3a fixed edge assert
phys cpu dest=00000021 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 68 vec=b8 fixed edge assert
phys cpu dest=00000022 mask=1/0/0
(XEN) [2014-02-05 20:15:20] MSI-X 69 vec=2a fixed edge assert
phys cpu dest=00000020 mask=1/1/1
...
(XEN) [2014-02-05 20:15:22] IRQ: 64 affinity:00000004 vec:d7
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:304(---),
(XEN) [2014-02-05 20:15:22] IRQ: 65 affinity:00000100 vec:ba
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:303(---),
(XEN) [2014-02-05 20:15:22] IRQ: 66 affinity:00000004 vec:92
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:302(---),
(XEN) [2014-02-05 20:15:22] IRQ: 67 affinity:00000002 vec:3a
type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:301(---),
(XEN) [2014-02-05 20:15:22] IRQ: 68 affinity:00000004 vec:b8
type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:300(---),
(XEN) [2014-02-05 20:15:22] IRQ: 69 affinity:00000001 vec:2a
type=PCI-MSI/-X status=00000002 mapped, unbound


Zoli

2014-02-12 17:14:05

by Zoltan Kiss

[permalink] [raw]
Subject: Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer

Hi,

I still haven't managed to crack this problem. I've made sure the below
mentioned skb's look the same as the other ones: linear buffer with
header, and the rest is aggregated into frags. Utilizing the skb
destructor I've also checked that these packets are all freed before the
TX hang happens. So the only difference from current upstream is that
the pages are grant mapped into Dom0 instead of grant copy to a local page.
I've also found some of my older notes about this issue, where I managed
to reproduce this on igb, and in that particular case the TX hang could
be solved with ifconfig down/up. Does the "Detected Tx Unit Hang"
messages give any hint to igb developers?

Nov 26 04:18:34 localhost kernel: [ 7814.197868] ------------[ cut here
]------------
Nov 26 04:18:34 localhost kernel: [ 7814.197889] WARNING: at
net/sched/sch_generic.c:255 dev_watchdog+0x165/0x220()
Nov 26 04:18:34 localhost kernel: [ 7814.197892] NETDEV WATCHDOG: eth0
(igb): transmit queue 7 timed out
Nov 26 04:18:34 localhost kernel: [ 7814.197894] Modules linked in: tun
nfsv3 nfs_acl nfs fscache dm_multipath scsi_dh lockd sunrpc openvswitch
ipt_REJECT nf_conntrack_ipv4 nf_defrag_ip
v4 xt_tcpudp xt_conntrack nf_conntrack iptable_filter ip_tables x_tables
nls_utf8 isofs dm_mirror video backlight sbs sbshc hed acpi_ipmi
ipmi_msghandler nvram sg psmouse serio_raw igb
i2c_algo_bit ptp pps_core hpilo tpm_tis tpm tpm_bios lpc_ich mfd_core
ehci_pci crc32_pclmul aesni_intel ablk_helper cryptd lrw aes_i586 xts
gf128mul dm_region_hash dm_log dm_mod shpchp
hpsa sd_mod scsi_mod uhci_hcd ohci_hcd ehci_hcd fbcon font tileblit
bitblit softcursor [last unloaded: microcode]
Nov 26 04:18:34 localhost kernel: [ 7814.197957] CPU: 5 PID: 0 Comm:
swapper/5 Not tainted 3.10.11-0.xs1.8.50.127.377543 #1
Nov 26 04:18:34 localhost kernel: [ 7814.197959] Hardware name: HP
ProLiant BL420c Gen8, BIOS I30 12/14/2012
Nov 26 04:18:34 localhost kernel: [ 7814.197962] e5cd9e10 c13e4c55
e5cd9ddc c1278546 e5cd9e00 c1047fd3 c1643220 e5cd9e2c
Nov 26 04:18:34 localhost kernel: [ 7814.197969] 000000ff c13e4c55
e1fa8700 00000007 000004e2 e5cd9e18 c1048093 00000009
Nov 26 04:18:34 localhost kernel: [ 7814.197975] e5cd9e10 c1643220
e5cd9e2c e5cd9e50 c13e4c55 c163fe6b 000000ff c1643220
Nov 26 04:18:34 localhost kernel: [ 7814.197982] Call Trace:
Nov 26 04:18:34 localhost kernel: [ 7814.197988] [<c13e4c55>] ?
dev_watchdog+0x165/0x220
Nov 26 04:18:34 localhost kernel: [ 7814.197994] [<c1278546>]
dump_stack+0x16/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198000] [<c1047fd3>]
warn_slowpath_common+0x63/0x80
Nov 26 04:18:34 localhost kernel: [ 7814.198003] [<c13e4c55>] ?
dev_watchdog+0x165/0x220
Nov 26 04:18:34 localhost kernel: [ 7814.198007] [<c1048093>]
warn_slowpath_fmt+0x33/0x40
Nov 26 04:18:34 localhost kernel: [ 7814.198011] [<c13e4c55>]
dev_watchdog+0x165/0x220
Nov 26 04:18:34 localhost kernel: [ 7814.198017] [<c13e4af0>] ?
dev_activate+0x110/0x110
Nov 26 04:18:34 localhost kernel: [ 7814.198020] [<c1055c18>]
call_timer_fn+0x58/0xe0
Nov 26 04:18:34 localhost kernel: [ 7814.198024] [<c1056ce8>]
run_timer_softirq+0x1a8/0x1f0
Nov 26 04:18:34 localhost kernel: [ 7814.198028] [<c12fb61d>] ?
info_for_irq+0xd/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198031] [<c12fbb6c>] ?
evtchn_from_irq+0x3c/0x50
Nov 26 04:18:34 localhost kernel: [ 7814.198034] [<c13e4af0>] ?
dev_activate+0x110/0x110
Nov 26 04:18:34 localhost kernel: [ 7814.198038] [<c104fcb9>]
__do_softirq+0xd9/0x1e0
Nov 26 04:18:34 localhost kernel: [ 7814.198041] [<c12fc045>] ?
__xen_evtchn_do_upcall+0x245/0x280
Nov 26 04:18:34 localhost kernel: [ 7814.198045] [<c104fe41>]
irq_exit+0x41/0x80
Nov 26 04:18:34 localhost kernel: [ 7814.198048] [<c12fc0e5>]
xen_evtchn_do_upcall+0x25/0x30
Nov 26 04:18:34 localhost kernel: [ 7814.198053] [<c147b287>]
xen_do_upcall+0x7/0xc
Nov 26 04:18:34 localhost kernel: [ 7814.198058] [<c10c00d8>] ?
rcu_process_gp_end+0x58/0x70
Nov 26 04:18:34 localhost kernel: [ 7814.198061] [<c10013a7>] ?
xen_hypercall_sched_op+0x7/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198066] [<c1007ef2>] ?
xen_safe_halt+0x12/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198070] [<c1015be6>]
default_idle+0x56/0xb0
Nov 26 04:18:34 localhost kernel: [ 7814.198074] [<c10158e7>]
arch_cpu_idle+0x17/0x30
Nov 26 04:18:34 localhost kernel: [ 7814.198078] [<c108e2ae>]
cpu_startup_entry+0x15e/0x1d0
Nov 26 04:18:34 localhost kernel: [ 7814.198085] [<c1464282>]
cpu_bringup_and_idle+0x12/0x20
Nov 26 04:18:34 localhost kernel: [ 7814.198088] ---[ end trace
d8c0d3f5c187aa6b ]---

And the recovery:

Nov 26 21:47:54 localhost kernel: [70773.950715] ------------[ cut here
]------------
Nov 26 21:47:54 localhost kernel: [70773.950747] WARNING: at
net/core/dev.c:4201 net_rx_action+0xfd/0x1c0()
Nov 26 21:47:54 localhost kernel: [70773.950751] Modules linked in: tun
nfsv3 nfs_acl nfs fscache dm_multipath scsi_dh lockd sunrpc openvswitch
ipt_REJECT nf_conntrack_ipv4 nf_defrag_ip
v4 xt_tcpudp xt_conntrack nf_conntrack iptable_filter ip_tables x_tables
nls_utf8 isofs dm_mirror video backlight sbs sbshc hed acpi_ipmi
ipmi_msghandler nvram sg psmouse serio_raw igb
i2c_algo_bit ptp pps_core hpilo tpm_tis tpm tpm_bios lpc_ich mfd_core
ehci_pci crc32_pclmul aesni_intel ablk_helper cryptd lrw aes_i586 xts
gf128mul dm_region_hash dm_log dm_mod shpchp
hpsa sd_mod scsi_mod uhci_hcd ohci_hcd ehci_hcd fbcon font tileblit
bitblit softcursor [last unloaded: microcode]
Nov 26 21:47:54 localhost kernel: [70773.950852] CPU: 0 PID: 0 Comm:
swapper/0 Tainted: G W 3.10.11-0.xs1.8.50.127.377543 #1
Nov 26 21:47:54 localhost kernel: [70773.950856] Hardware name: HP
ProLiant BL420c Gen8, BIOS I30 12/14/2012
Nov 26 21:47:54 localhost kernel: [70773.950860] 00000000 c13ccdfd
c167fc78 c1278546 c167fc9c c1047fd3 c15ebc78 c163f7da
Nov 26 21:47:54 localhost kernel: [70773.950873] 00001069 c13ccdfd
dff404c8 00000040 00000000 c167fcac c1048012 00000009
Nov 26 21:47:54 localhost kernel: [70773.950884] 00000000 c167fcd8
c13ccdfd ed383888 010cbb97 000000e2 ed383880 00000043
Nov 26 21:47:54 localhost kernel: [70773.950896] Call Trace:
Nov 26 21:47:54 localhost kernel: [70773.950905] [<c13ccdfd>] ?
net_rx_action+0xfd/0x1c0
Nov 26 21:47:54 localhost kernel: [70773.950915] [<c1278546>]
dump_stack+0x16/0x20
Nov 26 21:47:54 localhost kernel: [70773.950924] [<c1047fd3>]
warn_slowpath_common+0x63/0x80
Nov 26 21:47:54 localhost kernel: [70773.950930] [<c13ccdfd>] ?
net_rx_action+0xfd/0x1c0
Nov 26 21:47:54 localhost kernel: [70773.950937] [<c1048012>]
warn_slowpath_null+0x22/0x30
Nov 26 21:47:54 localhost kernel: [70773.950954] [<c13ccdfd>]
net_rx_action+0xfd/0x1c0
Nov 26 21:47:54 localhost kernel: [70773.950969] [<c104fcb9>]
__do_softirq+0xd9/0x1e0
Nov 26 21:47:54 localhost kernel: [70773.950985] [<c12fc045>] ?
__xen_evtchn_do_upcall+0x245/0x280
Nov 26 21:47:54 localhost kernel: [70773.951002] [<c104fe41>]
irq_exit+0x41/0x80
Nov 26 21:47:54 localhost kernel: [70773.951011] [<c12fc0e5>]
xen_evtchn_do_upcall+0x25/0x30
Nov 26 21:47:54 localhost kernel: [70773.951019] [<c147b287>]
xen_do_upcall+0x7/0xc
Nov 26 21:47:54 localhost kernel: [70773.951026] [<c10013a7>] ?
xen_hypercall_sched_op+0x7/0x20
Nov 26 21:47:54 localhost kernel: [70773.951033] [<c1007ef2>] ?
xen_safe_halt+0x12/0x20
Nov 26 21:47:54 localhost kernel: [70773.951041] [<c1015be6>]
default_idle+0x56/0xb0
Nov 26 21:47:54 localhost kernel: [70773.951046] [<c10158e7>]
arch_cpu_idle+0x17/0x30
Nov 26 21:47:54 localhost kernel: [70773.951054] [<c108e2ae>]
cpu_startup_entry+0x15e/0x1d0
Nov 26 21:47:54 localhost kernel: [70773.951064] [<c1460362>]
rest_init+0x62/0x70
Nov 26 21:47:54 localhost kernel: [70773.951071] [<c16efcea>]
start_kernel+0x39a/0x3b0
Nov 26 21:47:54 localhost kernel: [70773.951076] [<c16ef520>] ?
repair_env_string+0x60/0x60
Nov 26 21:47:54 localhost kernel: [70773.951082] [<c16ef2eb>]
i386_start_kernel+0x8b/0x90
Nov 26 21:47:54 localhost kernel: [70773.951088] [<c16f2c2d>]
xen_start_kernel+0x7cd/0x7f0
Nov 26 21:47:54 localhost kernel: [70773.951097] ---[ end trace
d8c0d3f5c187aa6c ]---
Nov 26 21:47:54 localhost kernel: [70773.952034] ------------[ cut here
]------------
Nov 26 21:47:54 localhost kernel: [70773.952067] WARNING: at
drivers/net/ethernet/intel/igb/igb_main.c:2860 __igb_close+0x3d/0xb0 [igb]()
Nov 26 21:47:54 localhost kernel: [70773.952071] Modules linked in: tun
nfsv3 nfs_acl nfs fscache dm_multipath scsi_dh lockd sunrpc openvswitch
ipt_REJECT nf_conntrack_ipv4 nf_defrag_ip
v4 xt_tcpudp xt_conntrack nf_conntrack iptable_filter ip_tables x_tables
nls_utf8 isofs dm_mirror video backlight sbs sbshc hed acpi_ipmi
ipmi_msghandler nvram sg psmouse serio_raw igb i2c_algo_bit ptp pps_core
hpilo tpm_tis tpm tpm_bios lpc_ich mfd_core ehci_pci crc32_pclmul
aesni_intel ablk_helper cryptd lrw aes_i586 xts gf128mul dm_region_hash
dm_log dm_mod shpchp hpsa sd_mod scsi_mod uhci_hcd ohci_hcd ehci_hcd
fbcon font tileblit bitblit softcursor [last unloaded: microcode]
Nov 26 21:47:54 localhost kernel: [70773.952150] CPU: 4 PID: 3467 Comm:
ifconfig Tainted: G W 3.10.11-0.xs1.8.50.127.377543 #1
Nov 26 21:47:54 localhost kernel: [70773.952153] Hardware name: HP
ProLiant BL420c Gen8, BIOS I30 12/14/2012
Nov 26 21:47:54 localhost kernel: [70773.952157] 00000000 eddcec4d
ca701d8c c1278546 ca701db0 c1047fd3 c15ebc78 edde1b0c
Nov 26 21:47:54 localhost kernel: [70773.952169] 00000b2c eddcec4d
00000000 e35504c0 e5f17000 ca701dc0 c1048012 00000009
Nov 26 21:47:54 localhost kernel: [70773.952180] 00000000 ca701dd4
eddcec4d e3550000 ca701e00 ca701e00 ca701ddc eddceccf
Nov 26 21:47:54 localhost kernel: [70773.952192] Call Trace:
Nov 26 21:47:54 localhost kernel: [70773.952207] [<eddcec4d>] ?
__igb_close+0x3d/0xb0 [igb]
Nov 26 21:47:54 localhost kernel: [70773.952216] [<c1278546>]
dump_stack+0x16/0x20
Nov 26 21:47:54 localhost kernel: [70773.952223] [<c1047fd3>]
warn_slowpath_common+0x63/0x80
Nov 26 21:47:54 localhost kernel: [70773.952237] [<eddcec4d>] ?
__igb_close+0x3d/0xb0 [igb]
Nov 26 21:47:54 localhost kernel: [70773.952243] [<c1048012>]
warn_slowpath_null+0x22/0x30
Nov 26 21:47:54 localhost kernel: [70773.952255] [<eddcec4d>]
__igb_close+0x3d/0xb0 [igb]
Nov 26 21:47:54 localhost kernel: [70773.952267] [<eddceccf>]
igb_close+0xf/0x20 [igb]
Nov 26 21:47:54 localhost kernel: [70773.952275] [<c13c8691>]
__dev_close_many+0x91/0xb0
Nov 26 21:47:54 localhost kernel: [70773.952284] [<c13df583>] ?
netpoll_rx_disable+0x43/0x50
Nov 26 21:47:54 localhost kernel: [70773.952289] [<c13c9163>]
__dev_close+0x43/0x80
Nov 26 21:47:54 localhost kernel: [70773.952300] [<c13c7c28>]
__dev_change_flags+0xa8/0x120
Nov 26 21:47:54 localhost kernel: [70773.952308] [<c13c85c3>]
dev_change_flags+0x23/0x60
Nov 26 21:47:54 localhost kernel: [70773.952314] [<c1424d9c>]
devinet_ioctl+0x29c/0x600
Nov 26 21:47:54 localhost kernel: [70773.952323] [<c13dbf05>] ?
dev_ioctl+0x475/0x4d0
Nov 26 21:47:54 localhost kernel: [70773.952330] [<c1425d6b>]
inet_ioctl+0x5b/0x80
Nov 26 21:47:54 localhost kernel: [70773.952340] [<c13b776e>]
sock_ioctl+0x1fe/0x230
Nov 26 21:47:54 localhost kernel: [70773.952350] [<c13b7570>] ?
sock_recvmsg_nosec+0xb0/0xb0
Nov 26 21:47:54 localhost kernel: [70773.952360] [<c1143cf6>]
vfs_ioctl+0x26/0x40
Nov 26 21:47:54 localhost kernel: [70773.952367] [<c11448ba>]
do_vfs_ioctl+0x4ea/0x550
Nov 26 21:47:54 localhost kernel: [70773.952376] [<c113de22>] ?
final_putname+0x32/0x40
Nov 26 21:47:54 localhost kernel: [70773.952382] [<c113de22>] ?
final_putname+0x32/0x40
Nov 26 21:47:54 localhost kernel: [70773.952391] [<c113de67>] ?
putname+0x37/0x40
Nov 26 21:47:54 localhost kernel: [70773.952401] [<c1134b64>] ?
do_sys_open+0x194/0x1a0
Nov 26 21:47:54 localhost kernel: [70773.952408] [<c1144983>]
SyS_ioctl+0x63/0x90
Nov 26 21:47:54 localhost kernel: [70773.952416] [<c147ad4d>]
sysenter_do_call+0x12/0x28
Nov 26 21:47:54 localhost kernel: [70773.952423] ---[ end trace
d8c0d3f5c187aa6d ]---
Nov 26 21:47:54 localhost kernel: [70773.971294] igb 0000:04:00.1 eth1:
Reset adapter
Nov 26 21:47:54 localhost kernel: [70774.068154] igb 0000:04:00.0 eth0:
Reset adapter
Nov 26 21:47:55 localhost kernel: [70774.357949] igb: eth1 NIC Link is
Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Nov 26 21:48:00 localhost kernel: [70779.231904] igb: eth0 NIC Link is
Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Nov 26 21:48:00 localhost kernel: [70779.346793] igb: eth0 NIC Link is
Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Nov 26 21:48:02 localhost kernel: [70781.214844] igb 0000:04:00.0:
Detected Tx Unit Hang
Nov 26 21:48:02 localhost kernel: [70781.214844] Tx Queue <7>
Nov 26 21:48:02 localhost kernel: [70781.214844] TDH <0>
Nov 26 21:48:02 localhost kernel: [70781.214844] TDT <0>
Nov 26 21:48:02 localhost kernel: [70781.214844] next_to_use <1>
Nov 26 21:48:02 localhost kernel: [70781.214844] next_to_clean <0>
Nov 26 21:48:02 localhost kernel: [70781.214844] buffer_info[next_to_clean]
Nov 26 21:48:02 localhost kernel: [70781.214844] time_stamp
<10cc0cd>
Nov 26 21:48:02 localhost kernel: [70781.214844] next_to_watch
<e2d5e000>
Nov 26 21:48:02 localhost kernel: [70781.214844] jiffies
<10cc2ae>
Nov 26 21:48:02 localhost kernel: [70781.214844] desc.status
<12c000>
Nov 26 21:48:04 localhost kernel: [70783.214857] igb 0000:04:00.0:
Detected Tx Unit Hang
Nov 26 21:48:04 localhost kernel: [70783.214857] Tx Queue <7>
Nov 26 21:48:04 localhost kernel: [70783.214857] TDH <0>
Nov 26 21:48:04 localhost kernel: [70783.214857] TDT <0>
Nov 26 21:48:04 localhost kernel: [70783.214857] next_to_use <1>
Nov 26 21:48:04 localhost kernel: [70783.214857] next_to_clean <0>
Nov 26 21:48:04 localhost kernel: [70783.214857] buffer_info[next_to_clean]
Nov 26 21:48:04 localhost kernel: [70783.214857] time_stamp
<10cc0cd>
Nov 26 21:48:04 localhost kernel: [70783.214857] next_to_watch
<e2d5e000>
Nov 26 21:48:04 localhost kernel: [70783.214857] jiffies
<10cc4a2>
Nov 26 21:48:04 localhost kernel: [70783.214857] desc.status
<12c000>
Nov 26 21:48:06 localhost kernel: [70785.214700] igb 0000:04:00.0:
Detected Tx Unit Hang
Nov 26 21:48:06 localhost kernel: [70785.214700] Tx Queue <7>
Nov 26 21:48:06 localhost kernel: [70785.214700] TDH <0>
Nov 26 21:48:06 localhost kernel: [70785.214700] TDT <0>
Nov 26 21:48:06 localhost kernel: [70785.214700] next_to_use <1>
Nov 26 21:48:06 localhost kernel: [70785.214700] next_to_clean <0>
Nov 26 21:48:06 localhost kernel: [70785.214700] buffer_info[next_to_clean]
Nov 26 21:48:06 localhost kernel: [70785.214700] time_stamp
<10cc0cd>
Nov 26 21:48:06 localhost kernel: [70785.214700] next_to_watch
<e2d5e000>
Nov 26 21:48:06 localhost kernel: [70785.214700] jiffies
<10cc696>
Nov 26 21:48:06 localhost kernel: [70785.214700] desc.status
<12c000>
Nov 26 21:48:08 localhost kernel: [70787.214734] igb 0000:04:00.0:
Detected Tx Unit Hang
Nov 26 21:48:08 localhost kernel: [70787.214734] Tx Queue <7>
Nov 26 21:48:08 localhost kernel: [70787.214734] TDH <0>
Nov 26 21:48:08 localhost kernel: [70787.214734] TDT <0>
Nov 26 21:48:08 localhost kernel: [70787.214734] next_to_use <1>
Nov 26 21:48:08 localhost kernel: [70787.214734] next_to_clean <0>
Nov 26 21:48:08 localhost kernel: [70787.214734] buffer_info[next_to_clean]
Nov 26 21:48:08 localhost kernel: [70787.214734] time_stamp
<10cc0cd>
Nov 26 21:48:08 localhost kernel: [70787.214734] next_to_watch
<e2d5e000>
Nov 26 21:48:08 localhost kernel: [70787.214734] jiffies
<10cc88a>
Nov 26 21:48:08 localhost kernel: [70787.214734] desc.status
<12c000>
Nov 26 21:48:10 localhost kernel: [70789.214752] igb 0000:04:00.0:
Detected Tx Unit Hang
Nov 26 21:48:10 localhost kernel: [70789.214752] Tx Queue <7>
Nov 26 21:48:10 localhost kernel: [70789.214752] TDH <0>
Nov 26 21:48:10 localhost kernel: [70789.214752] TDT <0>
Nov 26 21:48:10 localhost kernel: [70789.214752] next_to_use <1>
Nov 26 21:48:10 localhost kernel: [70789.214752] next_to_clean <0>
Nov 26 21:48:10 localhost kernel: [70789.214752] buffer_info[next_to_clean]
Nov 26 21:48:10 localhost kernel: [70789.214752] time_stamp
<10cc0cd>
Nov 26 21:48:10 localhost kernel: [70789.214752] next_to_watch
<e2d5e000>
Nov 26 21:48:10 localhost kernel: [70789.214752] jiffies
<10cca7e>
Nov 26 21:48:10 localhost kernel: [70789.214752] desc.status
<12c000>
Nov 26 21:48:11 localhost kernel: [70790.214611] igb 0000:04:00.0 eth0:
Reset adapter
Nov 26 21:48:11 localhost kernel: [70790.246610] igb 0000:04:00.1 eth1:
Reset adapter
Nov 26 21:48:11 localhost kernel: [70790.250616] igb: eth1 NIC Link is Down
Nov 26 21:48:11 localhost kernel: [70790.340089] igb: eth0 NIC Link is
Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Nov 26 21:48:11 localhost kernel: [70790.367984] igb: eth1 NIC Link is
Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Nov 26 21:48:11 localhost kernel: [70790.598550] igb: eth1 NIC Link is Down
Nov 26 21:48:11 localhost kernel: [70790.634559] igb: eth1 NIC Link is
Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Nov 26 21:48:11 localhost kernel: [70790.638593] igb: eth0 NIC Link is Down
Nov 26 21:48:11 localhost kernel: [70790.674599] igb: eth0 NIC Link is
Up 1000 Mbps Full Duplex, Flow Control: RX/TX


On 30/01/14 19:08, Zoltan Kiss wrote:
> I've experienced some queue timeout problems mentioned in the subject
> with igb and bnx2 cards. I haven't seen them on other cards so far. I'm
> using XenServer with 3.10 Dom0 kernel (however igb were already updated
> to latest version), and there are Windows guests sending data through
> these cards. I noticed these problems in XenRT test runs, and I know
> that they usually mean some lost interrupt problem or other hardware
> error, but in my case they started to appear more often, and they are
> likely connected to my netback grant mapping patches. These patches
> causing skb's with huge (~64kb) linear buffers to appear more often.
> The reason for that is an old problem in the ring protocol: originally
> the maximum amount of slots were linked to MAX_SKB_FRAGS, as every slot
> ended up as a frag of the skb. When this value were changed, netback had
> to cope with the situation by coalescing the packets into fewer frags.
> My patch series take a different approach: the leftover slots (pages)
> were assigned to a new skb's frags, and that skb were stashed to the
> frag_list of the first one. Then, before sending it off to the stack it
> calls skb = skb_copy_expand(skb, 0, 0, GFP_ATOMIC, __GFP_NOWARN), which
> basically creates a new skb and copied all the data into it. As far as I
> understood, it put everything into the linear buffer, which can amount
> to 64KB at most. The original skb are freed then, and this new one were
> sent to the stack.
> I suspect that this is the problem as it only happens when guests send
> too much slots. Does anyone familiar with these drivers have seen such
> issue before? (when these kind of skb's get stucked in the queue)