2013-06-14 11:58:54

by Holger Kiehl

Subject: Problems with ixgbe driver

Hello,

I have a dual-port 10Gb Intel network card in a 2-socket (Xeon X5690) system
with a total of 12 cores. Hyperthreading is enabled, so there are 24 logical CPUs.
The problem I have is that when other systems send large amounts of data,
the network with the Intel ixgbe driver gets very slow. Ping times go up
from 0.2 ms to approx. 60 ms. Some FTP connections stall for more than 2
minutes. What is strange is that Heartbeat is configured on the system
with a serial connection to another node, and the kernel always reports

ttyS0: 4 input overrun(s)

when a lot of data is sent and the ping time goes up.

On the network there are three VLANs configured. The network is bonded
(active-backup) together with another HP NC523SFP 10Gb 2-port Server
Adapter. When I switch the traffic to this card, the problem goes away
and the ttyS0 input overruns disappear. Note that both network cards
are connected to the same switch.
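
(For reference, one way to switch the active interface of an active-backup
bond at runtime is through sysfs; a minimal sketch, where bond0 and eth6 are
taken from the logs below and ethX stands for the qlcnic port, whose name is
not shown here:)

    # show which slave is currently active
    cat /proc/net/bonding/bond0
    # make the qlcnic port the active slave (ethX is a placeholder)
    echo ethX > /sys/class/net/bond0/bonding/active_slave
    # switch back to the ixgbe port
    echo eth6 > /sys/class/net/bond0/bonding/active_slave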

The system runs Scientific Linux 6.4 with a kernel.org kernel. I noticed
this behavior with kernels 3.9.5 and 3.9.6-rc1. I did not notice it before
because traffic always went over the HP NC523SFP qlcnic card.

Searching for a solution to the problem, I found a newer ixgbe driver,
3.15.1 (3.9.6-rc1 has 3.11.33-k), and tried that. But it has the same
problem. However, when I load the module as follows:

modprobe ixgbe RSS=8,8

the problem goes away. The kernel.org ixgbe driver does not offer this
option. Why? It seems that both drivers have problems on systems with
24 CPUs. But I cannot believe that I am the only one who has noticed this,
since ixgbe is widely used.

It would really be nice if one could set the RSS=8,8 option for the kernel.org
ixgbe driver too. Or could someone tell me where I can force the driver's
Receive Side Scaling to 8, even if it means editing the source code?
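
(For reference, to make the RSS=8,8 workaround with the out-of-tree 3.15.1
driver persistent across reboots, the option can go into a modprobe
configuration file; a minimal sketch, with the file name only as an example:)

    # /etc/modprobe.d/ixgbe.conf -- file name is an example
    # limit the out-of-tree ixgbe driver to 8 RSS queues per port
    options ixgbe RSS=8,8

    # then reload the driver so the option takes effect
    rmmod ixgbe
    modprobe ixgbe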

Below I have added some additional information. Please CC me since I
am not subscribed to any of these lists. And please do not hesitate
to ask if more information is needed.

Many thanks in advance.

Regards,
Holger


Loading ixgbe module 3.15.1 without any options:

2013-06-14T10:01:15.001506+00:00 helena kernel: [74474.075411] Intel(R) 10 Gigabit PCI Express Network Driver - version 3.15.1
2013-06-14T10:01:15.033866+00:00 helena kernel: [74474.116422] Copyright (c) 1999-2013 Intel Corporation.
2013-06-14T10:01:15.204956+00:00 helena kernel: [74474.319440] ixgbe 0000:10:00.0: (PCI Express:5.0GT/s:Width x4) 90:e2:ba:2b:40:80
2013-06-14T10:01:15.317447+00:00 helena kernel: [74474.362568] ixgbe 0000:10:00.0 eth6: MAC: 2, PHY: 15, SFP+: 5, PBA No: E68785-006
2013-06-14T10:01:15.317465+00:00 helena kernel: [74474.394068] bonding: bond0: Adding slave eth6.
2013-06-14T10:01:15.317468+00:00 helena kernel: [74474.431805] ixgbe 0000:10:00.0 eth6: Enabled Features: RxQ: 24 TxQ: 24 FdirHash RSC
2013-06-14T10:01:15.519117+00:00 helena kernel: [74474.599206] 8021q: adding VLAN 0 to HW filter on device eth6
2013-06-14T10:01:15.592853+00:00 helena kernel: [74474.633370] bonding: bond0: enslaving eth6 as a backup interface with a down link.
2013-06-14T10:01:15.592864+00:00 helena kernel: [74474.666823] ixgbe 0000:10:00.0 eth6: detected SFP+: 5
2013-06-14T10:01:15.634509+00:00 helena kernel: [74474.707900] ixgbe 0000:10:00.0 eth6: Intel(R) 10 Gigabit Network Connection
2013-06-14T10:01:15.888030+00:00 helena kernel: [74474.917771] ixgbe 0000:10:00.1: (PCI Express:5.0GT/s:Width x4) 90:e2:ba:2b:40:81
2013-06-14T10:01:15.888032+00:00 helena kernel: [74474.918516] ixgbe 0000:10:00.0 eth6: NIC Link is Up 10 Gbps, Flow Control: RX/TX
2013-06-14T10:01:15.981283+00:00 helena kernel: [74475.001538] ixgbe 0000:10:00.1 eth7: MAC: 2, PHY: 15, SFP+: 6, PBA No: E68785-006
2013-06-14T10:01:15.981293+00:00 helena kernel: [74475.006351] bonding: bond0: link status definitely up for interface eth6, 10000 Mbps full duplex.
2013-06-14T10:01:16.025063+00:00 helena kernel: [74475.094633] ixgbe 0000:10:00.1 eth7: Enabled Features: RxQ: 24 TxQ: 24 FdirHash RSC
2013-06-14T10:01:16.067357+00:00 helena kernel: [74475.138402] ixgbe 0000:10:00.1 eth7: Intel(R) 10 Gigabit Network Connection


Loading ixgbe module 3.15.1 with RSS=8,8:

2013-06-14T10:04:24.790464+00:00 helena kernel: [74663.558702] Intel(R) 10 Gigabit PCI Express Network Driver - version 3.15.1
2013-06-14T10:04:24.790484+00:00 helena kernel: [74663.601435] Copyright (c) 1999-2013 Intel Corporation.
2013-06-14T10:04:24.853174+00:00 helena kernel: [74663.630652] ixgbe: Receive-Side Scaling (RSS) set to 8
2013-06-14T10:04:25.043310+00:00 helena kernel: [74663.813984] ixgbe 0000:10:00.0: (PCI Express:5.0GT/s:Width x4) 90:e2:ba:2b:40:80
2013-06-14T10:04:25.113547+00:00 helena kernel: [74663.853937] ixgbe 0000:10:00.0 eth6: MAC: 2, PHY: 15, SFP+: 5, PBA No: E68785-006
2013-06-14T10:04:25.113561+00:00 helena kernel: [74663.882910] bonding: bond0: Adding slave eth6.
2013-06-14T10:04:25.159260+00:00 helena kernel: [74663.924060] ixgbe 0000:10:00.0 eth6: Enabled Features: RxQ: 8 TxQ: 8 FdirHash RSC
2013-06-14T10:04:25.244858+00:00 helena kernel: [74664.023362] 8021q: adding VLAN 0 to HW filter on device eth6
2013-06-14T10:04:25.319005+00:00 helena kernel: [74664.055452] bonding: bond0: enslaving eth6 as a backup interface with a down link.
2013-06-14T10:04:25.319012+00:00 helena kernel: [74664.084567] ixgbe 0000:10:00.0 eth6: detected SFP+: 5
2013-06-14T10:04:25.362038+00:00 helena kernel: [74664.130774] ixgbe 0000:10:00.0 eth6: Intel(R) 10 Gigabit Network Connection
2013-06-14T10:04:25.391707+00:00 helena kernel: [74664.172815] ixgbe: Receive-Side Scaling (RSS) set to 8
2013-06-14T10:04:25.737735+00:00 helena kernel: [74664.334858] ixgbe 0000:10:00.0 eth6: NIC Link is Up 10 Gbps, Flow Control: RX/TX
2013-06-14T10:04:25.737763+00:00 helena kernel: [74664.353153] ixgbe 0000:10:00.1: (PCI Express:5.0GT/s:Width x4) 90:e2:ba:2b:40:81
2013-06-14T10:04:25.737765+00:00 helena kernel: [74664.353236] ixgbe 0000:10:00.1 eth7: MAC: 2, PHY: 15, SFP+: 6, PBA No: E68785-006
2013-06-14T10:04:25.737767+00:00 helena kernel: [74664.353237] ixgbe 0000:10:00.1 eth7: Enabled Features: RxQ: 8 TxQ: 8 FdirHash RSC
2013-06-14T10:04:25.737769+00:00 helena kernel: [74664.353263] ixgbe 0000:10:00.1 eth7: Intel(R) 10 Gigabit Network Connection
2013-06-14T10:04:25.817303+00:00 helena kernel: [74664.574233] bonding: bond0: link status definitely up for interface eth6, 10000 Mbps full duplex.


When the kernel.org ixgbe driver loads (note: compiled in, not a module):

2013-06-12T13:15:00.770116+00:00 helena kernel: [ 14.873719] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 3.11.33-k
2013-06-12T13:15:00.770117+00:00 helena kernel: [ 14.924513] ixgbe: Copyright (c) 1999-2013 Intel Corporation.
2013-06-12T13:15:00.770213+00:00 helena kernel: [ 15.086272] ixgbe 0000:10:00.0: Multiqueue Enabled: Rx Queue count = 24, Tx Queue count = 24
2013-06-12T13:15:00.770214+00:00 helena kernel: [ 15.086398] ixgbe 0000:10:00.0: (PCI Express:5.0GT/s:Width x4) 90:e2:ba:2b:40:80
2013-06-12T13:15:00.770215+00:00 helena kernel: [ 15.086481] ixgbe 0000:10:00.0: MAC: 2, PHY: 15, SFP+: 5, PBA No: E68785-006
2013-06-12T13:15:00.770216+00:00 helena kernel: [ 15.086482] ixgbe 0000:10:00.0: PCI-Express bandwidth available for this card is not sufficient for optimal performance.
2013-06-12T13:15:00.770217+00:00 helena kernel: [ 15.086483] ixgbe 0000:10:00.0: For optimal performance a x8 PCI-Express slot is required.
2013-06-12T13:15:00.770217+00:00 helena kernel: [ 15.087586] ixgbe 0000:10:00.0: Intel(R) 10 Gigabit Network Connection
2013-06-12T13:15:00.770325+00:00 helena kernel: [ 15.250040] ixgbe 0000:10:00.1: Multiqueue Enabled: Rx Queue count = 24, Tx Queue count = 24
2013-06-12T13:15:00.770326+00:00 helena kernel: [ 15.250166] ixgbe 0000:10:00.1: (PCI Express:5.0GT/s:Width x4) 90:e2:ba:2b:40:81
2013-06-12T13:15:00.770326+00:00 helena kernel: [ 15.250249] ixgbe 0000:10:00.1: MAC: 2, PHY: 15, SFP+: 6, PBA No: E68785-006
2013-06-12T13:15:00.770327+00:00 helena kernel: [ 15.250250] ixgbe 0000:10:00.1: PCI-Express bandwidth available for this card is not sufficient for optimal performance.
2013-06-12T13:15:00.770328+00:00 helena kernel: [ 15.250250] ixgbe 0000:10:00.1: For optimal performance a x8 PCI-Express slot is required.
2013-06-12T13:15:00.770329+00:00 helena kernel: [ 15.251327] ixgbe 0000:10:00.1: Intel(R) 10 Gigabit Network Connection
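
(Side note on the PCI-Express bandwidth warning above: the negotiated link
speed and width can be checked directly; a minimal sketch, using device
10:00.0 as it appears in the lspci list below:)

    # show negotiated PCIe link status for the first ixgbe port
    lspci -s 10:00.0 -vv | grep -i 'lnksta:'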


lspci:

00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 13)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 13)
00:02.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 2 (rev 13)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 13)
00:04.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 4 (rev 13)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 13)
00:06.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 6 (rev 13)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 13)
00:08.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 8 (rev 13)
00:09.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 13)
00:0a.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 10 (rev 13)
00:0d.0 Host bridge: Intel Corporation Device 343a (rev 13)
00:0d.1 Host bridge: Intel Corporation Device 343b (rev 13)
00:0d.2 Host bridge: Intel Corporation Device 343c (rev 13)
00:0d.3 Host bridge: Intel Corporation Device 343d (rev 13)
00:0d.4 Host bridge: Intel Corporation 7500/5520/5500/X58 Physical Layer Port 0 (rev 13)
00:0d.5 Host bridge: Intel Corporation 7500/5520/5500 Physical Layer Port 1 (rev 13)
00:0d.6 Host bridge: Intel Corporation Device 341a (rev 13)
00:0e.0 Host bridge: Intel Corporation Device 341c (rev 13)
00:0e.1 Host bridge: Intel Corporation Device 341d (rev 13)
00:0e.2 Host bridge: Intel Corporation Device 341e (rev 13)
00:0e.3 Host bridge: Intel Corporation Device 341f (rev 13)
00:0e.4 Host bridge: Intel Corporation Device 3439 (rev 13)
00:14.0 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub System Management Registers (rev 13)
00:14.1 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 13)
00:14.2 PIC: Intel Corporation 7500/5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 13)
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
00:1c.2 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 3
00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 5
00:1d.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.3 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1d.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIB (ICH10) LPC Interface Controller
01:03.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI ES1000 (rev 02)
02:00.0 System peripheral: Hewlett-Packard Company Integrated Lights-Out Standard Slave Instrumentation & System Support (rev 04)
02:00.2 System peripheral: Hewlett-Packard Company Integrated Lights-Out Standard Management Processor Support and Messaging (rev 04)
02:00.4 USB controller: Hewlett-Packard Company Integrated Lights-Out Standard Virtual USB Controller (rev 01)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
05:00.0 RAID bus controller: Hewlett-Packard Company Smart Array G6 controllers (rev 01)
08:00.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb)
09:04.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb)
09:05.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb)
09:06.0 PCI bridge: PLX Technology, Inc. PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev bb)
0b:00.0 Mass storage controller: Fusion-io ioDimm3 (rev 01)
0c:00.0 Mass storage controller: Fusion-io ioDimm3 (rev 01)
10:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
10:00.1 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)
13:00.0 RAID bus controller: Hewlett-Packard Company Smart Array G6 controllers (rev 01)
1a:00.0 Ethernet controller: QLogic Corp. cLOM8214 1/10GbE Controller (rev 54)
1a:00.1 Ethernet controller: QLogic Corp. cLOM8214 1/10GbE Controller (rev 54)
3e:00.0 Host bridge: Intel Corporation Xeon 5600 Series QuickPath Architecture Generic Non-core Registers (rev 02)
3e:00.1 Host bridge: Intel Corporation Xeon 5600 Series QuickPath Architecture System Address Decoder (rev 02)
3e:02.0 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 0 (rev 02)
3e:02.1 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 0 (rev 02)
3e:02.2 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link 0 (rev 02)
3e:02.3 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link 1 (rev 02)
3e:02.4 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 1 (rev 02)
3e:02.5 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 1 (rev 02)
3e:03.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Registers (rev 02)
3e:03.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Target Address Decoder (rev 02)
3e:03.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller RAS Registers (rev 02)
3e:03.4 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Test Registers (rev 02)
3e:04.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Control (rev 02)
3e:04.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Address (rev 02)
3e:04.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Rank (rev 02)
3e:04.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Thermal Control (rev 02)
3e:05.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Control (rev 02)
3e:05.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Address (rev 02)
3e:05.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Rank (rev 02)
3e:05.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Thermal Control (rev 02)
3e:06.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Control (rev 02)
3e:06.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Address (rev 02)
3e:06.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Rank (rev 02)
3e:06.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Thermal Control (rev 02)
3f:00.0 Host bridge: Intel Corporation Xeon 5600 Series QuickPath Architecture Generic Non-core Registers (rev 02)
3f:00.1 Host bridge: Intel Corporation Xeon 5600 Series QuickPath Architecture System Address Decoder (rev 02)
3f:02.0 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 0 (rev 02)
3f:02.1 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 0 (rev 02)
3f:02.2 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link 0 (rev 02)
3f:02.3 Host bridge: Intel Corporation Xeon 5600 Series Mirror Port Link 1 (rev 02)
3f:02.4 Host bridge: Intel Corporation Xeon 5600 Series QPI Link 1 (rev 02)
3f:02.5 Host bridge: Intel Corporation Xeon 5600 Series QPI Physical 1 (rev 02)
3f:03.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Registers (rev 02)
3f:03.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Target Address Decoder (rev 02)
3f:03.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller RAS Registers (rev 02)
3f:03.4 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Test Registers (rev 02)
3f:04.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Control (rev 02)
3f:04.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Address (rev 02)
3f:04.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Rank (rev 02)
3f:04.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 0 Thermal Control (rev 02)
3f:05.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Control (rev 02)
3f:05.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Address (rev 02)
3f:05.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Rank (rev 02)
3f:05.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 1 Thermal Control (rev 02)
3f:06.0 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Control (rev 02)
3f:06.1 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Address (rev 02)
3f:06.2 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Rank (rev 02)
3f:06.3 Host bridge: Intel Corporation Xeon 5600 Series Integrated Memory Controller Channel 2 Thermal Control (rev 02)


lscpu:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 44
Stepping: 2
CPU MHz: 1600.000
BogoMIPS: 6931.29
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23


2013-06-14 16:08:09

by Tantilov, Emil S

Subject: RE: Problems with ixgbe driver

>-----Original Message-----
>From: [email protected] [mailto:[email protected]] On
>Behalf Of Holger Kiehl
>Sent: Friday, June 14, 2013 4:50 AM
>To: [email protected]
>Cc: linux-kernel; [email protected]
>Subject: Problems with ixgbe driver
>
>Hello,
>
>I have a dual-port 10Gb Intel network card in a 2-socket (Xeon X5690) system
>with a total of 12 cores. Hyperthreading is enabled, so there are 24 logical CPUs.
>The problem I have is that when other systems send large amounts of data,
>the network with the Intel ixgbe driver gets very slow. Ping times go up
>from 0.2 ms to approx. 60 ms. Some FTP connections stall for more than 2
>minutes. What is strange is that Heartbeat is configured on the system
>with a serial connection to another node, and the kernel always reports

If the network slows down that much, there should be some indication in dmesg, like Tx hangs perhaps.
Can you provide the output of dmesg and ethtool -S from the offending interface after the issue occurs?

>
> ttyS0: 4 input overrun(s)
>
>when a lot of data is sent and the ping time goes up.
>
>On the network there are three VLANs configured. The network is bonded
>(active-backup) together with another HP NC523SFP 10Gb 2-port Server
>Adapter. When I switch the network to this card the problem goes away.
>Also the ttyS0 input overruns disappear. Note also both network cards
>are connected to the same switch.
>
>The system uses Scientific Linux 6.4 with kernel.org kernel. I noticed
>this behavior with kernel 3.9.5 and 3.9.6-rc1. Before I did not notice
>it because traffic always went over the HP NC523SFP qlcnic card.
>
>In search for a solution to the problem I found a newer ixgbe driver
>3.15.1 (3.9.6-rc1 has 3.11.33-k) and tried that. But it has the same
>problem. However when I load the module as follows:
>
> modprobe ixgbe RSS=8,8
>
>the problem goes away. The kernel.org ixgbe driver does not offer this
>option. Why? It seems that both drivers have problems on systems with

If you are using a newer kernel and ethtool version, you can use `ethtool -L ethX combined Y` to control the number of queues per interface.
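
For example, a minimal sketch assuming the eth6 interface and the 8-queue count from your report:

    # show the current channel (queue) configuration
    ethtool -l eth6
    # limit the interface to 8 combined Rx/Tx queue pairs
    ethtool -L eth6 combined 8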

>24 CPUs. But I cannot believe that I am the only one who has noticed this,
>since ixgbe is widely used.

We run traffic with multiple queues all the time and I don't think what you are reporting is a generic issue. Most likely it's something related to your setup/system.

>
>It would really be nice if one could set the RSS=8,8 option for kernel.org
>ixgbe driver too. Or if someone could tell me where I can force the driver
>to Receive Side Scaling to 8 even if it means editing the source code.
>
>Below I have added some additional information. Please CC me since I
>am not subscribed to any of these lists. And please do not hesitate
>to ask if more information is needed.

I would suggest that you open up a bug at e1000.sf.net - describe your configuration and attach the relevant info (dmesg, ethtool -S, lspci etc). This would make it easier for us to follow.

Thanks,
Emil

2013-06-17 09:11:50

by Holger Kiehl

Subject: RE: Problems with ixgbe driver

Hello,

first, thank you for the quick help!

On Fri, 14 Jun 2013, Tantilov, Emil S wrote:

>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On
>> Behalf Of Holger Kiehl
>> Sent: Friday, June 14, 2013 4:50 AM
>> To: [email protected]
>> Cc: linux-kernel; [email protected]
>> Subject: Problems with ixgbe driver
>>
>> Hello,
>>
>> I have a dual-port 10Gb Intel network card in a 2-socket (Xeon X5690) system
>> with a total of 12 cores. Hyperthreading is enabled, so there are 24 logical CPUs.
>> The problem I have is that when other systems send large amounts of data,
>> the network with the Intel ixgbe driver gets very slow. Ping times go up
>> from 0.2 ms to approx. 60 ms. Some FTP connections stall for more than 2
>> minutes. What is strange is that Heartbeat is configured on the system
>> with a serial connection to another node, and the kernel always reports
>
> If the network slows down so much there should be some indication in dmesg. Like Tx hangs perhaps.
> Can you provide the output of dmesg and ethtool -S from the offending interface after the issue occurs?
>
No, there is absolutely no indication in dmesg or /var/log/messages. But here
is the ethtool output when the ping times go up:

root@helena:~# ethtool -S eth6
NIC statistics:
rx_packets: 4410779
tx_packets: 8902514
rx_bytes: 2014041824
tx_bytes: 13199913202
rx_errors: 0
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
multicast: 4245
collisions: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 0
rx_missed_errors: 28143
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
rx_pkts_nic: 2401276937
tx_pkts_nic: 3868619482
rx_bytes_nic: 868282794731
tx_bytes_nic: 5743382228649
lsc_int: 4
tx_busy: 0
non_eop_descs: 743957
broadcast: 1745556
rx_no_buffer_count: 0
tx_timeout_count: 0
tx_restart_queue: 425
rx_long_length_errors: 0
rx_short_length_errors: 0
tx_flow_control_xon: 171
rx_flow_control_xon: 0
tx_flow_control_xoff: 277
rx_flow_control_xoff: 0
rx_csum_offload_errors: 0
alloc_rx_page_failed: 0
alloc_rx_buff_failed: 0
lro_aggregated: 0
lro_flushed: 0
rx_no_dma_resources: 0
hw_rsc_aggregated: 1153374
hw_rsc_flushed: 129169
fdir_match: 2424508153
fdir_miss: 1706029
fdir_overflow: 33
os2bmc_rx_by_bmc: 0
os2bmc_tx_by_bmc: 0
os2bmc_tx_by_host: 0
os2bmc_rx_by_host: 0
tx_queue_0_packets: 470182
tx_queue_0_bytes: 690123121
tx_queue_1_packets: 797784
tx_queue_1_bytes: 1203968369
tx_queue_2_packets: 648692
tx_queue_2_bytes: 950171718
tx_queue_3_packets: 647434
tx_queue_3_bytes: 948647518
tx_queue_4_packets: 263216
tx_queue_4_bytes: 394806409
tx_queue_5_packets: 426786
tx_queue_5_bytes: 629387628
tx_queue_6_packets: 253708
tx_queue_6_bytes: 371774276
tx_queue_7_packets: 544634
tx_queue_7_bytes: 812223169
tx_queue_8_packets: 279056
tx_queue_8_bytes: 407792510
tx_queue_9_packets: 735792
tx_queue_9_bytes: 1092693961
tx_queue_10_packets: 393576
tx_queue_10_bytes: 583283986
tx_queue_11_packets: 712565
tx_queue_11_bytes: 1037740789
tx_queue_12_packets: 264445
tx_queue_12_bytes: 386010613
tx_queue_13_packets: 246828
tx_queue_13_bytes: 370387352
tx_queue_14_packets: 191789
tx_queue_14_bytes: 281160607
tx_queue_15_packets: 384581
tx_queue_15_bytes: 579890782
tx_queue_16_packets: 175119
tx_queue_16_bytes: 261312970
tx_queue_17_packets: 151219
tx_queue_17_bytes: 220259675
tx_queue_18_packets: 467746
tx_queue_18_bytes: 707472612
tx_queue_19_packets: 30642
tx_queue_19_bytes: 44896997
tx_queue_20_packets: 157957
tx_queue_20_bytes: 238772784
tx_queue_21_packets: 287819
tx_queue_21_bytes: 434965075
tx_queue_22_packets: 269298
tx_queue_22_bytes: 407637986
tx_queue_23_packets: 102344
tx_queue_23_bytes: 145542751
rx_queue_0_packets: 219438
rx_queue_0_bytes: 273936020
rx_queue_1_packets: 398269
rx_queue_1_bytes: 52080243
rx_queue_2_packets: 285870
rx_queue_2_bytes: 102299543
rx_queue_3_packets: 347238
rx_queue_3_bytes: 145830086
rx_queue_4_packets: 118448
rx_queue_4_bytes: 17515218
rx_queue_5_packets: 228029
rx_queue_5_bytes: 114142681
rx_queue_6_packets: 94285
rx_queue_6_bytes: 107618165
rx_queue_7_packets: 289615
rx_queue_7_bytes: 168428647
rx_queue_8_packets: 109288
rx_queue_8_bytes: 35178080
rx_queue_9_packets: 393061
rx_queue_9_bytes: 377122152
rx_queue_10_packets: 155004
rx_queue_10_bytes: 66560302
rx_queue_11_packets: 381580
rx_queue_11_bytes: 182550920
rx_queue_12_packets: 140681
rx_queue_12_bytes: 44514373
rx_queue_13_packets: 127091
rx_queue_13_bytes: 18524907
rx_queue_14_packets: 92548
rx_queue_14_bytes: 34725166
rx_queue_15_packets: 199612
rx_queue_15_bytes: 66689821
rx_queue_16_packets: 90018
rx_queue_16_bytes: 29206483
rx_queue_17_packets: 81277
rx_queue_17_bytes: 55206035
rx_queue_18_packets: 224446
rx_queue_18_bytes: 14869858
rx_queue_19_packets: 16975
rx_queue_19_bytes: 48400959
rx_queue_20_packets: 80806
rx_queue_20_bytes: 5398100
rx_queue_21_packets: 146815
rx_queue_21_bytes: 9796087
rx_queue_22_packets: 136018
rx_queue_22_bytes: 9023369
rx_queue_23_packets: 54781
rx_queue_23_bytes: 34724433

This was with the 3.15.1 driver and the combined queue count set to 24 via
ethtool, as you suggested below.

>>
>> ttyS0: 4 input overrun(s)
>>
>> when a lot of data is sent and the ping time goes up.
>>
>> On the network there are three VLANs configured. The network is bonded
>> (active-backup) together with another HP NC523SFP 10Gb 2-port Server
>> Adapter. When I switch the network to this card the problem goes away.
>> Also the ttyS0 input overruns disappear. Note also both network cards
>> are connected to the same switch.
>>
>> The system uses Scientific Linux 6.4 with kernel.org kernel. I noticed
>> this behavior with kernel 3.9.5 and 3.9.6-rc1. Before I did not notice
>> it because traffic always went over the HP NC523SFP qlcnic card.
>>
>> In search for a solution to the problem I found a newer ixgbe driver
>> 3.15.1 (3.9.6-rc1 has 3.11.33-k) and tried that. But it has the same
>> problem. However when I load the module as follows:
>>
>> modprobe ixgbe RSS=8,8
>>
>> the problem goes away. The kernel.org ixgbe driver does not offer this
>> option. Why? It seems that both drivers have problems on systems with
>
> If you are using newer kernel and ethtool version you can use `ethtool -L ethX combined Y` to control the number of queues per interface.
>
Okay, thank you! I did not know this.

>> 24 CPUs. But I cannot believe that I am the only one who has noticed this,
>> since ixgbe is widely used.
>
> We run traffic with multiple queues all the time and I don't think what you are reporting is a generic issue. Most likely it's something related to your setup/system.
>
Yes, I think so too. But what could it be? Please just ask what other
information I can provide. As I already mentioned earlier, the ixgbe card
is bonded with a QLogic NIC and I have two (not three) VLANs configured
over this bond. Maybe the following is useful (eth6 is the ixgbe interface):

root@helena:~# ethtool -k eth6
Features for eth6:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: on
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]

>>
>> It would really be nice if one could set the RSS=8,8 option for kernel.org
>> ixgbe driver too. Or if someone could tell me where I can force the driver
>> to Receive Side Scaling to 8 even if it means editing the source code.
>>
>> Below I have added some additional information. Please CC me since I
>> am not subscribed to any of these lists. And please do not hesitate
>> to ask if more information is needed.
>
> I would suggest that you open up a bug at e1000.sf.net - describe your configuration and attach the relevant info (dmesg, ethtool -S, lspci etc). This would make it easier for us to follow.
>
Sorry, but I could not find out how to open a new bug; I could only view
existing bugs. Please give me a hint on what I need to do.

Thanks,
Holger

2013-06-24 14:38:40

by Tantilov, Emil S

Subject: RE: Problems with ixgbe driver

>-----Original Message-----
>From: Holger Kiehl [mailto:[email protected]]
>Sent: Monday, June 17, 2013 2:12 AM
>To: Tantilov, Emil S
>Cc: [email protected]; linux-kernel; [email protected]
>Subject: RE: Problems with ixgbe driver
>
>Hello,
>
>first, thank you for the quick help!
>
>On Fri, 14 Jun 2013, Tantilov, Emil S wrote:
>
>>> -----Original Message-----
>>> From: [email protected] [mailto:[email protected]]
>On
>>> Behalf Of Holger Kiehl
>>> Sent: Friday, June 14, 2013 4:50 AM
>>> To: [email protected]
>>> Cc: linux-kernel; [email protected]
>>> Subject: Problems with ixgbe driver
>>>
>>> Hello,
>>>
>>> I have a dual-port 10Gb Intel network card in a 2-socket (Xeon X5690) system
>>> with a total of 12 cores. Hyperthreading is enabled, so there are 24 logical CPUs.
>>> The problem I have is that when other systems send large amounts of data,
>>> the network with the Intel ixgbe driver gets very slow. Ping times go up
>>> from 0.2 ms to approx. 60 ms. Some FTP connections stall for more than 2
>>> minutes. What is strange is that Heartbeat is configured on the system
>>> with a serial connection to another node, and the kernel always reports
>>
>> If the network slows down so much there should be some indication in
>dmesg. Like Tx hangs perhaps.
>> Can you provide the output of dmesg and ethtool -S from the offending
>interface after the issue occurs?
>>
>No, there is absolutely no indication in dmesg or /var/log/messages. But here
>is the ethtool output when the ping times go up:
>
> root@helena:~# ethtool -S eth6
> NIC statistics:
> rx_packets: 4410779
> tx_packets: 8902514
> rx_bytes: 2014041824
> tx_bytes: 13199913202
> rx_errors: 0
> tx_errors: 0
> rx_dropped: 0
> tx_dropped: 0
> multicast: 4245
> collisions: 0
> rx_over_errors: 0
> rx_crc_errors: 0
> rx_frame_errors: 0
> rx_fifo_errors: 0
> rx_missed_errors: 28143
> tx_aborted_errors: 0
> tx_carrier_errors: 0
> tx_fifo_errors: 0
> tx_heartbeat_errors: 0
> rx_pkts_nic: 2401276937
> tx_pkts_nic: 3868619482
> rx_bytes_nic: 868282794731
> tx_bytes_nic: 5743382228649
> lsc_int: 4
> tx_busy: 0
> non_eop_descs: 743957
> broadcast: 1745556
> rx_no_buffer_count: 0
> tx_timeout_count: 0
> tx_restart_queue: 425
> rx_long_length_errors: 0
> rx_short_length_errors: 0
> tx_flow_control_xon: 171
> rx_flow_control_xon: 0
> tx_flow_control_xoff: 277
> rx_flow_control_xoff: 0
> rx_csum_offload_errors: 0
> alloc_rx_page_failed: 0
> alloc_rx_buff_failed: 0
> lro_aggregated: 0
> lro_flushed: 0
> rx_no_dma_resources: 0
> hw_rsc_aggregated: 1153374
> hw_rsc_flushed: 129169
> fdir_match: 2424508153
> fdir_miss: 1706029
> fdir_overflow: 33
> os2bmc_rx_by_bmc: 0
> os2bmc_tx_by_bmc: 0
> os2bmc_tx_by_host: 0
> os2bmc_rx_by_host: 0
> tx_queue_0_packets: 470182
> tx_queue_0_bytes: 690123121
> tx_queue_1_packets: 797784
> tx_queue_1_bytes: 1203968369
> tx_queue_2_packets: 648692
> tx_queue_2_bytes: 950171718
> tx_queue_3_packets: 647434
> tx_queue_3_bytes: 948647518
> tx_queue_4_packets: 263216
> tx_queue_4_bytes: 394806409
> tx_queue_5_packets: 426786
> tx_queue_5_bytes: 629387628
> tx_queue_6_packets: 253708
> tx_queue_6_bytes: 371774276
> tx_queue_7_packets: 544634
> tx_queue_7_bytes: 812223169
> tx_queue_8_packets: 279056
> tx_queue_8_bytes: 407792510
> tx_queue_9_packets: 735792
> tx_queue_9_bytes: 1092693961
> tx_queue_10_packets: 393576
> tx_queue_10_bytes: 583283986
> tx_queue_11_packets: 712565
> tx_queue_11_bytes: 1037740789
> tx_queue_12_packets: 264445
> tx_queue_12_bytes: 386010613
> tx_queue_13_packets: 246828
> tx_queue_13_bytes: 370387352
> tx_queue_14_packets: 191789
> tx_queue_14_bytes: 281160607
> tx_queue_15_packets: 384581
> tx_queue_15_bytes: 579890782
> tx_queue_16_packets: 175119
> tx_queue_16_bytes: 261312970
> tx_queue_17_packets: 151219
> tx_queue_17_bytes: 220259675
> tx_queue_18_packets: 467746
> tx_queue_18_bytes: 707472612
> tx_queue_19_packets: 30642
> tx_queue_19_bytes: 44896997
> tx_queue_20_packets: 157957
> tx_queue_20_bytes: 238772784
> tx_queue_21_packets: 287819
> tx_queue_21_bytes: 434965075
> tx_queue_22_packets: 269298
> tx_queue_22_bytes: 407637986
> tx_queue_23_packets: 102344
> tx_queue_23_bytes: 145542751
> rx_queue_0_packets: 219438
> rx_queue_0_bytes: 273936020
> rx_queue_1_packets: 398269
> rx_queue_1_bytes: 52080243
> rx_queue_2_packets: 285870
> rx_queue_2_bytes: 102299543
> rx_queue_3_packets: 347238
> rx_queue_3_bytes: 145830086
> rx_queue_4_packets: 118448
> rx_queue_4_bytes: 17515218
> rx_queue_5_packets: 228029
> rx_queue_5_bytes: 114142681
> rx_queue_6_packets: 94285
> rx_queue_6_bytes: 107618165
> rx_queue_7_packets: 289615
> rx_queue_7_bytes: 168428647
> rx_queue_8_packets: 109288
> rx_queue_8_bytes: 35178080
> rx_queue_9_packets: 393061
> rx_queue_9_bytes: 377122152
> rx_queue_10_packets: 155004
> rx_queue_10_bytes: 66560302
> rx_queue_11_packets: 381580
> rx_queue_11_bytes: 182550920
> rx_queue_12_packets: 140681
> rx_queue_12_bytes: 44514373
> rx_queue_13_packets: 127091
> rx_queue_13_bytes: 18524907
> rx_queue_14_packets: 92548
> rx_queue_14_bytes: 34725166
> rx_queue_15_packets: 199612
> rx_queue_15_bytes: 66689821
> rx_queue_16_packets: 90018
> rx_queue_16_bytes: 29206483
> rx_queue_17_packets: 81277
> rx_queue_17_bytes: 55206035
> rx_queue_18_packets: 224446
> rx_queue_18_bytes: 14869858
> rx_queue_19_packets: 16975
> rx_queue_19_bytes: 48400959
> rx_queue_20_packets: 80806
> rx_queue_20_bytes: 5398100
> rx_queue_21_packets: 146815
> rx_queue_21_bytes: 9796087
> rx_queue_22_packets: 136018
> rx_queue_22_bytes: 9023369
> rx_queue_23_packets: 54781
> rx_queue_23_bytes: 34724433
>
>This was with the 3.15.1 driver and the combined queue count set to 24 via
>ethtool, as you suggested below.

Sorry for the late reply.

There are two counters that could be related to this:

rx_missed_errors and fdir_overflow. Since you see better results by lowering the number of queues, I'm guessing it is most likely due to the Flow Director running out of filters. If you can easily reproduce this, run watch -d -n1 "ethtool -S ethX" and see if you can catch either of these counters incrementing.
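
A minimal sketch of that check, assuming eth6 is the affected interface and watching only the two counters in question:

    # highlight changes in the two suspect counters once per second
    watch -d -n1 'ethtool -S eth6 | grep -E "rx_missed_errors|fdir_overflow"'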

You need an account at SourceForge in order to submit a ticket.

Thanks,
Emil