2004-11-11 23:58:25

by Radheka Godse

[permalink] [raw]
Subject: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)

Please ignore previous submittal, it has reversed tags ('-'s instead of
'+'s; I had swapped the order for diff of modified and un-modified
src trees)

The patch below ADDS Zero Copy Transmit Support to bonding device.

We saw around ~50% system utilization improvement with tcp sendfile()
function enabled netperf test in 802.3ad mode.

Note that this patch was generated for 2.6.9 kernel after applying
bond-patch-2.6.1(also attached to this thread) that was submitted last week and
got accepted into the kernel.

Signed-off-by: Radheka Godse <[email protected]>

diff -uprN -X dontdiff linux-2.6.9_bond-patch-2.6.1/drivers/net/bonding/bonding.h linux-2.6.9/drivers/net/bonding/bonding.h
--- linux-2.6.9_bond-patch-2.6.1/drivers/net/bonding/bonding.h 2004-11-10 15:42:55.000000000 -0800
+++ linux-2.6.9/drivers/net/bonding/bonding.h 2004-11-11 13:05:54.000000000 -0800
@@ -36,8 +36,8 @@
#include "bond_3ad.h"
#include "bond_alb.h"

-#define DRV_VERSION "2.6.1"
-#define DRV_RELDATE "October 29, 2004"
+#define DRV_VERSION "2.6.2"
+#define DRV_RELDATE "November 09, 2004"
#define DRV_NAME "bonding"
#define DRV_DESCRIPTION "Ethernet Channel Bonding Driver"

diff -uprN -X dontdiff linux-2.6.9_bond-patch-2.6.1/drivers/net/bonding/bond_main.c linux-2.6.9/drivers/net/bonding/bond_main.c
--- linux-2.6.9_bond-patch-2.6.1/drivers/net/bonding/bond_main.c 2004-11-10 15:42:55.000000000 -0800
+++ linux-2.6.9/drivers/net/bonding/bond_main.c 2004-11-11 13:05:54.000000000 -0800
@@ -475,6 +475,9 @@
* Solution is to move call to dev_remove_pack outside of the
* spinlock.
* Set version to 2.6.1.
+ * 2004/11/09 - Radheka Godse <radheka.godse at intel dot com>
+ * - Added Zero Copy Transmit Support by setting appropriate flags.
+ * Set version to 2.6.2.
*
*/

@@ -4318,7 +4321,22 @@ static int __init bond_init(struct net_d
bond_dev->features |= (NETIF_F_HW_VLAN_TX |
NETIF_F_HW_VLAN_RX |
NETIF_F_HW_VLAN_FILTER);
-
+
+ /* We let the bond device publish all hardware
+ * acceleration features possible. This is OK,
+ * since if an skb is passed from the bond to
+ * a slave that doesn't support one of those
+ * features, everything is fixed in the
+ * dev_queue_xmit() function (e.g. calculate
+ * check sum, linearize the skb, etc.).
+ */
+ bond_dev->features |= (NETIF_F_SG |
+ NETIF_F_IP_CSUM |
+ NETIF_F_NO_CSUM |
+ NETIF_F_HW_CSUM |
+ NETIF_F_HIGHDMA |
+ NETIF_F_FRAGLIST);
+
#ifdef CONFIG_PROC_FS
bond_create_proc_entry(bond);
#endif


2004-11-12 00:16:34

by David Miller

[permalink] [raw]
Subject: Re: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)

On Fri, 12 Nov 2004 15:51:49 -0800 (PST)
Radheka Godse <[email protected]> wrote:

> -
> +
> + /* We let the bond device publish all hardware
> + * acceleration features possible. This is OK,
> + * since if an skb is passed from the bond to
> + * a slave that doesn't support one of those
> + * features, everything is fixed in the
> + * dev_queue_xmit() function (e.g. calculate
> + * check sum, linearize the skb, etc.).
> + */

This is very inefficient if the bond slaves don't
support these features.

I believe you when you say you saw improvement in the
case where the slaves do support TSO, but if you test
a non-TSO slave case I bet you'll see a marked
decrease in system utilization at least.

The upper layers need to know the precise capabilities
of the device in order to optimize the copy from userspace,
the checksumming, and the data gathering into the SKB.

Therefore, if you "fake it out" like this without checking
what the slaves actually support, then a lot of wasted cpu
time will be spent in each dev_queue_xmit() path. There will
in many cases be multiple passes over the data instead of one,
and it is possible to introduce an extra data copy as well.

I would recommend instead the following algorithm. Publish
only the capabilities which all slaves support.

2004-11-12 21:36:38

by Radheka Godse

[permalink] [raw]
Subject: RE: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)

I had similar thoughts but then, the bond device does not have any
slaves attached to it at load time. By publishing them upfront the bond
device is able to take advantage of hardware acceleration if it is later
available... I will get test data for non-TSO test scenarios to see if
there is significant degradation in performance.

-radheka


-----Original Message-----
From: David S. Miller [mailto:[email protected]]
Sent: Thursday, November 11, 2004 4:00 PM
To: Godse, Radheka
Cc: [email protected]; [email protected];
[email protected]; [email protected]
Subject: Re: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)

On Fri, 12 Nov 2004 15:51:49 -0800 (PST)
Radheka Godse <[email protected]> wrote:

> -
> +
> + /* We let the bond device publish all hardware
> + * acceleration features possible. This is OK,
> + * since if an skb is passed from the bond to
> + * a slave that doesn't support one of those
> + * features, everything is fixed in the
> + * dev_queue_xmit() function (e.g. calculate
> + * check sum, linearize the skb, etc.).
> + */

This is very inefficient if the bond slaves don't
support these features.

I believe you when you say you saw improvement in the
case where the slaves do support TSO, but if you test
a non-TSO slave case I bet you'll see a marked
decrease in system utilization at least.

The upper layers need to know the precise capabilities
of the device in order to optimize the copy from userspace,
the checksumming, and the data gathering into the SKB.

Therefore, if you "fake it out" like this without checking
what the slaves actually support, then a lot of wasted cpu
time will be spent in each dev_queue_xmit() path. There will
in many cases be multiple passes over the data instead of one,
and it is possible to introduce an extra data copy as well.

I would recommend instead the following algorithm. Publish
only the capabilities which all slaves support.

2004-11-12 21:40:37

by David Miller

[permalink] [raw]
Subject: Re: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)

On Fri, 12 Nov 2004 13:31:53 -0800
"Godse, Radheka" <[email protected]> wrote:

> I had similar thoughts but then, the bond device does not have any
> slaves attached to it at load time. By publishing them upfront the bond
> device is able to take advantage of hardware acceleration if it is later
> available...

This is not a problem at all.

You can make ethtool calls on the bonding device to change
settings later. Nothing prevents you from changing the
SG/CSUM/TSO bits after device registration.

2004-11-12 22:00:52

by Jay Vosburgh

[permalink] [raw]
Subject: Re: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)


David S. Miller <[email protected]> wrote:
>> I had similar thoughts but then, the bond device does not have any
>> slaves attached to it at load time. By publishing them upfront the bond
>> device is able to take advantage of hardware acceleration if it is later
>> available...

"Shlomi Yaakobovich" <[email protected]> posted a patch to
update the features as slaves are added and removed, based on the
features advertised by the slaves. His original patch wasn't properly
based; this is the same change set redone to patch against 2.6.9.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [email protected]


Index: linux-2681-bd/drivers/net/bonding/bond_main.c
===================================================================
RCS file: /sda7/CVS/linux-2681-bd/drivers/net/bonding/bond_main.c,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -u -r1.2 -r1.3
--- linux-2681-bd/drivers/net/bonding/bond_main.c 15 Oct 2004 22:11:14 -0000 1.2
+++ linux-2681-bd/drivers/net/bonding/bond_main.c 15 Oct 2004 23:27:34 -0000 1.3
@@ -1580,6 +1580,32 @@
return 0;
}

+enum { /* don't set these here */
+ BOND_MAINTAINED_FEATURES =
+ NETIF_F_VLAN_CHALLENGED|NETIF_F_HW_VLAN_TX |
+ NETIF_F_HW_VLAN_RX |NETIF_F_HW_VLAN_FILTER
+};
+
+static void bond_set_features(struct net_device *bond_dev)
+{
+ struct bonding *bond = bond_dev->priv;
+ struct slave *slave;
+ int i, features = 0;
+
+ bond_for_each_slave(bond, slave, i) {
+ struct net_device *slave_dev = slave->dev;
+ if (i == 0) {
+ features = slave_dev->features;
+ } else {
+ features &= slave_dev->features;
+ }
+ }
+ /* clear all previous features */
+ bond_dev->features &= BOND_MAINTAINED_FEATURES;
+ /* add common features, don't touch driver maintained ones */
+ bond_dev->features |= (features & ~BOND_MAINTAINED_FEATURES);
+}
+
/* enslave device <slave> to bond device <master> */
static int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
{
@@ -1972,6 +1998,7 @@
new_slave->link != BOND_LINK_DOWN ? "n up" : " down");

/* enslave is successful */
+ bond_set_features(bond_dev);
return 0;

/* Undo stages on error */
@@ -2177,6 +2204,7 @@

kfree(slave);

+ bond_set_features(bond_dev);
return 0; /* deletion OK */
}

@@ -2292,6 +2320,7 @@
printk(KERN_INFO DRV_NAME
": %s: released all slaves\n",
bond_dev->name);
+ bond_set_features(bond_dev);

out:
write_unlock_bh(&bond->lock);

2004-11-12 22:04:58

by David Miller

[permalink] [raw]
Subject: Re: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)

On Fri, 12 Nov 2004 13:56:26 -0800
Jay Vosburgh <[email protected]> wrote:

>
> David S. Miller <[email protected]> wrote:
> >> I had similar thoughts but then, the bond device does not have any
> >> slaves attached to it at load time. By publishing them upfront the bond
> >> device is able to take advantage of hardware acceleration if it is later
> >> available...
>
> "Shlomi Yaakobovich" <[email protected]> posted a patch to
> update the features as slaves are added and removed, based on the
> features advertised by the slaves. His original patch wasn't properly
> based; this is the same change set redone to patch against 2.6.9.

That's definitely a good start.

It does need to be fixed to enforce the usual rules about
illegal combinations. And his code is going to include
all sorts of weird things like VLAN offload which I wonder
if works correctly with the current bonding driver? :)

The two rules are codified in register_netdevice() as follows:

/* Fix illegal SG+CSUM combinations. */
if ((dev->features & NETIF_F_SG) &&
!(dev->features & (NETIF_F_IP_CSUM |
NETIF_F_NO_CSUM |
NETIF_F_HW_CSUM))) {
printk("%s: Dropping NETIF_F_SG since no checksum feature.\n",
dev->name);
dev->features &= ~NETIF_F_SG;
}

/* TSO requires that SG is present as well. */
if ((dev->features & NETIF_F_TSO) &&
!(dev->features & NETIF_F_SG)) {
printk("%s: Dropping NETIF_F_TSO since no SG feature.\n",
dev->name);
dev->features &= ~NETIF_F_TSO;
}

2004-11-12 22:24:54

by Jay Vosburgh

[permalink] [raw]
Subject: Re: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)

David S. Miller <[email protected]> wrote:
>That's definitely a good start.
>
>It does need to be fixed to enforce the usual rules about
>illegal combinations. And his code is going to include
>all sorts of weird things like VLAN offload which I wonder
>if works correctly with the current bonding driver? :)

The existing code should handle VLANs correctly, and the patch
excludes the VLAN related bits from the dev->features update.

>The two rules are codified in register_netdevice() as follows:
[...]

Would it be preferrable to duplicate that logic in bonding, or
push it out to an inline or some such?

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [email protected]

2004-11-12 22:40:54

by David Miller

[permalink] [raw]
Subject: Re: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)

On Fri, 12 Nov 2004 14:20:51 -0800
Jay Vosburgh <[email protected]> wrote:

> Would it be preferrable to duplicate that logic in bonding, or
> push it out to an inline or some such?

It's present also in the ethtool methods used to change
these things, so you might consider making ethtool calls
to change the bits as well.

It's a pretty bad idea for folks to be changing the ->features
bits directly in drivers after device registry.

2004-11-18 19:28:49

by Radheka Godse

[permalink] [raw]
Subject: RE: [Bonding-devel][PATCH]Zero Copy Transmit Support (Update)

> -
> +
> + /* We let the bond device publish all hardware
> + * acceleration features possible. This is OK,
> + * since if an skb is passed from the bond to
> + * a slave that doesn't support one of those
> + * features, everything is fixed in the
> + * dev_queue_xmit() function (e.g. calculate
> + * check sum, linearize the skb, etc.).
> + */

This is very inefficient if the bond slaves don't
support these features.

I believe you when you say you saw improvement in the
case where the slaves do support TSO, but if you test
a non-TSO slave case I bet you'll see a marked
decrease in system utilization at least.
...

Note: 2.6.2 patch enables Zero Copy Transmit support by default while
registering the bond device... but, I continue to see better CPU
utilization (~50% improvement) when this patch was tested in various
scenarios(see data below). Test results are from a dual Itanium 2 1.4GHz
system but I saw a similar trend on a PIII Xeon 550MHz x 8 system.

bond-patch-2.6.1 and 82546EB Copper NICs with hardware acceleration
support,
Test results with TSO on and off
ETH1 ETH2 Throughput/Performance System Utilization
on off 1870 35 %
on on 1870 35 %
off off 1871 34 %

bond-patch-2.6.2 and 82546EB Copper NICs with hardware acceleration
support,
Test results with TSO on and off
ETH1 ETH2 Throughput/Performance System Utilization
on off 1879 14 %
on on 1880 14.08 %
off off 1879 13 %
------------------------------------------------------------------

bond-patch-2.6.1 and 82543GC NICs without hardware support for TSO
ETH1 ETH2 Throughput/Performance System Utilization
off off 1815 36.85 %

bond-patch-2.6.2 and 82543GC NICs without hardware support for TSO
ETH1 ETH2 Throughput/Performance System Utilization
off off 1848 11.85 %
------------------------------------------------------------------

bond-patch-2.6.1, and one 82546EB NIC with & one 82543GC NIC without
hardware support for TSO
ETH1 ETH2 Throughput/Performance System Utilization

on off 1817 30.69 %

bond-patch-2.6.2, and one 82546EB NIC with & one 82543GC NIC without
hardware support for TSO
ETH1 ETH2 Throughput/Performance System Utilization
on off 1836 24.54 %
------------------------------------------------------------------

bond-patch-2.6.1 and 82543GC NICs
Test results with all HW acceleration bits on for 1 NIC and turned off
for the other NIC
ETH1 ETH2 Throughput/Performance System Utilization
on off 1851 35 %

bond-patch-2.6.2 and 82543GC NICs
Test results with all HW acceleration bits on for 1 NIC and turned off
for the other NIC
ETH1 ETH2 Throughput/Performance System Utilization
on off 1822 17 %
------------------------------------------------------------------

bond-patch-2.6.1 and 82543GC NICs
Test results with all HW acceleration bits turned off for both NICs
ETH1 ETH2 Throughput/Performance System Utilization
off off 1816 35 %

bond-patch-2.6.2 and 82543GC NICs
Test results with all HW acceleration bits turned off for both NICs
ETH1 ETH2 Throughput/Performance System Utilization
off off 1604 16.23 %