2023-04-11 02:11:46

by Liang Li

Subject: [Question] About bonding offload

Hi Everyone,

I'm a Red Hat network QE engineer, and I'm testing bonding offloads, e.g. GSO, TSO, GRO, LRO.
I got two questions during my testing.

1. TCP performance shows no difference whether bonding GRO is on or off.
When testing with bonding, I always get ~890 Mbits/sec of bandwidth no
matter whether GRO is on or off.
When testing with a physical NIC instead of bonding on the same
machine, I get 464 Mbits/sec with GRO off and 897 Mbits/sec with GRO
on.
So it looks like GRO can't be turned off on bonding?

I used iperf3 to test performance, and I limited the iperf3 process's
CPU usage during testing to simulate a CPU bottleneck.
Otherwise it's difficult to see a bandwidth difference between offload
on and off.
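
Roughly, the test looked like this (the interface name, server address,
and the CPU-limiting mechanism shown here are only examples, not my
exact setup):

  # receiver: cap the iperf3 server's CPU so the receive path is CPU-bound
  systemd-run --scope -p CPUQuota=20% iperf3 -s
  # toggle GRO between runs and compare
  ethtool -K bond0 gro off      # and again with: ethtool -K bond0 gro on
  # sender:
  iperf3 -c <server-ip> -t 30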

I reported a bz for this: https://bugzilla.redhat.com/show_bug.cgi?id=2183434

2. Should bonding propagate offload configuration to its slaves?
Currently, only "ethtool -K bond0 lro off" is propagated to the slaves;
the others are not, e.g.
ethtool -K bond0 tso on/off
ethtool -K bond0 gso on/off
ethtool -K bond0 gro on/off
ethtool -K bond0 lro on
None of the above settings are propagated to the bonding slaves (a
quick check is shown below).
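
For example (eth0 standing in for one of the bond's slaves):

  ethtool -K bond0 lro off
  ethtool -k eth0 | grep large-receive-offload     # follows bond0: off
  ethtool -K bond0 gro off
  ethtool -k eth0 | grep generic-receive-offload   # unchanged on the slave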

I reported a bz for this: https://bugzilla.redhat.com/show_bug.cgi?id=2183777

I am using RHEL with kernel 4.18.0-481.el8.x86_64.

BR,
Liang Li


2023-04-11 02:32:01

by Andrew Lunn

Subject: Re: [Question] About bonding offload

On Tue, Apr 11, 2023 at 09:47:14AM +0800, Liang Li wrote:
> Hi Everyone,
>
> I'm a Red Hat network QE engineer, and I'm testing bonding offloads, e.g. GSO, TSO, GRO, LRO.
> I got two questions during my testing.
>
> 1. TCP performance shows no difference whether bonding GRO is on or off.
> When testing with bonding, I always get ~890 Mbits/sec of bandwidth no
> matter whether GRO is on or off.
> When testing with a physical NIC instead of bonding on the same
> machine, I get 464 Mbits/sec with GRO off and 897 Mbits/sec with GRO
> on.
> So it looks like GRO can't be turned off on bonding?
>
> I used iperf3 to test performance, and I limited the iperf3 process's
> CPU usage during testing to simulate a CPU bottleneck.
> Otherwise it's difficult to see a bandwidth difference between offload
> on and off.
>
> I reported a bz for this: https://bugzilla.redhat.com/show_bug.cgi?id=2183434
>
> 2. Should bonding propagate offload configuration to its slaves?
> Currently, only "ethtool -K bond0 lro off" is propagated to the slaves;
> the others are not, e.g.
> ethtool -K bond0 tso on/off
> ethtool -K bond0 gso on/off
> ethtool -K bond0 gro on/off
> ethtool -K bond0 lro on
> None of the above settings are propagated to the bonding slaves.
>
> I reported a bz for this: https://bugzilla.redhat.com/show_bug.cgi?id=2183777
>
> I am using RHEL with kernel 4.18.0-481.el8.x86_64.

Hi Liang

Can you reproduce these issues with a modern kernel? net-next, or 6.3?

The normal process for issues like this is to investigate with the
latest kernel, and then backport fixes to old stable kernels.

Andrew

2023-04-11 05:01:13

by Jay Vosburgh

Subject: Re: [Question] About bonding offload

Liang Li <[email protected]> wrote:

>Hi Everyone,
>
>I'm a Red Hat network QE engineer, and I'm testing bonding offloads, e.g. GSO, TSO, GRO, LRO.
>I got two questions during my testing.
>
>1. TCP performance shows no difference whether bonding GRO is on or off.
>When testing with bonding, I always get ~890 Mbits/sec of bandwidth no
>matter whether GRO is on or off.
>When testing with a physical NIC instead of bonding on the same
>machine, I get 464 Mbits/sec with GRO off and 897 Mbits/sec with GRO
>on.
>So it looks like GRO can't be turned off on bonding?

Well, it's probably more correct to say that GRO is
unimplemented for "stacked on top" interfaces like bonding (or bridge,
vlan, team, etc). GRO operates early in the receive processing, when
the device driver is receiving packets, typically by calling
napi_gro_receive() from its NAPI poll function. This is well before
bonding, bridge, et al, are involved, as these drivers don't do NAPI at
all.
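
One way to see this from userspace (interface names are only examples)
is to toggle GRO on the physical slave rather than on bond0 and re-run
the CPU-bound iperf3 test:

  ethtool -K eth0 gro off    # GRO genuinely off; throughput should drop as in the non-bonded case
  ethtool -K eth0 gro on     # throughput should recover
  ethtool -K bond0 gro off   # only clears bond0's flag; the slave keeps doing GRO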

>I used iperf3 to test performance, and I limited the iperf3 process's
>CPU usage during testing to simulate a CPU bottleneck.
>Otherwise it's difficult to see a bandwidth difference between offload
>on and off.
>
>I reported a bz for this: https://bugzilla.redhat.com/show_bug.cgi?id=2183434
>
>2. Should bonding propagate offload configuration to its slaves?
>Currently, only "ethtool -K bond0 lro off" is propagated to the slaves;
>the others are not, e.g.
> ethtool -K bond0 tso on/off
> ethtool -K bond0 gso on/off
> ethtool -K bond0 gro on/off
> ethtool -K bond0 lro on
>None of the above settings are propagated to the bonding slaves.

The LRO case is because it's set in NETIF_F_UPPER_DISABLES, as
checked in netdev_sync_upper_features() and netdev_sync_lower_features().

A subset of features is handled in bond_compute_features().
Some feature changes, e.g., scatter-gather, do propagate upwards (but
not downwards), as bonding handles NETDEV_FEAT_CHANGE events for its
members (but not vice versa).

TSO, GSO, and GRO aren't handled in either of these situations,
and so changes don't propagate at all. Whether they should or not is a
separate, complicated question. E.g., should features propagate
upwards, or downwards? How many levels of nesting?
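
As a rough userspace illustration (interface names are examples, and a
single-slave bond is assumed for simplicity; the comments just restate
the behaviour described above):

  ethtool -K eth0 sg off
  ethtool -k bond0 | grep scatter-gather            # expected to follow the slave (upward propagation)
  ethtool -K bond0 tso off
  ethtool -k eth0 | grep tcp-segmentation-offload   # expected to stay unchanged on the slave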

-J

>I reported a bz for this: https://bugzilla.redhat.com/show_bug.cgi?id=2183777
>
>I am using RHEL with kernel 4.18.0-481.el8.x86_64.
>
>BR,
>Liang Li
>

---
-Jay Vosburgh, [email protected]

2023-04-11 05:27:24

by Liang Li

Subject: Re: [Question] About bonding offload

Thanks everyone! Glad to know this.

On Tue, Apr 11, 2023 at 12:58 PM Jay Vosburgh
<[email protected]> wrote:
>
> Liang Li <[email protected]> wrote:
>
> >Hi Everyone,
> >
> >I'm a Red Hat network QE engineer, and I'm testing bonding offloads, e.g. GSO, TSO, GRO, LRO.
> >I got two questions during my testing.
> >
> >1. TCP performance shows no difference whether bonding GRO is on or off.
> >When testing with bonding, I always get ~890 Mbits/sec of bandwidth no
> >matter whether GRO is on or off.
> >When testing with a physical NIC instead of bonding on the same
> >machine, I get 464 Mbits/sec with GRO off and 897 Mbits/sec with GRO
> >on.
> >So it looks like GRO can't be turned off on bonding?
>
> Well, it's probably more correct to say that GRO is
> unimplemented for "stacked on top" interfaces like bonding (or bridge,
> vlan, team, etc). GRO operates early in the receive processing, when
> the device driver is receiving packets, typically by calling
> napi_gro_receive() from its NAPI poll function. This is well before
> bonding, bridge, et al, are involved, as these drivers don't do NAPI at
> all.
>
> >I used iperf3 to test performance, and I limited the iperf3 process's
> >CPU usage during testing to simulate a CPU bottleneck.
> >Otherwise it's difficult to see a bandwidth difference between offload
> >on and off.
> >
> >I reported a bz for this: https://bugzilla.redhat.com/show_bug.cgi?id=2183434
> >
> >2. Should bonding propagate offload configuration to its slaves?
> >Currently, only "ethtool -K bond0 lro off" is propagated to the slaves;
> >the others are not, e.g.
> > ethtool -K bond0 tso on/off
> > ethtool -K bond0 gso on/off
> > ethtool -K bond0 gro on/off
> > ethtool -K bond0 lro on
> >None of the above settings are propagated to the bonding slaves.
>
> The LRO case is because it's set in NETIF_F_UPPER_DISABLES, as
> checked in netdev_sync_upper_features() and netdev_sync_lower_features().
>
> A subset of features is handled in bond_compute_features().
> Some feature changes, e.g., scatter-gather, do propagate upwards (but
> not downwards), as bonding handles NETDEV_FEAT_CHANGE events for its
> members (but not vice versa).
>
> TSO, GSO, and GRO aren't handled in either of these situations,
> and so changes don't propagate at all. Whether they should or not is a
> separate, complicated question. E.g., should features propagate
> upwards, or downwards? How many levels of nesting?
>
> -J
>
> >I reported a bz for this: https://bugzilla.redhat.com/show_bug.cgi?id=2183777
> >
> >I am using RHEL with kernel 4.18.0-481.el8.x86_64.
> >
> >BR,
> >Liang Li
> >
>
> ---
> -Jay Vosburgh, [email protected]
>