2020-12-18 20:53:09

by Jarod Wilson

Subject: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

This comes from an end-user request, where they're running multiple VMs on
hosts with bonded interfaces connected to some interesting switch topologies,
where 802.3ad isn't an option. They're currently running a proprietary
solution that effectively achieves load-balancing of VMs and bandwidth
utilization improvements with a similar form of transmission algorithm.

Basically, each VM has its own vlan, so it always sends its traffic out
the same interface, unless that interface fails. Traffic gets split
between the interfaces, maintaining a consistent path, with failover still
available if an interface goes down.

This has been rudimentarily tested to provide similar results, suitable for
them to use to move off their current proprietary solution.

Still on the TODO list, if this even looks sane to begin with, is
fleshing out Documentation/networking/bonding.rst.

Cc: Jay Vosburgh <[email protected]>
Cc: Veaceslav Falico <[email protected]>
Cc: Andy Gospodarek <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Thomas Davis <[email protected]>
Cc: [email protected]
Signed-off-by: Jarod Wilson <[email protected]>
---
drivers/net/bonding/bond_main.c | 27 +++++++++++++++++++++++++--
drivers/net/bonding/bond_options.c | 1 +
include/linux/netdevice.h | 1 +
include/uapi/linux/if_bonding.h | 1 +
4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5fe5232cc3f3..151ce8c7a56f 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -164,7 +164,7 @@ module_param(xmit_hash_policy, charp, 0);
MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
"0 for layer 2 (default), 1 for layer 3+4, "
"2 for layer 2+3, 3 for encap layer 2+3, "
- "4 for encap layer 3+4");
+ "4 for encap layer 3+4, 5 for vlan+srcmac");
module_param(arp_interval, int, 0);
MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
module_param_array(arp_ip_target, charp, NULL, 0);
@@ -1434,6 +1434,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
return NETDEV_LAG_HASH_E23;
case BOND_XMIT_POLICY_ENCAP34:
return NETDEV_LAG_HASH_E34;
+ case BOND_XMIT_POLICY_VLAN_SRCMAC:
+ return NETDEV_LAG_HASH_VLAN_SRCMAC;
default:
return NETDEV_LAG_HASH_UNKNOWN;
}
@@ -3494,6 +3496,20 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
return true;
}

+static inline u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+{
+ struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+ u32 srcmac = mac_hdr->h_source[5];
+ u16 vlan;
+
+ if (!skb_vlan_tag_present(skb))
+ return srcmac;
+
+ vlan = skb_vlan_tag_get(skb);
+
+ return srcmac ^ vlan;
+}
+
/* Extract the appropriate headers based on bond's xmit policy */
static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
struct flow_keys *fk)
@@ -3501,10 +3517,14 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
int noff, proto = -1;

- if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23) {
+ switch (bond->params.xmit_policy) {
+ case BOND_XMIT_POLICY_ENCAP23:
+ case BOND_XMIT_POLICY_ENCAP34:
memset(fk, 0, sizeof(*fk));
return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
fk, NULL, 0, 0, 0, 0);
+ default:
+ break;
}

fk->ports.ports = 0;
@@ -3556,6 +3576,9 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
skb->l4_hash)
return skb->hash;

+ if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
+ return bond_vlan_srcmac_hash(skb);
+
if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
!bond_flow_dissect(bond, skb, &flow))
return bond_eth_hash(skb);
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index a4e4e15f574d..9826fe46fca1 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -101,6 +101,7 @@ static const struct bond_opt_value bond_xmit_hashtype_tbl[] = {
{ "layer2+3", BOND_XMIT_POLICY_LAYER23, 0},
{ "encap2+3", BOND_XMIT_POLICY_ENCAP23, 0},
{ "encap3+4", BOND_XMIT_POLICY_ENCAP34, 0},
+ { "vlansrc", BOND_XMIT_POLICY_VLAN_SRCMAC, 0},
{ NULL, -1, 0},
};

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7bf167993c05..551eac4ab747 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2633,6 +2633,7 @@ enum netdev_lag_hash {
NETDEV_LAG_HASH_L23,
NETDEV_LAG_HASH_E23,
NETDEV_LAG_HASH_E34,
+ NETDEV_LAG_HASH_VLAN_SRCMAC,
NETDEV_LAG_HASH_UNKNOWN,
};

diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
index 45f3750aa861..e8eb4ad03cf1 100644
--- a/include/uapi/linux/if_bonding.h
+++ b/include/uapi/linux/if_bonding.h
@@ -94,6 +94,7 @@
#define BOND_XMIT_POLICY_LAYER23 2 /* layer 2+3 (IP ^ MAC) */
#define BOND_XMIT_POLICY_ENCAP23 3 /* encapsulated layer 2+3 */
#define BOND_XMIT_POLICY_ENCAP34 4 /* encapsulated layer 3+4 */
+#define BOND_XMIT_POLICY_VLAN_SRCMAC 5 /* vlan + source MAC */

/* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */
#define LACP_STATE_LACP_ACTIVITY 0x1
--
2.29.2


2020-12-19 00:20:40

by Jay Vosburgh

Subject: Re: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

Jarod Wilson <[email protected]> wrote:

>This comes from an end-user request, where they're running multiple VMs on
>hosts with bonded interfaces connected to some interesting switch topologies,
>where 802.3ad isn't an option. They're currently running a proprietary
>solution that effectively achieves load-balancing of VMs and bandwidth
>utilization improvements with a similar form of transmission algorithm.
>
>Basically, each VM has its own vlan, so it always sends its traffic out
>the same interface, unless that interface fails. Traffic gets split
>between the interfaces, maintaining a consistent path, with failover still
>available if an interface goes down.
>
>This has been rudimentarily tested to provide similar results, suitable for
>them to use to move off their current proprietary solution.
>
>Still on the TODO list, if this even looks sane to begin with, is
>fleshing out Documentation/networking/bonding.rst.

I'm sure you're aware, but any final submission will also need
to include netlink and iproute2 support.

>Cc: Jay Vosburgh <[email protected]>
>Cc: Veaceslav Falico <[email protected]>
>Cc: Andy Gospodarek <[email protected]>
>Cc: "David S. Miller" <[email protected]>
>Cc: Jakub Kicinski <[email protected]>
>Cc: Thomas Davis <[email protected]>
>Cc: [email protected]
>Signed-off-by: Jarod Wilson <[email protected]>
>---
> drivers/net/bonding/bond_main.c | 27 +++++++++++++++++++++++++--
> drivers/net/bonding/bond_options.c | 1 +
> include/linux/netdevice.h | 1 +
> include/uapi/linux/if_bonding.h | 1 +
> 4 files changed, 28 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 5fe5232cc3f3..151ce8c7a56f 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -164,7 +164,7 @@ module_param(xmit_hash_policy, charp, 0);
> MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
> "0 for layer 2 (default), 1 for layer 3+4, "
> "2 for layer 2+3, 3 for encap layer 2+3, "
>- "4 for encap layer 3+4");
>+ "4 for encap layer 3+4, 5 for vlan+srcmac");
> module_param(arp_interval, int, 0);
> MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
> module_param_array(arp_ip_target, charp, NULL, 0);
>@@ -1434,6 +1434,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
> return NETDEV_LAG_HASH_E23;
> case BOND_XMIT_POLICY_ENCAP34:
> return NETDEV_LAG_HASH_E34;
>+ case BOND_XMIT_POLICY_VLAN_SRCMAC:
>+ return NETDEV_LAG_HASH_VLAN_SRCMAC;
> default:
> return NETDEV_LAG_HASH_UNKNOWN;
> }
>@@ -3494,6 +3496,20 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
> return true;
> }
>
>+static inline u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
>+{
>+ struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
>+ u32 srcmac = mac_hdr->h_source[5];
>+ u16 vlan;
>+
>+ if (!skb_vlan_tag_present(skb))
>+ return srcmac;
>+
>+ vlan = skb_vlan_tag_get(skb);
>+
>+ return srcmac ^ vlan;

For the common configuration wherein multiple VLANs are
configured atop a single interface (and thus by default end up with the
same MAC address), this seems like a fairly weak hash. The TCI is 16
bits (12 of which are the VID), but only 8 are used from the MAC, which
will be a constant.

Is this algorithm copying the proprietary solution you mention?

-J

>+}
>+
> /* Extract the appropriate headers based on bond's xmit policy */
> static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
> struct flow_keys *fk)
>@@ -3501,10 +3517,14 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
> bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
> int noff, proto = -1;
>
>- if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23) {
>+ switch (bond->params.xmit_policy) {
>+ case BOND_XMIT_POLICY_ENCAP23:
>+ case BOND_XMIT_POLICY_ENCAP34:
> memset(fk, 0, sizeof(*fk));
> return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
> fk, NULL, 0, 0, 0, 0);
>+ default:
>+ break;
> }
>
> fk->ports.ports = 0;
>@@ -3556,6 +3576,9 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
> skb->l4_hash)
> return skb->hash;
>
>+ if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
>+ return bond_vlan_srcmac_hash(skb);
>+
> if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
> !bond_flow_dissect(bond, skb, &flow))
> return bond_eth_hash(skb);
>diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>index a4e4e15f574d..9826fe46fca1 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -101,6 +101,7 @@ static const struct bond_opt_value bond_xmit_hashtype_tbl[] = {
> { "layer2+3", BOND_XMIT_POLICY_LAYER23, 0},
> { "encap2+3", BOND_XMIT_POLICY_ENCAP23, 0},
> { "encap3+4", BOND_XMIT_POLICY_ENCAP34, 0},
>+ { "vlansrc", BOND_XMIT_POLICY_VLAN_SRCMAC, 0},
> { NULL, -1, 0},
> };
>
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 7bf167993c05..551eac4ab747 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -2633,6 +2633,7 @@ enum netdev_lag_hash {
> NETDEV_LAG_HASH_L23,
> NETDEV_LAG_HASH_E23,
> NETDEV_LAG_HASH_E34,
>+ NETDEV_LAG_HASH_VLAN_SRCMAC,
> NETDEV_LAG_HASH_UNKNOWN,
> };
>
>diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
>index 45f3750aa861..e8eb4ad03cf1 100644
>--- a/include/uapi/linux/if_bonding.h
>+++ b/include/uapi/linux/if_bonding.h
>@@ -94,6 +94,7 @@
> #define BOND_XMIT_POLICY_LAYER23 2 /* layer 2+3 (IP ^ MAC) */
> #define BOND_XMIT_POLICY_ENCAP23 3 /* encapsulated layer 2+3 */
> #define BOND_XMIT_POLICY_ENCAP34 4 /* encapsulated layer 3+4 */
>+#define BOND_XMIT_POLICY_VLAN_SRCMAC 5 /* vlan + source MAC */
>
> /* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */
> #define LACP_STATE_LACP_ACTIVITY 0x1
>--
>2.29.2
>

---
-Jay Vosburgh, [email protected]

2020-12-28 10:13:40

by Jiri Pirko

Subject: Re: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

Fri, Dec 18, 2020 at 08:30:33PM CET, [email protected] wrote:
>This comes from an end-user request, where they're running multiple VMs on
>hosts with bonded interfaces connected to some interesting switch topologies,
>where 802.3ad isn't an option. They're currently running a proprietary
>solution that effectively achieves load-balancing of VMs and bandwidth
>utilization improvements with a similar form of transmission algorithm.
>
>Basically, each VM has its own vlan, so it always sends its traffic out
>the same interface, unless that interface fails. Traffic gets split
>between the interfaces, maintaining a consistent path, with failover still
>available if an interface goes down.
>
>This has been rudimentarily tested to provide similar results, suitable for
>them to use to move off their current proprietary solution.
>
>Still on the TODO list, if this even looks sane to begin with, is
>fleshing out Documentation/networking/bonding.rst.

Jarod, did you consider using team driver instead ? :)

2021-01-08 00:01:03

by Jarod Wilson

Subject: Re: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

On Mon, Dec 28, 2020 at 11:11:45AM +0100, Jiri Pirko wrote:
> Fri, Dec 18, 2020 at 08:30:33PM CET, [email protected] wrote:
> >This comes from an end-user request, where they're running multiple VMs on
> >hosts with bonded interfaces connected to some interesting switch topologies,
> >where 802.3ad isn't an option. They're currently running a proprietary
> >solution that effectively achieves load-balancing of VMs and bandwidth
> >utilization improvements with a similar form of transmission algorithm.
> >
> >Basically, each VM has its own vlan, so it always sends its traffic out
> >the same interface, unless that interface fails. Traffic gets split
> >between the interfaces, maintaining a consistent path, with failover still
> >available if an interface goes down.
> >
> >This has been rudimentarily tested to provide similar results, suitable for
> >them to use to move off their current proprietary solution.
> >
> >Still on the TODO list, if this even looks sane to begin with, is
> >fleshing out Documentation/networking/bonding.rst.
>
> Jarod, did you consider using team driver instead ? :)

That's actually one of the things that was suggested, since I believe team
already has support for this, but the user really wants to use bonding.
We're finding that a lot of users really still prefer bonding over team.

--
Jarod Wilson
[email protected]

2021-01-08 00:07:07

by Jarod Wilson

Subject: Re: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

On Fri, Dec 18, 2020 at 04:18:59PM -0800, Jay Vosburgh wrote:
> Jarod Wilson <[email protected]> wrote:
>
> >This comes from an end-user request, where they're running multiple VMs on
> >hosts with bonded interfaces connected to some interesting switch topologies,
> >where 802.3ad isn't an option. They're currently running a proprietary
> >solution that effectively achieves load-balancing of VMs and bandwidth
> >utilization improvements with a similar form of transmission algorithm.
> >
> >Basically, each VM has its own vlan, so it always sends its traffic out
> >the same interface, unless that interface fails. Traffic gets split
> >between the interfaces, maintaining a consistent path, with failover still
> >available if an interface goes down.
> >
> >This has been rudimentarily tested to provide similar results, suitable for
> >them to use to move off their current proprietary solution.
> >
> >Still on the TODO list, if this even looks sane to begin with, is
> >fleshing out Documentation/networking/bonding.rst.
>
> I'm sure you're aware, but any final submission will also need
> to include netlink and iproute2 support.

I believe everything for netlink support is already included, but I'll
double-check that before submitting something for inclusion consideration.

> >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >index 5fe5232cc3f3..151ce8c7a56f 100644
> >--- a/drivers/net/bonding/bond_main.c
> >+++ b/drivers/net/bonding/bond_main.c
> >@@ -164,7 +164,7 @@ module_param(xmit_hash_policy, charp, 0);
> > MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
> > "0 for layer 2 (default), 1 for layer 3+4, "
> > "2 for layer 2+3, 3 for encap layer 2+3, "
> >- "4 for encap layer 3+4");
> >+ "4 for encap layer 3+4, 5 for vlan+srcmac");
> > module_param(arp_interval, int, 0);
> > MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
> > module_param_array(arp_ip_target, charp, NULL, 0);
> >@@ -1434,6 +1434,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
> > return NETDEV_LAG_HASH_E23;
> > case BOND_XMIT_POLICY_ENCAP34:
> > return NETDEV_LAG_HASH_E34;
> >+ case BOND_XMIT_POLICY_VLAN_SRCMAC:
> >+ return NETDEV_LAG_HASH_VLAN_SRCMAC;
> > default:
> > return NETDEV_LAG_HASH_UNKNOWN;
> > }
> >@@ -3494,6 +3496,20 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
> > return true;
> > }
> >
> >+static inline u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
> >+{
> >+ struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
> >+ u32 srcmac = mac_hdr->h_source[5];
> >+ u16 vlan;
> >+
> >+ if (!skb_vlan_tag_present(skb))
> >+ return srcmac;
> >+
> >+ vlan = skb_vlan_tag_get(skb);
> >+
> >+ return srcmac ^ vlan;
>
> For the common configuration wherein multiple VLANs are
> configured atop a single interface (and thus by default end up with the
> same MAC address), this seems like a fairly weak hash. The TCI is 16
> bits (12 of which are the VID), but only 8 are used from the MAC, which
> will be a constant.
>
> Is this algorithm copying the proprietary solution you mention?

I've not actually seen the code in question, so I can't be 100% certain,
but this is exactly how it was described to me, and testing seems to bear
out that it behaves at least similarly enough for the user. They like
simplicity, and the very basic hash suits their needs, which are basically
just getting some load-balancing with failover, without LACP, on some
older switches that can't do LACP.

--
Jarod Wilson
[email protected]

2021-01-08 13:15:26

by Jiri Pirko

Subject: Re: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

Fri, Jan 08, 2021 at 12:58:13AM CET, [email protected] wrote:
>On Mon, Dec 28, 2020 at 11:11:45AM +0100, Jiri Pirko wrote:
>> Fri, Dec 18, 2020 at 08:30:33PM CET, [email protected] wrote:
>> >This comes from an end-user request, where they're running multiple VMs on
>> >hosts with bonded interfaces connected to some interesting switch topologies,
>> >where 802.3ad isn't an option. They're currently running a proprietary
>> >solution that effectively achieves load-balancing of VMs and bandwidth
>> >utilization improvements with a similar form of transmission algorithm.
>> >
>> >Basically, each VM has its own vlan, so it always sends its traffic out
>> >the same interface, unless that interface fails. Traffic gets split
>> >between the interfaces, maintaining a consistent path, with failover still
>> >available if an interface goes down.
>> >
>> >This has been rudimentarily tested to provide similar results, suitable for
>> >them to use to move off their current proprietary solution.
>> >
>> >Still on the TODO list, if this even looks sane to begin with, is
>> >fleshing out Documentation/networking/bonding.rst.
>>
>> Jarod, did you consider using team driver instead ? :)
>
>That's actually one of the things that was suggested, since I believe team
>already has support for this, but the user really wants to use bonding.
>We're finding that a lot of users really still prefer bonding over team.

Do you know the reason, other than "nostalgia"?

2021-01-08 15:26:47

by Jarod Wilson

Subject: Re: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

On Fri, Jan 08, 2021 at 02:12:56PM +0100, Jiri Pirko wrote:
> Fri, Jan 08, 2021 at 12:58:13AM CET, [email protected] wrote:
> >On Mon, Dec 28, 2020 at 11:11:45AM +0100, Jiri Pirko wrote:
> >> Fri, Dec 18, 2020 at 08:30:33PM CET, [email protected] wrote:
> >> >This comes from an end-user request, where they're running multiple VMs on
> >> >hosts with bonded interfaces connected to some interesting switch topologies,
> >> >where 802.3ad isn't an option. They're currently running a proprietary
> >> >solution that effectively achieves load-balancing of VMs and bandwidth
> >> >utilization improvements with a similar form of transmission algorithm.
> >> >
> >> >Basically, each VM has its own vlan, so it always sends its traffic out
> >> >the same interface, unless that interface fails. Traffic gets split
> >> >between the interfaces, maintaining a consistent path, with failover still
> >> >available if an interface goes down.
> >> >
> >> >This has been rudimentarily tested to provide similar results, suitable for
> >> >them to use to move off their current proprietary solution.
> >> >
> >> >Still on the TODO list, if this even looks sane to begin with, is
> >> >fleshing out Documentation/networking/bonding.rst.
> >>
> >> Jarod, did you consider using team driver instead ? :)
> >
> >That's actually one of the things that was suggested, since I believe team
> >already has support for this, but the user really wants to use bonding.
> >We're finding that a lot of users really still prefer bonding over team.
>
> Do you know the reason, other than "nostalgia"?

I've heard a few different reasons that come to mind:

1) nostalgia is definitely one -- "we know bonding here"
2) support -- "the things I'm running say I need bonding to properly
support failover in their environment". How accurate this is, I don't
actually know.
3) monitoring -- "my monitoring solution knows about bonding, but not
about team". This is probably easily fixed, but may or may not be in the
user's direct control.
4) footprint -- "bonding does the job w/o team's userspace footprint".
I think this one is kind of hard for team to do anything about, bonding
really does have a smaller userspace footprint, which is a plus for
embedded type applications and high-security environments looking to keep
things as minimal as possible.

I think I've heard a few "we tried team years ago and it didn't work" as
well, which of course is ridiculous as a reason not to try something again,
since a lot can change in a few years in this world.

--
Jarod Wilson
[email protected]

2021-01-13 02:39:13

by Jarod Wilson

Subject: Re: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

On Thu, Jan 07, 2021 at 07:03:40PM -0500, Jarod Wilson wrote:
> On Fri, Dec 18, 2020 at 04:18:59PM -0800, Jay Vosburgh wrote:
> > Jarod Wilson <[email protected]> wrote:
> >
> > >This comes from an end-user request, where they're running multiple VMs on
> > >hosts with bonded interfaces connected to some interesting switch topologies,
> > >where 802.3ad isn't an option. They're currently running a proprietary
> > >solution that effectively achieves load-balancing of VMs and bandwidth
> > >utilization improvements with a similar form of transmission algorithm.
> > >
> > >Basically, each VM has its own vlan, so it always sends its traffic out
> > >the same interface, unless that interface fails. Traffic gets split
> > >between the interfaces, maintaining a consistent path, with failover still
> > >available if an interface goes down.
> > >
> > >This has been rudimentarily tested to provide similar results, suitable for
> > >them to use to move off their current proprietary solution.
> > >
> > >Still on the TODO list, if this even looks sane to begin with, is
> > >fleshing out Documentation/networking/bonding.rst.
> >
> > I'm sure you're aware, but any final submission will also need
> > to include netlink and iproute2 support.
>
> I believe everything for netlink support is already included, but I'll
> double-check that before submitting something for inclusion consideration.

I'm not certain if what you actually meant was that I'd have to patch
iproute2 as well, which I've definitely stumbled onto today, but it's a
2-line patch, and everything seems to be working fine with it:

$ sudo ip link set bond0 type bond xmit_hash_policy 5
$ ip -d link show bond0
11: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether ce:85:5e:24:ce:90 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
bond mode balance-xor miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any primary_reselect always fail_over_mac none xmit_hash_policy vlansrc resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1 addrgenmode eui64 numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535
$ grep Hash /proc/net/bonding/bond0
Transmit Hash Policy: vlansrc (5)

Nothing bad seems to happen on an older kernel if one tries to set the new
hash; you just get told that it's an invalid argument.

I *think* this is all ready for submission then, so I'll get both the kernel
and iproute2 patches out soon.

--
Jarod Wilson
[email protected]

2021-01-13 02:45:50

by Jay Vosburgh

Subject: Re: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

Jarod Wilson <[email protected]> wrote:

>On Thu, Jan 07, 2021 at 07:03:40PM -0500, Jarod Wilson wrote:
>> On Fri, Dec 18, 2020 at 04:18:59PM -0800, Jay Vosburgh wrote:
>> > Jarod Wilson <[email protected]> wrote:
>> >
>> > >This comes from an end-user request, where they're running multiple VMs on
>> > >hosts with bonded interfaces connected to some interesting switch topologies,
>> > >where 802.3ad isn't an option. They're currently running a proprietary
>> > >solution that effectively achieves load-balancing of VMs and bandwidth
>> > >utilization improvements with a similar form of transmission algorithm.
>> > >
>> > >Basically, each VM has its own vlan, so it always sends its traffic out
>> > >the same interface, unless that interface fails. Traffic gets split
>> > >between the interfaces, maintaining a consistent path, with failover still
>> > >available if an interface goes down.
>> > >
>> > >This has been rudimentarily tested to provide similar results, suitable for
>> > >them to use to move off their current proprietary solution.
>> > >
>> > >Still on the TODO list, if this even looks sane to begin with, is
>> > >fleshing out Documentation/networking/bonding.rst.
>> >
>> > I'm sure you're aware, but any final submission will also need
>> > to include netlink and iproute2 support.
>>
>> I believe everything for netlink support is already included, but I'll
>> double-check that before submitting something for inclusion consideration.
>
>I'm not certain if what you actually meant was that I'd have to patch
>iproute2 as well, which I've definitely stumbled onto today, but it's a
>2-line patch, and everything seems to be working fine with it:

Yes, that's what I meant.

>$ sudo ip link set bond0 type bond xmit_hash_policy 5

Does the above work with the text label (presumably "vlansrc")
as well as the number, and does "ip link add test type bond help" print
the correct text for XMIT_HASH_POLICY?

-J

>$ ip -d link show bond0
>11: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
> link/ether ce:85:5e:24:ce:90 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
> bond mode balance-xor miimon 0 updelay 0 downdelay 0 peer_notify_delay 0 use_carrier 1 arp_interval 0 arp_validate none arp_all_targets any primary_reselect always fail_over_mac none xmit_hash_policy vlansrc resend_igmp 1 num_grat_arp 1 all_slaves_active 0 min_links 0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select stable tlb_dynamic_lb 1 addrgenmode eui64 numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535
>$ grep Hash /proc/net/bonding/bond0
>Transmit Hash Policy: vlansrc (5)
>
>Nothing bad seems to happen on an older kernel if one tries to set the new
>hash; you just get told that it's an invalid argument.
>
>I *think* this is all ready for submission then, so I'll get both the kernel
>and iproute2 patches out soon.
>
>--
>Jarod Wilson
>[email protected]

---
-Jay Vosburgh, [email protected]

2021-01-13 03:17:12

by Jarod Wilson

Subject: Re: [RFC PATCH net-next] bonding: add a vlan+srcmac tx hashing option

On Tue, Jan 12, 2021 at 01:39:10PM -0800, Jay Vosburgh wrote:
> Jarod Wilson <[email protected]> wrote:
>
> >On Thu, Jan 07, 2021 at 07:03:40PM -0500, Jarod Wilson wrote:
> >> On Fri, Dec 18, 2020 at 04:18:59PM -0800, Jay Vosburgh wrote:
> >> > Jarod Wilson <[email protected]> wrote:
> >> >
> >> > >This comes from an end-user request, where they're running multiple VMs on
> >> > >hosts with bonded interfaces connected to some interesting switch topologies,
> >> > >where 802.3ad isn't an option. They're currently running a proprietary
> >> > >solution that effectively achieves load-balancing of VMs and bandwidth
> >> > >utilization improvements with a similar form of transmission algorithm.
> >> > >
> >> > >Basically, each VM has its own vlan, so it always sends its traffic out
> >> > >the same interface, unless that interface fails. Traffic gets split
> >> > >between the interfaces, maintaining a consistent path, with failover still
> >> > >available if an interface goes down.
> >> > >
> >> > >This has been rudimentarily tested to provide similar results, suitable for
> >> > >them to use to move off their current proprietary solution.
> >> > >
> >> > >Still on the TODO list, if this even looks sane to begin with, is
> >> > >fleshing out Documentation/networking/bonding.rst.
> >> >
> >> > I'm sure you're aware, but any final submission will also need
> >> > to include netlink and iproute2 support.
> >>
> >> I believe everything for netlink support is already included, but I'll
> >> double-check that before submitting something for inclusion consideration.
> >
> >I'm not certain if what you actually meant was that I'd have to patch
> >iproute2 as well, which I've definitely stumbled onto today, but it's a
> >2-line patch, and everything seems to be working fine with it:
>
> Yes, that's what I meant.
>
> >$ sudo ip link set bond0 type bond xmit_hash_policy 5
>
> Does the above work with the text label (presumably "vlansrc")
> as well as the number, and does "ip link add test type bond help" print
> the correct text for XMIT_HASH_POLICY?

All of the above looks correct to me; output below. One thing before
submitting: I could rename it from vlansrc to vlan+srcmac or some variation
thereof if that's desired. I tried to keep it relatively short, but it's
perhaps a bit less clear as I have it now, and the other modes do include a +.

$ sudo modprobe bonding mode=2 max_bonds=1 xmit_hash_policy=0
$ sudo ip link set bond0 type bond xmit_hash_policy vlansrc
$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v4.18.0-272.el8.vstx.x86_64

Bonding Mode: load balancing (xor)
Transmit Hash Policy: vlansrc (5)
MII Status: down
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

$ sudo ip link add test type bond help
Usage: ... bond [ mode BONDMODE ] [ active_slave SLAVE_DEV ]
[ clear_active_slave ] [ miimon MIIMON ]
[ updelay UPDELAY ] [ downdelay DOWNDELAY ]
[ peer_notify_delay DELAY ]
[ use_carrier USE_CARRIER ]
[ arp_interval ARP_INTERVAL ]
[ arp_validate ARP_VALIDATE ]
[ arp_all_targets ARP_ALL_TARGETS ]
[ arp_ip_target [ ARP_IP_TARGET, ... ] ]
[ primary SLAVE_DEV ]
[ primary_reselect PRIMARY_RESELECT ]
[ fail_over_mac FAIL_OVER_MAC ]
[ xmit_hash_policy XMIT_HASH_POLICY ]
[ resend_igmp RESEND_IGMP ]
[ num_grat_arp|num_unsol_na NUM_GRAT_ARP|NUM_UNSOL_NA ]
[ all_slaves_active ALL_SLAVES_ACTIVE ]
[ min_links MIN_LINKS ]
[ lp_interval LP_INTERVAL ]
[ packets_per_slave PACKETS_PER_SLAVE ]
[ tlb_dynamic_lb TLB_DYNAMIC_LB ]
[ lacp_rate LACP_RATE ]
[ ad_select AD_SELECT ]
[ ad_user_port_key PORTKEY ]
[ ad_actor_sys_prio SYSPRIO ]
[ ad_actor_system LLADDR ]

BONDMODE := balance-rr|active-backup|balance-xor|broadcast|802.3ad|balance-tlb|balance-alb
ARP_VALIDATE := none|active|backup|all
ARP_ALL_TARGETS := any|all
PRIMARY_RESELECT := always|better|failure
FAIL_OVER_MAC := none|active|follow
XMIT_HASH_POLICY := layer2|layer2+3|layer3+4|encap2+3|encap3+4|vlansrc
LACP_RATE := slow|fast
AD_SELECT := stable|bandwidth|count


--
Jarod Wilson
[email protected]

2021-01-14 02:06:28

by Jarod Wilson

Subject: [PATCH net-next v2] bonding: add a vlan+mac tx hashing option

This comes from an end-user request, where they're running multiple VMs on
hosts with bonded interfaces connected to some interesting switch topologies,
where 802.3ad isn't an option. They're currently running a proprietary
solution that effectively achieves load-balancing of VMs and bandwidth
utilization improvements with a similar form of transmission algorithm.

Basically, each VM has its own vlan, so it always sends its traffic out
the same interface, unless that interface fails. Traffic gets split
between the interfaces, maintaining a consistent path, with failover still
available if an interface goes down.

This has been rudimentarily tested to provide similar results, suitable for
them to use to move off their current proprietary solution. A patch for
iproute2 is forthcoming, to properly support the new mode there as well.

Cc: Jay Vosburgh <[email protected]>
Cc: Veaceslav Falico <[email protected]>
Cc: Andy Gospodarek <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Thomas Davis <[email protected]>
Cc: [email protected]
Signed-off-by: Jarod Wilson <[email protected]>
---
v2: verified netlink interfaces working, added Documentation, changed
tx hash mode name to vlan+mac for consistency and clarity.

Documentation/networking/bonding.rst | 13 +++++++++++++
drivers/net/bonding/bond_main.c | 27 +++++++++++++++++++++++++--
drivers/net/bonding/bond_options.c | 1 +
include/linux/netdevice.h | 1 +
include/uapi/linux/if_bonding.h | 1 +
5 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index adc314639085..c78ceb7630a0 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -951,6 +951,19 @@ xmit_hash_policy
packets will be distributed according to the encapsulated
flows.

+ vlan+mac
+
+ This policy uses a very rudimentary vland ID and source mac
+ ID hash to load-balance traffic per-vlan, with failover
+ should one leg fail. The intended use case is for a bond
+ shared by multiple virtual machines, all configured to
+ use their own vlan, to give lacp-like functionality
+ without requiring lacp-capable switching hardware.
+
+ The formula for the hash is simply
+
+ hash = (vlan ID) XOR (source MAC)
+
The default value is layer2. This option was added in bonding
version 2.6.3. In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy. The
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5fe5232cc3f3..766c09a553c1 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -164,7 +164,7 @@ module_param(xmit_hash_policy, charp, 0);
MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
"0 for layer 2 (default), 1 for layer 3+4, "
"2 for layer 2+3, 3 for encap layer 2+3, "
- "4 for encap layer 3+4");
+ "4 for encap layer 3+4, 5 for vlan+mac");
module_param(arp_interval, int, 0);
MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
module_param_array(arp_ip_target, charp, NULL, 0);
@@ -1434,6 +1434,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
return NETDEV_LAG_HASH_E23;
case BOND_XMIT_POLICY_ENCAP34:
return NETDEV_LAG_HASH_E34;
+ case BOND_XMIT_POLICY_VLAN_SRCMAC:
+ return NETDEV_LAG_HASH_VLAN_SRCMAC;
default:
return NETDEV_LAG_HASH_UNKNOWN;
}
@@ -3494,6 +3496,20 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
return true;
}

+static inline u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+{
+ struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+ u32 srcmac = mac_hdr->h_source[5];
+ u16 vlan;
+
+ if (!skb_vlan_tag_present(skb))
+ return srcmac;
+
+ vlan = skb_vlan_tag_get(skb);
+
+ return srcmac ^ vlan;
+}
+
/* Extract the appropriate headers based on bond's xmit policy */
static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
struct flow_keys *fk)
@@ -3501,10 +3517,14 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
int noff, proto = -1;

- if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23) {
+ switch (bond->params.xmit_policy) {
+ case BOND_XMIT_POLICY_ENCAP23:
+ case BOND_XMIT_POLICY_ENCAP34:
memset(fk, 0, sizeof(*fk));
return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
fk, NULL, 0, 0, 0, 0);
+ default:
+ break;
}

fk->ports.ports = 0;
@@ -3556,6 +3576,9 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
skb->l4_hash)
return skb->hash;

+ if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
+ return bond_vlan_srcmac_hash(skb);
+
if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
!bond_flow_dissect(bond, skb, &flow))
return bond_eth_hash(skb);
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index a4e4e15f574d..deafe3587c80 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -101,6 +101,7 @@ static const struct bond_opt_value bond_xmit_hashtype_tbl[] = {
{ "layer2+3", BOND_XMIT_POLICY_LAYER23, 0},
{ "encap2+3", BOND_XMIT_POLICY_ENCAP23, 0},
{ "encap3+4", BOND_XMIT_POLICY_ENCAP34, 0},
+ { "vlan+mac", BOND_XMIT_POLICY_VLAN_SRCMAC, 0},
{ NULL, -1, 0},
};

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5b949076ed23..a94ce80a2fe1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2615,6 +2615,7 @@ enum netdev_lag_hash {
NETDEV_LAG_HASH_L23,
NETDEV_LAG_HASH_E23,
NETDEV_LAG_HASH_E34,
+ NETDEV_LAG_HASH_VLAN_SRCMAC,
NETDEV_LAG_HASH_UNKNOWN,
};

diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
index 45f3750aa861..e8eb4ad03cf1 100644
--- a/include/uapi/linux/if_bonding.h
+++ b/include/uapi/linux/if_bonding.h
@@ -94,6 +94,7 @@
#define BOND_XMIT_POLICY_LAYER23 2 /* layer 2+3 (IP ^ MAC) */
#define BOND_XMIT_POLICY_ENCAP23 3 /* encapsulated layer 2+3 */
#define BOND_XMIT_POLICY_ENCAP34 4 /* encapsulated layer 3+4 */
+#define BOND_XMIT_POLICY_VLAN_SRCMAC 5 /* vlan + source MAC */

/* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */
#define LACP_STATE_LACP_ACTIVITY 0x1
--
2.29.2

2021-01-15 05:25:20

by moyufeng

Subject: question about bonding mode 4

Hi Team,

I have a question about bonding. While testing bonding mode 4
scenarios, I found that, with very low probability, a NULL pointer
dereference occurs. The following information is displayed:

[99359.795934] bond0: (slave eth13.2001): Port 2 did not find a suitable aggregator
[99359.796960] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[99359.798127] Mem abort info:
[99359.798526] ESR = 0x96000004
[99359.798938] EC = 0x25: DABT (current EL), IL = 32 bits
[99359.799673] SET = 0, FnV = 0
[99359.800106] EA = 0, S1PTW = 0
[99359.800554] Data abort info:
[99359.800952] ISV = 0, ISS = 0x00000004
[99359.801522] CM = 0, WnR = 0
[99359.801970] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000c64e6000
[99359.802876] [0000000000000020] pgd=0000000000000000
[99359.803555] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[99359.804369] Modules linked in: bonding hns3(-) hclgevf hnae3 [last unloaded: bonding]
[99359.805494] CPU: 1 PID: 951 Comm: kworker/u10:2 Not tainted 5.7.0-rc4+ #1
[99359.806455] Hardware name: linux,dummy-virt (DT)
[99359.807107] Workqueue: bond0 bond_3ad_state_machine_handler [bonding]
[99359.808056] pstate: 60c00005 (nZCv daif +PAN +UAO)
[99359.808722] pc : bond_3ad_state_machine_handler+0x7fc/0xdb8 [bonding]
[99359.809652] lr : bond_3ad_state_machine_handler+0x7f4/0xdb8 [bonding]
[99359.810535] sp : ffff80001882bd20
[99359.811012] x29: ffff80001882bd20 x28: ffff000085939a38
[99359.811791] x27: ffff00008649bb68 x26: 00000000aaaaaaab
[99359.812871] x25: ffff800009401000 x24: ffff800009408de4
[99359.814049] x23: ffff80001882bd98 x22: ffff00008649b880
[99359.815210] x21: 0000000000000000 x20: ffff000085939a00
[99359.816401] x19: ffff00008649b880 x18: ffff800012572988
[99359.817637] x17: 0000000000000000 x16: 0000000000000000
[99359.818870] x15: ffff80009882b987 x14: 726f746167657267
[99359.820090] x13: 676120656c626174 x12: 697573206120646e
[99359.821374] x11: 696620746f6e2064 x10: 696420322074726f
[99359.822659] x9 : 50203a2931303032 x8 : 0000000000081391
[99359.823891] x7 : ffff8000108e3ad0 x6 : ffff8000128858bb
[99359.825109] x5 : 0000000000000000 x4 : 0000000000000000
[99359.826262] x3 : 00000000ffffffff x2 : 906b329bb5362a00
[99359.827394] x1 : 906b329bb5362a00 x0 : 0000000000000000
[99359.828540] Call trace:
[99359.829071] bond_3ad_state_machine_handler+0x7fc/0xdb8 [bonding]
[99359.830367] process_one_work+0x15c/0x4a0
[99359.831216] worker_thread+0x50/0x478
[99359.832022] kthread+0x130/0x160
[99359.832716] ret_from_fork+0x10/0x18
[99359.833487] Code: 910c0021 95f704bb f9403f80 b5ffe300 (f9401000)
[99359.834742] ---[ end trace c7a8e02914afc4e0 ]---
[99359.835817] Kernel panic - not syncing: Fatal exception in interrupt
[99359.837334] SMP: stopping secondary CPUs
[99359.838277] Kernel Offset: disabled
[99359.839086] CPU features: 0x080002,22208218
[99359.840053] Memory Limit: none
[99359.840783] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

The test procedure is as follows:
1. Configure bonding and set it to mode 4.
echo "4" > /sys/class/net/bond0/bonding/mode
ifconfig bond0 up

2. Configure two VLANs and add them to the bonding in step 1.
vconfig add eth0 2001
vconfig add eth1 2001
ifenslave bond0 eth0.2001 eth1.2001

3. Unload the network device driver and bonding driver.
rmmod hns3
rmmod hclge
rmmod hnae3
rmmod bonding.ko

4. Repeat the preceding steps for a long time.

By checking the logic in ad_port_selection_logic(), I found that if we
enter the branch "Port %d did not find a suitable aggregator", the value
of port->aggregator remains NULL, causing the problem.

So I'd like to ask under what circumstances this branch can be taken,
and what should be done in this case?


The detailed code analysis is as follows:

static void ad_port_selection_logic(struct port *port, bool *update_slave_arr)
{
[...]
/* if the port is connected to other aggregator, detach it */
if (port->aggregator) {
/* detach the port from its former aggregator */
temp_aggregator = port->aggregator;
for (curr_port = temp_aggregator->lag_ports; curr_port;
last_port = curr_port,
curr_port = curr_port->next_port_in_aggregator) {
if (curr_port == port) {
temp_aggregator->num_of_ports--;
/* if it is the first port attached to the
* aggregator
*/
if (!last_port) {
temp_aggregator->lag_ports =
port->next_port_in_aggregator;
} else {
/* not the first port attached to the
* aggregator
*/
last_port->next_port_in_aggregator =
port->next_port_in_aggregator;
}

/* clear the port's relations to this
* aggregator
*/
port->aggregator = NULL;

----analysis: port->aggregator is set to NULL here, early in ad_port_selection_logic().

port->next_port_in_aggregator = NULL;
port->actor_port_aggregator_identifier = 0;

slave_dbg(bond->dev, port->slave->dev, "Port %d left LAG %d\n",
port->actor_port_number,
temp_aggregator->aggregator_identifier);
/* if the aggregator is empty, clear its
* parameters, and set it ready to be attached
*/
if (!temp_aggregator->lag_ports)
ad_clear_agg(temp_aggregator);
break;
}
}
if (!curr_port) {
/* meaning: the port was related to an aggregator
* but was not on the aggregator port list
*/
net_warn_ratelimited("%s: (slave %s): Warning: Port %d was related to aggregator %d but was not on its port list\n",
port->slave->bond->dev->name,
port->slave->dev->name,
port->actor_port_number,
port->aggregator->aggregator_identifier);
}
}
/* search on all aggregators for a suitable aggregator for this port */
bond_for_each_slave(bond, slave, iter) {
aggregator = &(SLAVE_AD_INFO(slave)->aggregator);
/* keep a free aggregator for later use(if needed) */
if (!aggregator->lag_ports) {
if (!free_aggregator)
free_aggregator = aggregator;

----analysis: The first free aggregator found is saved in free_aggregator. But in this case, no free aggregator is available.

continue;
}
/* check if current aggregator suits us */
if (((aggregator->actor_oper_aggregator_key == port->actor_oper_port_key) && /* if all parameters match AND */
MAC_ADDRESS_EQUAL(&(aggregator->partner_system), &(port->partner_oper.system)) &&
(aggregator->partner_system_priority == port->partner_oper.system_priority) &&
(aggregator->partner_oper_aggregator_key == port->partner_oper.key)
) &&
((!MAC_ADDRESS_EQUAL(&(port->partner_oper.system), &(null_mac_addr)) && /* partner answers */
!aggregator->is_individual) /* but is not individual OR */
)
) {
/* attach to the founded aggregator */
port->aggregator = aggregator;

----analysis: If a proper aggregator is found, port->aggregator is assigned here.

port->actor_port_aggregator_identifier =
port->aggregator->aggregator_identifier;
port->next_port_in_aggregator = aggregator->lag_ports;
port->aggregator->num_of_ports++;
aggregator->lag_ports = port;
slave_dbg(bond->dev, slave->dev, "Port %d joined LAG %d (existing LAG)\n",
port->actor_port_number,
port->aggregator->aggregator_identifier);

/* mark this port as selected */
port->sm_vars |= AD_PORT_SELECTED;
found = 1;

----analysis: Set found to 1 if port->aggregator is assigned.

break;
}
}
/* the port couldn't find an aggregator - attach it to a new
* aggregator
*/
if (!found) {
if (free_aggregator) {
/* assign port a new aggregator */
port->aggregator = free_aggregator;

----analysis: No proper free_aggregator is found. Therefore, port->aggregator cannot be assigned here.

port->actor_port_aggregator_identifier =
port->aggregator->aggregator_identifier;

/* update the new aggregator's parameters
* if port was responsed from the end-user
*/
if (port->actor_oper_port_key & AD_DUPLEX_KEY_MASKS)
/* if port is full duplex */
port->aggregator->is_individual = false;
else
port->aggregator->is_individual = true;

port->aggregator->actor_admin_aggregator_key =
port->actor_admin_port_key;
port->aggregator->actor_oper_aggregator_key =
port->actor_oper_port_key;
port->aggregator->partner_system =
port->partner_oper.system;
port->aggregator->partner_system_priority =
port->partner_oper.system_priority;
port->aggregator->partner_oper_aggregator_key = port->partner_oper.key;
port->aggregator->receive_state = 1;
port->aggregator->transmit_state = 1;
port->aggregator->lag_ports = port;
port->aggregator->num_of_ports++;

/* mark this port as selected */
port->sm_vars |= AD_PORT_SELECTED;

slave_dbg(bond->dev, port->slave->dev, "Port %d joined LAG %d (new LAG)\n",
port->actor_port_number,
port->aggregator->aggregator_identifier);
} else {
slave_err(bond->dev, port->slave->dev,
"Port %d did not find a suitable aggregator\n",
port->actor_port_number);

----analysis: The fault scenario goes to this branch, and port->aggregator remains NULL.

}
}
/* if all aggregator's ports are READY_N == TRUE, set ready=TRUE
* in all aggregator's ports, else set ready=FALSE in all
* aggregator's ports
*/
__set_agg_ports_ready(port->aggregator,
__agg_ports_are_ready(port->aggregator));

----analysis: port->aggregator is still NULL here, which causes the NULL dereference.


aggregator = __get_first_agg(port);
ad_agg_selection_logic(aggregator, update_slave_arr);

if (!port->aggregator->is_active)
port->actor_oper_port_state &= ~LACP_STATE_SYNCHRONIZATION;
}


Thanks.

2021-01-15 19:25:42

by Jarod Wilson

Subject: [PATCH net-next v3] bonding: add a vlan+srcmac tx hashing option

This comes from an end-user request, where they're running multiple VMs on
hosts with bonded interfaces connected to some interesting switch topologies,
where 802.3ad isn't an option. They're currently running a proprietary
solution that effectively achieves load-balancing of VMs and bandwidth
utilization improvements with a similar form of transmission algorithm.

Basically, each VM has its own vlan, so it always sends its traffic out
the same interface, unless that interface fails. Traffic gets split
between the interfaces, maintaining a consistent path, with failover still
available if an interface goes down.

Unlike bond_eth_hash(), this hash function is using the full source MAC
address instead of just the last byte, as there are so few components to
the hash, and in the no-vlan case, we would be returning just the last
byte of the source MAC as the hash value. It's entirely possible to have
two NICs in a bond with the same last byte of their MAC, but not the same
MAC, so this adjustment should guarantee distinct hashes in all cases.

This has been rudimentarily tested to provide similar results to the
proprietary solution it is aiming to replace. A patch for iproute2 is also
posted, to properly support the new mode there as well.

Cc: Jay Vosburgh <[email protected]>
Cc: Veaceslav Falico <[email protected]>
Cc: Andy Gospodarek <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Thomas Davis <[email protected]>
Cc: [email protected]
Signed-off-by: Jarod Wilson <[email protected]>
---
v2: verified netlink interfaces working, added Documentation, changed
tx hash mode name to vlan+mac for consistency and clarity.
v3: drop inline from hash function, use full source MAC, not just the
last byte, expand explanation in patch description, extend hash name to
vlan+srcmac.

Documentation/networking/bonding.rst | 13 +++++++++++
drivers/net/bonding/bond_main.c | 34 ++++++++++++++++++++++++++--
drivers/net/bonding/bond_options.c | 13 ++++++-----
include/linux/netdevice.h | 1 +
include/uapi/linux/if_bonding.h | 1 +
5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index adc314639085..36562dcd3e1e 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -951,6 +951,19 @@ xmit_hash_policy
packets will be distributed according to the encapsulated
flows.

+ vlan+srcmac
+
+ This policy uses a very rudimentary vland ID and source mac
+ ID hash to load-balance traffic per-vlan, with failover
+ should one leg fail. The intended use case is for a bond
+ shared by multiple virtual machines, all configured to
+ use their own vlan, to give lacp-like functionality
+ without requiring lacp-capable switching hardware.
+
+ The formula for the hash is simply
+
+ hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
+
The default value is layer2. This option was added in bonding
version 2.6.3. In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy. The
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5fe5232cc3f3..d4bc4d4e953b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -164,7 +164,7 @@ module_param(xmit_hash_policy, charp, 0);
MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
"0 for layer 2 (default), 1 for layer 3+4, "
"2 for layer 2+3, 3 for encap layer 2+3, "
- "4 for encap layer 3+4");
+ "4 for encap layer 3+4, 5 for vlan+srcmac");
module_param(arp_interval, int, 0);
MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
module_param_array(arp_ip_target, charp, NULL, 0);
@@ -1434,6 +1434,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
return NETDEV_LAG_HASH_E23;
case BOND_XMIT_POLICY_ENCAP34:
return NETDEV_LAG_HASH_E34;
+ case BOND_XMIT_POLICY_VLAN_SRCMAC:
+ return NETDEV_LAG_HASH_VLAN_SRCMAC;
default:
return NETDEV_LAG_HASH_UNKNOWN;
}
@@ -3494,6 +3496,27 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
return true;
}

+static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+{
+ struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+ u32 srcmac_vendor = 0, srcmac_dev = 0;
+ u16 vlan;
+ int i;
+
+ for (i = 0; i < 3; i++)
+ srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
+
+ for (i = 3; i < ETH_ALEN; i++)
+ srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i];
+
+ if (!skb_vlan_tag_present(skb))
+ return srcmac_vendor ^ srcmac_dev;
+
+ vlan = skb_vlan_tag_get(skb);
+
+ return vlan ^ srcmac_vendor ^ srcmac_dev;
+}
+
/* Extract the appropriate headers based on bond's xmit policy */
static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
struct flow_keys *fk)
@@ -3501,10 +3524,14 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
int noff, proto = -1;

- if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23) {
+ switch (bond->params.xmit_policy) {
+ case BOND_XMIT_POLICY_ENCAP23:
+ case BOND_XMIT_POLICY_ENCAP34:
memset(fk, 0, sizeof(*fk));
return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
fk, NULL, 0, 0, 0, 0);
+ default:
+ break;
}

fk->ports.ports = 0;
@@ -3556,6 +3583,9 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
skb->l4_hash)
return skb->hash;

+ if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
+ return bond_vlan_srcmac_hash(skb);
+
if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
!bond_flow_dissect(bond, skb, &flow))
return bond_eth_hash(skb);
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index a4e4e15f574d..c69400c5bf07 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -96,12 +96,13 @@ static const struct bond_opt_value bond_pps_tbl[] = {
};

static const struct bond_opt_value bond_xmit_hashtype_tbl[] = {
- { "layer2", BOND_XMIT_POLICY_LAYER2, BOND_VALFLAG_DEFAULT},
- { "layer3+4", BOND_XMIT_POLICY_LAYER34, 0},
- { "layer2+3", BOND_XMIT_POLICY_LAYER23, 0},
- { "encap2+3", BOND_XMIT_POLICY_ENCAP23, 0},
- { "encap3+4", BOND_XMIT_POLICY_ENCAP34, 0},
- { NULL, -1, 0},
+ { "layer2", BOND_XMIT_POLICY_LAYER2, BOND_VALFLAG_DEFAULT},
+ { "layer3+4", BOND_XMIT_POLICY_LAYER34, 0},
+ { "layer2+3", BOND_XMIT_POLICY_LAYER23, 0},
+ { "encap2+3", BOND_XMIT_POLICY_ENCAP23, 0},
+ { "encap3+4", BOND_XMIT_POLICY_ENCAP34, 0},
+ { "vlan+srcmac", BOND_XMIT_POLICY_VLAN_SRCMAC, 0},
+ { NULL, -1, 0},
};

static const struct bond_opt_value bond_arp_validate_tbl[] = {
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5b949076ed23..a94ce80a2fe1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2615,6 +2615,7 @@ enum netdev_lag_hash {
NETDEV_LAG_HASH_L23,
NETDEV_LAG_HASH_E23,
NETDEV_LAG_HASH_E34,
+ NETDEV_LAG_HASH_VLAN_SRCMAC,
NETDEV_LAG_HASH_UNKNOWN,
};

diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
index 45f3750aa861..e8eb4ad03cf1 100644
--- a/include/uapi/linux/if_bonding.h
+++ b/include/uapi/linux/if_bonding.h
@@ -94,6 +94,7 @@
#define BOND_XMIT_POLICY_LAYER23 2 /* layer 2+3 (IP ^ MAC) */
#define BOND_XMIT_POLICY_ENCAP23 3 /* encapsulated layer 2+3 */
#define BOND_XMIT_POLICY_ENCAP34 4 /* encapsulated layer 3+4 */
+#define BOND_XMIT_POLICY_VLAN_SRCMAC 5 /* vlan + source MAC */

/* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */
#define LACP_STATE_LACP_ACTIVITY 0x1
--
2.29.2

2021-01-19 05:36:07

by David Ahern

Subject: Re: [PATCH net-next v3] bonding: add a vlan+srcmac tx hashing option

On 1/15/21 12:21 PM, Jarod Wilson wrote:
> diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
> index adc314639085..36562dcd3e1e 100644
> --- a/Documentation/networking/bonding.rst
> +++ b/Documentation/networking/bonding.rst
> @@ -951,6 +951,19 @@ xmit_hash_policy
> packets will be distributed according to the encapsulated
> flows.
>
> + vlan+srcmac
> +
> + This policy uses a very rudimentary vland ID and source mac

s/vland/vlan/

> + ID hash to load-balance traffic per-vlan, with failover

drop ID on this line; just 'source mac'.


> + should one leg fail. The intended use case is for a bond
> + shared by multiple virtual machines, all configured to
> + use their own vlan, to give lacp-like functionality
> + without requiring lacp-capable switching hardware.
> +
> + The formula for the hash is simply
> +
> + hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
> +
> The default value is layer2. This option was added in bonding
> version 2.6.3. In earlier versions of bonding, this parameter
> does not exist, and the layer2 policy is the only policy. The

2021-01-19 05:40:00

by Jarod Wilson

Subject: Re: [PATCH net-next v3] bonding: add a vlan+srcmac tx hashing option

On Mon, Jan 18, 2021 at 04:10:38PM -0700, David Ahern wrote:
> On 1/15/21 12:21 PM, Jarod Wilson wrote:
> > diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
> > index adc314639085..36562dcd3e1e 100644
> > --- a/Documentation/networking/bonding.rst
> > +++ b/Documentation/networking/bonding.rst
> > @@ -951,6 +951,19 @@ xmit_hash_policy
> > packets will be distributed according to the encapsulated
> > flows.
> >
> > + vlan+srcmac
> > +
> > + This policy uses a very rudimentary vland ID and source mac
>
> s/vland/vlan/
>
> > + ID hash to load-balance traffic per-vlan, with failover
>
> drop ID on this line; just 'source mac'.

Bah. Crap. Didn't test documentation, clearly. Or proof-read it. Will fix
in v4. Hopefully, nothing else to change though...

--
Jarod Wilson
[email protected]

2021-01-19 05:40:02

by Jarod Wilson

Subject: [PATCH net-next v4] bonding: add a vlan+srcmac tx hashing option

This comes from an end-user request, where they're running multiple VMs on
hosts with bonded interfaces connected to some interesting switch topologies,
where 802.3ad isn't an option. They're currently running a proprietary
solution that effectively achieves load-balancing of VMs and bandwidth
utilization improvements with a similar form of transmission algorithm.

Basically, each VM has its own vlan, so it always sends its traffic out
the same interface, unless that interface fails. Traffic gets split
between the interfaces, maintaining a consistent path, with failover still
available if an interface goes down.

Unlike bond_eth_hash(), this hash function is using the full source MAC
address instead of just the last byte, as there are so few components to
the hash, and in the no-vlan case, we would be returning just the last
byte of the source MAC as the hash value. It's entirely possible to have
two NICs in a bond with the same last byte of their MAC, but not the same
MAC, so this adjustment should guarantee distinct hashes in all cases.

This has been rudimentarily tested to provide similar results to the
proprietary solution it is aiming to replace. A patch for iproute2 is also
posted, to properly support the new mode there as well.

Cc: Jay Vosburgh <[email protected]>
Cc: Veaceslav Falico <[email protected]>
Cc: Andy Gospodarek <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Thomas Davis <[email protected]>
Cc: [email protected]
Signed-off-by: Jarod Wilson <[email protected]>
---
v2: verified netlink interfaces working, added Documentation, changed
tx hash mode name to vlan+mac for consistency and clarity.
v3: drop inline from hash function, use full source MAC, not just the
last byte, expand explanation in patch description, extend hash name to
vlan+srcmac.
v4: fix documentation issues pointed out by David Ahern

Documentation/networking/bonding.rst | 13 +++++++++++
drivers/net/bonding/bond_main.c | 34 ++++++++++++++++++++++++++--
drivers/net/bonding/bond_options.c | 13 ++++++-----
include/linux/netdevice.h | 1 +
include/uapi/linux/if_bonding.h | 1 +
5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index adc314639085..36562dcd3e1e 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -951,6 +951,19 @@ xmit_hash_policy
packets will be distributed according to the encapsulated
flows.

+ vlan+srcmac
+
+ This policy uses a very rudimentary vlan ID and source mac
+ hash to load-balance traffic per-vlan, with failover
+ should one leg fail. The intended use case is for a bond
+ shared by multiple virtual machines, all configured to
+ use their own vlan, to give lacp-like functionality
+ without requiring lacp-capable switching hardware.
+
+ The formula for the hash is simply
+
+ hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
+
The default value is layer2. This option was added in bonding
version 2.6.3. In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy. The
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5fe5232cc3f3..d4bc4d4e953b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -164,7 +164,7 @@ module_param(xmit_hash_policy, charp, 0);
MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
"0 for layer 2 (default), 1 for layer 3+4, "
"2 for layer 2+3, 3 for encap layer 2+3, "
- "4 for encap layer 3+4");
+ "4 for encap layer 3+4, 5 for vlan+srcmac");
module_param(arp_interval, int, 0);
MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
module_param_array(arp_ip_target, charp, NULL, 0);
@@ -1434,6 +1434,8 @@ static enum netdev_lag_hash bond_lag_hash_type(struct bonding *bond,
return NETDEV_LAG_HASH_E23;
case BOND_XMIT_POLICY_ENCAP34:
return NETDEV_LAG_HASH_E34;
+ case BOND_XMIT_POLICY_VLAN_SRCMAC:
+ return NETDEV_LAG_HASH_VLAN_SRCMAC;
default:
return NETDEV_LAG_HASH_UNKNOWN;
}
@@ -3494,6 +3496,27 @@ static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
return true;
}

+static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+{
+ struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+ u32 srcmac_vendor = 0, srcmac_dev = 0;
+ u16 vlan;
+ int i;
+
+ for (i = 0; i < 3; i++)
+ srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
+
+ for (i = 3; i < ETH_ALEN; i++)
+ srcmac_dev = (srcmac_dev << 8) | mac_hdr->h_source[i];
+
+ if (!skb_vlan_tag_present(skb))
+ return srcmac_vendor ^ srcmac_dev;
+
+ vlan = skb_vlan_tag_get(skb);
+
+ return vlan ^ srcmac_vendor ^ srcmac_dev;
+}
+
/* Extract the appropriate headers based on bond's xmit policy */
static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
struct flow_keys *fk)
@@ -3501,10 +3524,14 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
int noff, proto = -1;

- if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23) {
+ switch (bond->params.xmit_policy) {
+ case BOND_XMIT_POLICY_ENCAP23:
+ case BOND_XMIT_POLICY_ENCAP34:
memset(fk, 0, sizeof(*fk));
return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
fk, NULL, 0, 0, 0, 0);
+ default:
+ break;
}

fk->ports.ports = 0;
@@ -3556,6 +3583,9 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
skb->l4_hash)
return skb->hash;

+ if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
+ return bond_vlan_srcmac_hash(skb);
+
if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
!bond_flow_dissect(bond, skb, &flow))
return bond_eth_hash(skb);
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index a4e4e15f574d..c69400c5bf07 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -96,12 +96,13 @@ static const struct bond_opt_value bond_pps_tbl[] = {
};

static const struct bond_opt_value bond_xmit_hashtype_tbl[] = {
- { "layer2", BOND_XMIT_POLICY_LAYER2, BOND_VALFLAG_DEFAULT},
- { "layer3+4", BOND_XMIT_POLICY_LAYER34, 0},
- { "layer2+3", BOND_XMIT_POLICY_LAYER23, 0},
- { "encap2+3", BOND_XMIT_POLICY_ENCAP23, 0},
- { "encap3+4", BOND_XMIT_POLICY_ENCAP34, 0},
- { NULL, -1, 0},
+ { "layer2", BOND_XMIT_POLICY_LAYER2, BOND_VALFLAG_DEFAULT},
+ { "layer3+4", BOND_XMIT_POLICY_LAYER34, 0},
+ { "layer2+3", BOND_XMIT_POLICY_LAYER23, 0},
+ { "encap2+3", BOND_XMIT_POLICY_ENCAP23, 0},
+ { "encap3+4", BOND_XMIT_POLICY_ENCAP34, 0},
+ { "vlan+srcmac", BOND_XMIT_POLICY_VLAN_SRCMAC, 0},
+ { NULL, -1, 0},
};

static const struct bond_opt_value bond_arp_validate_tbl[] = {
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5b949076ed23..a94ce80a2fe1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2615,6 +2615,7 @@ enum netdev_lag_hash {
NETDEV_LAG_HASH_L23,
NETDEV_LAG_HASH_E23,
NETDEV_LAG_HASH_E34,
+ NETDEV_LAG_HASH_VLAN_SRCMAC,
NETDEV_LAG_HASH_UNKNOWN,
};

diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
index 45f3750aa861..e8eb4ad03cf1 100644
--- a/include/uapi/linux/if_bonding.h
+++ b/include/uapi/linux/if_bonding.h
@@ -94,6 +94,7 @@
#define BOND_XMIT_POLICY_LAYER23 2 /* layer 2+3 (IP ^ MAC) */
#define BOND_XMIT_POLICY_ENCAP23 3 /* encapsulated layer 2+3 */
#define BOND_XMIT_POLICY_ENCAP34 4 /* encapsulated layer 3+4 */
+#define BOND_XMIT_POLICY_VLAN_SRCMAC 5 /* vlan + source MAC */

/* 802.3ad port state definitions (43.4.2.2 in the 802.3ad standard) */
#define LACP_STATE_LACP_ACTIVITY 0x1
--
2.29.2

2021-01-20 06:39:31

by patchwork-bot+netdevbpf

[permalink] [raw]
Subject: Re: [PATCH net-next v4] bonding: add a vlan+srcmac tx hashing option

Hello:

This patch was applied to netdev/net-next.git (refs/heads/master):

On Mon, 18 Jan 2021 20:09:27 -0500 you wrote:
> This comes from an end-user request, where they're running multiple VMs on
> hosts with bonded interfaces connected to some interesting switch topologies,
> where 802.3ad isn't an option. They're currently running a proprietary
> solution that effectively achieves load-balancing of VMs and bandwidth
> utilization improvements with a similar form of transmission algorithm.
>
> Basically, each VM has its own vlan, so it always sends its traffic out
> the same interface, unless that interface fails. Traffic gets split
> between the interfaces, maintaining a consistent path, with failover still
> available if an interface goes down.
>
> [...]

Here is the summary with links:
- [net-next,v4] bonding: add a vlan+srcmac tx hashing option
https://git.kernel.org/netdev/net-next/c/7b8fc0103bb5

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


2021-01-23 06:15:19

by moyufeng

[permalink] [raw]
Subject: Re: question about bonding mode 4

Ping...
Any comments? Thanks!

On 2021/1/15 10:02, moyufeng wrote:
> Hi Team,
>
> I have a question about bonding. While testing bonding mode 4
> scenarios, I found that, with very low probability, a NULL
> pointer dereference occurs. The following information is displayed:
>
> [99359.795934] bond0: (slave eth13.2001): Port 2 did not find a suitable aggregator
> [99359.796960] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
> [99359.798127] Mem abort info:
> [99359.798526] ESR = 0x96000004
> [99359.798938] EC = 0x25: DABT (current EL), IL = 32 bits
> [99359.799673] SET = 0, FnV = 0
> [99359.800106] EA = 0, S1PTW = 0
> [99359.800554] Data abort info:
> [99359.800952] ISV = 0, ISS = 0x00000004
> [99359.801522] CM = 0, WnR = 0
> [99359.801970] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000c64e6000
> [99359.802876] [0000000000000020] pgd=0000000000000000
> [99359.803555] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [99359.804369] Modules linked in: bonding hns3(-) hclgevf hnae3 [last unloaded: bonding]
> [99359.805494] CPU: 1 PID: 951 Comm: kworker/u10:2 Not tainted 5.7.0-rc4+ #1
> [99359.806455] Hardware name: linux,dummy-virt (DT)
> [99359.807107] Workqueue: bond0 bond_3ad_state_machine_handler [bonding]
> [99359.808056] pstate: 60c00005 (nZCv daif +PAN +UAO)
> [99359.808722] pc : bond_3ad_state_machine_handler+0x7fc/0xdb8 [bonding]
> [99359.809652] lr : bond_3ad_state_machine_handler+0x7f4/0xdb8 [bonding]
> [99359.810535] sp : ffff80001882bd20
> [99359.811012] x29: ffff80001882bd20 x28: ffff000085939a38
> [99359.811791] x27: ffff00008649bb68 x26: 00000000aaaaaaab
> [99359.812871] x25: ffff800009401000 x24: ffff800009408de4
> [99359.814049] x23: ffff80001882bd98 x22: ffff00008649b880
> [99359.815210] x21: 0000000000000000 x20: ffff000085939a00
> [99359.816401] x19: ffff00008649b880 x18: ffff800012572988
> [99359.817637] x17: 0000000000000000 x16: 0000000000000000
> [99359.818870] x15: ffff80009882b987 x14: 726f746167657267
> [99359.820090] x13: 676120656c626174 x12: 697573206120646e
> [99359.821374] x11: 696620746f6e2064 x10: 696420322074726f
> [99359.822659] x9 : 50203a2931303032 x8 : 0000000000081391
> [99359.823891] x7 : ffff8000108e3ad0 x6 : ffff8000128858bb
> [99359.825109] x5 : 0000000000000000 x4 : 0000000000000000
> [99359.826262] x3 : 00000000ffffffff x2 : 906b329bb5362a00
> [99359.827394] x1 : 906b329bb5362a00 x0 : 0000000000000000
> [99359.828540] Call trace:
> [99359.829071] bond_3ad_state_machine_handler+0x7fc/0xdb8 [bonding]
> [99359.830367] process_one_work+0x15c/0x4a0
> [99359.831216] worker_thread+0x50/0x478
> [99359.832022] kthread+0x130/0x160
> [99359.832716] ret_from_fork+0x10/0x18
> [99359.833487] Code: 910c0021 95f704bb f9403f80 b5ffe300 (f9401000)
> [99359.834742] ---[ end trace c7a8e02914afc4e0 ]---
> [99359.835817] Kernel panic - not syncing: Fatal exception in interrupt
> [99359.837334] SMP: stopping secondary CPUs
> [99359.838277] Kernel Offset: disabled
> [99359.839086] CPU features: 0x080002,22208218
> [99359.840053] Memory Limit: none
> [99359.840783] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>
> The test procedure is as follows:
> 1. Configure bonding and set it to mode 4.
> echo "4" > /sys/class/net/bond0/bonding/mode
> ifconfig bond0 up
>
> 2. Configure two VLANs and add them to the bonding in step 1.
> vconfig add eth0 2001
> vconfig add eth1 2001
> ifenslave bond0 eth0.2001 eth1.2001
>
> 3. Unload the network device driver and bonding driver.
> rmmod hns3
> rmmod hclge
> rmmod hnae3
> rmmod bonding.ko
>
> 4. Repeat the preceding steps for a long time.
>
> By checking the logic in ad_port_selection_logic(), I found that
> if we enter the "Port %d did not find a suitable aggregator" branch,
> port->aggregator remains NULL, causing the problem.
>
> So I'd like to ask under what circumstances this branch is taken,
> and what should be done in this case?
>
>
> The detailed code analysis is as follows:
>
> static void ad_port_selection_logic(struct port *port, bool *update_slave_arr)
> {
> [...]
> /* if the port is connected to other aggregator, detach it */
> if (port->aggregator) {
> /* detach the port from its former aggregator */
> temp_aggregator = port->aggregator;
> for (curr_port = temp_aggregator->lag_ports; curr_port;
> last_port = curr_port,
> curr_port = curr_port->next_port_in_aggregator) {
> if (curr_port == port) {
> temp_aggregator->num_of_ports--;
> /* if it is the first port attached to the
> * aggregator
> */
> if (!last_port) {
> temp_aggregator->lag_ports =
> port->next_port_in_aggregator;
> } else {
> /* not the first port attached to the
> * aggregator
> */
> last_port->next_port_in_aggregator =
> port->next_port_in_aggregator;
> }
>
> /* clear the port's relations to this
> * aggregator
> */
> port->aggregator = NULL;
>
> ----analysis: port->aggregator is set to NULL here, in the detach block near the beginning of ad_port_selection_logic().
>
> port->next_port_in_aggregator = NULL;
> port->actor_port_aggregator_identifier = 0;
>
> slave_dbg(bond->dev, port->slave->dev, "Port %d left LAG %d\n",
> port->actor_port_number,
> temp_aggregator->aggregator_identifier);
> /* if the aggregator is empty, clear its
> * parameters, and set it ready to be attached
> */
> if (!temp_aggregator->lag_ports)
> ad_clear_agg(temp_aggregator);
> break;
> }
> }
> if (!curr_port) {
> /* meaning: the port was related to an aggregator
> * but was not on the aggregator port list
> */
> net_warn_ratelimited("%s: (slave %s): Warning: Port %d was related to aggregator %d but was not on its port list\n",
> port->slave->bond->dev->name,
> port->slave->dev->name,
> port->actor_port_number,
> port->aggregator->aggregator_identifier);
> }
> }
> /* search on all aggregators for a suitable aggregator for this port */
> bond_for_each_slave(bond, slave, iter) {
> aggregator = &(SLAVE_AD_INFO(slave)->aggregator);
> /* keep a free aggregator for later use(if needed) */
> if (!aggregator->lag_ports) {
> if (!free_aggregator)
> free_aggregator = aggregator;
>
> ----analysis: free_aggregator is saved if a free aggregator is found. But in this case, no free aggregator is available.
>
> continue;
> }
> /* check if current aggregator suits us */
> if (((aggregator->actor_oper_aggregator_key == port->actor_oper_port_key) && /* if all parameters match AND */
> MAC_ADDRESS_EQUAL(&(aggregator->partner_system), &(port->partner_oper.system)) &&
> (aggregator->partner_system_priority == port->partner_oper.system_priority) &&
> (aggregator->partner_oper_aggregator_key == port->partner_oper.key)
> ) &&
> ((!MAC_ADDRESS_EQUAL(&(port->partner_oper.system), &(null_mac_addr)) && /* partner answers */
> !aggregator->is_individual) /* but is not individual OR */
> )
> ) {
> /* attach to the founded aggregator */
> port->aggregator = aggregator;
>
> ----analysis: If a proper aggregator is found, port->aggregator is assigned here.
>
> port->actor_port_aggregator_identifier =
> port->aggregator->aggregator_identifier;
> port->next_port_in_aggregator = aggregator->lag_ports;
> port->aggregator->num_of_ports++;
> aggregator->lag_ports = port;
> slave_dbg(bond->dev, slave->dev, "Port %d joined LAG %d (existing LAG)\n",
> port->actor_port_number,
> port->aggregator->aggregator_identifier);
>
> /* mark this port as selected */
> port->sm_vars |= AD_PORT_SELECTED;
> found = 1;
>
> ----analysis: Set found to 1 if port->aggregator is assigned.
>
> break;
> }
> }
> /* the port couldn't find an aggregator - attach it to a new
> * aggregator
> */
> if (!found) {
> if (free_aggregator) {
> /* assign port a new aggregator */
> port->aggregator = free_aggregator;
>
> ----analysis: No proper free_aggregator is found. Therefore, port->aggregator cannot be assigned here.
>
> port->actor_port_aggregator_identifier =
> port->aggregator->aggregator_identifier;
>
> /* update the new aggregator's parameters
> * if port was responsed from the end-user
> */
> if (port->actor_oper_port_key & AD_DUPLEX_KEY_MASKS)
> /* if port is full duplex */
> port->aggregator->is_individual = false;
> else
> port->aggregator->is_individual = true;
>
> port->aggregator->actor_admin_aggregator_key =
> port->actor_admin_port_key;
> port->aggregator->actor_oper_aggregator_key =
> port->actor_oper_port_key;
> port->aggregator->partner_system =
> port->partner_oper.system;
> port->aggregator->partner_system_priority =
> port->partner_oper.system_priority;
> port->aggregator->partner_oper_aggregator_key = port->partner_oper.key;
> port->aggregator->receive_state = 1;
> port->aggregator->transmit_state = 1;
> port->aggregator->lag_ports = port;
> port->aggregator->num_of_ports++;
>
> /* mark this port as selected */
> port->sm_vars |= AD_PORT_SELECTED;
>
> slave_dbg(bond->dev, port->slave->dev, "Port %d joined LAG %d (new LAG)\n",
> port->actor_port_number,
> port->aggregator->aggregator_identifier);
> } else {
> slave_err(bond->dev, port->slave->dev,
> "Port %d did not find a suitable aggregator\n",
> port->actor_port_number);
>
> ----analysis: The fault scenario goes to this branch, and port->aggregator remains NULL.
>
> }
> }
> /* if all aggregator's ports are READY_N == TRUE, set ready=TRUE
> * in all aggregator's ports, else set ready=FALSE in all
> * aggregator's ports
> */
> __set_agg_ports_ready(port->aggregator,
> __agg_ports_are_ready(port->aggregator));
>
> ----analysis: port->aggregator is still NULL, which causes problem.
>
>
> aggregator = __get_first_agg(port);
> ad_agg_selection_logic(aggregator, update_slave_arr);
>
> if (!port->aggregator->is_active)
> port->actor_oper_port_state &= ~LACP_STATE_SYNCHRONIZATION;
> }
>
>
> Thanks.
> .
>

2021-01-29 19:15:27

by Jay Vosburgh

[permalink] [raw]
Subject: Re: question about bonding mode 4

moyufeng <[email protected]> wrote:

>Ping...
>Any comments? Thanks!
>
>On 2021/1/15 10:02, moyufeng wrote:
>> Hi Team,
>>
>> I have a question about bonding. During testing bonding mode 4
>> scenarios, I find that there is a very low probability that
>> the pointer is null. The following information is displayed:
>>
>> [...]
>>
>> The test procedure is as follows:
>> 1. Configure bonding and set it to mode 4.
>> echo "4" > /sys/class/net/bond0/bonding/mode
>> ifconfig bond0 up
>>
>> 2. Configure two VLANs and add them to the bonding in step 1.
>> vconfig add eth0 2001
>> vconfig add eth1 2001
>> ifenslave bond0 eth0.2001 eth1.2001
>>
>> 3. Unload the network device driver and bonding driver.
>> rmmod hns3
>> rmmod hclge
>> rmmod hnae3
>> rmmod bonding.ko

Are you running the above in a script, and can you share the
entire thing?

Does the issue occur with the current net-next?

>> 4. Repeat the preceding steps for a long time.

When you run this test, what are the network interfaces eth0 and
eth1 connected to, and are those ports configured for VLAN 2001 and
LACP?

>> By checking the logic in ad_port_selection_logic(), I find that
>> if enter the branch "Port %d did not find a suitable aggregator",
>> the value of port->aggregator will be NULL, causing the problem.
>>
>> So I'd like to ask what circumstances will be involved in this
>> branch, and what should be done in this case?

Well, in principle, this shouldn't ever happen. Every port
structure contains an aggregator structure, so there should always be
one available somewhere. I'm going to speculate that there's a race
condition somewhere in the teardown processing vs the LACP state machine
that invalidates this presumption.

>> The detailed code analysis is as follows:

[...]

>> /* if all aggregator's ports are READY_N == TRUE, set ready=TRUE
>> * in all aggregator's ports, else set ready=FALSE in all
>> * aggregator's ports
>> */
>> __set_agg_ports_ready(port->aggregator,
>> __agg_ports_are_ready(port->aggregator));
>>
>> ----analysis: port->aggregator is still NULL, which causes problem.
>>
>> aggregator = __get_first_agg(port);
>> ad_agg_selection_logic(aggregator, update_slave_arr);
>>
>> if (!port->aggregator->is_active)
>> port->actor_oper_port_state &= ~LACP_STATE_SYNCHRONIZATION;

Correct, if the "did not find a suitable aggregator" path is
taken, port->aggregator is NULL and bad things happen in the above
block.

This is something that needs to be fixed, but I'm also concerned
that there are other issues lurking, so I'd like to be able to reproduce
this.

-J

---
-Jay Vosburgh, [email protected]

2021-01-30 09:43:59

by moyufeng

[permalink] [raw]
Subject: Re: question about bonding mode 4



On 2021/1/30 3:11, Jay Vosburgh wrote:
> moyufeng <[email protected]> wrote:
>
>> Ping...
>> Any comments? Thanks!
>>
>> On 2021/1/15 10:02, moyufeng wrote:
>>> Hi Team,
>>>
>>> I have a question about bonding. During testing bonding mode 4
>>> scenarios, I find that there is a very low probability that
>>> the pointer is null. The following information is displayed:
>>>
>>> [...]
>>>
>>> The test procedure is as follows:
>>> 1. Configure bonding and set it to mode 4.
>>> echo "4" > /sys/class/net/bond0/bonding/mode
>>> ifconfig bond0 up
>>>
>>> 2. Configure two VLANs and add them to the bonding in step 1.
>>> vconfig add eth0 2001
>>> vconfig add eth1 2001
>>> ifenslave bond0 eth0.2001 eth1.2001
>>>
>>> 3. Unload the network device driver and bonding driver.
>>> rmmod hns3
>>> rmmod hclge
>>> rmmod hnae3
>>> rmmod bonding.ko
>
> Are you running the above in a script, and can you share the
> entire thing?

Thanks for your reply.

Yes, it's a script, as below. However, the reproduction rate is
very low; so far, the problem has occurred only once :(

#start
#!/bin/bash
Version=$(uname -r)
KoPath=/lib/modules/"${Version}"
while [ 1 ];do
insmod ${KoPath}/hnae3.ko
insmod ${KoPath}/hclgevf.ko
insmod ${KoPath}/hns3.ko
insmod ${KoPath}/bonding.ko

ifconfig eth0 up
ifconfig eth1 up

echo "4" > /sys/class/net/bond0/bonding/mode
ifconfig bond0 192.168.1.101
echo 100 > "/sys/class/net/bond0/bonding/miimon"
sleep 8

vconfig add eth0 2001
vconfig add eth1 2001
ifconfig eth0.2001 192.168.2.101 up
ifconfig eth1.2001 192.168.3.101 up
sleep 8

ifenslave bond0 eth0.2001 eth13.2001
sleep 8

rmmod hns3
rmmod hclge
rmmod hclgevf
rmmod hnae3
rmmod bonding.ko
sleep 5
done
#end

>
> Does the issue occur with the current net-next?
>
Yes

>>> 4. Repeat the preceding steps for a long time.
>
> When you run this test, what are the network interfaces eth0 and
> eth1 connected to, and are those ports configured for VLAN 2001 and
> LACP?
>
Yes, the configurations at both ends are the same.

>>> By checking the logic in ad_port_selection_logic(), I find that
>>> if enter the branch "Port %d did not find a suitable aggregator",
>>> the value of port->aggregator will be NULL, causing the problem.
>>>
>>> So I'd like to ask what circumstances will be involved in this
>>> branch, and what should be done in this case?
>
> Well, in principle, this shouldn't ever happen. Every port
> structure contains an aggregator structure, so there should always be
> one available somewhere. I'm going to speculate that there's a race
> condition somewhere in the teardown processing vs the LACP state machine
> that invalidates this presumption.
>
I agree with your inference, but I don't have a good way to prove it,
since the reproduction rate is so low.

>>> The detailed code analysis is as follows:
>
> [...]
>
>>> /* if all aggregator's ports are READY_N == TRUE, set ready=TRUE
>>> * in all aggregator's ports, else set ready=FALSE in all
>>> * aggregator's ports
>>> */
>>> __set_agg_ports_ready(port->aggregator,
>>> __agg_ports_are_ready(port->aggregator));
>>>
>>> ----analysis: port->aggregator is still NULL, which causes problem.
>>>
>>> aggregator = __get_first_agg(port);
>>> ad_agg_selection_logic(aggregator, update_slave_arr);
>>>
>>> if (!port->aggregator->is_active)
>>> port->actor_oper_port_state &= ~LACP_STATE_SYNCHRONIZATION;
>
> Correct, if the "did not find a suitable aggregator" path is
> taken, port->aggregator is NULL and bad things happen in the above
> block.
>
> This is something that needs to be fixed, but I'm also concerned
> that there are other issues lurking, so I'd like to be able to reproduce
> this.
>
> -J
>
I will continue trying to reproduce the problem and look for ways to
increase the reproduction rate.

Since this code path is clearly broken, could you help fix it?

Thank you! :)

> ---
> -Jay Vosburgh, [email protected]
> .
>