2019-08-23 17:35:06

by zhangsha (A)

Subject: [PATCH v2] bonding: force enable lacp port after link state recovery for 802.3ad

From: Sha Zhang <[email protected]>

After commit 334031219a84 ("bonding/802.3ad: fix slave link
initialization transition states") was merged,
the slave's link status can change from BOND_LINK_DOWN
to BOND_LINK_FAIL in the following scenario:
- The driver reports loss of carrier and
the bonding driver receives a NETDEV_DOWN notification;
- The slave's duplex and speed are zeroed and
its port->is_enabled is cleared to 'false';
- The driver reports link recovery and
the bonding driver receives a NETDEV_UP notification;
- If getting speed/duplex fails at this point, the link status
is changed to BOND_LINK_FAIL;
- The MII monitor later recovers the slave's speed/duplex
and sets the link status to BOND_LINK_UP, but leaves
'port->is_enabled' as 'false'.
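The sequence above can be sketched as a small user-space C model. This is not kernel code: enum model_link, struct model_slave and the model_* functions are hypothetical names that only mimic the fields mentioned in the list (link status, speed/duplex, port->is_enabled), and 0 stands for a zeroed speed/duplex as described above. Under those assumptions, the final state has the link UP while the LACP port is still disabled.

```c
/* Hypothetical user-space model of the scenario; not the kernel's API. */
#include <assert.h>
#include <stdbool.h>

enum model_link { MODEL_LINK_UP, MODEL_LINK_FAIL, MODEL_LINK_DOWN };

struct model_slave {
	enum model_link link;
	int speed;         /* 0 models a zeroed/unknown speed */
	int duplex;        /* 0 models a zeroed/unknown duplex */
	bool port_enabled; /* stands in for port->is_enabled */
};

/* Carrier loss / NETDEV_DOWN: speed and duplex are zeroed and the
 * LACP port is disabled. */
void model_netdev_down(struct model_slave *s)
{
	s->link = MODEL_LINK_DOWN;
	s->speed = 0;
	s->duplex = 0;
	s->port_enabled = false;
}

/* NETDEV_UP arrives but the speed/duplex query fails, so the link is
 * marked FAIL; the port stays disabled. */
void model_netdev_up_query_failed(struct model_slave *s)
{
	s->link = MODEL_LINK_FAIL;
}

/* A later MII monitor pass recovers speed/duplex and marks the link UP,
 * but (before the patch) nothing sets port_enabled back to true. */
void model_miimon_recover(struct model_slave *s, int speed, int duplex)
{
	s->speed = speed;
	s->duplex = duplex;
	s->link = MODEL_LINK_UP;
}

/* Runs the whole sequence from a healthy starting state. */
struct model_slave model_run_scenario(void)
{
	struct model_slave s = { MODEL_LINK_UP, 10000, 1, true };

	model_netdev_down(&s);
	model_netdev_up_query_failed(&s);
	assert(s.link == MODEL_LINK_FAIL);
	model_miimon_recover(&s, 10000, 1);
	return s;
}
```

Calling model_run_scenario() yields link == MODEL_LINK_UP with port_enabled == false, which is the stuck combination described next.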

In this scenario, the LACP port is not enabled even though its speed
and duplex are valid. The bond will not send LACPDUs, and its
state stays 'AD_STATE_DEFAULTED' forever. The simplest fix, I think,
is to call bond_3ad_handle_link_change() in bond_miimon_commit();
this function re-checks the slave's speed and can then enable the LACP port.
Once enabled, the LACP port can run its state machine normally
after link recovery.
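A similarly hedged sketch of the proposed change, again as a user-space model rather than kernel code. model_speed_duplex_changed() and model_handle_link_change() are hypothetical stand-ins that only approximate the effect on port->is_enabled of bond_3ad_adapter_speed_duplex_changed() and bond_3ad_handle_link_change(slave, BOND_LINK_UP) as discussed in this thread; the speed/duplex-derived key is reduced to a single int. Starting from the stuck state (link UP, port disabled), only the new call leaves the port enabled.

```c
/* Hypothetical user-space model of the one-line change; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_link { MODEL_LINK_UP, MODEL_LINK_FAIL, MODEL_LINK_DOWN };

struct model_port {
	bool is_enabled; /* stands in for port->is_enabled */
	int oper_speed;  /* stands in for the speed/duplex-derived key */
};

struct model_slave {
	enum model_link link;
	int speed;
	struct model_port port;
};

/* Old call: refresh speed-derived data only; is_enabled is untouched. */
void model_speed_duplex_changed(struct model_slave *s)
{
	s->port.oper_speed = s->speed;
}

/* New call: refresh speed-derived data and set is_enabled from the
 * reported link state (enabled for UP, disabled otherwise). */
void model_handle_link_change(struct model_slave *s, enum model_link link)
{
	s->port.oper_speed = s->speed;
	s->port.is_enabled = (link == MODEL_LINK_UP);
}

/* Simplified BOND_LINK_NOCHANGE branch of the miimon commit step;
 * use_fix selects the helper called after the patch instead of before. */
void model_miimon_nochange(struct model_slave *s, bool use_fix)
{
	assert(s != NULL);
	if (s->link != MODEL_LINK_UP)
		return;
	if (use_fix)
		model_handle_link_change(s, MODEL_LINK_UP);
	else
		model_speed_duplex_changed(s);
}

/* Starts from the stuck state (link UP, port disabled), runs one miimon
 * pass and reports whether the LACP port ends up enabled. */
bool model_port_enabled_after_pass(bool use_fix)
{
	struct model_slave s = { MODEL_LINK_UP, 10000, { false, 0 } };

	model_miimon_nochange(&s, use_fix);
	return s.port.is_enabled;
}
```

In this model, model_port_enabled_after_pass(false) returns false and model_port_enabled_after_pass(true) returns true.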

Signed-off-by: Sha Zhang <[email protected]>
---
drivers/net/bonding/bond_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 931d9d9..ef4ec99 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2206,7 +2206,7 @@ static void bond_miimon_commit(struct bonding *bond)
 			 */
 			if (BOND_MODE(bond) == BOND_MODE_8023AD &&
 			    slave->link == BOND_LINK_UP)
-				bond_3ad_adapter_speed_duplex_changed(slave);
+				bond_3ad_handle_link_change(slave, BOND_LINK_UP);
 			continue;

 		case BOND_LINK_UP:
--
1.8.3.1


2019-08-27 22:06:31

by David Miller

Subject: Re: [PATCH v2] bonding: force enable lacp port after link state recovery for 802.3ad

From: <[email protected]>
Date: Fri, 23 Aug 2019 11:42:09 +0800

> - If speed/duplex getting failed here, the link status
> will be changed to BOND_LINK_FAIL;

How does it fail at this step? I suspect this is a driver-specific
problem.

2019-08-28 20:30:42

by David Miller

Subject: Re: [PATCH v2] bonding: force enable lacp port after link state recovery for 802.3ad


You've had enough time to respond to my feedback question.

I'm tossing this patch.

2019-08-29 11:35:46

by zhangsha (A)

Subject: RE: [PATCH v2] bonding: force enable lacp port after link state recovery for 802.3ad



> -----Original Message-----
> From: David Miller [mailto:[email protected]]
> Sent: 2019-08-28 6:05
> To: zhangsha (A) <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; yuehaibing
> <[email protected]>; hunongda <[email protected]>;
> Chenzhendong (alex) <[email protected]>
> Subject: Re: [PATCH v2] bonding: force enable lacp port after link state
> recovery for 802.3ad
>
> From: <[email protected]>
> Date: Fri, 23 Aug 2019 11:42:09 +0800
>
> > - If speed/duplex getting failed here, the link status
> > will be changed to BOND_LINK_FAIL;
>
> How does it fail at this step? I suspect this is a driver specific problem.

Hi, David,
I'm really sorry for the delayed reply, and thank you for your feedback.

I was testing on kernel 4.19 with a Huawei hinic card when the problem occurred.
I checked dmesg and found the following, in this order:
1) link status definitely down for interface eth6, disabling it
2) link status up again after 0 ms for interface eth6
3) the partner's system MAC became "00:00:00:00:00:00".
From reading the code, I believe the slave's link status was changed
from BOND_LINK_DOWN to BOND_LINK_FAIL.

As this problem has occurred only once, I am not yet sure whether it is a
driver-specific problem. However, the log of commit 4d2c0cda says "Some NIC
drivers don't have correct speed/duplex settings at the time they send
NETDEV_UP notification ...", so I am inclined to believe it is not.

For my problem, I think it is not enough for link monitoring (miimon) to only set
SPEED/DUPLEX correctly; the LACP port should also be enabled at the same time.