2019-05-10 22:21:47

by Jarod Wilson

[permalink] [raw]
Subject: [PATCH] bonding: fix arp_validate toggling in active-backup mode

There's currently a problem with toggling arp_validate on and off with an
active-backup bond. At the moment, you can start up a bond, like so:

modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_targets=192.168.1.1
ip link set bond0 down
echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
ip link set bond0 up
ip addr add 192.168.1.2/24 dev bond0

Pings to 192.168.1.1 work just fine. Now turn on arp_validate:

echo 1 > /sys/class/net/bond0/bonding/arp_validate

Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
arp_validate off again, the link falls flat on it's face:

echo 0 > /sys/class/net/bond0/bonding/arp_validate
dmesg
...
[133191.911987] bond0: Setting arp_validate to none (0)
[133194.257793] bond0: bond_should_notify_peers: slave ens4f0
[133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
[133194.259000] bond0: making interface ens4f1 the new active one
[133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
[133197.331191] bond0: now running without any active interface!

The problem lies in bond_options.c, where passing in arp_validate=0
results in bond->recv_probe getting set to NULL. This flies directly in
the face of commit 3fe68df97c7f, which says we need to set recv_probe =
bond_arp_recv, even if we're not using arp_validate. Said commit fixed
this in bond_option_arp_interval_set, but missed that we can get to that
same state in bond_option_arp_validate_set as well.

One solution would be to universally set recv_probe = bond_arp_recv here
as well, but I don't think bond_option_arp_validate_set has any business
touching recv_probe at all, and that should be left to the arp_interval
code, so we can just make things much tidier here.

Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
CC: Jay Vosburgh <[email protected]>
CC: Veaceslav Falico <[email protected]>
CC: Andy Gospodarek <[email protected]>
CC: "David S. Miller" <[email protected]>
CC: [email protected]
Signed-off-by: Jarod Wilson <[email protected]>
---
drivers/net/bonding/bond_options.c | 7 -------
1 file changed, 7 deletions(-)

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index da1fc17295d9..b996967af8d9 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -1098,13 +1098,6 @@ static int bond_option_arp_validate_set(struct bonding *bond,
{
netdev_dbg(bond->dev, "Setting arp_validate to %s (%llu)\n",
newval->string, newval->value);
-
- if (bond->dev->flags & IFF_UP) {
- if (!newval->value)
- bond->recv_probe = NULL;
- else if (bond->params.arp_interval)
- bond->recv_probe = bond_arp_rcv;
- }
bond->params.arp_validate = newval->value;

return 0;
--
2.20.1


2019-05-10 23:18:57

by Jay Vosburgh

[permalink] [raw]
Subject: Re: [PATCH] bonding: fix arp_validate toggling in active-backup mode

Jarod Wilson <[email protected]> wrote:

>There's currently a problem with toggling arp_validate on and off with an
>active-backup bond. At the moment, you can start up a bond, like so:
>
>modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_targets=192.168.1.1
>ip link set bond0 down
>echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
>echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
>ip link set bond0 up
>ip addr add 192.168.1.2/24 dev bond0
>
>Pings to 192.168.1.1 work just fine. Now turn on arp_validate:
>
>echo 1 > /sys/class/net/bond0/bonding/arp_validate
>
>Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
>arp_validate off again, the link falls flat on it's face:
>
>echo 0 > /sys/class/net/bond0/bonding/arp_validate
>dmesg
>...
>[133191.911987] bond0: Setting arp_validate to none (0)
>[133194.257793] bond0: bond_should_notify_peers: slave ens4f0
>[133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
>[133194.259000] bond0: making interface ens4f1 the new active one
>[133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
>[133197.331191] bond0: now running without any active interface!
>
>The problem lies in bond_options.c, where passing in arp_validate=0
>results in bond->recv_probe getting set to NULL. This flies directly in
>the face of commit 3fe68df97c7f, which says we need to set recv_probe =
>bond_arp_recv, even if we're not using arp_validate. Said commit fixed
>this in bond_option_arp_interval_set, but missed that we can get to that
>same state in bond_option_arp_validate_set as well.
>
>One solution would be to universally set recv_probe = bond_arp_recv here
>as well, but I don't think bond_option_arp_validate_set has any business
>touching recv_probe at all, and that should be left to the arp_interval
>code, so we can just make things much tidier here.
>
>Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")

Is the above Fixes: tag correct? 3fe68df97c7f is not the source
of the erroneous logic being removed, which was introduced by

commit 29c4948293bfc426e52a921f4259eb3676961e81
Author: [email protected] <[email protected]>
Date: Thu Dec 12 14:10:38 2013 -0800

bonding: add arp_validate netlink support

Regardless of which Fixes: is correct, the patch itself looks
fine to me:

Signed-off-by: Jay Vosburgh <[email protected]>

-J


>CC: Jay Vosburgh <[email protected]>
>CC: Veaceslav Falico <[email protected]>
>CC: Andy Gospodarek <[email protected]>
>CC: "David S. Miller" <[email protected]>
>CC: [email protected]
>Signed-off-by: Jarod Wilson <[email protected]>
>---
> drivers/net/bonding/bond_options.c | 7 -------
> 1 file changed, 7 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>index da1fc17295d9..b996967af8d9 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -1098,13 +1098,6 @@ static int bond_option_arp_validate_set(struct bonding *bond,
> {
> netdev_dbg(bond->dev, "Setting arp_validate to %s (%llu)\n",
> newval->string, newval->value);
>-
>- if (bond->dev->flags & IFF_UP) {
>- if (!newval->value)
>- bond->recv_probe = NULL;
>- else if (bond->params.arp_interval)
>- bond->recv_probe = bond_arp_rcv;
>- }
> bond->params.arp_validate = newval->value;
>
> return 0;
>--
>2.20.1
>

---
-Jay Vosburgh, [email protected]

2019-05-11 06:23:29

by Jarod Wilson

[permalink] [raw]
Subject: Re: [PATCH] bonding: fix arp_validate toggling in active-backup mode

On 5/10/19 6:53 PM, Jay Vosburgh wrote:
> Jarod Wilson <[email protected]> wrote:
>
>> There's currently a problem with toggling arp_validate on and off with an
>> active-backup bond. At the moment, you can start up a bond, like so:
>>
>> modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_targets=192.168.1.1
>> ip link set bond0 down
>> echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
>> echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
>> ip link set bond0 up
>> ip addr add 192.168.1.2/24 dev bond0
>>
>> Pings to 192.168.1.1 work just fine. Now turn on arp_validate:
>>
>> echo 1 > /sys/class/net/bond0/bonding/arp_validate
>>
>> Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
>> arp_validate off again, the link falls flat on it's face:
>>
>> echo 0 > /sys/class/net/bond0/bonding/arp_validate
>> dmesg
>> ...
>> [133191.911987] bond0: Setting arp_validate to none (0)
>> [133194.257793] bond0: bond_should_notify_peers: slave ens4f0
>> [133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
>> [133194.259000] bond0: making interface ens4f1 the new active one
>> [133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
>> [133197.331191] bond0: now running without any active interface!
>>
>> The problem lies in bond_options.c, where passing in arp_validate=0
>> results in bond->recv_probe getting set to NULL. This flies directly in
>> the face of commit 3fe68df97c7f, which says we need to set recv_probe =
>> bond_arp_recv, even if we're not using arp_validate. Said commit fixed
>> this in bond_option_arp_interval_set, but missed that we can get to that
>> same state in bond_option_arp_validate_set as well.
>>
>> One solution would be to universally set recv_probe = bond_arp_recv here
>> as well, but I don't think bond_option_arp_validate_set has any business
>> touching recv_probe at all, and that should be left to the arp_interval
>> code, so we can just make things much tidier here.
>>
>> Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
>
> Is the above Fixes: tag correct? 3fe68df97c7f is not the source
> of the erroneous logic being removed, which was introduced by
>
> commit 29c4948293bfc426e52a921f4259eb3676961e81
> Author: [email protected] <[email protected]>
> Date: Thu Dec 12 14:10:38 2013 -0800
>
> bonding: add arp_validate netlink support

I wasn't entirely sure that was the best choice for Fixes either, it was
sort of more "Augments the Fix in", so I'd certainly have no objection
to changing the Fixes tag to the earlier commit instead.

--
Jarod Wilson
[email protected]

2019-05-13 19:28:57

by Jay Vosburgh

[permalink] [raw]
Subject: Re: [PATCH] bonding: fix arp_validate toggling in active-backup mode

Jarod Wilson <[email protected]> wrote:

>On 5/10/19 6:53 PM, Jay Vosburgh wrote:
>> Jarod Wilson <[email protected]> wrote:
>>
>>> There's currently a problem with toggling arp_validate on and off with an
>>> active-backup bond. At the moment, you can start up a bond, like so:
>>>
>>> modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_targets=192.168.1.1
>>> ip link set bond0 down
>>> echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
>>> echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
>>> ip link set bond0 up
>>> ip addr add 192.168.1.2/24 dev bond0
>>>
>>> Pings to 192.168.1.1 work just fine. Now turn on arp_validate:
>>>
>>> echo 1 > /sys/class/net/bond0/bonding/arp_validate
>>>
>>> Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
>>> arp_validate off again, the link falls flat on it's face:
>>>
>>> echo 0 > /sys/class/net/bond0/bonding/arp_validate
>>> dmesg
>>> ...
>>> [133191.911987] bond0: Setting arp_validate to none (0)
>>> [133194.257793] bond0: bond_should_notify_peers: slave ens4f0
>>> [133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
>>> [133194.259000] bond0: making interface ens4f1 the new active one
>>> [133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
>>> [133197.331191] bond0: now running without any active interface!
>>>
>>> The problem lies in bond_options.c, where passing in arp_validate=0
>>> results in bond->recv_probe getting set to NULL. This flies directly in
>>> the face of commit 3fe68df97c7f, which says we need to set recv_probe =
>>> bond_arp_recv, even if we're not using arp_validate. Said commit fixed
>>> this in bond_option_arp_interval_set, but missed that we can get to that
>>> same state in bond_option_arp_validate_set as well.
>>>
>>> One solution would be to universally set recv_probe = bond_arp_recv here
>>> as well, but I don't think bond_option_arp_validate_set has any business
>>> touching recv_probe at all, and that should be left to the arp_interval
>>> code, so we can just make things much tidier here.
>>>
>>> Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
>>
>> Is the above Fixes: tag correct? 3fe68df97c7f is not the source
>> of the erroneous logic being removed, which was introduced by
>>
>> commit 29c4948293bfc426e52a921f4259eb3676961e81
>> Author: [email protected] <[email protected]>
>> Date: Thu Dec 12 14:10:38 2013 -0800
>>
>> bonding: add arp_validate netlink support
>
>I wasn't entirely sure that was the best choice for Fixes either, it was
>sort of more "Augments the Fix in", so I'd certainly have no objection to
>changing the Fixes tag to the earlier commit instead.

That would be my preference, as the 29c4948293bf commit looks to
be the change actually being fixed.

-J

---
-Jay Vosburgh, [email protected]

2019-05-13 19:29:47

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] bonding: fix arp_validate toggling in active-backup mode

From: Jay Vosburgh <[email protected]>
Date: Mon, 13 May 2019 09:43:30 -0700

> That would be my preference, as the 29c4948293bf commit looks to
> be the change actually being fixed.

Sorry I pushed the original commit message out :-(

But isn't the Fixes: tag he choose the one where the logic actually
causes problems? That's kind of my real criteria for Fixes: tags.

2019-05-13 19:30:16

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] bonding: fix arp_validate toggling in active-backup mode

From: Jarod Wilson <[email protected]>
Date: Fri, 10 May 2019 17:57:09 -0400

> There's currently a problem with toggling arp_validate on and off with an
> active-backup bond. At the moment, you can start up a bond, like so:
...
> The problem lies in bond_options.c, where passing in arp_validate=0
> results in bond->recv_probe getting set to NULL. This flies directly in
> the face of commit 3fe68df97c7f, which says we need to set recv_probe =
> bond_arp_recv, even if we're not using arp_validate. Said commit fixed
> this in bond_option_arp_interval_set, but missed that we can get to that
> same state in bond_option_arp_validate_set as well.
>
> One solution would be to universally set recv_probe = bond_arp_recv here
> as well, but I don't think bond_option_arp_validate_set has any business
> touching recv_probe at all, and that should be left to the arp_interval
> code, so we can just make things much tidier here.
>
> Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
...
> Signed-off-by: Jarod Wilson <[email protected]>

Applied and queued up for -stable, thanks.

2019-05-13 19:32:37

by Jay Vosburgh

[permalink] [raw]
Subject: Re: [PATCH] bonding: fix arp_validate toggling in active-backup mode

David Miller <[email protected]> wrote:

>From: Jay Vosburgh <[email protected]>
>Date: Mon, 13 May 2019 09:43:30 -0700
>
>> That would be my preference, as the 29c4948293bf commit looks to
>> be the change actually being fixed.
>
>Sorry I pushed the original commit message out :-(
>
>But isn't the Fixes: tag he choose the one where the logic actually
>causes problems? That's kind of my real criteria for Fixes: tags.

I don't think so. It looks like the problem being fixed here
(clearing recv_probe when we shouldn't) was introduced at 29c4948293bf,
but was not the only place the same problem existed. 3fe68df97c7f fixed
the other occurrences of this problem, but missed the specific one added
by 29c4948293bf, which is now fixed by this patch.

In any event, both of the commits in question are old enough
that it's kind of moot, as -stable will presumably get the right thing
regardless.

-J

---
-Jay Vosburgh, [email protected]