From: Jay Vosburgh
To: Jarod Wilson
Cc: linux-kernel@vger.kernel.org, Veaceslav Falico, Andy Gospodarek, "David S. Miller", netdev@vger.kernel.org
Subject: Re: [PATCH] bonding: fix arp_validate toggling in active-backup mode
In-reply-to: <2033e768-9e35-ac89-c526-4c28fc3f747e@redhat.com>
References: <20190510215709.19162-1-jarod@redhat.com> <26675.1557528809@famine> <2033e768-9e35-ac89-c526-4c28fc3f747e@redhat.com>
Comments: In-reply-to Jarod Wilson message dated "Sat, 11 May 2019 02:12:58 -0400."
Date: Mon, 13 May 2019 09:43:30 -0700
Message-ID: <6720.1557765810@famine>

Jarod Wilson wrote:

>On 5/10/19 6:53 PM, Jay Vosburgh wrote:
>> Jarod Wilson wrote:
>>
>>> There's currently a problem with toggling arp_validate on and off with an
>>> active-backup bond. At the moment, you can start up a bond, like so:
>>>
>>> modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_target=192.168.1.1
>>> ip link set bond0 down
>>> echo "+ens4f0" > /sys/class/net/bond0/bonding/slaves
>>> echo "+ens4f1" > /sys/class/net/bond0/bonding/slaves
>>> ip link set bond0 up
>>> ip addr add 192.168.1.2/24 dev bond0
>>>
>>> Pings to 192.168.1.1 work just fine. Now turn on arp_validate:
>>>
>>> echo 1 > /sys/class/net/bond0/bonding/arp_validate
>>>
>>> Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
>>> arp_validate off again, the link falls flat on its face:
>>>
>>> echo 0 > /sys/class/net/bond0/bonding/arp_validate
>>> dmesg
>>> ...
>>> [133191.911987] bond0: Setting arp_validate to none (0)
>>> [133194.257793] bond0: bond_should_notify_peers: slave ens4f0
>>> [133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
>>> [133194.259000] bond0: making interface ens4f1 the new active one
>>> [133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
>>> [133197.331191] bond0: now running without any active interface!
>>>
>>> The problem lies in bond_options.c, where passing in arp_validate=0
>>> results in bond->recv_probe getting set to NULL. This flies directly in
>>> the face of commit 3fe68df97c7f, which says we need to set recv_probe =
>>> bond_arp_rcv, even if we're not using arp_validate.
>>> Said commit fixed this in bond_option_arp_interval_set, but missed that we
>>> can get to that same state in bond_option_arp_validate_set as well.
>>>
>>> One solution would be to universally set recv_probe = bond_arp_rcv here
>>> as well, but I don't think bond_option_arp_validate_set has any business
>>> touching recv_probe at all, and that should be left to the arp_interval
>>> code, so we can just make things much tidier here.
>>>
>>> Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
>>
>> Is the above Fixes: tag correct? 3fe68df97c7f is not the source
>> of the erroneous logic being removed, which was introduced by
>>
>> commit 29c4948293bfc426e52a921f4259eb3676961e81
>> Author: sfeldma@cumulusnetworks.com
>> Date:   Thu Dec 12 14:10:38 2013 -0800
>>
>>     bonding: add arp_validate netlink support
>
>I wasn't entirely sure that was the best choice for Fixes either; it was
>sort of more "Augments the Fix in", so I'd certainly have no objection to
>changing the Fixes tag to the earlier commit instead.

That would be my preference, as the 29c4948293bf commit looks to be
the change actually being fixed.

-J

---
-Jay Vosburgh, jay.vosburgh@canonical.com
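To make the failure mode discussed in this thread concrete, here is a minimal C model of the arp_validate setter before and after the patch. It is a sketch under stated assumptions, not kernel code: struct model_bond, model_arp_rcv(), set_arp_interval(), set_arp_validate_buggy(), set_arp_validate_fixed(), model_monitor_tick() and toggle_scenario() are all hypothetical names. The buggy setter only mirrors the behaviour the patch description reports (arp_validate=0 sets recv_probe to NULL), and the monitor is assumed, for illustration, to see peer replies only through recv_probe and to declare the link down after a fixed number of silent intervals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the bonding state discussed above.
 * This is NOT the kernel's struct bonding; field names only mirror it. */
struct model_bond {
	bool up;            /* stands in for IFF_UP on bond0 */
	int arp_interval;   /* > 0 means the ARP monitor is running */
	int arp_validate;   /* 0 = none, non-zero = some validation mode */
	void (*recv_probe)(struct model_bond *bond);
	long last_rx;       /* model tick at which ARP traffic was last recorded */
};

static long model_now;  /* model clock, one tick per monitor interval */

/* Stand-in for bond_arp_rcv(): records that ARP traffic was received. */
static void model_arp_rcv(struct model_bond *bond)
{
	bond->last_rx = model_now;
}

/* arp_interval owns recv_probe: with the monitor on, the ARP receive
 * handler is always installed (what 3fe68df97c7f is described as requiring). */
static void set_arp_interval(struct model_bond *bond, int val)
{
	bond->arp_interval = val;
	bond->recv_probe = val ? model_arp_rcv : NULL;
}

/* Pre-patch behaviour as the patch describes it: arp_validate=0 clears
 * recv_probe even though the ARP monitor still needs it. */
static void set_arp_validate_buggy(struct model_bond *bond, int val)
{
	if (bond->up) {
		if (!val)
			bond->recv_probe = NULL;
		else if (bond->arp_interval)
			bond->recv_probe = model_arp_rcv;
	}
	bond->arp_validate = val;
}

/* Patched behaviour: only the value changes; recv_probe is left alone. */
static void set_arp_validate_fixed(struct model_bond *bond, int val)
{
	bond->arp_validate = val;
}

/* One monitor tick: a peer reply arrives and is handed to recv_probe (if
 * any); the link counts as up only if traffic was recorded within the
 * last `missed_max` ticks (a hypothetical threshold). */
static bool model_monitor_tick(struct model_bond *bond, long missed_max)
{
	model_now++;
	if (bond->recv_probe)
		bond->recv_probe(bond);
	return model_now - bond->last_rx <= missed_max;
}

/* Replays the reproducer: the bond starts with arp_validate=0, it is
 * toggled to 1 and back to 0, then the monitor runs for `ticks`
 * intervals. Returns true if the link is still considered up. */
bool toggle_scenario(bool use_fix, int ticks)
{
	void (*set_validate)(struct model_bond *, int) =
		use_fix ? set_arp_validate_fixed : set_arp_validate_buggy;
	struct model_bond bond = { .up = true };  /* arp_validate starts at 0 */
	bool link_up = true;

	assert(ticks > 0);
	model_now = 0;
	set_arp_interval(&bond, 100);  /* arp_interval=100, monitor running */
	set_validate(&bond, 1);        /* echo 1 > .../arp_validate */
	set_validate(&bond, 0);        /* echo 0 > .../arp_validate */
	for (int i = 0; i < ticks; i++)
		link_up = model_monitor_tick(&bond, 2);
	return link_up;
}
```

In this model, toggle_scenario(false, 5) returns false, because recv_probe was cleared and no traffic is recorded, while toggle_scenario(true, 5) returns true. Leaving recv_probe entirely to the arp_interval path, as the patch proposes, means only one setter decides whether the ARP receive hook is installed.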