From: Jay Vosburgh
To: LIU Yulong
cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Veaceslav Falico, Andy Gospodarek, "David S. Miller", Jakub Kicinski
Subject: Re: [PATCH] net: bonding: alb disable balance for IPv6 multicast related mac
In-reply-to: <1603850163-4563-1-git-send-email-i@liuyulong.me>
References: <1603850163-4563-1-git-send-email-i@liuyulong.me>
Comments: In-reply-to LIU Yulong message dated "Wed, 28 Oct 2020 09:56:03 +0800."
Date: Tue, 27 Oct 2020 20:53:53 -0700
Message-ID: <22348.1603857233@famine>
X-Mailing-List: linux-kernel@vger.kernel.org

LIU Yulong wrote:

>According to the RFC 2464 [1] the prefix "33:33:xx:xx:xx:xx" is defined to
>construct the multicast destination MAC address for IPv6 multicast traffic.
>The NDP (Neighbor Discovery Protocol for IPv6)[2] will comply with this
>rule. The work steps [6] are:
>  *) Let's assume a destination address of 2001:db8:1:1::1.
>  *) This is mapped into the "Solicited Node Multicast Address" (SNMA)
>     format of ff02::1:ffXX:XXXX.
>  *) The XX:XXXX represent the last 24 bits of the SNMA, and are derived
>     directly from the last 24 bits of the destination address.
>  *) Resulting in a SNMA ff02::1:ff00:0001, or ff02::1:ff00:1.
>  *) This, being a multicast address, can be mapped to a multicast MAC
>     address, using the format 33-33-XX-XX-XX-XX
>  *) Resulting in 33-33-ff-00-00-01.
>  *) This is a MAC address that is only being listened for by nodes
>     sharing the same last 24 bits.
>  *) In other words, while there is a chance for an "address collision",
>     it is a vast improvement over ARP's guaranteed "collision".
>Kernel related code can be found at [3][4][5].
>
>The current bond alb has some leaks of such MAC ranges which will cause
>the physical world to fail to determine the back tunnel of the reply
>packet during the response in a Spine-and-Leaf data center architecture.
>The basic topology looks like this:
>
>        +-------------+
>        |             |
>    +---| Border Leaf |-----+
>    |   |             |     |
>    |   +-------------+     |
>    |                       |
>    | tunnel-1              | tunnel-2
>    |                       |
>    |                       |
>+---+----+           +------+-+
>|        |           |        |
>| Leaf1  +--X-X-X-X--+ Leaf2  | tunnel-3 will be checked to prevent loop
>|        |  tunnel-3 |        |
>+--------+           +-+------+
>         |             |
>         |             |
>         |             |
>         |             |
>         |             |
>         |             |
>         +----+   +----+
>      +--+nic1+---+nic2+---+
>      |  +----+   +----+   |
>      |       bond6        |
>      |                    |
>      |        HOST        |
>      +--------------------+

This description is, overall, very comprehensive, and I believe I
generally understand what issue you're fixing (which seems to be a
complicated means to cause MAC flapping), although I'm unclear on a few
details, below.

However, if you could make the ASCII art smaller I think that would be
better.

>When nic1 is sending the normal IPv6 traffic to the gateway in Border leaf,
>the nic2 (slave) will send the NS packet out periodically, automatically
>and implicitly as well. This is an example packet sent from the slave
>nic2 which will break the traffic.

With this patch applied, what would happen if nic2 sends the normal IPv6
traffic from the source MAC in question (because it is tx-balanced
there), and the Neighbor Solicitation multicast then goes out via nic1?

>    ac:1f:6b:90:5c:eb > 33:33:ff:00:00:01, ethertype 802.1Q (0x8100),
>    length 90: vlan 205, p 0, ethertype IPv6, (hlim 255,
>    next-header ICMPv6 (58) payload length: 32)
>    fe80::f816:3eff:feba:2d8c > ff02::1:ff00:1:
>    [icmp6 sum ok] ICMP6, neighbor solicitation, length 32,
>    who has 240e:980:2f00:4000::1
>      source link-address option (1), length 8 (1): fa:16:3e:ba:2d:8c
>        0x0000: fa16 3eba 2d8c
>    0x0000: 3333 ff00 0001 ac1f 6b90 5ceb 8100 00cd
>    0x0010: 86dd 6000 0000 0020 3aff fe80 0000 0000
>    0x0020: 0000 f816 3eff feba 2d8c ff02 0000 0000
>    0x0030: 0000 0000 0001 ff00 0001 8700 14d3 0000
>    0x0040: 0000 240e 0980 2f00 4000 0000 0000 0000
>    0x0050: 0001 0101 fa16 3eba 2d8c

And perhaps trim out the hex dump here.

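
For reference, the destination MAC in the capture above follows directly
from the solicited-node mapping in the quoted steps. The following is a
minimal userspace C sketch of that mapping, not kernel code; the helper
names (snma_from_ipv6, mcast_mac_from_ipv6, check_capture_example) are
made up for illustration and do not exist in the kernel tree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the solicited-node multicast address
 * ff02::1:ffXX:XXXX from the low 24 bits of an IPv6 address.
 */
void snma_from_ipv6(const uint8_t addr[16], uint8_t snma[16])
{
	/* ff02:0000:0000:0000:0000:0001:ff.. (first 13 bytes are fixed) */
	static const uint8_t prefix[13] = {
		0xff, 0x02, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x01, 0xff
	};

	memcpy(snma, prefix, sizeof(prefix));
	memcpy(snma + 13, addr + 13, 3);	/* last 24 bits of the target */
}

/* Hypothetical helper: map an IPv6 multicast address to its Ethernet
 * MAC as in RFC 2464 section 7: 33:33 followed by the last four bytes.
 */
void mcast_mac_from_ipv6(const uint8_t maddr[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(mac + 2, maddr + 12, 4);
}

/* Worked example using the NS target from the capture,
 * 240e:980:2f00:4000::1.
 */
void check_capture_example(void)
{
	const uint8_t target[16] = {
		0x24, 0x0e, 0x09, 0x80, 0x2f, 0x00, 0x40, 0x00,
		0, 0, 0, 0, 0, 0, 0, 0x01
	};
	const uint8_t want_mac[6] = { 0x33, 0x33, 0xff, 0x00, 0x00, 0x01 };
	uint8_t snma[16], mac[6];

	snma_from_ipv6(target, snma);		/* ff02::1:ff00:1 */
	mcast_mac_from_ipv6(snma, mac);		/* 33:33:ff:00:00:01 */
	assert(memcmp(mac, want_mac, sizeof(mac)) == 0);
}
```

Calling check_capture_example() from a test program reproduces the
33:33:ff:00:00:01 destination seen in the capture, which differs from the
all-nodes address 33:33:00:00:00:01 that the existing code compares
against.
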
>MAC "fa:16:3e:ba:2d:8c" was first learnt at Leaf1 based on the underlay
>mechanism (BGP EVPN). When this example packet was sent to Border leaf and
>replied with dst_mac "fa:16:3e:ba:2d:8c", Leaf2 will try to send the packet
>back to tunnel-3; at this point dropping happens because of the loop
>defense. All the original normal IPv6 traffic will be led to tunnel-2
>and then dropped. The link is now broken.

Where does MAC fa:16:3e:ba:2d:8c come from? Is this the MAC address of
the bond itself?

Assuming that "learnt at Leaf1" means that Leaf1 knows to forward it to
bond6:nic1, why does the loop defense drop the packet if Leaf1 is on the
forwarding path?

>This patch addresses this issue by checking the entire MAC range defined by
>RFC 2464. It adds a new helper method to check that the first two octets
>have the value 3333. If the dest mac is matched, no balance will be
>enabled.
>
>[1] https://tools.ietf.org/html/rfc2464#section-7
>[2] https://tools.ietf.org/html/rfc4861
>[3] linux.git/tree/include/net/if_inet6.h#n209-n221
>[4] linux.git/tree/net/ipv6/ndisc.c#n291
>[5] linux.git/tree/net/ipv6/ndisc.c#n346-n348
>[6] https://en.citizendium.org/wiki/Neighbor_Discovery
>
>Signed-off-by: LIU Yulong
>---
> drivers/net/bonding/bond_alb.c | 10 ++++------
> include/linux/etherdevice.h    | 12 ++++++++++++
> 2 files changed, 16 insertions(+), 6 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 095ea51..a4a30bd 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -24,9 +24,6 @@
> #include
> #include
>
>-static const u8 mac_v6_allmcast[ETH_ALEN + 2] __long_aligned = {
>-        0x33, 0x33, 0x00, 0x00, 0x00, 0x01
>-};
> static const int alb_delta_in_ticks = HZ / ALB_TIMER_TICKS_PER_SEC;
>
> #pragma pack(1)
>@@ -1422,10 +1419,11 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
>                         break;
>                 }
>
>-                /* IPv6 uses all-nodes multicast as an equivalent to
>-                 * broadcasts in IPv4.
>+                /* IPv6 multicast destination should disable the tx-balance since
>+                 * the pyhsical world may get into a mass status which will lead
>+                 * to the IPv6 traffic broken.

I think this comment can be simplified to say that IPv6 multicast
destinations should not be tx-balanced, which I suspect is the real
purpose.

>                  */
>-                if (ether_addr_equal_64bits(eth_data->h_dest, mac_v6_allmcast)) {
>+                if (is_ipv6_multicast_ether_addr(eth_data->h_dest)) {
>                         do_tx_balance = false;
>                         break;
>                 }
>diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
>index 2e5debc..c6101ab 100644
>--- a/include/linux/etherdevice.h
>+++ b/include/linux/etherdevice.h
>@@ -178,6 +178,18 @@ static inline bool is_unicast_ether_addr(const u8 *addr)
> }
>
> /**
>+ * is_ipv6_multicast_ether_addr - Determine if the Ethernet address is for
>+ *                                IPv6 multicast (rfc2464).
>+ * @addr: Pointer to a six-byte array containing the Ethernet address
>+ *
>+ * Return true if the address is a multicast for IPv6.
>+ */
>+static inline bool is_ipv6_multicast_ether_addr(const u8 *addr)
>+{
>+        return (addr[0] & addr[1]) == 0x33;
>+}

I don't think this does what is intended. It will return true for a MAC
that starts with any two values whose bitwise AND is 0x33, e.g., 0x73
0x3b. For IPv6 multicast, the first two octets of the MAC must be
exactly 0x33 0x33.

	-J

>+
>+/**
>  * is_valid_ether_addr - Determine if the given Ethernet address is valid
>  * @addr: Pointer to a six-byte array containing the Ethernet address
>  *
>--
>1.8.3.1

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
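
To illustrate the helper issue noted above, here is a small userspace C
sketch that contrasts the check as posted with a stricter comparison of
both octets. The names posted_check, strict_check and check_examples are
hypothetical and exist only for this example; this is one possible shape
for the fix, not necessarily what a revised patch would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The test as posted in the patch: ANDs the first two octets together. */
bool posted_check(const uint8_t *addr)
{
	return (addr[0] & addr[1]) == 0x33;
}

/* Stricter sketch: each of the first two octets must be exactly 0x33. */
bool strict_check(const uint8_t *addr)
{
	return addr[0] == 0x33 && addr[1] == 0x33;
}

void check_examples(void)
{
	/* Solicited-node multicast MAC from the capture. */
	const uint8_t ipv6_mcast[6] = { 0x33, 0x33, 0xff, 0x00, 0x00, 0x01 };
	/* Not an IPv6 multicast MAC, but 0x73 & 0x3b == 0x33. */
	const uint8_t other[6] = { 0x73, 0x3b, 0x00, 0x00, 0x00, 0x01 };

	assert(posted_check(ipv6_mcast));
	assert(posted_check(other));	/* false positive */
	assert(strict_check(ipv6_mcast));
	assert(!strict_check(other));
}
```

The second address passes the posted test only because the two octets
share the bits of 0x33; comparing each octet separately rejects it.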