Message-ID: <53357187-aedf-20dd-a331-bc501aa0484e@blackwall.org>
Date: Mon, 16 May 2022 20:22:40 +0300
Subject: Re: [PATCH net-next v3] bond: add mac filter option for balance-xor
To: Jonathan Toppins, netdev@vger.kernel.org
Cc: toke@redhat.com, Long Xin, "David S.
 Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jonathan Corbet,
 Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
References: <4c9db6ac-aa24-2ca2-3e44-18cfb23ac1bc@blackwall.org>
 <6431569f-fb09-096e-7a89-284a71aa5c0f@redhat.com>
From: Nikolay Aleksandrov
In-Reply-To: <6431569f-fb09-096e-7a89-284a71aa5c0f@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 16/05/2022 17:06, Jonathan Toppins wrote:
> On 5/15/22 02:32, Nikolay Aleksandrov wrote:
>> On 15/05/2022 00:41, Nikolay Aleksandrov wrote:
>>> On 13/05/2022 20:43, Jonathan Toppins wrote:
>>>> Implement a MAC filter that prevents duplicate frame delivery when
>>>> handling BUM traffic. This attempts to partially replicate OvS SLB
>>>> Bonding[1] like functionality without requiring significant change
>>>> in the Linux bridging code.
>>>>
>>>> A typical network setup for this feature would be:
>>>>
>>>>              .--------------------------------------------.
>>>>              |         .--------------------.             |
>>>>              |         |                    |             |
>>>>         .-------------------.               |             |
>>>>         |    | Bond 0  |    |               |             |
>>>>         | .--'---. .---'--. |               |             |
>>>>    .----|-| eth0 |-| eth1 |-|----.    .-----+----.   .----+------.
>>>>    |    | '------' '------' |    |    | Switch 1 |   | Switch 2  |
>>>>    |    '---,---------------'    |    |          +---+           |
>>>>    |       /                     |    '----+-----'   '----+------'
>>>>    |  .---'---.    .------.      |         |              |
>>>>    |  |  br0  |----| VM 1 |      |      ~~~~~~~~~~~~~~~~~~~~~
>>>>    |  '-------'    '------'      |     (                     )
>>>>    |      |        .------.      |     ( Rest of Network     )
>>>>    |      '--------| VM # |      |     (_____________________)
>>>>    |               '------'      |
>>>>    |  Host 1                     |
>>>>    '-----------------------------'
>>>>
>>>> Where 'VM1' and 'VM#' are hosts connected to a Linux bridge, br0, with
>>>> bond0 and its associated links, eth0 & eth1, providing ingress/egress.
>>>> One can assume bond0, br0, and hosts VM1 to VM# are all contained in a
>>>> single box, as depicted. Interfaces eth0 and eth1 provide redundant
>>>> connections to the data center with the requirement to use all bandwidth
>>>> when the system is functioning normally. Switch 1 and Switch 2 are
>>>> physical switches that do not implement any advanced L2 management
>>>> features such as MLAG, Cisco's VPC, or LACP.
>>>>
>>>> Combining this feature with vlan+srcmac hash policy allows a user to
>>>> create an access network without the need to use expensive switches that
>>>> support features like Cisco's VPC.
>>>>
>>>> [1] https://docs.openvswitch.org/en/latest/topics/bonding/#slb-bonding
>>>>
>>>> Co-developed-by: Long Xin
>>>> Signed-off-by: Long Xin
>>>> Signed-off-by: Jonathan Toppins
>>>> ---
>>>>
>>>> Notes:
>>>>      v2:
>>>>       * dropped needless abstraction functions and put code in module init
>>>>       * renamed variable "rc" to "ret" to stay consistent with most of the
>>>>         code
>>>>       * fixed parameter setting management, when arp-monitor is turned on
>>>>         this feature will be turned off similar to how miimon and arp-monitor
>>>>         interact
>>>>       * renamed bond_xor_recv to bond_mac_filter_recv for a little more
>>>>         clarity
>>>>       * it appears the implied default return code for any bonding recv probe
>>>>         must be `RX_HANDLER_ANOTHER`. Changed the default return code of
>>>>         bond_mac_filter_recv to use this return value to not break skb
>>>>         processing when the skb dev is switched to the bond dev:
>>>>           `skb->dev = bond->dev`
>>>>
>>>>      v3: Nik's comments
>>>>       * clarified documentation
>>>>       * fixed inline and basic reverse Christmas tree formatting
>>>>       * zero'ed entry in mac_create
>>>>       * removed read_lock taking in bond_mac_filter_recv
>>>>       * made has_expired() atomic and removed critical sections
>>>>         surrounding calls to has_expired(), this also removed the
>>>>         use-after-free that would have occurred:
>>>>             spin_lock_irqsave(&entry->lock, flags);
>>>>                 if (has_expired(bond, entry))
>>>>                     mac_delete(bond, entry);
>>>>             spin_unlock_irqrestore(&entry->lock, flags); <---
>>>>       * moved init/destroy of mac_filter_tbl to bond_open/bond_close
>>>>         this removed the complex option dependencies, the only behavioural
>>>>         change the user will see is if the bond is up and mac_filter is
>>>>         enabled if they try and set arp_interval they will receive -EBUSY
>>>>       * in bond_changelink moved processing of mac_filter option just below
>>>>         mode processing
>>>>
>>>>   Documentation/networking/bonding.rst  |  20 +++
>>>>   drivers/net/bonding/Makefile          |   2 +-
>>>>   drivers/net/bonding/bond_mac_filter.c | 201 ++++++++++++++++++++++++++
>>>>   drivers/net/bonding/bond_mac_filter.h |  37 +++++
>>>>   drivers/net/bonding/bond_main.c       |  30 ++++
>>>>   drivers/net/bonding/bond_netlink.c    |  13 ++
>>>>   drivers/net/bonding/bond_options.c    |  81 +++++++++--
>>>>   drivers/net/bonding/bonding_priv.h    |   1 +
>>>>   include/net/bond_options.h            |   1 +
>>>>   include/net/bonding.h                 |   3 +
>>>>   include/uapi/linux/if_link.h          |   1 +
>>>>   11 files changed, 373 insertions(+), 17 deletions(-)
>>>>   create mode 100644 drivers/net/bonding/bond_mac_filter.c
>>>>   create mode 100644 drivers/net/bonding/bond_mac_filter.h
>>>>
>>>
>> [snip]
>>
>> The same problem solved using a few nftables rules (in case you don't want to load eBPF):
>> $ nft 'add table netdev nt'
>> $ nft 'add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; }'
>> $ nft 'add chain netdev nt bond0IngressFilter { type filter hook ingress device bond0 priority 0; }'
>> $ nft 'add set netdev nt macset { type ether_addr; flags timeout; }'
>> $ nft 'add rule netdev nt bond0EgressFilter set update ether saddr timeout 5s @macset'
>> $ nft 'add rule netdev nt bond0IngressFilter ether saddr @macset counter drop'
>>
>
> I get the following when trying to apply this on a fedora 35 install.
>
> [root@fedora ~]# ip link add bond0 type bond mode balance-xor xmit_hash_policy vlan+srcmac
> [root@fedora ~]# nft 'add table netdev nt'
> [root@fedora ~]# nft 'add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; }'
> Error: unknown chain hook
> add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; }
>                                                          ^^^^^^
> [root@fedora ~]# uname -a
> Linux fedora 5.17.5-200.fc35.x86_64 #1 SMP PREEMPT Thu Apr 28 15:41:41 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
>

Well, take it up with the Fedora nftables package maintainer. :)
Your nftables version is old (I'd guess <1.0.1):
commit 510c4fad7e78
Author: Lukas Wunner
Date:   Wed Mar 11 13:20:06 2020 +0100

    src: Support netdev egress hook

$ git tag --contains 510c4fad7e78f
v1.0.1
v1.0.2

I just tested it[1] on Linux 5.16.18-200.fc35.x86_64 #1 SMP PREEMPT Mon Mar 28 14:10:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cheers,
 Nik

[1] You can clearly see the dynamically learned mac on egress (52:54:00:23:5f:13)
    and traffic with that source is now blocked on ingress.

$ nft -a list table netdev nt
        set macset { # handle 10
                type ether_addr
                size 65535
                flags timeout
                elements = { 52:54:00:23:5f:13 timeout 5s expires 4s192ms }
        }

        chain bond0EgressFilter { # handle 8
                type filter hook egress device "bond0" priority filter; policy accept;
                update @macset { ether saddr timeout 5s } # handle 11
        }

        chain bond0IngressFilter { # handle 9
                type filter hook ingress device "bond0" priority filter; policy accept;
        }
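
A side note on the version guess above: the following is only a minimal sketch of how one might check whether the installed nft userspace is new enough for 'hook egress'. It assumes GNU `sort -V` is available and that `nft --version` prints a line of the form "nftables vX.Y.Z (...)". The helper names (version_ge, nft_version_from, supports_egress) are made up for illustration, and the 1.0.1 threshold is taken from the `git tag --contains` output above. The kernel has to support the hook as well, which this does not check.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2.
# Assumption: sort supports -V (version sort), as GNU coreutils does.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: extract "X.Y.Z" from a string that is assumed to
# look like "nftables v1.0.2 (Lester Gooch)". Prints nothing on mismatch.
nft_version_from() {
    printf '%s\n' "$1" | sed -n 's/^nftables v\([0-9][0-9.]*\).*/\1/p'
}

# Assumption: the first tag containing the egress hook commit is v1.0.1.
supports_egress() {
    ver=$(nft_version_from "$1")
    [ -n "$ver" ] && version_ge "$ver" "1.0.1"
}

# Only query nft if it is actually installed.
if command -v nft >/dev/null 2>&1; then
    if supports_egress "$(nft --version)"; then
        echo "nft userspace should know the netdev egress hook"
    else
        echo "nft userspace looks too old for 'hook egress'"
    fi
fi
```

If the check reports an old userspace, the "unknown chain hook" error quoted above is the expected outcome, since it comes from the nft parser.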