Date: Mon, 23 May 2022 09:35:38 -0400
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux
 x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0
Subject: Re: [PATCH net-next v3] bond: add mac filter option for balance-xor
Content-Language: en-US
To: Nikolay Aleksandrov, netdev@vger.kernel.org
Cc: toke@redhat.com, Long Xin, "David S. Miller", Eric Dumazet,
 Jakub Kicinski, Paolo Abeni, Jonathan Corbet, Jay Vosburgh,
 Veaceslav Falico, Andy Gospodarek, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org
References: <4c9db6ac-aa24-2ca2-3e44-18cfb23ac1bc@blackwall.org>
From: Jonathan Toppins
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/15/22 02:32, Nikolay Aleksandrov wrote:
> On 15/05/2022 00:41, Nikolay Aleksandrov wrote:
>> On 13/05/2022 20:43, Jonathan Toppins wrote:
>>> Implement a MAC filter that prevents duplicate frame delivery when
>>> handling BUM traffic. This attempts to partially replicate OvS SLB
>>> Bonding[1] like functionality without requiring significant change
>>> in the Linux bridging code.
>>>
>>> A typical network setup for this feature would be:
>>>
>>>           .--------------------------------------------.
>>>           |         .--------------------.             |
>>>           |         |                    |             |
>>>      .-------------------.               |             |
>>>      |    | Bond 0  |    |               |             |
>>>      | .--'---. .---'--. |               |             |
>>> .----|-| eth0 |-| eth1 |-|----.    .-----+----.   .----+------.
>>> |    | '------' '------' |    |    | Switch 1 |   | Switch 2  |
>>> |    '---,---------------'    |    |          +---+           |
>>> |       /                     |    '----+-----'   '----+------'
>>> |  .---'---.    .------.      |         |              |
>>> |  |  br0  |----| VM 1 |      |       ~~~~~~~~~~~~~~~~~~~~~
>>> |  '-------'    '------'      |      (                     )
>>> |      |        .------.      |      (   Rest of Network   )
>>> |      '--------| VM # |      |      (_____________________)
>>> |               '------'      |
>>> |  Host 1                     |
>>> '-----------------------------'
>>>
>>> Where 'VM1' and 'VM#' are hosts connected to a Linux bridge, br0, with
>>> bond0 and its associated links, eth0 & eth1, providing ingress/egress. One
>>> can assume bond0, br0, and hosts VM1 to VM# are all contained in a
>>> single box, as depicted. Interfaces eth0 and eth1 provide redundant
>>> connections to the data center with the requirement to use all bandwidth
>>> when the system is functioning normally. Switch 1 and Switch 2 are
>>> physical switches that do not implement any advanced L2 management
>>> features such as MLAG, Cisco's VPC, or LACP.
>>>
>>> Combining this feature with vlan+srcmac hash policy allows a user to
>>> create an access network without the need to use expensive switches that
>>> support features like Cisco's VPC.
>>>
>>> [1] https://docs.openvswitch.org/en/latest/topics/bonding/#slb-bonding
>>>
>>> Co-developed-by: Long Xin
>>> Signed-off-by: Long Xin
>>> Signed-off-by: Jonathan Toppins
>>> ---
>>>
>>> Notes:
>>>     v2:
>>>      * dropped needless abstraction functions and put code in module init
>>>      * renamed variable "rc" to "ret" to stay consistent with most of the
>>>        code
>>>      * fixed parameter setting management, when arp-monitor is turned on
>>>        this feature will be turned off similar to how miimon and arp-monitor
>>>        interact
>>>      * renamed bond_xor_recv to bond_mac_filter_recv for a little more
>>>        clarity
>>>      * it appears the implied default return code for any bonding recv probe
>>>        must be `RX_HANDLER_ANOTHER`. Changed the default return code of
>>>        bond_mac_filter_recv to use this return value to not break skb
>>>        processing when the skb dev is switched to the bond dev:
>>>          `skb->dev = bond->dev`
>>>
>>>     v3: Nik's comments
>>>      * clarified documentation
>>>      * fixed inline and basic reverse Christmas tree formatting
>>>      * zero'ed entry in mac_create
>>>      * removed read_lock taking in bond_mac_filter_recv
>>>      * made has_expired() atomic and removed critical sections
>>>        surrounding calls to has_expired(), this also removed the
>>>        use-after-free that would have occurred:
>>>            spin_lock_irqsave(&entry->lock, flags);
>>>            if (has_expired(bond, entry))
>>>                mac_delete(bond, entry);
>>>            spin_unlock_irqrestore(&entry->lock, flags);  <---
>>>      * moved init/destroy of mac_filter_tbl to bond_open/bond_close
>>>        this removed the complex option dependencies, the only behavioural
>>>        change the user will see is if the bond is up and mac_filter is
>>>        enabled if they try and set arp_interval they will receive -EBUSY
>>>      * in bond_changelink moved processing of mac_filter option just below
>>>        mode processing
>>>
>>>  Documentation/networking/bonding.rst  |  20 +++
>>>  drivers/net/bonding/Makefile          |   2 +-
>>>  drivers/net/bonding/bond_mac_filter.c | 201 ++++++++++++++++++++++++++
>>>  drivers/net/bonding/bond_mac_filter.h |  37 +++++
>>>  drivers/net/bonding/bond_main.c       |  30 ++++
>>>  drivers/net/bonding/bond_netlink.c    |  13 ++
>>>  drivers/net/bonding/bond_options.c    |  81 +++++++++--
>>>  drivers/net/bonding/bonding_priv.h    |   1 +
>>>  include/net/bond_options.h            |   1 +
>>>  include/net/bonding.h                 |   3 +
>>>  include/uapi/linux/if_link.h          |   1 +
>>>  11 files changed, 373 insertions(+), 17 deletions(-)
>>>  create mode 100644 drivers/net/bonding/bond_mac_filter.c
>>>  create mode 100644 drivers/net/bonding/bond_mac_filter.h
>>>
>>
> [snip]
>
> The same problem solved using a few nftables rules (in case you don't want to load eBPF):
> $ nft 'add table netdev nt'
> $ nft 'add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; }'
> $ nft 'add chain netdev nt bond0IngressFilter { type filter hook ingress device bond0 priority 0; }'
> $ nft 'add set netdev nt macset { type ether_addr; flags timeout; }'
> $ nft 'add rule netdev nt bond0EgressFilter set update ether saddr timeout 5s @macset'
> $ nft 'add rule netdev nt bond0IngressFilter ether saddr @macset counter drop'

I did some testing on this nft solution. It performs largely the same as
the bonding solution, except for handling rule 1 of OVS SLB[1] for
multicast[2] and broadcast[3] traffic. If this can be corrected, I think it
would be a possible solution; otherwise the bonding solution fits the
requirements better.

-Jon

[1] https://docs.openvswitch.org/en/latest/topics/bonding/#slb-bonding
[2] https://gitlab.com/liali666/virtual-networking/-/blob/master/tests/test-0012-ovs-rule1-multicast
[3] https://gitlab.com/liali666/virtual-networking/-/blob/master/tests/test-0011-ovs-rule1-broadcast
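
For readers who want to reason about what the quoted nft rules do, here is a
rough userspace model in Python. It is only a sketch under assumptions: the
names MacSet, egress() and ingress() are hypothetical, the clock is passed in
explicitly so the timeout is easy to follow, and it does not attempt to model
the OVS SLB rule 1 behaviour for multicast/broadcast discussed above. It is
neither the bonding patch's code nor nftables' implementation.

```python
from typing import Dict


class MacSet:
    """Toy stand-in for an nftables set declared with 'flags timeout'."""

    def __init__(self, timeout_s: float = 5.0) -> None:
        self.timeout_s = timeout_s
        # Maps a lower-cased MAC string to its absolute expiry time.
        self._expires: Dict[str, float] = {}

    def update(self, mac: str, now: float) -> None:
        """Add the MAC or refresh its timeout (like 'set update')."""
        self._expires[mac.lower()] = now + self.timeout_s

    def contains(self, mac: str, now: float) -> bool:
        """Return True if the MAC is present and has not timed out."""
        key = mac.lower()
        expiry = self._expires.get(key)
        if expiry is None:
            return False
        if now >= expiry:
            # Lazily drop the expired element.
            del self._expires[key]
            return False
        return True


def egress(macset: MacSet, src_mac: str, now: float) -> None:
    """Egress hook: remember the source MAC of a frame leaving the bond."""
    macset.update(src_mac, now)


def ingress(macset: MacSet, src_mac: str, now: float) -> bool:
    """Ingress hook: return True to accept the frame, False to drop it.

    A frame whose source MAC we recently transmitted is assumed to be our
    own traffic reflected back by the switches, so it is dropped.
    """
    return not macset.contains(src_mac, now)


if __name__ == "__main__":
    macs = MacSet(timeout_s=5.0)
    egress(macs, "52:54:00:aa:bb:cc", now=0.0)
    print(ingress(macs, "52:54:00:aa:bb:cc", now=1.0))  # reflected frame
    print(ingress(macs, "52:54:00:11:22:33", now=1.0))  # unrelated sender
    print(ingress(macs, "52:54:00:aa:bb:cc", now=6.0))  # entry timed out
```

The model makes the trade-off of the 5s timeout visible: a source MAC seen on
egress is filtered on ingress only while its entry is still being refreshed by
outgoing traffic.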