Message-ID: <1d52f228-ea8c-6d97-13b6-8ec912188e07@redhat.com>
Date: Fri, 13 May 2022 12:49:52 -0400
Subject: Re: [PATCH net-next v2] bond: add mac filter option for balance-xor
To: Nikolay Aleksandrov, netdev@vger.kernel.org
Cc: Long Xin, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Jonathan Corbet, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Toke Høiland-Jørgensen, Jesper Dangaard Brouer
References: <6227427ef3b57d7de6d4d95e9dd7c9b222a37bf6.1651689665.git.jtoppins@redhat.com>
From: Jonathan Toppins
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/13/22 12:19, Nikolay Aleksandrov wrote:
> On 13/05/2022 18:42, Jonathan Toppins wrote:
>> Hi Nik, thanks for the review. Responses below.
>>
>> On 5/5/22 08:14, Nikolay Aleksandrov wrote:
>>> On 04/05/2022 21:47, Jonathan Toppins wrote:
>>>> Implement a MAC filter that prevents duplicate frame delivery when
>>>> handling BUM traffic. This attempts to partially replicate OvS SLB
>>>> Bonding[1] like functionality without requiring significant change
>>>> in the Linux bridging code.
>>>>
>>>> A typical network setup for this feature would be:
>>>>
>>>>              .--------------------------------------------.
>>>>              |         .--------------------.             |
>>>>              |         |                    |             |
>>>>         .-------------------.               |             |
>>>>         |    | Bond 0  |    |               |             |
>>>>         | .--'---. .---'--. |               |             |
>>>>    .----|-| eth0 |-| eth1 |-|----.    .-----+----.   .----+------.
>>>>    |    | '------' '------' |    |    | Switch 1 |   | Switch 2  |
>>>>    |    '---,---------------'    |    |          +---+           |
>>>>    |       /                     |    '----+-----'   '----+------'
>>>>    |  .---'---.    .------.      |         |              |
>>>>    |  |  br0  |----| VM 1 |      |      ~~~~~~~~~~~~~~~~~~~~~
>>>>    |  '-------'    '------'      |     (                     )
>>>>    |      |        .------.      |     ( Rest of Network     )
>>>>    |      '--------| VM # |      |     (_____________________)
>>>>    |               '------'      |
>>>>    |  Host 1                     |
>>>>    '-----------------------------'
>>>>
>>>> Where 'VM1' and 'VM#' are hosts connected to a Linux bridge, br0, with
>>>> bond0 and its associated links, eth0 & eth1, providing ingress/egress. One
>>>> can assume bond0, br0, and hosts VM1 to VM# are all contained in a
>>>> single box, as depicted. Interfaces eth0 and eth1 provide redundant
>>>> connections to the data center with the requirement to use all bandwidth
>>>> when the system is functioning normally. Switch 1 and Switch 2 are
>>>> physical switches that do not implement any advanced L2 management
>>>> features such as MLAG, Cisco's VPC, or LACP.
>>>>
>>>> Combining this feature with vlan+srcmac hash policy allows a user to
>>>> create an access network without the need to use expensive switches that
>>>> support features like Cisco's VPC.
>>>>
>>>> [1] https://docs.openvswitch.org/en/latest/topics/bonding/#slb-bonding
>>>>
>>>> Co-developed-by: Long Xin
>>>> Signed-off-by: Long Xin
>>>> Signed-off-by: Jonathan Toppins
>>>> ---
>>>>
>>>> Notes:
>>>>      v2:
>>>>       * dropped needless abstraction functions and put code in module init
>>>>       * renamed variable "rc" to "ret" to stay consistent with most of the
>>>>         code
>>>>       * fixed parameter setting management, when arp-monitor is turned on
>>>>         this feature will be turned off similar to how miimon and arp-monitor
>>>>         interact
>>>>       * renamed bond_xor_recv to bond_mac_filter_recv for a little more
>>>>         clarity
>>>>       * it appears the implied default return code for any bonding recv probe
>>>>         must be `RX_HANDLER_ANOTHER`. Changed the default return code of
>>>>         bond_mac_filter_recv to use this return value to not break skb
>>>>         processing when the skb dev is switched to the bond dev:
>>>>           `skb->dev = bond->dev`
>>>>
>>>>   Documentation/networking/bonding.rst  |  19 +++
>>>>   drivers/net/bonding/Makefile          |   2 +-
>>>>   drivers/net/bonding/bond_mac_filter.c | 201 ++++++++++++++++++++++++++
>>>>   drivers/net/bonding/bond_mac_filter.h |  39 +++++
>>>>   drivers/net/bonding/bond_main.c       |  27 ++++
>>>>   drivers/net/bonding/bond_netlink.c    |  13 ++
>>>>   drivers/net/bonding/bond_options.c    |  86 ++++++++++-
>>>>   drivers/net/bonding/bonding_priv.h    |   1 +
>>>>   include/net/bond_options.h            |   1 +
>>>>   include/net/bonding.h                 |   3 +
>>>>   include/uapi/linux/if_link.h          |   1 +
>>>>   11 files changed, 390 insertions(+), 3 deletions(-)
>>>>   create mode 100644 drivers/net/bonding/bond_mac_filter.c
>>>>   create mode 100644 drivers/net/bonding/bond_mac_filter.h
>>>>
>>>
>>> Hi Jonathan,
>>> I must mention that this is easily solvable with two very simple ebpf programs, one on egress
>>> to track source macs and one on ingress to filter them, it can also easily be solved by a
>>> user-space agent that adds macs for filtering in many different ways, after all these VMs
>>> run on the host and you don't need bond-specific knowledge to do this. Also you have no visibility
>>> into what is currently being filtered, so it will be difficult to debug. With the above solutions
>>> you already have that. I don't think the bond should be doing any learning or filtering, this is
>>> deviating a lot from its purpose and adds unnecessary complexity.
>>> That being said, if you decide to continue with the set, comments are below...
>>
>> This is an excellent observation, it does appear this could likely be done with eBPF. However, the delivery of such a solution to a user would be the difficult part. There appears to be no standard way for attaching a program to an interface, it still seems customary to write your own custom loader. Where would the user run this loader? In Debian likely in a post up hook with ifupdown, in Fedora one would have to write a locally custom dispatcher script (assuming Network Manager) that only ran the loader for a given interface. In short I do not see a reasonably appropriate way to deploy an eBPF program to users with the current infrastructure. Also, I am not aware of the bpf syscall supporting signed program loading. Signing kernel modules seems popular with some distros to identify limits of support and authentication of an unmodified system. I suspect similar bpf support might be needed to identify support and authentication for deployed programs.
>>
>
> A great deal of the distributions (almost all major ones) out there already use eBPF for various tasks, so I can't see
> how any of these arguments apply nowadays. There are standard ways to load eBPF programs that have been around
> for quite some time and most of the different software needed to achieve that is already packaged
> for all major distributions (and has been for a long time). Anyway getting into the details of "how" the user would load the program
> is not really pertinent to the discussion, that doesn't warrant adding so much new complexity in the bonding driver
> which will have to be maintained forever. Honestly, I don't like the idea of adding learning to the bonding at all,
> I think it's the wrong place for it, especially when the solution can easily be achieved with already available means.
> It might not even be eBPF, you can do it with a user-space agent that uses nftables or some other filtering mechanism,
> I'm sure you can think of many other ways to solve it which don't require this new infrastructure. All of these ways
> to solve it have many advantages over this (e.g. visibility into the current entries being filtered, control over them and so on).
>
> That's my opinion of course, it'd be nice to get feedback from others as well.

Input from others would be helpful; I cannot claim eBPF is an inferior technical solution to the one proposed here. So if this bonding-option approach is the wrong path, I would like to know sooner rather than later so that I can attempt another path. I am by no means an expert on eBPF (I know how to spell it), but I do know there were support issues when considering the eBPF option.

I will post my v3 to continue the technical review, but I think the "should we even do this" review should be continued here.

-Jon
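
For anyone following the thread, the behaviour under debate (learn source MACs on egress, drop ingress frames that carry one of those MACs as their source) can be modelled in a few lines. This is only an illustrative Python sketch under stated assumptions: the class `MacFilter`, its method names, and the aging timeout are hypothetical and are not taken from the patch, the bonding driver, or any eBPF program.

```python
import time
from typing import Callable, Dict


class MacFilter:
    """Toy model of an egress-learn / ingress-filter MAC table.

    Hypothetical names throughout; the aging timeout is an assumption
    added so that stale entries do not filter traffic forever.
    """

    def __init__(self, max_age: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age = max_age      # seconds an entry stays valid
        self._clock = clock         # injectable clock, eases testing
        self._seen: Dict[str, float] = {}

    def on_egress(self, src_mac: str) -> None:
        # Record (or refresh) a source MAC that left through the bond.
        self._seen[src_mac.lower()] = self._clock()

    def allow_ingress(self, src_mac: str) -> bool:
        # A frame arriving with a source MAC we recently sent is treated
        # as our own traffic looping back and is rejected.
        key = src_mac.lower()
        learned_at = self._seen.get(key)
        if learned_at is None:
            return True
        if self._clock() - learned_at > self.max_age:
            del self._seen[key]     # entry expired, stop filtering it
            return True
        return False

    def entries(self) -> Dict[str, float]:
        # Expose the table so the current filter state can be inspected.
        return dict(self._seen)
```

The `entries()` accessor reflects the visibility concern raised in the review: whichever layer ends up doing the filtering, being able to dump the current table makes it easier to debug.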