Date: Fri, 9 Mar 2018 17:26:08 +0100
From: Florian Westphal
To: David Woodhouse
Cc: Florian Westphal, Pablo Neira Ayuso, David Miller, rga@amazon.de,
    bridge@lists.linux-foundation.org, stephen@networkplumber.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    aliguori@amazon.com, nbd@openwrt.org
Subject: Re: [RFC PATCH v2] bridge: make it possible for packets to traverse
    the bridge without hitting netfilter
Message-ID: <20180309162608.GC19924@breakpoint.cc>
In-Reply-To: <1520609475.17937.42.camel@infradead.org>

David Woodhouse wrote:
> On Fri, 2015-03-06 at 17:37 +0100, Florian Westphal wrote:
> > > > I did performance measurements in the following way:
> > > >
> > > > Removed those pieces of the packet pipeline that I don't necessarily
> > > > need, one by one.  Then measured their effect on small-packet
> > > > performance.
> > > >
> > > > This was the only part that produced a considerable effect.
> > > >
> > > > The pure speculation was about why the effect is more than a 15%
> > > > increase in packet throughput, although the code path avoided
> > > > contains far less than 15% of the code in the packet pipeline.
> > > > It seems Felix Fietkau profiled similar changes and found my
> > > > guess well founded.
> > > >
> > > > Now could anybody explain to me what else is wrong with my patch?
> > >
> > > We have to come up with a more generic solution for this.
> >
> > Jiri Benc suggested allowing netfilter hooks to be attached, e.g. via
> > a tc action; maybe that would be an option worth investigating.
> >
> > Then you could, for instance, add filtering rules only to the bridge
> > port that needs them.
> >
> > > These sysfs tweaks you're proposing look to me like an obscure way to
> > > tune this.
> >
> > I agree, adding more tunables isn't all that helpful; in the past this
> > has only helped to prolong the problem.
>
> How feasible would it be to make it completely dynamic?
>
> A given hook could automatically disable itself (for a given device) if
> the result of running it the first time was *tautologically* false for
> that device (i.e. regardless of the packet itself, or anything else).
>
> The hook would need to be automatically re-enabled if the rule chain
> ever changes (and might subsequently disable itself again).
>
> Is that something that's worth exploring for the general case?

AF_BRIDGE hooks sit in the net namespace, so it's enough for one bridge
to request filtering to bring in the hook overhead for all bridges in
the same netns.

Alternatives:
- place the bridges that need filtering in different netns
- use tc ingress for filtering
- use the nftables ingress hook for filtering (it sits in almost the
  same location as the tc ingress hook) to attach the ruleset to those
  bridge ports that need packet filtering

Rough sketches of the first two alternatives are appended after the nft
example below.

(The original request came from a user with tons of bridges where only
a single bridge needed filtering.)

One alternative I see is to place the bridge hooks into the bridge
device (the net_bridge struct, which is in the netdev private area).
But, as you already mentioned, we would need to annotate the hooks to
figure out which device(s) they are for.

This sounds rather fragile to me, so I would just use nft ingress:

#!/sbin/nft -f

table netdev ingress {
	chain in_public {
		type filter hook ingress device eth0 priority 0;
		ip saddr 192.168.0.1 counter
	}
}
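
A minimal sketch of the first alternative, namespace separation. All
names here are made up for illustration ("filtered", "br-filt", "eth0"
and the ruleset path are placeholders, not from any real setup):

# create a separate namespace for the one bridge that needs filtering
ip netns add filtered
ip link add br-filt netns filtered type bridge

# move the port that needs filtering into that namespace and enslave it
ip link set eth0 netns filtered
ip netns exec filtered ip link set eth0 up
ip netns exec filtered ip link set eth0 master br-filt
ip netns exec filtered ip link set br-filt up

# hooks/rules loaded inside "filtered" add no overhead for bridges
# that stay in the initial netns
ip netns exec filtered nft -f /etc/bridge-filter.nft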
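
And a rough sketch of the tc ingress alternative, mirroring the nft
example above (same placeholder device and address; "action pass" plus
the filter statistics stand in for the nft counter):

# attach an ingress qdisc to the bridge port that needs filtering
tc qdisc add dev eth0 ingress

# match packets from the same source as in the nft example and let
# them pass; the filter keeps hit statistics
tc filter add dev eth0 parent ffff: protocol ip u32 \
	match ip src 192.168.0.1/32 action pass

# show the counters
tc -s filter show dev eth0 ingress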