Received: by 2002:a05:6358:5282:b0:b5:90e7:25cb with SMTP id g2csp3963446rwa; Tue, 23 Aug 2022 13:26:45 -0700 (PDT) X-Google-Smtp-Source: AA6agR7uHQ8COuKnUvl/2wa1tOZdvQ4r3SzBfdSSCmx2zo6M5T8EIar3zqhfhkmrqiztyTfpQRMB X-Received: by 2002:a63:6a05:0:b0:42a:3b5d:8084 with SMTP id f5-20020a636a05000000b0042a3b5d8084mr18460172pgc.281.1661286405681; Tue, 23 Aug 2022 13:26:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1661286405; cv=none; d=google.com; s=arc-20160816; b=leYB4tqPIt5pJaR7pA0hgokkWJXZPdKFSLl+RKweKT6C6U5fCNpk2MvQlC7lD8J+fd iQvbQUQglSSknC6S8JgAq1tUrxMI8GXqVeMXNeRBn7CsOSUKQ3HWjECJ0LQn2Z6n1+mC fBsAL/jQ7CjnWEQw7b7wb9VkNNB4EvAbtLwR/EzO54M0F1iI6WiEY91+ApqMD8XybT4S 0koNJRkH/BrBXInAF4IqD5fp90T0krd9oWjYinCdzJ7eqVNaC1iBdHtWnLPfMpoXcdZ8 JK8nxAUayWGvMenYp3358cuyXsDXMJlHWNK4SaclELRQ5/ZfD30RwarV+2tfr5HVT4Xg RULQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:message-id:in-reply-to :subject:cc:to:from:date; bh=tu/v0AILpbdJh+wyRvi0FLV7Yg6bET3nBZn5e47GlxY=; b=s7He5PyxQqDVgM3YdTgwx9mhgsUB5OZMGOozjOS3EEZoiRi7jYQEeks0WQUpHu+4Uq 9ea+dNMN/FVZ5Qb+k3kSo4+WSRc1pGBIFZiuVUwrbkz+efl08cuOHloeFLJOsPEDC7VV 6wAu+qZb3eFrq88EW1wGVZAd2fZxvDlrVnaUjdFfSIHKUU5K0qtlUoZ4YJ6qRgQn3BIt R5VjHIw8+aIX9azaAmFG4oAsBxCRvJXtnzJoUUJFV7mnLuCVBW0pWtO2Mng7RmHqhKMo EJScPThxgYoDT4/YYeRc+7DCmA19WTbSZigCyb5jMuQAGlrz0hUCkBsIR+6+qmLaE+BR LnyA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w186-20020a6382c3000000b0041cb15022e4si16077473pgd.767.2022.08.23.13.26.34; Tue, 23 Aug 2022 13:26:45 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231434AbiHWT33 (ORCPT + 99 others); Tue, 23 Aug 2022 15:29:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48748 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231357AbiHWT3L (ORCPT ); Tue, 23 Aug 2022 15:29:11 -0400 Received: from mg.ssi.bg (mg.ssi.bg [193.238.174.37]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 437407C526; Tue, 23 Aug 2022 11:17:38 -0700 (PDT) Received: from mg.ssi.bg (localhost [127.0.0.1]) by mg.ssi.bg (Proxmox) with ESMTP id C28FF13501; Tue, 23 Aug 2022 21:16:40 +0300 (EEST) Received: from ink.ssi.bg (unknown [193.238.174.40]) by mg.ssi.bg (Proxmox) with ESMTP id 12885134BC; Tue, 23 Aug 2022 21:16:38 +0300 (EEST) Received: from ja.ssi.bg (unknown [178.16.129.10]) by ink.ssi.bg (Postfix) with ESMTPS id 8DE973C0441; Tue, 23 Aug 2022 21:16:32 +0300 (EEST) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by ja.ssi.bg (8.17.1/8.16.1) with ESMTP id 27NIGTfi085162; Tue, 23 Aug 2022 21:16:31 +0300 Date: Tue, 23 Aug 2022 21:16:29 +0300 (EEST) From: Julian Anastasov To: "longguang.yue" cc: horms@verge.net.au, kadlec@netfilter.org, fw@strlen.de, pablo@netfilter.org, lvs-devel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] ipvs: add a sysctl switch to control ipvs to bypass OUTPUT chain or not In-Reply-To: <20220819100702.14889-1-bigclouds@163.com> Message-ID: <495ceee5-f8dc-06e-d1ef-258d1889c7b8@ssi.bg> References: <20220819100702.14889-1-bigclouds@163.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, On Fri, 19 Aug 2022, longguang.yue wrote: > Netfilter's rules are matched in sequence, more rules worse performance. > IPVS is a special system, its traffic is clear and definite, for better > performance, should better not be interfered heavily. This patch adds a > sysctl switch and enable ipvs to control traffic to pass netfilter > OUTPUT chain or not. > > Signed-off-by: longguang.yue > --- > include/net/ip_vs.h | 11 +++++++++++ > net/netfilter/ipvs/ip_vs_ctl.c | 8 ++++++++ > net/netfilter/ipvs/ip_vs_xmit.c | 31 ++++++++++++++++++++++++------- > 3 files changed, 43 insertions(+), 7 deletions(-) > > diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h > index ff1804a0c469..c1232ef3a1b5 100644 > --- a/include/net/ip_vs.h > +++ b/include/net/ip_vs.h > @@ -932,6 +932,7 @@ struct netns_ipvs { > int sysctl_schedule_icmp; > int sysctl_ignore_tunneled; > int sysctl_run_estimation; > + int sysctl_output_bypass; > > /* ip_vs_lblc */ > int sysctl_lblc_expiration; > @@ -1077,6 +1078,11 @@ static inline int sysctl_run_estimation(struct netns_ipvs *ipvs) > return ipvs->sysctl_run_estimation; > } > > +static inline int sysctl_output_bypass(struct netns_ipvs *ipvs) > +{ > + return ipvs->sysctl_output_bypass; > +} > + > #else > > static inline int sysctl_sync_threshold(struct netns_ipvs *ipvs) > @@ -1174,6 +1180,11 @@ static inline int sysctl_run_estimation(struct netns_ipvs *ipvs) > return 1; > } > > +static inline int sysctl_output_bypass(struct netns_ipvs *ipvs) > +{ > + return 0; > +} > + > #endif > > /* IPVS core functions > diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c > index efab2b06d373..8a08a783e85e 100644 > --- a/net/netfilter/ipvs/ip_vs_ctl.c > +++ b/net/netfilter/ipvs/ip_vs_ctl.c > @@ -2019,6 +2019,12 @@ static struct ctl_table vs_vars[] = { > .mode = 0644, > .proc_handler = proc_dointvec, > }, > + { > + .procname = "output_bypass", > + .maxlen = sizeof(int), > + .mode = 0644, > + .proc_handler = proc_dointvec, > + }, > #ifdef CONFIG_IP_VS_DEBUG > { > .procname = "debug_level", > @@ -4094,6 +4100,8 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs) > tbl[idx++].data = &ipvs->sysctl_ignore_tunneled; > ipvs->sysctl_run_estimation = 1; > tbl[idx++].data = &ipvs->sysctl_run_estimation; > + ipvs->sysctl_output_bypass = 1; Default should be 0, so without above line. > + tbl[idx++].data = &ipvs->sysctl_output_bypass; > #ifdef CONFIG_IP_VS_DEBUG > /* Global sysctls must be ro in non-init netns */ > if (!net_eq(net, &init_net)) > diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c > index 029171379884..46a34dd2555e 100644 > --- a/net/netfilter/ipvs/ip_vs_xmit.c > +++ b/net/netfilter/ipvs/ip_vs_xmit.c > @@ -653,8 +653,12 @@ static inline int ip_vs_nat_send_or_cont(int pf, struct sk_buff *skb, > skb_forward_csum(skb); > if (skb->dev) > skb_clear_tstamp(skb); > - NF_HOOK(pf, NF_INET_LOCAL_OUT, cp->ipvs->net, NULL, skb, > - NULL, skb_dst(skb)->dev, dst_output); > + if (sysctl_output_bypass(cp->ipvs)) { > + dst_output(cp->ipvs->net, NULL, skb); > + } else { > + NF_HOOK(pf, NF_INET_LOCAL_OUT, cp->ipvs->net, NULL, skb, > + NULL, skb_dst(skb)->dev, dst_output); > + } > } else > ret = NF_ACCEPT; > > @@ -675,8 +679,12 @@ static inline int ip_vs_send_or_cont(int pf, struct sk_buff *skb, > skb_forward_csum(skb); > if (skb->dev) > skb_clear_tstamp(skb); > - NF_HOOK(pf, NF_INET_LOCAL_OUT, cp->ipvs->net, NULL, skb, > - NULL, skb_dst(skb)->dev, dst_output); > + if (sysctl_output_bypass(cp->ipvs)) { > + dst_output(cp->ipvs->net, NULL, skb); > + } else { > + NF_HOOK(pf, NF_INET_LOCAL_OUT, cp->ipvs->net, NULL, skb, > + NULL, skb_dst(skb)->dev, dst_output); > + } > } else > ret = NF_ACCEPT; > return ret; > @@ -1262,10 +1270,19 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp, > skb->ignore_df = 1; > > ret = ip_vs_tunnel_xmit_prepare(skb, cp); > - if (ret == NF_ACCEPT) > - ip_local_out(net, skb->sk, skb); > - else if (ret == NF_DROP) > + if (ret == NF_ACCEPT) { > + if (sysctl_output_bypass(cp->ipvs)) { > + struct iphdr *iph = ip_hdr(skb); > + > + iph->tot_len = htons(skb->len); > + ip_send_check(iph); > + dst_output(cp->ipvs->net, NULL, skb); After our last discussion I still think this is a hack. If you expand also __ip6_local_out you'll see what maintenance cost adds such flag. Your patch also warns how much ip_local_out changed in the years and our NF_HOOK calls look as hacks too. They were used to avoid duplicate calls like ip_send_check which we should not have anymore, ip_send_check was added to the ip_defrag flow in 2015 with commit 0848f6428ba3. So, I think we should be moving to ip_local_out/ip6_local_out calls which are risky to duplicate here in the IPVS code. > + } else { > + ip_local_out(net, skb->sk, skb); > + } > + } else if (ret == NF_DROP) { > kfree_skb(skb); > + } > > LeaveFunction(10); Regards -- Julian Anastasov