Received: by 2002:a05:6a10:f3d0:0:0:0:0 with SMTP id a16csp366697pxv; Wed, 30 Jun 2021 07:24:28 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz9cAlAyxVRmY95br1CywFtW8eKVHEVImirEYi1QiyyZxaiEG2MahHaqRxhvZ3uiG9y4J0x X-Received: by 2002:a17:907:1c01:: with SMTP id nc1mr3420596ejc.504.1625063067838; Wed, 30 Jun 2021 07:24:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1625063067; cv=none; d=google.com; s=arc-20160816; b=hHKWlvDRHW8JcFvhPu/OhiiNoxWJT8KJvuggllgimlewpcOnQ6+lko37rXxcAO5tlR 4BaZHqoPhoGXCsqF5gvW2BS326rt5NU+z/JoMujol1u/g0bAO0A3OQYkIN6uSxE+dODJ j1loM1bEgS39E3fgV+xD8KnhBdiuKNmdoytBhQo0s1cuRp3dStm82LdgDukcDPOUNjln 8xR3hGZzeHmuqYdbMXXkJ/HHG4m4AeEW9SkptWdD+KMMxlb3R+rIwit75aUfKsTj0FhE ABjhqBmZbmLL3swBadN2N5s3NE33oGj/hn9hIZrjHmq0/WyGlhWvI/QfQYNKefh2nB3Y 79RQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=NtMoRHgrR/BrAyF7qbtBEwp00ipuQ+3asTeiWbAkxFk=; b=MMuAEIeGmHSx+kV0Y0DjWYmjgtvVnDig9KtxfFRytTeUu4AEQ1x/DvCPbnZwVEg5cS KS18NyFbKbUt0z4xd2fsWPZGKqXAxgDVi34Dh4qik6TkGfv6cSSvwOz+ZC0ylk54hbnf sBzTArXH/ObV9vMvPbLUVfVD2icBc9rg0DUv81YKpGbg+bOEYSoODhf18XXWtnrq3VFL LNII1ffAmBHYUDmFgyVxW3XxMq/PQeJsYEqBgXlNr7cm08owtojpA1i09KwNMk8UJFnJ qswXYhzFBP57GSQ8HzkONlI+6zlvDGlK5Ow2UorXXbikgsEJA9wvpQZfGdjNSnv8B2JF mWtg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ga23si13638571ejc.215.2021.06.30.07.23.59; Wed, 30 Jun 2021 07:24:27 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235088AbhF3OXd (ORCPT + 99 others); Wed, 30 Jun 2021 10:23:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53180 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234851AbhF3OXc (ORCPT ); Wed, 30 Jun 2021 10:23:32 -0400 Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 386F8C061756; Wed, 30 Jun 2021 07:21:03 -0700 (PDT) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1lyb57-0001Fh-WB; Wed, 30 Jun 2021 16:20:50 +0200 Date: Wed, 30 Jun 2021 16:20:49 +0200 From: Florian Westphal To: Cole Dishington Cc: pablo@netfilter.org, Anthony Lineham , Scott Parlane , Blair Steven , Jozsef Kadlecsik , Florian Westphal , "David S. Miller" , Jakub Kicinski , netfilter-devel@vger.kernel.org, coreteam@netfilter.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] net: netfilter: Add RFC-7597 Section 5.1 PSID support Message-ID: <20210630142049.GC18022@breakpoint.cc> References: <20210426122324.GA975@breakpoint.cc> <20210629004819.4750-1-Cole.Dishington@alliedtelesis.co.nz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210629004819.4750-1-Cole.Dishington@alliedtelesis.co.nz> User-Agent: Mutt/1.10.1 (2018-07-13) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Cole Dishington wrote: > Comments: > Selecting the ports for psid needs to be in nf_nat_core since the PSID ranges are not a single range. e.g. offset=1024, PSID=0, psid_length=8 generates the ranges 1024-1027, 2048-2051, ..., 63488-63491, ... (example taken from RFC7597 B.2). > This is why it is enough to set NF_NAT_RANGE_PROTO_SPECIFIED and init upper/lower boundaries. I suspect this misses a NOT. But current algorithm has problems, see below. > + if (range->flags & NF_NAT_RANGE_PSID) { > + /* PSID defines a group of port ranges, per PSID. PSID > + * is already contained in min and max. > + */ > + unsigned int min_to_max, base; > + > + min = ntohs(range->min_proto.all); > + max = ntohs(range->max_proto.all); > + base = ntohs(range->base_proto.all); > + min_to_max = max - min; > + for (; max <= (1 << 16) - 1; min += base, max = min + min_to_max) { > + for (off = 0; off <= min_to_max; off++) { > + *keyptr = htons(min + off); > + if (!nf_nat_used_tuple(tuple, ct)) > + return; > + } > + } > + } I fear this searches waaaay to many ports. We had softlockups in the past because of exhausive searches. See a504b703bb1da526a01593da0e4be2af9d9f5fa8 ("netfilter: nat: limit port clash resolution attempts"). I suggest you try pre-selecting one of the eligible ranges in nf_nat_masquerade_ipv4 when the 'newrange' is filled in and set RANGE_PROTO_SPECIFIED. Maybe even prandom-based preselection is good enough. > /* If no range specified... */ > if (!(range->flags & NF_NAT_RANGE_PROTO_SPECIFIED)) { > /* If it's dst rewrite, can't change port */ > @@ -529,11 +572,19 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple, > > /* Only bother mapping if it's not already in range and unique */ > if (!(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) { > - if (range->flags & NF_NAT_RANGE_PROTO_SPECIFIED) { > + /* PSID mode is present always needs to check > + * to see if the source ports are in range. > + */ > + if (range->flags & NF_NAT_RANGE_PROTO_SPECIFIED || > + (range->flags & NF_NAT_RANGE_PSID && Why the extra check? Can't you set NF_NAT_RANGE_PROTO_SPECIFIED in case PSID is requested by userspace? > diff --git a/net/netfilter/nf_nat_ftp.c b/net/netfilter/nf_nat_ftp.c > index aace6768a64e..f65163278db0 100644 > --- a/net/netfilter/nf_nat_ftp.c > +++ b/net/netfilter/nf_nat_ftp.c > @@ -17,6 +17,10 @@ > #include > #include > #include > +void nf_nat_l4proto_unique_tuple(struct nf_conntrack_tuple *tuple, > + const struct nf_nat_range2 *range, > + enum nf_nat_manip_type maniptype, > + const struct nf_conn *ct); > > #define NAT_HELPER_NAME "ftp" > > @@ -72,8 +76,13 @@ static unsigned int nf_nat_ftp(struct sk_buff *skb, > u_int16_t port; > int dir = CTINFO2DIR(ctinfo); > struct nf_conn *ct = exp->master; > + struct nf_conn_nat *nat = nfct_nat(ct); > char buffer[sizeof("|1||65535|") + INET6_ADDRSTRLEN]; > unsigned int buflen; > + int ret; > + > + if (WARN_ON_ONCE(!nat)) > + return NF_DROP; > > pr_debug("type %i, off %u len %u\n", type, matchoff, matchlen); > > @@ -86,18 +95,14 @@ static unsigned int nf_nat_ftp(struct sk_buff *skb, > * this one. */ > exp->expectfn = nf_nat_follow_master; > > - /* Try to get same port: if not, try to change it. */ > - for (port = ntohs(exp->saved_proto.tcp.port); port != 0; port++) { > - int ret; > - > - exp->tuple.dst.u.tcp.port = htons(port); > - ret = nf_ct_expect_related(exp, 0); > - if (ret == 0) > - break; > - else if (ret != -EBUSY) { > - port = 0; > - break; > - } > + /* Find a port that matches the MASQ rule. */ > + nf_nat_l4proto_unique_tuple(&exp->tuple, nat->range, > + dir ? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST, > + ct); Hmm, I am ingorant on details here, but is this correct? This could be an inbound connection, rather than outbound. > diff --git a/net/netfilter/nf_nat_helper.c b/net/netfilter/nf_nat_helper.c > index a263505455fc..2d105e4eb8f8 100644 > --- a/net/netfilter/nf_nat_helper.c > +++ b/net/netfilter/nf_nat_helper.c > @@ -179,15 +179,23 @@ EXPORT_SYMBOL(nf_nat_mangle_udp_packet); > void nf_nat_follow_master(struct nf_conn *ct, > struct nf_conntrack_expect *exp) > { > + struct nf_conn_nat *nat = NULL; > struct nf_nat_range2 range; > > /* This must be a fresh one. */ > BUG_ON(ct->status & IPS_NAT_DONE_MASK); > > - /* Change src to where master sends to */ > - range.flags = NF_NAT_RANGE_MAP_IPS; > - range.min_addr = range.max_addr > - = ct->master->tuplehash[!exp->dir].tuple.dst.u3; > + if (exp->master && !exp->dir) { > + nat = nfct_nat(exp->master); > + if (nat) > + range = *nat->range; Can't you store the psid-relevant parts of the range struct only? Non-PSID doesn't need the original range, so why do you? > diff --git a/net/netfilter/nf_nat_masquerade.c b/net/netfilter/nf_nat_masquerade.c > index 8e8a65d46345..d83cd3d8ad3f 100644 > --- a/net/netfilter/nf_nat_masquerade.c > +++ b/net/netfilter/nf_nat_masquerade.c > @@ -45,10 +45,6 @@ nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum, > return NF_DROP; > } > > - nat = nf_ct_nat_ext_add(ct); > - if (nat) > - nat->masq_index = out->ifindex; > - > /* Transfer from original range. */ > memset(&newrange.min_addr, 0, sizeof(newrange.min_addr)); > memset(&newrange.max_addr, 0, sizeof(newrange.max_addr)); > @@ -57,6 +53,15 @@ nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum, > newrange.max_addr.ip = newsrc; > newrange.min_proto = range->min_proto; > newrange.max_proto = range->max_proto; > + newrange.base_proto = range->base_proto; > + > + nat = nf_ct_nat_ext_add(ct); > + if (nat) { > + nat->masq_index = out->ifindex; > + if (!nat->range) > + nat->range = kmalloc(sizeof(*nat->range), 0); > + memcpy(nat->range, &newrange, sizeof(*nat->range)); kmemdup. Also misses error handling. Should use GFP_ATOMIC. Where is this free'd again? It would be good if you could chop this up in smaller chunks. A selftest would be nice as well (see tools/testing/selftests/netfilter).