From: Jonathan Maxwell <jmaxwell37@gmail.com>
Date: Tue, 3 Jan 2023 10:59:50 +1100
Subject: Re: [net-next] ipv6: fix routing cache overflow for raw sockets
To: Andrea Mayer
Cc: Paolo Abeni, davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
    yoshfuji@linux-ipv6.org, dsahern@kernel.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, Stefano Salsano, Paolo Lungaroni,
    Ahmed Abdelsalam

Hi Andrea,

Happy New Year.
Any chance you could test this patch based on the latest net-next kernel and
let me know the result?

diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index 88ff7bb2bb9b..632086b2f644 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -16,7 +16,7 @@ struct dst_ops {
 	unsigned short		family;
 	unsigned int		gc_thresh;
 
-	int			(*gc)(struct dst_ops *ops);
+	void			(*gc)(struct dst_ops *ops);
 	struct dst_entry *	(*check)(struct dst_entry *, __u32 cookie);
 	unsigned int		(*default_advmss)(const struct dst_entry *);
 	unsigned int		(*mtu)(const struct dst_entry *);
diff --git a/net/core/dst.c b/net/core/dst.c
index 6d2dd03dafa8..31c08a3386d3 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -82,12 +82,8 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
 
 	if (ops->gc &&
 	    !(flags & DST_NOCOUNT) &&
-	    dst_entries_get_fast(ops) > ops->gc_thresh) {
-		if (ops->gc(ops)) {
-			pr_notice_ratelimited("Route cache is full: consider increasing sysctl net.ipv6.route.max_size.\n");
-			return NULL;
-		}
-	}
+	    dst_entries_get_fast(ops) > ops->gc_thresh)
+		ops->gc(ops);
 
 	dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC);
 	if (!dst)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e74e0361fd92..b643dda68d31 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -91,7 +91,7 @@ static struct dst_entry *ip6_negative_advice(struct dst_entry *);
 static void		ip6_dst_destroy(struct dst_entry *);
 static void		ip6_dst_ifdown(struct dst_entry *,
 				       struct net_device *dev, int how);
-static int		ip6_dst_gc(struct dst_ops *ops);
+static void		ip6_dst_gc(struct dst_ops *ops);
 
 static int		ip6_pkt_discard(struct sk_buff *skb);
 static int		ip6_pkt_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb);
@@ -3284,11 +3284,10 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
 	return dst;
 }
 
-static int ip6_dst_gc(struct dst_ops *ops)
+static void ip6_dst_gc(struct dst_ops *ops)
 {
 	struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops);
 	int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval;
-	int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size;
 	int rt_elasticity = net->ipv6.sysctl.ip6_rt_gc_elasticity;
 	int rt_gc_timeout = net->ipv6.sysctl.ip6_rt_gc_timeout;
 	unsigned long rt_last_gc = net->ipv6.ip6_rt_last_gc;
@@ -3296,11 +3295,10 @@ static int ip6_dst_gc(struct dst_ops *ops)
 	int entries;
 
 	entries = dst_entries_get_fast(ops);
-	if (entries > rt_max_size)
+	if (entries > ops->gc_thresh)
 		entries = dst_entries_get_slow(ops);
-	if (time_after(rt_last_gc + rt_min_interval, jiffies) &&
-	    entries <= rt_max_size)
+	if (time_after(rt_last_gc + rt_min_interval, jiffies))
 		goto out;
 
 	fib6_run_gc(atomic_inc_return(&net->ipv6.ip6_rt_gc_expire), net, true);
@@ -3310,7 +3308,6 @@ static int ip6_dst_gc(struct dst_ops *ops)
 out:
 	val = atomic_read(&net->ipv6.ip6_rt_gc_expire);
 	atomic_set(&net->ipv6.ip6_rt_gc_expire, val - (val >> rt_elasticity));
-	return entries > rt_max_size;
 }
 
 static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg,
@@ -6512,7 +6509,7 @@ static int __net_init ip6_route_net_init(struct net *net)
 #endif
 
 	net->ipv6.sysctl.flush_delay = 0;
-	net->ipv6.sysctl.ip6_rt_max_size = 4096;
+	net->ipv6.sysctl.ip6_rt_max_size = INT_MAX;
 	net->ipv6.sysctl.ip6_rt_gc_min_interval = HZ / 2;
 	net->ipv6.sysctl.ip6_rt_gc_timeout = 60*HZ;
 	net->ipv6.sysctl.ip6_rt_gc_interval = 30*HZ;

On Sat, Dec 24, 2022 at 6:38 PM Jonathan Maxwell wrote:
>
> On Sat, Dec 24, 2022 at 7:28 AM Andrea Mayer wrote:
> >
> > Hi Jon,
> > please see below, thanks.
> >
> > On Wed, 21 Dec 2022 08:48:11 +1100
> > Jonathan Maxwell wrote:
> >
> > > On Tue, Dec 20, 2022 at 11:35 PM Paolo Abeni wrote:
> > > >
> > > > On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> > > > > Sending Ipv6 packets in a loop via a raw socket triggers an issue where a
> > > > > route is cloned by ip6_rt_cache_alloc() for each packet sent.
> > > > > This quickly
> > > > > consumes the Ipv6 max_size threshold which defaults to 4096 resulting in
> > > > > these warnings:
> > > > >
> > > > > [1] 99.187805] dst_alloc: 7728 callbacks suppressed
> > > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > .
> > > > > .
> > > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > >
> > > > If I read correctly, the maximum number of dst that the raw socket can
> > > > use this way is limited by the number of packets it allows via the
> > > > sndbuf limit, right?
> > > >
> > > Yes, but in my test sndbuf limit is never hit so it clones a route for
> > > every packet.
> > >
> > > e.g:
> > >
> > > output from C program sending 5000000 packets via a raw socket.
> > >
> > > ip raw: total num pkts 5000000
> > >
> > > # bpftrace -e 'kprobe:dst_alloc {@count[comm] = count()}'
> > > Attaching 1 probe...
> > >
> > > @count[a.out]: 5000009
> > >
> > > > Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> > > > ipvs, seg6?
> > > >
> > > Any call to ip6_pol_route(s) where no res.nh->fib_nh_gw_family is 0 can do it.
> > > But we have only seen this for raw sockets so far.
> >
> > In the SRv6 subsystem, the seg6_lookup_nexthop() is used by some
> > cross-connecting behaviors such as End.X and End.DX6 to forward traffic to a
> > specified nexthop. SRv6 End.X/DX6 can specify an IPv6 DA (i.e., a nexthop)
> > different from the one carried by the IPv6 header. For this purpose,
> > seg6_lookup_nexthop() sets the FLOWI_FLAG_KNOWN_NH.
> >
> Hi Andrea,
>
> Thanks for pointing that datapath out. The more generic approach we are
> taking bringing Ipv6 closer to Ipv4 in this regard should fix all instances
> of this.
>
> > > > > [1] 99.187805] dst_alloc: 7728 callbacks suppressed
> > > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > > .
> > > > > .
> > > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> >
> > I can reproduce the same warning messages reported by you, by instantiating an
> > End.X behavior whose nexthop is handled by a route for which there is no "via".
> > In this configuration, the ip6_pol_route() (called by seg6_lookup_nexthop())
> > triggers ip6_rt_cache_alloc() because i) the FLOWI_FLAG_KNOWN_NH is present ii)
> > and the res.nh->fib_nh_gw_family is 0 (as already pointed out).
> >
>
> Nice, when I get back after the holiday break I'll submit the next patch. It
> would be great if you could test the new patch and let me know how it works in
> your tests at that juncture. I'll keep you posted.
>
> Regards
>
> Jon
>
> > > Regards
> > >
> > > Jon
> >
> > Ciao,
> > Andrea