From: Jonathan Maxwell
Date: Sat, 24 Dec 2022 18:38:01 +1100
Subject: Re: [net-next] ipv6: fix routing cache overflow for raw sockets
To: Andrea Mayer
Cc: Paolo Abeni, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, yoshfuji@linux-ipv6.org, dsahern@kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Stefano Salsano, Paolo Lungaroni, Ahmed Abdelsalam
References: <20221218234801.579114-1-jmaxwell37@gmail.com> <9f145202ca6a59b48d4430ed26a7ab0fe4c5dfaf.camel@redhat.com> <20221223212835.eb9d03f3f7db22360e34341d@uniroma2.it>
In-Reply-To: <20221223212835.eb9d03f3f7db22360e34341d@uniroma2.it>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Dec 24, 2022 at 7:28 AM Andrea Mayer wrote:
>
> Hi Jon,
> please see below, thanks.
>
> On Wed, 21 Dec 2022 08:48:11 +1100
> Jonathan Maxwell wrote:
>
> > On Tue, Dec 20, 2022 at 11:35 PM Paolo Abeni wrote:
> > >
> > > On Mon, 2022-12-19 at 10:48 +1100, Jon Maxwell wrote:
> > > > Sending IPv6 packets in a loop via a raw socket triggers an issue where a
> > > > route is cloned by ip6_rt_cache_alloc() for each packet sent. This quickly
> > > > consumes the IPv6 max_size threshold which defaults to 4096 resulting in
> > > > these warnings:
> > > >
> > > > [1] 99.187805] dst_alloc: 7728 callbacks suppressed
> > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > .
> > > > .
> > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > >
> > > If I read correctly, the maximum number of dst that the raw socket can
> > > use this way is limited by the number of packets it allows via the
> > > sndbuf limit, right?
> > >
> >
> > Yes, but in my test sndbuf limit is never hit so it clones a route for
> > every packet.
> >
> > e.g:
> >
> > output from C program sending 5000000 packets via a raw socket.
> >
> > ip raw: total num pkts 5000000
> >
> > # bpftrace -e 'kprobe:dst_alloc {@count[comm] = count()}'
> > Attaching 1 probe...
> >
> > @count[a.out]: 5000009
> >
> > > Are other FLOWI_FLAG_KNOWN_NH users affected, too? e.g. nf_dup_ipv6,
> > > ipvs, seg6?
> > >
> >
> > Any call to ip6_pol_route(s) where res.nh->fib_nh_gw_family is 0 can do it.
> > But we have only seen this for raw sockets so far.
> >
>
> In the SRv6 subsystem, the seg6_lookup_nexthop() is used by some
> cross-connecting behaviors such as End.X and End.DX6 to forward traffic to a
> specified nexthop. SRv6 End.X/DX6 can specify an IPv6 DA (i.e., a nexthop)
> different from the one carried by the IPv6 header. For this purpose,
> seg6_lookup_nexthop() sets the FLOWI_FLAG_KNOWN_NH.
>

Hi Andrea,

Thanks for pointing that datapath out. The more generic approach we are
taking, bringing IPv6 closer to IPv4 in this regard, should fix all
instances of this.

> > > > [1] 99.187805] dst_alloc: 7728 callbacks suppressed
> > > > [2] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
> > > > .
> > > > .
> > > > [300] Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
>
> I can reproduce the same warning messages reported by you, by instantiating an
> End.X behavior whose nexthop is handled by a route for which there is no "via".
> In this configuration, the ip6_pol_route() (called by seg6_lookup_nexthop())
> triggers ip6_rt_cache_alloc() because i) the FLOWI_FLAG_KNOWN_NH is present ii)
> and the res.nh->fib_nh_gw_family is 0 (as already pointed out).
>

Nice, when I get back after the holiday break I'll submit the next patch. It
would be great if you could test the new patch and let me know how it works
in your tests at that juncture. I'll keep you posted.

Regards

Jon

> > Regards
> >
> > Jon
>
> Ciao,
> Andrea
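
The clone condition discussed in this thread (FLOWI_FLAG_KNOWN_NH set and res.nh->fib_nh_gw_family equal to 0) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: every model_* name and constant value below is a hypothetical stand-in, and neither the max_size check nor garbage collection of cached routes is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the kernel's definitions. */
#define MODEL_FLAG_KNOWN_NH 0x01u /* models FLOWI_FLAG_KNOWN_NH being set */
#define MODEL_GW_NONE       0     /* models fib_nh_gw_family == 0 (route has no "via") */
#define MODEL_GW_INET6      10    /* models a route that does have an IPv6 gateway */

struct model_flow {
    unsigned int flags;  /* flow flags supplied by the caller of the lookup */
};

struct model_nexthop {
    int gw_family;       /* address family of the gateway, 0 if none */
};

/*
 * Returns true when the lookup would take the per-packet clone path
 * described in the thread: the caller pinned the nexthop AND the
 * selected route has no gateway.
 */
bool model_needs_clone(const struct model_flow *fl,
                       const struct model_nexthop *nh)
{
    assert(fl != NULL && nh != NULL);
    return (fl->flags & MODEL_FLAG_KNOWN_NH) != 0 &&
           nh->gw_family == MODEL_GW_NONE;
}

/*
 * Counts how many clones n_packets lookups would request when every
 * packet performs its own lookup and nothing is ever freed.
 */
size_t model_count_clones(const struct model_flow *fl,
                          const struct model_nexthop *nh,
                          size_t n_packets)
{
    size_t clones = 0;

    for (size_t i = 0; i < n_packets; i++) {
        if (model_needs_clone(fl, nh))
            clones++;
    }
    return clones;
}
```

With a flow that carries MODEL_FLAG_KNOWN_NH and a nexthop whose gw_family is MODEL_GW_NONE, model_count_clones() returns one clone per packet, which mirrors the roughly one dst_alloc call per packet seen in the bpftrace output quoted above. Changing either input (no flag, or a route with a gateway) brings the count to zero.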