From: Ivan Babrou
Date: Fri, 14 Jul 2023 16:38:40 -0700
Subject: Re: [RFC PATCH net-next] tcp: add a tracepoint for tcp_listen_queue_drop
To: David Ahern
Cc: Jakub Kicinski, Yan Zhai, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@cloudflare.com, Eric Dumazet, "David S. Miller", Paolo Abeni, Steven Rostedt, Masami Hiramatsu
References: <20230711043453.64095-1-ivan@cloudflare.com> <20230711193612.22c9bc04@kernel.org> <20230712104210.3b86b779@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 14, 2023 at 8:09 AM David Ahern wrote:
> > We can start a separate discussion to break it down by category if it
> > would help. Let me know what kind of information you would like us to
> > provide to help with that. I assume you're interested in kernel stacks
> > leading to kfree_skb with NOT_SPECIFIED reason, but maybe there's
> > something else.
>
> stack traces would be helpful.

Here you go: https://lore.kernel.org/netdev/CABWYdi00L+O30Q=Zah28QwZ_5RU-xcxLFUK2Zj08A8MrLk9jzg@mail.gmail.com/

> > Even if I was only interested in one specific reason, I would still
> > have to arm the whole tracepoint and route a ton of skbs I'm not
> > interested in into my bpf code. This seems like a lot of overhead,
> > especially if I'm dropping some attack packets.
>
> you can add a filter on the tracepoint event to limit what is passed
> (although I have not tried the filter with an ebpf program - e.g.,
> reason != NOT_SPECIFIED).

Absolutely, but isn't there overhead to even do just that for every
freed skb?

> > If you have an ebpf example that would help me extract the destination
> > port from an skb in kfree_skb, I'd be interested in taking a look and
> > trying to make it work.
>
> This is from 2020 and I forget which kernel version (pre-BTF), but it
> worked at that time and allowed userspace to summarize drop reasons by
> various network data (mac, L3 address, n-tuple, etc):
>
> https://github.com/dsahern/bpf-progs/blob/master/ksrc/pktdrop.c

It doesn't seem to extract the L4 metadata (local port specifically),
which is what I'm after.

> > The need to extract the protocol level information in ebpf is only
> > making kfree_skb more expensive for the needs of catching rare cases
> > when we run out of buffer space (UDP) or listen queue (TCP). These two
> > cases are very common failure scenarios that people are interested in
> > catching with straightforward tracepoints that can give them the
> > needed information easily and cheaply.
> >
> > I sympathize with the desire to keep the number of tracepoints in
> > check, but I also feel like UDP buffer drops and TCP listen drops
> > tracepoints are very much justified to exist.
>
> sure, kfree_skb is like the raw_syscall tracepoint - it can be more than
> what you need for a specific problem, but it is also give you way more
> than you are thinking about today.

I really like the comparison to the raw_syscall tracepoint. There are
two flavors:

1. Catch-all: raw_syscalls:sys_enter, which is similar to skb:kfree_skb.
2. Specific tracepoints: syscalls:sys_enter_* for every syscall.

If you are interested in one rare syscall, you wouldn't attach to a
catch-all firehose and then filter for id in post. Understandably, we
probably can't have a separate skb:kfree_skb for every reason.
However, some of them are more useful than others and I believe that
tcp listen drops fall into that category.

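The catch-all versus specific distinction can be illustrated with a toy Python model. This is not kernel or eBPF code. The function names, the event names and the workload mix are all hypothetical, and unarmed tracepoint sites are modelled as costing nothing, which is a simplification.

```python
def catch_all_invocations(events, wanted):
    """Model a catch-all hook such as raw_syscalls:sys_enter.

    The attached filter or program runs for every event, even the
    ones that are thrown away immediately afterwards.
    """
    invocations = 0
    matches = 0
    for name in events:
        invocations += 1  # filter is evaluated for every single event
        if name == wanted:
            matches += 1
    return invocations, matches


def specific_invocations(events, wanted):
    """Model a dedicated hook such as syscalls:sys_enter_execve.

    Only the armed site calls into the attached program; all other
    sites are assumed to be free in this model.
    """
    matches = sum(1 for name in events if name == wanted)
    return matches, matches


# Hypothetical workload: one rare event hidden among many common ones.
events = ["futex"] * 9999 + ["execve"]

print(catch_all_invocations(events, "execve"))
print(specific_invocations(events, "execve"))
```

In this model both approaches deliver the same single match, but the catch-all variant pays for one filter evaluation per event while the dedicated variant pays only for the event of interest.
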
We went through a similar exercise with the audit subsystem, which in
fact always arms all syscalls even if you audit one of them:

* https://lore.kernel.org/audit/20230523181624.19932-1-ivan@cloudflare.com/T/#u

With pictures, if you're interested:

* https://mastodon.ivan.computer/@mastodon/110426498281603668

Nobody audits futex(), but if you audit execve(), all the rules run
for both. Some rules will run faster, but all of them will run. It's a
lot of overhead with millions of CPUs, which I'm trying to avoid (the
planet is hot as it is).

Ultimately my arguments for a separate tracepoint for tcp listen drops
are at the bottom of my reply to Jakub:

* https://lore.kernel.org/netdev/CABWYdi2BGi=iRCfLhmQCqO=1eaQ1WaCG7F9WsJrz-7==ocZidg@mail.gmail.com/
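
For reference, the L4 extraction discussed earlier in this message (the local port of a dropped TCP packet, which for an incoming packet is its destination port) comes down to a few fixed header offsets. The following is a minimal userspace sketch in Python. It is not eBPF and is not taken from pktdrop.c. The function name and the sample packet are hypothetical, and it assumes an IPv4 packet starting at the IP header.

```python
import struct

IPPROTO_TCP = 6  # IANA protocol number for TCP


def tcp_dest_port(packet: bytes) -> int:
    """Return the TCP destination port from raw IPv4 packet bytes."""
    if len(packet) < 20:
        raise ValueError("truncated IPv4 header")
    version = packet[0] >> 4
    ihl = (packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    if version != 4 or ihl < 20:
        raise ValueError("not a valid IPv4 header")
    if packet[9] != IPPROTO_TCP:
        raise ValueError("not a TCP packet")
    if len(packet) < ihl + 4:
        raise ValueError("truncated TCP header")
    # The first four bytes of the TCP header are the source and
    # destination ports, both in network byte order.
    _src_port, dst_port = struct.unpack_from("!HH", packet, ihl)
    return dst_port


# Hypothetical SYN from 192.0.2.1:54321 to 192.0.2.2:443 (checksums left at 0).
ip_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 40, 0, 0, 64, IPPROTO_TCP, 0,
    bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]),
)
tcp_header = struct.pack("!HHIIBBHHH", 54321, 443, 0, 0, 5 << 4, 0x02, 65535, 0, 0)
sample_packet = ip_header + tcp_header

print(tcp_dest_port(sample_packet))
```

An eBPF program attached to skb:kfree_skb would have to do the equivalent reads against the skb's header pointers for every freed skb it sees, which is the per-packet cost the message above is concerned about.
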