From: Lorenz Bauer
Date: Wed, 24 Nov 2021 16:34:08 +0000
Subject: Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics
To: Daniel Borkmann
Cc: Alexander Lobakin, "David S. Miller", Jakub Kicinski, Jesse Brandeburg,
    Michal Swiatkowski, Maciej Fijalkowski, Jonathan Corbet, Shay Agroskin,
    Arthur Kiyanovski, David Arinzon, Noam Dagan, Saeed Bishara,
    Ioana Ciornei, Claudiu Manoil, Tony Nguyen, Thomas Petazzoni,
    Marcin Wojtas, Russell King, Saeed Mahameed, Leon Romanovsky,
    Alexei Starovoitov, Jesper Dangaard Brouer, Toke Høiland-Jørgensen,
    John Fastabend, Edward Cree, Martin Habets, "Michael S. Tsirkin",
    Jason Wang, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
    KP Singh, Lorenzo Bianconi, Yajun Deng, Sergey Ryazanov, David Ahern,
    Andrei Vagin, Johannes Berg, Vladimir Oltean, Cong Wang, Networking,
    linux-doc@vger.kernel.org, LKML, linux-rdma@vger.kernel.org, bpf,
    virtualization@lists.linux-foundation.org
References: <20211123163955.154512-1-alexandr.lobakin@intel.com>
    <20211123163955.154512-22-alexandr.lobakin@intel.com>
    <77407c26-4e32-232c-58e0-2d601d781f84@iogearbox.net>
In-Reply-To: <77407c26-4e32-232c-58e0-2d601d781f84@iogearbox.net>
X-Mailing-List: linux-kernel@vger.kernel.org

Daniel asked me to share my opinion, as Cloudflare has an XDP load
balancer as well.

On Wed, 24 Nov 2021 at 00:53, Daniel Borkmann wrote:
> I'm just taking our XDP L4LB in Cilium as an example: there we already count errors and
> export them via per-cpu map that eventually lead to XDP_DROP cases including the /reason/
> which caused the XDP_DROP (e.g. Prometheus can then scrape these insights from all the
> nodes in the cluster). Given the different action codes are very often application specific,
> there's not much debugging that you can do when /only/ looking at `ip link xdpstats` to
> gather insight on *why* some of these actions were triggered (e.g. fib lookup failure, etc).

Agreed. For our purpose we often want to know whether a specific
program has been invoked. Per-channel or per device stats don't help
us much since we have a chain of programs (not using libxdp though).
My colleague Arthur has written xdpcap [1], which gives per-action,
per-program counters. This way we can correlate an action with a
packet and a program.

> If really of interest, then maybe libxdp could have such per-action counters as opt-in in
> its call chain..

We could also make it part of BPF_ENABLE_STATS, it's kind of coarse
grained though.

> In the case of ice_run_xdp() today, we already bump total_rx_bytes/total_rx_pkts under
> XDP and update ice_update_rx_ring_stats(). I do see the case for XDP_TX and XDP_REDIRECT
> where we run into driver-specific errors that are /outside of the reach/ of the BPF prog.
> For example, we've been running into errors from XDP_TX in ice_xmit_xdp_ring() in the
> past during testing, and were able to pinpoint the location as xdp_ring->tx_stats.tx_busy
> was increasing. These things are useful and would make sense to standardize for XDP context.

I'd like to see more tracepoints like trace_xdp_exception, personally.
We can use things like bpftrace for exploration and ebpf_exporter [2]
to generate alerts much more easily than something wired into
iproute2.

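To make the per-program, per-action counting discussed above concrete,
here is a rough user-space model in Python. It is only a sketch of the
idea (one counter slot per CPU, keyed by program, action and drop
reason, summed by the reader), not how xdpcap or the Cilium L4LB
actually implement it. The program names "l4lb" and "filter", the
reason strings and the PerCpuCounters helper are all made up for
illustration; only the XDP action values come from the kernel UAPI.

```python
from collections import Counter
from enum import IntEnum


class XdpAction(IntEnum):
    # Values match enum xdp_action in the kernel UAPI header linux/bpf.h.
    XDP_ABORTED = 0
    XDP_DROP = 1
    XDP_PASS = 2
    XDP_TX = 3
    XDP_REDIRECT = 4


class PerCpuCounters:
    """Hypothetical user-space model of a per-CPU counter map."""

    def __init__(self, num_cpus):
        # One independent Counter per CPU, like the slots of a per-CPU map.
        self.slots = [Counter() for _ in range(num_cpus)]

    def bump(self, cpu, program, action, reason=""):
        # Each CPU only touches its own slot, so no locking is modelled.
        self.slots[cpu][(program, action, reason)] += 1

    def totals(self):
        # A reader (e.g. a metrics exporter) sums the per-CPU slots per key.
        total = Counter()
        for slot in self.slots:
            total.update(slot)
        return total


def per_program_actions(totals):
    """Collapse the reasons to get per-program, per-action counts."""
    out = Counter()
    for (program, action, _reason), n in totals.items():
        out[(program, action)] += n
    return out


# Illustrative traffic: two chained programs running on two CPUs.
stats = PerCpuCounters(num_cpus=2)
stats.bump(0, "l4lb", XdpAction.XDP_DROP, "fib_lookup_failed")
stats.bump(1, "l4lb", XdpAction.XDP_DROP, "fib_lookup_failed")
stats.bump(1, "l4lb", XdpAction.XDP_DROP, "no_backend")
stats.bump(0, "filter", XdpAction.XDP_PASS)

totals = stats.totals()
by_program = per_program_actions(totals)
```

Keeping the reason in the key is what lets an exporter answer *why* a
drop happened, while collapsing it gives the per-program view we care
about for a chain of programs.
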
Best
Lorenz

1: https://github.com/cloudflare/xdpcap
2: https://github.com/cloudflare/ebpf_exporter

-- 
Lorenz Bauer | Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com