Date: Wed, 6 Dec 2023 11:06:05 -0800
Subject: Re: [xdp-hints] Re: [PATCH bpf-next v3 2/3] net: stmmac: add Launch Time support to XDP ZC
From: Stanislav Fomichev
To: Magnus Karlsson
Cc: Willem de Bruijn, Florian Bezdeka, yoong.siang.song@intel.com,
    Jesper Dangaard Brouer, davem@davemloft.net, Eric Dumazet,
    Jakub Kicinski, Paolo Abeni, Jonathan Corbet, Bjorn Topel,
    magnus.karlsson@intel.com, maciej.fijalkowski@intel.com,
    Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
    Lorenzo Bianconi, Tariq Toukan, Willem de Bruijn, Maxime Coquelin,
    Andrii Nakryiko, Mykola Lysenko, Martin KaFai Lau, Song Liu,
    Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
    Alexandre Torgue, Jose Abreu, "netdev@vger.kernel.org",
    "linux-kernel@vger.kernel.org", "linux-doc@vger.kernel.org",
    "bpf@vger.kernel.org", "xdp-hints@xdp-project.net",
    "linux-stm32@st-md-mailman.stormreply.com",
    "linux-arm-kernel@lists.infradead.org",
    "linux-kselftest@vger.kernel.org"

On 12/06, Magnus Karlsson wrote:
> On Tue, 5 Dec 2023 at 20:39, Stanislav Fomichev wrote:
> >
> > On 12/05, Willem de Bruijn wrote:
> > > Stanislav Fomichev wrote:
> > > > On Tue, Dec 5, 2023 at 7:34 AM Florian Bezdeka wrote:
> > > > >
> > > > > On Tue, 2023-12-05 at 15:25 +0000, Song, Yoong Siang wrote:
> > > > > > On Monday, December 4, 2023 10:55 PM, Willem de Bruijn wrote:
> > > > > > > Jesper Dangaard Brouer wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > On 12/3/23 17:51, Song Yoong Siang wrote:
> > > > > > > > > This patch enables Launch Time (Time-Based Scheduling) support to XDP zero
> > > > > > > > > copy via XDP Tx metadata framework.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Song Yoong Siang
> > > > > > > > > ---
> > > > > > > > > drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 ++
> > > > > > > >
> > > > > > > > As requested before, I think we need to see another driver implementing
> > > > > > > > this.
> > > > > > > >
> > > > > > > > I propose driver igc and chip i225.
> > > > > >
> > > > > > Sure. I will include igc patches in next version.
> > > > > >
> > > > > > > >
> > > > > > > > The interesting thing for me is to see how the LaunchTime max 1 second
> > > > > > > > into the future[1] is handled code wise. One suggestion is to add a
> > > > > > > > section to Documentation/networking/xsk-tx-metadata.rst per driver that
> > > > > > > > mentions/documents these different hardware limitations. It is natural
> > > > > > > > that different types of hardware have limitations. This is a close-to
> > > > > > > > hardware-level abstraction/API, and IMHO as long as we document the
> > > > > > > > limitations we can expose this API without too many limitations for more
> > > > > > > > capable hardware.
> > > > > >
> > > > > > Sure. I will try to add hardware limitations in documentation.
> > > > > >
> > > > > > >
> > > > > > > I would assume that the kfunc will fail when a value is passed that
> > > > > > > cannot be programmed.
> > > > > > >
> > > > > >
> > > > > > In current design, the xsk_tx_metadata_request() doesn't have a return value.
> > > > > > So user won't know if their request fails.
> > > > > > It is complex to inform user which request is failing.
> > > > > > Therefore, IMHO, it is good that we let driver handle the error silently.
> > > > > >
> > > > >
> > > > > If the programmed value is invalid, the packet will be "dropped" / will
> > > > > never make it to the wire, right?
> > >
> > > Programmable behavior is to either drop or cap to some boundary
> > > value, such as the farthest programmable time in the future: the
> > > horizon. In fq:
> > >
> > >         /* Check if packet timestamp is too far in the future. */
> > >         if (fq_packet_beyond_horizon(skb, q, now)) {
> > >                 if (q->horizon_drop) {
> > >                         q->stat_horizon_drops++;
> > >                         return qdisc_drop(skb, sch, to_free);
> > >                 }
> > >                 q->stat_horizon_caps++;
> > >                 skb->tstamp = now + q->horizon;
> > >         }
> > >         fq_skb_cb(skb)->time_to_send = skb->tstamp;
> > >
> > > Drop is the more obviously correct mode.
> > >
> > > Programming with a clock source that the driver does not support will
> > > then be a persistent failure.
> > >
> > > Preferably, this driver capability can be queried beforehand (rather
> > > than only through reading error counters afterwards).
> > >
> > > Perhaps it should not be a driver task to convert from possibly
> > > multiple clock sources to the device native clock. Right now, we do
> > > use per-device timecounters for this, implemented in the driver.
> > >
> > > As for which clocks are relevant. For PTP, I suppose the device PHC,
> > > converted to nsec. For pacing offload, TCP uses CLOCK_MONOTONIC.
> >
> > Do we need to expose some generic netdev netlink apis to query/adjust
> > nic clock sources (or maybe there is something existing already)?
> > Then the userspace can be responsible for syncing/converting the
> > timestamps to the internal nic clocks. +1 to trying to avoid doing
> > this in the drivers.
> >
> > > > > That is clearly a situation that the user should be informed about. For
> > > > > RT systems this normally means that something is really wrong regarding
> > > > > timing / cycle overflow. Such systems have to react on that situation.
> > > >
> > > > In general, af_xdp is a bit lacking in this 'notify the user that they
> > > > somehow messed up' area :-(
> > > > For example, pushing a tx descriptor with a wrong addr/len in zc mode
> > > > will not give any visible signal back (besides driver potentially
> > > > spilling something into dmesg as it was in the mlx case).
> > > > We can probably start with having some counters for these events?
> > >
> > > This is because the AF_XDP completion queue descriptor format is only
> > > a u64 address?
> >
> > Yeah. XDP_COPY mode has the descriptor validation which is exported via
> > recvmsg errno, but zerocopy path seems to be too deep in the stack
> > to report something back. And there is no place, as you mention,
> > in the completion ring to report the status.
> >
> > > Could error conditions be reported on tx completion in the metadata,
> > > using xsk_tx_metadata_complete?
> >
> > That would be one way to do it, yes. But then the error reporting depends
> > on the metadata opt-in. Having a separate ring to export the errors,
> > or having a v2 tx-completions layout with extra 'status' field would also
> > work.
>
> There are error counters for the non-metadata and offloading cases
> above that can be retrieved with the XDP_STATISTICS getsockopt(). From
> if_xdp.h:
>
> struct xdp_statistics {
>         __u64 rx_dropped; /* Dropped for other reasons */
>         __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
>         __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
>         __u64 rx_ring_full; /* Dropped due to rx ring being full */
>         __u64 rx_fill_ring_empty_descs; /* Failed to retrieve item from fill ring */
>         __u64 tx_ring_empty_descs; /* Failed to retrieve item from tx ring */
> };
>
> Albeit, these are aggregate statistics and do not say anything about
> which packet that caused it. Works well for things that are
> programming bugs that should not occur (such as rx_invalid_descs and
> tx_invalid_descs) and requires the programmer to debug and fix his or
> her program, but it does not work for requests that might fail even
> though the program is correct and need to be handled on a packet by
> packet basis. So something needs to be added for that as you both say.
>
> Would prefer if we could avoid a v2 completion descriptor format or
> another ring that needs to be checked all the time, so if we could
> live with providing the error status in the metadata field of the
> packet at completion time, that would be good. Though having the error
> status in the completion ring would be faster as that cache line is
> hot, while the metadata section of the packet is likely not at
> completion time. So that speaks for a v2 completion ring format. Just
> thinking out loud here.

In this case, maybe adding tx_over_horizon_dropped to XDP_STATISTICS is
all we need here? We can have some new api to query this horizon per
netdev.
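
To make the idea a bit more concrete, here is a minimal userspace C
sketch of the drop-or-cap horizon handling quoted from fq earlier in the
thread, paired with an aggregate counter in the spirit of the proposed
tx_over_horizon_dropped. This is only an illustration, not kernel or
driver code: struct launch_policy, struct launch_stats,
launch_time_check() and the tx_over_horizon_capped counter are
hypothetical names invented for the example, and tx_over_horizon_dropped
is a proposal from this thread, not an existing field of
struct xdp_statistics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue policy; all names here are illustrative only. */
struct launch_policy {
	uint64_t horizon_ns;   /* farthest programmable offset from "now" */
	bool horizon_drop;     /* true: drop; false: cap to now + horizon */
};

/* Hypothetical aggregate counters, modelled on struct xdp_statistics. */
struct launch_stats {
	uint64_t tx_over_horizon_dropped;
	uint64_t tx_over_horizon_capped;
};

enum launch_verdict { LAUNCH_OK, LAUNCH_CAPPED, LAUNCH_DROPPED };

/*
 * Decide what to do with a requested launch time. The request is passed
 * by pointer because the "cap" mode rewrites it to now + horizon.
 */
static enum launch_verdict
launch_time_check(const struct launch_policy *p, struct launch_stats *s,
		  uint64_t now_ns, uint64_t *launch_ns)
{
	assert(p && s && launch_ns);

	/* In the past or within the horizon: program it as requested. */
	if (*launch_ns <= now_ns || *launch_ns - now_ns <= p->horizon_ns)
		return LAUNCH_OK;

	if (p->horizon_drop) {
		s->tx_over_horizon_dropped++;
		return LAUNCH_DROPPED;
	}

	s->tx_over_horizon_capped++;
	*launch_ns = now_ns + p->horizon_ns;
	return LAUNCH_CAPPED;
}
```

With a counter like this, userspace would only learn that some packets
were beyond the horizon, not which ones, so it has the same aggregate
limitation Magnus points out for the existing XDP_STATISTICS fields.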