2024-03-01 16:25:58

by Song, Yoong Siang

[permalink] [raw]
Subject: [PATCH iwl-next,v2 0/2] XDP Tx Hardware Timestamp for igc driver

Implemented XDP transmit hardware timestamp metadata for igc driver.

This patchset is tested with tools/testing/selftests/bpf/xdp_hw_metadata
on Intel ADL-S platform. Below are the test steps and results.

Test Step 1: Run xdp_hw_metadata app
sudo ./xdp_hw_metadata <iface> > /dev/shm/result.log

Test Step 2: Enable Tx hardware timestamp
sudo hwstamp_ctl -i <iface> -t 1 -r 1

Test Step 3: Run ptp4l and phc2sys for time synchronization

Test Step 4: Generate 1000 UDP packets with 1ms interval
sudo trafgen --dev <iface> '{eth(da=<mac addr>), udp(dp=9091)}' -t 1ms -n 1000

Result of last 3 packets:
poll: 1 (0) skip=50 fail=0 redir=998
xsk_ring_cons__peek: 1
0x55d7e76a02d0: rx_desc[997]->addr=96110 addr=96110 comp_addr=96110 EoP
rx_hash: 0x11A51182 with RSS type:0x1
HW RX-time: 1677795020447895823 (sec:1677795020.4479) delta to User RX-time sec:0.0000 (16.309 usec)
XDP RX-time: 1677795020447906552 (sec:1677795020.4479) delta to User RX-time sec:0.0000 (5.580 usec)
No rx_vlan_tci or rx_vlan_proto, err=-95
0x55d7e76a02d0: ping-pong with csum=ab19 (want 315b) csum_start=34 csum_offset=6
0x55d7e76a02d0: complete tx idx=997 addr=65010
HW TX-complete-time: 1677795020447961519 (sec:1677795020.4480) delta to User TX-complete-time sec:0.0001 (79.979 usec)
XDP RX-time: 1677795020447906552 (sec:1677795020.4479) delta to User TX-complete-time sec:0.0001 (134.946 usec)
HW RX-time: 1677795020447895823 (sec:1677795020.4479) delta to HW TX-complete-time sec:0.0001 (65.696 usec)
0x55d7e76a02d0: complete rx idx=1125 addr=96110

poll: 1 (0) skip=50 fail=0 redir=999
xsk_ring_cons__peek: 1
0x55d7e76a02d0: rx_desc[998]->addr=98110 addr=98110 comp_addr=98110 EoP
rx_hash: 0x11A51182 with RSS type:0x1
HW RX-time: 1677795020448904440 (sec:1677795020.4489) delta to User RX-time sec:0.0000 (15.920 usec)
XDP RX-time: 1677795020448915139 (sec:1677795020.4489) delta to User RX-time sec:0.0000 (5.221 usec)
No rx_vlan_tci or rx_vlan_proto, err=-95
0x55d7e76a02d0: ping-pong with csum=ab19 (want 315b) csum_start=34 csum_offset=6
0x55d7e76a02d0: complete tx idx=998 addr=66010
HW TX-complete-time: 1677795020448969442 (sec:1677795020.4490) delta to User TX-complete-time sec:0.0001 (80.163 usec)
XDP RX-time: 1677795020448915139 (sec:1677795020.4489) delta to User TX-complete-time sec:0.0001 (134.466 usec)
HW RX-time: 1677795020448904440 (sec:1677795020.4489) delta to HW TX-complete-time sec:0.0001 (65.002 usec)
0x55d7e76a02d0: complete rx idx=1126 addr=98110

poll: 1 (0) skip=50 fail=0 redir=1000
xsk_ring_cons__peek: 1
0x55d7e76a02d0: rx_desc[999]->addr=99110 addr=99110 comp_addr=99110 EoP
rx_hash: 0x11A51182 with RSS type:0x1
HW RX-time: 1677795020449912415 (sec:1677795020.4499) delta to User RX-time sec:0.0000 (16.441 usec)
XDP RX-time: 1677795020449923362 (sec:1677795020.4499) delta to User RX-time sec:0.0000 (5.494 usec)
No rx_vlan_tci or rx_vlan_proto, err=-95
0x55d7e76a02d0: ping-pong with csum=ab19 (want 315b) csum_start=34 csum_offset=6
0x55d7e76a02d0: complete tx idx=999 addr=67010
HW TX-complete-time: 1677795020449977468 (sec:1677795020.4500) delta to User TX-complete-time sec:0.0001 (81.036 usec)
XDP RX-time: 1677795020449923362 (sec:1677795020.4499) delta to User TX-complete-time sec:0.0001 (135.142 usec)
HW RX-time: 1677795020449912415 (sec:1677795020.4499) delta to HW TX-complete-time sec:0.0001 (65.053 usec)
0x55d7e76a02d0: complete rx idx=1127 addr=99110

Besides, this patchset is tested with iperf3 to check the impact of holding tx completion.
Based on results below, the impact is not observable.

Result of iperf3 without trafgen command in step 4:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-100.00 sec 27.4 GBytes 2.35 Gbits/sec 0 sender
[ 5] 0.00-100.04 sec 27.4 GBytes 2.35 Gbits/sec receiver

Result of iperf3 running parallel with trafgen command in step 4:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-100.00 sec 27.4 GBytes 2.35 Gbits/sec 0 sender
[ 5] 0.00-100.04 sec 27.4 GBytes 2.35 Gbits/sec receiver

V1: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

changelog:
V1 -> V2
- In struct igc_tx_timestamp_request, keep a pointer to igc_tx_buffer,
instead of pointing xsk_pending_ts (Vinicius).
- In struct igc_tx_timestamp_request, introduce buffer_type to indicate
whether skb or igc_tx_buffer pointer should be use (Vinicius).
- In struct igc_metadata_request, remove igc_adapter pointer (Vinicius).
- When request tx hwts, copy the value of cmd_type, instead of using
pointer (Vinicius).
- For boolean variable, use true and false, instead of 1 and 0 (Vinicius).
- In igc_xsk_request_timestamp(), make an early return if none of the 4 ts
registers is available (Vinicius).
- Create helper functions to clear tx buffer and skb for tstamp (John).
- Perform throughput test with mix traffic (Vinicius & John).

Song Yoong Siang (2):
selftests/bpf: xdp_hw_metadata reduce sleep interval
igc: Add Tx hardware timestamp request for AF_XDP zero-copy packet

drivers/net/ethernet/intel/igc/igc.h | 71 +++++++-----
drivers/net/ethernet/intel/igc/igc_main.c | 108 +++++++++++++++++-
drivers/net/ethernet/intel/igc/igc_ptp.c | 40 +++++--
tools/testing/selftests/bpf/xdp_hw_metadata.c | 2 +-
4 files changed, 180 insertions(+), 41 deletions(-)

--
2.34.1



2024-03-01 16:26:58

by Song, Yoong Siang

[permalink] [raw]
Subject: [PATCH iwl-next,v2 2/2] igc: Add Tx hardware timestamp request for AF_XDP zero-copy packet

This patch adds support to per-packet Tx hardware timestamp request to
AF_XDP zero-copy packet via XDP Tx metadata framework. Please note that
user needs to enable Tx HW timestamp capability via igc_ioctl() with
SIOCSHWTSTAMP cmd before sending xsk Tx hardware timestamp request.

Same as implementation in RX timestamp XDP hints kfunc metadata, Timer 0
(adjustable clock) is used in xsk Tx hardware timestamp. i225/i226 have
four sets of timestamping registers. Both *skb and *xsk_tx_buffer pointers
are used to indicate whether the timestamping register is already occupied.

Furthermore, a boolean variable named xsk_pending_ts is used to hold the
transmit completion until the tx hardware timestamp is ready. This is
because, for i225/i226, the timestamp notification event comes some time
after the transmit completion event. The driver will retrigger hardware irq
to clean the packet after retrieve the tx hardware timestamp.

Besides, xsk_meta is added into struct igc_tx_timestamp_request as a hook
to the metadata location of the transmit packet. When the Tx timestamp
interrupt is fired, the interrupt handler will copy the value of Tx hwts
into metadata location via xsk_tx_metadata_complete().

Co-developed-by: Lai Peter Jun Ann <[email protected]>
Signed-off-by: Lai Peter Jun Ann <[email protected]>
Signed-off-by: Song Yoong Siang <[email protected]>
---
drivers/net/ethernet/intel/igc/igc.h | 71 ++++++++------
drivers/net/ethernet/intel/igc/igc_main.c | 108 ++++++++++++++++++++--
drivers/net/ethernet/intel/igc/igc_ptp.c | 40 ++++++--
3 files changed, 179 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index cfa6baccec55..22bb4f245240 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -72,13 +72,46 @@ struct igc_rx_packet_stats {
u64 other_packets;
};

+enum igc_tx_buffer_type {
+ IGC_TX_BUFFER_TYPE_SKB,
+ IGC_TX_BUFFER_TYPE_XDP,
+ IGC_TX_BUFFER_TYPE_XSK,
+};
+
+/* wrapper around a pointer to a socket buffer,
+ * so a DMA handle can be stored along with the buffer
+ */
+struct igc_tx_buffer {
+ union igc_adv_tx_desc *next_to_watch;
+ unsigned long time_stamp;
+ enum igc_tx_buffer_type type;
+ union {
+ struct sk_buff *skb;
+ struct xdp_frame *xdpf;
+ };
+ unsigned int bytecount;
+ u16 gso_segs;
+ __be16 protocol;
+
+ DEFINE_DMA_UNMAP_ADDR(dma);
+ DEFINE_DMA_UNMAP_LEN(len);
+ u32 tx_flags;
+ bool xsk_pending_ts;
+};
+
struct igc_tx_timestamp_request {
- struct sk_buff *skb; /* reference to the packet being timestamped */
+ union { /* reference to the packet being timestamped */
+ struct sk_buff *skb;
+ struct igc_tx_buffer *xsk_tx_buffer;
+ };
+ enum igc_tx_buffer_type buffer_type;
unsigned long start; /* when the tstamp request started (jiffies) */
u32 mask; /* _TSYNCTXCTL_TXTT_{X} bit for this request */
u32 regl; /* which TXSTMPL_{X} register should be used */
u32 regh; /* which TXSTMPH_{X} register should be used */
u32 flags; /* flags that should be added to the tx_buffer */
+ u8 xsk_queue_index; /* Tx queue which requesting timestamp */
+ struct xsk_tx_metadata_compl xsk_meta; /* ref to xsk Tx metadata */
};

struct igc_inline_rx_tstamps {
@@ -322,6 +355,9 @@ void igc_disable_tx_ring(struct igc_ring *ring);
void igc_enable_tx_ring(struct igc_ring *ring);
int igc_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags);

+/* AF_XDP TX metadata operations */
+extern const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops;
+
/* igc_dump declarations */
void igc_rings_dump(struct igc_adapter *adapter);
void igc_regs_dump(struct igc_adapter *adapter);
@@ -507,32 +543,6 @@ enum igc_boards {
#define TXD_USE_COUNT(S) DIV_ROUND_UP((S), IGC_MAX_DATA_PER_TXD)
#define DESC_NEEDED (MAX_SKB_FRAGS + 4)

-enum igc_tx_buffer_type {
- IGC_TX_BUFFER_TYPE_SKB,
- IGC_TX_BUFFER_TYPE_XDP,
- IGC_TX_BUFFER_TYPE_XSK,
-};
-
-/* wrapper around a pointer to a socket buffer,
- * so a DMA handle can be stored along with the buffer
- */
-struct igc_tx_buffer {
- union igc_adv_tx_desc *next_to_watch;
- unsigned long time_stamp;
- enum igc_tx_buffer_type type;
- union {
- struct sk_buff *skb;
- struct xdp_frame *xdpf;
- };
- unsigned int bytecount;
- u16 gso_segs;
- __be16 protocol;
-
- DEFINE_DMA_UNMAP_ADDR(dma);
- DEFINE_DMA_UNMAP_LEN(len);
- u32 tx_flags;
-};
-
struct igc_rx_buffer {
union {
struct {
@@ -556,6 +566,13 @@ struct igc_xdp_buff {
struct igc_inline_rx_tstamps *rx_ts; /* data indication bit IGC_RXDADV_STAT_TSIP */
};

+struct igc_metadata_request {
+ struct igc_tx_buffer *tx_buffer;
+ struct xsk_tx_metadata *meta;
+ struct igc_ring *tx_ring;
+ u32 cmd_type;
+};
+
struct igc_q_vector {
struct igc_adapter *adapter; /* backlink */
void __iomem *itr_register;
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 3af52d238f3b..d8c7a3b290a4 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2878,6 +2878,84 @@ static void igc_update_tx_stats(struct igc_q_vector *q_vector,
q_vector->tx.total_packets += packets;
}

+static void igc_xsk_request_timestamp(void *_priv)
+{
+ struct igc_metadata_request *meta_req = _priv;
+ struct igc_ring *tx_ring = meta_req->tx_ring;
+ struct igc_tx_timestamp_request *tstamp;
+ u32 tx_flags = IGC_TX_FLAGS_TSTAMP;
+ struct igc_adapter *adapter;
+ unsigned long lock_flags;
+ bool found = false;
+ int i;
+
+ if (test_bit(IGC_RING_FLAG_TX_HWTSTAMP, &tx_ring->flags)) {
+ adapter = netdev_priv(tx_ring->netdev);
+
+ spin_lock_irqsave(&adapter->ptp_tx_lock, lock_flags);
+
+ /* Search for available tstamp regs */
+ for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
+ tstamp = &adapter->tx_tstamp[i];
+
+ if (tstamp->skb)
+ continue;
+
+ found = true;
+ break;
+ }
+
+ /* Return if no available tstamp regs */
+ if (!found) {
+ adapter->tx_hwtstamp_skipped++;
+ spin_unlock_irqrestore(&adapter->ptp_tx_lock,
+ lock_flags);
+ return;
+ }
+
+ tstamp->start = jiffies;
+ tstamp->xsk_queue_index = tx_ring->queue_index;
+ tstamp->xsk_tx_buffer = meta_req->tx_buffer;
+ tstamp->buffer_type = IGC_TX_BUFFER_TYPE_XSK;
+
+ /* Hold the transmit completion until timestamp is ready */
+ meta_req->tx_buffer->xsk_pending_ts = true;
+
+ /* Keep the pointer to tx_timestamp, which is located in XDP
+ * metadata area. It is the location to store the value of
+ * tx hardware timestamp.
+ */
+ xsk_tx_metadata_to_compl(meta_req->meta, &tstamp->xsk_meta);
+
+ /* Set timestamp bit based on the _TSTAMP(_X) bit. */
+ tx_flags |= tstamp->flags;
+ meta_req->cmd_type |= IGC_SET_FLAG(tx_flags,
+ IGC_TX_FLAGS_TSTAMP,
+ (IGC_ADVTXD_MAC_TSTAMP));
+ meta_req->cmd_type |= IGC_SET_FLAG(tx_flags,
+ IGC_TX_FLAGS_TSTAMP_1,
+ (IGC_ADVTXD_TSTAMP_REG_1));
+ meta_req->cmd_type |= IGC_SET_FLAG(tx_flags,
+ IGC_TX_FLAGS_TSTAMP_2,
+ (IGC_ADVTXD_TSTAMP_REG_2));
+ meta_req->cmd_type |= IGC_SET_FLAG(tx_flags,
+ IGC_TX_FLAGS_TSTAMP_3,
+ (IGC_ADVTXD_TSTAMP_REG_3));
+
+ spin_unlock_irqrestore(&adapter->ptp_tx_lock, lock_flags);
+ }
+}
+
+static u64 igc_xsk_fill_timestamp(void *_priv)
+{
+ return *(u64 *)_priv;
+}
+
+const struct xsk_tx_metadata_ops igc_xsk_tx_metadata_ops = {
+ .tmo_request_timestamp = igc_xsk_request_timestamp,
+ .tmo_fill_timestamp = igc_xsk_fill_timestamp,
+};
+
static void igc_xdp_xmit_zc(struct igc_ring *ring)
{
struct xsk_buff_pool *pool = ring->xsk_pool;
@@ -2899,24 +2977,34 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
budget = igc_desc_unused(ring);

while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
- u32 cmd_type, olinfo_status;
+ struct igc_metadata_request meta_req;
+ struct xsk_tx_metadata *meta = NULL;
struct igc_tx_buffer *bi;
+ u32 olinfo_status;
dma_addr_t dma;

- cmd_type = IGC_ADVTXD_DTYP_DATA | IGC_ADVTXD_DCMD_DEXT |
- IGC_ADVTXD_DCMD_IFCS | IGC_TXD_DCMD |
- xdp_desc.len;
+ meta_req.cmd_type = IGC_ADVTXD_DTYP_DATA |
+ IGC_ADVTXD_DCMD_DEXT |
+ IGC_ADVTXD_DCMD_IFCS |
+ IGC_TXD_DCMD | xdp_desc.len;
olinfo_status = xdp_desc.len << IGC_ADVTXD_PAYLEN_SHIFT;

dma = xsk_buff_raw_get_dma(pool, xdp_desc.addr);
+ meta = xsk_buff_get_metadata(pool, xdp_desc.addr);
xsk_buff_raw_dma_sync_for_device(pool, dma, xdp_desc.len);
+ bi = &ring->tx_buffer_info[ntu];
+
+ meta_req.tx_ring = ring;
+ meta_req.tx_buffer = bi;
+ meta_req.meta = meta;
+ xsk_tx_metadata_request(meta, &igc_xsk_tx_metadata_ops,
+ &meta_req);

tx_desc = IGC_TX_DESC(ring, ntu);
- tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+ tx_desc->read.cmd_type_len = cpu_to_le32(meta_req.cmd_type);
tx_desc->read.olinfo_status = cpu_to_le32(olinfo_status);
tx_desc->read.buffer_addr = cpu_to_le64(dma);

- bi = &ring->tx_buffer_info[ntu];
bi->type = IGC_TX_BUFFER_TYPE_XSK;
bi->protocol = 0;
bi->bytecount = xdp_desc.len;
@@ -2979,6 +3067,13 @@ static bool igc_clean_tx_irq(struct igc_q_vector *q_vector, int napi_budget)
if (!(eop_desc->wb.status & cpu_to_le32(IGC_TXD_STAT_DD)))
break;

+ /* Hold the completions while there's a pending tx hardware
+ * timestamp request from XDP Tx metadata.
+ */
+ if (tx_buffer->type == IGC_TX_BUFFER_TYPE_XSK &&
+ tx_buffer->xsk_pending_ts)
+ break;
+
/* clear next_to_watch to prevent false hangs */
tx_buffer->next_to_watch = NULL;

@@ -6818,6 +6913,7 @@ static int igc_probe(struct pci_dev *pdev,

netdev->netdev_ops = &igc_netdev_ops;
netdev->xdp_metadata_ops = &igc_xdp_metadata_ops;
+ netdev->xsk_tx_metadata_ops = &igc_xsk_tx_metadata_ops;
igc_ethtool_set_ops(netdev);
netdev->watchdog_timeo = 5 * HZ;

diff --git a/drivers/net/ethernet/intel/igc/igc_ptp.c b/drivers/net/ethernet/intel/igc/igc_ptp.c
index 885faaa7b9de..5edaf738fd19 100644
--- a/drivers/net/ethernet/intel/igc/igc_ptp.c
+++ b/drivers/net/ethernet/intel/igc/igc_ptp.c
@@ -11,6 +11,7 @@
#include <linux/ktime.h>
#include <linux/delay.h>
#include <linux/iopoll.h>
+#include <net/xdp_sock.h>

#define INCVALUE_MASK 0x7fffffff
#define ISGN 0x80000000
@@ -545,6 +546,25 @@ static void igc_ptp_enable_rx_timestamp(struct igc_adapter *adapter)
wr32(IGC_TSYNCRXCTL, val);
}

+static void igc_ptp_free_tx_buffer(struct igc_adapter *adapter,
+ struct igc_tx_timestamp_request *tstamp)
+{
+ if (tstamp->buffer_type == IGC_TX_BUFFER_TYPE_XSK) {
+ /* Release the transmit completion */
+ tstamp->xsk_tx_buffer->xsk_pending_ts = false;
+ tstamp->xsk_tx_buffer = NULL;
+ tstamp->buffer_type = 0;
+
+ /* Trigger txrx interrupt for transmit completion */
+ igc_xsk_wakeup(adapter->netdev, tstamp->xsk_queue_index, 0);
+
+ return;
+ }
+
+ dev_kfree_skb_any(tstamp->skb);
+ tstamp->skb = NULL;
+}
+
static void igc_ptp_clear_tx_tstamp(struct igc_adapter *adapter)
{
unsigned long flags;
@@ -555,8 +575,8 @@ static void igc_ptp_clear_tx_tstamp(struct igc_adapter *adapter)
for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
struct igc_tx_timestamp_request *tstamp = &adapter->tx_tstamp[i];

- dev_kfree_skb_any(tstamp->skb);
- tstamp->skb = NULL;
+ if (tstamp->skb)
+ igc_ptp_free_tx_buffer(adapter, tstamp);
}

spin_unlock_irqrestore(&adapter->ptp_tx_lock, flags);
@@ -657,8 +677,9 @@ static int igc_ptp_set_timestamp_mode(struct igc_adapter *adapter,
static void igc_ptp_tx_timeout(struct igc_adapter *adapter,
struct igc_tx_timestamp_request *tstamp)
{
- dev_kfree_skb_any(tstamp->skb);
- tstamp->skb = NULL;
+ if (tstamp->skb)
+ igc_ptp_free_tx_buffer(adapter, tstamp);
+
adapter->tx_hwtstamp_timeouts++;

netdev_warn(adapter->netdev, "Tx timestamp timeout\n");
@@ -729,10 +750,15 @@ static void igc_ptp_tx_reg_to_stamp(struct igc_adapter *adapter,
shhwtstamps.hwtstamp =
ktime_add_ns(shhwtstamps.hwtstamp, adjust);

- tstamp->skb = NULL;
+ /* Copy the tx hardware timestamp into xdp metadata or skb */
+ if (tstamp->buffer_type == IGC_TX_BUFFER_TYPE_XSK)
+ xsk_tx_metadata_complete(&tstamp->xsk_meta,
+ &igc_xsk_tx_metadata_ops,
+ &shhwtstamps.hwtstamp);
+ else
+ skb_tstamp_tx(skb, &shhwtstamps);

- skb_tstamp_tx(skb, &shhwtstamps);
- dev_kfree_skb_any(skb);
+ igc_ptp_free_tx_buffer(adapter, tstamp);
}

/**
--
2.34.1


2024-03-01 16:30:20

by Song, Yoong Siang

[permalink] [raw]
Subject: [PATCH iwl-next,v2 1/2] selftests/bpf: xdp_hw_metadata reduce sleep interval

In current ping-pong design, xdp_hw_metadata will wait until the packet
transmition completely done, then only start to receive the next packet.

The current sleep interval is 10ms, which is unnecessary large. Typically,
a NIC does not need such a long time to transmit a packet. Furthermore,
during this 10ms sleep time, the app is unable to receive incoming packets.

Therefore, this commit reduce sleep interval to 10us, so that
xdp_hw_metadata able to support periodic packets with shorter interval.
10us * 500 = 5ms should be enough for packet transmission and status
retrival.

Signed-off-by: Song Yoong Siang <[email protected]>
---
tools/testing/selftests/bpf/xdp_hw_metadata.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
index 878d68db0325..bdf5d8180067 100644
--- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
+++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
@@ -480,7 +480,7 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t
for (int j = 0; j < 500; j++) {
if (complete_tx(xsk, clock_id))
break;
- usleep(10*1000);
+ usleep(10);
}
}
}
--
2.34.1


2024-03-01 17:24:13

by John Fastabend

[permalink] [raw]
Subject: RE: [PATCH iwl-next,v2 1/2] selftests/bpf: xdp_hw_metadata reduce sleep interval

Song Yoong Siang wrote:
> In current ping-pong design, xdp_hw_metadata will wait until the packet
> transmition completely done, then only start to receive the next packet.
>
> The current sleep interval is 10ms, which is unnecessary large. Typically,
> a NIC does not need such a long time to transmit a packet. Furthermore,
> during this 10ms sleep time, the app is unable to receive incoming packets.
>
> Therefore, this commit reduce sleep interval to 10us, so that
> xdp_hw_metadata able to support periodic packets with shorter interval.
> 10us * 500 = 5ms should be enough for packet transmission and status
> retrival.
>
> Signed-off-by: Song Yoong Siang <[email protected]>
> ---
> tools/testing/selftests/bpf/xdp_hw_metadata.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c
> index 878d68db0325..bdf5d8180067 100644
> --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c
> +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c
> @@ -480,7 +480,7 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t
> for (int j = 0; j < 500; j++) {
> if (complete_tx(xsk, clock_id))
> break;
> - usleep(10*1000);
> + usleep(10);
> }
> }
> }
> --
> 2.34.1
>

Acked-by: John Fastabend <[email protected]>

2024-03-01 17:54:54

by John Fastabend

[permalink] [raw]
Subject: RE: [PATCH iwl-next,v2 2/2] igc: Add Tx hardware timestamp request for AF_XDP zero-copy packet

Song Yoong Siang wrote:
> This patch adds support to per-packet Tx hardware timestamp request to
> AF_XDP zero-copy packet via XDP Tx metadata framework. Please note that
> user needs to enable Tx HW timestamp capability via igc_ioctl() with
> SIOCSHWTSTAMP cmd before sending xsk Tx hardware timestamp request.
>
> Same as implementation in RX timestamp XDP hints kfunc metadata, Timer 0
> (adjustable clock) is used in xsk Tx hardware timestamp. i225/i226 have
> four sets of timestamping registers. Both *skb and *xsk_tx_buffer pointers
> are used to indicate whether the timestamping register is already occupied.
>
> Furthermore, a boolean variable named xsk_pending_ts is used to hold the
> transmit completion until the tx hardware timestamp is ready. This is
> because, for i225/i226, the timestamp notification event comes some time
> after the transmit completion event. The driver will retrigger hardware irq
> to clean the packet after retrieve the tx hardware timestamp.
>
> Besides, xsk_meta is added into struct igc_tx_timestamp_request as a hook
> to the metadata location of the transmit packet. When the Tx timestamp
> interrupt is fired, the interrupt handler will copy the value of Tx hwts
> into metadata location via xsk_tx_metadata_complete().
>
> Co-developed-by: Lai Peter Jun Ann <[email protected]>
> Signed-off-by: Lai Peter Jun Ann <[email protected]>
> Signed-off-by: Song Yoong Siang <[email protected]>
> ---

[...]

>
> +static void igc_xsk_request_timestamp(void *_priv)
> +{
> + struct igc_metadata_request *meta_req = _priv;
> + struct igc_ring *tx_ring = meta_req->tx_ring;
> + struct igc_tx_timestamp_request *tstamp;
> + u32 tx_flags = IGC_TX_FLAGS_TSTAMP;
> + struct igc_adapter *adapter;
> + unsigned long lock_flags;
> + bool found = false;
> + int i;
> +
> + if (test_bit(IGC_RING_FLAG_TX_HWTSTAMP, &tx_ring->flags)) {
> + adapter = netdev_priv(tx_ring->netdev);
> +
> + spin_lock_irqsave(&adapter->ptp_tx_lock, lock_flags);
> +
> + /* Search for available tstamp regs */
> + for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
> + tstamp = &adapter->tx_tstamp[i];
> +
> + if (tstamp->skb)
> + continue;
> +
> + found = true;
> + break;

Not how I would have written this loop construct seems a bit odd
to default break but it works.

> + }
> +
> + /* Return if no available tstamp regs */
> + if (!found) {
> + adapter->tx_hwtstamp_skipped++;
> + spin_unlock_irqrestore(&adapter->ptp_tx_lock,
> + lock_flags);
> + return;
> + }

[...]

>
> +static void igc_ptp_free_tx_buffer(struct igc_adapter *adapter,
> + struct igc_tx_timestamp_request *tstamp)
> +{
> + if (tstamp->buffer_type == IGC_TX_BUFFER_TYPE_XSK) {
> + /* Release the transmit completion */
> + tstamp->xsk_tx_buffer->xsk_pending_ts = false;
> + tstamp->xsk_tx_buffer = NULL;
> + tstamp->buffer_type = 0;
> +
> + /* Trigger txrx interrupt for transmit completion */
> + igc_xsk_wakeup(adapter->netdev, tstamp->xsk_queue_index, 0);

Just curious because I didn't find it. Fairly sure I just need to look more,
but don't you want to still 'tstamp->skb = NULL' in this path somewhere?
It looks like triggering the tx interrupt again with buffer_type == 0 wouldn't
do the null.

I suspect I just missed it.

2024-03-01 18:12:17

by Stanislav Fomichev

[permalink] [raw]
Subject: Re: [PATCH iwl-next,v2 1/2] selftests/bpf: xdp_hw_metadata reduce sleep interval

On 03/02, Song Yoong Siang wrote:
> In current ping-pong design, xdp_hw_metadata will wait until the packet
> transmition completely done, then only start to receive the next packet.
>
> The current sleep interval is 10ms, which is unnecessary large. Typically,
> a NIC does not need such a long time to transmit a packet. Furthermore,
> during this 10ms sleep time, the app is unable to receive incoming packets.
>
> Therefore, this commit reduce sleep interval to 10us, so that
> xdp_hw_metadata able to support periodic packets with shorter interval.
> 10us * 500 = 5ms should be enough for packet transmission and status
> retrival.
>
> Signed-off-by: Song Yoong Siang <[email protected]>

Acked-by: Stanislav Fomichev <[email protected]>

2024-03-02 04:04:32

by Song, Yoong Siang

[permalink] [raw]
Subject: RE: [PATCH iwl-next,v2 2/2] igc: Add Tx hardware timestamp request for AF_XDP zero-copy packet

On Saturday, March 2, 2024 1:55 AM, John Fastabend <[email protected]> wrote:
>Song Yoong Siang wrote:
>> This patch adds support to per-packet Tx hardware timestamp request to
>> AF_XDP zero-copy packet via XDP Tx metadata framework. Please note that
>> user needs to enable Tx HW timestamp capability via igc_ioctl() with
>> SIOCSHWTSTAMP cmd before sending xsk Tx hardware timestamp request.
>>
>> Same as implementation in RX timestamp XDP hints kfunc metadata, Timer 0
>> (adjustable clock) is used in xsk Tx hardware timestamp. i225/i226 have
>> four sets of timestamping registers. Both *skb and *xsk_tx_buffer pointers
>> are used to indicate whether the timestamping register is already occupied.
>>
>> Furthermore, a boolean variable named xsk_pending_ts is used to hold the
>> transmit completion until the tx hardware timestamp is ready. This is
>> because, for i225/i226, the timestamp notification event comes some time
>> after the transmit completion event. The driver will retrigger hardware irq
>> to clean the packet after retrieve the tx hardware timestamp.
>>
>> Besides, xsk_meta is added into struct igc_tx_timestamp_request as a hook
>> to the metadata location of the transmit packet. When the Tx timestamp
>> interrupt is fired, the interrupt handler will copy the value of Tx hwts
>> into metadata location via xsk_tx_metadata_complete().
>>
>> Co-developed-by: Lai Peter Jun Ann <[email protected]>
>> Signed-off-by: Lai Peter Jun Ann <[email protected]>
>> Signed-off-by: Song Yoong Siang <[email protected]>
>> ---
>
>[...]
>
>>
>> +static void igc_xsk_request_timestamp(void *_priv)
>> +{
>> + struct igc_metadata_request *meta_req = _priv;
>> + struct igc_ring *tx_ring = meta_req->tx_ring;
>> + struct igc_tx_timestamp_request *tstamp;
>> + u32 tx_flags = IGC_TX_FLAGS_TSTAMP;
>> + struct igc_adapter *adapter;
>> + unsigned long lock_flags;
>> + bool found = false;
>> + int i;
>> +
>> + if (test_bit(IGC_RING_FLAG_TX_HWTSTAMP, &tx_ring->flags)) {
>> + adapter = netdev_priv(tx_ring->netdev);
>> +
>> + spin_lock_irqsave(&adapter->ptp_tx_lock, lock_flags);
>> +
>> + /* Search for available tstamp regs */
>> + for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
>> + tstamp = &adapter->tx_tstamp[i];
>> +
>> + if (tstamp->skb)
>> + continue;
>> +
>> + found = true;
>> + break;
>
>Not how I would have written this loop construct seems a bit odd
>to default break but it works.

Hi John,
First of all, thank you for reviewing the patch.
I agree that we can make the loop better.
How about I change the loop to below:

for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
tstamp = &adapter->tx_tstamp[i];

if (!tstamp->skb) {
found = true;
break;
}
}

>
>> + }
>> +
>> + /* Return if no available tstamp regs */
>> + if (!found) {
>> + adapter->tx_hwtstamp_skipped++;
>> + spin_unlock_irqrestore(&adapter->ptp_tx_lock,
>> + lock_flags);
>> + return;
>> + }
>
>[...]
>
>>
>> +static void igc_ptp_free_tx_buffer(struct igc_adapter *adapter,
>> + struct igc_tx_timestamp_request *tstamp)
>> +{
>> + if (tstamp->buffer_type == IGC_TX_BUFFER_TYPE_XSK) {
>> + /* Release the transmit completion */
>> + tstamp->xsk_tx_buffer->xsk_pending_ts = false;
>> + tstamp->xsk_tx_buffer = NULL;
>> + tstamp->buffer_type = 0;
>> +
>> + /* Trigger txrx interrupt for transmit completion */
>> + igc_xsk_wakeup(adapter->netdev, tstamp->xsk_queue_index, 0);
>
>Just curious because I didn't find it. Fairly sure I just need to look more,
>but don't you want to still 'tstamp->skb = NULL' in this path somewhere?
>It looks like triggering the tx interrupt again with buffer_type == 0 wouldn't
>do the null.
>
>I suspect I just missed it.

Since the timestamp register will only be used by either skb or xsk,
So we make tstamp->xsk_tx_buffer and tstamp->skb as union,
Then based on tstamp->buffer_type to decide whether
tstamp->xsk_tx_buffer or tstamp->skb should be used.

My thought is, by setting tstamp->xsk_tx_buffer=NULL,
tstamp->skb will become NULL as well, and vice versa.

Thanks & Regards
Siang






2024-03-03 01:59:22

by John Fastabend

[permalink] [raw]
Subject: RE: [PATCH iwl-next,v2 2/2] igc: Add Tx hardware timestamp request for AF_XDP zero-copy packet

Song, Yoong Siang wrote:
> On Saturday, March 2, 2024 1:55 AM, John Fastabend <[email protected]> wrote:
> >Song Yoong Siang wrote:
> >> This patch adds support to per-packet Tx hardware timestamp request to
> >> AF_XDP zero-copy packet via XDP Tx metadata framework. Please note that
> >> user needs to enable Tx HW timestamp capability via igc_ioctl() with
> >> SIOCSHWTSTAMP cmd before sending xsk Tx hardware timestamp request.
> >>
> >> Same as implementation in RX timestamp XDP hints kfunc metadata, Timer 0
> >> (adjustable clock) is used in xsk Tx hardware timestamp. i225/i226 have
> >> four sets of timestamping registers. Both *skb and *xsk_tx_buffer pointers
> >> are used to indicate whether the timestamping register is already occupied.
> >>
> >> Furthermore, a boolean variable named xsk_pending_ts is used to hold the
> >> transmit completion until the tx hardware timestamp is ready. This is
> >> because, for i225/i226, the timestamp notification event comes some time
> >> after the transmit completion event. The driver will retrigger hardware irq
> >> to clean the packet after retrieve the tx hardware timestamp.
> >>
> >> Besides, xsk_meta is added into struct igc_tx_timestamp_request as a hook
> >> to the metadata location of the transmit packet. When the Tx timestamp
> >> interrupt is fired, the interrupt handler will copy the value of Tx hwts
> >> into metadata location via xsk_tx_metadata_complete().
> >>
> >> Co-developed-by: Lai Peter Jun Ann <[email protected]>
> >> Signed-off-by: Lai Peter Jun Ann <[email protected]>
> >> Signed-off-by: Song Yoong Siang <[email protected]>
> >> ---
> >
> >[...]
> >
> >>
> >> +static void igc_xsk_request_timestamp(void *_priv)
> >> +{
> >> + struct igc_metadata_request *meta_req = _priv;
> >> + struct igc_ring *tx_ring = meta_req->tx_ring;
> >> + struct igc_tx_timestamp_request *tstamp;
> >> + u32 tx_flags = IGC_TX_FLAGS_TSTAMP;
> >> + struct igc_adapter *adapter;
> >> + unsigned long lock_flags;
> >> + bool found = false;
> >> + int i;
> >> +
> >> + if (test_bit(IGC_RING_FLAG_TX_HWTSTAMP, &tx_ring->flags)) {
> >> + adapter = netdev_priv(tx_ring->netdev);
> >> +
> >> + spin_lock_irqsave(&adapter->ptp_tx_lock, lock_flags);
> >> +
> >> + /* Search for available tstamp regs */
> >> + for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
> >> + tstamp = &adapter->tx_tstamp[i];
> >> +
> >> + if (tstamp->skb)
> >> + continue;
> >> +
> >> + found = true;
> >> + break;
> >
> >Not how I would have written this loop construct seems a bit odd
> >to default break but it works.
>
> Hi John,
> First of all, thank you for reviewing the patch.
> I agree that we can make the loop better.
> How about I change the loop to below:

That is more natural to me, but whatever reads best for you
is probably ok.

>
> for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
> tstamp = &adapter->tx_tstamp[i];
>
> if (!tstamp->skb) {
> found = true;
> break;
> }
> }
>
> >
> >> + }
> >> +
> >> + /* Return if no available tstamp regs */
> >> + if (!found) {
> >> + adapter->tx_hwtstamp_skipped++;
> >> + spin_unlock_irqrestore(&adapter->ptp_tx_lock,
> >> + lock_flags);
> >> + return;
> >> + }
> >
> >[...]
> >
> >>
> >> +static void igc_ptp_free_tx_buffer(struct igc_adapter *adapter,
> >> + struct igc_tx_timestamp_request *tstamp)
> >> +{
> >> + if (tstamp->buffer_type == IGC_TX_BUFFER_TYPE_XSK) {
> >> + /* Release the transmit completion */
> >> + tstamp->xsk_tx_buffer->xsk_pending_ts = false;
> >> + tstamp->xsk_tx_buffer = NULL;
> >> + tstamp->buffer_type = 0;
> >> +
> >> + /* Trigger txrx interrupt for transmit completion */
> >> + igc_xsk_wakeup(adapter->netdev, tstamp->xsk_queue_index, 0);
> >
> >Just curious because I didn't find it. Fairly sure I just need to look more,
> >but don't you want to still 'tstamp->skb = NULL' in this path somewhere?
> >It looks like triggering the tx interrupt again with buffer_type == 0 wouldn't
> >do the null.
> >
> >I suspect I just missed it.
>
> Since the timestamp register will only be used by either skb or xsk,
> So we make tstamp->xsk_tx_buffer and tstamp->skb as union,
> Then based on tstamp->buffer_type to decide whether
> tstamp->xsk_tx_buffer or tstamp->skb should be used.
>
> My thought is, by setting tstamp->xsk_tx_buffer=NULL,
> tstamp->skb will become NULL as well, and vice versa.

Seems good to me. Maybe a comment though? Otherwise I suspect next
person to read the code will have to spend the extra time to track
it down as well.

>
> Thanks & Regards
> Siang

Also feel free to carry my ack into the v2 if you make the couple
small nitpick changes.

Acked-by: John Fastabend <[email protected]>


2024-03-03 07:16:05

by Song, Yoong Siang

[permalink] [raw]
Subject: RE: [PATCH iwl-next,v2 2/2] igc: Add Tx hardware timestamp request for AF_XDP zero-copy packet

On Sunday, March 3, 2024 9:59 AM, John Fastabend <[email protected]> wrote:
>Song, Yoong Siang wrote:
>> On Saturday, March 2, 2024 1:55 AM, John Fastabend <[email protected]> wrote:
>> >Song Yoong Siang wrote:
>> >> This patch adds support to per-packet Tx hardware timestamp request to
>> >> AF_XDP zero-copy packet via XDP Tx metadata framework. Please note that
>> >> user needs to enable Tx HW timestamp capability via igc_ioctl() with
>> >> SIOCSHWTSTAMP cmd before sending xsk Tx hardware timestamp request.
>> >>
>> >> Same as implementation in RX timestamp XDP hints kfunc metadata, Timer 0
>> >> (adjustable clock) is used in xsk Tx hardware timestamp. i225/i226 have
>> >> four sets of timestamping registers. Both *skb and *xsk_tx_buffer pointers
>> >> are used to indicate whether the timestamping register is already occupied.
>> >>
>> >> Furthermore, a boolean variable named xsk_pending_ts is used to hold the
>> >> transmit completion until the tx hardware timestamp is ready. This is
>> >> because, for i225/i226, the timestamp notification event comes some time
>> >> after the transmit completion event. The driver will retrigger hardware irq
>> >> to clean the packet after retrieve the tx hardware timestamp.
>> >>
>> >> Besides, xsk_meta is added into struct igc_tx_timestamp_request as a hook
>> >> to the metadata location of the transmit packet. When the Tx timestamp
>> >> interrupt is fired, the interrupt handler will copy the value of Tx hwts
>> >> into metadata location via xsk_tx_metadata_complete().
>> >>
>> >> Co-developed-by: Lai Peter Jun Ann <[email protected]>
>> >> Signed-off-by: Lai Peter Jun Ann <[email protected]>
>> >> Signed-off-by: Song Yoong Siang <[email protected]>
>> >> ---
>> >
>> >[...]
>> >
>> >>
>> >> +static void igc_xsk_request_timestamp(void *_priv)
>> >> +{
>> >> + struct igc_metadata_request *meta_req = _priv;
>> >> + struct igc_ring *tx_ring = meta_req->tx_ring;
>> >> + struct igc_tx_timestamp_request *tstamp;
>> >> + u32 tx_flags = IGC_TX_FLAGS_TSTAMP;
>> >> + struct igc_adapter *adapter;
>> >> + unsigned long lock_flags;
>> >> + bool found = false;
>> >> + int i;
>> >> +
>> >> + if (test_bit(IGC_RING_FLAG_TX_HWTSTAMP, &tx_ring->flags)) {
>> >> + adapter = netdev_priv(tx_ring->netdev);
>> >> +
>> >> + spin_lock_irqsave(&adapter->ptp_tx_lock, lock_flags);
>> >> +
>> >> + /* Search for available tstamp regs */
>> >> + for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
>> >> + tstamp = &adapter->tx_tstamp[i];
>> >> +
>> >> + if (tstamp->skb)
>> >> + continue;
>> >> +
>> >> + found = true;
>> >> + break;
>> >
>> >Not how I would have written this loop construct seems a bit odd
>> >to default break but it works.
>>
>> Hi John,
>> First of all, thank you for reviewing the patch.
>> I agree that we can make the loop better.
>> How about I change the loop to below:
>
>That is more natural to me, but whatever reads best for you
>is probably ok.
>

I am ok with both versions.
I will change it in next version for better readability and clarity.

>>
>> for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
>> tstamp = &adapter->tx_tstamp[i];
>>
>> if (!tstamp->skb) {
>> found = true;
>> break;
>> }
>> }
>>
>> >
>> >> + }
>> >> +
>> >> + /* Return if no available tstamp regs */
>> >> + if (!found) {
>> >> + adapter->tx_hwtstamp_skipped++;
>> >> + spin_unlock_irqrestore(&adapter->ptp_tx_lock,
>> >> + lock_flags);
>> >> + return;
>> >> + }
>> >
>> >[...]
>> >
>> >>
>> >> +static void igc_ptp_free_tx_buffer(struct igc_adapter *adapter,
>> >> + struct igc_tx_timestamp_request *tstamp)
>> >> +{
>> >> + if (tstamp->buffer_type == IGC_TX_BUFFER_TYPE_XSK) {
>> >> + /* Release the transmit completion */
>> >> + tstamp->xsk_tx_buffer->xsk_pending_ts = false;
>> >> + tstamp->xsk_tx_buffer = NULL;
>> >> + tstamp->buffer_type = 0;
>> >> +
>> >> + /* Trigger txrx interrupt for transmit completion */
>> >> + igc_xsk_wakeup(adapter->netdev, tstamp->xsk_queue_index, 0);
>> >
>> >Just curious because I didn't find it. Fairly sure I just need to look more,
>> >but don't you want to still 'tstamp->skb = NULL' in this path somewhere?
>> >It looks like triggering the tx interrupt again with buffer_type == 0 wouldn't
>> >do the null.
>> >
>> >I suspect I just missed it.
>>
>> Since the timestamp register will only be used by either skb or xsk,
>> So we make tstamp->xsk_tx_buffer and tstamp->skb as union,
>> Then based on tstamp->buffer_type to decide whether
>> tstamp->xsk_tx_buffer or tstamp->skb should be used.
>>
>> My thought is, by setting tstamp->xsk_tx_buffer=NULL,
>> tstamp->skb will become NULL as well, and vice versa.
>
>Seems good to me. Maybe a comment though? Otherwise I suspect next
>person to read the code will have to spend the extra time to track
>it down as well.
>

Sure. I will add comment in next version.

>>
>> Thanks & Regards
>> Siang
>
>Also feel free to carry my ack into the v2 if you make the couple
>small nitpick changes.
>
>Acked-by: John Fastabend <[email protected]>
>
Thank you for your acknowledgment.