2024-02-23 19:27:50

by Rahul Rameshbabu

[permalink] [raw]
Subject: [PATCH RFC net-next v1 0/6] ethtool HW timestamping statistics

The goal of this patch series is to introduce a common set of ethtool statistics
for hardware timestamping that a driver implementer can hook into. The
statistics counters added are based on what I believe are common
patterns/behaviors found across various hardware timestamping implementations
seen in the kernel tree today. The mlx5 family of devices is used as the PoC for
this patch series. Other vendors are more than welcome to chim in on this
series.

Statistics can be queried from either the DMA or PHY layers. I think this
concept of layer-based statistics selection and the general timestamping layer
selection work Kory Maincent is working on can be converged together [1].

[1] https://lore.kernel.org/netdev/[email protected]/

Signed-off-by: Rahul Rameshbabu <[email protected]>
---
Rahul Rameshbabu (6):
ethtool: add interface to read Tx hardware timestamping statistics
net/mlx5e: Introduce lost_cqe statistic counter for PTP Tx port
timestamping CQ
net/mlx5e: Introduce timestamps statistic counter for Tx DMA layer
net/mlx5e: Implement ethtool hardware timestamping statistics
tools: ynl: ethtool.py: Make tool invokable from any CWD
tools: ynl: ethtool.py: Add ts ethtool statistics group

.../ethernet/mellanox/mlx5/counters.rst | 11 +++
.../net/ethernet/mellanox/mlx5/core/en/ptp.c | 1 +
.../ethernet/mellanox/mlx5/core/en_ethtool.c | 9 +++
.../ethernet/mellanox/mlx5/core/en_stats.c | 71 +++++++++++++++++++
.../ethernet/mellanox/mlx5/core/en_stats.h | 4 ++
.../net/ethernet/mellanox/mlx5/core/en_tx.c | 6 +-
include/linux/ethtool.h | 28 ++++++++
include/uapi/linux/ethtool.h | 20 ++++++
include/uapi/linux/ethtool_netlink.h | 17 +++++
net/ethtool/netlink.h | 3 +-
net/ethtool/stats.c | 53 ++++++++++++--
net/ethtool/strset.c | 5 ++
tools/net/ynl/ethtool.py | 9 ++-
13 files changed, 227 insertions(+), 10 deletions(-)

--
2.42.0



2024-02-23 19:27:59

by Rahul Rameshbabu

[permalink] [raw]
Subject: [PATCH RFC net-next v1 3/6] net/mlx5e: Introduce timestamps statistic counter for Tx DMA layer

Count number of transmitted packets that were hardware timestamped at the
device DMA layer.

Signed-off-by: Rahul Rameshbabu <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>
---
.../device_drivers/ethernet/mellanox/mlx5/counters.rst | 5 +++++
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 2 ++
drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 6 ++++--
4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index 5464cd9e2694..fed821ef9b09 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -300,6 +300,11 @@ the software port.
in the beginning of the queue. This is a normal condition.
- Informative

+ * - `tx[i]_timestamps`
+ - Transmitted packets that were hardware timestamped at the device's DMA
+ layer.
+ - Informative
+
* - `tx[i]_added_vlan_packets`
- The number of packets sent where vlan tag insertion was offloaded to the
hardware.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 7e63d7c88894..bc31196d348a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -2046,6 +2046,7 @@ static const struct counter_desc sq_stats_desc[] = {
{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, csum_partial_inner) },
{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, added_vlan_packets) },
{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, nop) },
+ { MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, timestamps) },
{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, mpwqe_blks) },
{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, mpwqe_pkts) },
#ifdef CONFIG_MLX5_EN_TLS
@@ -2198,6 +2199,7 @@ static const struct counter_desc qos_sq_stats_desc[] = {
{ MLX5E_DECLARE_QOS_TX_STAT(struct mlx5e_sq_stats, csum_partial_inner) },
{ MLX5E_DECLARE_QOS_TX_STAT(struct mlx5e_sq_stats, added_vlan_packets) },
{ MLX5E_DECLARE_QOS_TX_STAT(struct mlx5e_sq_stats, nop) },
+ { MLX5E_DECLARE_QOS_TX_STAT(struct mlx5e_sq_stats, timestamps) },
{ MLX5E_DECLARE_QOS_TX_STAT(struct mlx5e_sq_stats, mpwqe_blks) },
{ MLX5E_DECLARE_QOS_TX_STAT(struct mlx5e_sq_stats, mpwqe_pkts) },
#ifdef CONFIG_MLX5_EN_TLS
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 03f6265d3ed5..3c634c5fd420 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -429,6 +429,7 @@ struct mlx5e_sq_stats {
u64 stopped;
u64 dropped;
u64 recover;
+ u64 timestamps;
/* dirtied @completion */
u64 cqes ____cacheline_aligned_in_smp;
u64 wake;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 5c166d9d2dca..5acba323246e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -748,11 +748,13 @@ static void mlx5e_consume_skb(struct mlx5e_txqsq *sq, struct sk_buff *skb,
u64 ts = get_cqe_ts(cqe);

hwts.hwtstamp = mlx5e_cqe_ts_to_ns(sq->ptp_cyc2time, sq->clock, ts);
- if (sq->ptpsq)
+ if (sq->ptpsq) {
mlx5e_skb_cb_hwtstamp_handler(skb, MLX5E_SKB_CB_CQE_HWTSTAMP,
hwts.hwtstamp, sq->ptpsq->cq_stats);
- else
+ } else {
skb_tstamp_tx(skb, &hwts);
+ sq->stats->timestamps++;
+ }
}

napi_consume_skb(skb, napi_budget);
--
2.42.0


2024-02-23 19:28:25

by Rahul Rameshbabu

[permalink] [raw]
Subject: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics

Multiple network devices that support hardware timestamping appear to have
common behavior with regards to timestamp handling. Implement common Tx
hardware timestamping statistics in a tx_stats struct_group. Common Rx
hardware timestamping statistics can subsequently be implemented in a
rx_stats struct_group for ethtool_ts_stats.

Signed-off-by: Rahul Rameshbabu <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>
---
include/linux/ethtool.h | 28 +++++++++++++++
include/uapi/linux/ethtool.h | 20 +++++++++++
include/uapi/linux/ethtool_netlink.h | 17 +++++++++
net/ethtool/netlink.h | 3 +-
net/ethtool/stats.c | 53 +++++++++++++++++++++++++---
net/ethtool/strset.c | 5 +++
6 files changed, 120 insertions(+), 6 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index b90c33607594..1cc1010afaca 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -483,6 +483,31 @@ struct ethtool_rmon_stats {
);
};

+/**
+ * struct ethtool_ts_stats - HW timestamping statistics
+ * @layer: input field denoting whether stats should be queried from the DMA or
+ * PHY timestamping layer. Defaults to the active layer for packet
+ * timestamping.
+ * @tx_stats: struct group for TX HW timestamping
+ * @pkts: Number of packets successfully timestamped by the queried
+ * layer.
+ * @lost: Number of packet timestamps that failed to get applied on a
+ * packet by the queried layer.
+ * @late: Number of packet timestamps that were delivered by the
+ * hardware but were lost due to arriving too late.
+ * @err: Number of timestamping errors that occurred on the queried
+ * layer.
+ */
+struct ethtool_ts_stats {
+ enum ethtool_ts_stats_layer layer;
+ struct_group(tx_stats,
+ u64 pkts;
+ u64 lost;
+ u64 late;
+ u64 err;
+ );
+};
+
#define ETH_MODULE_EEPROM_PAGE_LEN 128
#define ETH_MODULE_MAX_I2C_ADDRESS 0x7f

@@ -807,6 +832,7 @@ struct ethtool_rxfh_param {
* @get_eth_ctrl_stats: Query some of the IEEE 802.3 MAC Ctrl statistics.
* @get_rmon_stats: Query some of the RMON (RFC 2819) statistics.
* Set %ranges to a pointer to zero-terminated array of byte ranges.
+ * @get_eth_ts_stats: Query the device hardware timestamping statistics.
* @get_module_power_mode: Get the power mode policy for the plug-in module
* used by the network device and its operational power mode, if
* plugged-in.
@@ -943,6 +969,8 @@ struct ethtool_ops {
void (*get_rmon_stats)(struct net_device *dev,
struct ethtool_rmon_stats *rmon_stats,
const struct ethtool_rmon_hist_range **ranges);
+ void (*get_ts_stats)(struct net_device *dev,
+ struct ethtool_ts_stats *ts_stats);
int (*get_module_power_mode)(struct net_device *dev,
struct ethtool_module_power_mode_params *params,
struct netlink_ext_ack *extack);
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 06ef6b78b7de..39edae554fc5 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -681,6 +681,7 @@ enum ethtool_link_ext_substate_module {
* @ETH_SS_STATS_ETH_MAC: names of IEEE 802.3 MAC statistics
* @ETH_SS_STATS_ETH_CTRL: names of IEEE 802.3 MAC Control statistics
* @ETH_SS_STATS_RMON: names of RMON statistics
+ * @ETH_SS_STATS_TS: names of hardware timestamping statistics
*
* @ETH_SS_COUNT: number of defined string sets
*/
@@ -706,6 +707,7 @@ enum ethtool_stringset {
ETH_SS_STATS_ETH_MAC,
ETH_SS_STATS_ETH_CTRL,
ETH_SS_STATS_RMON,
+ ETH_SS_STATS_TS,

/* add new constants above here */
ETH_SS_COUNT
@@ -1462,6 +1464,24 @@ struct ethtool_ts_info {
__u32 rx_reserved[3];
};

+/**
+ * enum ethtool_ts_stats_layer - layer to query hardware timestamping statistics
+ * @ETHTOOL_TS_STATS_LAYER_ACTIVE:
+ * retrieve the statistics from the layer that is currently feeding
+ * hardware timestamps for packets.
+ * @ETHTOOL_TS_STATS_LAYER_DMA:
+ * retrieve the statistics from the DMA hardware timestamping layer of the
+ * device.
+ * @ETHTOOL_TS_STATS_PHY:
+ * retrieve the statistics from the PHY hardware timestamping layer of the
+ * device.
+ */
+enum ethtool_ts_stats_layer {
+ ETHTOOL_TS_STATS_LAYER_ACTIVE,
+ ETHTOOL_TS_STATS_LAYER_DMA,
+ ETHTOOL_TS_STATS_LAYER_PHY,
+};
+
/*
* %ETHTOOL_SFEATURES changes features present in features[].valid to the
* values of corresponding bits in features[].requested. Bits in .requested
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 3f89074aa06c..55f2a3c8caa0 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -749,6 +749,7 @@ enum {
ETHTOOL_A_STATS_GRP, /* nest - _A_STATS_GRP_* */

ETHTOOL_A_STATS_SRC, /* u32 */
+ ETHTOOL_A_STATS_LAYER, /* u32 */

/* add new constants above here */
__ETHTOOL_A_STATS_CNT,
@@ -760,6 +761,7 @@ enum {
ETHTOOL_STATS_ETH_MAC,
ETHTOOL_STATS_ETH_CTRL,
ETHTOOL_STATS_RMON,
+ ETHTOOL_STATS_TS,

/* add new constants above here */
__ETHTOOL_STATS_CNT
@@ -875,6 +877,21 @@ enum {
ETHTOOL_A_STATS_RMON_MAX = (__ETHTOOL_A_STATS_RMON_CNT - 1)
};

+enum {
+ /* hwTimestampingTxPkts */
+ ETHTOOL_A_STATS_TS_TX_PKT,
+ /* hwTimestampingTxLost */
+ ETHTOOL_A_STATS_TS_TX_LOST,
+ /* hwTimestampingTxLate */
+ ETHTOOL_A_STATS_TS_TX_LATE,
+ /* hwTimestampingTxErrors */
+ ETHTOOL_A_STATS_TS_TX_ERRORS,
+
+ /* add new constants above here */
+ __ETHTOOL_A_STATS_TS_CNT,
+ ETHTOOL_A_STATS_TS_MAX = (__ETHTOOL_A_STATS_TS_CNT - 1)
+};
+
/* MODULE */

enum {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 9a333a8d04c1..962ecd62aeea 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -429,7 +429,7 @@ extern const struct nla_policy ethnl_tunnel_info_get_policy[ETHTOOL_A_TUNNEL_INF
extern const struct nla_policy ethnl_fec_get_policy[ETHTOOL_A_FEC_HEADER + 1];
extern const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1];
extern const struct nla_policy ethnl_module_eeprom_get_policy[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS + 1];
-extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_SRC + 1];
+extern const struct nla_policy ethnl_stats_get_policy[__ETHTOOL_A_STATS_CNT];
extern const struct nla_policy ethnl_phc_vclocks_get_policy[ETHTOOL_A_PHC_VCLOCKS_HEADER + 1];
extern const struct nla_policy ethnl_module_get_policy[ETHTOOL_A_MODULE_HEADER + 1];
extern const struct nla_policy ethnl_module_set_policy[ETHTOOL_A_MODULE_POWER_MODE_POLICY + 1];
@@ -454,5 +454,6 @@ extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING
extern const char stats_eth_mac_names[__ETHTOOL_A_STATS_ETH_MAC_CNT][ETH_GSTRING_LEN];
extern const char stats_eth_ctrl_names[__ETHTOOL_A_STATS_ETH_CTRL_CNT][ETH_GSTRING_LEN];
extern const char stats_rmon_names[__ETHTOOL_A_STATS_RMON_CNT][ETH_GSTRING_LEN];
+extern const char stats_ts_names[__ETHTOOL_A_STATS_TS_CNT][ETH_GSTRING_LEN];

#endif /* _NET_ETHTOOL_NETLINK_H */
diff --git a/net/ethtool/stats.c b/net/ethtool/stats.c
index 912f0c4fff2f..e4333d77fb31 100644
--- a/net/ethtool/stats.c
+++ b/net/ethtool/stats.c
@@ -8,6 +8,7 @@ struct stats_req_info {
struct ethnl_req_info base;
DECLARE_BITMAP(stat_mask, __ETHTOOL_STATS_CNT);
enum ethtool_mac_stats_src src;
+ enum ethtool_ts_stats_layer layer;
};

#define STATS_REQINFO(__req_base) \
@@ -20,6 +21,7 @@ struct stats_reply_data {
struct ethtool_eth_mac_stats mac_stats;
struct ethtool_eth_ctrl_stats ctrl_stats;
struct ethtool_rmon_stats rmon_stats;
+ struct ethtool_ts_stats ts_stats;
);
const struct ethtool_rmon_hist_range *rmon_ranges;
};
@@ -32,6 +34,7 @@ const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN] = {
[ETHTOOL_STATS_ETH_MAC] = "eth-mac",
[ETHTOOL_STATS_ETH_CTRL] = "eth-ctrl",
[ETHTOOL_STATS_RMON] = "rmon",
+ [ETHTOOL_STATS_TS] = "ts",
};

const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN] = {
@@ -76,18 +79,27 @@ const char stats_rmon_names[__ETHTOOL_A_STATS_RMON_CNT][ETH_GSTRING_LEN] = {
[ETHTOOL_A_STATS_RMON_JABBER] = "etherStatsJabbers",
};

-const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_SRC + 1] = {
- [ETHTOOL_A_STATS_HEADER] =
- NLA_POLICY_NESTED(ethnl_header_policy),
- [ETHTOOL_A_STATS_GROUPS] = { .type = NLA_NESTED },
- [ETHTOOL_A_STATS_SRC] =
+const char stats_ts_names[__ETHTOOL_A_STATS_TS_CNT][ETH_GSTRING_LEN] = {
+ [ETHTOOL_A_STATS_TS_TX_PKT] = "hwTimestampingTxPkts",
+ [ETHTOOL_A_STATS_TS_TX_LOST] = "hwTimestampingTxLost",
+ [ETHTOOL_A_STATS_TS_TX_LATE] = "hwTimestampingTxLate",
+ [ETHTOOL_A_STATS_TS_TX_ERRORS] = "hwTimestampingTxErrors",
+};
+
+const struct nla_policy ethnl_stats_get_policy[__ETHTOOL_A_STATS_CNT] = {
+ [ETHTOOL_A_STATS_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy),
+ [ETHTOOL_A_STATS_GROUPS] = { .type = NLA_NESTED },
+ [ETHTOOL_A_STATS_SRC] =
NLA_POLICY_MAX(NLA_U32, ETHTOOL_MAC_STATS_SRC_PMAC),
+ [ETHTOOL_A_STATS_LAYER] =
+ NLA_POLICY_MAX(NLA_U32, ETHTOOL_TS_STATS_LAYER_PHY),
};

static int stats_parse_request(struct ethnl_req_info *req_base,
struct nlattr **tb,
struct netlink_ext_ack *extack)
{
+ enum ethtool_ts_stats_layer layer = ETHTOOL_TS_STATS_LAYER_ACTIVE;
enum ethtool_mac_stats_src src = ETHTOOL_MAC_STATS_SRC_AGGREGATE;
struct stats_req_info *req_info = STATS_REQINFO(req_base);
bool mod = false;
@@ -104,9 +116,12 @@ static int stats_parse_request(struct ethnl_req_info *req_base,
return -EINVAL;
}

+ if (tb[ETHTOOL_A_STATS_LAYER])
+ layer = nla_get_u32(tb[ETHTOOL_A_STATS_LAYER]);
if (tb[ETHTOOL_A_STATS_SRC])
src = nla_get_u32(tb[ETHTOOL_A_STATS_SRC]);

+ req_info->layer = layer;
req_info->src = src;

return 0;
@@ -118,6 +133,7 @@ static int stats_prepare_data(const struct ethnl_req_info *req_base,
{
const struct stats_req_info *req_info = STATS_REQINFO(req_base);
struct stats_reply_data *data = STATS_REPDATA(reply_base);
+ enum ethtool_ts_stats_layer layer = req_info->layer;
enum ethtool_mac_stats_src src = req_info->src;
struct net_device *dev = reply_base->dev;
int ret;
@@ -144,6 +160,7 @@ static int stats_prepare_data(const struct ethnl_req_info *req_base,
data->mac_stats.src = src;
data->ctrl_stats.src = src;
data->rmon_stats.src = src;
+ data->ts_stats.layer = layer;

if (test_bit(ETHTOOL_STATS_ETH_PHY, req_info->stat_mask) &&
dev->ethtool_ops->get_eth_phy_stats)
@@ -158,6 +175,9 @@ static int stats_prepare_data(const struct ethnl_req_info *req_base,
dev->ethtool_ops->get_rmon_stats)
dev->ethtool_ops->get_rmon_stats(dev, &data->rmon_stats,
&data->rmon_ranges);
+ if (test_bit(ETHTOOL_STATS_TS, req_info->stat_mask) &&
+ dev->ethtool_ops->get_ts_stats)
+ dev->ethtool_ops->get_ts_stats(dev, &data->ts_stats);

ethnl_ops_complete(dev);
return 0;
@@ -194,6 +214,10 @@ static int stats_reply_size(const struct ethnl_req_info *req_base,
nla_total_size(4)) * /* _A_STATS_GRP_HIST_BKT_HI */
ETHTOOL_RMON_HIST_MAX * 2;
}
+ if (test_bit(ETHTOOL_STATS_TS, req_info->stat_mask)) {
+ n_stats += sizeof(struct ethtool_ts_stats) / sizeof(u64);
+ n_grps++;
+ }

len += n_grps * (nla_total_size(0) + /* _A_STATS_GRP */
nla_total_size(4) + /* _A_STATS_GRP_ID */
@@ -370,6 +394,22 @@ static int stats_put_rmon_stats(struct sk_buff *skb,
return 0;
}

+static int stats_put_ts_stats(struct sk_buff *skb,
+ const struct stats_reply_data *data)
+{
+ if (stat_put(skb, ETHTOOL_A_STATS_TS_TX_PKT,
+ data->ts_stats.pkts) ||
+ stat_put(skb, ETHTOOL_A_STATS_TS_TX_LOST,
+ data->ts_stats.lost) ||
+ stat_put(skb, ETHTOOL_A_STATS_TS_TX_LATE,
+ data->ts_stats.late) ||
+ stat_put(skb, ETHTOOL_A_STATS_TS_TX_ERRORS,
+ data->ts_stats.err))
+ return -EMSGSIZE;
+
+ return 0;
+}
+
static int stats_put_stats(struct sk_buff *skb,
const struct stats_reply_data *data,
u32 id, u32 ss_id,
@@ -423,6 +463,9 @@ static int stats_fill_reply(struct sk_buff *skb,
if (!ret && test_bit(ETHTOOL_STATS_RMON, req_info->stat_mask))
ret = stats_put_stats(skb, data, ETHTOOL_STATS_RMON,
ETH_SS_STATS_RMON, stats_put_rmon_stats);
+ if (!ret && test_bit(ETHTOOL_STATS_TS, req_info->stat_mask))
+ ret = stats_put_stats(skb, data, ETHTOOL_STATS_TS,
+ ETH_SS_STATS_TS, stats_put_ts_stats);

return ret;
}
diff --git a/net/ethtool/strset.c b/net/ethtool/strset.c
index c678b484a079..ce1e193076c3 100644
--- a/net/ethtool/strset.c
+++ b/net/ethtool/strset.c
@@ -105,6 +105,11 @@ static const struct strset_info info_template[] = {
.count = __ETHTOOL_A_STATS_RMON_CNT,
.strings = stats_rmon_names,
},
+ [ETH_SS_STATS_TS] = {
+ .per_dev = false,
+ .count = __ETHTOOL_A_STATS_TS_CNT,
+ .strings = stats_ts_names,
+ },
};

struct strset_req_info {
--
2.42.0


2024-02-23 19:28:38

by Rahul Rameshbabu

[permalink] [raw]
Subject: [PATCH RFC net-next v1 4/6] net/mlx5e: Implement ethtool hardware timestamping statistics

Feed driver statistics counters related to hardware timestamping to
standardized ethtool hardware timestamping statistics group.

Signed-off-by: Rahul Rameshbabu <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>
---
.../ethernet/mellanox/mlx5/core/en_ethtool.c | 9 +++
.../ethernet/mellanox/mlx5/core/en_stats.c | 68 +++++++++++++++++++
.../ethernet/mellanox/mlx5/core/en_stats.h | 2 +
3 files changed, 79 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index cc51ce16df14..d3b77054c30a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -2381,6 +2381,14 @@ static void mlx5e_get_rmon_stats(struct net_device *netdev,
mlx5e_stats_rmon_get(priv, rmon_stats, ranges);
}

+static void mlx5e_get_ts_stats(struct net_device *netdev,
+ struct ethtool_ts_stats *ts_stats)
+{
+ struct mlx5e_priv *priv = netdev_priv(netdev);
+
+ mlx5e_stats_ts_get(priv, ts_stats);
+}
+
const struct ethtool_ops mlx5e_ethtool_ops = {
.cap_rss_ctx_supported = true,
.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
@@ -2430,5 +2438,6 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
.get_eth_mac_stats = mlx5e_get_eth_mac_stats,
.get_eth_ctrl_stats = mlx5e_get_eth_ctrl_stats,
.get_rmon_stats = mlx5e_get_rmon_stats,
+ .get_ts_stats = mlx5e_get_ts_stats,
.get_link_ext_stats = mlx5e_get_link_ext_stats
};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index bc31196d348a..836198445726 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -1155,6 +1155,74 @@ void mlx5e_stats_rmon_get(struct mlx5e_priv *priv,
*ranges = mlx5e_rmon_ranges;
}

+void mlx5e_stats_ts_get(struct mlx5e_priv *priv,
+ struct ethtool_ts_stats *ts_stats)
+{
+ enum ethtool_ts_stats_layer layer;
+ struct mlx5e_ptp *ptp;
+ bool tx_ptp_opened;
+ int i, j;
+
+ mutex_lock(&priv->state_lock);
+
+ tx_ptp_opened = priv->tx_ptp_opened;
+
+ /* NOTE: this needs to be changed whenever ethtool timestamping
+ * layer selection is implemented.
+ */
+ if (ts_stats->layer == ETHTOOL_TS_STATS_LAYER_ACTIVE)
+ layer = tx_ptp_opened ? ETHTOOL_TS_STATS_LAYER_PHY :
+ ETHTOOL_TS_STATS_LAYER_DMA;
+ else
+ layer = ts_stats->layer;
+
+ switch (layer) {
+ case ETHTOOL_TS_STATS_LAYER_PHY:
+ if (!tx_ptp_opened)
+ return;
+
+ ptp = priv->channels.ptp;
+
+ ts_stats->pkts = 0;
+ ts_stats->err = 0;
+ ts_stats->late = 0;
+ ts_stats->lost = 0;
+
+ /* Aggregate stats across all TCs */
+ for (i = 0; i < ptp->num_tc; i++) {
+ struct mlx5e_ptp_cq_stats *stats = ptp->ptpsq[i].cq_stats;
+
+ ts_stats->pkts += stats->cqe;
+ ts_stats->err += stats->abort + stats->err_cqe;
+ ts_stats->late += stats->late_cqe;
+ ts_stats->lost += stats->lost_cqe;
+ }
+ break;
+ case ETHTOOL_TS_STATS_LAYER_DMA:
+ /* DMA layer will always successfully timestamp packets. Other
+ * counters do not make sense for this layer.
+ */
+ ts_stats->pkts = 0;
+
+ /* Aggregate stats across all SQs */
+ mutex_lock(&priv->state_lock);
+ for (j = 0; j < priv->channels.num; j++) {
+ struct mlx5e_channel *c = priv->channels.c[j];
+
+ for (i = 0; i < c->num_tc; i++) {
+ struct mlx5e_sq_stats *stats = c->sq[i].stats;
+
+ ts_stats->pkts += stats->timestamps;
+ }
+ }
+ break;
+ default:
+ break;
+ }
+
+ mutex_unlock(&priv->state_lock);
+}
+
#define PPORT_PHY_STATISTICAL_OFF(c) \
MLX5_BYTE_OFF(ppcnt_reg, \
counter_set.phys_layer_statistical_cntrs.c##_high)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 3c634c5fd420..7b3e6cf1229a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -126,6 +126,8 @@ void mlx5e_stats_eth_ctrl_get(struct mlx5e_priv *priv,
void mlx5e_stats_rmon_get(struct mlx5e_priv *priv,
struct ethtool_rmon_stats *rmon,
const struct ethtool_rmon_hist_range **ranges);
+void mlx5e_stats_ts_get(struct mlx5e_priv *priv,
+ struct ethtool_ts_stats *ts_stats);
void mlx5e_get_link_ext_stats(struct net_device *dev,
struct ethtool_link_ext_stats *stats);

--
2.42.0


2024-02-23 19:28:57

by Rahul Rameshbabu

[permalink] [raw]
Subject: [PATCH RFC net-next v1 5/6] tools: ynl: ethtool.py: Make tool invokable from any CWD

ethtool.py depends on yml files in a specific location of the linux kernel
tree. Using relative lookup for those files means that ethtool.py would
need to be run under tools/net/ynl/. Lookup needed yml files without
depending on the current working directory that ethtool.py is invoked from.

Signed-off-by: Rahul Rameshbabu <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>
---
tools/net/ynl/ethtool.py | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/net/ynl/ethtool.py b/tools/net/ynl/ethtool.py
index 6c9f7e31250c..44ba3ba58ed9 100755
--- a/tools/net/ynl/ethtool.py
+++ b/tools/net/ynl/ethtool.py
@@ -6,6 +6,7 @@ import json
import pprint
import sys
import re
+import os

from lib import YnlFamily

@@ -152,8 +153,11 @@ def main():
global args
args = parser.parse_args()

- spec = '../../../Documentation/netlink/specs/ethtool.yaml'
- schema = '../../../Documentation/netlink/genetlink-legacy.yaml'
+ script_abs_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
+ spec = os.path.join(script_abs_dir,
+ '../../../Documentation/netlink/specs/ethtool.yaml')
+ schema = os.path.join(script_abs_dir,
+ '../../../Documentation/netlink/genetlink-legacy.yaml')

ynl = YnlFamily(spec, schema)

--
2.42.0


2024-02-23 19:29:19

by Rahul Rameshbabu

[permalink] [raw]
Subject: [PATCH RFC net-next v1 6/6] tools: ynl: ethtool.py: Add ts ethtool statistics group

Add placeholder for hardware timestamping statistics group for testing
stats-get ethtool netlink family operation.

Signed-off-by: Rahul Rameshbabu <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>
---
tools/net/ynl/ethtool.py | 1 +
1 file changed, 1 insertion(+)

diff --git a/tools/net/ynl/ethtool.py b/tools/net/ynl/ethtool.py
index 44ba3ba58ed9..d3bf3690d3a5 100755
--- a/tools/net/ynl/ethtool.py
+++ b/tools/net/ynl/ethtool.py
@@ -314,6 +314,7 @@ def main():
{ 'name': 'eth-mac', 'value': True },
#{ 'name': 'eth-ctrl', 'value': True },
#{ 'name': 'rmon', 'value': True },
+ #{ 'name': 'ts', 'value': True },
#],
},
},
--
2.42.0


2024-02-23 21:02:23

by Jacob Keller

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 0/6] ethtool HW timestamping statistics



On 2/23/2024 11:24 AM, Rahul Rameshbabu wrote:
> The goal of this patch series is to introduce a common set of ethtool statistics
> for hardware timestamping that a driver implementer can hook into. The
> statistics counters added are based on what I believe are common
> patterns/behaviors found across various hardware timestamping implementations
> seen in the kernel tree today. The mlx5 family of devices is used as the PoC for
> this patch series. Other vendors are more than welcome to chim in on this
> series.
>
> Statistics can be queried from either the DMA or PHY layers. I think this
> concept of layer-based statistics selection and the general timestamping layer
> selection work Kory Maincent is working on can be converged together [1].
>
> [1] https://lore.kernel.org/netdev/[email protected]/
>

Makes sense! I like this direction, I had meant to work on this for the
Intel parts, but got sidetracked by other tasks. I look forward to
seeing what you've done here.

2024-02-23 21:07:32

by Jacob Keller

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics



On 2/23/2024 11:24 AM, Rahul Rameshbabu wrote:
> +/**
> + * struct ethtool_ts_stats - HW timestamping statistics
> + * @layer: input field denoting whether stats should be queried from the DMA or
> + * PHY timestamping layer. Defaults to the active layer for packet
> + * timestamping.
> + * @tx_stats: struct group for TX HW timestamping
> + * @pkts: Number of packets successfully timestamped by the queried
> + * layer.
> + * @lost: Number of packet timestamps that failed to get applied on a
> + * packet by the queried layer.
> + * @late: Number of packet timestamps that were delivered by the
> + * hardware but were lost due to arriving too late.
> + * @err: Number of timestamping errors that occurred on the queried
> + * layer.
> + */
> +struct ethtool_ts_stats {
> + enum ethtool_ts_stats_layer layer;
> + struct_group(tx_stats,
> + u64 pkts;
> + u64 lost;
> + u64 late;
> + u64 err;
> + );
> +};

The Intel ice drivers has the following Tx timestamp statistics:

tx_hwtstamp_skipped - indicates when we get a Tx timestamp request but
are unable to fulfill it.
tx_hwtstamp_timeouts - indicates we had a Tx timestamp skb waiting for a
timestamp from hardware but it didn't get received within some internal
time limit.
tx_hwtstamp_flushed - indicates that we flushed an outstanding timestamp
before it completed, such as if the link resets or similar.
tx_hwtstamp_discarded - indicates that we obtained a timestamp from
hardware but were unable to complete it due to invalid cached data used
for timestamp extension.

I think these could be translated roughly to one of the lost, late, or
err stats. I am a bit confused as to how drivers could distinguish
between lost and late, but I guess that depends on the specific hardware
design.

In theory we could keep some of these more detailed stats but I don't
think we strictly need to be as detailed as the ice driver is.

The only major addition I think is the skipped stat, which I would
prefer to have. Perhaps that could be tracked in the netdev layer by
checking whether the skb flags to see whether or not the driver actually
set the appropriate flag?

I think i can otherwise translate the flushed status to the lost
category, the timeout to the late category, and everything else to the
error category. I can easily add a counter to track completed timestamps
as well.

TL;DR; I would like to see a "skipped" category since I think that
should be distinguished from general errors.

Thanks!

2024-02-23 21:09:03

by Jacob Keller

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 5/6] tools: ynl: ethtool.py: Make tool invokable from any CWD



On 2/23/2024 11:24 AM, Rahul Rameshbabu wrote:
> ethtool.py depends on yml files in a specific location of the linux kernel
> tree. Using relative lookup for those files means that ethtool.py would
> need to be run under tools/net/ynl/. Lookup needed yml files without
> depending on the current working directory that ethtool.py is invoked from.
>
> Signed-off-by: Rahul Rameshbabu <[email protected]>
> Reviewed-by: Dragos Tatulea <[email protected]>
> ---
> tools/net/ynl/ethtool.py | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/tools/net/ynl/ethtool.py b/tools/net/ynl/ethtool.py
> index 6c9f7e31250c..44ba3ba58ed9 100755
> --- a/tools/net/ynl/ethtool.py
> +++ b/tools/net/ynl/ethtool.py
> @@ -6,6 +6,7 @@ import json
> import pprint
> import sys
> import re
> +import os
>
> from lib import YnlFamily
>
> @@ -152,8 +153,11 @@ def main():
> global args
> args = parser.parse_args()
>
> - spec = '../../../Documentation/netlink/specs/ethtool.yaml'
> - schema = '../../../Documentation/netlink/genetlink-legacy.yaml'
> + script_abs_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
> + spec = os.path.join(script_abs_dir,
> + '../../../Documentation/netlink/specs/ethtool.yaml')
> + schema = os.path.join(script_abs_dir,
> + '../../../Documentation/netlink/genetlink-legacy.yaml')
>

This seems like a worthwhile improvement to make the tool more usable.

Thanks,
Jake

> ynl = YnlFamily(spec, schema)
>

2024-02-23 21:13:25

by Jacob Keller

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 0/6] ethtool HW timestamping statistics



On 2/23/2024 1:00 PM, Jacob Keller wrote:
>
>
> On 2/23/2024 11:24 AM, Rahul Rameshbabu wrote:
>> The goal of this patch series is to introduce a common set of ethtool statistics
>> for hardware timestamping that a driver implementer can hook into. The
>> statistics counters added are based on what I believe are common
>> patterns/behaviors found across various hardware timestamping implementations
>> seen in the kernel tree today. The mlx5 family of devices is used as the PoC for
>> this patch series. Other vendors are more than welcome to chim in on this
>> series.
>>
>> Statistics can be queried from either the DMA or PHY layers. I think this
>> concept of layer-based statistics selection and the general timestamping layer
>> selection work Kory Maincent is working on can be converged together [1].
>>
>> [1] https://lore.kernel.org/netdev/[email protected]/
>>
>
> Makes sense! I like this direction, I had meant to work on this for the
> Intel parts, but got sidetracked by other tasks. I look forward to
> seeing what you've done here.
>

I'm fairly happy with the series overall, and the entire thing read
pretty straight forward.

I'd be happy to follow up with patches to convert the ice driver to
report these statistics as well.

2024-02-23 22:35:19

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics

On Fri, 23 Feb, 2024 13:07:08 -0800 Jacob Keller <[email protected]> wrote:
> On 2/23/2024 11:24 AM, Rahul Rameshbabu wrote:
>> +/**
>> + * struct ethtool_ts_stats - HW timestamping statistics
>> + * @layer: input field denoting whether stats should be queried from the DMA or
>> + * PHY timestamping layer. Defaults to the active layer for packet
>> + * timestamping.
>> + * @tx_stats: struct group for TX HW timestamping
>> + * @pkts: Number of packets successfully timestamped by the queried
>> + * layer.
>> + * @lost: Number of packet timestamps that failed to get applied on a
>> + * packet by the queried layer.
>> + * @late: Number of packet timestamps that were delivered by the
>> + * hardware but were lost due to arriving too late.
>> + * @err: Number of timestamping errors that occurred on the queried
>> + * layer.
>> + */
>> +struct ethtool_ts_stats {
>> + enum ethtool_ts_stats_layer layer;
>> + struct_group(tx_stats,
>> + u64 pkts;
>> + u64 lost;
>> + u64 late;
>> + u64 err;
>> + );
>> +};
>
> The Intel ice drivers has the following Tx timestamp statistics:
>
> tx_hwtstamp_skipped - indicates when we get a Tx timestamp request but
> are unable to fulfill it.
> tx_hwtstamp_timeouts - indicates we had a Tx timestamp skb waiting for a
> timestamp from hardware but it didn't get received within some internal
> time limit.

This is interesting. In mlx5 land, the only case where we are unable to
fulfill a hwtstamp is when the timestamp information is lost or late.

lost for us means that the timestamp never arrived within some internal
time limit that our device will supposedly never be able to deliver
timestamp information after that point.

late for us is that we got hardware timestamp information delivered
after that internal time limit. We are able to track this by using
identifiers in our completion events and we only release references to
these identifiers upon delivery (never delivering leaks the references.
Enough build up leads to a recovery flow). The theory for us is that
late timestamp information arrival after that period of time should not
happen. However the truth is that it does happen and we want our driver
implementation to be resilient to this case rather than trusting the
time interval.

Do you have any example of a case of skipping timestamp information that
is not related to lack of delivery over time? I am wondering if this
case is more like a hardware error or not. Or is it more like something
along the lines of being busy/would impact line rate of timestamp
information must be recorded?

> tx_hwtstamp_flushed - indicates that we flushed an outstanding timestamp
> before it completed, such as if the link resets or similar.
> tx_hwtstamp_discarded - indicates that we obtained a timestamp from
> hardware but were unable to complete it due to invalid cached data used
> for timestamp extension.
>
> I think these could be translated roughly to one of the lost, late, or
> err stats. I am a bit confused as to how drivers could distinguish
> between lost and late, but I guess that depends on the specific hardware
> design.
>
> In theory we could keep some of these more detailed stats but I don't
> think we strictly need to be as detailed as the ice driver is.

We also converged a statistic in the mlx5 driver to the simple error
counter here. I think what makes sense is design specific counters
should be exposed as driver specific counters and more common counters
should be converged into the ethtool_ts_stats struct.

>
> The only major addition I think is the skipped stat, which I would
> prefer to have. Perhaps that could be tracked in the netdev layer by
> checking whether the skb flags to see whether or not the driver actually
> set the appropriate flag?

I guess the problem is how would the core stack know at what layer this
was skipped at (I think Kory's patch series can be used to help with
this since it's adding a common interface in ethtool to select the
timestamping layer). As of today, mlx5 is the only driver I know of that
supports selecting between the DMA and PHY layers for timestamp
information.

>
> I think i can otherwise translate the flushed status to the lost
> category, the timeout to the late category, and everything else to the
> error category. I can easily add a counter to track completed timestamps
> as well.

Sounds good.

--
Thanks,

Rahul Rameshbabu

2024-02-23 22:42:27

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 5/6] tools: ynl: ethtool.py: Make tool invokable from any CWD

On Fri, 23 Feb, 2024 13:08:34 -0800 Jacob Keller <[email protected]> wrote:
> On 2/23/2024 11:24 AM, Rahul Rameshbabu wrote:
>> ethtool.py depends on yml files in a specific location of the linux kernel
>> tree. Using relative lookup for those files means that ethtool.py would
>> need to be run under tools/net/ynl/. Lookup needed yml files without
>> depending on the current working directory that ethtool.py is invoked from.
>>
>> Signed-off-by: Rahul Rameshbabu <[email protected]>
>> Reviewed-by: Dragos Tatulea <[email protected]>
>> ---
>> tools/net/ynl/ethtool.py | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/net/ynl/ethtool.py b/tools/net/ynl/ethtool.py
>> index 6c9f7e31250c..44ba3ba58ed9 100755
>> --- a/tools/net/ynl/ethtool.py
>> +++ b/tools/net/ynl/ethtool.py
>> @@ -6,6 +6,7 @@ import json
>> import pprint
>> import sys
>> import re
>> +import os
>>
>> from lib import YnlFamily
>>
>> @@ -152,8 +153,11 @@ def main():
>> global args
>> args = parser.parse_args()
>>
>> - spec = '../../../Documentation/netlink/specs/ethtool.yaml'
>> - schema = '../../../Documentation/netlink/genetlink-legacy.yaml'
>> + script_abs_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
>> + spec = os.path.join(script_abs_dir,
>> + '../../../Documentation/netlink/specs/ethtool.yaml')
>> + schema = os.path.join(script_abs_dir,
>> + '../../../Documentation/netlink/genetlink-legacy.yaml')
>>
>
> This seems like a worthwhile improvement to make the tool more usable.
>

Unfortunately, even though in the next patch after this one where I add
the ts stats group as a comment, the tool seems to fail at rendering the
actual counters of the stats group, so I had to use the ethtool tree to
test. I should look into that, so that way other contributors such as
Intel can simply use this script to test that they hooked into the
ethtool ts stats interface correctly.

--
Thanks,

Rahul Rameshbabu

2024-02-23 22:49:04

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 0/6] ethtool HW timestamping statistics

On Fri, 23 Feb, 2024 13:00:10 -0800 Jacob Keller <[email protected]> wrote:
> On 2/23/2024 11:24 AM, Rahul Rameshbabu wrote:
>> The goal of this patch series is to introduce a common set of ethtool statistics
>> for hardware timestamping that a driver implementer can hook into. The
>> statistics counters added are based on what I believe are common
>> patterns/behaviors found across various hardware timestamping implementations
>> seen in the kernel tree today. The mlx5 family of devices is used as the PoC for
>> this patch series. Other vendors are more than welcome to chim in on this
>> series.
>>
>> Statistics can be queried from either the DMA or PHY layers. I think this
>> concept of layer-based statistics selection and the general timestamping layer
>> selection work Kory Maincent is working on can be converged together [1].
>>
>> [1] https://lore.kernel.org/netdev/[email protected]/
>>
>
> Makes sense! I like this direction, I had meant to work on this for the
> Intel parts, but got sidetracked by other tasks.

I still have a back burner task for a cyclecounter API change you
suggested a while back.... Hoping to get to that after getting some
large features out of the way in the next two months.

--
Thanks,

Rahul Rameshbabu

2024-02-23 22:49:24

by Jacob Keller

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics



On 2/23/2024 2:21 PM, Rahul Rameshbabu wrote:
>> The Intel ice drivers has the following Tx timestamp statistics:
>>
>> tx_hwtstamp_skipped - indicates when we get a Tx timestamp request but
>> are unable to fulfill it.
>> tx_hwtstamp_timeouts - indicates we had a Tx timestamp skb waiting for a
>> timestamp from hardware but it didn't get received within some internal
>> time limit.
>
> This is interesting. In mlx5 land, the only case where we are unable to
> fulfill a hwtstamp is when the timestamp information is lost or late.
>

For ice, the timestamps are captured in the PHY and stored in a block of
registers with limited slots. The driver tracks the available slots and
uses one when a Tx timestamp request comes in.

So we have "skipped" because its possible to request too many timestamps
at once and fill up all the slots before the first timestamp is reported
back.

> lost for us means that the timestamp never arrived within some internal
> time limit that our device will supposedly never be able to deliver
> timestamp information after that point.
>

That is more or less the equivalent we have for timeout.

> late for us is that we got hardware timestamp information delivered
> after that internal time limit. We are able to track this by using
> identifiers in our completion events and we only release references to
> these identifiers upon delivery (never delivering leaks the references.
> Enough build up leads to a recovery flow). The theory for us is that
> late timestamp information arrival after that period of time should not
> happen. However the truth is that it does happen and we want our driver
> implementation to be resilient to this case rather than trusting the
> time interval.
>

In our case, because of how the slots work, once we "timeout" a slot, it
could get re-used. We set the timeout to be pretty ridiculous (1 second)
to ensure that if we do timeout its almost certainly because hardware
never timestamped the packet.

> Do you have any example of a case of skipping timestamp information that
> is not related to lack of delivery over time? I am wondering if this
> case is more like a hardware error or not. Or is it more like something
> along the lines of being busy/would impact line rate of timestamp
> information must be recorded?
>

The main example for skipped is the event where all our slots are full
at point of timestamp request.

There have been a few rare cases where things like a link event or
issues with the MAC dropping a packet where the PHY simply never gets
the packet and thus never timestamps it. This is typically the result of
a lost timestamp.

Flushed, for us, is when we reset the timestamp block while it has
timestamps oustanding. This can happen for example due to link changes,
where we ultimately

>> tx_hwtstamp_flushed - indicates that we flushed an outstanding timestamp
>> before it completed, such as if the link resets or similar.
>> tx_hwtstamp_discarded - indicates that we obtained a timestamp from
>> hardware but were unable to complete it due to invalid cached data used
>> for timestamp extension.
>>
>> I think these could be translated roughly to one of the lost, late, or
>> err stats. I am a bit confused as to how drivers could distinguish
>> between lost and late, but I guess that depends on the specific hardware
>> design.
>>
>> In theory we could keep some of these more detailed stats but I don't
>> think we strictly need to be as detailed as the ice driver is.
>
> We also converged a statistic in the mlx5 driver to the simple error
> counter here. I think what makes sense is design specific counters
> should be exposed as driver specific counters and more common counters
> should be converged into the ethtool_ts_stats struct.
>

Sure that seems reasonable.

>>
>> The only major addition I think is the skipped stat, which I would
>> prefer to have. Perhaps that could be tracked in the netdev layer by
>> checking whether the skb flags to see whether or not the driver actually
>> set the appropriate flag?
>
> I guess the problem is how would the core stack know at what layer this
> was skipped at (I think Kory's patch series can be used to help with
> this since it's adding a common interface in ethtool to select the
> timestamping layer). As of today, mlx5 is the only driver I know of that
> supports selecting between the DMA and PHY layers for timestamp
> information.
>

Well, the way the interface worked in my understanding was that the core
sets the SKBTX_HW_TSTAMP flag. The driver is supposed to then prepare
the packet for timestamp and set the SKBTX_IN_PROGRESS flag. I just
looked though, and it looks like ice doesn't actually set this flag...

If we fixed this, in theory the stack should be able to check after the
packet gets sent with SKBTX_HW_TSTAMP, if SKBTX_IN_PROGRESS isn't set
then it would be a skipped timestamp?

Its not really a huge deal, and this could just be lumped into either
lost or err.

2024-02-24 00:15:31

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics


On Fri, 23 Feb, 2024 14:48:51 -0800 Jacob Keller <[email protected]> wrote:
> On 2/23/2024 2:21 PM, Rahul Rameshbabu wrote:
>>> The Intel ice drivers has the following Tx timestamp statistics:
>>>
>>> tx_hwtstamp_skipped - indicates when we get a Tx timestamp request but
>>> are unable to fulfill it.
>>> tx_hwtstamp_timeouts - indicates we had a Tx timestamp skb waiting for a
>>> timestamp from hardware but it didn't get received within some internal
>>> time limit.
>>
>> This is interesting. In mlx5 land, the only case where we are unable to
>> fulfill a hwtstamp is when the timestamp information is lost or late.
>>
>
> For ice, the timestamps are captured in the PHY and stored in a block of
> registers with limited slots. The driver tracks the available slots and
> uses one when a Tx timestamp request comes in.
>
> So we have "skipped" because its possible to request too many timestamps
> at once and fill up all the slots before the first timestamp is reported
> back.
>
>> lost for us means that the timestamp never arrived within some internal
>> time limit that our device will supposedly never be able to deliver
>> timestamp information after that point.
>>
>
> That is more or less the equivalent we have for timeout.
>
>> late for us is that we got hardware timestamp information delivered
>> after that internal time limit. We are able to track this by using
>> identifiers in our completion events and we only release references to
>> these identifiers upon delivery (never delivering leaks the references.
>> Enough build up leads to a recovery flow). The theory for us is that
>> late timestamp information arrival after that period of time should not
>> happen. However the truth is that it does happen and we want our driver
>> implementation to be resilient to this case rather than trusting the
>> time interval.
>>
>
> In our case, because of how the slots work, once we "timeout" a slot, it
> could get re-used. We set the timeout to be pretty ridiculous (1 second)
> to ensure that if we do timeout its almost certainly because hardware
> never timestamped the packet.

We were thinking about that design as well. We use a 1 second timeout to
be safe.

#define MLX5E_PTP_TS_CQE_UNDELIVERED_TIMEOUT (1 * NSEC_PER_SEC)

Our device does not do any bookkeeping internally to prevent a
completion event with timestamp information from arriving after 1
second. Some internal folks have said it shouldn't be possible, but we
did not want to take any chances and built a model that is resilient to
timestamp deliveries after any period of time even after consuming the
skb without appending timestamp information. If no other vendor finds
this useful, we could roll this up into the error counter and leave the
late counter as vendor specific. I do not want to introduce too many
counters that are hard to understand. We document the device specific
counters on top of introducing them in the code base already.

https://docs.kernel.org/networking/device_drivers/ethernet/mellanox/mlx5/counters.html

>
>> Do you have any example of a case of skipping timestamp information that
>> is not related to lack of delivery over time? I am wondering if this
>> case is more like a hardware error or not. Or is it more like something
>> along the lines of being busy/would impact line rate of timestamp
>> information must be recorded?
>>
>
> The main example for skipped is the event where all our slots are full
> at point of timestamp request.

This is what I was guessing as the main (if not only reason). For this
specific reason, I think a general "busy" stats counter makes sense.
mlx5 does not need this counter, but I can see a lot of other hw
implementations needing this. (The skipped counter name obviously should
be left only in the ice driver. Just felt "busy" was easy to understand
for generalized counters.)

The reason why I prefer busy is that "skip" to me makes me think someone
used SIOCSHWTSTAMP to filter which packets get timestamped which is very
different from something like lack of resource availability.

https://docs.kernel.org/networking/timestamping.html#hardware-timestamping-configuration-siocshwtstamp-and-siocghwtstamp

>
> There have been a few rare cases where things like a link event or
> issues with the MAC dropping a packet where the PHY simply never gets
> the packet and thus never timestamps it. This is typically the result of
> a lost timestamp.
>
> Flushed, for us, is when we reset the timestamp block while it has
> timestamps oustanding. This can happen for example due to link changes,
> where we ultimately
>
>>> tx_hwtstamp_flushed - indicates that we flushed an outstanding timestamp
>>> before it completed, such as if the link resets or similar.
>>> tx_hwtstamp_discarded - indicates that we obtained a timestamp from
>>> hardware but were unable to complete it due to invalid cached data used
>>> for timestamp extension.
>>>
>>> I think these could be translated roughly to one of the lost, late, or
>>> err stats. I am a bit confused as to how drivers could distinguish
>>> between lost and late, but I guess that depends on the specific hardware
>>> design.
>>>
>>> In theory we could keep some of these more detailed stats but I don't
>>> think we strictly need to be as detailed as the ice driver is.
>>
>> We also converged a statistic in the mlx5 driver to the simple error
>> counter here. I think what makes sense is design specific counters
>> should be exposed as driver specific counters and more common counters
>> should be converged into the ethtool_ts_stats struct.
>>
>
> Sure that seems reasonable.
>
>>>
>>> The only major addition I think is the skipped stat, which I would
>>> prefer to have. Perhaps that could be tracked in the netdev layer by
>>> checking whether the skb flags to see whether or not the driver actually
>>> set the appropriate flag?
>>
>> I guess the problem is how would the core stack know at what layer this
>> was skipped at (I think Kory's patch series can be used to help with
>> this since it's adding a common interface in ethtool to select the
>> timestamping layer). As of today, mlx5 is the only driver I know of that
>> supports selecting between the DMA and PHY layers for timestamp
>> information.
>>
>
> Well, the way the interface worked in my understanding was that the core
> sets the SKBTX_HW_TSTAMP flag. The driver is supposed to then prepare
> the packet for timestamp and set the SKBTX_IN_PROGRESS flag. I just
> looked though, and it looks like ice doesn't actually set this flag...

That would be a good fix. We set this in mlx5.

/* device driver is going to provide hardware time stamp */
SKBTX_IN_PROGRESS = 1 << 2,

>
> If we fixed this, in theory the stack should be able to check after the
> packet gets sent with SKBTX_HW_TSTAMP, if SKBTX_IN_PROGRESS isn't set
> then it would be a skipped timestamp?

One question I have about this idea. Wouldn't SKBTX_IN_PROGRESS also not
be set in the case when timestamp information is lost/a timeout occurs?
I feel like the problem is not being able to separate these two cases
from the perspective of the core stack.

Btw, mlx5 does keep the flag even when we fail to write timestamp
information..... I feel like it might be a good idea to add a warning in
the core stack if both SKBTX_HW_TSTAMP and SKBTX_IN_PROGRESS are set but
the skb is consumed without skb_hwtstamps(skb) being written by the
driver before consuming the skb.

>
> Its not really a huge deal, and this could just be lumped into either
> lost or err.

--
Thanks,

Rahul Rameshbabu

2024-02-26 09:38:38

by Kory Maincent

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics

On Fri, 23 Feb 2024 11:24:45 -0800
Rahul Rameshbabu <[email protected]> wrote:

> Multiple network devices that support hardware timestamping appear to have
> common behavior with regards to timestamp handling. Implement common Tx
> hardware timestamping statistics in a tx_stats struct_group. Common Rx
> hardware timestamping statistics can subsequently be implemented in a
> rx_stats struct_group for ethtool_ts_stats.
>
> Signed-off-by: Rahul Rameshbabu <[email protected]>
> Reviewed-by: Dragos Tatulea <[email protected]>

> +/**
> + * enum ethtool_ts_stats_layer - layer to query hardware timestamping
> statistics
> + * @ETHTOOL_TS_STATS_LAYER_ACTIVE:
> + * retrieve the statistics from the layer that is currently feeding
> + * hardware timestamps for packets.
> + * @ETHTOOL_TS_STATS_LAYER_DMA:
> + * retrieve the statistics from the DMA hardware timestamping layer
> of the
> + * device.
> + * @ETHTOOL_TS_STATS_PHY:
> + * retrieve the statistics from the PHY hardware timestamping layer
> of the
> + * device.
> + */
> +enum ethtool_ts_stats_layer {
> + ETHTOOL_TS_STATS_LAYER_ACTIVE,
> + ETHTOOL_TS_STATS_LAYER_DMA,
> + ETHTOOL_TS_STATS_LAYER_PHY,
> +};

The all point of my v8 series new implementation (asked by the maintainers) was
to move on from the timestamp layer to the phc provider which is described by a
phc index + phc qualifier (precise IEEE 1588/approx DMA). The struct being
introduce in patch 9 of my series.
You should do the same, use the phc provider instead of the layer.

With using only the layer and in case of several PHYs we could not reach the
right ts stats.
Same goes for the MAC having both type of timestamp IEEE 1588 and DMA.

Regards,
--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com

2024-02-26 09:57:00

by Kory Maincent

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 4/6] net/mlx5e: Implement ethtool hardware timestamping statistics

On Fri, 23 Feb 2024 11:24:48 -0800
Rahul Rameshbabu <[email protected]> wrote:

> Feed driver statistics counters related to hardware timestamping to
> standardized ethtool hardware timestamping statistics group.
>
> Signed-off-by: Rahul Rameshbabu <[email protected]>
> Reviewed-by: Dragos Tatulea <[email protected]>

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c index
> bc31196d348a..836198445726 100644 ---
> a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c +++
> b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c @@ -1155,6 +1155,74 @@
> void mlx5e_stats_rmon_get(struct mlx5e_priv *priv, *ranges =
> mlx5e_rmon_ranges; }
>
> +void mlx5e_stats_ts_get(struct mlx5e_priv *priv,
> + struct ethtool_ts_stats *ts_stats)
> +{
> + enum ethtool_ts_stats_layer layer;
> + struct mlx5e_ptp *ptp;
> + bool tx_ptp_opened;
> + int i, j;
> +
> + mutex_lock(&priv->state_lock);
> +
> + tx_ptp_opened = priv->tx_ptp_opened;
> +
> + /* NOTE: this needs to be changed whenever ethtool timestamping
> + * layer selection is implemented.
> + */
> + if (ts_stats->layer == ETHTOOL_TS_STATS_LAYER_ACTIVE)
> + layer = tx_ptp_opened ? ETHTOOL_TS_STATS_LAYER_PHY :
> + ETHTOOL_TS_STATS_LAYER_DMA;
> + else
> + layer = ts_stats->layer;
> +
> + switch (layer) {
> + case ETHTOOL_TS_STATS_LAYER_PHY:
> + if (!tx_ptp_opened)
> + return;
> +
> + ptp = priv->channels.ptp;
> +
> + ts_stats->pkts = 0;
> + ts_stats->err = 0;
> + ts_stats->late = 0;
> + ts_stats->lost = 0;
> +
> + /* Aggregate stats across all TCs */
> + for (i = 0; i < ptp->num_tc; i++) {
> + struct mlx5e_ptp_cq_stats *stats =
> ptp->ptpsq[i].cq_stats; +
> + ts_stats->pkts += stats->cqe;
> + ts_stats->err += stats->abort + stats->err_cqe;
> + ts_stats->late += stats->late_cqe;
> + ts_stats->lost += stats->lost_cqe;
> + }
> + break;
> + case ETHTOOL_TS_STATS_LAYER_DMA:
> + /* DMA layer will always successfully timestamp packets.
> Other
> + * counters do not make sense for this layer.
> + */
> + ts_stats->pkts = 0;
> +
> + /* Aggregate stats across all SQs */
> + mutex_lock(&priv->state_lock);
> + for (j = 0; j < priv->channels.num; j++) {
> + struct mlx5e_channel *c = priv->channels.c[j];
> +
> + for (i = 0; i < c->num_tc; i++) {
> + struct mlx5e_sq_stats *stats =
> c->sq[i].stats; +
> + ts_stats->pkts += stats->timestamps;
> + }
> + }
> + break;
> + default:
> + break;
> + }
> +
> + mutex_unlock(&priv->state_lock);
> +}
> +

To follow the same logic as my patch series you should use phc qualifier instead
of the layer. See patch 9 of my series.
With HWTSTAMP_PROVIDER_QUALIFIER_PRECISE for the IEEE 1588 which mean the PHY
layer on your case and HWTSTAMP_PROVIDER_QUALIFIER_APPROX for the DMA layer.

Even if the timestamp is made physically on the PHY, this driver does not
register any phy device. The NIC manages all the network architecture by
itself. We decided to use the phc qualifier to fit this use case.

The layer description should only be used internally in the kernel when we are
registering a PHY device and using the phy tsinfo/hwtstamp/rxtstamp/tstsamp
callbacks.

Regards,
--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com

2024-02-26 10:27:32

by Kory Maincent

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics

On Fri, 23 Feb 2024 11:24:45 -0800
Rahul Rameshbabu <[email protected]> wrote:

> Multiple network devices that support hardware timestamping appear to have
> common behavior with regards to timestamp handling. Implement common Tx
> hardware timestamping statistics in a tx_stats struct_group. Common Rx
> hardware timestamping statistics can subsequently be implemented in a
> rx_stats struct_group for ethtool_ts_stats.
>
> Signed-off-by: Rahul Rameshbabu <[email protected]>
> Reviewed-by: Dragos Tatulea <[email protected]>


> +
> +const struct nla_policy ethnl_stats_get_policy[__ETHTOOL_A_STATS_CNT] = {
> + [ETHTOOL_A_STATS_HEADER] = NLA_POLICY_NESTED(ethnl_header_policy),
> + [ETHTOOL_A_STATS_GROUPS] = { .type = NLA_NESTED },
> + [ETHTOOL_A_STATS_SRC] =
> NLA_POLICY_MAX(NLA_U32, ETHTOOL_MAC_STATS_SRC_PMAC),
> + [ETHTOOL_A_STATS_LAYER] =
> + NLA_POLICY_MAX(NLA_U32, ETHTOOL_TS_STATS_LAYER_PHY),
> };

You should add this new netlink attributes to the specs in a new patch to be
able to test it.

diff --git a/Documentation/netlink/specs/ethtool.yaml
b/Documentation/netlink/specs/ethtool.yaml
index cfe48f8d6283..118508de2c88 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -859,6 +859,9 @@ attribute-sets:
-
name: src
type: u32
+ -
+ name: layer
+ type: u32
-
name: phc-vclocks
attributes:
@@ -1526,6 +1529,7 @@ operations:
attributes:
- header
- groups
+ - layer
reply:
attributes:
- header

Regards,
--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com

2024-02-26 20:01:09

by Jacob Keller

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics



On 2/23/2024 3:43 PM, Rahul Rameshbabu wrote:
>
> On Fri, 23 Feb, 2024 14:48:51 -0800 Jacob Keller <[email protected]> wrote:
>> On 2/23/2024 2:21 PM, Rahul Rameshbabu wrote:
>>>> The Intel ice drivers has the following Tx timestamp statistics:
>>>>
>>>> tx_hwtstamp_skipped - indicates when we get a Tx timestamp request but
>>>> are unable to fulfill it.
>>>> tx_hwtstamp_timeouts - indicates we had a Tx timestamp skb waiting for a
>>>> timestamp from hardware but it didn't get received within some internal
>>>> time limit.
>>>
>>> This is interesting. In mlx5 land, the only case where we are unable to
>>> fulfill a hwtstamp is when the timestamp information is lost or late.
>>>
>>
>> For ice, the timestamps are captured in the PHY and stored in a block of
>> registers with limited slots. The driver tracks the available slots and
>> uses one when a Tx timestamp request comes in.
>>
>> So we have "skipped" because its possible to request too many timestamps
>> at once and fill up all the slots before the first timestamp is reported
>> back.
>>
>>> lost for us means that the timestamp never arrived within some internal
>>> time limit that our device will supposedly never be able to deliver
>>> timestamp information after that point.
>>>
>>
>> That is more or less the equivalent we have for timeout.
>>
>>> late for us is that we got hardware timestamp information delivered
>>> after that internal time limit. We are able to track this by using
>>> identifiers in our completion events and we only release references to
>>> these identifiers upon delivery (never delivering leaks the references.
>>> Enough build up leads to a recovery flow). The theory for us is that
>>> late timestamp information arrival after that period of time should not
>>> happen. However the truth is that it does happen and we want our driver
>>> implementation to be resilient to this case rather than trusting the
>>> time interval.
>>>
>>
>> In our case, because of how the slots work, once we "timeout" a slot, it
>> could get re-used. We set the timeout to be pretty ridiculous (1 second)
>> to ensure that if we do timeout its almost certainly because hardware
>> never timestamped the packet.
>
> We were thinking about that design as well. We use a 1 second timeout to
> be safe.
>
> #define MLX5E_PTP_TS_CQE_UNDELIVERED_TIMEOUT (1 * NSEC_PER_SEC)
>
> Our device does not do any bookkeeping internally to prevent a
> completion event with timestamp information from arriving after 1
> second. Some internal folks have said it shouldn't be possible, but we
> did not want to take any chances and built a model that is resilient to
> timestamp deliveries after any period of time even after consuming the
> skb without appending timestamp information. If no other vendor finds
> this useful, we could roll this up into the error counter and leave the
> late counter as vendor specific. I do not want to introduce too many
> counters that are hard to understand. We document the device specific
> counters on top of introducing them in the code base already.
>
> https://docs.kernel.org/networking/device_drivers/ethernet/mellanox/mlx5/counters.html
>

We can't distinguish "late". At best we could notice if we get a
timestamp on an index thats not currently active, but we wouldn't know
for sure if it was late from a previous request or due to some other
programming error.

>>
>>> Do you have any example of a case of skipping timestamp information that
>>> is not related to lack of delivery over time? I am wondering if this
>>> case is more like a hardware error or not. Or is it more like something
>>> along the lines of being busy/would impact line rate of timestamp
>>> information must be recorded?
>>>
>>
>> The main example for skipped is the event where all our slots are full
>> at point of timestamp request.
>
> This is what I was guessing as the main (if not only reason). For this
> specific reason, I think a general "busy" stats counter makes sense.
> mlx5 does not need this counter, but I can see a lot of other hw
> implementations needing this. (The skipped counter name obviously should
> be left only in the ice driver. Just felt "busy" was easy to understand
> for generalized counters.)

Yea, I don't expect this would be required for all hardware but it seems
like a common approach if you have limited slots for Tx timestamps
available.

>
> The reason why I prefer busy is that "skip" to me makes me think someone
> used SIOCSHWTSTAMP to filter which packets get timestamped which is very
> different from something like lack of resource availability.
>

Busy is fine with me.

>>>> The only major addition I think is the skipped stat, which I would
>>>> prefer to have. Perhaps that could be tracked in the netdev layer by
>>>> checking whether the skb flags to see whether or not the driver actually
>>>> set the appropriate flag?
>>>
>>> I guess the problem is how would the core stack know at what layer this
>>> was skipped at (I think Kory's patch series can be used to help with
>>> this since it's adding a common interface in ethtool to select the
>>> timestamping layer). As of today, mlx5 is the only driver I know of that
>>> supports selecting between the DMA and PHY layers for timestamp
>>> information.
>>>
>>
>> Well, the way the interface worked in my understanding was that the core
>> sets the SKBTX_HW_TSTAMP flag. The driver is supposed to then prepare
>> the packet for timestamp and set the SKBTX_IN_PROGRESS flag. I just
>> looked though, and it looks like ice doesn't actually set this flag...
>
> That would be a good fix. We set this in mlx5.
>
> /* device driver is going to provide hardware time stamp */
> SKBTX_IN_PROGRESS = 1 << 2,
>

Yea. I kind of wonder how necessary it is since we haven't been setting
it and don't seem to have an existing bug report for this. I can dig
through the kernel and see what it actually does...

>>
>> If we fixed this, in theory the stack should be able to check after the
>> packet gets sent with SKBTX_HW_TSTAMP, if SKBTX_IN_PROGRESS isn't set
>> then it would be a skipped timestamp?
>
> One question I have about this idea. Wouldn't SKBTX_IN_PROGRESS also not
> be set in the case when timestamp information is lost/a timeout occurs?
> I feel like the problem is not being able to separate these two cases
> from the perspective of the core stack.
>
> Btw, mlx5 does keep the flag even when we fail to write timestamp
> information..... I feel like it might be a good idea to add a warning in
> the core stack if both SKBTX_HW_TSTAMP and SKBTX_IN_PROGRESS are set but
> the skb is consumed without skb_hwtstamps(skb) being written by the
> driver before consuming the skb.
>

I was thinking the check would happen much earlier, i.e. the moment we
exit the driver xmit routines it would check whether SKBTX_IN_PROGRESS
is set. This would be well before any actual Tx timestamp was acquired.
Its basically a "if we set SKBTX_HW_TSTAMP and you didn't set
IN_PROGRESS then we know you didn't even start a timestamp request, so
you must have been busy"

It might not be workable because I think the IN_PROGRESS flag is used
for another purpose. I tried to read the documentation for it in
Documentation, but I got confused a bit. I'm going to go through the
code and see what places actually check this flag.

2024-02-29 02:05:41

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics

On Fri, 23 Feb 2024 11:24:45 -0800 Rahul Rameshbabu wrote:
> +/**
> + * struct ethtool_ts_stats - HW timestamping statistics
> + * @layer: input field denoting whether stats should be queried from the DMA or
> + * PHY timestamping layer. Defaults to the active layer for packet
> + * timestamping.

I think we're better off attaching this to an existing message in the
dump (using ETHTOOL_FLAG_STATS / ethtool -I), like we do for pause,
fec, etc., rather than having to build a separate hierarchy and copy
identifiers within ETHTOOL_MSG_STATS_GET.

Sorry if I mis-directed you this way.

2024-02-29 02:09:10

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 5/6] tools: ynl: ethtool.py: Make tool invokable from any CWD

On Fri, 23 Feb 2024 14:39:07 -0800 Rahul Rameshbabu wrote:
> Unfortunately, even though in the next patch after this one where I add
> the ts stats group as a comment, the tool seems to fail at rendering the
> actual counters of the stats group, so I had to use the ethtool tree to
> test. I should look into that, so that way other contributors such as
> Intel can simply use this script to test that they hooked into the
> ethtool ts stats interface correctly.

IIRC some attribute nesting in ETHTOOL_MSG_STATS_GET is too "creative"
for the YAML specs. Using the ETHTOOL_FLAG_STATS path should avoid this
problem.

2024-02-29 22:21:11

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics


On Wed, 28 Feb, 2024 18:05:20 -0800 Jakub Kicinski <[email protected]> wrote:
> On Fri, 23 Feb 2024 11:24:45 -0800 Rahul Rameshbabu wrote:
>> +/**
>> + * struct ethtool_ts_stats - HW timestamping statistics
>> + * @layer: input field denoting whether stats should be queried from the DMA or
>> + * PHY timestamping layer. Defaults to the active layer for packet
>> + * timestamping.
>
> I think we're better off attaching this to an existing message in the
> dump (using ETHTOOL_FLAG_STATS / ethtool -I), like we do for pause,
> fec, etc., rather than having to build a separate hierarchy and copy
> identifiers within ETHTOOL_MSG_STATS_GET.
>
> Sorry if I mis-directed you this way.

No worries. I can do that in my v2.

2024-03-07 18:53:06

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics

Hi Jacob,

On Mon, 26 Feb, 2024 11:54:49 -0800 Jacob Keller <[email protected]> wrote:
> On 2/23/2024 3:43 PM, Rahul Rameshbabu wrote:
>>
>> On Fri, 23 Feb, 2024 14:48:51 -0800 Jacob Keller <[email protected]>
>> wrote:
>>> On 2/23/2024 2:21 PM, Rahul Rameshbabu wrote:
>>>> Do you have any example of a case of skipping timestamp information that
>>>> is not related to lack of delivery over time? I am wondering if this
>>>> case is more like a hardware error or not. Or is it more like something
>>>> along the lines of being busy/would impact line rate of timestamp
>>>> information must be recorded?
>>>>
>>>
>>> The main example for skipped is the event where all our slots are full
>>> at point of timestamp request.
>>
>> This is what I was guessing as the main (if not only reason). For this
>> specific reason, I think a general "busy" stats counter makes sense.
>> mlx5 does not need this counter, but I can see a lot of other hw
>> implementations needing this. (The skipped counter name obviously should
>> be left only in the ice driver. Just felt "busy" was easy to understand
>> for generalized counters.)
>
> Yea, I don't expect this would be required for all hardware but it seems
> like a common approach if you have limited slots for Tx timestamps
> available.
>
Sorry to bump this thread once more, but I had a question regarding the
Intel driver in regards to this. Instead of having a busy case when all
the slots are full, would it make sense to stop the netdev queues in
this case, we actually do this in mlx5 (though keep in mind that we have
a dedicated queue just for port/phy timestamping that we start/stop).

Maybe in your case, you can have a mix of HW timestamping and non-HW
timestamping in the same queue, which is why you have a busy case?

Wanted to inquire about this before sending out a RFC v2.
>>
>> The reason why I prefer busy is that "skip" to me makes me think someone
>> used SIOCSHWTSTAMP to filter which packets get timestamped which is very
>> different from something like lack of resource availability.
>>
>
> Busy is fine with me.
>

--
Thanks,

Rahul Rameshbabu

2024-03-08 03:29:35

by Jacob Keller

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics



On 3/7/2024 10:47 AM, Rahul Rameshbabu wrote:
> Hi Jacob,
>
> On Mon, 26 Feb, 2024 11:54:49 -0800 Jacob Keller <[email protected]> wrote:
>> On 2/23/2024 3:43 PM, Rahul Rameshbabu wrote:
>>>
>>> On Fri, 23 Feb, 2024 14:48:51 -0800 Jacob Keller <[email protected]>
>>> wrote:
>>>> On 2/23/2024 2:21 PM, Rahul Rameshbabu wrote:
>>>>> Do you have any example of a case of skipping timestamp information that
>>>>> is not related to lack of delivery over time? I am wondering if this
>>>>> case is more like a hardware error or not. Or is it more like something
>>>>> along the lines of being busy/would impact line rate of timestamp
>>>>> information must be recorded?
>>>>>
>>>>
>>>> The main example for skipped is the event where all our slots are full
>>>> at point of timestamp request.
>>>
>>> This is what I was guessing as the main (if not only reason). For this
>>> specific reason, I think a general "busy" stats counter makes sense.
>>> mlx5 does not need this counter, but I can see a lot of other hw
>>> implementations needing this. (The skipped counter name obviously should
>>> be left only in the ice driver. Just felt "busy" was easy to understand
>>> for generalized counters.)
>>
>> Yea, I don't expect this would be required for all hardware but it seems
>> like a common approach if you have limited slots for Tx timestamps
>> available.
>>
> Sorry to bump this thread once more, but I had a question regarding the
> Intel driver in regards to this. Instead of having a busy case when all
> the slots are full, would it make sense to stop the netdev queues in
> this case, we actually do this in mlx5 (though keep in mind that we have
> a dedicated queue just for port/phy timestamping that we start/stop).
>
> Maybe in your case, you can have a mix of HW timestamping and non-HW
> timestamping in the same queue, which is why you have a busy case?
>

We don't use a dedicated queue. The issue isn't queue capacity so much
as it is the number of slots in the PHY for where it can save the
timestamp data.

In practice the most common application (ptp4l) synchronously waits for
timestamps, and only has one outstanding at a time. Likely due to
limitations with original hardware that only supported one outstanding
Tx timestamp.

> Wanted to inquire about this before sending out a RFC v2.

That's actually an interesting approach to change to a dedicated queue
which we could lock and start/stop it when the indexes are full. How
does that interact with the stack UDP and Ethernet stacks? Presumably
when you go to transmit, you'd need to pick a queue and if its stopped
you'd have to drop or tell the stack?

I think I remember someone experimenting with returning NETDEV_TX_BUSY
when the slots were full, but in practice this caused a lot of issues.
None of the other devices we have with only a single slot (one set of
registers, ixgbe, i40e, igb, e1000) did that either.

If this queue model behaves in a sane way (or if we can communicate
something similar by reporting back up the stack without needing a
dedicated queue?) that could be better than the current situation.


>>>
>>> The reason why I prefer busy is that "skip" to me makes me think someone
>>> used SIOCSHWTSTAMP to filter which packets get timestamped which is very
>>> different from something like lack of resource availability.
>>>
>>
>> Busy is fine with me.
>>
>
> --
> Thanks,
>
> Rahul Rameshbabu

2024-03-08 05:30:06

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics


On Thu, 07 Mar, 2024 19:29:08 -0800 Jacob Keller <[email protected]> wrote:
> On 3/7/2024 10:47 AM, Rahul Rameshbabu wrote:
>> Hi Jacob,
>>
>> On Mon, 26 Feb, 2024 11:54:49 -0800 Jacob Keller <[email protected]> wrote:
>>> On 2/23/2024 3:43 PM, Rahul Rameshbabu wrote:
>>>>
>>>> On Fri, 23 Feb, 2024 14:48:51 -0800 Jacob Keller <[email protected]>
>>>> wrote:
>>>>> On 2/23/2024 2:21 PM, Rahul Rameshbabu wrote:
>>>>>> Do you have any example of a case of skipping timestamp information that
>>>>>> is not related to lack of delivery over time? I am wondering if this
>>>>>> case is more like a hardware error or not. Or is it more like something
>>>>>> along the lines of being busy/would impact line rate of timestamp
>>>>>> information must be recorded?
>>>>>>
>>>>>
>>>>> The main example for skipped is the event where all our slots are full
>>>>> at point of timestamp request.
>>>>
>>>> This is what I was guessing as the main (if not only reason). For this
>>>> specific reason, I think a general "busy" stats counter makes sense.
>>>> mlx5 does not need this counter, but I can see a lot of other hw
>>>> implementations needing this. (The skipped counter name obviously should
>>>> be left only in the ice driver. Just felt "busy" was easy to understand
>>>> for generalized counters.)
>>>
>>> Yea, I don't expect this would be required for all hardware but it seems
>>> like a common approach if you have limited slots for Tx timestamps
>>> available.
>>>
>> Sorry to bump this thread once more, but I had a question regarding the
>> Intel driver in regards to this. Instead of having a busy case when all
>> the slots are full, would it make sense to stop the netdev queues in
>> this case, we actually do this in mlx5 (though keep in mind that we have
>> a dedicated queue just for port/phy timestamping that we start/stop).
>>
>> Maybe in your case, you can have a mix of HW timestamping and non-HW
>> timestamping in the same queue, which is why you have a busy case?
>>
>
> We don't use a dedicated queue. The issue isn't queue capacity so much
> as it is the number of slots in the PHY for where it can save the
> timestamp data.

In mlx5, we use a dedicated queue just for the purpose of HW
timestamping because we actually do have a similar slot mechanism. We
call it metadata. We have a limit of 256 entries. We steer PTP traffic
specifically (though we will be changing this to any HW timestamped
traffic with the work Kory is doing) to this queue by matching against
the protocol and port. All other traffic goes to the normal queues that
cannot consume the timestamping slots. When all the slots are occupied,
we stop the timestamping queue rather than throwing some busy error.

>
> In practice the most common application (ptp4l) synchronously waits for
> timestamps, and only has one outstanding at a time. Likely due to
> limitations with original hardware that only supported one outstanding
> Tx timestamp.
>
>> Wanted to inquire about this before sending out a RFC v2.
>
> That's actually an interesting approach to change to a dedicated queue
> which we could lock and start/stop it when the indexes are full. How
> does that interact with the stack UDP and Ethernet stacks? Presumably
> when you go to transmit, you'd need to pick a queue and if its stopped
> you'd have to drop or tell the stack?

Let me share a pointer in mlx5 for how we do the queue selection. Like I
mentioned, we steer ptp traffic specifically, but we can change this to
just steer any skb that indicates hw timestamping.

* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n71

Then, here is how we manage stopping and waking the queue (we tell the
core stack about this so we do not have to drop traffic due to some kind
of busy state because our metadata/slots are all consumed).

* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n775
* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n257
* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n397

>
> I think I remember someone experimenting with returning NETDEV_TX_BUSY
> when the slots were full, but in practice this caused a lot of issues.
> None of the other devices we have with only a single slot (one set of
> registers, ixgbe, i40e, igb, e1000) did that either.

So we experimented that even with a single slot (we had reasons for
testing this), the dedicated queue for timestamping worked out nicely. I
really would suggest investigating this model since I think it might
play out nicely for the Intel family.

>
> If this queue model behaves in a sane way (or if we can communicate
> something similar by reporting back up the stack without needing a
> dedicated queue?) that could be better than the current situation.

I personally really like the dedicated queue in the device drivers, but
if we want to instead model this slot management work in the core netdev
stack, I do not think that is a bad endeavor either (when slots are
full, hw timestamping traffic is held back till they become available).
I do think the netif_tx_wake_queue/netif_tx_stop_queue + dedicated HW
timestamping queue does work out nicely.

Let me know your thoughts on this. If you think it's an interesting idea
to explore, lets not add the busy counter now in this series. I already
dropped the late counter. We can add the busy counter later on if you
feel this model I have shared is not viable for Intel. I wanted to avoid
introducing too many counters pre-emptively that might not actually be
consumed widely. I had a thought that what you presented with slots is
very similar to what we have with metadata in mlx5, so I thought that
maybe handling the management of these slots in a different way with
something like a dedicated queue for HW timestamping could make the
design cleaner.

--
Thanks,

Rahul Rameshbabu

2024-03-08 22:29:51

by Jacob Keller

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics



On 3/7/2024 9:09 PM, Rahul Rameshbabu wrote:
>
> On Thu, 07 Mar, 2024 19:29:08 -0800 Jacob Keller <[email protected]> wrote:
>> On 3/7/2024 10:47 AM, Rahul Rameshbabu wrote:
>>> Hi Jacob,
>>>
>>> On Mon, 26 Feb, 2024 11:54:49 -0800 Jacob Keller <[email protected]> wrote:
>>>> On 2/23/2024 3:43 PM, Rahul Rameshbabu wrote:
>>>>>
>>>>> On Fri, 23 Feb, 2024 14:48:51 -0800 Jacob Keller <[email protected]>
>>>>> wrote:
>>>>>> On 2/23/2024 2:21 PM, Rahul Rameshbabu wrote:
>>>>>>> Do you have any example of a case of skipping timestamp information that
>>>>>>> is not related to lack of delivery over time? I am wondering if this
>>>>>>> case is more like a hardware error or not. Or is it more like something
>>>>>>> along the lines of being busy/would impact line rate of timestamp
>>>>>>> information must be recorded?
>>>>>>>
>>>>>>
>>>>>> The main example for skipped is the event where all our slots are full
>>>>>> at point of timestamp request.
>>>>>
>>>>> This is what I was guessing as the main (if not only reason). For this
>>>>> specific reason, I think a general "busy" stats counter makes sense.
>>>>> mlx5 does not need this counter, but I can see a lot of other hw
>>>>> implementations needing this. (The skipped counter name obviously should
>>>>> be left only in the ice driver. Just felt "busy" was easy to understand
>>>>> for generalized counters.)
>>>>
>>>> Yea, I don't expect this would be required for all hardware but it seems
>>>> like a common approach if you have limited slots for Tx timestamps
>>>> available.
>>>>
>>> Sorry to bump this thread once more, but I had a question regarding the
>>> Intel driver in regards to this. Instead of having a busy case when all
>>> the slots are full, would it make sense to stop the netdev queues in
>>> this case, we actually do this in mlx5 (though keep in mind that we have
>>> a dedicated queue just for port/phy timestamping that we start/stop).
>>>
>>> Maybe in your case, you can have a mix of HW timestamping and non-HW
>>> timestamping in the same queue, which is why you have a busy case?
>>>
>>
>> We don't use a dedicated queue. The issue isn't queue capacity so much
>> as it is the number of slots in the PHY for where it can save the
>> timestamp data.
>
> In mlx5, we use a dedicated queue just for the purpose of HW
> timestamping because we actually do have a similar slot mechanism. We
> call it metadata. We have a limit of 256 entries. We steer PTP traffic
> specifically (though we will be changing this to any HW timestamped
> traffic with the work Kory is doing) to this queue by matching against
> the protocol and port. All other traffic goes to the normal queues that
> cannot consume the timestamping slots. When all the slots are occupied,
> we stop the timestamping queue rather than throwing some busy error.
>
>>
>> In practice the most common application (ptp4l) synchronously waits for
>> timestamps, and only has one outstanding at a time. Likely due to
>> limitations with original hardware that only supported one outstanding
>> Tx timestamp.
>>
>>> Wanted to inquire about this before sending out a RFC v2.
>>
>> That's actually an interesting approach to change to a dedicated queue
>> which we could lock and start/stop it when the indexes are full. How
>> does that interact with the stack UDP and Ethernet stacks? Presumably
>> when you go to transmit, you'd need to pick a queue and if its stopped
>> you'd have to drop or tell the stack?
>
> Let me share a pointer in mlx5 for how we do the queue selection. Like I
> mentioned, we steer ptp traffic specifically, but we can change this to
> just steer any skb that indicates hw timestamping.
>
> * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n71
>
> Then, here is how we manage stopping and waking the queue (we tell the
> core stack about this so we do not have to drop traffic due to some kind
> of busy state because our metadata/slots are all consumed).
>

Makes sense.

> * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n775
> * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n257
> * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n397
>
>>
>> I think I remember someone experimenting with returning NETDEV_TX_BUSY
>> when the slots were full, but in practice this caused a lot of issues.
>> None of the other devices we have with only a single slot (one set of
>> registers, ixgbe, i40e, igb, e1000) did that either.
>
> So we experimented that even with a single slot (we had reasons for
> testing this), the dedicated queue for timestamping worked out nicely. I
> really would suggest investigating this model since I think it might
> play out nicely for the Intel family.
>
>>
>> If this queue model behaves in a sane way (or if we can communicate
>> something similar by reporting back up the stack without needing a
>> dedicated queue?) that could be better than the current situation.
>
> I personally really like the dedicated queue in the device drivers, but
> if we want to instead model this slot management work in the core netdev
> stack, I do not think that is a bad endeavor either (when slots are
> full, hw timestamping traffic is held back till they become available).
> I do think the netif_tx_wake_queue/netif_tx_stop_queue + dedicated HW
> timestamping queue does work out nicely.

Ok so if I understand this right, .ndo_select_queue has the stack pick a
queue, and we'd implement this to use the SKB flag. Then whenever the
slots for the queue are full we issue netif_tx_stop_queue, and whenever
the slots are released and we have slots open again we issue
netif_tx_wake_queue..

While the queue is stopped, the stack basically just buffers requests
and doesn't try to call the ndo_do_xmit routine for that queue until the
queue is ready again?

Using a dedicated queue has some other advantages in that it could be
programmed with different priority both from the hardware side (prefer
packets waiting in the timestamp queue) and from the software side
(prioritize CPUs running the threads for processing it). That could be
useful in some applications too...

>
> Let me know your thoughts on this. If you think it's an interesting idea
> to explore, lets not add the busy counter now in this series. I already
> dropped the late counter. We can add the busy counter later on if you
> feel this model I have shared is not viable for Intel. I wanted to avoid
> introducing too many counters pre-emptively that might not actually be
> consumed widely. I had a thought that what you presented with slots is
> very similar to what we have with metadata in mlx5, so I thought that
> maybe handling the management of these slots in a different way with
> something like a dedicated queue for HW timestamping could make the
> design cleaner.
>

I think I agree with the queue model, though I'm not sure when I could
get to working on implementing this. I'm fine with dropping the busy
counter from this series.

> --
> Thanks,
>
> Rahul Rameshbabu

2024-03-08 22:35:31

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC net-next v1 1/6] ethtool: add interface to read Tx hardware timestamping statistics


On Fri, 08 Mar, 2024 14:28:01 -0800 Jacob Keller <[email protected]> wrote:
> On 3/7/2024 9:09 PM, Rahul Rameshbabu wrote:
>>
>> On Thu, 07 Mar, 2024 19:29:08 -0800 Jacob Keller <[email protected]> wrote:
>>> On 3/7/2024 10:47 AM, Rahul Rameshbabu wrote:
>>>> Hi Jacob,
>>>>
>>>> On Mon, 26 Feb, 2024 11:54:49 -0800 Jacob Keller <[email protected]> wrote:
>>>>> On 2/23/2024 3:43 PM, Rahul Rameshbabu wrote:
>>>>>>
>>>>>> On Fri, 23 Feb, 2024 14:48:51 -0800 Jacob Keller <[email protected]>
>>>>>> wrote:
>>>>>>> On 2/23/2024 2:21 PM, Rahul Rameshbabu wrote:
>>>>>>>> Do you have any example of a case of skipping timestamp information that
>>>>>>>> is not related to lack of delivery over time? I am wondering if this
>>>>>>>> case is more like a hardware error or not. Or is it more like something
>>>>>>>> along the lines of being busy/would impact line rate of timestamp
>>>>>>>> information must be recorded?
>>>>>>>>
>>>>>>>
>>>>>>> The main example for skipped is the event where all our slots are full
>>>>>>> at point of timestamp request.
>>>>>>
>>>>>> This is what I was guessing as the main (if not only reason). For this
>>>>>> specific reason, I think a general "busy" stats counter makes sense.
>>>>>> mlx5 does not need this counter, but I can see a lot of other hw
>>>>>> implementations needing this. (The skipped counter name obviously should
>>>>>> be left only in the ice driver. Just felt "busy" was easy to understand
>>>>>> for generalized counters.)
>>>>>
>>>>> Yea, I don't expect this would be required for all hardware but it seems
>>>>> like a common approach if you have limited slots for Tx timestamps
>>>>> available.
>>>>>
>>>> Sorry to bump this thread once more, but I had a question regarding the
>>>> Intel driver in regards to this. Instead of having a busy case when all
>>>> the slots are full, would it make sense to stop the netdev queues in
>>>> this case, we actually do this in mlx5 (though keep in mind that we have
>>>> a dedicated queue just for port/phy timestamping that we start/stop).
>>>>
>>>> Maybe in your case, you can have a mix of HW timestamping and non-HW
>>>> timestamping in the same queue, which is why you have a busy case?
>>>>
>>>
>>> We don't use a dedicated queue. The issue isn't queue capacity so much
>>> as it is the number of slots in the PHY for where it can save the
>>> timestamp data.
>>
>> In mlx5, we use a dedicated queue just for the purpose of HW
>> timestamping because we actually do have a similar slot mechanism. We
>> call it metadata. We have a limit of 256 entries. We steer PTP traffic
>> specifically (though we will be changing this to any HW timestamped
>> traffic with the work Kory is doing) to this queue by matching against
>> the protocol and port. All other traffic goes to the normal queues that
>> cannot consume the timestamping slots. When all the slots are occupied,
>> we stop the timestamping queue rather than throwing some busy error.
>>
>>>
>>> In practice the most common application (ptp4l) synchronously waits for
>>> timestamps, and only has one outstanding at a time. Likely due to
>>> limitations with original hardware that only supported one outstanding
>>> Tx timestamp.
>>>
>>>> Wanted to inquire about this before sending out a RFC v2.
>>>
>>> That's actually an interesting approach to change to a dedicated queue
>>> which we could lock and start/stop it when the indexes are full. How
>>> does that interact with the stack UDP and Ethernet stacks? Presumably
>>> when you go to transmit, you'd need to pick a queue and if its stopped
>>> you'd have to drop or tell the stack?
>>
>> Let me share a pointer in mlx5 for how we do the queue selection. Like I
>> mentioned, we steer ptp traffic specifically, but we can change this to
>> just steer any skb that indicates hw timestamping.
>>
>> *
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n71
>>
>> Then, here is how we manage stopping and waking the queue (we tell the
>> core stack about this so we do not have to drop traffic due to some kind
>> of busy state because our metadata/slots are all consumed).
>>
>
> Makes sense.
>
>> *
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n775
>> *
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n257
>> *
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c?id=3aaa8ce7a3350d95b241046ae2401103a4384ba2#n397
>>
>>>
>>> I think I remember someone experimenting with returning NETDEV_TX_BUSY
>>> when the slots were full, but in practice this caused a lot of issues.
>>> None of the other devices we have with only a single slot (one set of
>>> registers, ixgbe, i40e, igb, e1000) did that either.
>>
>> So we experimented that even with a single slot (we had reasons for
>> testing this), the dedicated queue for timestamping worked out nicely. I
>> really would suggest investigating this model since I think it might
>> play out nicely for the Intel family.
>>
>>>
>>> If this queue model behaves in a sane way (or if we can communicate
>>> something similar by reporting back up the stack without needing a
>>> dedicated queue?) that could be better than the current situation.
>>
>> I personally really like the dedicated queue in the device drivers, but
>> if we want to instead model this slot management work in the core netdev
>> stack, I do not think that is a bad endeavor either (when slots are
>> full, hw timestamping traffic is held back till they become available).
>> I do think the netif_tx_wake_queue/netif_tx_stop_queue + dedicated HW
>> timestamping queue does work out nicely.
>
> Ok so if I understand this right, .ndo_select_queue has the stack pick a
> queue, and we'd implement this to use the SKB flag. Then whenever the
> slots for the queue are full we issue netif_tx_stop_queue, and whenever
> the slots are released and we have slots open again we issue
> netif_tx_wake_queue..

Your summary here is correct.

>
> While the queue is stopped, the stack basically just buffers requests
> and doesn't try to call the ndo_do_xmit routine for that queue until the
> queue is ready again?

Yes, that's right. Because of this, you can avoid needing to report a
busy state and just let the stack buffer till the device backpressure
for timestamping resources is released (timestamp information is
retrieved from the device, making the slots available).

>
> Using a dedicated queue has some other advantages in that it could be
> programmed with different priority both from the hardware side (prefer
> packets waiting in the timestamp queue) and from the software side
> (prioritize CPUs running the threads for processing it). That could be
> useful in some applications too...
>
>>
>> Let me know your thoughts on this. If you think it's an interesting idea
>> to explore, lets not add the busy counter now in this series. I already
>> dropped the late counter. We can add the busy counter later on if you
>> feel this model I have shared is not viable for Intel. I wanted to avoid
>> introducing too many counters pre-emptively that might not actually be
>> consumed widely. I had a thought that what you presented with slots is
>> very similar to what we have with metadata in mlx5, so I thought that
>> maybe handling the management of these slots in a different way with
>> something like a dedicated queue for HW timestamping could make the
>> design cleaner.
>>
>
> I think I agree with the queue model, though I'm not sure when I could
> get to working on implementing this. I'm fine with dropping the busy
> counter from this series.

No worries. If you are interested in further discussion or any kind of
RFC review, I am open to this. If you feel this model is not
satisfactory in the future, we can discuss this and then look at adding
the busy counter.

--
Thanks,

Rahul Rameshbabu

2024-03-09 08:45:13

by Rahul Rameshbabu

[permalink] [raw]
Subject: [PATCH RFC v2 0/6] ethtool HW timestamping statistics

The goal of this patch series is to introduce a common set of ethtool statistics
for hardware timestamping that a driver implementer can hook into. The
statistics counters added are based on what I believe are common
patterns/behaviors found across various hardware timestamping implementations
seen in the kernel tree today. The mlx5 family of devices is used as the PoC for
this patch series. Other vendors are more than welcome to chim in on this
series.

Changes since v1:
- Dropped the late statistics counter since that was not general enough
- Dropped the layer selection attribute for timestamping statistics
- Take Jakub's suggestion and converted to ETHTOOL_FLAG_STATS
- Provided a working interface to query these new statistics from
tools/net/ynl/ethtool.py

Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Rahul Rameshbabu <[email protected]>
---
Rahul Rameshbabu (6):
ethtool: add interface to read Tx hardware timestamping statistics
net/mlx5e: Introduce lost_cqe statistic counter for PTP Tx port
timestamping CQ
net/mlx5e: Introduce timestamps statistic counter for Tx DMA layer
net/mlx5e: Implement ethtool hardware timestamping statistics
tools: ynl: ethtool.py: Make tool invokable from any CWD
tools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get
operation

Documentation/netlink/specs/ethtool.yaml | 20 +++++++
.../ethernet/mellanox/mlx5/counters.rst | 11 ++++
.../net/ethernet/mellanox/mlx5/core/en/ptp.c | 1 +
.../ethernet/mellanox/mlx5/core/en_ethtool.c | 9 ++++
.../ethernet/mellanox/mlx5/core/en_stats.c | 48 +++++++++++++++++
.../ethernet/mellanox/mlx5/core/en_stats.h | 4 ++
.../net/ethernet/mellanox/mlx5/core/en_tx.c | 6 ++-
include/linux/ethtool.h | 21 ++++++++
include/uapi/linux/ethtool_netlink.h | 15 ++++++
net/ethtool/tsinfo.c | 52 ++++++++++++++++++-
tools/net/ynl/ethtool.py | 19 +++++--
11 files changed, 200 insertions(+), 6 deletions(-)

--
2.42.0


2024-03-09 08:45:32

by Rahul Rameshbabu

[permalink] [raw]
Subject: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware timestamping statistics

Multiple network devices that support hardware timestamping appear to have
common behavior with regards to timestamp handling. Implement common Tx
hardware timestamping statistics in a tx_stats struct_group. Common Rx
hardware timestamping statistics can subsequently be implemented in a
rx_stats struct_group for ethtool_ts_stats.

Signed-off-by: Rahul Rameshbabu <[email protected]>
---
Documentation/netlink/specs/ethtool.yaml | 20 +++++++++
include/linux/ethtool.h | 21 ++++++++++
include/uapi/linux/ethtool_netlink.h | 15 +++++++
net/ethtool/tsinfo.c | 52 +++++++++++++++++++++++-
4 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
index 197208f419dc..f99b003c78c0 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -559,6 +559,21 @@ attribute-sets:
-
name: tx-lpi-timer
type: u32
+ -
+ name: ts-stat
+ attributes:
+ -
+ name: pad
+ type: pad
+ -
+ name: tx-pkts
+ type: u64
+ -
+ name: tx-lost
+ type: u64
+ -
+ name: tx-err
+ type: u64
-
name: tsinfo
attributes:
@@ -581,6 +596,10 @@ attribute-sets:
-
name: phc-index
type: u32
+ -
+ name: stats
+ type: nest
+ nested-attributes: ts-stat
-
name: cable-result
attributes:
@@ -1388,6 +1407,7 @@ operations:
- tx-types
- rx-filters
- phc-index
+ - stats
dump: *tsinfo-get-op
-
name: cable-test-act
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index b90c33607594..a1704938a6fb 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -483,6 +483,24 @@ struct ethtool_rmon_stats {
);
};

+/**
+ * struct ethtool_ts_stats - HW timestamping statistics
+ * @tx_stats: struct group for TX HW timestamping
+ * @pkts: Number of packets successfully timestamped by the queried
+ * layer.
+ * @lost: Number of packet timestamps that failed to get applied on a
+ * packet by the queried layer.
+ * @err: Number of timestamping errors that occurred on the queried
+ * layer.
+ */
+struct ethtool_ts_stats {
+ struct_group(tx_stats,
+ u64 pkts;
+ u64 lost;
+ u64 err;
+ );
+};
+
#define ETH_MODULE_EEPROM_PAGE_LEN 128
#define ETH_MODULE_MAX_I2C_ADDRESS 0x7f

@@ -759,6 +777,7 @@ struct ethtool_rxfh_param {
* It may be called with RCU, or rtnl or reference on the device.
* Drivers supporting transmit time stamps in software should set this to
* ethtool_op_get_ts_info().
+ * @get_ts_stats: Query the device hardware timestamping statistics.
* @get_module_info: Get the size and type of the eeprom contained within
* a plug-in module.
* @get_module_eeprom: Get the eeprom information from the plug-in module
@@ -901,6 +920,8 @@ struct ethtool_ops {
struct ethtool_dump *, void *);
int (*set_dump)(struct net_device *, struct ethtool_dump *);
int (*get_ts_info)(struct net_device *, struct ethtool_ts_info *);
+ void (*get_ts_stats)(struct net_device *dev,
+ struct ethtool_ts_stats *ts_stats);
int (*get_module_info)(struct net_device *,
struct ethtool_modinfo *);
int (*get_module_eeprom)(struct net_device *,
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 3f89074aa06c..046a78d9421d 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -478,12 +478,27 @@ enum {
ETHTOOL_A_TSINFO_TX_TYPES, /* bitset */
ETHTOOL_A_TSINFO_RX_FILTERS, /* bitset */
ETHTOOL_A_TSINFO_PHC_INDEX, /* u32 */
+ ETHTOOL_A_TSINFO_STATS, /* nest - _A_TSINFO_STAT */

/* add new constants above here */
__ETHTOOL_A_TSINFO_CNT,
ETHTOOL_A_TSINFO_MAX = (__ETHTOOL_A_TSINFO_CNT - 1)
};

+enum {
+ ETHTOOL_A_TS_STAT_UNSPEC,
+ ETHTOOL_A_TS_STAT_PAD,
+
+ ETHTOOL_A_TS_STAT_TX_PKT, /* array, u64 */
+ ETHTOOL_A_TS_STAT_TX_LOST, /* array, u64 */
+ ETHTOOL_A_TS_STAT_TX_ERR, /* array, u64 */
+
+ /* add new constants above here */
+ __ETHTOOL_A_TS_STAT_CNT,
+ ETHTOOL_A_TS_STAT_MAX = (__ETHTOOL_A_TS_STAT_CNT - 1)
+
+};
+
/* PHC VCLOCKS */

enum {
diff --git a/net/ethtool/tsinfo.c b/net/ethtool/tsinfo.c
index 9daed0aab162..0d1370ded122 100644
--- a/net/ethtool/tsinfo.c
+++ b/net/ethtool/tsinfo.c
@@ -13,14 +13,18 @@ struct tsinfo_req_info {
struct tsinfo_reply_data {
struct ethnl_reply_data base;
struct ethtool_ts_info ts_info;
+ struct ethtool_ts_stats stats;
};

#define TSINFO_REPDATA(__reply_base) \
container_of(__reply_base, struct tsinfo_reply_data, base)

+#define ETHTOOL_TS_STAT_CNT \
+ (__ETHTOOL_A_TS_STAT_CNT - (ETHTOOL_A_TS_STAT_PAD + 1))
+
const struct nla_policy ethnl_tsinfo_get_policy[] = {
[ETHTOOL_A_TSINFO_HEADER] =
- NLA_POLICY_NESTED(ethnl_header_policy),
+ NLA_POLICY_NESTED(ethnl_header_policy_stats),
};

static int tsinfo_prepare_data(const struct ethnl_req_info *req_base,
@@ -34,6 +38,12 @@ static int tsinfo_prepare_data(const struct ethnl_req_info *req_base,
ret = ethnl_ops_begin(dev);
if (ret < 0)
return ret;
+ if (req_base->flags & ETHTOOL_FLAG_STATS &&
+ dev->ethtool_ops->get_ts_stats) {
+ ethtool_stats_init((u64 *)&data->stats,
+ sizeof(data->stats) / sizeof(u64));
+ dev->ethtool_ops->get_ts_stats(dev, &data->stats);
+ }
ret = __ethtool_get_ts_info(dev, &data->ts_info);
ethnl_ops_complete(dev);

@@ -79,10 +89,47 @@ static int tsinfo_reply_size(const struct ethnl_req_info *req_base,
}
if (ts_info->phc_index >= 0)
len += nla_total_size(sizeof(u32)); /* _TSINFO_PHC_INDEX */
+ if (req_base->flags & ETHTOOL_FLAG_STATS)
+ len += nla_total_size(0) + /* _TSINFO_STATS */
+ nla_total_size_64bit(sizeof(u64)) * ETHTOOL_TS_STAT_CNT;

return len;
}

+static int tsinfo_put_stat(struct sk_buff *skb, u64 val, u16 attrtype)
+{
+ if (val == ETHTOOL_STAT_NOT_SET)
+ return 0;
+ if (nla_put_u64_64bit(skb, attrtype, val, ETHTOOL_A_TS_STAT_PAD))
+ return -EMSGSIZE;
+ return 0;
+}
+
+static int tsinfo_put_stats(struct sk_buff *skb,
+ const struct ethtool_ts_stats *stats)
+{
+ struct nlattr *nest;
+
+ nest = nla_nest_start(skb, ETHTOOL_A_TSINFO_STATS);
+ if (!nest)
+ return -EMSGSIZE;
+
+ if (tsinfo_put_stat(skb, stats->tx_stats.pkts,
+ ETHTOOL_A_TS_STAT_TX_PKT) ||
+ tsinfo_put_stat(skb, stats->tx_stats.lost,
+ ETHTOOL_A_TS_STAT_TX_LOST) ||
+ tsinfo_put_stat(skb, stats->tx_stats.err,
+ ETHTOOL_A_TS_STAT_TX_ERR))
+ goto err_cancel;
+
+ nla_nest_end(skb, nest);
+ return 0;
+
+err_cancel:
+ nla_nest_cancel(skb, nest);
+ return -EMSGSIZE;
+}
+
static int tsinfo_fill_reply(struct sk_buff *skb,
const struct ethnl_req_info *req_base,
const struct ethnl_reply_data *reply_base)
@@ -119,6 +166,9 @@ static int tsinfo_fill_reply(struct sk_buff *skb,
if (ts_info->phc_index >= 0 &&
nla_put_u32(skb, ETHTOOL_A_TSINFO_PHC_INDEX, ts_info->phc_index))
return -EMSGSIZE;
+ if (req_base->flags & ETHTOOL_FLAG_STATS &&
+ tsinfo_put_stats(skb, &data->stats))
+ return -EMSGSIZE;

return 0;
}
--
2.42.0


2024-03-09 08:45:49

by Rahul Rameshbabu

[permalink] [raw]
Subject: [PATCH RFC v2 3/6] net/mlx5e: Introduce timestamps statistic counter for Tx DMA layer

Count number of transmitted packets that were hardware timestamped at the
device DMA layer.

Signed-off-by: Rahul Rameshbabu <[email protected]>
---
.../device_drivers/ethernet/mellanox/mlx5/counters.rst | 5 +++++
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 2 ++
drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 6 ++++--
4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index 5464cd9e2694..fed821ef9b09 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -300,6 +300,11 @@ the software port.
in the beginning of the queue. This is a normal condition.
- Informative

+ * - `tx[i]_timestamps`
+ - Transmitted packets that were hardware timestamped at the device's DMA
+ layer.
+ - Informative
+
* - `tx[i]_added_vlan_packets`
- The number of packets sent where vlan tag insertion was offloaded to the
hardware.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 7e63d7c88894..bc31196d348a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -2046,6 +2046,7 @@ static const struct counter_desc sq_stats_desc[] = {
{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, csum_partial_inner) },
{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, added_vlan_packets) },
{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, nop) },
+ { MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, timestamps) },
{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, mpwqe_blks) },
{ MLX5E_DECLARE_TX_STAT(struct mlx5e_sq_stats, mpwqe_pkts) },
#ifdef CONFIG_MLX5_EN_TLS
@@ -2198,6 +2199,7 @@ static const struct counter_desc qos_sq_stats_desc[] = {
{ MLX5E_DECLARE_QOS_TX_STAT(struct mlx5e_sq_stats, csum_partial_inner) },
{ MLX5E_DECLARE_QOS_TX_STAT(struct mlx5e_sq_stats, added_vlan_packets) },
{ MLX5E_DECLARE_QOS_TX_STAT(struct mlx5e_sq_stats, nop) },
+ { MLX5E_DECLARE_QOS_TX_STAT(struct mlx5e_sq_stats, timestamps) },
{ MLX5E_DECLARE_QOS_TX_STAT(struct mlx5e_sq_stats, mpwqe_blks) },
{ MLX5E_DECLARE_QOS_TX_STAT(struct mlx5e_sq_stats, mpwqe_pkts) },
#ifdef CONFIG_MLX5_EN_TLS
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 03f6265d3ed5..3c634c5fd420 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -429,6 +429,7 @@ struct mlx5e_sq_stats {
u64 stopped;
u64 dropped;
u64 recover;
+ u64 timestamps;
/* dirtied @completion */
u64 cqes ____cacheline_aligned_in_smp;
u64 wake;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 5c166d9d2dca..5acba323246e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -748,11 +748,13 @@ static void mlx5e_consume_skb(struct mlx5e_txqsq *sq, struct sk_buff *skb,
u64 ts = get_cqe_ts(cqe);

hwts.hwtstamp = mlx5e_cqe_ts_to_ns(sq->ptp_cyc2time, sq->clock, ts);
- if (sq->ptpsq)
+ if (sq->ptpsq) {
mlx5e_skb_cb_hwtstamp_handler(skb, MLX5E_SKB_CB_CQE_HWTSTAMP,
hwts.hwtstamp, sq->ptpsq->cq_stats);
- else
+ } else {
skb_tstamp_tx(skb, &hwts);
+ sq->stats->timestamps++;
+ }
}

napi_consume_skb(skb, napi_budget);
--
2.42.0


2024-03-09 08:46:12

by Rahul Rameshbabu

[permalink] [raw]
Subject: [PATCH RFC v2 2/6] net/mlx5e: Introduce lost_cqe statistic counter for PTP Tx port timestamping CQ

Track the number of times a CQE was expected to not be delivered on PTP Tx
port timestamping CQ. A CQE is expected to not be delivered if a certain
amount of time passes since the corresponding CQE containing the DMA
timestamp information has arrived. Increment the late_cqe counter when such
a CQE does manage to be delivered to the CQ.

Signed-off-by: Rahul Rameshbabu <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
---
.../device_drivers/ethernet/mellanox/mlx5/counters.rst | 6 ++++++
drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 1 +
4 files changed, 9 insertions(+)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index f69ee1ebee01..5464cd9e2694 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -702,6 +702,12 @@ the software port.
the device typically ensures not posting the CQE.
- Error

+ * - `ptp_cq[i]_lost_cqe`
+ - Number of times a CQE is expected to not be delivered on the PTP
+ timestamping CQE by the device due to a time delta elapsing. If such a
+ CQE is somehow delivered, `ptp_cq[i]_late_cqe` is incremented.
+ - Error
+
.. [#ring_global] The corresponding ring and global counters do not share the
same name (i.e. do not follow the common naming scheme).

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
index fd4ef6431142..1dd4bf7f7dbe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
@@ -169,6 +169,7 @@ static void mlx5e_ptpsq_mark_ts_cqes_undelivered(struct mlx5e_ptpsq *ptpsq,
WARN_ON_ONCE(!pos->inuse);
pos->inuse = false;
list_del(&pos->entry);
+ ptpsq->cq_stats->lost_cqe++;
}
spin_unlock(&cqe_list->tracker_list_lock);
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 4b96ad657145..7e63d7c88894 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -2158,6 +2158,7 @@ static const struct counter_desc ptp_cq_stats_desc[] = {
{ MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, abort) },
{ MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, abort_abs_diff_ns) },
{ MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, late_cqe) },
+ { MLX5E_DECLARE_PTP_CQ_STAT(struct mlx5e_ptp_cq_stats, lost_cqe) },
};

static const struct counter_desc ptp_rq_stats_desc[] = {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 12b3607afecd..03f6265d3ed5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -461,6 +461,7 @@ struct mlx5e_ptp_cq_stats {
u64 abort;
u64 abort_abs_diff_ns;
u64 late_cqe;
+ u64 lost_cqe;
};

struct mlx5e_rep_stats {
--
2.42.0


2024-03-09 08:46:16

by Rahul Rameshbabu

[permalink] [raw]
Subject: [PATCH RFC v2 4/6] net/mlx5e: Implement ethtool hardware timestamping statistics

Feed driver statistics counters related to hardware timestamping to
standardized ethtool hardware timestamping statistics group.

Signed-off-by: Rahul Rameshbabu <[email protected]>
---
.../ethernet/mellanox/mlx5/core/en_ethtool.c | 9 ++++
.../ethernet/mellanox/mlx5/core/en_stats.c | 45 +++++++++++++++++++
.../ethernet/mellanox/mlx5/core/en_stats.h | 2 +
3 files changed, 56 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index cc51ce16df14..d3b77054c30a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -2381,6 +2381,14 @@ static void mlx5e_get_rmon_stats(struct net_device *netdev,
mlx5e_stats_rmon_get(priv, rmon_stats, ranges);
}

+static void mlx5e_get_ts_stats(struct net_device *netdev,
+ struct ethtool_ts_stats *ts_stats)
+{
+ struct mlx5e_priv *priv = netdev_priv(netdev);
+
+ mlx5e_stats_ts_get(priv, ts_stats);
+}
+
const struct ethtool_ops mlx5e_ethtool_ops = {
.cap_rss_ctx_supported = true,
.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
@@ -2430,5 +2438,6 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
.get_eth_mac_stats = mlx5e_get_eth_mac_stats,
.get_eth_ctrl_stats = mlx5e_get_eth_ctrl_stats,
.get_rmon_stats = mlx5e_get_rmon_stats,
+ .get_ts_stats = mlx5e_get_ts_stats,
.get_link_ext_stats = mlx5e_get_link_ext_stats
};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index bc31196d348a..465c1423528f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -1155,6 +1155,51 @@ void mlx5e_stats_rmon_get(struct mlx5e_priv *priv,
*ranges = mlx5e_rmon_ranges;
}

+void mlx5e_stats_ts_get(struct mlx5e_priv *priv,
+ struct ethtool_ts_stats *ts_stats)
+{
+ int i, j;
+
+ mutex_lock(&priv->state_lock);
+
+ if (priv->tx_ptp_opened) {
+ struct mlx5e_ptp *ptp = priv->channels.ptp;
+
+ ts_stats->pkts = 0;
+ ts_stats->err = 0;
+ ts_stats->lost = 0;
+
+ /* Aggregate stats across all TCs */
+ for (i = 0; i < ptp->num_tc; i++) {
+ struct mlx5e_ptp_cq_stats *stats =
+ ptp->ptpsq[i].cq_stats;
+
+ ts_stats->pkts += stats->cqe;
+ ts_stats->err += stats->abort + stats->err_cqe +
+ stats->late_cqe;
+ ts_stats->lost += stats->lost_cqe;
+ }
+ } else {
+ /* DMA layer will always successfully timestamp packets. Other
+ * counters do not make sense for this layer.
+ */
+ ts_stats->pkts = 0;
+
+ /* Aggregate stats across all SQs */
+ for (j = 0; j < priv->channels.num; j++) {
+ struct mlx5e_channel *c = priv->channels.c[j];
+
+ for (i = 0; i < c->num_tc; i++) {
+ struct mlx5e_sq_stats *stats = c->sq[i].stats;
+
+ ts_stats->pkts += stats->timestamps;
+ }
+ }
+ }
+
+ mutex_unlock(&priv->state_lock);
+}
+
#define PPORT_PHY_STATISTICAL_OFF(c) \
MLX5_BYTE_OFF(ppcnt_reg, \
counter_set.phys_layer_statistical_cntrs.c##_high)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 3c634c5fd420..7b3e6cf1229a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -126,6 +126,8 @@ void mlx5e_stats_eth_ctrl_get(struct mlx5e_priv *priv,
void mlx5e_stats_rmon_get(struct mlx5e_priv *priv,
struct ethtool_rmon_stats *rmon,
const struct ethtool_rmon_hist_range **ranges);
+void mlx5e_stats_ts_get(struct mlx5e_priv *priv,
+ struct ethtool_ts_stats *ts_stats);
void mlx5e_get_link_ext_stats(struct net_device *dev,
struct ethtool_link_ext_stats *stats);

--
2.42.0


2024-03-09 08:46:43

by Rahul Rameshbabu

[permalink] [raw]
Subject: [PATCH RFC v2 5/6] tools: ynl: ethtool.py: Make tool invokable from any CWD

ethtool.py depends on yml files in a specific location of the linux kernel
tree. Using relative lookup for those files means that ethtool.py would
need to be run under tools/net/ynl/. Lookup needed yml files without
depending on the current working directory that ethtool.py is invoked from.

Signed-off-by: Rahul Rameshbabu <[email protected]>
---
tools/net/ynl/ethtool.py | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/net/ynl/ethtool.py b/tools/net/ynl/ethtool.py
index 6c9f7e31250c..44ba3ba58ed9 100755
--- a/tools/net/ynl/ethtool.py
+++ b/tools/net/ynl/ethtool.py
@@ -6,6 +6,7 @@ import json
import pprint
import sys
import re
+import os

from lib import YnlFamily

@@ -152,8 +153,11 @@ def main():
global args
args = parser.parse_args()

- spec = '../../../Documentation/netlink/specs/ethtool.yaml'
- schema = '../../../Documentation/netlink/genetlink-legacy.yaml'
+ script_abs_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
+ spec = os.path.join(script_abs_dir,
+ '../../../Documentation/netlink/specs/ethtool.yaml')
+ schema = os.path.join(script_abs_dir,
+ '../../../Documentation/netlink/genetlink-legacy.yaml')

ynl = YnlFamily(spec, schema)

--
2.42.0


2024-03-09 08:46:48

by Rahul Rameshbabu

[permalink] [raw]
Subject: [PATCH RFC v2 6/6] tools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get operation

Print the nested stats attribute containing timestamping statistics when
the --show-time-stamping flag is used.

[root@binary-eater-vm-01 linux-ethtool-ts]# ./tools/net/ynl/ethtool.py --show-time-stamping mlx5_1
Time stamping parameters for mlx5_1:
Capabilities:
hardware-transmit
hardware-receive
hardware-raw-clock
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
off
on
Hardware Receive Filter Modes:
none
all
Statistics:
tx-pkts: 8
tx-lost: 0
tx-err: 0

Signed-off-by: Rahul Rameshbabu <[email protected]>
---
tools/net/ynl/ethtool.py | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/net/ynl/ethtool.py b/tools/net/ynl/ethtool.py
index 44ba3ba58ed9..193399e7fbd1 100755
--- a/tools/net/ynl/ethtool.py
+++ b/tools/net/ynl/ethtool.py
@@ -324,7 +324,13 @@ def main():
return

if args.show_time_stamping:
- tsinfo = dumpit(ynl, args, 'tsinfo-get')
+ req = {
+ 'header': {
+ 'flags': 1 << 2,
+ },
+ }
+
+ tsinfo = dumpit(ynl, args, 'tsinfo-get', req)

print(f'Time stamping parameters for {args.device}:')

@@ -338,6 +344,9 @@ def main():

print('Hardware Receive Filter Modes:')
[print(f'\t{v}') for v in bits_to_dict(tsinfo['rx-filters'])]
+
+ print('Statistics:')
+ [print(f'\t{k}: {v}') for k, v in tsinfo['stats'].items()]
return

print(f'Settings for {args.device}:')
--
2.42.0


2024-03-11 12:44:02

by Kory Maincent

[permalink] [raw]
Subject: Re: [PATCH RFC v2 5/6] tools: ynl: ethtool.py: Make tool invokable from any CWD

On Sat, 9 Mar 2024 00:44:39 -0800
Rahul Rameshbabu <[email protected]> wrote:

> ethtool.py depends on yml files in a specific location of the linux kernel
> tree. Using relative lookup for those files means that ethtool.py would
> need to be run under tools/net/ynl/. Lookup needed yml files without
> depending on the current working directory that ethtool.py is invoked from.
>
> Signed-off-by: Rahul Rameshbabu <[email protected]>

You should send this patch standalone as it has no relevance to the series
subject.

--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com

2024-03-12 23:54:27

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware timestamping statistics

On Sat, 9 Mar 2024 00:44:35 -0800 Rahul Rameshbabu wrote:
> Multiple network devices that support hardware timestamping appear to have
> common behavior with regards to timestamp handling. Implement common Tx
> hardware timestamping statistics in a tx_stats struct_group. Common Rx
> hardware timestamping statistics can subsequently be implemented in a
> rx_stats struct_group for ethtool_ts_stats.

> Documentation/netlink/specs/ethtool.yaml | 20 +++++++++
> include/linux/ethtool.h | 21 ++++++++++
> include/uapi/linux/ethtool_netlink.h | 15 +++++++
> net/ethtool/tsinfo.c | 52 +++++++++++++++++++++++-
> 4 files changed, 107 insertions(+), 1 deletion(-)

Feels like we should mention the new stats somehow in
Documentation/networking/ethtool-netlink.rst

> diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
> index 197208f419dc..f99b003c78c0 100644
> --- a/Documentation/netlink/specs/ethtool.yaml
> +++ b/Documentation/netlink/specs/ethtool.yaml
> @@ -559,6 +559,21 @@ attribute-sets:
> -
> name: tx-lpi-timer
> type: u32
> + -
> + name: ts-stat
> + attributes:
> + -
> + name: pad
> + type: pad

You can remove the pad entry, and...

> + -
> + name: tx-pkts
> + type: u64

..use the uint type for the stats

> + -
> + name: tx-lost
> + type: u64
> + -
> + name: tx-err
> + type: u64
> -
> name: tsinfo
> attributes:

> +/**
> + * struct ethtool_ts_stats - HW timestamping statistics
> + * @tx_stats: struct group for TX HW timestamping
> + * @pkts: Number of packets successfully timestamped by the queried
> + * layer.
> + * @lost: Number of packet timestamps that failed to get applied on a
> + * packet by the queried layer.
> + * @err: Number of timestamping errors that occurred on the queried
> + * layer.

the kdocs for @lost and @err are not very clear

> + * @get_ts_stats: Query the device hardware timestamping statistics.

Let's copy & paste the "Drivers must not zero" text in here?
People seem to miss that requirement anyway, but at least we'll
have something to point at in review.

> +enum {
> + ETHTOOL_A_TS_STAT_UNSPEC,
> + ETHTOOL_A_TS_STAT_PAD,
> +
> + ETHTOOL_A_TS_STAT_TX_PKT, /* array, u64 */
> + ETHTOOL_A_TS_STAT_TX_LOST, /* array, u64 */
> + ETHTOOL_A_TS_STAT_TX_ERR, /* array, u64 */

I don't think these are arrays.

> +
> + /* add new constants above here */
> + __ETHTOOL_A_TS_STAT_CNT,
> + ETHTOOL_A_TS_STAT_MAX = (__ETHTOOL_A_TS_STAT_CNT - 1)
> +
> +};

2024-03-12 23:55:56

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH RFC v2 6/6] tools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get operation

On Sat, 9 Mar 2024 00:44:40 -0800 Rahul Rameshbabu wrote:
> + req = {
> + 'header': {
> + 'flags': 1 << 2,
> + },
> + }

You should be able to use the name of the flag instead of the raw value.
Jiri added that recently, IIRC.

2024-03-14 00:25:56

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC v2 6/6] tools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get operation


On Tue, 12 Mar, 2024 16:55:44 -0700 Jakub Kicinski <[email protected]> wrote:
> On Sat, 9 Mar 2024 00:44:40 -0800 Rahul Rameshbabu wrote:
>> + req = {
>> + 'header': {
>> + 'flags': 1 << 2,
>> + },
>> + }
>
> You should be able to use the name of the flag instead of the raw value.
> Jiri added that recently, IIRC.

I think this is for 'flag' type attributes. Not for the "header" flags
for the ethtool request, so I believe this cannot be done here, since
the header flags are a u32 type, not a flag type.

https://lore.kernel.org/netdev/[email protected]/

-
name: header
attributes:
-
name: dev-index
type: u32
-
name: dev-name
type: string
-
name: flags
type: u32

vs

-
name: bitset-bit
attributes:
-
name: index
type: u32
-
name: name
type: string
-
name: value
type: flag

So I believe Jiri's change applies for the latter, not the former (could
be wrong here).

--
Thanks,

Rahul Rameshbabu


2024-03-14 00:28:50

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware timestamping statistics

On Tue, 12 Mar, 2024 16:53:46 -0700 Jakub Kicinski <[email protected]> wrote:
> On Sat, 9 Mar 2024 00:44:35 -0800 Rahul Rameshbabu wrote:
>> Multiple network devices that support hardware timestamping appear to have
>> common behavior with regards to timestamp handling. Implement common Tx
>> hardware timestamping statistics in a tx_stats struct_group. Common Rx
>> hardware timestamping statistics can subsequently be implemented in a
>> rx_stats struct_group for ethtool_ts_stats.
>
>> Documentation/netlink/specs/ethtool.yaml | 20 +++++++++
>> include/linux/ethtool.h | 21 ++++++++++
>> include/uapi/linux/ethtool_netlink.h | 15 +++++++
>> net/ethtool/tsinfo.c | 52 +++++++++++++++++++++++-
>> 4 files changed, 107 insertions(+), 1 deletion(-)
>
> Feels like we should mention the new stats somehow in
> Documentation/networking/ethtool-netlink.rst
>
>> diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
>> index 197208f419dc..f99b003c78c0 100644
>> --- a/Documentation/netlink/specs/ethtool.yaml
>> +++ b/Documentation/netlink/specs/ethtool.yaml
>> @@ -559,6 +559,21 @@ attribute-sets:
>> -
>> name: tx-lpi-timer
>> type: u32
>> + -
>> + name: ts-stat
>> + attributes:
>> + -
>> + name: pad
>> + type: pad
>
> You can remove the pad entry, and...
>
>> + -
>> + name: tx-pkts
>> + type: u64
>
> ...use the uint type for the stats
>
>> + -
>> + name: tx-lost
>> + type: u64
>> + -
>> + name: tx-err
>> + type: u64
>> -
>> name: tsinfo
>> attributes:
>
>> +/**
>> + * struct ethtool_ts_stats - HW timestamping statistics
>> + * @tx_stats: struct group for TX HW timestamping
>> + * @pkts: Number of packets successfully timestamped by the queried
>> + * layer.
>> + * @lost: Number of packet timestamps that failed to get applied on a
>> + * packet by the queried layer.
>> + * @err: Number of timestamping errors that occurred on the queried
>> + * layer.
>
> the kdocs for @lost and @err are not very clear

Makes sense given that these are stale and should have been changed
between my v1 and v2. Here is my new attempt at this.

/**
* struct ethtool_ts_stats - HW timestamping statistics
* @tx_stats: struct group for TX HW timestamping
* @pkts: Number of packets successfully timestamped by the hardware.
* @lost: Number of hardware timestamping requests where the timestamping
* information from the hardware never arrived for submission with
* the skb.
* @err: Number of arbitrary timestamp generation error events that the
* hardware encountered.
*/


>
>> + * @get_ts_stats: Query the device hardware timestamping statistics.
>
> Let's copy & paste the "Drivers must not zero" text in here?
> People seem to miss that requirement anyway, but at least we'll
> have something to point at in review.
>
>> +enum {
>> + ETHTOOL_A_TS_STAT_UNSPEC,
>> + ETHTOOL_A_TS_STAT_PAD,
>> +
>> + ETHTOOL_A_TS_STAT_TX_PKT, /* array, u64 */
>> + ETHTOOL_A_TS_STAT_TX_LOST, /* array, u64 */
>> + ETHTOOL_A_TS_STAT_TX_ERR, /* array, u64 */
>
> I don't think these are arrays.

Sorry, copy-paste mistake from FEC stats....

>
>> +
>> + /* add new constants above here */
>> + __ETHTOOL_A_TS_STAT_CNT,
>> + ETHTOOL_A_TS_STAT_MAX = (__ETHTOOL_A_TS_STAT_CNT - 1)
>> +
>> +};

Agreed with all the comments here. Have accounted for them in my patches
for my next submission (aiming for non-RFC once the net-next window is
open again).

--
Thanks,

Rahul Rameshbabu

2024-03-14 00:41:39

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware timestamping statistics

On Wed, 13 Mar 2024 17:26:11 -0700 Rahul Rameshbabu wrote:
> Makes sense given that these are stale and should have been changed
> between my v1 and v2. Here is my new attempt at this.
>
> /**
> * struct ethtool_ts_stats - HW timestamping statistics
> * @tx_stats: struct group for TX HW timestamping
> * @pkts: Number of packets successfully timestamped by the hardware.
> * @lost: Number of hardware timestamping requests where the timestamping
> * information from the hardware never arrived for submission with
> * the skb.

Should we give some guidance to drivers which "ignore" time stamping
requests if they used up all the "slots"? Even if just temporary until
they are fixed? Maybe we can add after all the fields something like:

For drivers which ignore further timestamping requests when there are
too many in flight, the ignored requests are currently not counted by
any of the statistics.

Adjust as needed, I basing this on the vague memory that this was
the conclusion in the last discussion :)

> * @err: Number of arbitrary timestamp generation error events that the
> * hardware encountered.
> */

Just to be crystal clear let's also call out that @lost is not included
in @err.

2024-03-14 00:47:22

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH RFC v2 6/6] tools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get operation

On Wed, 13 Mar 2024 17:22:50 -0700 Rahul Rameshbabu wrote:
> I think this is for 'flag' type attributes. Not for the "header" flags
> for the ethtool request, so I believe this cannot be done here, since
> the header flags are a u32 type, not a flag type.
>
> https://lore.kernel.org/netdev/[email protected]/
>
> -
> name: header
> attributes:
> -
> name: dev-index
> type: u32
> -
> name: dev-name
> type: string
> -
> name: flags
> type: u32
>
> vs
>
> -
> name: bitset-bit
> attributes:
> -
> name: index
> type: u32
> -
> name: name
> type: string
> -
> name: value
> type: flag
>
> So I believe Jiri's change applies for the latter, not the former (could
> be wrong here).

Ah, we're missing the enum definition and linking :S

I mean:

diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
index 197208f419dc..e1626c94d93b 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -16,6 +16,10 @@ doc: Partial family for Ethtool Netlink.
name: stringset
type: enum
entries: []
+ -
+ name: header-flags
+ type: flags
+ entries: [ compact-bitset, omit-reply, stats ]

attribute-sets:
-
@@ -30,6 +34,7 @@ doc: Partial family for Ethtool Netlink.
-
name: flags
type: u32
+ enum: header-flags

-
name: bitset-bit

See if that works and feel free to post it with my suggested-by

2024-03-14 00:54:00

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware timestamping statistics


On Wed, 13 Mar, 2024 17:41:07 -0700 Jakub Kicinski <[email protected]> wrote:
> On Wed, 13 Mar 2024 17:26:11 -0700 Rahul Rameshbabu wrote:
>> Makes sense given that these are stale and should have been changed
>> between my v1 and v2. Here is my new attempt at this.
>>
>> /**
>> * struct ethtool_ts_stats - HW timestamping statistics
>> * @tx_stats: struct group for TX HW timestamping
>> * @pkts: Number of packets successfully timestamped by the hardware.
>> * @lost: Number of hardware timestamping requests where the timestamping
>> * information from the hardware never arrived for submission with
>> * the skb.
>
> Should we give some guidance to drivers which "ignore" time stamping
> requests if they used up all the "slots"? Even if just temporary until
> they are fixed? Maybe we can add after all the fields something like:
>
> For drivers which ignore further timestamping requests when there are
> too many in flight, the ignored requests are currently not counted by
> any of the statistics.

I was actually thinking it would be better to merge them into the error
counter temporarily. Reason being is that in the case Intel notices that
their slots are full, they just drop traffic from my understanding
today. If the error counters increment in that situation, it helps with
the debug to a degree. EBUSY is an error in general.

>
> Adjust as needed, I basing this on the vague memory that this was
> the conclusion in the last discussion :)
>
>> * @err: Number of arbitrary timestamp generation error events that the
>> * hardware encountered.
>> */
>
> Just to be crystal clear let's also call out that @lost is not included
> in @err.

Ack.

--
Thanks,

Rahul Rameshbabu

2024-03-14 01:40:29

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware timestamping statistics

On Wed, 13 Mar 2024 17:50:39 -0700 Rahul Rameshbabu wrote:
> > Should we give some guidance to drivers which "ignore" time stamping
> > requests if they used up all the "slots"? Even if just temporary until
> > they are fixed? Maybe we can add after all the fields something like:
> >
> > For drivers which ignore further timestamping requests when there are
> > too many in flight, the ignored requests are currently not counted by
> > any of the statistics.
>
> I was actually thinking it would be better to merge them into the error
> counter temporarily. Reason being is that in the case Intel notices that
> their slots are full, they just drop traffic from my understanding
> today. If the error counters increment in that situation, it helps with
> the debug to a degree. EBUSY is an error in general.

That works, too, let's recommend it (FWIW no preference whether
in the entry for @err or somewhere separately in the kdoc).

2024-03-14 04:24:09

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware timestamping statistics


On Wed, 13 Mar, 2024 18:40:17 -0700 Jakub Kicinski <[email protected]> wrote:
> On Wed, 13 Mar 2024 17:50:39 -0700 Rahul Rameshbabu wrote:
>> > Should we give some guidance to drivers which "ignore" time stamping
>> > requests if they used up all the "slots"? Even if just temporary until
>> > they are fixed? Maybe we can add after all the fields something like:
>> >
>> > For drivers which ignore further timestamping requests when there are
>> > too many in flight, the ignored requests are currently not counted by
>> > any of the statistics.
>>
>> I was actually thinking it would be better to merge them into the error
>> counter temporarily. Reason being is that in the case Intel notices that
>> their slots are full, they just drop traffic from my understanding
>> today. If the error counters increment in that situation, it helps with
>> the debug to a degree. EBUSY is an error in general.
>
> That works, too, let's recommend it (FWIW no preference whether
> in the entry for @err or somewhere separately in the kdoc).

/**
* struct ethtool_ts_stats - HW timestamping statistics
* @tx_stats: struct group for TX HW timestamping
* @pkts: Number of packets successfully timestamped by the hardware.
* @lost: Number of hardware timestamping requests where the timestamping
* information from the hardware never arrived for submission with
* the skb.
* @err: Number of arbitrary timestamp generation error events that the
* hardware encountered, exclusive of @lost statistics. Cases such
* as resource exhaustion, unavailability, firmware errors, and
* detected illogical timestamp values not submitted with the skb
* are inclusive to this counter.
*/

Here is my current draft for the error counter documentation.

--
Thanks,

Rahul Rameshbabu

2024-03-14 06:25:49

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC v2 6/6] tools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get operation


On Wed, 13 Mar, 2024 17:47:07 -0700 Jakub Kicinski <[email protected]> wrote:
>
> Ah, we're missing the enum definition and linking :S
>
> I mean:
>
> diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
> index 197208f419dc..e1626c94d93b 100644
> --- a/Documentation/netlink/specs/ethtool.yaml
> +++ b/Documentation/netlink/specs/ethtool.yaml
> @@ -16,6 +16,10 @@ doc: Partial family for Ethtool Netlink.
> name: stringset
> type: enum
> entries: []
> + -
> + name: header-flags
> + type: flags
> + entries: [ compact-bitset, omit-reply, stats ]

I am running into some strange issues with this even after regenerating
ynl generated/ by running make under tools/net/ynl/.

Traceback (most recent call last):
File "/root/linux-ethtool-ts/./tools/net/ynl/ethtool.py", line 437, in <module>
main()
File "/root/linux-ethtool-ts/./tools/net/ynl/ethtool.py", line 333, in main
tsinfo = dumpit(ynl, args, 'tsinfo-get', req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/linux-ethtool-ts/./tools/net/ynl/ethtool.py", line 91, in dumpit
reply = ynl.dump(op_name, { 'header': {} } | extra)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/linux-ethtool-ts/tools/net/ynl/lib/ynl.py", line 873, in dump
return self._op(method, vals, [], dump=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/linux-ethtool-ts/tools/net/ynl/lib/ynl.py", line 824, in _op
msg += self._add_attr(op.attr_set.name, name, value, search_attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/linux-ethtool-ts/tools/net/ynl/lib/ynl.py", line 459, in _add_attr
attr_payload += self._add_attr(attr['nested-attributes'],
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/linux-ethtool-ts/tools/net/ynl/lib/ynl.py", line 481, in _add_attr
attr_payload = format.pack(int(value))
^^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'dict'

What's your expectation for how the request structure would look like? I
have tried the following.

if args.show_time_stamping:
req = {
'header': {
'flags': 'stats',
},
}

if args.show_time_stamping:
req = {
'header': {
'flags': {
'stats': True,
},
},
}

I tried looking through the lib/ynl.py code, but I did not understand
how the 'flags' type was specifically handled.

>
> attribute-sets:
> -
> @@ -30,6 +34,7 @@ doc: Partial family for Ethtool Netlink.
> -
> name: flags
> type: u32
> + enum: header-flags
>
> -
> name: bitset-bit
>
> See if that works and feel free to post it with my suggested-by

--
Thanks,

Rahul Rameshbabu

2024-03-14 17:07:41

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware timestamping statistics


On Tue, 12 Mar, 2024 16:53:46 -0700 Jakub Kicinski <[email protected]> wrote:
> On Sat, 9 Mar 2024 00:44:35 -0800 Rahul Rameshbabu wrote:
>> Multiple network devices that support hardware timestamping appear to have
>> common behavior with regards to timestamp handling. Implement common Tx
>> hardware timestamping statistics in a tx_stats struct_group. Common Rx
>> hardware timestamping statistics can subsequently be implemented in a
>> rx_stats struct_group for ethtool_ts_stats.
>
>> Documentation/netlink/specs/ethtool.yaml | 20 +++++++++
>> include/linux/ethtool.h | 21 ++++++++++
>> include/uapi/linux/ethtool_netlink.h | 15 +++++++
>> net/ethtool/tsinfo.c | 52 +++++++++++++++++++++++-
>> 4 files changed, 107 insertions(+), 1 deletion(-)
>
> Feels like we should mention the new stats somehow in
> Documentation/networking/ethtool-netlink.rst
>
>> diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
>> index 197208f419dc..f99b003c78c0 100644
>> --- a/Documentation/netlink/specs/ethtool.yaml
>> +++ b/Documentation/netlink/specs/ethtool.yaml
>> @@ -559,6 +559,21 @@ attribute-sets:
>> -
>> name: tx-lpi-timer
>> type: u32
>> + -
>> + name: ts-stat
>> + attributes:
>> + -
>> + name: pad
>> + type: pad
>
> You can remove the pad entry, and...
>

You need the pad to match with ETHTOOL_A_TS_STAT_PAD (which similar to
other ethtool stats currently defined). Otherwise, you run into the
following.... mm-stat and fec-stat are good examples.

[root@binary-eater-vm-01 linux-ethtool-ts]# ./tools/net/ynl/ethtool.py --show-time-stamping mlx5_1
Traceback (most recent call last):
File "/root/linux-ethtool-ts/tools/net/ynl/lib/ynl.py", line 598, in _decode
attr_spec = attr_space.attrs_by_val[attr.type]
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^
KeyError: 4

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/linux-ethtool-ts/./tools/net/ynl/ethtool.py", line 437, in <module>
main()
File "/root/linux-ethtool-ts/./tools/net/ynl/ethtool.py", line 333, in main
tsinfo = dumpit(ynl, args, 'tsinfo-get', req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/linux-ethtool-ts/./tools/net/ynl/ethtool.py", line 91, in dumpit
reply = ynl.dump(op_name, { 'header': {} } | extra)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/linux-ethtool-ts/tools/net/ynl/lib/ynl.py", line 873, in dump
return self._op(method, vals, [], dump=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/linux-ethtool-ts/tools/net/ynl/lib/ynl.py", line 858, in _op
rsp_msg = self._decode(decoded.raw_attrs, op.attr_set.name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/linux-ethtool-ts/tools/net/ynl/lib/ynl.py", line 607, in _decode
subdict = self._decode(NlAttrs(attr.raw), attr_spec['nested-attributes'], search_attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/linux-ethtool-ts/tools/net/ynl/lib/ynl.py", line 601, in _decode
raise Exception(f"Space '{space}' has no attribute with value '{attr.type}'")
Exception: Space 'ts-stat' has no attribute with value '4'

>> +enum {
>> + ETHTOOL_A_TS_STAT_UNSPEC,
>> + ETHTOOL_A_TS_STAT_PAD,
>> +
>> + ETHTOOL_A_TS_STAT_TX_PKT, /* array, u64 */
>> + ETHTOOL_A_TS_STAT_TX_LOST, /* array, u64 */
>> + ETHTOOL_A_TS_STAT_TX_ERR, /* array, u64 */
>
> I don't think these are arrays.
>
>> +
>> + /* add new constants above here */
>> + __ETHTOOL_A_TS_STAT_CNT,
>> + ETHTOOL_A_TS_STAT_MAX = (__ETHTOOL_A_TS_STAT_CNT - 1)
>> +
>> +};

--
Thanks,

Rahul Rameshbabu

2024-03-14 17:50:32

by Jacob Keller

[permalink] [raw]
Subject: RE: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware timestamping statistics



> -----Original Message-----
> From: Jakub Kicinski <[email protected]>
> Sent: Wednesday, March 13, 2024 6:40 PM
> To: Rahul Rameshbabu <[email protected]>
> Cc: Zaki, Ahmed <[email protected]>; Lobakin, Aleksander
> <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; Keller, Jacob E
> <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> Greenwalt, Paul <[email protected]>; Kitszel, Przemyslaw
> <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Drewek, Wojciech
> <[email protected]>
> Subject: Re: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware
> timestamping statistics
>
> On Wed, 13 Mar 2024 17:50:39 -0700 Rahul Rameshbabu wrote:
> > > Should we give some guidance to drivers which "ignore" time stamping
> > > requests if they used up all the "slots"? Even if just temporary until
> > > they are fixed? Maybe we can add after all the fields something like:
> > >
> > > For drivers which ignore further timestamping requests when there are
> > > too many in flight, the ignored requests are currently not counted by
> > > any of the statistics.
> >
> > I was actually thinking it would be better to merge them into the error
> > counter temporarily. Reason being is that in the case Intel notices that
> > their slots are full, they just drop traffic from my understanding
> > today. If the error counters increment in that situation, it helps with
> > the debug to a degree. EBUSY is an error in general.
>
> That works, too, let's recommend it (FWIW no preference whether
> in the entry for @err or somewhere separately in the kdoc).

We don't drop traffic, we send the packets just fine.. We just never report a timestamp for them, since we don't program the hardware to capture that timestamp.

2024-03-14 18:06:09

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH RFC v2 6/6] tools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get operation

On Wed, 13 Mar 2024 23:07:22 -0700 Rahul Rameshbabu wrote:
> What's your expectation for how the request structure would look like? I
> have tried the following.
>
> if args.show_time_stamping:
> req = {
> 'header': {
> 'flags': 'stats',
> },
> }
>
> if args.show_time_stamping:
> req = {
> 'header': {
> 'flags': {
> 'stats': True,
> },
> },
> }
>
> I tried looking through the lib/ynl.py code, but I did not understand
> how the 'flags' type was specifically handled.

Rebased on latest net-next?

I used this:

/tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ethtool.yaml \
--do fec-get --json '{"header":{"dev-index": 2, "flags": "stats"}}'

2024-03-14 18:06:57

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware timestamping statistics

On Thu, 14 Mar 2024 10:01:49 -0700 Rahul Rameshbabu wrote:
> You need the pad to match with ETHTOOL_A_TS_STAT_PAD (which similar to
> other ethtool stats currently defined). Otherwise, you run into the
> following.... mm-stat and fec-stat are good examples.

I don't understand.
Are you sure you changef the kernel to use uint, rebuilt and
there is no ETHTOOL_A_TS_STAT_PAD anywhere, anymore?

2024-03-14 18:43:05

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC v2 6/6] tools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get operation


On Thu, 14 Mar, 2024 11:05:53 -0700 Jakub Kicinski <[email protected]> wrote:
> On Wed, 13 Mar 2024 23:07:22 -0700 Rahul Rameshbabu wrote:
>> What's your expectation for how the request structure would look like? I
>> have tried the following.
>>
>> if args.show_time_stamping:
>> req = {
>> 'header': {
>> 'flags': 'stats',
>> },
>> }
>>
>> if args.show_time_stamping:
>> req = {
>> 'header': {
>> 'flags': {
>> 'stats': True,
>> },
>> },
>> }
>>
>> I tried looking through the lib/ynl.py code, but I did not understand
>> how the 'flags' type was specifically handled.
>
> Rebased on latest net-next?

Thanks, I was using our internal branch that mixes net-next with some of
our mlx5 changes for verification purposes, and it was not rebased to
latest net-next. Your suggestion works as expected with latest net-next
and will be integrated into the series (with the suggested-by trailer
added).

>
> I used this:
>
> ./tools/net/ynl/cli.py \
> --spec Documentation/netlink/specs/ethtool.yaml \
> --do fec-get --json '{"header":{"dev-index": 2, "flags": "stats"}}'

--
Thanks,

Rahul Rameshbabu

2024-03-14 18:50:14

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware timestamping statistics


On Thu, 14 Mar, 2024 17:50:10 +0000 "Keller, Jacob E" <[email protected]> wrote:
>> -----Original Message-----
>> From: Jakub Kicinski <[email protected]>
>> Sent: Wednesday, March 13, 2024 6:40 PM
>> To: Rahul Rameshbabu <[email protected]>
>> Cc: Zaki, Ahmed <[email protected]>; Lobakin, Aleksander
>> <[email protected]>; [email protected];
>> [email protected]; [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected]; Keller, Jacob E
>> <[email protected]>; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected]; linux-
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> Greenwalt, Paul <[email protected]>; Kitszel, Przemyslaw
>> <[email protected]>; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; Drewek, Wojciech
>> <[email protected]>
>> Subject: Re: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware
>> timestamping statistics
>>
>> On Wed, 13 Mar 2024 17:50:39 -0700 Rahul Rameshbabu wrote:
>> > > Should we give some guidance to drivers which "ignore" time stamping
>> > > requests if they used up all the "slots"? Even if just temporary until
>> > > they are fixed? Maybe we can add after all the fields something like:
>> > >
>> > > For drivers which ignore further timestamping requests when there are
>> > > too many in flight, the ignored requests are currently not counted by
>> > > any of the statistics.
>> >
>> > I was actually thinking it would be better to merge them into the error
>> > counter temporarily. Reason being is that in the case Intel notices that
>> > their slots are full, they just drop traffic from my understanding
>> > today. If the error counters increment in that situation, it helps with
>> > the debug to a degree. EBUSY is an error in general.
>>
>> That works, too, let's recommend it (FWIW no preference whether
>> in the entry for @err or somewhere separately in the kdoc).
>
> We don't drop traffic, we send the packets just fine.. We just never report a
> timestamp for them, since we don't program the hardware to capture that
> timestamp.

My actual kdoc comments are better now, but I should have been better
with the language I used here in the email. In my head, I was thinking
more about the case of not submitting HW timestamp information when
sending out the packet rather than dropping the packet entirely (I would
say that is still a timestamping error case).

--
Thanks,

Rahul Rameshbabu

2024-03-14 18:52:15

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware timestamping statistics


On Thu, 14 Mar, 2024 10:59:43 -0700 Jakub Kicinski <[email protected]> wrote:
> On Thu, 14 Mar 2024 10:01:49 -0700 Rahul Rameshbabu wrote:
>> You need the pad to match with ETHTOOL_A_TS_STAT_PAD (which similar to
>> other ethtool stats currently defined). Otherwise, you run into the
>> following.... mm-stat and fec-stat are good examples.
>
> I don't understand.
> Are you sure you changef the kernel to use uint, rebuilt and
> there is no ETHTOOL_A_TS_STAT_PAD anywhere, anymore?

Sorry, I was not as clear as I could have been with my last reply. I did
leave ETHTOOL_A_TS_STAT_PAD in when I tested (intentionally). I was
trying to mimic other ethtool stats implementations, but you are saying
that in general there is no need for this padding (which I agree with)
and I can remove that unnecessary offset. It'll be different from the
existing stats, but I am ok with that.

--
Thanks,

Rahul Rameshbabu


2024-03-14 19:07:12

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware timestamping statistics

On Thu, 14 Mar 2024 11:43:07 -0700 Rahul Rameshbabu wrote:
> > I don't understand.
> > Are you sure you changef the kernel to use uint, rebuilt and
> > there is no ETHTOOL_A_TS_STAT_PAD anywhere, anymore?
>
> Sorry, I was not as clear as I could have been with my last reply. I did
> leave ETHTOOL_A_TS_STAT_PAD in when I tested (intentionally). I was
> trying to mimic other ethtool stats implementations, but you are saying
> that in general there is no need for this padding (which I agree with)
> and I can remove that unnecessary offset. It'll be different from the
> existing stats, but I am ok with that.

Yes, the small divergence is fine - uint is pretty recent addition.

2024-03-14 20:04:53

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH RFC v2 6/6] tools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get operation

On Wed, 13 Mar 2024 17:47:07 -0700 Jakub Kicinski wrote:
> + -
> + name: header-flags
> + type: flags
> + entries: [ compact-bitset, omit-reply, stats ]

Ah. Throw in an empty:

enum-name:

into this, or change the uAPI so that an enum called
ethtool_header_flags actually exists :)

Otherwise make -C tools/net/ynl will break

2024-03-14 20:06:58

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC v2 6/6] tools: ynl: ethtool.py: Output timestamping statistics from tsinfo-get operation


On Thu, 14 Mar, 2024 13:04:36 -0700 Jakub Kicinski <[email protected]> wrote:
> On Wed, 13 Mar 2024 17:47:07 -0700 Jakub Kicinski wrote:
>> + -
>> + name: header-flags
>> + type: flags
>> + entries: [ compact-bitset, omit-reply, stats ]
>
> Ah. Throw in an empty:
>
> enum-name:
>
> into this, or change the uAPI so that an enum called
> ethtool_header_flags actually exists :)
>
> Otherwise make -C tools/net/ynl will break

Already handled this in my patches by making the enum for
ethtool_header_flags. Thanks for double checking though.

--
Rahul Rameshbabu

2024-03-14 20:17:48

by Rahul Rameshbabu

[permalink] [raw]
Subject: Re: [PATCH RFC v2 1/6] ethtool: add interface to read Tx hardware timestamping statistics


On Thu, 14 Mar, 2024 12:06:47 -0700 Jakub Kicinski <[email protected]> wrote:
> On Thu, 14 Mar 2024 11:43:07 -0700 Rahul Rameshbabu wrote:
>> > I don't understand.
>> > Are you sure you changef the kernel to use uint, rebuilt and
>> > there is no ETHTOOL_A_TS_STAT_PAD anywhere, anymore?
>>
>> Sorry, I was not as clear as I could have been with my last reply. I did
>> leave ETHTOOL_A_TS_STAT_PAD in when I tested (intentionally). I was
>> trying to mimic other ethtool stats implementations, but you are saying
>> that in general there is no need for this padding (which I agree with)
>> and I can remove that unnecessary offset. It'll be different from the
>> existing stats, but I am ok with that.
>
> Yes, the small divergence is fine - uint is pretty recent addition.

Yes, the uint suggestion is great since we no longer need to depend on
the padding. Thanks for the feedback. Accounted for in my patches.

--
Thanks,

Rahul Rameshbabu