2024-06-04 00:46:51

by Joe Damato

Subject: [RFC net-next v4 0/2] mlx5: Add netdev-genl queue stats

Greetings:

Welcome to rfc v4.

Significant rewrite from v3 and hopefully getting closer to correctly
exporting per queue stats from mlx5. Please see changelog below for
detailed changes, especially regarding PTP stats.

Note that my NIC does not seem to support PTP and I couldn't get the
mlnx-tools mlnx_qos script to work, so I was only able to test the
following cases:

- device up at boot
- adjusting queue counts
- device down (e.g. ip link set dev eth4 down)

Please see the commit message of patch 2/2 for more details on output
and test cases.

v3 thread: https://lore.kernel.org/lkml/[email protected]/T/

Thanks,
Joe

rfcv3 -> rfcv4:
- Patch 1/2 now creates a mapping (priv->txq2sq_stats) which maps txq
indices to sq_stats structures so stats can be accessed directly.
    This mapping is kept up to date alongside txq2sq.

- Patch 2/2:
- All mutex_lock/unlock on state_lock has been dropped.
- mlx5e_get_queue_stats_rx now uses ASSERT_RTNL() and has a special
    case for PTP. If PTP was ever opened, is currently open, and the
channel index matches, stats for PTP RX are output.
- mlx5e_get_queue_stats_tx rewritten to use priv->txq2sq_stats. No
corner cases are needed here because any txq idx (passed in as i)
    will have an up-to-date mapping in priv->txq2sq_stats.
- mlx5e_get_base_stats:
- in the RX case:
- iterates from [params.num_channels, stats_nch) collecting
stats.
- if ptp was ever opened but is currently closed, add the PTP
stats.
- in the TX case:
- handle 2 cases:
- the channel is available, so sum only the unavailable TCs
[mlx5e_get_dcb_num_tc, max_opened_tc).
- the channel is unavailable, so sum all TCs [0, max_opened_tc).
- if ptp was ever opened but is currently closed, add the PTP
sq stats.

v2 -> rfcv3:
- Added patch 1/2 which creates some helpers for computing the txq_ix
and ch_ix/tc_ix.

- Patch 2/2 modified in several ways:
- Fixed variable declarations in mlx5e_get_queue_stats_rx to be at
the start of the function.
- mlx5e_get_queue_stats_tx rewritten to access sq stats directly by
using the helpers added in the previous patch.
- mlx5e_get_base_stats modified in several ways:
- Took the state_lock when accessing priv->channels.
- For the base RX stats, code was simplified to call
mlx5e_get_queue_stats_rx instead of repeating the same code.
- For the base TX stats, I attempted to implement what I think
Tariq suggested in the previous thread:
- for available channels, only unavailable TC stats are summed
- for unavailable channels, all stats for TCs up to
max_opened_tc are summed.

v1 -> v2:
- Essentially a full rewrite after comments from Jakub, Tariq, and
Zhu.

Joe Damato (2):
net/mlx5e: Add txq to sq stats mapping
net/mlx5e: Add per queue netdev-genl stats

drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 +
.../net/ethernet/mellanox/mlx5/core/en/qos.c | 13 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 149 +++++++++++++++++-
3 files changed, 161 insertions(+), 3 deletions(-)

--
2.25.1



2024-06-04 00:47:32

by Joe Damato

Subject: [RFC net-next v4 2/2] net/mlx5e: Add per queue netdev-genl stats

./cli.py --spec netlink/specs/netdev.yaml \
--dump qstats-get --json '{"scope": "queue"}'

...snip

{'ifindex': 7,
'queue-id': 62,
'queue-type': 'rx',
'rx-alloc-fail': 0,
'rx-bytes': 105965251,
'rx-packets': 179790},
{'ifindex': 7,
'queue-id': 0,
'queue-type': 'tx',
'tx-bytes': 9402665,
'tx-packets': 17551},

...snip

Also tested with the script tools/testing/selftests/drivers/net/stats.py
in several scenarios to ensure stats tallying was correct:

- on boot (default queue counts)
- adjusting queue count up or down (ethtool -L eth0 combined ...)

The tools/testing/selftests/drivers/net/stats.py script brings the device up,
so to test with the device down, I did the following:

$ ip link show eth4
7: eth4: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN [..snip..]
[..snip..]

$ cat /proc/net/dev | grep eth4
eth4: 235710489 434811 [..snip rx..] 2878744 21227 [..snip tx..]

$ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
--dump qstats-get --json '{"ifindex": 7}'
[{'ifindex': 7,
'rx-alloc-fail': 0,
'rx-bytes': 235710489,
'rx-packets': 434811,
'tx-bytes': 2878744,
'tx-packets': 21227}]

Note that the values in /proc/net/dev match the output of cli.py for the same
device, even while the device is down.

Note that while the device is down, per queue stats output nothing
(because the device is down, there are no queues):

$ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
--dump qstats-get --json '{"scope": "queue", "ifindex": 7}'
[]

Signed-off-by: Joe Damato <[email protected]>
---
.../net/ethernet/mellanox/mlx5/core/en_main.c | 138 ++++++++++++++++++
1 file changed, 138 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d03fd1c98eb6..76d64bbcf250 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -39,6 +39,7 @@
#include <linux/debugfs.h>
#include <linux/if_bridge.h>
#include <linux/filter.h>
+#include <net/netdev_queues.h>
#include <net/page_pool/types.h>
#include <net/pkt_sched.h>
#include <net/xdp_sock_drv.h>
@@ -5279,6 +5280,142 @@ static bool mlx5e_tunnel_any_tx_proto_supported(struct mlx5_core_dev *mdev)
return (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev));
}

+static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
+ struct netdev_queue_stats_rx *stats)
+{
+ struct mlx5e_priv *priv = netdev_priv(dev);
+ struct mlx5e_channel_stats *channel_stats;
+ struct mlx5e_rq_stats *xskrq_stats;
+ struct mlx5e_rq_stats *rq_stats;
+
+ ASSERT_RTNL();
+ if (mlx5e_is_uplink_rep(priv))
+ return;
+
+ /* ptp was ever opened, is currently open, and channel index matches i
+ * then export stats
+ */
+ if (priv->rx_ptp_opened && priv->channels.ptp) {
+ if (test_bit(MLX5E_PTP_STATE_RX, priv->channels.ptp->state) &&
+ priv->channels.ptp->rq.ix == i) {
+ rq_stats = &priv->ptp_stats.rq;
+ stats->packets = rq_stats->packets;
+ stats->bytes = rq_stats->bytes;
+ stats->alloc_fail = rq_stats->buff_alloc_err;
+ return;
+ }
+ }
+
+ channel_stats = priv->channel_stats[i];
+ xskrq_stats = &channel_stats->xskrq;
+ rq_stats = &channel_stats->rq;
+
+ stats->packets = rq_stats->packets + xskrq_stats->packets;
+ stats->bytes = rq_stats->bytes + xskrq_stats->bytes;
+ stats->alloc_fail = rq_stats->buff_alloc_err +
+ xskrq_stats->buff_alloc_err;
+}
+
+static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
+ struct netdev_queue_stats_tx *stats)
+{
+ struct mlx5e_priv *priv = netdev_priv(dev);
+ struct mlx5e_sq_stats *sq_stats;
+
+ ASSERT_RTNL();
+ /* no special case needed for ptp htb etc since txq2sq_stats is kept up
+ * to date for active sq_stats, otherwise get_base_stats takes care of
+ * inactive sqs.
+ */
+ sq_stats = priv->txq2sq_stats[i];
+ stats->packets = sq_stats->packets;
+ stats->bytes = sq_stats->bytes;
+}
+
+static void mlx5e_get_base_stats(struct net_device *dev,
+ struct netdev_queue_stats_rx *rx,
+ struct netdev_queue_stats_tx *tx)
+{
+ struct mlx5e_priv *priv = netdev_priv(dev);
+ int i, tc;
+
+ ASSERT_RTNL();
+ if (!mlx5e_is_uplink_rep(priv)) {
+ rx->packets = 0;
+ rx->bytes = 0;
+ rx->alloc_fail = 0;
+
+ for (i = priv->channels.params.num_channels; i < priv->stats_nch; i++) {
+ struct netdev_queue_stats_rx rx_i = {0};
+
+ mlx5e_get_queue_stats_rx(dev, i, &rx_i);
+
+ rx->packets += rx_i.packets;
+ rx->bytes += rx_i.bytes;
+ rx->alloc_fail += rx_i.alloc_fail;
+ }
+
+ if (priv->rx_ptp_opened) {
+ /* if PTP was opened, but is not currently open, then
+ * report the stats here. otherwise,
+ * mlx5e_get_queue_stats_rx will get it
+ */
+ if (priv->channels.ptp &&
+ !test_bit(MLX5E_PTP_STATE_RX, priv->channels.ptp->state)) {
+ struct mlx5e_rq_stats *rq_stats = &priv->ptp_stats.rq;
+
+ rx->packets += rq_stats->packets;
+ rx->bytes += rq_stats->bytes;
+ }
+ }
+ }
+
+ tx->packets = 0;
+ tx->bytes = 0;
+
+ for (i = 0; i < priv->stats_nch; i++) {
+ struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
+
+ /* while iterating through all channels [0, stats_nch], there
+ * are two cases to handle:
+ *
+ * 1. the channel is available, so sum only the unavailable TCs
+ * [mlx5e_get_dcb_num_tc, max_opened_tc).
+ *
+ * 2. the channel is unavailable, so sum all TCs [0, max_opened_tc).
+ */
+ if (i < priv->channels.params.num_channels)
+ tc = mlx5e_get_dcb_num_tc(&priv->channels.params);
+ else
+ tc = 0;
+
+ for (; tc < priv->max_opened_tc; tc++) {
+ struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[tc];
+
+ tx->packets += sq_stats->packets;
+ tx->bytes += sq_stats->bytes;
+ }
+ }
+
+ if (priv->tx_ptp_opened) {
+ /* only report PTP TCs if it was opened but is now closed */
+ if (priv->channels.ptp && !test_bit(MLX5E_PTP_STATE_TX, priv->channels.ptp->state)) {
+ for (tc = 0; tc < priv->channels.ptp->num_tc; tc++) {
+ struct mlx5e_sq_stats *sq_stats = &priv->ptp_stats.sq[tc];
+
+ tx->packets += sq_stats->packets;
+ tx->bytes += sq_stats->bytes;
+ }
+ }
+ }
+}
+
+static const struct netdev_stat_ops mlx5e_stat_ops = {
+ .get_queue_stats_rx = mlx5e_get_queue_stats_rx,
+ .get_queue_stats_tx = mlx5e_get_queue_stats_tx,
+ .get_base_stats = mlx5e_get_base_stats,
+};
+
static void mlx5e_build_nic_netdev(struct net_device *netdev)
{
struct mlx5e_priv *priv = netdev_priv(netdev);
@@ -5296,6 +5433,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)

netdev->watchdog_timeo = 15 * HZ;

+ netdev->stat_ops = &mlx5e_stat_ops;
netdev->ethtool_ops = &mlx5e_ethtool_ops;

netdev->vlan_features |= NETIF_F_SG;
--
2.25.1


2024-06-04 00:48:59

by Joe Damato

Subject: [RFC net-next v4 1/2] net/mlx5e: Add txq to sq stats mapping

mlx5 currently maps txqs to an sq via priv->txq2sq. It is useful to map
txqs to sq_stats, as well, for direct access to stats.

Add priv->txq2sq_stats and insert mappings. The mappings will be used
next to tabulate stats information.

Signed-off-by: Joe Damato <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 ++
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c | 13 +++++++++++--
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 ++++++++++-
3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e85fb71bf0b4..4ae3eee3940c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -885,6 +885,8 @@ struct mlx5e_priv {
/* priv data path fields - start */
struct mlx5e_selq selq;
struct mlx5e_txqsq **txq2sq;
+ struct mlx5e_sq_stats **txq2sq_stats;
+
#ifdef CONFIG_MLX5_CORE_EN_DCB
struct mlx5e_dcbx_dp dcbx_dp;
#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
index 6743806b8480..e89272a5d036 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
@@ -170,6 +170,7 @@ int mlx5e_activate_qos_sq(void *data, u16 node_qid, u32 hw_id)
mlx5e_tx_disable_queue(netdev_get_tx_queue(priv->netdev, qid));

priv->txq2sq[qid] = sq;
+ priv->txq2sq_stats[qid] = sq->stats;

/* Make the change to txq2sq visible before the queue is started.
* As mlx5e_xmit runs under a spinlock, there is an implicit ACQUIRE,
@@ -186,6 +187,7 @@ int mlx5e_activate_qos_sq(void *data, u16 node_qid, u32 hw_id)
void mlx5e_deactivate_qos_sq(struct mlx5e_priv *priv, u16 qid)
{
struct mlx5e_txqsq *sq;
+ u16 mlx5e_qid;

sq = mlx5e_get_qos_sq(priv, qid);
if (!sq) /* Handle the case when the SQ failed to open. */
@@ -194,7 +196,10 @@ void mlx5e_deactivate_qos_sq(struct mlx5e_priv *priv, u16 qid)
qos_dbg(sq->mdev, "Deactivate QoS SQ qid %u\n", qid);
mlx5e_deactivate_txqsq(sq);

- priv->txq2sq[mlx5e_qid_from_qos(&priv->channels, qid)] = NULL;
+ mlx5e_qid = mlx5e_qid_from_qos(&priv->channels, qid);
+
+ priv->txq2sq[mlx5e_qid] = NULL;
+ priv->txq2sq_stats[mlx5e_qid] = NULL;

/* Make the change to txq2sq visible before the queue is started again.
* As mlx5e_xmit runs under a spinlock, there is an implicit ACQUIRE,
@@ -325,6 +330,7 @@ void mlx5e_qos_deactivate_queues(struct mlx5e_channel *c)
{
struct mlx5e_params *params = &c->priv->channels.params;
struct mlx5e_txqsq __rcu **qos_sqs;
+ u16 mlx5e_qid;
int i;

qos_sqs = mlx5e_state_dereference(c->priv, c->qos_sqs);
@@ -342,8 +348,11 @@ void mlx5e_qos_deactivate_queues(struct mlx5e_channel *c)
qos_dbg(c->mdev, "Deactivate QoS SQ qid %u\n", qid);
mlx5e_deactivate_txqsq(sq);

+ mlx5e_qid = mlx5e_qid_from_qos(&c->priv->channels, qid);
+
/* The queue is disabled, no synchronization with datapath is needed. */
- c->priv->txq2sq[mlx5e_qid_from_qos(&c->priv->channels, qid)] = NULL;
+ c->priv->txq2sq[mlx5e_qid] = NULL;
+ c->priv->txq2sq_stats[mlx5e_qid] = NULL;
}
}

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index c53c99dde558..d03fd1c98eb6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3111,6 +3111,7 @@ static void mlx5e_build_txq_maps(struct mlx5e_priv *priv)
struct mlx5e_txqsq *sq = &c->sq[tc];

priv->txq2sq[sq->txq_ix] = sq;
+ priv->txq2sq_stats[sq->txq_ix] = sq->stats;
}
}

@@ -3125,6 +3126,7 @@ static void mlx5e_build_txq_maps(struct mlx5e_priv *priv)
struct mlx5e_txqsq *sq = &c->ptpsq[tc].txqsq;

priv->txq2sq[sq->txq_ix] = sq;
+ priv->txq2sq_stats[sq->txq_ix] = sq->stats;
}

out:
@@ -5824,9 +5826,13 @@ int mlx5e_priv_init(struct mlx5e_priv *priv,
if (!priv->txq2sq)
goto err_destroy_workqueue;

+ priv->txq2sq_stats = kcalloc_node(num_txqs, sizeof(*priv->txq2sq_stats), GFP_KERNEL, node);
+ if (!priv->txq2sq_stats)
+ goto err_free_txq2sq;
+
priv->tx_rates = kcalloc_node(num_txqs, sizeof(*priv->tx_rates), GFP_KERNEL, node);
if (!priv->tx_rates)
- goto err_free_txq2sq;
+ goto err_free_txq2sq_stats;

priv->channel_stats =
kcalloc_node(nch, sizeof(*priv->channel_stats), GFP_KERNEL, node);
@@ -5837,6 +5843,8 @@ int mlx5e_priv_init(struct mlx5e_priv *priv,

err_free_tx_rates:
kfree(priv->tx_rates);
+err_free_txq2sq_stats:
+ kfree(priv->txq2sq_stats);
err_free_txq2sq:
kfree(priv->txq2sq);
err_destroy_workqueue:
@@ -5860,6 +5868,7 @@ void mlx5e_priv_cleanup(struct mlx5e_priv *priv)
kvfree(priv->channel_stats[i]);
kfree(priv->channel_stats);
kfree(priv->tx_rates);
+ kfree(priv->txq2sq_stats);
kfree(priv->txq2sq);
destroy_workqueue(priv->wq);
mlx5e_selq_cleanup(&priv->selq);
--
2.25.1


2024-06-06 16:25:21

by Jakub Kicinski

Subject: Re: [RFC net-next v4 0/2] mlx5: Add netdev-genl queue stats

On Tue, 4 Jun 2024 00:46:24 +0000 Joe Damato wrote:
> Significant rewrite from v3 and hopefully getting closer to correctly
> exporting per queue stats from mlx5. Please see changelog below for
> detailed changes, especially regarding PTP stats.
>
> Note that my NIC does not seem to support PTP and I couldn't get the
> mlnx-tools mlnx_qos script to work, so I was only able to test the
> following cases:
>
> - device up at booot
> - adjusting queue counts
> - device down (e.g. ip link set dev eth4 down)
>
> Please see the commit message of patch 2/2 for more details on output
> and test cases.

nvidia, please review this

It's less than 200 lines of code, and every time Joe posts a new
version he has to wait a week to get feedback.

2024-06-06 20:12:20

by Tariq Toukan

Subject: Re: [RFC net-next v4 2/2] net/mlx5e: Add per queue netdev-genl stats



On 04/06/2024 3:46, Joe Damato wrote:
> ./cli.py --spec netlink/specs/netdev.yaml \
> --dump qstats-get --json '{"scope": "queue"}'
>
> ...snip
>
> {'ifindex': 7,
> 'queue-id': 62,
> 'queue-type': 'rx',
> 'rx-alloc-fail': 0,
> 'rx-bytes': 105965251,
> 'rx-packets': 179790},
> {'ifindex': 7,
> 'queue-id': 0,
> 'queue-type': 'tx',
> 'tx-bytes': 9402665,
> 'tx-packets': 17551},
>
> ...snip
>
> Also tested with the script tools/testing/selftests/drivers/net/stats.py
> in several scenarios to ensure stats tallying was correct:
>
> - on boot (default queue counts)
> - adjusting queue count up or down (ethtool -L eth0 combined ...)
>
> The tools/testing/selftests/drivers/net/stats.py brings the device up,
> so to test with the device down, I did the following:
>
> $ ip link show eth4
> 7: eth4: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN [..snip..]
> [..snip..]
>
> $ cat /proc/net/dev | grep eth4
> eth4: 235710489 434811 [..snip rx..] 2878744 21227 [..snip tx..]
>
> $ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
> --dump qstats-get --json '{"ifindex": 7}'
> [{'ifindex': 7,
> 'rx-alloc-fail': 0,
> 'rx-bytes': 235710489,
> 'rx-packets': 434811,
> 'tx-bytes': 2878744,
> 'tx-packets': 21227}]
>
> Compare the values in /proc/net/dev match the output of cli for the same
> device, even while the device is down.
>
> Note that while the device is down, per queue stats output nothing
> (because the device is down there are no queues):

This part is not true anymore.

>
> $ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
> --dump qstats-get --json '{"scope": "queue", "ifindex": 7}'
> []
>
> Signed-off-by: Joe Damato <[email protected]>
> ---
> .../net/ethernet/mellanox/mlx5/core/en_main.c | 138 ++++++++++++++++++
> 1 file changed, 138 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index d03fd1c98eb6..76d64bbcf250 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -39,6 +39,7 @@
> #include <linux/debugfs.h>
> #include <linux/if_bridge.h>
> #include <linux/filter.h>
> +#include <net/netdev_queues.h>
> #include <net/page_pool/types.h>
> #include <net/pkt_sched.h>
> #include <net/xdp_sock_drv.h>
> @@ -5279,6 +5280,142 @@ static bool mlx5e_tunnel_any_tx_proto_supported(struct mlx5_core_dev *mdev)
> return (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev));
> }
>
> +static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
> + struct netdev_queue_stats_rx *stats)
> +{
> + struct mlx5e_priv *priv = netdev_priv(dev);
> + struct mlx5e_channel_stats *channel_stats;
> + struct mlx5e_rq_stats *xskrq_stats;
> + struct mlx5e_rq_stats *rq_stats;
> +
> + ASSERT_RTNL();
> + if (mlx5e_is_uplink_rep(priv))
> + return;
> +
> + /* ptp was ever opened, is currently open, and channel index matches i
> + * then export stats
> + */
> + if (priv->rx_ptp_opened && priv->channels.ptp) {
> + if (test_bit(MLX5E_PTP_STATE_RX, priv->channels.ptp->state) &&
> + priv->channels.ptp->rq.ix == i) {

PTP RQ index is naively assigned to zero:
rq->ix = MLX5E_PTP_CHANNEL_IX;

but this isn't to be used as the stats index.
Today, the PTP-RQ has no matching rxq at the kernel level,
i.e. turning PTP-RQ on won't add a kernel-level RXQ to the
real_num_rx_queues.
Maybe we should.
If not, and the current state is kept, the best we can do is let the
PTP-RQ naively contribute its queue-stat to channel 0.

> + rq_stats = &priv->ptp_stats.rq;
> + stats->packets = rq_stats->packets;
> + stats->bytes = rq_stats->bytes;
> + stats->alloc_fail = rq_stats->buff_alloc_err;
> + return;
> + }
> + }
> +
> + channel_stats = priv->channel_stats[i];
> + xskrq_stats = &channel_stats->xskrq;
> + rq_stats = &channel_stats->rq;
> +
> + stats->packets = rq_stats->packets + xskrq_stats->packets;
> + stats->bytes = rq_stats->bytes + xskrq_stats->bytes;
> + stats->alloc_fail = rq_stats->buff_alloc_err +
> + xskrq_stats->buff_alloc_err;
> +}
> +
> +static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
> + struct netdev_queue_stats_tx *stats)
> +{
> + struct mlx5e_priv *priv = netdev_priv(dev);
> + struct mlx5e_sq_stats *sq_stats;
> +
> + ASSERT_RTNL();
> + /* no special case needed for ptp htb etc since txq2sq_stats is kept up
> + * to date for active sq_stats, otherwise get_base_stats takes care of
> + * inactive sqs.
> + */
> + sq_stats = priv->txq2sq_stats[i];
> + stats->packets = sq_stats->packets;
> + stats->bytes = sq_stats->bytes;
> +}
> +
> +static void mlx5e_get_base_stats(struct net_device *dev,
> + struct netdev_queue_stats_rx *rx,
> + struct netdev_queue_stats_tx *tx)
> +{
> + struct mlx5e_priv *priv = netdev_priv(dev);
> + int i, tc;
> +
> + ASSERT_RTNL();
> + if (!mlx5e_is_uplink_rep(priv)) {
> + rx->packets = 0;
> + rx->bytes = 0;
> + rx->alloc_fail = 0;
> +
> + for (i = priv->channels.params.num_channels; i < priv->stats_nch; i++) {
> + struct netdev_queue_stats_rx rx_i = {0};
> +
> + mlx5e_get_queue_stats_rx(dev, i, &rx_i);
> +
> + rx->packets += rx_i.packets;
> + rx->bytes += rx_i.bytes;
> + rx->alloc_fail += rx_i.alloc_fail;
> + }
> +
> + if (priv->rx_ptp_opened) {
> + /* if PTP was opened, but is not currently open, then
> + * report the stats here. otherwise,
> + * mlx5e_get_queue_stats_rx will get it
> + */

We shouldn't care if the RQ is currently open. The stats are always there.
This applies to all RQs and SQs.

> + if (priv->channels.ptp &&
> + !test_bit(MLX5E_PTP_STATE_RX, priv->channels.ptp->state)) {
> + struct mlx5e_rq_stats *rq_stats = &priv->ptp_stats.rq;
> +
> + rx->packets += rq_stats->packets;
> + rx->bytes += rq_stats->bytes;
> + }
> + }
> + }
> +
> + tx->packets = 0;
> + tx->bytes = 0;
> +
> + for (i = 0; i < priv->stats_nch; i++) {
> + struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> +
> + /* while iterating through all channels [0, stats_nch], there
> + * are two cases to handle:
> + *
> + * 1. the channel is available, so sum only the unavailable TCs
> + * [mlx5e_get_dcb_num_tc, max_opened_tc).
> + *
> + * 2. the channel is unavailable, so sum all TCs [0, max_opened_tc).
> + */

Even if the channel is not available, mlx5e_get_queue_stats_tx()
accesses and returns its stats.
Here you need to only cover SQs that have no mapping in range
[0..real_num_tx_queues - 1].

> + if (i < priv->channels.params.num_channels)
> + tc = mlx5e_get_dcb_num_tc(&priv->channels.params);
> + else
> + tc = 0;
> +
> + for (; tc < priv->max_opened_tc; tc++) {
> + struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[tc];
> +
> + tx->packets += sq_stats->packets;
> + tx->bytes += sq_stats->bytes;
> + }
> + }
> +
> + if (priv->tx_ptp_opened) {
> + /* only report PTP TCs if it was opened but is now closed */
> + if (priv->channels.ptp && !test_bit(MLX5E_PTP_STATE_TX, priv->channels.ptp->state)) {
> + for (tc = 0; tc < priv->channels.ptp->num_tc; tc++) {
> + struct mlx5e_sq_stats *sq_stats = &priv->ptp_stats.sq[tc];
> +
> + tx->packets += sq_stats->packets;
> + tx->bytes += sq_stats->bytes;
> + }
> + }
> + }
> +}
> +
> +static const struct netdev_stat_ops mlx5e_stat_ops = {
> + .get_queue_stats_rx = mlx5e_get_queue_stats_rx,
> + .get_queue_stats_tx = mlx5e_get_queue_stats_tx,
> + .get_base_stats = mlx5e_get_base_stats,
> +};
> +
> static void mlx5e_build_nic_netdev(struct net_device *netdev)
> {
> struct mlx5e_priv *priv = netdev_priv(netdev);
> @@ -5296,6 +5433,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
>
> netdev->watchdog_timeo = 15 * HZ;
>
> + netdev->stat_ops = &mlx5e_stat_ops;
> netdev->ethtool_ops = &mlx5e_ethtool_ops;
>
> netdev->vlan_features |= NETIF_F_SG;

2024-06-06 20:14:16

by Tariq Toukan

Subject: Re: [RFC net-next v4 1/2] net/mlx5e: Add txq to sq stats mapping



On 04/06/2024 3:46, Joe Damato wrote:
> mlx5 currently maps txqs to an sq via priv->txq2sq. It is useful to map
> txqs to sq_stats, as well, for direct access to stats.
>
> Add priv->txq2sq_stats and insert mappings. The mappings will be used
> next to tabulate stats information.
>
> Signed-off-by: Joe Damato <[email protected]>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 ++
> drivers/net/ethernet/mellanox/mlx5/core/en/qos.c | 13 +++++++++++--
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 ++++++++++-
> 3 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> index e85fb71bf0b4..4ae3eee3940c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> @@ -885,6 +885,8 @@ struct mlx5e_priv {
> /* priv data path fields - start */
> struct mlx5e_selq selq;
> struct mlx5e_txqsq **txq2sq;
> + struct mlx5e_sq_stats **txq2sq_stats;
> +
> #ifdef CONFIG_MLX5_CORE_EN_DCB
> struct mlx5e_dcbx_dp dcbx_dp;
> #endif
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
> index 6743806b8480..e89272a5d036 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
> @@ -170,6 +170,7 @@ int mlx5e_activate_qos_sq(void *data, u16 node_qid, u32 hw_id)
> mlx5e_tx_disable_queue(netdev_get_tx_queue(priv->netdev, qid));
>
> priv->txq2sq[qid] = sq;
> + priv->txq2sq_stats[qid] = sq->stats;
>
> /* Make the change to txq2sq visible before the queue is started.
> * As mlx5e_xmit runs under a spinlock, there is an implicit ACQUIRE,
> @@ -186,6 +187,7 @@ int mlx5e_activate_qos_sq(void *data, u16 node_qid, u32 hw_id)
> void mlx5e_deactivate_qos_sq(struct mlx5e_priv *priv, u16 qid)
> {
> struct mlx5e_txqsq *sq;
> + u16 mlx5e_qid;

Do not use mlx5 in variables names.

>
> sq = mlx5e_get_qos_sq(priv, qid);
> if (!sq) /* Handle the case when the SQ failed to open. */
> @@ -194,7 +196,10 @@ void mlx5e_deactivate_qos_sq(struct mlx5e_priv *priv, u16 qid)
> qos_dbg(sq->mdev, "Deactivate QoS SQ qid %u\n", qid);
> mlx5e_deactivate_txqsq(sq);
>
> - priv->txq2sq[mlx5e_qid_from_qos(&priv->channels, qid)] = NULL;
> + mlx5e_qid = mlx5e_qid_from_qos(&priv->channels, qid);
> +
> + priv->txq2sq[mlx5e_qid] = NULL;
> + priv->txq2sq_stats[mlx5e_qid] = NULL;
>
> /* Make the change to txq2sq visible before the queue is started again.
> * As mlx5e_xmit runs under a spinlock, there is an implicit ACQUIRE,
> @@ -325,6 +330,7 @@ void mlx5e_qos_deactivate_queues(struct mlx5e_channel *c)
> {
> struct mlx5e_params *params = &c->priv->channels.params;
> struct mlx5e_txqsq __rcu **qos_sqs;
> + u16 mlx5e_qid;
> int i;
>
> qos_sqs = mlx5e_state_dereference(c->priv, c->qos_sqs);
> @@ -342,8 +348,11 @@ void mlx5e_qos_deactivate_queues(struct mlx5e_channel *c)
> qos_dbg(c->mdev, "Deactivate QoS SQ qid %u\n", qid);
> mlx5e_deactivate_txqsq(sq);
>
> + mlx5e_qid = mlx5e_qid_from_qos(&c->priv->channels, qid);
> +
> /* The queue is disabled, no synchronization with datapath is needed. */
> - c->priv->txq2sq[mlx5e_qid_from_qos(&c->priv->channels, qid)] = NULL;
> + c->priv->txq2sq[mlx5e_qid] = NULL;
> + c->priv->txq2sq_stats[mlx5e_qid] = NULL;
> }
> }
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index c53c99dde558..d03fd1c98eb6 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -3111,6 +3111,7 @@ static void mlx5e_build_txq_maps(struct mlx5e_priv *priv)
> struct mlx5e_txqsq *sq = &c->sq[tc];
>
> priv->txq2sq[sq->txq_ix] = sq;
> + priv->txq2sq_stats[sq->txq_ix] = sq->stats;
> }
> }
>
> @@ -3125,6 +3126,7 @@ static void mlx5e_build_txq_maps(struct mlx5e_priv *priv)
> struct mlx5e_txqsq *sq = &c->ptpsq[tc].txqsq;
>
> priv->txq2sq[sq->txq_ix] = sq;
> + priv->txq2sq_stats[sq->txq_ix] = sq->stats;
> }
>
> out:
> @@ -5824,9 +5826,13 @@ int mlx5e_priv_init(struct mlx5e_priv *priv,
> if (!priv->txq2sq)
> goto err_destroy_workqueue;
>
> + priv->txq2sq_stats = kcalloc_node(num_txqs, sizeof(*priv->txq2sq_stats), GFP_KERNEL, node);
> + if (!priv->txq2sq_stats)
> + goto err_free_txq2sq;
> +
> priv->tx_rates = kcalloc_node(num_txqs, sizeof(*priv->tx_rates), GFP_KERNEL, node);
> if (!priv->tx_rates)
> - goto err_free_txq2sq;
> + goto err_free_txq2sq_stats;
>
> priv->channel_stats =
> kcalloc_node(nch, sizeof(*priv->channel_stats), GFP_KERNEL, node);
> @@ -5837,6 +5843,8 @@ int mlx5e_priv_init(struct mlx5e_priv *priv,
>
> err_free_tx_rates:
> kfree(priv->tx_rates);
> +err_free_txq2sq_stats:
> + kfree(priv->txq2sq_stats);
> err_free_txq2sq:
> kfree(priv->txq2sq);
> err_destroy_workqueue:
> @@ -5860,6 +5868,7 @@ void mlx5e_priv_cleanup(struct mlx5e_priv *priv)
> kvfree(priv->channel_stats[i]);
> kfree(priv->channel_stats);
> kfree(priv->tx_rates);
> + kfree(priv->txq2sq_stats);
> kfree(priv->txq2sq);
> destroy_workqueue(priv->wq);
> mlx5e_selq_cleanup(&priv->selq);

2024-06-06 21:55:03

by Joe Damato

Subject: Re: [RFC net-next v4 2/2] net/mlx5e: Add per queue netdev-genl stats

On Thu, Jun 06, 2024 at 11:11:57PM +0300, Tariq Toukan wrote:
>
>
> On 04/06/2024 3:46, Joe Damato wrote:
> > ./cli.py --spec netlink/specs/netdev.yaml \
> > --dump qstats-get --json '{"scope": "queue"}'
> >
> > ...snip
> >
> > {'ifindex': 7,
> > 'queue-id': 62,
> > 'queue-type': 'rx',
> > 'rx-alloc-fail': 0,
> > 'rx-bytes': 105965251,
> > 'rx-packets': 179790},
> > {'ifindex': 7,
> > 'queue-id': 0,
> > 'queue-type': 'tx',
> > 'tx-bytes': 9402665,
> > 'tx-packets': 17551},
> >
> > ...snip
> >
> > Also tested with the script tools/testing/selftests/drivers/net/stats.py
> > in several scenarios to ensure stats tallying was correct:
> >
> > - on boot (default queue counts)
> > - adjusting queue count up or down (ethtool -L eth0 combined ...)
> >
> > The tools/testing/selftests/drivers/net/stats.py brings the device up,
> > so to test with the device down, I did the following:
> >
> > $ ip link show eth4
> > 7: eth4: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN [..snip..]
> > [..snip..]
> >
> > $ cat /proc/net/dev | grep eth4
> > eth4: 235710489 434811 [..snip rx..] 2878744 21227 [..snip tx..]
> >
> > $ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
> > --dump qstats-get --json '{"ifindex": 7}'
> > [{'ifindex': 7,
> > 'rx-alloc-fail': 0,
> > 'rx-bytes': 235710489,
> > 'rx-packets': 434811,
> > 'tx-bytes': 2878744,
> > 'tx-packets': 21227}]
> >
> > Compare the values in /proc/net/dev match the output of cli for the same
> > device, even while the device is down.
> >
> > Note that while the device is down, per queue stats output nothing
> > (because the device is down there are no queues):
>
> This part is not true anymore.

It is true with this patch applied and running the command below.
Maybe I should have been more explicit that using cli.py outputs []
when scope = queue, which could be an internal cli.py thing, but
this is definitely true with this patch.

Did you test it and get different results?

> >
> > $ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
> > --dump qstats-get --json '{"scope": "queue", "ifindex": 7}'
> > []
> >
> > Signed-off-by: Joe Damato <[email protected]>
> > ---
> > .../net/ethernet/mellanox/mlx5/core/en_main.c | 138 ++++++++++++++++++
> > 1 file changed, 138 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index d03fd1c98eb6..76d64bbcf250 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -39,6 +39,7 @@
> > #include <linux/debugfs.h>
> > #include <linux/if_bridge.h>
> > #include <linux/filter.h>
> > +#include <net/netdev_queues.h>
> > #include <net/page_pool/types.h>
> > #include <net/pkt_sched.h>
> > #include <net/xdp_sock_drv.h>
> > @@ -5279,6 +5280,142 @@ static bool mlx5e_tunnel_any_tx_proto_supported(struct mlx5_core_dev *mdev)
> > return (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev));
> > }
> > +static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
> > + struct netdev_queue_stats_rx *stats)
> > +{
> > + struct mlx5e_priv *priv = netdev_priv(dev);
> > + struct mlx5e_channel_stats *channel_stats;
> > + struct mlx5e_rq_stats *xskrq_stats;
> > + struct mlx5e_rq_stats *rq_stats;
> > +
> > + ASSERT_RTNL();
> > + if (mlx5e_is_uplink_rep(priv))
> > + return;
> > +
> > + /* ptp was ever opened, is currently open, and channel index matches i
> > + * then export stats
> > + */
> > + if (priv->rx_ptp_opened && priv->channels.ptp) {
> > + if (test_bit(MLX5E_PTP_STATE_RX, priv->channels.ptp->state) &&
> > + priv->channels.ptp->rq.ix == i) {
>
> PTP RQ index is naively assigned to zero:
> rq->ix = MLX5E_PTP_CHANNEL_IX;
>
> but this isn't to be used as the stats index.
> Today, the PTP-RQ has no matcing rxq in the kernel level.
> i.e. turning PTP-RQ on won't add a kernel-level RXQ to the
> real_num_rx_queues.
> Maybe we better do.
> If not, and the current state is kept, the best we can do is let the PTP-RQ
> naively contribute its queue-stat to channel 0.

OK, it sounds like the easiest thing to do is just count PTP as
channel 0, so if i == 0, I'll add in the PTP stats.

But please see below regarding testing whether or not PTP is
actually enabled or not.

> > + rq_stats = &priv->ptp_stats.rq;
> > + stats->packets = rq_stats->packets;
> > + stats->bytes = rq_stats->bytes;
> > + stats->alloc_fail = rq_stats->buff_alloc_err;
> > + return;
> > + }
> > + }
> > +
> > + channel_stats = priv->channel_stats[i];
> > + xskrq_stats = &channel_stats->xskrq;
> > + rq_stats = &channel_stats->rq;
> > +
> > + stats->packets = rq_stats->packets + xskrq_stats->packets;
> > + stats->bytes = rq_stats->bytes + xskrq_stats->bytes;
> > + stats->alloc_fail = rq_stats->buff_alloc_err +
> > + xskrq_stats->buff_alloc_err;
> > +}
> > +
> > +static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
> > + struct netdev_queue_stats_tx *stats)
> > +{
> > + struct mlx5e_priv *priv = netdev_priv(dev);
> > + struct mlx5e_sq_stats *sq_stats;
> > +
> > + ASSERT_RTNL();
> > + /* no special case needed for ptp htb etc since txq2sq_stats is kept up
> > + * to date for active sq_stats, otherwise get_base_stats takes care of
> > + * inactive sqs.
> > + */
> > + sq_stats = priv->txq2sq_stats[i];
> > + stats->packets = sq_stats->packets;
> > + stats->bytes = sq_stats->bytes;
> > +}
> > +
> > +static void mlx5e_get_base_stats(struct net_device *dev,
> > + struct netdev_queue_stats_rx *rx,
> > + struct netdev_queue_stats_tx *tx)
> > +{
> > + struct mlx5e_priv *priv = netdev_priv(dev);
> > + int i, tc;
> > +
> > + ASSERT_RTNL();
> > + if (!mlx5e_is_uplink_rep(priv)) {
> > + rx->packets = 0;
> > + rx->bytes = 0;
> > + rx->alloc_fail = 0;
> > +
> > + for (i = priv->channels.params.num_channels; i < priv->stats_nch; i++) {
> > + struct netdev_queue_stats_rx rx_i = {0};
> > +
> > + mlx5e_get_queue_stats_rx(dev, i, &rx_i);
> > +
> > + rx->packets += rx_i.packets;
> > + rx->bytes += rx_i.bytes;
> > + rx->alloc_fail += rx_i.alloc_fail;
> > + }
> > +
> > + if (priv->rx_ptp_opened) {
> > + /* if PTP was opened, but is not currently open, then
> > + * report the stats here. otherwise,
> > + * mlx5e_get_queue_stats_rx will get it
> > + */
>
> We shouldn't care if the RQ is currently open. The stats are always there.
> This applies to all RQs and SQs.

The idea was that if PTP was opened before but the bit was set to
signal that it is closed now, it would be reported in base because it's
inactive -- like other inactive RQs.

If it was opened before --AND-- the open bit was set, it'll be
reported with channel 0 stats in mlx5e_get_queue_stats_rx.

That makes sense to me, but it sounds like you are saying something
different?

Are you saying to always report it with channel 0 in
mlx5e_get_queue_stats_rx even if:

- !priv->rx_ptp_opened (meaning it was never opened, and thus
would be zero)

and

- (priv->rx_ptp_opened && !test_bit(MLX5E_PTP_STATE_RX,
priv->channels.ptp->state)) meaning it was opened before, but is
currently closed

If so, that means we never report PTP in base. Is that what you want
me to do?

I honestly don't care either way but this seems slightly
inconsistent, doesn't it?

If base is reporting inactive RQs, shouldn't PTP be reported here if
it's inactive?

Please let me know.

> > + if (priv->channels.ptp &&
> > + !test_bit(MLX5E_PTP_STATE_RX, priv->channels.ptp->state)) {
> > + struct mlx5e_rq_stats *rq_stats = &priv->ptp_stats.rq;
> > +
> > + rx->packets += rq_stats->packets;
> > + rx->bytes += rq_stats->bytes;
> > + }
> > + }
> > + }
> > +
> > + tx->packets = 0;
> > + tx->bytes = 0;
> > +
> > + for (i = 0; i < priv->stats_nch; i++) {
> > + struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> > +
> > + /* while iterating through all channels [0, stats_nch], there
> > + * are two cases to handle:
> > + *
> > + * 1. the channel is available, so sum only the unavailable TCs
> > + * [mlx5e_get_dcb_num_tc, max_opened_tc).
> > + *
> > + * 2. the channel is unavailable, so sum all TCs [0, max_opened_tc).
> > + */
>
> Even if the channel is not available, mlx5e_get_queue_stats_tx() accesses
> and returns its stats.

Ah, yes. My mistake.

> Here you need to only cover SQs that have no mapping in range
> [0..real_num_tx_queues - 1].

So, that means the loops should be:

outer loop: [0, priv->stats_nch)
inner loop: [ mlx5e_get_dcb_num_tc, max_opened_tc )

Right? That would be only the SQs which have no mapping, I think.
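
In code that would be roughly the following sketch (untested, variable
names as in the v4 patch):

        for (i = 0; i < priv->stats_nch; i++) {
                struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];

                for (tc = mlx5e_get_dcb_num_tc(&priv->channels.params);
                     tc < priv->max_opened_tc; tc++) {
                        tx->packets += channel_stats->sq[tc].packets;
                        tx->bytes += channel_stats->sq[tc].bytes;
                }
        }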

> > + if (i < priv->channels.params.num_channels)
> > + tc = mlx5e_get_dcb_num_tc(&priv->channels.params);
> > + else
> > + tc = 0;
> > +
> > + for (; tc < priv->max_opened_tc; tc++) {
> > + struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[tc];
> > +
> > + tx->packets += sq_stats->packets;
> > + tx->bytes += sq_stats->bytes;
> > + }
> > + }
> > +
> > + if (priv->tx_ptp_opened) {
> > + /* only report PTP TCs if it was opened but is now closed */
> > + if (priv->channels.ptp && !test_bit(MLX5E_PTP_STATE_TX, priv->channels.ptp->state)) {
> > + for (tc = 0; tc < priv->channels.ptp->num_tc; tc++) {
> > + struct mlx5e_sq_stats *sq_stats = &priv->ptp_stats.sq[tc];
> > +
> > + tx->packets += sq_stats->packets;
> > + tx->bytes += sq_stats->bytes;
> > + }
> > + }
> > + }
> > +}
> > +
> > +static const struct netdev_stat_ops mlx5e_stat_ops = {
> > + .get_queue_stats_rx = mlx5e_get_queue_stats_rx,
> > + .get_queue_stats_tx = mlx5e_get_queue_stats_tx,
> > + .get_base_stats = mlx5e_get_base_stats,
> > +};
> > +
> > static void mlx5e_build_nic_netdev(struct net_device *netdev)
> > {
> > struct mlx5e_priv *priv = netdev_priv(netdev);
> > @@ -5296,6 +5433,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
> > netdev->watchdog_timeo = 15 * HZ;
> > + netdev->stat_ops = &mlx5e_stat_ops;
> > netdev->ethtool_ops = &mlx5e_ethtool_ops;
> > netdev->vlan_features |= NETIF_F_SG;

2024-06-07 00:19:55

by Jakub Kicinski

Subject: Re: [RFC net-next v4 2/2] net/mlx5e: Add per queue netdev-genl stats

On Thu, 6 Jun 2024 14:54:40 -0700 Joe Damato wrote:
> > > Compare the values in /proc/net/dev match the output of cli for the same
> > > device, even while the device is down.
> > >
> > > Note that while the device is down, per queue stats output nothing
> > > (because the device is down there are no queues):
> >
> > This part is not true anymore.
>
> It is true with this patch applied and running the command below.
> Maybe I should have been more explicit that using cli.py outputs []
> when scope = queue, which could be an internal cli.py thing, but
> this is definitely true with this patch.
>
> Did you test it and get different results?

To avoid drivers having their own interpretations what "closed" means,
core hides all queues in closed state:

https://elixir.bootlin.com/linux/v6.10-rc1/source/net/core/netdev-genl.c#L582
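
The per-queue dump there is gated roughly like this (paraphrased sketch,
not the literal core code; the emit_*() helpers below are placeholders
for the netlink fill logic):

        if (!(netdev->flags & IFF_UP))
                return 0;       /* device down: no per-queue entries at all */

        for (i = 0; i < netdev->real_num_rx_queues; i++)
                emit_rx_queue_stats(netdev, i); /* -> ops->get_queue_stats_rx() */
        for (i = 0; i < netdev->real_num_tx_queues; i++)
                emit_tx_queue_stats(netdev, i); /* -> ops->get_queue_stats_tx() */

so a driver's per-queue callbacks are only invoked for indices in
[0, real_num_*x_queues) on an interface that is up.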

> > PTP RQ index is naively assigned to zero:
> > rq->ix = MLX5E_PTP_CHANNEL_IX;
> >
> > but this isn't to be used as the stats index.
> > Today, the PTP-RQ has no matcing rxq in the kernel level.
> > i.e. turning PTP-RQ on won't add a kernel-level RXQ to the
> > real_num_rx_queues.
> > Maybe we better do.
> > If not, and the current state is kept, the best we can do is let the PTP-RQ
> > naively contribute its queue-stat to channel 0.
>
> OK, it sounds like the easiest thing to do is just count PTP as
> channel 0, so if i == 0, I'll in the PTP stats.
>
> But please see below regarding testing whether or not PTP is
> actually enabled or not.

If we can I think we should avoid making queue 0 too special.
If someone configures steering and only expects certain packets on
queue 0 - getting PTP counted there will be a surprise.
I vote to always count it towards base.

2024-06-07 01:02:24

by Joe Damato

Subject: Re: [RFC net-next v4 2/2] net/mlx5e: Add per queue netdev-genl stats

On Thu, Jun 06, 2024 at 05:19:42PM -0700, Jakub Kicinski wrote:
> On Thu, 6 Jun 2024 14:54:40 -0700 Joe Damato wrote:
> > > > Compare the values in /proc/net/dev match the output of cli for the same
> > > > device, even while the device is down.
> > > >
> > > > Note that while the device is down, per queue stats output nothing
> > > > (because the device is down there are no queues):
> > >
> > > This part is not true anymore.
> >
> > It is true with this patch applied and running the command below.
> > Maybe I should have been more explicit that using cli.py outputs []
> > when scope = queue, which could be an internal cli.py thing, but
> > this is definitely true with this patch.
> >
> > Did you test it and get different results?
>
> To avoid drivers having their own interpretations what "closed" means,
> core hides all queues in closed state:
>
> https://elixir.bootlin.com/linux/v6.10-rc1/source/net/core/netdev-genl.c#L582
>
> > > PTP RQ index is naively assigned to zero:
> > > rq->ix = MLX5E_PTP_CHANNEL_IX;
> > >
> > > but this isn't to be used as the stats index.
> > > Today, the PTP-RQ has no matcing rxq in the kernel level.
> > > i.e. turning PTP-RQ on won't add a kernel-level RXQ to the
> > > real_num_rx_queues.
> > > Maybe we better do.
> > > If not, and the current state is kept, the best we can do is let the PTP-RQ
> > > naively contribute its queue-stat to channel 0.
> >
> > OK, it sounds like the easiest thing to do is just count PTP as
> > channel 0, so if i == 0, I'll in the PTP stats.
> >
> > But please see below regarding testing whether or not PTP is
> > actually enabled or not.
>
> If we can I think we should avoid making queue 0 too special.
> If someone configures steering and only expects certain packets on
> queue 0 - getting PTP counted there will be a surprise.
> I vote to always count it towards base.

I'm OK with reporting PTP RX in base and only in base.

But, that would then leave PTP TX:

PTP TX stats are reported in mlx5e_get_queue_stats_tx because
the user will pass in an 'i' which refers to the PTP txq. This works
fine with the mlx5e_get_queue_stats_tx code as-is because the PTP
txqs are mapped in the new priv->txq2sq_stats array.

However.... if PTP is enabled and then disabled by the user, that
leaves us in this state:

priv->tx_ptp_opened && !test_bit(MLX5E_PTP_STATE_TX, channels.ptp->state)

e.g. PTP TX was opened at some point but is currently disabled as
the bit is unset.

In this case, when the txq2sq_stats map is built, it'll exclude the PTP
stats struct from that mapping if MLX5E_PTP_STATE_TX is not set.

So, in this case, the stats have to be reported in base with
something like this (psuedo code):

if (priv->tx_ptp_opened &&
    !test_bit(MLX5E_PTP_STATE_TX, channels.ptp->state)) {
        for (tc = 0; tc < priv->channels.ptp->num_tc; tc++) {
                tx->packets += ...ptp_stats.sq[tc].packets;
                tx->bytes += ...ptp_stats.sq[tc].bytes;
        }
}

Right? Or am I just way off here?

2024-06-07 07:46:46

by Tariq Toukan

Subject: Re: [RFC net-next v4 2/2] net/mlx5e: Add per queue netdev-genl stats


On 07/06/2024 3:19, Jakub Kicinski wrote:
> On Thu, 6 Jun 2024 14:54:40 -0700 Joe Damato wrote:
>>>> Compare the values in /proc/net/dev match the output of cli for the same
>>>> device, even while the device is down.
>>>>
>>>> Note that while the device is down, per queue stats output nothing
>>>> (because the device is down there are no queues):
>>>
>>> This part is not true anymore.
>>
>> It is true with this patch applied and running the command below.
>> Maybe I should have been more explicit that using cli.py outputs []
>> when scope = queue, which could be an internal cli.py thing, but
>> this is definitely true with this patch.
>>
>> Did you test it and get different results?
>
> To avoid drivers having their own interpretations what "closed" means,
> core hides all queues in closed state:
>
> https://elixir.bootlin.com/linux/v6.10-rc1/source/net/core/netdev-genl.c#L582
>

Oh, so the kernel doesn't even call the driver's
mlx5e_get_queue_stats_rx/tx callbacks if the interface is down, even
though our driver can easily satisfy the query and provide the stats.

I think the kernel here makes some design assumption about the stats,
and enforces it on all vendor drivers.
I don't think it's a matter of "closed channel" interpretation, it's
more about persistent stats.
IMO the kernel should be generic enough to let both designs (persistent
and non-persistent stats) integrate naturally with this new queue
netdev-genl stats feature.

>>> PTP RQ index is naively assigned to zero:
>>> rq->ix = MLX5E_PTP_CHANNEL_IX;
>>>
>>> but this isn't to be used as the stats index.
>>> Today, the PTP-RQ has no matcing rxq in the kernel level.
>>> i.e. turning PTP-RQ on won't add a kernel-level RXQ to the
>>> real_num_rx_queues.
>>> Maybe we better do.
>>> If not, and the current state is kept, the best we can do is let the PTP-RQ
>>> naively contribute its queue-stat to channel 0.
>>
>> OK, it sounds like the easiest thing to do is just count PTP as
>> channel 0, so if i == 0, I'll in the PTP stats.
>>
>> But please see below regarding testing whether or not PTP is
>> actually enabled or not.
>
> If we can I think we should avoid making queue 0 too special.
> If someone configures steering and only expects certain packets on
> queue 0 - getting PTP counted there will be a surprise.
> I vote to always count it towards base.

+1, let's count PTP RX in the base, especially since it has no matching
kernel-level rxq.

Another option is to add one more kernel rxq for it (i.e. set
real_num_rx_queues to num_channels + 1). But, that would be a bigger
change, we can keep it for a followup discussion.

2024-06-07 07:54:20

by Tariq Toukan

Subject: Re: [RFC net-next v4 2/2] net/mlx5e: Add per queue netdev-genl stats



On 07/06/2024 4:02, Joe Damato wrote:
> On Thu, Jun 06, 2024 at 05:19:42PM -0700, Jakub Kicinski wrote:
>> On Thu, 6 Jun 2024 14:54:40 -0700 Joe Damato wrote:
>>>>> Compare the values in /proc/net/dev match the output of cli for the same
>>>>> device, even while the device is down.
>>>>>
>>>>> Note that while the device is down, per queue stats output nothing
>>>>> (because the device is down there are no queues):
>>>>
>>>> This part is not true anymore.
>>>
>>> It is true with this patch applied and running the command below.
>>> Maybe I should have been more explicit that using cli.py outputs []
>>> when scope = queue, which could be an internal cli.py thing, but
>>> this is definitely true with this patch.
>>>
>>> Did you test it and get different results?
>>
>> To avoid drivers having their own interpretations what "closed" means,
>> core hides all queues in closed state:
>>
>> https://elixir.bootlin.com/linux/v6.10-rc1/source/net/core/netdev-genl.c#L582
>>
>>>> PTP RQ index is naively assigned to zero:
>>>> rq->ix = MLX5E_PTP_CHANNEL_IX;
>>>>
>>>> but this isn't to be used as the stats index.
>>>> Today, the PTP-RQ has no matcing rxq in the kernel level.
>>>> i.e. turning PTP-RQ on won't add a kernel-level RXQ to the
>>>> real_num_rx_queues.
>>>> Maybe we better do.
>>>> If not, and the current state is kept, the best we can do is let the PTP-RQ
>>>> naively contribute its queue-stat to channel 0.
>>>
>>> OK, it sounds like the easiest thing to do is just count PTP as
>>> channel 0, so if i == 0, I'll in the PTP stats.
>>>
>>> But please see below regarding testing whether or not PTP is
>>> actually enabled or not.
>>
>> If we can I think we should avoid making queue 0 too special.
>> If someone configures steering and only expects certain packets on
>> queue 0 - getting PTP counted there will be a surprise.
>> I vote to always count it towards base.
>
> I'm OK with reporting PTP RX in base and only in base.
>
> But, that would then leave PTP TX:
>

Right, currently there's no consistency between the PTP RX/TX
accounting in real_num_queues.
I don't want to create more work for you, but IMO in the longterm I
should follow it up with a patch that adds PTP-RX to real_num_rx_queues.

> PTP TX stats are reported in mlx5e_get_queue_stats_tx because
> the user will pass in an 'i' which refers to the PTP txq. This works
> fine with the mlx5e_get_queue_stats_tx code as-is because the PTP
> txqs are mapped in the new priv->txq2sq_stats array.
>
> However.... if PTP is enabled and then disabled by the user, that
> leaves us in this state:
>
> priv->tx_ptp_opened && !test_bit(MLX5E_PTP_STATE_TX, channels.ptp->state)
>
> e.g. PTP TX was opened at some point but is currently disabled as
> the bit is unset.
>
> In this case, when the txq2sq_stats map is built, it'll exclude PTP
> stats struct from that mapping if MLX5E_PTP_STATE_TX is not set.
>
> So, in this case, the stats have to be reported in base with
> something like this (psuedo code):
>
> if (priv->tx_ptp_opened &&
> ! test_bit(MLX5E_PTP_STATE_TX, channels.ptp->state)) {
> for (tc = 0; tc < priv->channels.ptp->num_tc; tc++) {

Do not take num_tc from here, this ptp memory might not exist at this point.
Calculate it the regular way with max_opened_tc.

> tx->packets += ...ptp_stats.sq[tc].packets;
> tx->bytes += ...ptp_stats.sq[tc].bytes;
> }
> }
>
> Right? Or am I just way off here?
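
I.e., roughly this sketch (untested): the loop bound changes to
max_opened_tc, and a NULL check on priv->channels.ptp is needed since,
as noted, that memory might not exist at this point:

        if (priv->tx_ptp_opened &&
            (!priv->channels.ptp ||
             !test_bit(MLX5E_PTP_STATE_TX, priv->channels.ptp->state))) {
                for (tc = 0; tc < priv->max_opened_tc; tc++) {
                        tx->packets += priv->ptp_stats.sq[tc].packets;
                        tx->bytes += priv->ptp_stats.sq[tc].bytes;
                }
        }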

2024-06-07 20:50:59

by Joe Damato

Subject: Re: [RFC net-next v4 2/2] net/mlx5e: Add per queue netdev-genl stats

On Thu, Jun 06, 2024 at 11:11:57PM +0300, Tariq Toukan wrote:
>
>
> On 04/06/2024 3:46, Joe Damato wrote:
> > ./cli.py --spec netlink/specs/netdev.yaml \
> > --dump qstats-get --json '{"scope": "queue"}'
> >
> > ...snip
> >
> > {'ifindex': 7,
> > 'queue-id': 62,
> > 'queue-type': 'rx',
> > 'rx-alloc-fail': 0,
> > 'rx-bytes': 105965251,
> > 'rx-packets': 179790},
> > {'ifindex': 7,
> > 'queue-id': 0,
> > 'queue-type': 'tx',
> > 'tx-bytes': 9402665,
> > 'tx-packets': 17551},
> >
> > ...snip
> >
> > Also tested with the script tools/testing/selftests/drivers/net/stats.py
> > in several scenarios to ensure stats tallying was correct:
> >
> > - on boot (default queue counts)
> > - adjusting queue count up or down (ethtool -L eth0 combined ...)
> >
> > The tools/testing/selftests/drivers/net/stats.py brings the device up,
> > so to test with the device down, I did the following:
> >
> > $ ip link show eth4
> > 7: eth4: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN [..snip..]
> > [..snip..]
> >
> > $ cat /proc/net/dev | grep eth4
> > eth4: 235710489 434811 [..snip rx..] 2878744 21227 [..snip tx..]
> >
> > $ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
> > --dump qstats-get --json '{"ifindex": 7}'
> > [{'ifindex': 7,
> > 'rx-alloc-fail': 0,
> > 'rx-bytes': 235710489,
> > 'rx-packets': 434811,
> > 'tx-bytes': 2878744,
> > 'tx-packets': 21227}]
> >
> > Compare the values in /proc/net/dev match the output of cli for the same
> > device, even while the device is down.
> >
> > Note that while the device is down, per queue stats output nothing
> > (because the device is down there are no queues):
>
> This part is not true anymore.
>
> >
> > $ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
> > --dump qstats-get --json '{"scope": "queue", "ifindex": 7}'
> > []
> >
> > Signed-off-by: Joe Damato <[email protected]>
> > ---
> > .../net/ethernet/mellanox/mlx5/core/en_main.c | 138 ++++++++++++++++++
> > 1 file changed, 138 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index d03fd1c98eb6..76d64bbcf250 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -39,6 +39,7 @@
> > #include <linux/debugfs.h>
> > #include <linux/if_bridge.h>
> > #include <linux/filter.h>
> > +#include <net/netdev_queues.h>
> > #include <net/page_pool/types.h>
> > #include <net/pkt_sched.h>
> > #include <net/xdp_sock_drv.h>
> > @@ -5279,6 +5280,142 @@ static bool mlx5e_tunnel_any_tx_proto_supported(struct mlx5_core_dev *mdev)
> > return (mlx5_vxlan_allowed(mdev->vxlan) || mlx5_geneve_tx_allowed(mdev));
> > }
> > +static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
> > + struct netdev_queue_stats_rx *stats)
> > +{
> > + struct mlx5e_priv *priv = netdev_priv(dev);
> > + struct mlx5e_channel_stats *channel_stats;
> > + struct mlx5e_rq_stats *xskrq_stats;
> > + struct mlx5e_rq_stats *rq_stats;
> > +
> > + ASSERT_RTNL();
> > + if (mlx5e_is_uplink_rep(priv))
> > + return;
> > +
> > + /* ptp was ever opened, is currently open, and channel index matches i
> > + * then export stats
> > + */
> > + if (priv->rx_ptp_opened && priv->channels.ptp) {
> > + if (test_bit(MLX5E_PTP_STATE_RX, priv->channels.ptp->state) &&
> > + priv->channels.ptp->rq.ix == i) {
>
> PTP RQ index is naively assigned to zero:
> rq->ix = MLX5E_PTP_CHANNEL_IX;
>
> but this isn't to be used as the stats index.
> Today, the PTP-RQ has no matcing rxq in the kernel level.
> i.e. turning PTP-RQ on won't add a kernel-level RXQ to the
> real_num_rx_queues.
> Maybe we better do.
> If not, and the current state is kept, the best we can do is let the PTP-RQ
> naively contribute its queue-stat to channel 0.
>
> > + rq_stats = &priv->ptp_stats.rq;
> > + stats->packets = rq_stats->packets;
> > + stats->bytes = rq_stats->bytes;
> > + stats->alloc_fail = rq_stats->buff_alloc_err;
> > + return;
> > + }
> > + }
> > +
> > + channel_stats = priv->channel_stats[i];
> > + xskrq_stats = &channel_stats->xskrq;
> > + rq_stats = &channel_stats->rq;
> > +
> > + stats->packets = rq_stats->packets + xskrq_stats->packets;
> > + stats->bytes = rq_stats->bytes + xskrq_stats->bytes;
> > + stats->alloc_fail = rq_stats->buff_alloc_err +
> > + xskrq_stats->buff_alloc_err;
> > +}
> > +
> > +static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
> > + struct netdev_queue_stats_tx *stats)
> > +{
> > + struct mlx5e_priv *priv = netdev_priv(dev);
> > + struct mlx5e_sq_stats *sq_stats;
> > +
> > + ASSERT_RTNL();
> > + /* no special case needed for ptp htb etc since txq2sq_stats is kept up
> > + * to date for active sq_stats, otherwise get_base_stats takes care of
> > + * inactive sqs.
> > + */
> > + sq_stats = priv->txq2sq_stats[i];
> > + stats->packets = sq_stats->packets;
> > + stats->bytes = sq_stats->bytes;
> > +}
> > +
> > +static void mlx5e_get_base_stats(struct net_device *dev,
> > + struct netdev_queue_stats_rx *rx,
> > + struct netdev_queue_stats_tx *tx)
> > +{
> > + struct mlx5e_priv *priv = netdev_priv(dev);
> > + int i, tc;
> > +
> > + ASSERT_RTNL();
> > + if (!mlx5e_is_uplink_rep(priv)) {
> > + rx->packets = 0;
> > + rx->bytes = 0;
> > + rx->alloc_fail = 0;
> > +
> > + for (i = priv->channels.params.num_channels; i < priv->stats_nch; i++) {
> > + struct netdev_queue_stats_rx rx_i = {0};
> > +
> > + mlx5e_get_queue_stats_rx(dev, i, &rx_i);
> > +
> > + rx->packets += rx_i.packets;
> > + rx->bytes += rx_i.bytes;
> > + rx->alloc_fail += rx_i.alloc_fail;
> > + }
> > +
> > + if (priv->rx_ptp_opened) {
> > + /* if PTP was opened, but is not currently open, then
> > + * report the stats here. otherwise,
> > + * mlx5e_get_queue_stats_rx will get it
> > + */
>
> We shouldn't care if the RQ is currently open. The stats are always there.
> This applies to all RQs and SQs.
>
> > + if (priv->channels.ptp &&
> > + !test_bit(MLX5E_PTP_STATE_RX, priv->channels.ptp->state)) {
> > + struct mlx5e_rq_stats *rq_stats = &priv->ptp_stats.rq;
> > +
> > + rx->packets += rq_stats->packets;
> > + rx->bytes += rq_stats->bytes;
> > + }
> > + }
> > + }
> > +
> > + tx->packets = 0;
> > + tx->bytes = 0;
> > +
> > + for (i = 0; i < priv->stats_nch; i++) {
> > + struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
> > +
> > + /* while iterating through all channels [0, stats_nch], there
> > + * are two cases to handle:
> > + *
> > + * 1. the channel is available, so sum only the unavailable TCs
> > + * [mlx5e_get_dcb_num_tc, max_opened_tc).
> > + *
> > + * 2. the channel is unavailable, so sum all TCs [0, max_opened_tc).
> > + */
>
> Even if the channel is not available, mlx5e_get_queue_stats_tx() accesses
> and returns its stats.

Thanks for your review; I do sincerely appreciate it.

I'm sorry, but either I'm misunderstanding something or I simply
disagree with you.

Imagine a simple case with 64 queues at boot, no QoS/HTB, no PTP.

At boot, [0..63] are the txq_ixs that will be passed in as i to
mlx5e_get_queue_stats_tx.

The user then reduces the queues to 4 via ethtool.

Now [0..3] are the txq_ixs that will be passed in as i to
mlx5e_get_queue_stats_tx.

In both cases: [0..real_num_tx_queues) are valid "i" that
mlx5e_get_queue_stats_tx will have stats for.

Now, consider the code for mlx5e_get_dcb_num_tc (from
drivers/net/ethernet/mellanox/mlx5/core/en.h):

return params->mqprio.mode == TC_MQPRIO_MODE_DCB ?
params->mqprio.num_tc : 1;

In our simple case calling mlx5e_get_dcb_num_tc returns 1.
In mlx5e_priv_init, the code sets priv->max_opened_tc = 1.

If we simply do a loop like this:

[0 ... priv->stats_nch)
    [ mlx5e_get_dcb_num_tc ... max_opened_tc )
        [ ...collect_tx_stats... ]

We won't collect any stats for channels 4..63 in our example above,
because the inner loop becomes [1..1), which does nothing.
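
To make the failure mode concrete, here is a minimal sketch (a sketch
only, not the RFC code; the wrapper name is just for illustration, but
the fields and helpers are the ones used in the quoted patch) of that
loop shape in the example above, where mlx5e_get_dcb_num_tc() and
max_opened_tc are both 1:

static void tx_base_stats_suggested(struct mlx5e_priv *priv,
                                    struct netdev_queue_stats_tx *tx)
{
        int i, tc;

        for (i = 0; i < priv->stats_nch; i++) {         /* 64 stats slots */
                for (tc = mlx5e_get_dcb_num_tc(&priv->channels.params); /* 1 */
                     tc < priv->max_opened_tc; tc++) {                  /* 1 */
                        struct mlx5e_sq_stats *sq_stats =
                                &priv->channel_stats[i]->sq[tc];

                        /* never reached: the range [1, 1) is empty, so
                         * the TC 0 stats of the deactivated channels
                         * 4..63 are dropped entirely
                         */
                        tx->packets += sq_stats->packets;
                        tx->bytes += sq_stats->bytes;
                }
        }
}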

To confirm this, I tested the above loop anyway and the test script
showed that stats were missing:

NETIF=eth4 tools/testing/selftests/drivers/net/stats.py

# Exception| Exception: Qstats are lower, fetched later
not ok 3 stats.pkt_byte_sum

I think the original code for TX stats in base for deactivated
queues may be correct.

Another way to explain the problem: any queue from [4..63] will be
gone, so we need to accumulate the stats from all TCs from 0 to
max_opened_tc (which in our example is 1).

Can you let me know if you agree? I would like to submit a real v5
which will include:
- output PTP RX in base always if it was ever opened
- output PTP TX in base only if it was ever opened and ptp is NULL
or the bit is unset
- leave the existing TX queue code in base from this RFC v4 (other
than the changes to PTP TX)

Because in my tests, running:

NETIF=eth4 tools/testing/selftests/drivers/net/stats.py

shows that the stats match:

KTAP version 1
1..4
ok 1 stats.check_pause
ok 2 stats.check_fec
ok 3 stats.pkt_byte_sum
ok 4 stats.qstat_by_ifindex
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
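
For reference, a rough sketch of how the PTP part of that v5 plan
might look inside mlx5e_get_base_stats (a sketch only, not the v5
patch; it assumes the driver's existing tx_ptp_opened flag,
ptp_stats.sq[] array, and MLX5E_PTP_STATE_TX bit, and assumes the
per-queue PTP RX special case is dropped so nothing is counted twice):

        if (priv->rx_ptp_opened) {
                /* always fold PTP RX into base once it has been opened */
                struct mlx5e_rq_stats *rq_stats = &priv->ptp_stats.rq;

                rx->packets += rq_stats->packets;
                rx->bytes += rq_stats->bytes;
        }

        if (priv->tx_ptp_opened &&
            (!priv->channels.ptp ||
             !test_bit(MLX5E_PTP_STATE_TX, priv->channels.ptp->state))) {
                /* fold PTP TX into base only while its SQs are inactive;
                 * active PTP SQs are already reported per queue through
                 * priv->txq2sq_stats
                 */
                for (tc = 0; tc < priv->max_opened_tc; tc++) {
                        struct mlx5e_sq_stats *sq_stats = &priv->ptp_stats.sq[tc];

                        tx->packets += sq_stats->packets;
                        tx->bytes += sq_stats->bytes;
                }
        }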

>
> > + if (i < priv->channels.params.num_channels)
> > + tc = mlx5e_get_dcb_num_tc(&priv->channels.params);
> > + else
> > + tc = 0;

My original code above and below may not be clear, but what it is
saying is:

- for queues which have been deactivated, sum up all their TCs
  (since their txq_ix is no longer in the range
  [0..real_num_tx_queues)).

- for queues which have NOT been deactivated, sum up only the
  inactive TCs.

Based on my tests, I believe this to be correct.

> > + for (; tc < priv->max_opened_tc; tc++) {
> > + struct mlx5e_sq_stats *sq_stats = &channel_stats->sq[tc];
> > +
> > + tx->packets += sq_stats->packets;
> > + tx->bytes += sq_stats->bytes;
> > + }
> > + }
> > +

2024-06-11 00:27:23

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [RFC net-next v4 2/2] net/mlx5e: Add per queue netdev-genl stats

On Fri, 7 Jun 2024 10:53:55 +0300 Tariq Toukan wrote:
> I don't want to create more work for you, but IMO in the longterm I
> should follow it up with a patch that adds PTP-RX to real_num_rx_queues.

Please don't. real_num_*x_queues is basically uAPI. Let's not modify
behavior at the uAPI level to work around deficiencies of the device.

2024-06-12 20:14:21

by Joe Damato

[permalink] [raw]
Subject: Re: [RFC net-next v4 2/2] net/mlx5e: Add per queue netdev-genl stats

On Fri, Jun 07, 2024 at 01:50:37PM -0700, Joe Damato wrote:
> On Thu, Jun 06, 2024 at 11:11:57PM +0300, Tariq Toukan wrote:
>
> [...snip: quoted patch, test output, and review discussion, identical to the exchange above...]

I didn't hear back on the above, so I've sent a v5 that leaves the
TX logic mostly the same (except for PTP).

As explained above, tests suggest that this is correct unless I am
missing something -- which I totally could be.

I've added a comment to the code to explain what it is doing, which
will hopefully make my thought process more clear.

Let's pick this back up with any comments on the v5 thread:

https://lore.kernel.org/netdev/[email protected]/

Thanks,
Joe