On Fri, May 03, 2024 at 12:55:41PM +0200, Zhu Yanjun wrote:
> On 03.05.24 04:25, Joe Damato wrote:
> > Hi:
> >
> > This is only 1 patch, so I know a cover letter isn't necessary, but it
> > seems there are a few things to mention.
> >
> > This change adds support for the per queue netdev-genl API to mlx5,
> > which seems to output stats:
> >
> > ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
> > --dump qstats-get --json '{"scope": "queue"}'
> >
> > ...snip
> > {'ifindex': 7,
> > 'queue-id': 28,
> > 'queue-type': 'tx',
> > 'tx-bytes': 399462,
> > 'tx-packets': 3311},
> > ...snip
>
> Ethtool -S ethx can get the above information
> "
> ...
> tx-0.packets: 2094
> tx-0.bytes: 294141
> rx-0.packets: 2200
> rx-0.bytes: 267673
> ...
> "
>
> >
> > I've tried to use the tooling suggested to verify that the per queue
> > stats match the rtnl stats by doing this:
> >
> > NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
> >
> > And the tool outputs that there is a failure:
> >
> > # Exception| Exception: Qstats are lower, fetched later
> > not ok 3 stats.pkt_byte_sum
>
> With ethtool, does the above problem still occur?
Thanks for the suggestion. With ethtool the numbers look correct when I
apply the same logic as the test, if I understand correctly.
The failing test fetches rtnl first, then qstats, but sees lower qstats; the
test expects qstats to be equal or higher since they are read later. To
reproduce this with ethtool, I'd need to fetch with ethtool first and then
fetch qstats and compare.
A correct output would show equal or higher stats from qstats than ethtool,
because there is a minor delay between running the commands.
Here's a quick example using ethtool of what I get (note that in the output of
cli.py the bytes are printed before the packets):
$ ethtool -S eth0 | egrep '(rx[0-3]_(bytes|packets))' && \
echo "======" && \
./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \
--dump qstats-get --json '{"scope": "queue", "ifindex": 7}' \
| egrep '(rx-(packets|bytes))'
rx0_packets: 10799916
rx0_bytes: 4949724904
rx1_packets: 26170804
rx1_bytes: 12694250232
rx2_packets: 11901885
rx2_bytes: 5593129387
rx3_packets: 13219784
rx3_bytes: 6151431963
======
'rx-bytes': 4949927222,
'rx-packets': 10800354},
'rx-bytes': 12694488803,
'rx-packets': 26171309},
'rx-bytes': 5593321247,
'rx-packets': 11902360},
'rx-bytes': 6151735533,
'rx-packets': 13220389},
So you can see that the numbers "look right": the qstats (collected by cli.py)
are fetched later and are slightly larger, as expected:
rx0_packets from ethtool: 10799916
rx0_packets from cli.py: 10800354
rx0_bytes from ethtool: 4949724904
rx0_bytes from cli.py: 4949927222
So this looks correct to me, and in this case I'd be more inclined to assume
that RTNL on mlx5 is "overcounting" because:
1. it includes the PTP stats that I don't include in my qstats, and/or
2. some other reason I don't understand
> >
> > The other tests all pass (including stats.qstat_by_ifindex).
> >
> > This appears to mean that the netdev-genl queue stats have lower numbers
> > than the rtnl stats even though the rtnl stats are fetched first. I
> > added some debugging and found that both rx and tx bytes and packets are
> > slightly lower.
> >
> > The only explanations I can think of for this are:
> >
> > 1. tx_ptp_opened and rx_ptp_opened are both true, in which case
> > mlx5e_fold_sw_stats64 adds bytes and packets to the rtnl struct and
> > might account for the difference. I skip this case in my
> > implementation, so that could certainly explain it.
> > 2. Maybe I'm just misunderstanding how stats aggregation works in mlx5,
> > and that's why the numbers are slightly off?
> >
> > It appears that the driver uses a workqueue to queue stats updates which
> > happen periodically.
> >
> > 0. the driver occasionally calls queue_work on the update_stats_work
> > workqueue.
> > 1. This eventually calls MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw),
> > in drivers/net/ethernet/mellanox/mlx5/core/en_stats.c, which appears
> > to begin by first memsetting the internal stats struct where stats are
> > aggregated to zero. This would mean, I think, the get_base_stats
> > netdev-genl API implementation that I have is correct: simply set
> > everything to 0.... otherwise we'd end up double counting in the
> > netdev-genl RX and TX handlers.
> > 2. Next, each of the stats helpers are called to collect stats into the
> > freshly 0'd internal struct (for example:
> > mlx5e_stats_grp_sw_update_stats_rq_stats).
> >
> > That seems to be how stats are aggregated, which would suggest that if I
> > simply .... do what I'm doing in this change the numbers should line up.
> >
> > But they don't and it's either because of PTP or because I am
> > misunderstanding/doing something wrong.
> >
> > Maybe the MLNX folks can suggest a hint?
> >
> > Thanks,
> > Joe
> >
> > Joe Damato (1):
> > net/mlx5e: Add per queue netdev-genl stats
> >
> > .../net/ethernet/mellanox/mlx5/core/en_main.c | 68 +++++++++++++++++++
> > 1 file changed, 68 insertions(+)
> >
>
On Fri, 3 May 2024 11:43:27 -0700 Joe Damato wrote:
> 1. it includes the PTP stats that I don't include in my qstats, and/or
> 2. some other reason I don't understand
Can you add the PTP stats to the "base" values?
I.e. inside mlx5e_get_base_stats()?
We should probably touch up the kdoc a little bit, but it sounds like
the sort of thing which would fall into the realm of "misc delta"
values:
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index c7ac4539eafc..f5d9f3ad5b66 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -59,6 +59,8 @@ struct netdev_queue_stats_tx {
* statistics will not generally add up to the total number of events for
* the device. The @get_base_stats callback allows filling in the delta
* between events for currently live queues and overall device history.
+ * @get_base_stats can also be used to report any miscellaneous packets
+ * transferred outside of the main set of queues used by the networking stack.
* When the statistics for the entire device are queried, first @get_base_stats
* is issued to collect the delta, and then a series of per-queue callbacks.
* Only statistics which are set in @get_base_stats will be reported
SG?
On Fri, May 03, 2024 at 02:58:08PM -0700, Jakub Kicinski wrote:
> On Fri, 3 May 2024 11:43:27 -0700 Joe Damato wrote:
> > 1. it includes the PTP stats that I don't include in my qstats, and/or
> > 2. some other reason I don't understand
>
> Can you add the PTP stats to the "base" values?
> I.e. inside mlx5e_get_base_stats()?
I tried adding them to rx and tx in mlx5e_get_base_stats (similar to what
mlx5e_fold_sw_stats64 does) and the test still fails.
Maybe something about the rtnl stats is what's off here and the queue
stats are fine?
FWIW: I spoke with the Mellanox folks off list several weeks ago and they
seemed to suggest skipping the PTP stats made the most sense.
I think at that time I didn't really understand get_base_stats that well,
so maybe we'd have come to a different conclusion then.
FWIW, here's what I tried and the rtnl vs qstat test still failed in
exactly the same way:
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5337,10 +5337,25 @@ static void mlx5e_get_base_stats(struct net_device *dev,
 		rx->packets = 0;
 		rx->bytes = 0;
 		rx->alloc_fail = 0;
+		if (priv->rx_ptp_opened) {
+			struct mlx5e_rq_stats *rq_stats = &priv->ptp_stats.rq;
+			rx->packets = rq_stats->packets;
+			rx->bytes = rq_stats->bytes;
+		}
 	}
 	tx->packets = 0;
 	tx->bytes = 0;
+
+	if (priv->tx_ptp_opened) {
+		int i;
+		for (i = 0; i < priv->max_opened_tc; i++) {
+			struct mlx5e_sq_stats *sq_stats = &priv->ptp_stats.sq[i];
+
+			tx->packets += sq_stats->packets;
+			tx->bytes += sq_stats->bytes;
+		}
+	}
 }
> We should probably touch up the kdoc a little bit, but it sounds like
> the sort of thing which would fall into the realm of "misc delta"
> values:
>
> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> index c7ac4539eafc..f5d9f3ad5b66 100644
> --- a/include/net/netdev_queues.h
> +++ b/include/net/netdev_queues.h
> @@ -59,6 +59,8 @@ struct netdev_queue_stats_tx {
> * statistics will not generally add up to the total number of events for
> * the device. The @get_base_stats callback allows filling in the delta
> * between events for currently live queues and overall device history.
> + * @get_base_stats can also be used to report any miscellaneous packets
> + * transferred outside of the main set of queues used by the networking stack.
> * When the statistics for the entire device are queried, first @get_base_stats
> * is issued to collect the delta, and then a series of per-queue callbacks.
> * Only statistics which are set in @get_base_stats will be reported
>
>
> SG?
I think that sounds good and makes sense, yea. By that definition, then I
should leave the PTP stats as shown above. If you agree, I'll add that
to the v2.
I feel like I should probably wait before sending a v2 with PTP included in
get_base_stats to see if the Mellanox folks have any hints about why rtnl
!= queue stats on mlx5?
What do you think?
On Fri, 3 May 2024 16:53:40 -0700 Joe Damato wrote:
> > diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> > index c7ac4539eafc..f5d9f3ad5b66 100644
> > --- a/include/net/netdev_queues.h
> > +++ b/include/net/netdev_queues.h
> > @@ -59,6 +59,8 @@ struct netdev_queue_stats_tx {
> > * statistics will not generally add up to the total number of events for
> > * the device. The @get_base_stats callback allows filling in the delta
> > * between events for currently live queues and overall device history.
> > + * @get_base_stats can also be used to report any miscellaneous packets
> > + * transferred outside of the main set of queues used by the networking stack.
> > * When the statistics for the entire device are queried, first @get_base_stats
> > * is issued to collect the delta, and then a series of per-queue callbacks.
> > * Only statistics which are set in @get_base_stats will be reported
> >
> >
> > SG?
>
> I think that sounds good and makes sense, yea. By that definition, then I
> should leave the PTP stats as shown above. If you agree, I'll add that
> to the v2.
Yup, agreed.
> I feel like I should probably wait before sending a v2 with PTP included in
> get_base_stats to see if the Mellanox folks have any hints about why rtnl
> != queue stats on mlx5?
>
> What do you think?
Very odd, the code doesn't appear to be doing any magic :S Did you try
to print what the delta in values is? Does bringing the interface up and
down affect the size of it?
On Fri, May 03, 2024 at 05:34:29PM -0700, Jakub Kicinski wrote:
> On Fri, 3 May 2024 16:53:40 -0700 Joe Damato wrote:
> > > diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> > > index c7ac4539eafc..f5d9f3ad5b66 100644
> > > --- a/include/net/netdev_queues.h
> > > +++ b/include/net/netdev_queues.h
> > > @@ -59,6 +59,8 @@ struct netdev_queue_stats_tx {
> > > * statistics will not generally add up to the total number of events for
> > > * the device. The @get_base_stats callback allows filling in the delta
> > > * between events for currently live queues and overall device history.
> > > + * @get_base_stats can also be used to report any miscellaneous packets
> > > + * transferred outside of the main set of queues used by the networking stack.
> > > * When the statistics for the entire device are queried, first @get_base_stats
> > > * is issued to collect the delta, and then a series of per-queue callbacks.
> > > * Only statistics which are set in @get_base_stats will be reported
> > >
> > >
> > > SG?
> >
> > I think that sounds good and makes sense, yea. By that definition, then I
> > should leave the PTP stats as shown above. If you agree, I'll add that
> > to the v2.
>
> Yup, agreed.
>
> > I feel like I should probably wait before sending a v2 with PTP included in
> > get_base_stats to see if the Mellanox folks have any hints about why rtnl
> > != queue stats on mlx5?
> >
> > What do you think?
>
> Very odd, the code doesn't appear to be doing any magic :S Did you try
> to print what the delta in values is? Does bringing the interface up and
> down affect the size of it?
I booted the kernel which includes PTP stats in the base stats as you've
suggested (as shown in the diff in this thread) and I've brought the
interface down and back up:
$ sudo ip link set dev eth0 down
$ sudo ip link set dev eth0 up
Re-ran the test script, which includes some mild debugging printout I
added to show the delta for rx-packets (but I think all stats are off):
# Exception| Exception: Qstats are lower, fetched later
key: rx-packets rstat: 1192281902 qstat: 1186755777
key: rx-packets rstat: 1192281902 qstat: 1186755781
So qstat is lower by (1192281902 - 1186755781) = 5,526,121
Not really sure why, but I'll take another look at the code this morning to
see if I can figure out what's going on.
I'm clearly doing something wrong or misunderstanding something about the
accounting that will seem extremely obvious in retrospect.
On 06/05/2024 21:04, Joe Damato wrote:
> On Fri, May 03, 2024 at 05:34:29PM -0700, Jakub Kicinski wrote:
>> On Fri, 3 May 2024 16:53:40 -0700 Joe Damato wrote:
>>>> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
>>>> index c7ac4539eafc..f5d9f3ad5b66 100644
>>>> --- a/include/net/netdev_queues.h
>>>> +++ b/include/net/netdev_queues.h
>>>> @@ -59,6 +59,8 @@ struct netdev_queue_stats_tx {
>>>> * statistics will not generally add up to the total number of events for
>>>> * the device. The @get_base_stats callback allows filling in the delta
>>>> * between events for currently live queues and overall device history.
>>>> + * @get_base_stats can also be used to report any miscellaneous packets
>>>> + * transferred outside of the main set of queues used by the networking stack.
>>>> * When the statistics for the entire device are queried, first @get_base_stats
>>>> * is issued to collect the delta, and then a series of per-queue callbacks.
>>>> * Only statistics which are set in @get_base_stats will be reported
>>>>
>>>>
>>>> SG?
>>>
>>> I think that sounds good and makes sense, yea. By that definition, then I
>>> should leave the PTP stats as shown above. If you agree, I'll add that
>>> to the v2.
>>
>> Yup, agreed.
>>
>>> I feel like I should probably wait before sending a v2 with PTP included in
>>> get_base_stats to see if the Mellanox folks have any hints about why rtnl
>>> != queue stats on mlx5?
>>>
>>> What do you think?
>>
>> Very odd, the code doesn't appear to be doing any magic :S Did you try
>> to print what the delta in values is? Does bringing the interface up and
>> down affect the size of it?
>
> I booted the kernel which includes PTP stats in the base stats as you've
> suggested (as shown in the diff in this thread) and I've brought the
> interface down and back up:
>
> $ sudo ip link set dev eth0 down
> $ sudo ip link set dev eth0 up
>
> Re-ran the test script, which includes some mild debugging printout I
> added to show the delta for rx-packets (but I think all stats are off):
>
> # Exception| Exception: Qstats are lower, fetched later
>
> key: rx-packets rstat: 1192281902 qstat: 1186755777
> key: rx-packets rstat: 1192281902 qstat: 1186755781
>
> So qstat is lower by (1192281902 - 1186755781) = 5,526,121
>
> Not really sure why, but I'll take another look at the code this morning to
> see if I can figure out what's going on.
>
> I'm clearly doing something wrong or misunderstanding something about the
> accounting that will seem extremely obvious in retrospect.
Hi Joe,
Thanks for your patch.
Apologies for the late response. I was on PTO for some time.
From first look the patch looks okay. The overall approach seems correct.
The off-channel queues (like PTP) do not exist by default, so they are
out of the game unless you explicitly enable them.
A possible reason for this difference is the queues included in the sum.
Our stats are persistent across configuration changes, so they don't
reset when the number of channels changes, for example.
We keep stats entries for all ring indices that ever existed. Our driver
loops and sums up the stats for all of them, while the stack loops only
up to the current netdev->real_num_rx_queues.
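To sketch the mismatch (a standalone toy model, not actual mlx5 code; the
names stats_nch and real_num_rx_queues are borrowed only for illustration):

#include <stdio.h>

/* Toy model: per-ring counters persist after the ring count is reduced. */
struct rq_stats { unsigned long packets; };

int main(void)
{
	/* 8 channels once existed; the interface now runs with 4. */
	struct rq_stats ch[8] = {
		{100}, {100}, {100}, {100},	/* currently active rings */
		{25},  {25},  {25},  {25},	/* retired rings, counters kept */
	};
	int stats_nch = 8;		/* max channels ever opened */
	int real_num_rx_queues = 4;	/* currently active channels */
	unsigned long rtnl = 0, qstats = 0;
	int i;

	for (i = 0; i < stats_nch; i++)		/* driver's rtnl sum */
		rtnl += ch[i].packets;
	for (i = 0; i < real_num_rx_queues; i++)	/* per-queue qstat sum */
		qstats += ch[i].packets;

	/* Prints rtnl=500 qstats=400: the qstats come up short by exactly
	 * the traffic that hit the retired rings.
	 */
	printf("rtnl=%lu qstats=%lu delta=%lu\n", rtnl, qstats, rtnl - qstats);
	return 0;
}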
Can this explain the diff here?
On Thu, May 09, 2024 at 12:40:01AM +0300, Tariq Toukan wrote:
>
>
> On 06/05/2024 21:04, Joe Damato wrote:
> > On Fri, May 03, 2024 at 05:34:29PM -0700, Jakub Kicinski wrote:
> > > On Fri, 3 May 2024 16:53:40 -0700 Joe Damato wrote:
> > > > > diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> > > > > index c7ac4539eafc..f5d9f3ad5b66 100644
> > > > > --- a/include/net/netdev_queues.h
> > > > > +++ b/include/net/netdev_queues.h
> > > > > @@ -59,6 +59,8 @@ struct netdev_queue_stats_tx {
> > > > > * statistics will not generally add up to the total number of events for
> > > > > * the device. The @get_base_stats callback allows filling in the delta
> > > > > * between events for currently live queues and overall device history.
> > > > > + * @get_base_stats can also be used to report any miscellaneous packets
> > > > > + * transferred outside of the main set of queues used by the networking stack.
> > > > > * When the statistics for the entire device are queried, first @get_base_stats
> > > > > * is issued to collect the delta, and then a series of per-queue callbacks.
> > > > > * Only statistics which are set in @get_base_stats will be reported
> > > > >
> > > > >
> > > > > SG?
> > > >
> > > > I think that sounds good and makes sense, yea. By that definition, then I
> > > > should leave the PTP stats as shown above. If you agree, I'll add that
> > > > to the v2.
> > >
> > > Yup, agreed.
> > >
> > > > I feel like I should probably wait before sending a v2 with PTP included in
> > > > get_base_stats to see if the Mellanox folks have any hints about why rtnl
> > > > != queue stats on mlx5?
> > > >
> > > > What do you think?
> > >
> > > Very odd, the code doesn't appear to be doing any magic :S Did you try
> > > to print what the delta in values is? Does bringing the interface up and
> > > down affect the size of it?
> >
> > I booted the kernel which includes PTP stats in the base stats as you've
> > suggested (as shown in the diff in this thread) and I've brought the
> > interface down and back up:
> >
> > $ sudo ip link set dev eth0 down
> > $ sudo ip link set dev eth0 up
> >
> > Re-ran the test script, which includes some mild debugging printout I
> > added to show the delta for rx-packets (but I think all stats are off):
> >
> > # Exception| Exception: Qstats are lower, fetched later
> >
> > key: rx-packets rstat: 1192281902 qstat: 1186755777
> > key: rx-packets rstat: 1192281902 qstat: 1186755781
> >
> > So qstat is lower by (1192281902 - 1186755781) = 5,526,121
> >
> > Not really sure why, but I'll take another look at the code this morning to
> > see if I can figure out what's going on.
> >
> > I'm clearly doing something wrong or misunderstanding something about the
> > accounting that will seem extremely obvious in retrospect.
>
> Hi Joe,
>
> Thanks for your patch.
> Apologies for the late response. I was on PTO for some time.
No worries, I hope you enjoyed your PTO. I appreciate your response, time,
and energy.
> From first look the patch looks okay. The overall approach seems correct.
Sounds good to me!
> The off-channel queues (like PTP) do not exist by default, so they are out
> of the game unless you explicitly enable them.
I did not enable them, but if you saw the thread, it sounds like Jakub's
preference is that in the v2 I include the PTP stats in get_base_stats.
Are you OK with that?
Are there other queue stats I should include as well?
> A possible reason for this difference is the queues included in the sum.
> Our stats are persistent across configuration changes, so they don't reset
> when the number of channels changes, for example.
>
> We keep stats entries for all ring indices that ever existed. Our driver
> loops and sums up the stats for all of them, while the stack loops only up
> to the current netdev->real_num_rx_queues.
>
> Can this explain the diff here?
Yes, that was it. Sorry I didn't realize this case. My lab machine runs a
script to adjust the queue count shortly after booting.
I disabled that and re-ran:
NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
and all tests pass.
On Wed, 8 May 2024 16:24:08 -0700 Joe Damato wrote:
> > A possible reason for this difference is the queues included in the sum.
> > Our stats are persistent across configuration changes, so they don't reset
> > when the number of channels changes, for example.
> >
> > We keep stats entries for all ring indices that ever existed. Our driver
> > loops and sums up the stats for all of them, while the stack loops only up
> > to the current netdev->real_num_rx_queues.
> >
> > Can this explain the diff here?
>
> Yes, that was it. Sorry I didn't realize this case. My lab machine runs a
> script to adjust the queue count shortly after booting.
>
> I disabled that and re-ran:
>
> NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
>
> and all tests pass.
Stating the obvious, perhaps, but in this case we should add the stats
from inactive queues to the base (which when the NIC is down means all
queues).
On Wed, May 08, 2024 at 05:56:38PM -0700, Jakub Kicinski wrote:
> On Wed, 8 May 2024 16:24:08 -0700 Joe Damato wrote:
> > > A possible reason for this difference is the queues included in the sum.
> > > Our stats are persistent across configuration changes, so they don't reset
> > > when the number of channels changes, for example.
> > >
> > > We keep stats entries for all ring indices that ever existed. Our driver
> > > loops and sums up the stats for all of them, while the stack loops only up
> > > to the current netdev->real_num_rx_queues.
> > >
> > > Can this explain the diff here?
> >
> > Yes, that was it. Sorry I didn't realize this case. My lab machine runs a
> > script to adjust the queue count shortly after booting.
> >
> > I disabled that and re-ran:
> >
> > NETIF=eth0 tools/testing/selftests/drivers/net/stats.py
> >
> > and all tests pass.
>
> Stating the obvious, perhaps, but in this case we should add the stats
> from inactive queues to the base (which when the NIC is down means all
> queues).
If I'm following that right and understanding mlx5 (two things I am
unlikely to do simultaneously), that sounds to me like:
- mlx5e_get_queue_stats_rx and mlx5e_get_queue_stats_tx check if i <
priv->channels.params.num_channels (instead of priv->stats_nch), and when
summing mlx5e_sq_stats in the latter function, it's up to
priv->channels.params.mqprio.num_tc instead of priv->max_opened_tc.
- mlx5e_get_base_stats accumulates and outputs stats for everything from
priv->channels.params.num_channels to priv->stats_nch, and
priv->channels.params.mqprio.num_tc to priv->max_opened_tc... which
should cover the inactive queues, I think.
Just writing that all out to avoid hacking up the wrong thing for the v2
and to reduce overall noise on the list :)
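For the base part, here's a rough sketch of what I have in mind (not
compile-tested, and assuming the channel_stats[i] pointer indexing used
elsewhere in this thread; the PTP fold from the earlier diff is omitted
for brevity):

static void mlx5e_get_base_stats(struct net_device *dev,
				 struct netdev_queue_stats_rx *rx,
				 struct netdev_queue_stats_tx *tx)
{
	struct mlx5e_priv *priv = netdev_priv(dev);
	int i, tc;

	rx->packets = 0;
	rx->bytes = 0;
	tx->packets = 0;
	tx->bytes = 0;

	/* Fold in queues that once existed but are not currently active,
	 * so the base plus the per-queue sums matches the rtnl totals.
	 */
	for (i = priv->channels.params.num_channels; i < priv->stats_nch; i++) {
		struct mlx5e_rq_stats *rq_stats = &priv->channel_stats[i]->rq;

		rx->packets += rq_stats->packets;
		rx->bytes += rq_stats->bytes;

		for (tc = 0; tc < priv->max_opened_tc; tc++) {
			struct mlx5e_sq_stats *sq_stats =
				&priv->channel_stats[i]->sq[tc];

			tx->packets += sq_stats->packets;
			tx->bytes += sq_stats->bytes;
		}
	}
}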
On Thu, 9 May 2024 01:57:52 +0000 Joe Damato wrote:
> If I'm following that right and understanding mlx5 (two things I am
> unlikely to do simultaneously), that sounds to me like:
>
> - mlx5e_get_queue_stats_rx and mlx5e_get_queue_stats_tx check if i <
> priv->channels.params.num_channels (instead of priv->stats_nch),
Yes, tho, not sure whether the "if i < ...num_channels" is even
necessary, as core already checks against real_num_rx_queues.
> and when
> summing mlx5e_sq_stats in the latter function, it's up to
> priv->channels.params.mqprio.num_tc instead of priv->max_opened_tc.
>
> - mlx5e_get_base_stats accumulates and outputs stats for everything from
> priv->channels.params.num_channels to priv->stats_nch, and
I'm not sure num_channels gets set to 0 when device is down so possibly
from "0 if down else ...num_channels" to stats_nch.
> priv->channels.params.mqprio.num_tc to priv->max_opened_tc... which
> should cover the inactive queues, I think.
>
> Just writing that all out to avoid hacking up the wrong thing for the v2
> and to reduce overall noise on the list :)
On Wed, May 08, 2024 at 07:08:39PM -0700, Jakub Kicinski wrote:
> On Thu, 9 May 2024 01:57:52 +0000 Joe Damato wrote:
> > If I'm following that right and understanding mlx5 (two things I am
> > unlikely to do simultaneously), that sounds to me like:
> >
> > - mlx5e_get_queue_stats_rx and mlx5e_get_queue_stats_tx check if i <
> > priv->channels.params.num_channels (instead of priv->stats_nch),
>
> Yes, tho, not sure whether the "if i < ...num_channels" is even
> necessary, as core already checks against real_num_rx_queues.
OK, I'll omit the i < ... check in the v2, then.
> > and when
> > summing mlx5e_sq_stats in the latter function, it's up to
> > priv->channels.params.mqprio.num_tc instead of priv->max_opened_tc.
> >
> > - mlx5e_get_base_stats accumulates and outputs stats for everything from
> > priv->channels.params.num_channels to priv->stats_nch, and
>
> I'm not sure num_channels gets set to 0 when device is down so possibly
> from "0 if down else ...num_channels" to stats_nch.
Yea, it looks like priv->channels.params.num_channels is untouched on
ndo_stop, but:
mlx5e_close (ndo_stop)
  mlx5e_close_locked
    mlx5e_close_channels
      priv->channels.num = 0;
and on open priv->channels.num is restored from 0 to
priv->channels.params.num_channels.
So, priv->channels.num to priv->stats_nch would be, I think, the inactive
queues. I'll give it a try locally real quick.
> > priv->channels.params.mqprio.num_tc to priv->max_opened_tc... which
> > should cover the inactive queues, I think.
> >
> > Just writing that all out to avoid hacking up the wrong thing for the v2
> > and to reduce overall noise on the list :)
On Wed, May 08, 2024 at 07:08:39PM -0700, Jakub Kicinski wrote:
> On Thu, 9 May 2024 01:57:52 +0000 Joe Damato wrote:
> > If I'm following that right and understanding mlx5 (two things I am
> > unlikely to do simultaneously), that sounds to me like:
> >
> > - mlx5e_get_queue_stats_rx and mlx5e_get_queue_stats_tx check if i <
> > priv->channels.params.num_channels (instead of priv->stats_nch),
>
> Yes, tho, not sure whether the "if i < ...num_channels" is even
> necessary, as core already checks against real_num_rx_queues.
>
> > and when
> > summing mlx5e_sq_stats in the latter function, it's up to
> > priv->channels.params.mqprio.num_tc instead of priv->max_opened_tc.
> >
> > - mlx5e_get_base_stats accumulates and outputs stats for everything from
> > priv->channels.params.num_channels to priv->stats_nch, and
>
> I'm not sure num_channels gets set to 0 when device is down so possibly
> from "0 if down else ...num_channels" to stats_nch.
Yea, you were right:
if (priv->channels.num == 0)
	i = 0;
else
	i = priv->channels.params.num_channels;

for (; i < priv->stats_nch; i++) {
Seems to be working now when I adjust the queue count and the test is
passing as I adjust the queue count up or down. Cool.
Adding TCs to the NIC triggers the test to fail, so there's still some bug
in how I'm accumulating stats from the hw TCs.
On 09/05/2024 9:30, Joe Damato wrote:
> On Wed, May 08, 2024 at 07:08:39PM -0700, Jakub Kicinski wrote:
>> On Thu, 9 May 2024 01:57:52 +0000 Joe Damato wrote:
>>> If I'm following that right and understanding mlx5 (two things I am
>>> unlikely to do simultaneously), that sounds to me like:
>>>
>>> - mlx5e_get_queue_stats_rx and mlx5e_get_queue_stats_tx check if i <
>>> priv->channels.params.num_channels (instead of priv->stats_nch),
>>
>> Yes, tho, not sure whether the "if i < ...num_channels" is even
>> necessary, as core already checks against real_num_rx_queues.
>>
>>> and when
>>> summing mlx5e_sq_stats in the latter function, it's up to
>>> priv->channels.params.mqprio.num_tc instead of priv->max_opened_tc.
>>>
>>> - mlx5e_get_base_stats accumulates and outputs stats for everything from
>>> priv->channels.params.num_channels to priv->stats_nch, and
>>
>> I'm not sure num_channels gets set to 0 when device is down so possibly
>> from "0 if down else ...num_channels" to stats_nch.
>
> Yea, you were right:
>
> if (priv->channels.num == 0)
> i = 0;
> else
> i = priv->channels.params.num_channels;
>
> for (; i < priv->stats_nch; i++) {
>
> Seems to be working now when I adjust the queue count and the test is
> passing as I adjust the queue count up or down. Cool.
>
I agree that get_base should include all inactive queues stats.
But it's not straightforward to implement.
A few guiding points:
Use mlx5e_get_dcb_num_tc(params) for current num_tc.
txq_ix (within the real_num_tx_queues) is calculated by c->ix + tc *
params->num_channels.
The txqsq stats struct is chosen by channel_stats[c->ix]->sq[tc].
It means, in the base stats you should include SQ stats for:
1. all SQs of non-active channels, i.e. ch in [params.num_channels,
priv->stats_nch), tc in [0, priv->max_opened_tc).
2. all SQs of non-active TCs in active channels [0,
params.num_channels), tc in [mlx5e_get_dcb_num_tc(params),
priv->max_opened_tc).
Now I actually see that the patch has issues in mlx5e_get_queue_stats_tx.
You should not loop over all TCs of channel index i.
You must do a reverse mapping from "i" to the pair/tuple [ch_ix, tc],
and then access a single TXQ stats by priv->channel_stats[ch_ix]->sq[tc].
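I.e., since txq_ix = ch_ix + tc * num_channels, the reverse mapping is just
modulo/division. A sketch (illustrative only, not tested):

static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
				     struct netdev_queue_stats_tx *stats)
{
	struct mlx5e_priv *priv = netdev_priv(dev);
	int nch = priv->channels.params.num_channels;
	/* Core only calls this for i < real_num_tx_queues, so nch > 0 and
	 * [ch_ix, tc] below address exactly one active TXQ.
	 */
	int ch_ix = i % nch;	/* inverse of txq_ix = ch_ix + tc * nch */
	int tc = i / nch;
	struct mlx5e_sq_stats *sq_stats = &priv->channel_stats[ch_ix]->sq[tc];

	stats->packets = sq_stats->packets;
	stats->bytes = sq_stats->bytes;
}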
> Adding TCs to the NIC triggers the test to fail, so there's still some bug
> in how I'm accumulating stats from the hw TCs.
On Thu, May 09, 2024 at 01:16:15PM +0300, Tariq Toukan wrote:
>
>
> On 09/05/2024 9:30, Joe Damato wrote:
> > On Wed, May 08, 2024 at 07:08:39PM -0700, Jakub Kicinski wrote:
> > > On Thu, 9 May 2024 01:57:52 +0000 Joe Damato wrote:
> > > > If I'm following that right and understanding mlx5 (two things I am
> > > > unlikely to do simultaneously), that sounds to me like:
> > > >
> > > > - mlx5e_get_queue_stats_rx and mlx5e_get_queue_stats_tx check if i <
> > > > priv->channels.params.num_channels (instead of priv->stats_nch),
> > >
> > > Yes, tho, not sure whether the "if i < ...num_channels" is even
> > > necessary, as core already checks against real_num_rx_queues.
> > >
> > > > and when
> > > > summing mlx5e_sq_stats in the latter function, it's up to
> > > > priv->channels.params.mqprio.num_tc instead of priv->max_opened_tc.
> > > >
> > > > - mlx5e_get_base_stats accumulates and outputs stats for everything from
> > > > priv->channels.params.num_channels to priv->stats_nch, and
> > >
> > > I'm not sure num_channels gets set to 0 when device is down so possibly
> > > from "0 if down else ...num_channels" to stats_nch.
> >
> > Yea, you were right:
> >
> > if (priv->channels.num == 0)
> > i = 0;
> > else
> > i = priv->channels.params.num_channels;
> > for (; i < priv->stats_nch; i++) {
> >
> > Seems to be working now when I adjust the queue count and the test is
> > passing as I adjust the queue count up or down. Cool.
> >
>
> I agree that get_base should include all inactive queues stats.
> But it's not straightforward to implement.
>
> A few guiding points:
Thanks for the guiding points - they're very helpful.
> Use mlx5e_get_dcb_num_tc(params) for current num_tc.
>
> txq_ix (within the real_num_tx_queues) is calculated by c->ix + tc *
> params->num_channels.
>
> The txqsq stats struct is chosen by channel_stats[c->ix]->sq[tc].
>
> It means, in the base stats you should include SQ stats for:
> 1. all SQs of non-active channels, i.e. ch in [params.num_channels,
> priv->stats_nch), tc in [0, priv->max_opened_tc).
> 2. all SQs of non-active TCs in active channels [0, params.num_channels), tc
> in [mlx5e_get_dcb_num_tc(params), priv->max_opened_tc).
Thanks, yea, this is what I was working on last night -- I realized that I
need to include the non-active TCs on the active channels too, and I have
some code that does that.
I'm still off slightly, but am giving it another look now.
> Now I actually see that the patch has issues in mlx5e_get_queue_stats_tx.
> You should not loop over all TCs of channel index i.
> You must do a reverse mapping from "i" to the pair/tuple [ch_ix, tc], and
> then access a single TXQ stats by priv->channel_stats[ch_ix]->sq[tc].
OK, thanks for explaining that, I'll take a closer look at this as well.
> > Adding TCs to the NIC triggers the test to fail, so there's still some bug
> > in how I'm accumulating stats from the hw TCs.
On Thu, May 09, 2024 at 01:16:15PM +0300, Tariq Toukan wrote:
>
>
> On 09/05/2024 9:30, Joe Damato wrote:
> > On Wed, May 08, 2024 at 07:08:39PM -0700, Jakub Kicinski wrote:
> > > On Thu, 9 May 2024 01:57:52 +0000 Joe Damato wrote:
> > > > If I'm following that right and understanding mlx5 (two things I am
> > > > unlikely to do simultaneously), that sounds to me like:
> > > >
> > > > - mlx5e_get_queue_stats_rx and mlx5e_get_queue_stats_tx check if i <
> > > > priv->channels.params.num_channels (instead of priv->stats_nch),
> > >
> > > Yes, tho, not sure whether the "if i < ...num_channels" is even
> > > necessary, as core already checks against real_num_rx_queues.
> > >
> > > > and when
> > > > summing mlx5e_sq_stats in the latter function, it's up to
> > > > priv->channels.params.mqprio.num_tc instead of priv->max_opened_tc.
> > > >
> > > > - mlx5e_get_base_stats accumulates and outputs stats for everything from
> > > > priv->channels.params.num_channels to priv->stats_nch, and
> > >
> > > I'm not sure num_channels gets set to 0 when device is down so possibly
> > > from "0 if down else ...num_channels" to stats_nch.
> >
> > Yea, you were right:
> >
> > if (priv->channels.num == 0)
> > i = 0;
> > else
> > i = priv->channels.params.num_channels;
> > for (; i < priv->stats_nch; i++) {
> >
> > Seems to be working now when I adjust the queue count and the test is
> > passing as I adjust the queue count up or down. Cool.
> >
>
> I agree that get_base should include all inactive queues stats.
> But it's not straightforward to implement.
>
> A few guiding points:
>
> Use mlx5e_get_dcb_num_tc(params) for current num_tc.
>
> txq_ix (within the real_num_tx_queues) is calculated by c->ix + tc *
> params->num_channels.
>
> The txqsq stats struct is chosen by channel_stats[c->ix]->sq[tc].
>
> It means, in the base stats you should include SQ stats for:
> 1. all SQs of non-active channels, i.e. ch in [params.num_channels,
> priv->stats_nch), tc in [0, priv->max_opened_tc).
> 2. all SQs of non-active TCs in active channels [0, params.num_channels), tc
> in [mlx5e_get_dcb_num_tc(params), priv->max_opened_tc).
>
> Now I actually see that the patch has issues in mlx5e_get_queue_stats_tx.
> You should not loop over all TCs of channel index i.
> You must do a reverse mapping from "i" to the pair/tuple [ch_ix, tc], and
> then access a single TXQ stats by priv->channel_stats[ch_ix]->sq[tc].
It looks like txq2sq probably will help with this?
Something like:
for (j = 0; j < mlx5e_get_dcb_num_tc(); j++) {
	sq = priv->txq2sq[j];
	if (sq->ch_ix == i) {
		/* this sq->stats is what I need */
	}
}
Is that right?
Not sure if I'm missing something obvious here, sorry, I've been puzzling
over the mlx5 source for a bit.
BTW: kind of related but in mlx5e_alloc_txqsq the int tc param is unused (I
think). It might be helpful for struct mlx5e_txqsq to have a tc field and
then in mlx5e_alloc_txqsq:
sq->tc = tc;
Not sure if that'd be helpful in general, but I could send that as a
separate patch.
On Thu, May 09, 2024 at 05:31:06PM -0700, Joe Damato wrote:
> On Thu, May 09, 2024 at 01:16:15PM +0300, Tariq Toukan wrote:
> >
> >
> > On 09/05/2024 9:30, Joe Damato wrote:
> > > On Wed, May 08, 2024 at 07:08:39PM -0700, Jakub Kicinski wrote:
> > > > On Thu, 9 May 2024 01:57:52 +0000 Joe Damato wrote:
> > > > > If I'm following that right and understanding mlx5 (two things I am
> > > > > unlikely to do simultaneously), that sounds to me like:
> > > > >
> > > > > - mlx5e_get_queue_stats_rx and mlx5e_get_queue_stats_tx check if i <
> > > > > priv->channels.params.num_channels (instead of priv->stats_nch),
> > > >
> > > > Yes, tho, not sure whether the "if i < ...num_channels" is even
> > > > necessary, as core already checks against real_num_rx_queues.
> > > >
> > > > > and when
> > > > > summing mlx5e_sq_stats in the latter function, it's up to
> > > > > priv->channels.params.mqprio.num_tc instead of priv->max_opened_tc.
> > > > >
> > > > > - mlx5e_get_base_stats accumulates and outputs stats for everything from
> > > > > priv->channels.params.num_channels to priv->stats_nch, and
> > > >
> > > > I'm not sure num_channels gets set to 0 when device is down so possibly
> > > > from "0 if down else ...num_channels" to stats_nch.
> > >
> > > Yea, you were right:
> > >
> > > if (priv->channels.num == 0)
> > > i = 0;
> > > else
> > > i = priv->channels.params.num_channels;
> > > for (; i < priv->stats_nch; i++) {
> > >
> > > Seems to be working now when I adjust the queue count and the test is
> > > passing as I adjust the queue count up or down. Cool.
> > >
> >
> > I agree that get_base should include all inactive queues stats.
> > But it's not straightforward to implement.
> >
> > A few guiding points:
> >
> > Use mlx5e_get_dcb_num_tc(params) for current num_tc.
> >
> > txq_ix (within the real_num_tx_queues) is calculated by c->ix + tc *
> > params->num_channels.
> >
> > The txqsq stats struct is chosen by channel_stats[c->ix]->sq[tc].
> >
> > It means, in the base stats you should include SQ stats for:
> > 1. all SQs of non-active channels, i.e. ch in [params.num_channels,
> > priv->stats_nch), tc in [0, priv->max_opened_tc).
> > 2. all SQs of non-active TCs in active channels [0, params.num_channels), tc
> > in [mlx5e_get_dcb_num_tc(params), priv->max_opened_tc).
> >
> > Now I actually see that the patch has issues in mlx5e_get_queue_stats_tx.
> > You should not loop over all TCs of channel index i.
> > You must do a reverse mapping from "i" to the pair/tuple [ch_ix, tc], and
> > then access a single TXQ stats by priv->channel_stats[ch_ix]->sq[tc].
>
> It looks like txq2sq probably will help with this?
>
> Something like:
>
> for (j = 0; j < mlx5e_get_dcb_num_tc(); j++) {
> sq = priv->txq2sq[j];
> if (sq->ch_ix == i) {
> /* this sq->stats is what I need */
> }
> }
>
> Is that right?
This was incorrect, but I think I got it in the v2 I just sent out. When
you have the time, please take a look at that version.
Thanks for the guidance, it was very helpful.
> Not sure if I'm missing something obvious here, sorry, I've been puzzling
> over the mlx5 source for a bit.
>
> BTW: kind of related but in mlx5e_alloc_txqsq the int tc param is unused (I
> think). It might be helpful for struct mlx5e_txqsq to have a tc field and
> then in mlx5e_alloc_txqsq:
>
> sq->tc = tc;
>
> Not sure if that'd be helpful in general, but I could send that as a
> separate patch.