2024-05-31 00:11:47

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [RFC net-next v3 0/2] mlx5: Add netdev-genl queue stats

On Wed, 29 May 2024 03:16:25 +0000 Joe Damato wrote:
> Worth noting that Tariq suggested I also export HTB/QOS stats in
> mlx5e_get_base_stats.

Why to base, and not report them as queue stats?

Judging by mlx5e_update_tx_netdev_queues() calls sprinkled in
./mlx5/core/en/htb.c it seems that the driver will update the
real_num_tx_queues accordingly. And from mlx5e_qid_from_qos()
it seems like the inverse calculation is:

i - (chs->params.num_channels + is_ptp)*mlx5e_get_dcb_num_tc(&chs->params)

But really, isn't it enough to use priv->txq2sq[i] for the active
queues, and not active ones you've already covered?

> I am open to doing this, but I think if I were to do that, HTB/QOS queue
> stats should also be exported by rtnl so that the script above will
> continue to show that the output is correct.
>
> I'd like to propose: adding HTB/QOS to both rtnl *and* the netdev-genl
> code together at the same time, but a later time, separate from this
> change.

Hm, are HTB queues really not counted in rtnl? That'd be pretty wrong.


2024-05-31 00:15:41

by Joe Damato

[permalink] [raw]
Subject: Re: [RFC net-next v3 0/2] mlx5: Add netdev-genl queue stats

On Thu, May 30, 2024 at 05:11:28PM -0700, Jakub Kicinski wrote:
> On Wed, 29 May 2024 03:16:25 +0000 Joe Damato wrote:
> > Worth noting that Tariq suggested I also export HTB/QOS stats in
> > mlx5e_get_base_stats.
>
> Why to base, and not report them as queue stats?
>
> Judging by mlx5e_update_tx_netdev_queues() calls sprinkled in
> ../mlx5/core/en/htb.c it seems that the driver will update the
> real_num_tx_queues accordingly. And from mlx5e_qid_from_qos()
> it seems like the inverse calculation is:
>
> i - (chs->params.num_channels + is_ptp)*mlx5e_get_dcb_num_tc(&chs->params)
>
> But really, isn't it enough to use priv->txq2sq[i] for the active
> queues, and not active ones you've already covered?

This is what I proposed in the thread for the v2, but Tariq
suggested a different approach he liked more, please see this
message for more details:

https://lore.kernel.org/netdev/[email protected]/

I attempted to implement option 1 as he described in his message.

> > I am open to doing this, but I think if I were to do that, HTB/QOS queue
> > stats should also be exported by rtnl so that the script above will
> > continue to show that the output is correct.
> >
> > I'd like to propose: adding HTB/QOS to both rtnl *and* the netdev-genl
> > code together at the same time, but a later time, separate from this
> > change.
>
> Hm, are HTB queues really not counted in rtnl? That'd be pretty wrong.

As far as I can tell (and I could be wrong) I didn't see them
included in the rtnl stats, but I'll take another look now.

2024-05-31 00:39:15

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [RFC net-next v3 0/2] mlx5: Add netdev-genl queue stats

On Thu, 30 May 2024 17:15:25 -0700 Joe Damato wrote:
> > Why to base, and not report them as queue stats?
> >
> > Judging by mlx5e_update_tx_netdev_queues() calls sprinkled in
> > ../mlx5/core/en/htb.c it seems that the driver will update the
> > real_num_tx_queues accordingly. And from mlx5e_qid_from_qos()
> > it seems like the inverse calculation is:
> >
> > i - (chs->params.num_channels + is_ptp)*mlx5e_get_dcb_num_tc(&chs->params)
> >
> > But really, isn't it enough to use priv->txq2sq[i] for the active
> > queues, and not active ones you've already covered?
>
> This is what I proposed in the thread for the v2, but Tariq
> suggested a different approach he liked more, please see this
> message for more details:
>
> https://lore.kernel.org/netdev/[email protected]/
>
> I attempted to implement option 1 as he described in his message.

I see, although it sounds like option 2 would also work.

Saeed can you shine any light here? I'm worried Tariq is already AFK
for the weekend and we'll make no progress until Monday...

2024-05-31 00:57:08

by Joe Damato

[permalink] [raw]
Subject: Re: [RFC net-next v3 0/2] mlx5: Add netdev-genl queue stats

On Thu, May 30, 2024 at 05:39:02PM -0700, Jakub Kicinski wrote:
> On Thu, 30 May 2024 17:15:25 -0700 Joe Damato wrote:
> > > Why to base, and not report them as queue stats?
> > >
> > > Judging by mlx5e_update_tx_netdev_queues() calls sprinkled in
> > > ../mlx5/core/en/htb.c it seems that the driver will update the
> > > real_num_tx_queues accordingly. And from mlx5e_qid_from_qos()
> > > it seems like the inverse calculation is:
> > >
> > > i - (chs->params.num_channels + is_ptp)*mlx5e_get_dcb_num_tc(&chs->params)
> > >
> > > But really, isn't it enough to use priv->txq2sq[i] for the active
> > > queues, and not active ones you've already covered?
> >
> > This is what I proposed in the thread for the v2, but Tariq
> > suggested a different approach he liked more, please see this
> > message for more details:
> >
> > https://lore.kernel.org/netdev/[email protected]/
> >
> > I attempted to implement option 1 as he described in his message.
>
> I see, although it sounds like option 2 would also work.

I don't really mind either way; from Tariq's message it sounded like
he preferred option 1, so I tried to implement that thinking that it would be
my best bet at getting this done.

If option 2 is easier/preferred for some reason... it seems like
(other than the locking I forgot to include) the base implementation
in v2 was correct and I could use what I proposed in the thread for
the tx stats, which was:

mutex_lock(&priv->state_lock);
if (priv->channels.num > 0) {
sq = priv->txq2sq[i];
stats->packets = sq->stats->packets;
stats->bytes = sq->stats->bytes;
}
mutex_unlock(&priv->state_lock);

And I would have implemented option 2... IIUC.

2024-05-31 01:08:07

by Joe Damato

[permalink] [raw]
Subject: Re: [RFC net-next v3 0/2] mlx5: Add netdev-genl queue stats

On Thu, May 30, 2024 at 05:15:25PM -0700, Joe Damato wrote:
> On Thu, May 30, 2024 at 05:11:28PM -0700, Jakub Kicinski wrote:
> > On Wed, 29 May 2024 03:16:25 +0000 Joe Damato wrote:
> > > Worth noting that Tariq suggested I also export HTB/QOS stats in
> > > mlx5e_get_base_stats.
> >
> > Why to base, and not report them as queue stats?
> >
> > Judging by mlx5e_update_tx_netdev_queues() calls sprinkled in
> > ../mlx5/core/en/htb.c it seems that the driver will update the
> > real_num_tx_queues accordingly. And from mlx5e_qid_from_qos()
> > it seems like the inverse calculation is:
> >
> > i - (chs->params.num_channels + is_ptp)*mlx5e_get_dcb_num_tc(&chs->params)
> >
> > But really, isn't it enough to use priv->txq2sq[i] for the active
> > queues, and not active ones you've already covered?
>
> This is what I proposed in the thread for the v2, but Tariq
> suggested a different approach he liked more, please see this
> message for more details:
>
> https://lore.kernel.org/netdev/[email protected]/
>
> I attempted to implement option 1 as he described in his message.
>
> > > I am open to doing this, but I think if I were to do that, HTB/QOS queue
> > > stats should also be exported by rtnl so that the script above will
> > > continue to show that the output is correct.
> > >
> > > I'd like to propose: adding HTB/QOS to both rtnl *and* the netdev-genl
> > > code together at the same time, but a later time, separate from this
> > > change.
> >
> > Hm, are HTB queues really not counted in rtnl? That'd be pretty wrong.
>
> As far as I can tell (and I could be wrong) I didn't see them
> included in the rtnl stats, but I'll take another look now.

I looked and it seems like the htb stats are included in ethtool's
output if you look at mlx5/core/en_stats.c, it appears that
mlx5e_stats_grp_sw_update_stats_qos rolls up the htb stats.

However, the rtnl stats, seem to be computed in mlx5/core/en_main.c
via mlx5e_get_stats calling mlx5e_fold_sw_stats64, which appears to
gather stats for rx, tx, and ptp, but it doesn't seem like qos/htb
is handled.

Unless I am missing something, I think mlx5e_fold_sw_stats64 would
need code similar to mlx5e_stats_grp_sw_update_stats_qos and then
rtnl would account for htb stats.

That said: since it seems the htb numbers are not included right
now, I was proposing adding that in later both to rtnl and
netdev-genl together, hoping that would keep the proposed
simpler/easier to get accepted.

LMK what you think.

2024-05-31 01:26:47

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [RFC net-next v3 0/2] mlx5: Add netdev-genl queue stats

On Thu, 30 May 2024 18:07:31 -0700 Joe Damato wrote:
> Unless I am missing something, I think mlx5e_fold_sw_stats64 would
> need code similar to mlx5e_stats_grp_sw_update_stats_qos and then
> rtnl would account for htb stats.

Hm, I think you're right. I'm just surprised this could have gone
unnoticed for so long.

> That said: since it seems the htb numbers are not included right
> now, I was proposing adding that in later both to rtnl and
> netdev-genl together, hoping that would keep the proposed
> simpler/easier to get accepted.

SGTM.

2024-05-31 18:30:53

by Joe Damato

[permalink] [raw]
Subject: Re: [RFC net-next v3 0/2] mlx5: Add netdev-genl queue stats

On Thu, May 30, 2024 at 06:26:30PM -0700, Jakub Kicinski wrote:
> On Thu, 30 May 2024 18:07:31 -0700 Joe Damato wrote:
> > Unless I am missing something, I think mlx5e_fold_sw_stats64 would
> > need code similar to mlx5e_stats_grp_sw_update_stats_qos and then
> > rtnl would account for htb stats.
>
> Hm, I think you're right. I'm just surprised this could have gone
> unnoticed for so long.
>
> > That said: since it seems the htb numbers are not included right
> > now, I was proposing adding that in later both to rtnl and
> > netdev-genl together, hoping that would keep the proposed
> > simpler/easier to get accepted.
>
> SGTM.

Cool, so based on that it seems like I just need to figure out the
correct implementation for base and tx stats that is correct and
that will be accepted.

Hoping to hear back from them soon as I'd personally love to have
this API available on our systems.