> static void stmmac_configure_cbs(struct stmmac_priv *priv)
> {
> u32 tx_queues_count = priv->plat->tx_queues_to_use;
> u32 mode_to_use;
> u32 queue;
> + u32 ptr, speed_div;
> + u64 value;
> +
> + /* Port Transmit Rate and Speed Divider */
> + switch (priv->speed) {
> + case SPEED_10000:
> + ptr = 32;
> + speed_div = 10000000;
> + break;
> + case SPEED_5000:
> + ptr = 32;
> + speed_div = 5000000;
> + break;
> + case SPEED_2500:
> + ptr = 8;
> + speed_div = 2500000;
> + break;
> + case SPEED_1000:
> + ptr = 8;
> + speed_div = 1000000;
> + break;
> + case SPEED_100:
> + ptr = 4;
> + speed_div = 100000;
> + break;
> + default:
No SPEED_10?
> + netdev_dbg(priv->dev, "link speed is not known\n");
> + }
>
> /* queue 0 is reserved for legacy traffic */
> for (queue = 1; queue < tx_queues_count; queue++) {
> @@ -3196,6 +3231,12 @@ static void stmmac_configure_cbs(struct stmmac_priv *priv)
> if (mode_to_use == MTL_QUEUE_DCB)
> continue;
>
> + value = div_s64(priv->old_idleslope[queue] * 1024ll * ptr, speed_div);
> + priv->plat->tx_queues_cfg[queue].idle_slope = value & GENMASK(31, 0);
Rather than masking off the top bits, shouldn't you be checking for
overflow? That indicates the configuration is not possible. You don't
have a good way to report the problem, since there is no user action
on link up, so you cannot return -EINVAL or -EOPNOTSUPP. So you
probably want to set the hardware as close as possible.
Also, what happens if the result of the div is 0? Does 0 have a
special meaning?
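A minimal userspace sketch of the saturating alternative to masking, for illustration only. The helper name cbs_scale_slope() and the inexact flag are hypothetical and not part of the stmmac driver; the 32-bit limit is taken from the GENMASK(31, 0) in the quoted hunk, not from the hardware documentation. The sketch only flags a zero result, since whether 0 has a special meaning in hardware is still an open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: scale a slope given in kbit/s the way the quoted
 * patch does (slope * 1024 * ptr / speed_div), but saturate at 32 bits
 * instead of masking. *inexact is set when the result was clamped, or
 * when a non-zero slope was rounded down to 0.
 */
static uint32_t cbs_scale_slope(uint64_t slope_kbps, uint32_t ptr,
				uint32_t speed_div, bool *inexact)
{
	uint64_t factor = 1024ull * ptr;
	uint64_t value;
	bool clamped = false;

	assert(ptr != 0 && speed_div != 0);	/* link speed must be known */

	if (slope_kbps > UINT64_MAX / factor) {
		/* The 64-bit product itself would wrap. */
		value = UINT32_MAX;
		clamped = true;
	} else {
		value = slope_kbps * factor / speed_div;
		if (value > UINT32_MAX) {
			/* Closest setting that fits in 32 bits. */
			value = UINT32_MAX;
			clamped = true;
		}
	}

	*inexact = clamped || (value == 0 && slope_kbps != 0);
	return (uint32_t)value;
}
```

With the 1G values from the hunk (ptr = 8, speed_div = 1000000), a 100000 kbit/s slope scales to 819. A 10 kbit/s slope at 10G (ptr = 32, speed_div = 10000000) rounds down to 0 and is flagged as inexact, so the caller could at least log it.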
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
> index 222540b55480..d3526ad91aff 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
> @@ -355,6 +355,9 @@ static int tc_setup_cbs(struct stmmac_priv *priv,
> if (!priv->dma_cap.av)
> return -EOPNOTSUPP;
>
> + if (!netif_carrier_ok(priv->dev))
> + return -ENETDOWN;
> +
Now that you are configuring the hardware on link up, does that
matter?
Andrew
On Thu, May 30, 2024 at 02:50:52PM +0200, Xiaolei Wang wrote:
> When the port is relinked, if the speed changes, the CBS parameters
> should be updated, so saving the user transmission parameters so
> that idle_slope and send_slope can be recalculated after the speed
> changes after linking up can help reconfigure CBS after the speed
> changes.
>
> Fixes: 1f705bc61aee ("net: stmmac: Add support for CBS QDISC")
> Signed-off-by: Xiaolei Wang <[email protected]>
> ---
> v1 -> v2
> - Update CBS parameters when speed changes
May I ask what is the point of this patch? The bandwidth fraction, as
IEEE 802.1Q defines it, is a function of idleSlope / portTransmitRate,
the latter of which is a runtime variant. If the link speed changes at
runtime, which is entirely possible, I see no alternative than to let
user space figure out that this happened, and decide what to do. This is
a consequence of the fact that the tc-cbs UAPI takes the raw idleSlope
as direct input, rather than something more high level like the desired
bandwidth for the stream itself, which could be dynamically computed by
the kernel.
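To make the dependence on link speed concrete, here is a small sketch of the bandwidth fraction idleSlope / portTransmitRate for one fixed idleSlope at two link speeds. The function name is hypothetical, and both values are assumed to be in kbit/s, the unit tc-cbs uses for its slopes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bandwidth fraction reserved by a CBS queue, in
 * parts per thousand, computed as idleSlope / portTransmitRate.
 * Both arguments are in kbit/s.
 */
static uint32_t cbs_bandwidth_permille(uint32_t idleslope_kbps,
				       uint32_t port_rate_kbps)
{
	assert(port_rate_kbps != 0);

	/* Widen before multiplying so the product cannot overflow. */
	return (uint32_t)((uint64_t)idleslope_kbps * 1000 / port_rate_kbps);
}
```

An idleSlope of 20000 kbit/s is 20 per mille (2%) of a 1 Gbit/s port, but 200 per mille (20%) of a 100 Mbit/s port: the same raw idleSlope means a different reservation once the link speed changes.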
On Thu, May 30, 2024 at 04:28:22PM +0300, Vladimir Oltean wrote:
> On Thu, May 30, 2024 at 02:50:52PM +0200, Xiaolei Wang wrote:
> > When the port is relinked, if the speed changes, the CBS parameters
> > should be updated, so saving the user transmission parameters so
> > that idle_slope and send_slope can be recalculated after the speed
> > changes after linking up can help reconfigure CBS after the speed
> > changes.
> >
> > Fixes: 1f705bc61aee ("net: stmmac: Add support for CBS QDISC")
> > Signed-off-by: Xiaolei Wang <[email protected]>
> > ---
> > v1 -> v2
> > - Update CBS parameters when speed changes
>
> May I ask what is the point of this patch? The bandwidth fraction, as
> IEEE 802.1Q defines it, is a function of idleSlope / portTransmitRate,
> the latter of which is a runtime variant. If the link speed changes at
> runtime, which is entirely possible, I see no alternative than to let
> user space figure out that this happened, and decide what to do. This is
> a consequence of the fact that the tc-cbs UAPI takes the raw idleSlope
> as direct input, rather than something more high level like the desired
> bandwidth for the stream itself, which could be dynamically computed by
> the kernel.
So what should be the behaviour here? Refuse setting CBS parameters if
the link is down, and clear the hardware configuration of the CBS
parameters each and every time there is a link-down event? Isn't that
going to make the driver's in-use settings inconsistent with what the
kernel thinks have been set? AFAIK, tc qdisc's don't vanish from the
kernel just because the link went down.
I think what you're proposing leads to the hardware being effectively
"de-programmed" for CBS while "tc qdisc show" will probably report
that CBS is active on the interface - which clearly would be absurd.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
On Thu, May 30, 2024 at 02:40:30PM +0100, Russell King (Oracle) wrote:
> On Thu, May 30, 2024 at 04:28:22PM +0300, Vladimir Oltean wrote:
> > On Thu, May 30, 2024 at 02:50:52PM +0200, Xiaolei Wang wrote:
> > > When the port is relinked, if the speed changes, the CBS parameters
> > > should be updated, so saving the user transmission parameters so
> > > that idle_slope and send_slope can be recalculated after the speed
> > > changes after linking up can help reconfigure CBS after the speed
> > > changes.
> > >
> > > Fixes: 1f705bc61aee ("net: stmmac: Add support for CBS QDISC")
> > > Signed-off-by: Xiaolei Wang <[email protected]>
> > > ---
> > > v1 -> v2
> > > - Update CBS parameters when speed changes
> >
> > May I ask what is the point of this patch? The bandwidth fraction, as
> > IEEE 802.1Q defines it, is a function of idleSlope / portTransmitRate,
> > the latter of which is a runtime variant. If the link speed changes at
> > runtime, which is entirely possible, I see no alternative than to let
> > user space figure out that this happened, and decide what to do. This is
> > a consequence of the fact that the tc-cbs UAPI takes the raw idleSlope
> > as direct input, rather than something more high level like the desired
> > bandwidth for the stream itself, which could be dynamically computed by
> > the kernel.
>
> So what should be the behaviour here? Refuse setting CBS parameters if
> the link is down, and clear the hardware configuration of the CBS
> parameters each and every time there is a link-down event? Isn't that
> going to make the driver's in-use settings inconsistent with what the
> kernel thinks have been set? AFAIK, tc qdisc's don't vanish from the
> kernel just because the link went down.
>
> I think what you're proposing leads to the hardware being effectively
> "de-programmed" for CBS while "tc qdisc show" will probably report
> that CBS is active on the interface - which clearly would be absurd.
No, just program to hardware right away the idleSlope, sendSlope,
loCredit and hiCredit that were communicated by user space. Those were
computed for a specific link speed and it is user space's business to
monitor that this link speed is maintained for as long as the streams
are necessary (otherwise those parameters are no longer valid).
One could even recover the portTransmitRate that the parameters were
computed for (it should be idleSlope - sendSlope, in Kbps).
AKA keep the driver as it is.
I don't see why the CBS parameters would need to be de-programmed from
hardware on a link down event. Is that some stmmac specific thing?
Xiaolei may have a bone to pick with the fact that tc-cbs takes its
input the way it does, but that's an entirely different matter.
On Thu, May 30, 2024 at 04:53:35PM +0300, Vladimir Oltean wrote:
> On Thu, May 30, 2024 at 02:40:30PM +0100, Russell King (Oracle) wrote:
> > On Thu, May 30, 2024 at 04:28:22PM +0300, Vladimir Oltean wrote:
> > > On Thu, May 30, 2024 at 02:50:52PM +0200, Xiaolei Wang wrote:
> > > > When the port is relinked, if the speed changes, the CBS parameters
> > > > should be updated, so saving the user transmission parameters so
> > > > that idle_slope and send_slope can be recalculated after the speed
> > > > changes after linking up can help reconfigure CBS after the speed
> > > > changes.
> > > >
> > > > Fixes: 1f705bc61aee ("net: stmmac: Add support for CBS QDISC")
> > > > Signed-off-by: Xiaolei Wang <[email protected]>
> > > > ---
> > > > v1 -> v2
> > > > - Update CBS parameters when speed changes
> > >
> > > May I ask what is the point of this patch? The bandwidth fraction, as
> > > IEEE 802.1Q defines it, is a function of idleSlope / portTransmitRate,
> > > the latter of which is a runtime variant. If the link speed changes at
> > > runtime, which is entirely possible, I see no alternative than to let
> > > user space figure out that this happened, and decide what to do. This is
> > > a consequence of the fact that the tc-cbs UAPI takes the raw idleSlope
> > > as direct input, rather than something more high level like the desired
> > > bandwidth for the stream itself, which could be dynamically computed by
> > > the kernel.
> >
> > So what should be the behaviour here? Refuse setting CBS parameters if
> > the link is down, and clear the hardware configuration of the CBS
> > parameters each and every time there is a link-down event? Isn't that
> > going to make the driver's in-use settings inconsistent with what the
> > kernel thinks have been set? AFAIK, tc qdisc's don't vanish from the
> > kernel just because the link went down.
> >
> > I think what you're proposing leads to the hardware being effectively
> > "de-programmed" for CBS while "tc qdisc show" will probably report
> > that CBS is active on the interface - which clearly would be absurd.
>
> No, just program to hardware right away the idleSlope, sendSlope,
> loCredit and hiCredit that were communicated by user space. Those were
> computed for a specific link speed and it is user space's business to
> monitor that this link speed is maintained for as long as the streams
> are necessary (otherwise those parameters are no longer valid).
> One could even recover the portTransmitRate that the parameters were
> computed for (it should be idleSlope - sendSlope, in Kbps).
>
> AKA keep the driver as it is.
>
> I don't see why the CBS parameters would need to be de-programmed from
> hardware on a link down event. Is that some stmmac specific thing?
If the driver is having to do computation on the parameters based on
the link speed, then when the link speed changes, the parameters
no longer match what the kernel _thinks_ those parameters were
programmed with.
What I'm trying to get over to you is that what you propose causes
an inconsistency between how the hardware is _programmed_ to behave
for CBS and what the kernel reports the CBS settings are if the
link speed changes.
For example, if the link was operating at 10G, and the idle slope
set by userspace is A, and the send slope was B, tc qdisc show
will report an idle slope of A and send slope of B.
If the link speed now changes to 5G, then, without updating the
settings in the hardware, the multiplier for the register values
will have reduced by a factor of two, meaning they're twice as
large as they should be for values of A and B.
However, tc qdisc show continues to report that values of A and B
are being used, but the hardware is actually using 2 * A and 2 * B.
It's all very well saying that userspace should basically reconstruct
the tc settings when the link changes, but not everyone is aware of
that. I'm saying it's a problem if one isn't aware of this issue with
this hardware, and one looks at tc qdisc show output, and assumes
that reflects what is actually being used when it isn't.
It's quality of implementation - as far as I'm concerned, the kernel
should *not* mislead the user like this.
On Thu, May 30, 2024 at 03:14:48PM +0100, Russell King (Oracle) wrote:
> > I don't see why the CBS parameters would need to be de-programmed from
> > hardware on a link down event. Is that some stmmac specific thing?
>
> If the driver is having to do computation on the parameters based on
> the link speed, then when the link speed changes, the parameters
> no longer match what the kernel _thinks_ those parameters were
> programmed with.
>
> What I'm trying to get over to you is that what you propose causes
> an inconsistency between how the hardware is _programmed_ to behave
> for CBS and what the kernel reports the CBS settings are if the
> link speed changes.
>
> It's all very well saying that userspace should basically reconstruct
> the tc settings when the link changes, but not everyone is aware of
> that. I'm saying it's a problem if one isn't aware of this issue with
> this hardware, and one looks at tc qdisc show output, and assumes
> that reflects what is actually being used when it isn't.
>
> It's quality of implementation - as far as I'm concerned, the kernel
> should *not* mislead the user like this.
I was saying that the tc-cbs parameters input into the kernel should
already have the link speed baked into them:
portTransmitRate = idleSlope - sendSlope. In theory one could feed any
data into the kernel, but this is based on the IEEE 802.1Q formulas.
I had missed the fact that there is a calculation dependent on
priv->speed within tc_setup_cbs(), and I'm sorry for that. I thought
that the values were passed unaltered down to stmmac_config_cbs(). So
"make no change to the driver" is no longer my recommendation.
In that case, my recommendation is to do as sja1105_setup_tc_cbs() does:
replace priv->speed with the portTransmitRate recovered from the tc-cbs
parameters, and fully expect that when the link speed changes, user
space comes along and changes those parameters.
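A rough userspace sketch of that recommendation, under stated assumptions: the helper names and struct cbs_scale are hypothetical, and the ptr/speed_div table is copied from the switch statement in the quoted patch rather than from the existing driver or from sja1105. It relies on the observation that the speed_div values in that patch equal the link speed in kbit/s, so the recovered portTransmitRate can select the same scaling without consulting priv->speed.

```c
#include <stdint.h>

struct cbs_scale {
	uint32_t ptr;		/* port transmit rate multiplier */
	uint32_t speed_div;	/* link speed in kbit/s */
};

/* tc-cbs passes sendslope as a negative value, so this is a sum. */
static int64_t cbs_port_rate_kbps(int32_t idleslope, int32_t sendslope)
{
	return (int64_t)idleslope - (int64_t)sendslope;
}

/* Pick ptr/speed_div from the recovered rate; -1 if it is not a known speed. */
static int cbs_scale_from_rate(int64_t rate_kbps, struct cbs_scale *out)
{
	switch (rate_kbps) {
	case 10000000:
	case 5000000:
		out->ptr = 32;
		break;
	case 2500000:
	case 1000000:
		out->ptr = 8;
		break;
	case 100000:
		out->ptr = 4;
		break;
	default:
		return -1;
	}
	out->speed_div = (uint32_t)rate_kbps;
	return 0;
}

/* Scaled idle slope as in the patch, or -1 for an unsupported rate. */
static int64_t cbs_idle_slope_value(int32_t idleslope, int32_t sendslope)
{
	struct cbs_scale s;

	if (cbs_scale_from_rate(cbs_port_rate_kbps(idleslope, sendslope), &s))
		return -1;

	return (int64_t)idleslope * 1024 * s.ptr / s.speed_div;
}
```

For example, idleslope 20000 with sendslope -980000 recovers a 1000000 kbit/s port rate and selects ptr = 8. Parameters whose difference is not one of the listed speeds are rejected, which in a driver would presumably map to an error returned to user space at configuration time.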
On Thu, 30 May 2024 14:40:30 +0100 Russell King (Oracle) wrote:
> I think what you're proposing leads to the hardware being effectively
> "de-programmed" for CBS while "tc qdisc show" will probably report
> that CBS is active on the interface - which clearly would be absurd.
FWIW the "switch-offloaded" qdiscs do support reporting that they got
"de-programmed" given that more complex hierarchies can easily go out
of what HW is capable of.
They call the driver from the .dump callback, nominally to get
stats (e.g. red_dump() -> red_dump_offload_stats()) but it also
refreshes the offloaded state (see qdisc_offload_dump_helper()).
For "NIC-offloaded" qdiscs (i.e. all traffic passes thru the host,
rather than being forwarded) the stats callback makes less sense.
But all this is to say that there _is_ precedent for clearing
qdisc "offloaded" bits.