2017-08-10 05:39:25

by Kalle Valo

[permalink] [raw]
Subject: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

Hi Mahesh and Andy,

James Feeney reported that there's a serious regression in bonding
module since v4.12, it doesn't work with wireless drivers anymore as
wireless drivers don't report the link speed via ethtool:

https://bugzilla.kernel.org/show_bug.cgi?id=196547

In the bug report it's said that this commit is the culprit:

3f3c278c94dd bonding: fix active-backup transition

Is there a fix for this or should that commit be reverted? This seems to
be a serious regression as there are multiple reports already and we
should get it fixed for v4.13, and the fix backported to v4.12 stable
release.

--
Kalle Valo


2017-08-17 03:32:12

by Ben Greear

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

On 08/16/2017 08:18 PM, Dan Williams wrote:
> On Wed, 2017-08-16 at 19:36 -0700, Ben Greear wrote:
>> On 08/16/2017 07:11 PM, Dan Williams wrote:
>>> On Wed, 2017-08-16 at 14:31 -0700, David Miller wrote:
>>>> From: Dan Williams <[email protected]>
>>>> Date: Wed, 16 Aug 2017 16:22:41 -0500
>>>>
>>>>> My biggest suggestion is that perhaps bonding should grow
>>>>
>>>> hysteresis
>>>>> for link speeds. Since WiFi can change speed every packet, you
>>>>
>>>> probably
>>>>> don't want the bond characteristics changing every couple
>>>>> seconds
>>>>
>>>> just
>>>>> in case your WiFi link is jumping around. Ethernet won't
>>>>> bounce
>>>>
>>>> around
>>>>> that much, so the hysteresis would have no effect there. Or,
>>>>> if
>>>>
>>>> people
>>>>> are concerned about response time to speed changes on ethernet
>>>>
>>>> (where
>>>>> you probably do want an instant switch-over) some new flag to
>>>>
>>>> indicate
>>>>> that certain devices don't have stable speeds over time.
>>>>
>>>> Or just report the average of the range the wireless link can
>>>> hit,
>>>> and
>>>> be done with it.
>>>>
>>>> I think you guys are overcomplicating things.
>>>
>>> That range can be from 1 to > 800Mb/s. No, it won't usually be all
>>> over that range, but it won't be uncommon to fluctuate by hundreds
>>> of
>>> Mb/s. I'm not sure a simple average is really the answer
>>> here. Even
>>> doing that would require new knobs to ethtool, since the rate
>>> depends
>>> heavily on card capabilities and also what AP you're connected to
>>> *at
>>> that moment*. If you roam to another AP, then the max speed can
>>> certainly change.
>>>
>>> You'll probably say "aim for the 75% case" or something like that,
>>> which is fine, but then you're depending on your 75% case to be (a)
>>> single AP, (b) never move (eg, only bond wifi + ethernet), (c)
>>> little
>>> radio interference. I'm not sure I'd buy that. If I've put words
>>> in
>>> your mouth, forgive me.
>>
>> If you keep ethtool API simple and just return the last (rx-rate +
>> tx-rate) / 2, or the rate averaged
>> over the last 100 frames or 10 seconds, then the caller can do longer
>> term averaging
>> as it sees fit. Probably no need for lots of averaging complexity in
>> the kernel.
>
> Yeah, that works too, but I was thinking it was better to present the
> actual data through ethtool so that things other than bonding could use
> it, and since bonding is the thing that actually cares about the
> fluctuation, make it do the more extensive processing.

What do you mean by 'actual data'? If you want to know the most accurate
transmit/rx rate info, then you need to pay attention to each and every frame's tx/rx rate, as
well as it's ampdu/amsdu, retries, etc. It is virtually impossible.

So, you will have to settle for something less... I suggest something simple
to calculate, similar to existing values that are available via debugfs and/or
'iw dev foo station dump', etc. Let higher layers manipulate the raw data
as they see fit (they can query ethtool as often as they like).

Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2017-08-11 13:14:55

by Kalle Valo

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

Andreas Born <[email protected]> writes:

> Earlier today I submitted the patch (bonding: require speed/duplex
> only for 802.3ad, alb and tlb) [2] that only partially reverts what is
> a regression following my aforementioned logic. This seems to me like
> the best solution in the short term since it should satisfy both
> usergroups represented by Mahesh and James and restores consistence
> with the bonding documentation. James already commented approvingly on
> that patch in the bug report. [3]
>
> Regards
> Andreas
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/bonding.txt

Great, thanks.

I'll take it the patch is meant for net tree (and not net-next) so that
it will be fixed for v4.13? Also it should backported to v4.12 stable
tree. I don't see any mention of that in the patch submission and that's
why I'm asking.

--
Kalle Valo

2017-08-10 17:52:04

by Andreas Born

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

Hi everyone,

2017-08-10 14:43 GMT+02:00 Arend van Spriel <[email protected]>:
>
>
> On 10-08-17 07:39, Kalle Valo wrote:
>>
>> Hi Mahesh and Andy,
>>
>> James Feeney reported that there's a serious regression in bonding
>> module since v4.12, it doesn't work with wireless drivers anymore as
>> wireless drivers don't report the link speed via ethtool:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=3D196547
>>
>> In the bug report it's said that this commit is the culprit:
>>
>> 3f3c278c94dd bonding: fix active-backup transition
>
>
> This commit references another one. ie. commit c4adfc822bf5 ("bonding: ma=
ke speed, duplex setting consistent with link state"). Before this commit t=
he result of __ethtool_get_link_ksettings() was simply ignored.

Actually it was not simply ignored in any case: Further down in
bond_miimon_commit() there's a conditional call to
bond_3ad_handle_link_change() which triggers an update using
__get_link_speed() to actually access the result. A similar handler is
also called for lb modes.

>
> Commit 3f3c278c94dd ("bonding: fix active-backup transition") moves setti=
ng the link state to the call sites of bond_update_speed_duplex(), just not=
all call sites.
>
>> Is there a fix for this or should that commit be reverted? This seems to
>> be a serious regression as there are multiple reports already and we
>> should get it fixed for v4.13, and the fix backported to v4.12 stable
>> release.
>
>
> The ethtool callbacks really seem optional. At least in brcmfmac, the wir=
eless driver I maintain, I only provide get_drvinfo callback and there is n=
o warning triggered upon registering the netdev. The changes above now requ=
ire each netdev to implement the get_link_ksettings callback (get_settings =
is deprecated) or the link is marked as DOWN ruling it out to be used as ac=
tive bond slave. To the end-users who were using bonding this is simply a r=
egression. So to fix that both changes should be reverted in my opinion.

Yes, also to me as user of a wireless slave in an active-backup bond
it's clearly a regression. But only partially for some modes like
active-backup since the bonding documentation [1] clearly lists as a
prerequisite

1) for 802.3ad: "Ethtool support in the base drivers for retrieving
the speed and duplex of each slave."
2) for tlb/alb: "Ethtool support in the base drivers for retrieving
the speed of each slave."

This was previously not directly enforced in the bonding code and thus
probably occasionally caused unexpected behavior. At least such
behavior is what to my understanding commit c4adfc822bf5 ("bonding:
make speed, duplex setting consistent with link state") and
3f3c278c94dd ("bonding: fix active-backup transition") intend to fix
with an apparent focus on 802.3ad. However those commits went too far
by requiring a get_link_ksettings implementation by every slave driver
REGARDLESS of the bond mode.

Earlier today I submitted the patch (bonding: require speed/duplex
only for 802.3ad, alb and tlb) [2] that only partially reverts what is
a regression following my aforementioned logic. This seems to me like
the best solution in the short term since it should satisfy both
usergroups represented by Mahesh and James and restores consistence
with the bonding documentation. James already commented approvingly on
that patch in the bug report. [3]

Regards
Andreas

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree=
/Documentation/networking/bonding.txt
[2] https://www.spinics.net/lists/netdev/msg448662.html
[3] https://bugzilla.kernel.org/show_bug.cgi?id=3D196547#c10

2017-08-17 02:13:30

by Dan Williams

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

On Wed, 2017-08-16 at 14:31 -0700, David Miller wrote:
> From: Dan Williams <[email protected]>
> Date: Wed, 16 Aug 2017 16:22:41 -0500
>
> > My biggest suggestion is that perhaps bonding should grow
> hysteresis
> > for link speeds. Since WiFi can change speed every packet, you
> probably
> > don't want the bond characteristics changing every couple seconds
> just
> > in case your WiFi link is jumping around.  Ethernet won't bounce
> around
> > that much, so the hysteresis would have no effect there.  Or, if
> people
> > are concerned about response time to speed changes on ethernet
> (where
> > you probably do want an instant switch-over) some new flag to
> indicate
> > that certain devices don't have stable speeds over time.
>
> Or just report the average of the range the wireless link can hit,
> and
> be done with it.
>
> I think you guys are overcomplicating things.

That range can be from 1 to > 800Mb/s. No, it won't usually be all
over that range, but it won't be uncommon to fluctuate by hundreds of
Mb/s. I'm not sure a simple average is really the answer here. Even
doing that would require new knobs to ethtool, since the rate depends
heavily on card capabilities and also what AP you're connected to *at
that moment*. If you roam to another AP, then the max speed can
certainly change.

You'll probably say "aim for the 75% case" or something like that,
which is fine, but then you're depending on your 75% case to be (a)
single AP, (b) never move (eg, only bond wifi + ethernet), (c) little
radio interference. I'm not sure I'd buy that. If I've put words in
your mouth, forgive me.

Dan

2017-08-10 12:43:11

by Arend van Spriel

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers



On 10-08-17 07:39, Kalle Valo wrote:
> Hi Mahesh and Andy,
>
> James Feeney reported that there's a serious regression in bonding
> module since v4.12, it doesn't work with wireless drivers anymore as
> wireless drivers don't report the link speed via ethtool:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=196547
>
> In the bug report it's said that this commit is the culprit:
>
> 3f3c278c94dd bonding: fix active-backup transition

This commit references another one. ie. commit c4adfc822bf5 ("bonding:
make speed, duplex setting consistent with link state"). Before this
commit the result of __ethtool_get_link_ksettings() was simply ignored.

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -365,9 +365,10 @@ int bond_set_carrier(struct bonding *bond)
/* Get link speed and duplex from the slave's base driver
* using ethtool. If for some reason the call fails or the
* values are invalid, set speed and duplex to -1,
- * and return.
+ * and return. Return 1 if speed or duplex settings are
+ * UNKNOWN; 0 otherwise.
*/
-static void bond_update_speed_duplex(struct slave *slave)
+static int bond_update_speed_duplex(struct slave *slave)
{
struct net_device *slave_dev = slave->dev;
struct ethtool_link_ksettings ecmd;
@@ -377,24 +378,27 @@ static void bond_update_speed_duplex(struct slave
*slave)
slave->duplex = DUPLEX_UNKNOWN;

res = __ethtool_get_link_ksettings(slave_dev, &ecmd);
- if (res < 0)
- return;
-
- if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1))
- return;
-
+ if (res < 0) {
+ slave->link = BOND_LINK_DOWN;
+ return 1;
+ }
+ if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1)) {
+ slave->link = BOND_LINK_DOWN;
+ return 1;
+ }

Commit 3f3c278c94dd ("bonding: fix active-backup transition") moves
setting the link state to the call sites of bond_update_speed_duplex(),
just not all call sites.

> Is there a fix for this or should that commit be reverted? This seems to
> be a serious regression as there are multiple reports already and we
> should get it fixed for v4.13, and the fix backported to v4.12 stable
> release.

The ethtool callbacks really seem optional. At least in brcmfmac, the
wireless driver I maintain, I only provide get_drvinfo callback and
there is no warning triggered upon registering the netdev. The changes
above now require each netdev to implement the get_link_ksettings
callback (get_settings is deprecated) or the link is marked as DOWN
ruling it out to be used as active bond slave. To the end-users who were
using bonding this is simply a regression. So to fix that both changes
should be reverted in my opinion.

Now specifically for wireless interfaces we could implement
get_link_ksettings callback although most of the fields requested are
meaningless in wireless context. Regarding the speed and half-duplex
values we raised some concerns in an earlier discussion with James.
Wireless is always half-duplex as there can be only one (unintended ref
to [1]). If the reported speed in wifi is difficult. In wifi we have
txrate and rxrate which are inherently asynchronous and it is a
per-packet value so it is going to change a lot. Seeing only 4 call
sites in the bonding code tells me that is not taken into account. All
in all this shenanigan seems netconf material to me.

Regards,
Arend

[1] https://en.wikipedia.org/wiki/Highlander_(film)

2017-08-16 21:01:54

by David Miller

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

From: James Feeney <[email protected]>
Date: Wed, 16 Aug 2017 14:44:27 -0600

> On 08/13/2017 11:42 AM, Andreas Born wrote:
>> On a side note I would recommend some of my own reading to you about
>> patch submission in general [1] and on netdev specifically [2].
>
> Mmm - [2] and [3], I suspect. Thanks Andreas. I'll be studying those. Yeah,
> I'm still learning what is needed and what works. Sometimes, just a note to the
> author is more than enough to resolve a problem. Sometimes, discussion is
> needed. And other times... well, certain people are infamous... but no problem
> here, thankfully.
>
>> And, just wondering, who's going to eventually close that bugreport?
>> https://bugzilla.kernel.org/show_bug.cgi?id=196547
>
> I can close it when the patches actually land in the kernel.

All of the patches are in Linus's tree.

2017-08-17 02:36:58

by Ben Greear

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

On 08/16/2017 07:11 PM, Dan Williams wrote:
> On Wed, 2017-08-16 at 14:31 -0700, David Miller wrote:
>> From: Dan Williams <[email protected]>
>> Date: Wed, 16 Aug 2017 16:22:41 -0500
>>
>>> My biggest suggestion is that perhaps bonding should grow
>> hysteresis
>>> for link speeds. Since WiFi can change speed every packet, you
>> probably
>>> don't want the bond characteristics changing every couple seconds
>> just
>>> in case your WiFi link is jumping around. Ethernet won't bounce
>> around
>>> that much, so the hysteresis would have no effect there. Or, if
>> people
>>> are concerned about response time to speed changes on ethernet
>> (where
>>> you probably do want an instant switch-over) some new flag to
>> indicate
>>> that certain devices don't have stable speeds over time.
>>
>> Or just report the average of the range the wireless link can hit,
>> and
>> be done with it.
>>
>> I think you guys are overcomplicating things.
>
> That range can be from 1 to > 800Mb/s. No, it won't usually be all
> over that range, but it won't be uncommon to fluctuate by hundreds of
> Mb/s. I'm not sure a simple average is really the answer here. Even
> doing that would require new knobs to ethtool, since the rate depends
> heavily on card capabilities and also what AP you're connected to *at
> that moment*. If you roam to another AP, then the max speed can
> certainly change.
>
> You'll probably say "aim for the 75% case" or something like that,
> which is fine, but then you're depending on your 75% case to be (a)
> single AP, (b) never move (eg, only bond wifi + ethernet), (c) little
> radio interference. I'm not sure I'd buy that. If I've put words in
> your mouth, forgive me.

If you keep ethtool API simple and just return the last (rx-rate + tx-rate) / 2, or the rate averaged
over the last 100 frames or 10 seconds, then the caller can do longer term averaging
as it sees fit. Probably no need for lots of averaging complexity in the kernel.

rate-ctrl for wifi basically doesn't happen until you transmit or receive a
fairly steady stream, so it will fluctuate a lot.

Thanks,
Ben

>
> Dan
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2017-08-16 20:45:39

by James Feeney

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

On 08/13/2017 11:42 AM, Andreas Born wrote:
> On a side note I would recommend some of my own reading to you about
> patch submission in general [1] and on netdev specifically [2].

Mmm - [2] and [3], I suspect. Thanks Andreas. I'll be studying those. Yeah,
I'm still learning what is needed and what works. Sometimes, just a note to the
author is more than enough to resolve a problem. Sometimes, discussion is
needed. And other times... well, certain people are infamous... but no problem
here, thankfully.

> And, just wondering, who's going to eventually close that bugreport?
> https://bugzilla.kernel.org/show_bug.cgi?id=196547

I can close it when the patches actually land in the kernel. I'm glad to see
that there was an "Ack" from Mahesh.

On the topic of wireless support for kernel ethtool reporting, I'm wondering, is
there is any consensus about that?

And, for instance, is there any *other* way for the bonding module to make
"better link" decisions for wireless links? As "wireless" becomes more capable,
possibly more diverse, and probably more essential for computing, this is likely
to become a bigger issue.

Ben Greear mentioned that he had added some support to the ath10k driver. Dan
Williams mentioned the possibility of updating the mac80211 stack for support.
And Arend van Spriel suggested that the issue might best be left for the next
Netconf.

Immediate problem solved, but maybe a larger issue still needs to be addressed?

James

2017-08-16 21:31:17

by David Miller

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

From: Dan Williams <[email protected]>
Date: Wed, 16 Aug 2017 16:22:41 -0500

> My biggest suggestion is that perhaps bonding should grow hysteresis
> for link speeds. Since WiFi can change speed every packet, you probably
> don't want the bond characteristics changing every couple seconds just
> in case your WiFi link is jumping around. Ethernet won't bounce around
> that much, so the hysteresis would have no effect there. Or, if people
> are concerned about response time to speed changes on ethernet (where
> you probably do want an instant switch-over) some new flag to indicate
> that certain devices don't have stable speeds over time.

Or just report the average of the range the wireless link can hit, and
be done with it.

I think you guys are overcomplicating things.

2017-08-12 07:35:50

by Kalle Valo

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

Kalle Valo <[email protected]> writes:

> Andreas Born <[email protected]> writes:
>
>> Earlier today I submitted the patch (bonding: require speed/duplex
>> only for 802.3ad, alb and tlb) [2] that only partially reverts what is
>> a regression following my aforementioned logic. This seems to me like
>> the best solution in the short term since it should satisfy both
>> usergroups represented by Mahesh and James and restores consistence
>> with the bonding documentation. James already commented approvingly on
>> that patch in the bug report. [3]
>>
>> Regards
>> Andreas
>>
>> [1]
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/bonding.txt
>
> Great, thanks.
>
> I'll take it the patch is meant for net tree (and not net-next) so that
> it will be fixed for v4.13? Also it should backported to v4.12 stable
> tree. I don't see any mention of that in the patch submission and that's
> why I'm asking.

I see that Dave applied this to the net tree and queued also for stable,
excellent. Thanks everyone!

https://patchwork.ozlabs.org/patch/800080/

--
Kalle Valo

2017-08-16 21:24:24

by Dan Williams

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

On Wed, 2017-08-16 at 14:44 -0600, James Feeney wrote:
> On 08/13/2017 11:42 AM, Andreas Born wrote:
> > On a side note I would recommend some of my own reading to you
> > about
> > patch submission in general [1] and on netdev specifically [2].
>
> Mmm - [2] and [3], I suspect.  Thanks Andreas.  I'll be studying
> those.  Yeah,
> I'm still learning what is needed and what works.  Sometimes, just a
> note to the
> author is more than enough to resolve a problem.  Sometimes,
> discussion is
> needed.  And other times... well, certain people are infamous... but
> no problem
> here, thankfully.
>
> > And, just wondering, who's going to eventually close that
> > bugreport?
> > https://bugzilla.kernel.org/show_bug.cgi?id=196547
>
> I can close it when the patches actually land in the kernel.  I'm
> glad to see
> that there was an "Ack" from Mahesh.
>
> On the topic of wireless support for kernel ethtool reporting, I'm
> wondering, is
> there is any consensus about that?
>
> And, for instance, is there any *other* way for the bonding module to
> make
> "better link" decisions for wireless links?  As "wireless" becomes
> more capable,
> possibly more diverse, and probably more essential for computing,
> this is likely
> to become a bigger issue.
>
> Ben Greear mentioned that he had added some support to the ath10k
> driver.  Dan
> Williams mentioned the possibility of updating the mac80211 stack for
> support.
> And Arend van Spriel suggested that the issue might best be left for
> the next
> Netconf.
>
> Immediate problem solved, but maybe a larger issue still needs to be
> addressed?

Again, it's technically possible to add the link settings support to
wireless drivers. But the issue is around what bonding would do with
that information in its various modes.

My biggest suggestion is that perhaps bonding should grow hysteresis
for link speeds. Since WiFi can change speed every packet, you probably
don't want the bond characteristics changing every couple seconds just
in case your WiFi link is jumping around. Ethernet won't bounce around
that much, so the hysteresis would have no effect there. Or, if people
are concerned about response time to speed changes on ethernet (where
you probably do want an instant switch-over) some new flag to indicate
that certain devices don't have stable speeds over time.

Dan

2017-08-17 03:19:56

by Dan Williams

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

On Wed, 2017-08-16 at 19:36 -0700, Ben Greear wrote:
> On 08/16/2017 07:11 PM, Dan Williams wrote:
> > On Wed, 2017-08-16 at 14:31 -0700, David Miller wrote:
> > > From: Dan Williams <[email protected]>
> > > Date: Wed, 16 Aug 2017 16:22:41 -0500
> > >
> > > > My biggest suggestion is that perhaps bonding should grow
> > >
> > > hysteresis
> > > > for link speeds. Since WiFi can change speed every packet, you
> > >
> > > probably
> > > > don't want the bond characteristics changing every couple
> > > > seconds
> > >
> > > just
> > > > in case your WiFi link is jumping around.  Ethernet won't
> > > > bounce
> > >
> > > around
> > > > that much, so the hysteresis would have no effect there.  Or,
> > > > if
> > >
> > > people
> > > > are concerned about response time to speed changes on ethernet
> > >
> > > (where
> > > > you probably do want an instant switch-over) some new flag to
> > >
> > > indicate
> > > > that certain devices don't have stable speeds over time.
> > >
> > > Or just report the average of the range the wireless link can
> > > hit,
> > > and
> > > be done with it.
> > >
> > > I think you guys are overcomplicating things.
> >
> > That range can be from 1 to > 800Mb/s.  No, it won't usually be all
> > over that range, but it won't be uncommon to fluctuate by hundreds
> > of
> > Mb/s.  I'm not sure a simple average is really the answer
> > here.  Even
> > doing that would require new knobs to ethtool, since the rate
> > depends
> > heavily on card capabilities and also what AP you're connected to
> > *at
> > that moment*.  If you roam to another AP, then the max speed can
> > certainly change.
> >
> > You'll probably say "aim for the 75% case" or something like that,
> > which is fine, but then you're depending on your 75% case to be (a)
> > single AP, (b) never move (eg, only bond wifi + ethernet), (c)
> > little
> > radio interference.  I'm not sure I'd buy that.  If I've put words
> > in
> > your mouth, forgive me.
>
> If you keep ethtool API simple and just return the last (rx-rate +
> tx-rate) / 2, or the rate averaged
> over the last 100 frames or 10 seconds, then the caller can do longer
> term averaging
> as it sees fit.  Probably no need for lots of averaging complexity in
> the kernel.

Yeah, that works too, but I was thinking it was better to present the
actual data through ethtool so that things other than bonding could use
it, and since bonding is the thing that actually cares about the
fluctuation, make it do the more extensive processing.

Dan


> rate-ctrl for wifi basically doesn't happen until you transmit or
> receive a
> fairly steady stream, so it will fluctuate a lot.

2017-08-13 17:42:54

by Andreas Born

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

2017-08-12 21:30 GMT+02:00 James Feeney <[email protected]>:
>
>
> Andreas patch failed to address the continuous, *10-times per second* warning
> which will "spam" the log file, sometimes the console, whenever the test fails:
> if (bond_update_speed_duplex(slave) && bond_needs_speed_duplex(bond)) {...}
> which then has:
> netdev_warn(bond->dev, "failed to get link speed/duplex for%s\n",
> slave->dev->name);
>
> That is the sort of irresponsible code that "works fine" as long as there are no
> errors, and it never actually runs!
>
> I'm guessing that the simple fix is to use "net_warn_ratelimited()" instead of
> "netdev_warn()", where net/core/utils.c says:
>
> /*
> * All net warning printk()s should be guarded by this function.
> */
> int net_ratelimit(void)
> {
> return __ratelimit(&net_ratelimit_state);
> }
>
> though Andreas has also suggested "pr_warn_ratelimited()", which instead uses
> "__ratelimit(&_rs)".
>
> Here's a link to the rate-limiting patch, after Andreas' patch is already
> applied, since you say that David has already applied the first patch:
>
> https://bugzilla.kernel.org/attachment.cgi?id=257903
>

The other day I also sent a patch for part 2 of your bug report on
this list. Better safe than sorry by doing one thing per patch. This
second patch received feedback and its v2 is currently awaiting
review. Sorry, for informing you only now. I should have cc'd you in
the first place. You can lookup the state of that patch on patchwork.
[1] It's basically the same as previous versions just changing the
original code even less.
.
I sort of expect that this second patch won't be queued for stable.
But let's see... Essentially it's a regression too and additionally
would've been fixed by a plain revert.

On a side note I would recommend some of my own reading to you about
patch submission in general [1] and on netdev specifically [2].
Putting your suggested changes in the right form makes everyone's life
easier especially your own saving you lots of back and forth by mail.
Seeing the amount of mail on this list during the last days was reason
enough to comprehend the necessity of those standards.

And, just wondering, who's going to eventually close that bugreport?
https://bugzilla.kernel.org/show_bug.cgi?id=196547

Cheers
Andreas

[1] https://patchwork.ozlabs.org/patch/800770/
[2] https://www.kernel.org/doc/html/latest/process/submitting-patches.html
[3] https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/tree/Documentation/networking/netdev-FAQ.txt

2017-08-17 02:42:06

by David Miller

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

From: Dan Williams <[email protected]>
Date: Wed, 16 Aug 2017 21:11:47 -0500

> You'll probably say "aim for the 75% case" or something like that,
> which is fine, but then you're depending on your 75% case to be (a)
> single AP, (b) never move (eg, only bond wifi + ethernet), (c) little
> radio interference. I'm not sure I'd buy that. If I've put words in
> your mouth, forgive me.

You can interpret what I'm saying as "some reasonable value is
better than no value at all."

You can keep arguing about AP changes, radio interference, etc.
but anything is better than the current situation.

Start small and simple, try to handle everything over time and
gradually.

2017-08-12 19:31:30

by James Feeney

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

Hey Kalle

Still, a problem:

On 08/12/2017 01:35 AM, Kalle Valo wrote:
> Kalle Valo <[email protected]> writes:
>
>> Andreas Born <[email protected]> writes:
>>
>>> Earlier today I submitted the patch (bonding: require speed/duplex
>>> only for 802.3ad, alb and tlb) [2] that only partially reverts what is
>>> a regression following my aforementioned logic. This seems to me like
>>> the best solution in the short term since it should satisfy both
>>> usergroups represented by Mahesh and James and restores consistence
>>> with the bonding documentation. James already commented approvingly on
>>> that patch in the bug report. [3]
>>>
>>> Regards
>>> Andreas
>>>
>>> [1]
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/bonding.txt
>>
>> Great, thanks.
>>
>> I'll take it the patch is meant for net tree (and not net-next) so that
>> it will be fixed for v4.13? Also it should backported to v4.12 stable
>> tree. I don't see any mention of that in the patch submission and that's
>> why I'm asking.
>
> I see that Dave applied this to the net tree and queued also for stable,
> excellent. Thanks everyone!
>
> https://patchwork.ozlabs.org/patch/800080/
>

Andreas patch failed to address the continuous, *10-times per second* warning
which will "spam" the log file, sometimes the console, whenever the test fails:
if (bond_update_speed_duplex(slave) && bond_needs_speed_duplex(bond)) {...}
which then has:
netdev_warn(bond->dev, "failed to get link speed/duplex for%s\n",
slave->dev->name);

That is the sort of irresponsible code that "works fine" as long as there are no
errors, and it never actually runs!

I'm guessing that the simple fix is to use "net_warn_ratelimited()" instead of
"netdev_warn()", where net/core/utils.c says:

/*
* All net warning printk()s should be guarded by this function.
*/
int net_ratelimit(void)
{
return __ratelimit(&net_ratelimit_state);
}

though Andreas has also suggested "pr_warn_ratelimited()", which instead uses
"__ratelimit(&_rs)".

Here's a link to the rate-limiting patch, after Andreas' patch is already
applied, since you say that David has already applied the first patch:

https://bugzilla.kernel.org/attachment.cgi?id=257903


Thanks
James

2017-08-17 05:33:51

by Jay Vosburgh

[permalink] [raw]
Subject: Re: Regression: Bug 196547 - Since 4.12 - bonding module not working with wireless drivers

Dan Williams <[email protected]> wrote:
[...]
>You'll probably say "aim for the 75% case" or something like that,
>which is fine, but then you're depending on your 75% case to be (a)
>single AP, (b) never move (eg, only bond wifi + ethernet), (c) little
>radio interference. I'm not sure I'd buy that. If I've put words in
>your mouth, forgive me.

The primary use case that I'm aware of for bonding with wireless
devices is in active-backup mode, paired with an ethernet adapter set as
the bonding "primary" device. I think this is the case that absolutely
should just work. I don't think bonding cares (or should care) about
(a) - (c) for this use.

Your point (b) suggests that there are use cases other than the
above; I'm unfamiliar with any use other than wifi + ethernet, can you
elaborate?

-J

---
-Jay Vosburgh, [email protected]