2024-04-25 15:40:39

by Lukasz Majewski

Subject: [net-next PATCH] hsr: Simplify code for announcing HSR nodes timer setup

Until now, the code starting the HSR announce timer, which triggers the
sending of supervisory frames, assumed that hsr_netdev_notify() would be
called at least twice for the hsrX interface. This was required to
observe different old and current values of the network device's
operstate.

This is a problem if the hsrX interface is already operational by the
time hsr_netdev_notify() is called: the timer is never armed and, as a
result, hsrX does not send supervisory frames to the HSR ring.

This error was discovered when running the hsr_ping.sh script. More
specifically, for hsr1 and hsr2, hsr_netdev_notify() was called at
least twice, with different IF_OPER_{LOWERLAYERDOWN|DOWN|UP} states
assigned in hsr_check_carrier_and_operstate(hsr), so supervisory frames
were sent without issue.
With hsr3, however, the notify function was called only once, with
operstate already set to IF_OPER_UP, and the timer responsible for
triggering supervisory frames was never started.

The solution is to use the netif_oper_up() helper function to check
whether the network device is up and, if so, set up the timer.
Otherwise the timer is deactivated.

Signed-off-by: Lukasz Majewski <[email protected]>
---
net/hsr/hsr_device.c | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
index cd1e7c6d2fc0..e91d897e2cee 100644
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -61,39 +61,34 @@ static bool hsr_check_carrier(struct hsr_port *master)
 	return false;
 }
 
-static void hsr_check_announce(struct net_device *hsr_dev,
-			       unsigned char old_operstate)
+static void hsr_check_announce(struct net_device *hsr_dev)
 {
 	struct hsr_priv *hsr;
 
 	hsr = netdev_priv(hsr_dev);
-
-	if (READ_ONCE(hsr_dev->operstate) == IF_OPER_UP && old_operstate != IF_OPER_UP) {
+	if (netif_oper_up(hsr_dev)) {
 		/* Went up */
 		hsr->announce_count = 0;
 		mod_timer(&hsr->announce_timer,
 			  jiffies + msecs_to_jiffies(HSR_ANNOUNCE_INTERVAL));
-	}
-
-	if (READ_ONCE(hsr_dev->operstate) != IF_OPER_UP && old_operstate == IF_OPER_UP)
+	} else {
 		/* Went down */
 		del_timer(&hsr->announce_timer);
+	}
 }
 
 void hsr_check_carrier_and_operstate(struct hsr_priv *hsr)
 {
 	struct hsr_port *master;
-	unsigned char old_operstate;
 	bool has_carrier;
 
 	master = hsr_port_get_hsr(hsr, HSR_PT_MASTER);
 	/* netif_stacked_transfer_operstate() cannot be used here since
 	 * it doesn't set IF_OPER_LOWERLAYERDOWN (?)
 	 */
-	old_operstate = READ_ONCE(master->dev->operstate);
 	has_carrier = hsr_check_carrier(master);
 	hsr_set_operstate(master, has_carrier);
-	hsr_check_announce(master->dev, old_operstate);
+	hsr_check_announce(master->dev);
 }
 
 int hsr_get_max_mtu(struct hsr_priv *hsr)
--
2.20.1
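A minimal user-space model of the two checks makes the hsr3 failure mode concrete. This is not kernel code: the enum and function names below are hypothetical stand-ins for the IF_OPER_* values and for the old and new bodies of hsr_check_announce(). It shows why a single notification with operstate already at IF_OPER_UP never arms the timer under the old edge-triggered comparison, while the level-triggered check does:

```c
/* Simplified, hypothetical stand-ins for the kernel's IF_OPER_* states. */
enum sim_oper { SIM_OPER_DOWN, SIM_OPER_LOWERLAYERDOWN, SIM_OPER_UP };

/* What the notifier would do with the announce timer. */
enum sim_action { SIM_NONE, SIM_ARM, SIM_DISARM };

/* Old logic: act only when operstate changes across the notification. */
enum sim_action edge_check(enum sim_oper old, enum sim_oper cur)
{
	if (cur == SIM_OPER_UP && old != SIM_OPER_UP)
		return SIM_ARM;		/* "Went up" */
	if (cur != SIM_OPER_UP && old == SIM_OPER_UP)
		return SIM_DISARM;	/* "Went down" */
	return SIM_NONE;		/* no transition seen: timer untouched */
}

/* Proposed logic: decide from the current operstate alone. */
enum sim_action level_check(enum sim_oper cur)
{
	return cur == SIM_OPER_UP ? SIM_ARM : SIM_DISARM;
}
```

With the hsr3 sequence described above, edge_check(SIM_OPER_UP, SIM_OPER_UP) yields SIM_NONE, whereas level_check(SIM_OPER_UP) yields SIM_ARM.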



2024-04-27 00:33:32

by Jakub Kicinski

Subject: Re: [net-next PATCH] hsr: Simplify code for announcing HSR nodes timer setup

On Thu, 25 Apr 2024 17:39:58 +0200 Lukasz Majewski wrote:
> [...]
>
> The solution is to use the netif_oper_up() helper function to check
> whether the network device is up and, if so, set up the timer.
> Otherwise the timer is deactivated.

NETDEV_CHANGE can get called for multiple trivial reasons, if the timer
is already running we'll mess with the spacing of the frames, no?

If there is a path where the device may get activated without the
notifier firing - maybe we can check carrier there and schedule the
timer?

Also sounds like a bug fix, so please add a Fixes tag.
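The spacing concern can be illustrated with a small user-space simulation. As a simplification, it assumes that the announce timer re-arms itself exactly one period after each frame, and that every spurious NETDEV_CHANGE leads to an unconditional mod_timer(now + period), as the patched hsr_check_announce() would do while the device stays up. sim_announce() is a hypothetical helper, not kernel code:

```c
#include <stddef.h>

/*
 * Simulate a self-rearming periodic timer (times in ms, armed at t = 0)
 * that is also pushed out to "event + period" by each spurious notifier
 * event, mimicking an unconditional mod_timer() call. events[] must be
 * sorted. Frame send times up to `horizon` are written to out[]; the
 * number of frames recorded is returned.
 */
size_t sim_announce(int period, const int *events, size_t n_events,
		    int horizon, int *out, size_t max_out)
{
	int expires = period;
	size_t e = 0, n = 0;

	for (;;) {
		/* A notifier event before the expiry re-arms the timer. */
		if (e < n_events && events[e] < expires) {
			expires = events[e] + period;
			e++;
			continue;
		}
		if (expires > horizon || n == max_out)
			break;
		out[n++] = expires;	/* timer fires: frame sent */
		expires += period;	/* callback re-arms itself */
	}
	return n;
}
```

For example, with a 100 ms period and spurious events at 150 ms and 230 ms, frames go out at 100, 330 and 430 ms, so one gap stretches to 230 ms; without events they go out every 100 ms.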

2024-04-29 10:09:25

by Lukasz Majewski

Subject: Re: [net-next PATCH] hsr: Simplify code for announcing HSR nodes timer setup

Hi Jakub,

> On Thu, 25 Apr 2024 17:39:58 +0200 Lukasz Majewski wrote:
> > [...]
>
> NETDEV_CHANGE can get called for multiple trivial reasons,

I assumed that NETDEV_CHANGE would be called only when the link state
changes, i.e. the interface or its carrier goes down/up.

The timer shall be running _only_ when the hsrX port is fully
operational (i.e. at least one of the 'slave' ports is up and running).

The motivation for this patch was to enable the HSR announce timer not
only on a state change, but also when the Ethernet device is already up
(as happens with a QEMU + netns setup).


> if the
> timer is already running we'll mess with the spacing of the frames,
> no?

When NETDEV_CHANGE is triggered for a reason other than a carrier (or
port state) change and netif_oper_up() returns true, the period of the
HSR supervisory frames (i.e. HSR_ANNOUNCE_INTERVAL) would be violated.

What are the potential threats here?

>
> If there is a path where the device may get activated without the
> notifier firing - maybe we can check carrier there and schedule the
> timer?

As I've stated above, IMHO the "announce" supervisory frames shall be
sent only when the HSR interface is up and running.

>
> Also sounds like a bug fix, so please add a Fixes tag.

Ok.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Erika Unter
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: [email protected]


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2024-04-29 17:41:36

by Jakub Kicinski

Subject: Re: [net-next PATCH] hsr: Simplify code for announcing HSR nodes timer setup

On Mon, 29 Apr 2024 12:09:04 +0200 Lukasz Majewski wrote:
> > if the
> > timer is already running we'll mess with the spacing of the frames,
> > no?
>
> When NETDEV_CHANGE is triggered for a reason other than a carrier (or
> port state) change and netif_oper_up() returns true, the period of the
> HSR supervisory frames (i.e. HSR_ANNOUNCE_INTERVAL) would be violated.
>
> What are the potential threats here?

Practically speaking I'm not sure if anyone uses any of the weird IFF_*
flags, but they are defined in uAPI (enum net_device_flags) and I don't
see much validation so presumably it's possible to flip them.

2024-04-30 12:55:08

by Lukasz Majewski

Subject: Re: [net-next PATCH] hsr: Simplify code for announcing HSR nodes timer setup

Hi Jakub,

> On Mon, 29 Apr 2024 12:09:04 +0200 Lukasz Majewski wrote:
> > [...]
>
> Practically speaking I'm not sure if anyone uses any of the weird
> IFF_* flags, but they are defined in uAPI (enum net_device_flags) and
> I don't see much validation so presumably it's possible to flip them.

Ok, I see.

Then what would you recommend instead? The approach of manually
checking the previous state has the drawbacks described above.

I've poked around the kernel sources and it looks like netif_oper_up()
is used together with netif_running():

netif_running(dev) && netif_oper_up(dev)

so IMHO netif_running(dev) should be added to the condition as well.


In include/uapi/linux/if.h there are several IF_OPER_* states defined.
It looks to me that the HSR interface shall send announce supervisory
frames only in the IF_OPER_UP state; in all other states the timer
shall be turned off.
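One detail worth keeping in mind here: in the kernels we have looked at, netif_oper_up() returns true not only for IF_OPER_UP but also for IF_OPER_UNKNOWN (for backward compatibility), so it is slightly wider than "IF_OPER_UP only". A user-space sketch of both predicates, using the IF_OPER_* values from <linux/if.h>; the function names are hypothetical:

```c
#include <stdbool.h>
#include <linux/if.h>	/* IF_OPER_* from include/uapi/linux/if.h */

/* Mirrors the netif_oper_up() test: UP, or UNKNOWN for backward compat. */
bool oper_up_like_netif_oper_up(unsigned char operstate)
{
	return operstate == IF_OPER_UP || operstate == IF_OPER_UNKNOWN;
}

/* Stricter predicate matching "announce only in IF_OPER_UP". */
bool hsr_should_announce_strict(unsigned char operstate)
{
	return operstate == IF_OPER_UP;
}
```

The two predicates differ only for IF_OPER_UNKNOWN, which matters if an hsrX device can ever report that state.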


Best regards,

Lukasz Majewski


2024-04-30 14:45:57

by Jakub Kicinski

Subject: Re: [net-next PATCH] hsr: Simplify code for announcing HSR nodes timer setup

On Tue, 30 Apr 2024 14:52:43 +0200 Lukasz Majewski wrote:
> > Practically speaking I'm not sure if anyone uses any of the weird
> > IFF_* flags, but they are defined in uAPI (enum net_device_flags) and
> > I don't see much validation so presumably it's possible to flip them.
>
> Ok, I see.
>
> Then what would you recommend instead? The approach of manually
> checking the previous state has the drawbacks described above.

Add a bool somewhere to track if the timer has been scheduled?
The NETDEV_ events in question are called under rtnl_lock, so
no extra locking should be needed.
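The bool-tracking suggestion, combined with the netif_running() && netif_oper_up() condition proposed earlier in the thread, could look roughly like the following user-space model. struct sim_hsr, its timer_armed field and sim_check_announce() are hypothetical. In the kernel, the arm branch would correspond to resetting announce_count and calling mod_timer(), and the disarm branch to del_timer(). Since the notifier runs under rtnl_lock, the model does no locking. It also assumes nothing else stops the timer behind the flag's back:

```c
#include <stdbool.h>

/* Hypothetical model of the relevant hsr_priv state. */
struct sim_hsr {
	bool timer_armed;	/* the suggested "timer scheduled" flag */
	int arm_calls;		/* stands in for mod_timer() calls */
	int disarm_calls;	/* stands in for del_timer() calls */
};

/*
 * Arm only on the not-armed -> usable transition, disarm only if armed.
 * Repeated notifications while the device stays up leave the running
 * timer alone, so the frame spacing is preserved.
 */
void sim_check_announce(struct sim_hsr *hsr, bool running, bool oper_up)
{
	if (running && oper_up) {
		if (!hsr->timer_armed) {
			hsr->arm_calls++;
			hsr->timer_armed = true;
		}
	} else if (hsr->timer_armed) {
		hsr->disarm_calls++;
		hsr->timer_armed = false;
	}
}
```

A first notification that already sees the device up arms the timer (the hsr3 case), while repeated NETDEV_CHANGE events leave a running timer, and thus the frame spacing, untouched.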