2024-06-08 01:48:13

by John Thomson

[permalink] [raw]
Subject: [RFC net-next] net: dsa: generate port ifname if exists or invalid

In the case where a DSA port (via DTB label) had an interface name
that collided with an existing netdev name, register_netdevice failed
with -EEXIST, and the port was not usable. While this did correctly
identify a configuration error in DTB, rather bringup the port with an
enumerated interface name, which can be renamed later from userspace
where required.
While this does change the implicit expectation that it is an error if
the DSA port cannot use it's predictable (DTS label) name, there is no
functionality to stop netdev from allocating one of these (perhaps
poorly selected) DSA port names to a non-DSA device before the DSA
device can.

While at it, also test that the port name is a valid interface name,
before doing the work to setup the device, and use an enumerated name
otherwise.

This was seen recently (for the EdgeRouter X device) in OpenWrt when a
downstream hack [1] was removed, which had used DTS label for ifname
in an ethernet device driver, in favour of renaming ifnames in userspace.
At the time the device was added to OpenWrt, it only used one network
device driver interface, plus the switch ports, so eth1 (matching physical
labelling) was used as a switch port label. Since, this device has
been adjusted to use phy muxing, exposing a switch port instead as the
second network device, so at bringup for this DSA port, eth1
(which is later renamed in userspace) exists, and the eth1 labelled
DSA port cannot be used.

[1]: https://lore.kernel.org/netdev/[email protected]/

Signed-off-by: John Thomson <[email protected]>
---

RFC:
Not a full solution.

Not sure if supported, I cannot see any users in tree DTS,
but I guess I would need to skip these checks (and should mark as
NEM_NAME_ENUM) if port->name contains '%'.

name is also used in alloc_netdev_mqs, and I have not worked out if any
of the functionality between alloc_netdev_mqs and the register_netdevice
uses name, so I added these test early, but believe without a rntl lock,
a colliding name could still be allocated to another device between this
introduced test, and where this device does lock and register_netdevice
near the end of this function.
To deal with this looks to require moving the rntl_lock before
these tests, which would lock around significantly more.

As an alternative, could we possibly always register an enumerated name,
then (if name valid) dev_change_name (not exported), while still within
the lock after register_netdevice?

Or could we introduce a parameter or switch-level DTS property that forces
DSA to ignore port labels, so that all network devices names can be
managed from userspace (using the existing port DSA label as intended name,
as this still seems the best place to define device labels, even if the
driver does not use this label)?

Cheers
---
net/dsa/user.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/net/dsa/user.c b/net/dsa/user.c
index 867c5fe9a4da..347d2d8eb219 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -2684,6 +2684,7 @@ int dsa_user_create(struct dsa_port *port)
struct dsa_switch *ds = port->ds;
struct net_device *user_dev;
struct dsa_user_priv *p;
+ bool valid_name = false;
const char *name;
int assign_type;
int ret;
@@ -2692,6 +2693,20 @@ int dsa_user_create(struct dsa_port *port)
ds->num_tx_queues = 1;

if (port->name) {
+ if (!netdev_name_in_use(&init_net, port->name))
+ valid_name = true;
+ else
+ netdev_warn(conduit, "port %d set name: %s: already in use\n",
+ port->index, port->name);
+ if (dev_valid_name(port->name)) {
+ valid_name &= true;
+ } else {
+ valid_name = false;
+ netdev_warn(conduit, "port %d set name: %s: is invalid\n",
+ port->index, port->name);
+ }
+ }
+ if (valid_name) {
name = port->name;
assign_type = NET_NAME_PREDICTABLE;
} else {
--
2.45.1



2024-06-13 11:44:17

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [RFC net-next] net: dsa: generate port ifname if exists or invalid

On Sat, Jun 08, 2024 at 11:47:24AM +1000, John Thomson wrote:
> RFC:
> Not a full solution.
>
> Not sure if supported, I cannot see any users in tree DTS,
> but I guess I would need to skip these checks (and should mark as
> NEM_NAME_ENUM) if port->name contains '%'.
>
> name is also used in alloc_netdev_mqs, and I have not worked out if any
> of the functionality between alloc_netdev_mqs and the register_netdevice
> uses name, so I added these test early, but believe without a rntl lock,
> a colliding name could still be allocated to another device between this
> introduced test, and where this device does lock and register_netdevice
> near the end of this function.
> To deal with this looks to require moving the rntl_lock before
> these tests, which would lock around significantly more.
>
> As an alternative, could we possibly always register an enumerated name,
> then (if name valid) dev_change_name (not exported), while still within
> the lock after register_netdevice?
>
> Or could we introduce a parameter or switch-level DTS property that forces
> DSA to ignore port labels, so that all network devices names can be
> managed from userspace (using the existing port DSA label as intended name,
> as this still seems the best place to define device labels, even if the
> driver does not use this label)?

Why not just _not_ use the 'label' device tree property, and bring
a decent udev implementation into OpenWrt which can handle persistent
naming according to the labels on the box? Even within DSA, it is
considered better practice to use udev rather than 'label'. Not to
mention that once available, udev is a uniform solution for all network
interfaces, unlike 'label'.

Full disclosure: I myself tried for about 30 minutes to convert the udev
rules below into an /etc/hotplug.d script that procd would run, before
getting the impression it's never going to work as intended, because by
the time all relevant "add" actions run (built-in drivers), user space
hasn't even loaded, and thus hasn't got a chance to run any hooks.
I haven't actually opened the source code to compare how other uevent
handlers deal with this.

ACTION=="add", SUBSYSTEM=="net", KERNELS=="0000:00:00.5", DRIVERS=="mscc_felix", ATTR{phys_port_name}=="p0", NAME="swp0"
ACTION=="add", SUBSYSTEM=="net", KERNELS=="0000:00:00.5", DRIVERS=="mscc_felix", ATTR{phys_port_name}=="p1", NAME="swp1"
ACTION=="add", SUBSYSTEM=="net", KERNELS=="0000:00:00.5", DRIVERS=="mscc_felix", ATTR{phys_port_name}=="p2", NAME="swp2"
ACTION=="add", SUBSYSTEM=="net", KERNELS=="0000:00:00.5", DRIVERS=="mscc_felix", ATTR{phys_port_name}=="p3", NAME="swp3"
ACTION=="add", SUBSYSTEM=="net", KERNELS=="0000:00:00.5", DRIVERS=="mscc_felix", ATTR{phys_port_name}=="p4", NAME="swp4"
ACTION=="add", SUBSYSTEM=="net", KERNELS=="0000:00:00.5", DRIVERS=="mscc_felix", ATTR{phys_port_name}=="p5", NAME="swp5"

2024-06-15 14:49:45

by Daniel Golle

[permalink] [raw]
Subject: Re: [RFC net-next] net: dsa: generate port ifname if exists or invalid

Hi Vladimir,

On Thu, Jun 13, 2024 at 02:43:14PM +0300, Vladimir Oltean wrote:
> On Sat, Jun 08, 2024 at 11:47:24AM +1000, John Thomson wrote:
> > RFC:
> > Not a full solution.
> >
> > Not sure if supported, I cannot see any users in tree DTS,
> > but I guess I would need to skip these checks (and should mark as
> > NEM_NAME_ENUM) if port->name contains '%'.
> >
> > name is also used in alloc_netdev_mqs, and I have not worked out if any
> > of the functionality between alloc_netdev_mqs and the register_netdevice
> > uses name, so I added these test early, but believe without a rntl lock,
> > a colliding name could still be allocated to another device between this
> > introduced test, and where this device does lock and register_netdevice
> > near the end of this function.
> > To deal with this looks to require moving the rntl_lock before
> > these tests, which would lock around significantly more.
> >
> > As an alternative, could we possibly always register an enumerated name,
> > then (if name valid) dev_change_name (not exported), while still within
> > the lock after register_netdevice?
> >
> > Or could we introduce a parameter or switch-level DTS property that forces
> > DSA to ignore port labels, so that all network devices names can be
> > managed from userspace (using the existing port DSA label as intended name,
> > as this still seems the best place to define device labels, even if the
> > driver does not use this label)?
>
> Why not just _not_ use the 'label' device tree property, and bring
> a decent udev implementation into OpenWrt which can handle persistent
> naming according to the labels on the box? Even within DSA, it is
> considered better practice to use udev rather than 'label'. Not to
> mention that once available, udev is a uniform solution for all network
> interfaces, unlike 'label'.

Sounds fine generally. Where would you store the device-specific renaming
rules while making sure you don't need to carry the rules for all devices
onto every single device? Would you generate a device-specific rootfs for
each and every device? For obvious reasons this is something we'd very
much like to avoid, as building individual filesystems for ~ 1000 devices
would be insane compared to having a bunch (< 100) of generic filesystems
which some of them fitting a large group (ie. same SoC) of boards.
Most OpenWrt devices out there are based on the same SoCs, so currently
the devices in the popular targets like MT7621 or IPQ40xx all share the
same target-wide kernel **and rootfs**.

tl;dr: The good thing about the 'label' property is certainly that such
board- specific details are kept in DT, and hence a generic rootfs can
deal with it.

As having the 'label' property applied also for non-DSA netdevs by the
kernel has been rejected we did come up with a simple userland
implementation:

https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=2a25c6ace8d833cf491a66846a0b9e7c5387b8f0

For interfaces added at a later stage at boot, ie. by loading kernel modules
or actual hotplug, we could do the same in a hotplug script.

So yes, dropping support for dealing with the 'label' property in kernel
entirely would also fix it for us, because then we would just always deal
with it in userland (still using the same property in DT, just not applied
by the kernel).

>
> Full disclosure: I myself tried for about 30 minutes to convert the udev
> rules below into an /etc/hotplug.d script that procd would run, before
> getting the impression it's never going to work as intended, because by
> the time all relevant "add" actions run (built-in drivers), user space
> hasn't even loaded, and thus hasn't got a chance to run any hooks.
> I haven't actually opened the source code to compare how other uevent
> handlers deal with this.
>
> ACTION=="add", SUBSYSTEM=="net", KERNELS=="0000:00:00.5", DRIVERS=="mscc_felix", ATTR{phys_port_name}=="p0", NAME="swp0"
> ACTION=="add", SUBSYSTEM=="net", KERNELS=="0000:00:00.5", DRIVERS=="mscc_felix", ATTR{phys_port_name}=="p1", NAME="swp1"
> ACTION=="add", SUBSYSTEM=="net", KERNELS=="0000:00:00.5", DRIVERS=="mscc_felix", ATTR{phys_port_name}=="p2", NAME="swp2"
> ACTION=="add", SUBSYSTEM=="net", KERNELS=="0000:00:00.5", DRIVERS=="mscc_felix", ATTR{phys_port_name}=="p3", NAME="swp3"
> ACTION=="add", SUBSYSTEM=="net", KERNELS=="0000:00:00.5", DRIVERS=="mscc_felix", ATTR{phys_port_name}=="p4", NAME="swp4"
> ACTION=="add", SUBSYSTEM=="net", KERNELS=="0000:00:00.5", DRIVERS=="mscc_felix", ATTR{phys_port_name}=="p5", NAME="swp5"
>

Yes, this is a problem in general. We will need better coldplug support,
right now only devices added after procd is launched are taken care of.


Cheers


Daniel