Hi,
Many DSA switches (drivers/net/dsa/) are in fact complex SoCs with more
embedded peripherals than just the Ethernet switching IP. For example,
I was trying to add interrupt support for the internal PHYs of the NXP
SJA1110 switch. For context, one SJA1110 switch as seen in fsl-lx2160a-bluebox3.dts
currently has bindings which look like this:
sw2: ethernet-switch@2 {
compatible = "nxp,sja1110a";
reg = <2>;
spi-max-frequency = <4000000>;
spi-cpol;
dsa,member = <0 1>;
ethernet-ports {
#address-cells = <1>;
#size-cells = <0>;
/* Microcontroller port */
port@0 {
reg = <0>;
status = "disabled";
};
sw2p1: port@1 {
reg = <1>;
link = <&sw1p4>;
phy-mode = "sgmii";
fixed-link {
speed = <1000>;
full-duplex;
};
};
port@2 {
reg = <2>;
ethernet = <&dpmac18>;
phy-mode = "rgmii-id";
rx-internal-delay-ps = <2000>;
tx-internal-delay-ps = <2000>;
fixed-link {
speed = <1000>;
full-duplex;
};
};
port@3 {
reg = <3>;
label = "1ge_p2";
phy-mode = "rgmii-id";
phy-handle = <&sw2_mii3_phy>;
};
port@4 {
reg = <4>;
label = "to_sw3";
phy-mode = "2500base-x";
fixed-link {
speed = <2500>;
full-duplex;
};
};
port@5 {
reg = <5>;
label = "trx7";
phy-mode = "internal";
phy-handle = <&sw2_port5_base_t1_phy>;
};
port@6 {
reg = <6>;
label = "trx8";
phy-mode = "internal";
phy-handle = <&sw2_port6_base_t1_phy>;
};
port@7 {
reg = <7>;
label = "trx9";
phy-mode = "internal";
phy-handle = <&sw2_port7_base_t1_phy>;
};
port@8 {
reg = <8>;
label = "trx10";
phy-mode = "internal";
phy-handle = <&sw2_port8_base_t1_phy>;
};
port@9 {
reg = <9>;
label = "trx11";
phy-mode = "internal";
phy-handle = <&sw2_port9_base_t1_phy>;
};
port@a {
reg = <10>;
label = "trx12";
phy-mode = "internal";
phy-handle = <&sw2_port10_base_t1_phy>;
};
};
mdios {
#address-cells = <1>;
#size-cells = <0>;
mdio@0 {
compatible = "nxp,sja1110-base-t1-mdio";
#address-cells = <1>;
#size-cells = <0>;
reg = <0>;
sw2_port5_base_t1_phy: ethernet-phy@1 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x1>;
};
sw2_port6_base_t1_phy: ethernet-phy@2 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x2>;
};
sw2_port7_base_t1_phy: ethernet-phy@3 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x3>;
};
sw2_port8_base_t1_phy: ethernet-phy@4 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x4>;
};
sw2_port9_base_t1_phy: ethernet-phy@5 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x5>;
};
sw2_port10_base_t1_phy: ethernet-phy@6 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x6>;
};
};
};
};
To add interrupts in a naive way, similar to how other DSA drivers have
done it, it would have to be done like this:
#include <dt-bindings/interrupt-controller/nxp-sja1110-acu-slir.h>
sw2: ethernet-switch@2 {
compatible = "nxp,sja1110a";
reg = <2>;
spi-max-frequency = <4000000>;
spi-cpol;
dsa,member = <0 1>;
slir2: interrupt-controller {
compatible = "nxp,sja1110-acu-slir";
interrupt-controller;
#interrupt-cells = <1>;
interrupt-parent = <&gpio 10>;
};
ethernet-ports {
#address-cells = <1>;
#size-cells = <0>;
/* Microcontroller port */
port@0 {
reg = <0>;
status = "disabled";
};
sw2p1: port@1 {
reg = <1>;
link = <&sw1p4>;
phy-mode = "sgmii";
fixed-link {
speed = <1000>;
full-duplex;
};
};
port@2 {
reg = <2>;
ethernet = <&dpmac18>;
phy-mode = "rgmii-id";
rx-internal-delay-ps = <2000>;
tx-internal-delay-ps = <2000>;
fixed-link {
speed = <1000>;
full-duplex;
};
};
port@3 {
reg = <3>;
label = "1ge_p2";
phy-mode = "rgmii-id";
phy-handle = <&sw2_mii3_phy>;
};
port@4 {
reg = <4>;
label = "to_sw3";
phy-mode = "2500base-x";
fixed-link {
speed = <2500>;
full-duplex;
};
};
port@5 {
reg = <5>;
label = "trx7";
phy-mode = "internal";
phy-handle = <&sw2_port5_base_t1_phy>;
};
port@6 {
reg = <6>;
label = "trx8";
phy-mode = "internal";
phy-handle = <&sw2_port6_base_t1_phy>;
};
port@7 {
reg = <7>;
label = "trx9";
phy-mode = "internal";
phy-handle = <&sw2_port7_base_t1_phy>;
};
port@8 {
reg = <8>;
label = "trx10";
phy-mode = "internal";
phy-handle = <&sw2_port8_base_t1_phy>;
};
port@9 {
reg = <9>;
label = "trx11";
phy-mode = "internal";
phy-handle = <&sw2_port9_base_t1_phy>;
};
port@a {
reg = <10>;
label = "trx12";
phy-mode = "internal";
phy-handle = <&sw2_port10_base_t1_phy>;
};
};
mdios {
#address-cells = <1>;
#size-cells = <0>;
mdio@0 {
compatible = "nxp,sja1110-base-t1-mdio";
#address-cells = <1>;
#size-cells = <0>;
reg = <0>;
sw2_port5_base_t1_phy: ethernet-phy@1 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x1>;
interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY1>;
};
sw2_port6_base_t1_phy: ethernet-phy@2 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x2>;
interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY2>;
};
sw2_port7_base_t1_phy: ethernet-phy@3 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x3>;
interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY3>;
};
sw2_port8_base_t1_phy: ethernet-phy@4 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x4>;
interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY4>;
};
sw2_port9_base_t1_phy: ethernet-phy@5 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x5>;
interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY5>;
};
sw2_port10_base_t1_phy: ethernet-phy@6 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x6>;
interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY6>;
};
};
};
};
However, the irq_domain/irqchip handling code in this case will go to
drivers/net/dsa/, and it won't really be a "driver" (there is no struct
device of its own). I don't really like that, the drivers/net/dsa/
folder should ideally contain only drivers for the switching IP.
It doesn't scale to find here code for GPIO, interrupts, MDIO, hwmon and
what have you; not only because the folder gets bloated with irrelevant
stuff, but also because DSA maintainers are not the best reviewers of
drivers which really belong to (and make use of infrastructure from)
other subsystems.
Not only that, but "interrupt-controller" cannot even have a unit
address with these bindings, because it's on the same level as
"ethernet-ports" which requires not having it. But there are multiple
interrupt controllers in the SJA1110 block (I could count 3). Not clear
how the other 3 would be defined in the device tree in this format.
The logical continuation of the existing bindings would be to do what
was done for the multiple MDIO controllers: an "interrupt-controllers"
container node, with children with #address-cells = <1>.
I think that doesn't scale very well either, so I was looking into
transitioning the sja1105 bindings to something similar to what Colin
Foster has done with vsc7512 (ocelot). For this switch, new-style
bindings would look like this:
soc@2 {
compatible = "nxp,sja1110-soc";
reg = <2>;
spi-max-frequency = <4000000>;
spi-cpol;
#address-cells = <1>;
#size-cells = <1>;
sw2: ethernet-switch@0 {
compatible = "nxp,sja1110a";
reg = <0x000000 0x400000>;
resets = <&sw2_rgu SJA1110_RGU_ETHSW_RST>;
dsa,member = <0 1>;
ethernet-ports {
#address-cells = <1>;
#size-cells = <0>;
/* Microcontroller port */
port@0 {
reg = <0>;
status = "disabled";
};
sw2p1: port@1 {
reg = <1>;
link = <&sw1p4>;
phy-mode = "sgmii";
fixed-link {
speed = <1000>;
full-duplex;
};
};
port@2 {
reg = <2>;
ethernet = <&dpmac18>;
phy-mode = "rgmii-id";
rx-internal-delay-ps = <2000>;
tx-internal-delay-ps = <2000>;
fixed-link {
speed = <1000>;
full-duplex;
};
};
port@3 {
reg = <3>;
label = "1ge_p2";
phy-mode = "rgmii-id";
phy-handle = <&sw2_mii3_phy>;
};
port@4 {
reg = <4>;
label = "to_sw3";
phy-mode = "2500base-x";
fixed-link {
speed = <2500>;
full-duplex;
};
};
port@5 {
reg = <5>;
label = "trx7";
phy-mode = "internal";
phy-handle = <&sw2_port5_base_t1_phy>;
};
port@6 {
reg = <6>;
label = "trx8";
phy-mode = "internal";
phy-handle = <&sw2_port6_base_t1_phy>;
};
port@7 {
reg = <7>;
label = "trx9";
phy-mode = "internal";
phy-handle = <&sw2_port7_base_t1_phy>;
};
port@8 {
reg = <8>;
label = "trx10";
phy-mode = "internal";
phy-handle = <&sw2_port8_base_t1_phy>;
};
port@9 {
reg = <9>;
label = "trx11";
phy-mode = "internal";
phy-handle = <&sw2_port9_base_t1_phy>;
};
port@a {
reg = <10>;
label = "trx12";
phy-mode = "internal";
phy-handle = <&sw2_port10_base_t1_phy>;
};
};
};
mdio@704000 {
compatible = "nxp,sja1110-base-t1-mdio";
#address-cells = <1>;
#size-cells = <0>;
reg = <0x704000 0x1000>;
sw2_port5_base_t1_phy: ethernet-phy@1 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x1>;
interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY1>;
};
sw2_port6_base_t1_phy: ethernet-phy@2 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x2>;
interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY2>;
};
sw2_port7_base_t1_phy: ethernet-phy@3 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x3>;
interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY3>;
};
sw2_port8_base_t1_phy: ethernet-phy@4 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x4>;
interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY4>;
};
sw2_port9_base_t1_phy: ethernet-phy@5 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x5>;
interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY5>;
};
sw2_port10_base_t1_phy: ethernet-phy@6 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0x6>;
interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY6>;
};
};
mdio@709000 {
compatible = "nxp,sja1110-base-tx-mdio";
#address-cells = <1>;
#size-cells = <0>;
reg = <0x709000 0x1000>;
};
slir2: interrupt-controller@711fe0 {
compatible = "nxp,sja1110-acu-slir";
reg = <0x711fe0 0x10>;
interrupt-controller;
#interrupt-cells = <1>;
interrupt-parent = <&gpio 10>;
};
gpio@712000 {
compatible = "nxp,sja1110-gpio";
reg = <0x712000 0x1000>;
gpio-controller;
};
slir2_gpio: interrupt-controller@712fe0 {
compatible = "nxp,sja1110-gpio-slir";
reg = <0x712fe0 0x10>;
interrupt-controller;
#interrupt-cells = <1>;
};
sw2_rgu: reset@718000 {
compatible = "nxp,sja1110-rgu";
reg = <0x718000 0x1000>;
#reset-cells = <1>;
};
sw2_cgu: clock-controller@719000 {
compatible = "nxp,sja1110-cgu";
reg = <0x719000 0x1000>;
#clock-cells = <1>;
};
slir2_pmu: interrupt-controller@71afe0 {
compatible = "nxp,sja1110-pmu-slir";
reg = <0x71afe0 0x10>;
interrupt-controller;
#interrupt-cells = <1>;
};
};
In this model, the Ethernet switch is just one of the children of the
SoC driver (the one owning the spi_device, and creating regmaps for its
children). The DSA driver doesn't have to concern itself with the other
drivers, which can live in their own folders.
It seems like mfd is a tool which could help with the driver for
"nxp,sja1110-soc", but mfd_add_devices() requires an array of struct
mfd_cell, and I don't really like that. I want a driver which just
creates IORESOURCE_REG arrays of resources for each child OF node
according to their "reg" properties, creates regmaps for each resource
using a regmap_config that I give it, associates the regmaps with the
parent using devres, and allocates/probes a platform device driver for
each of these child OF nodes. I don't want to spell out a different
mfd_cell for each driver that I'd like the SoC to probe.
It looks like of_platform_populate() would be an alternative option for
this task, but that doesn't live up to the task either. It will assume
that the addresses of the SoC children are in the CPU's address space
(IORESOURCE_MEM), and attempt to translate them. It simply doesn't have
the concept of IORESOURCE_REG. The MFD drivers which call
of_platform_populate() (simple-mfd-i2c.c) simply don't have unit
addresses for their children, and this is why address translation isn't
a problem for them.
In fact, this seems to be a rather large limitation of include/linux/of_address.h.
Even something as simple as of_address_count() will end up trying to
translate the address into the CPU memory space, so not even open-coding
the resource creation in the SoC driver is as simple as it appears.
Is there a better way than completely open-coding the parsing of the OF
addresses when turning them into IORESOURCE_REG resources (or open-coding
mfd_cells for each child)? Would there be a desire in creating a generic
set of helpers which create platform devices with IORESOURCE_REG resources,
based solely on OF addresses of children? What would be the correct
scope for these helpers?
> I think that doesn't scale very well either, so I was looking into
> transitioning the sja1105 bindings to something similar to what Colin
> Foster has done with vsc7512 (ocelot). For this switch, new-style
> bindings would look like this:
Have you looked at probe ordering issues? MFD devices i've played with
are very independent. They are a bunch of IP blocks sharing a bus. A
switch however is very interconnected.
>
> soc@2 {
> compatible = "nxp,sja1110-soc";
> reg = <2>;
> spi-max-frequency = <4000000>;
> spi-cpol;
> #address-cells = <1>;
> #size-cells = <1>;
>
> sw2: ethernet-switch@0 {
> compatible = "nxp,sja1110a";
> reg = <0x000000 0x400000>;
> resets = <&sw2_rgu SJA1110_RGU_ETHSW_RST>;
> dsa,member = <0 1>;
>
> ethernet-ports {
> #address-cells = <1>;
> #size-cells = <0>;
...
>
> port@3 {
> reg = <3>;
> label = "1ge_p2";
> phy-mode = "rgmii-id";
> phy-handle = <&sw2_mii3_phy>;
> };
So for the switch to probe, the PHY needs to probe first.
> mdio@704000 {
> compatible = "nxp,sja1110-base-t1-mdio";
> #address-cells = <1>;
> #size-cells = <0>;
> reg = <0x704000 0x1000>;
>
> sw2_port5_base_t1_phy: ethernet-phy@1 {
> compatible = "ethernet-phy-ieee802.3-c45";
> reg = <0x1>;
> interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY1>;
> };
For the PHY to probe requires that the interrupt controller probes first.
> slir2: interrupt-controller@711fe0 {
> compatible = "nxp,sja1110-acu-slir";
> reg = <0x711fe0 0x10>;
> interrupt-controller;
> #interrupt-cells = <1>;
> interrupt-parent = <&gpio 10>;
> };
and the interrupt controller requires its parent gpio controller
probes first. I assume this is the host SOC GPIO controller, not the
switches GPIO controller.
> sw2_rgu: reset@718000 {
> compatible = "nxp,sja1110-rgu";
> reg = <0x718000 0x1000>;
> #reset-cells = <1>;
> };
and presumably something needs to hit the reset at some point? Will
there be DT phandles to this?
>
> sw2_cgu: clock-controller@719000 {
> compatible = "nxp,sja1110-cgu";
> reg = <0x719000 0x1000>;
> #clock-cells = <1>;
> };
and phandles to the clock driver?
Before doing too much in this direction, i would want to be sure you
have sufficient control of ordering and the DT loops are not too
complex, that the MFD core and the driver core can actually get
everything probed.
The current way of doing it, with the drivers embedded inside the DSA
driver is that DT blob only exposes what needs to be seen outside of
the DSA driver. And the driver has full control over the order it
probes its internal sub drivers, so ensuring they are probed in the
correct order, and the linking is done in C, not DT, were again the
driver is in full control.
I do however agree that being able to split sub drivers out of the
main driver is a good idea, and putting them in the appropriated
subsystem would help with code review.
Maybe the media subsystem has some pointers how to do this. It also
has complex devices made up from lots of sub devices.
Andrew
On Thu, Dec 22, 2022 at 03:34:27PM +0100, Andrew Lunn wrote:
> > I think that doesn't scale very well either, so I was looking into
> > transitioning the sja1105 bindings to something similar to what Colin
> > Foster has done with vsc7512 (ocelot). For this switch, new-style
> > bindings would look like this:
>
> Have you looked at probe ordering issues? MFD devices i've played with
> are very independent. They are a bunch of IP blocks sharing a bus. A
> switch however is very interconnected.
Some bits are functionally fully independent in a switch SoC as well -
the GPIO controller might have nothing to do with the MDIO controller.
Sure, there might be interdependencies too. That being said, there
shouldn't be probe ordering issues. Children of the soc node can depend
on each other (not circularly), but they are probed in parallel by the
soc driver, so that's not a problem.
> > soc@2 {
> > compatible = "nxp,sja1110-soc";
> > reg = <2>;
> > spi-max-frequency = <4000000>;
> > spi-cpol;
> > #address-cells = <1>;
> > #size-cells = <1>;
> >
> > sw2: ethernet-switch@0 {
> > compatible = "nxp,sja1110a";
> > reg = <0x000000 0x400000>;
> > resets = <&sw2_rgu SJA1110_RGU_ETHSW_RST>;
> > dsa,member = <0 1>;
> >
> > ethernet-ports {
> > #address-cells = <1>;
> > #size-cells = <0>;
>
> ...
>
> >
> > port@3 {
> > reg = <3>;
> > label = "1ge_p2";
> > phy-mode = "rgmii-id";
> > phy-handle = <&sw2_mii3_phy>;
> > };
>
> So for the switch to probe, the PHY needs to probe first.
Yup. This is better/clearer compared to the original binding, where the
mdio was a child of the ethernet-switch, now they can probe truly in
parallel, and fw_devlink can even enforce any ordering it wants.
> > mdio@704000 {
> > compatible = "nxp,sja1110-base-t1-mdio";
> > #address-cells = <1>;
> > #size-cells = <0>;
> > reg = <0x704000 0x1000>;
> >
> > sw2_port5_base_t1_phy: ethernet-phy@1 {
> > compatible = "ethernet-phy-ieee802.3-c45";
> > reg = <0x1>;
> > interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY1>;
> > };
>
> For the PHY to probe requires that the interrupt controller probes first.
Yup. This was actually a problem with fw_devlink with the old bindings,
especially with mv88e6xxx-type bindings, where the interrupt-controller;
property gets applied to the ethernet-switch node, and so, the
interrupts-extended property is practically a backlink.
> > slir2: interrupt-controller@711fe0 {
> > compatible = "nxp,sja1110-acu-slir";
> > reg = <0x711fe0 0x10>;
> > interrupt-controller;
> > #interrupt-cells = <1>;
> > interrupt-parent = <&gpio 10>;
> > };
>
> and the interrupt controller requires its parent gpio controller
> probes first. I assume this is the host SOC GPIO controller, not the
> switches GPIO controller.
Yup, the interrupt-parent is a host interrupt, not something handled by
the switch. Although I've added logic to this irqchip driver (not posted)
which makes the parent interrupt completely optional, and it falls back
to poll mode if the parent IRQ is missing.
>
> > sw2_rgu: reset@718000 {
> > compatible = "nxp,sja1110-rgu";
> > reg = <0x718000 0x1000>;
> > #reset-cells = <1>;
> > };
>
> and presumably something needs to hit the reset at some point? Will
> there be DT phandles to this?
Yes, there already is a phandle in the ethernet-switch node:
resets = <&sw2_rgu SJA1110_RGU_ETHSW_RST>;
This is something which SJA1110 engineers did better than SJA1105: the
reset domains are much better separated. Via the RGU block, one can
individually reset different peripherals.
Here's the content of my #include <dt-bindings/reset/nxp-sja1110-rgu.h>:
/* SPDX-License-Identifier: GPL-2.0+ */
/*
* Device Tree binding constants for NXP SJA1110 Reset Generation Unit
*
* Copyright 2022 NXP
*/
#ifndef __DT_BINDINGS_NXP_SJA1110_RGU_H
#define __DT_BINDINGS_NXP_SJA1110_RGU_H
#define SJA1110_RGU_CBT1_RST 22
#define SJA1110_RGU_PERSS_RST 21
#define SJA1110_RGU_ETHSW_RST 20
#define SJA1110_RGU_MCSS_RST 19
#define SJA1110_RGU_NT1SS_PORT4_RST 18
#define SJA1110_RGU_NT1SS_PORT3_RST 17
#define SJA1110_RGU_NT1SS_PORT2_RST 16
#define SJA1110_RGU_NT1SS_PORT1_RST 15
#define SJA1110_RGU_CBT1_RESUME 14
#define SJA1110_RGU_CHIP_SYS_RST 13
#define SJA1110_RGU_PLL_DISABLE 12
#define SJA1110_RGU_DEVCFG_RST 11
#define SJA1110_RGU_OTP_RST 10
#define SJA1110_RGU_OSC_DISABLE 9
#define SJA1110_RGU_PMC_RST 8
#define SJA1110_RGU_SISS_RST 7
#define SJA1110_RGU_WARM_RST 6
#define SJA1110_RGU_COLD_RST 5
#define SJA1110_RGU_COLD_INP_RST 4
#define SJA1110_RGU_HW_RST 3
#define SJA1110_RGU_HW_INP_RST 2
#define SJA1110_RGU_POR_RST 1
#define SJA1110_RGU_PORTAP_RST 0
#endif /* __DT_BINDINGS_NXP_SJA1110_RGU_H */
I haven't yet discovered the mapping of all peripherals to reset
domains, but the switch reset really resets only the switch IP (this is
necessary to load a different static config into it).
This is different compared to SJA1105, where the XPCS also gets reset
when the switch reset is triggered. That led to workarounds such as me
needing to call xpcs_do_config() from sja1105_static_config_reload() -
every time the switching IP got reset. Food for thought, especially with
Sean Anderson's proposal to treat PCS devices using the regular device
model with independent probe and remove - how independent the PCS truly
is will depend on the hardware integration.
> >
> > sw2_cgu: clock-controller@719000 {
> > compatible = "nxp,sja1110-cgu";
> > reg = <0x719000 0x1000>;
> > #clock-cells = <1>;
> > };
>
> and phandles to the clock driver?
Yup. Consumers of CGU clocks can either be some other SJA1110
peripherals, or external devices altogether, which need to keep a clock
ticking. Currently I don't have a need for a CGU driver, it's there
mostly for illustrative purposes.
> Before doing too much in this direction, i would want to be sure you
> have sufficient control of ordering and the DT loops are not too
> complex, that the MFD core and the driver core can actually get
> everything probed.
Yup, I did think about that.
> The current way of doing it, with the drivers embedded inside the DSA
> driver is that DT blob only exposes what needs to be seen outside of
> the DSA driver. And the driver has full control over the order it
> probes its internal sub drivers, so ensuring they are probed in the
> correct order, and the linking is done in C, not DT, were again the
> driver is in full control.
Calling the sub-functions "drivers" is a bit too much, since in the
Linux device model sense, the old/standard way proposes only a single
driver for a single device structure: the spi_driver / i2c_driver /
mdio_driver / platform_driver for the switch chip as a whole. That
device driver calls dsa_register_switch(), but also optionally calls
mdiobus_register(), gpiochip_add_data(), irq_domain_add_linear(), etc
etc. Basically it registers a single struct device with all the
subsystems that the switching chip needs.
> I do however agree that being able to split sub drivers out of the
> main driver is a good idea, and putting them in the appropriated
> subsystem would help with code review.
Yup. Concretely, here, my problem is somewhat different. It is related
to OF addresses for all those SoC children. Somehow that was a problem
even in the old-style (current) bindings, but in a different way: see
the "mdios" subnode.
Some other mfd drivers which call of_platform_populate() and their
children have unit addresses are all memory-mapped in the CPU address
space. Not the case here.
> Maybe the media subsystem has some pointers how to do this. It also
> has complex devices made up from lots of sub devices.
You mean something like struct v4l2_subdev_ops? This seems like the
precise definition of what I'd like to avoid: a predefined set of
subfunctions decided by the DSA core.
Or maybe something else? To be honest, I don't know much about the media
subsystem. This is what I saw.
> > Maybe the media subsystem has some pointers how to do this. It also
> > has complex devices made up from lots of sub devices.
>
> You mean something like struct v4l2_subdev_ops? This seems like the
> precise definition of what I'd like to avoid: a predefined set of
> subfunctions decided by the DSA core.
>
> Or maybe something else? To be honest, I don't know much about the media
> subsystem. This is what I saw.
Russell King put in some infrastructure where a media 'glue' driver
has a list of other drivers which need to probe and register there
resources with the kernel before it then becomes active and glues all
the parts together. I just know it exists, i've never used it, so i've
no idea if it could be useful or not.
What i'm really trying to say is that we should look outside of netdev
and see if similar problems have been solved somewhere else and all
that is needed is some code copying.
Andrew
On 22/12/2022 14:48, Vladimir Oltean wrote:
> Hi,
(...)
> To add interrupts in a naive way, similar to how other DSA drivers have
> done it, it would have to be done like this:
>
> #include <dt-bindings/interrupt-controller/nxp-sja1110-acu-slir.h>
>
> sw2: ethernet-switch@2 {
> compatible = "nxp,sja1110a";
> reg = <2>;
> spi-max-frequency = <4000000>;
> spi-cpol;
> dsa,member = <0 1>;
>
> slir2: interrupt-controller {
> compatible = "nxp,sja1110-acu-slir";
> interrupt-controller;
> #interrupt-cells = <1>;
> interrupt-parent = <&gpio 10>;
> };
>
> ethernet-ports {
> #address-cells = <1>;
> #size-cells = <0>;
>
just trim the code... we do not need to scroll over unrelated pieces.
> };
>
> mdios {
> #address-cells = <1>;
> #size-cells = <0>;
>
> mdio@0 {
> compatible = "nxp,sja1110-base-t1-mdio";
> #address-cells = <1>;
> #size-cells = <0>;
> reg = <0>;
>
> sw2_port5_base_t1_phy: ethernet-phy@1 {
> compatible = "ethernet-phy-ieee802.3-c45";
> reg = <0x1>;
> interrupts-extended = <&slir2 SJA1110_IRQ_CBT1_PHY1>;
> };
>
...
> };
> };
>
> However, the irq_domain/irqchip handling code in this case will go to
> drivers/net/dsa/, and it won't really be a "driver" (there is no struct
Why? Devicetree hierarchy has nothing to do with Linux driver hierarchy
and nothing stops you from putting irqchip code in respective directory
for such DT. Your parent device can be MFD, can be same old DSA switch
driver etc. Several options.
Best regards,
Krzysztof
On Fri, Dec 23, 2022 at 09:44:14AM +0100, Krzysztof Kozlowski wrote:
> just trim the code... we do not need to scroll over unrelated pieces.
ok
> > However, the irq_domain/irqchip handling code in this case will go to
> > drivers/net/dsa/, and it won't really be a "driver" (there is no struct
>
> Why? Devicetree hierarchy has nothing to do with Linux driver hierarchy
> and nothing stops you from putting irqchip code in respective directory
> for such DT. Your parent device can be MFD, can be same old DSA switch
> driver etc. Several options.
True, in fact I've already migrated in my tree the drivers for
nxp,sja1110-base-tx-mdio and nxp,sja1110-base-t1-mdio (which in the
current bindings, are under ethernet-switch/mdios/mdio@N) to dedicated
platform drivers under drivers/net/mdio/. The sja1105 driver will have
to support old bindings as well, so code in sja1105_mdio.c which
registers platform devices for MDIO nodes for compatibility will have to
stay.
But I don't want to keep doing that for other peripherals. The irqchip
is not a child of the ethernet-switch, not in any sense at all. The
ethernet-switch even has 2 IRQ lines which need to be provided by the
irqchip, so there would be a circular dependency in the device tree
description if the ethernet-switch was the parent.
fw_devlink doesn't really like that, and has been causing problems for
similar topologies with other DSA switches. There have been discussions
with Saravana Kannan, and he proposed introducing a FWNODE_FLAG_BROKEN_PARENT
flag, that says "don't create device links between a consumer and a
supplier, if the consumer needs a resource from the supplier to probe,
and the supplier needs to manually probe the consumer to finish its own
probing".
https://patchwork.kernel.org/project/netdevbpf/cover/[email protected]/
That patch didn't really go anywhere to my knowledge, but I'd prefer to
sidestep all that discussion about what constitutes a broken parent and
what doesn't, and here, introducing an irqchip driver which is a fwnode
child of the ethernet-switch driver seems like a big mistake, given past
experience.
I omitted this because I wanted to give you a bit of context, but I felt
like some of it was outside the scope of my real question, which is
mostly about how to deal with IORESOURCE_REG resources through generic
OF address code.
On Fri, Dec 23, 2022 at 5:44 AM Vladimir Oltean <[email protected]> wrote:
>
> On Fri, Dec 23, 2022 at 09:44:14AM +0100, Krzysztof Kozlowski wrote:
> > just trim the code... we do not need to scroll over unrelated pieces.
>
> ok
>
> > > However, the irq_domain/irqchip handling code in this case will go to
> > > drivers/net/dsa/, and it won't really be a "driver" (there is no struct
> >
> > Why? Devicetree hierarchy has nothing to do with Linux driver hierarchy
> > and nothing stops you from putting irqchip code in respective directory
> > for such DT. Your parent device can be MFD, can be same old DSA switch
> > driver etc. Several options.
Hi Vladimir,
I stumbled onto this thread when searching for some old emails between
us to refresh my memory on fw_devlink + DSA issues.
>
> True, in fact I've already migrated in my tree the drivers for
> nxp,sja1110-base-tx-mdio and nxp,sja1110-base-t1-mdio (which in the
> current bindings, are under ethernet-switch/mdios/mdio@N) to dedicated
> platform drivers under drivers/net/mdio/. The sja1105 driver will have
> to support old bindings as well, so code in sja1105_mdio.c which
> registers platform devices for MDIO nodes for compatibility will have to
> stay.
>
> But I don't want to keep doing that for other peripherals. The irqchip
> is not a child of the ethernet-switch, not in any sense at all. The
> ethernet-switch even has 2 IRQ lines which need to be provided by the
> irqchip, so there would be a circular dependency in the device tree
> description if the ethernet-switch was the parent.
I'm glad you are looking into this and agree how IRQ controllers are
independent of the rest of the ethernet-switch, etc.
> fw_devlink doesn't really like that, and has been causing problems for
> similar topologies with other DSA switches. There have been discussions
> with Saravana Kannan, and he proposed introducing a FWNODE_FLAG_BROKEN_PARENT
> flag, that says "don't create device links between a consumer and a
> supplier, if the consumer needs a resource from the supplier to probe,
> and the supplier needs to manually probe the consumer to finish its own
> probing".
> https://patchwork.kernel.org/project/netdevbpf/cover/[email protected]/
It did land as FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD and it's used for
PHYs. But yeah, it's not a great long term solution.
> That patch didn't really go anywhere to my knowledge, but I'd prefer to
> sidestep all that discussion about what constitutes a broken parent and
> what doesn't, and here, introducing an irqchip driver which is a fwnode
> child of the ethernet-switch driver seems like a big mistake, given past
> experience.
IMHO, the DSA is a logical device that's made up of many different
pieces of real hardware IP. IMHO an ideal solution would be something
like a dsa_bus type where we add a dsa_device. The dsa_device will
list all the necessary devices (IRQ, PHY, MDIO, etc -- they can be
wherever they want in DT) as its suppliers and when the dsa_device is
probed, it can assume all its suppliers are present and then do the
DSA initialization.
This would also solve the PHYs problem you stated earlier. So,
basically you'd move some of the dsa initialization code into the
dsa_probe() function of the dsa_bus.
Hope I'm making some sense. Let me know if you want to discuss this
further and I can try and provide more context and details.
Also, there's already a driver core feature that does just this --
component devices -- but the implementation is old and not so great
IMHO. Component device model can be done better using device links. I
want to refactor the component device framework to use device links,
but that's a problem for another time.
-Saravana
Hi Saravana,
This has nothing to do with the topic of the thread, right? (probing a
DSA switch SoC as a MFD parent for its sub-blocks).
On Mon, Feb 06, 2023 at 10:49:03PM -0800, Saravana Kannan wrote:
> IMHO, the DSA is a logical device that's made up of many different
> pieces of real hardware IP. IMHO an ideal solution would be something
> like a dsa_bus type where we add a dsa_device.
So in DSA terminology, what you call a dsa_device would be a DSA switch tree
(a group of interconnected DSA chips forming an Ethernet switching fabric),
and what you call the dsa_bus would be the bus on which the switch trees live?
> The dsa_device will list all the necessary devices (IRQ, PHY, MDIO,
> etc -- they can be wherever they want in DT) as its suppliers and when
> the dsa_device is probed, it can assume all its suppliers are present
> and then do the DSA initialization.
How would the dsa_device (the DSA switch tree) know what are its suppliers?
DSA is library code used by multiple drivers from multiple vendors.
It's just that this library code has a built-in mechanism for serializing
probing, in case those switches form a tree. The dependencies are not fixed.
Would fw_devlink be involved in determining these? Up to what degree can
the dependencies be "wherever they want in DT"?
> This would also solve the PHYs problem you stated earlier.
How would it solve it?
> So, basically you'd move some of the dsa initialization code into the
> dsa_probe() function of the dsa_bus.
Not clear what makes this solution better than what currently exists.
> Hope I'm making some sense. Let me know if you want to discuss this
> further and I can try and provide more context and details.
>
> Also, there's already a driver core feature that does just this --
> component devices -- but the implementation is old and not so great
> IMHO. Component device model can be done better using device links. I
> want to refactor the component device framework to use device links,
> but that's a problem for another time.
Ok, in my mind the problem is the way in which each individual DSA
switch driver, written by $vendor, organizes the resources/peripherals
pertaining to a single chip. Not the way in which the DSA framework
coordinates the probing of multiple chips in an arbitrarily-defined
topology, and defers some initialization to a dsa_switch_ops :: setup()
function. The latter part doesn't have a good justification to be
changed IMO; it doesn't do any harm. I.o.w., I never gave an example
of cross-chip fw_devlink dependencies that would need to be satisfied,
only ones local to a chip.
But since your email seems to be about cross-chip, let me try to
entertain the idea and ask you to explain what you're proposing.
Let's take an absolutely extreme (and hypothetical) example:
+--------------------------------+
| |
| Host SoC running Linux |
| |
| +----------+ +----------+ |
| | MDIO | | ETH | |
| |controller| |controller| |
| | | | (DSA | |
| | | | master) | |
+--+----------+---+----------+---+
| |
| |
MDIO | | ETH
| |
v v
+--------------------------------+
| CPU port |-------
| |-------
| MDIO-controller |-------
| +----------+ DSA switch #1 |------- External
| | MDIO | |------- ETH
| |controller| |------- ports
| | | |-------
| | | cascade port |-------
+--+----------+------------------+
| ^ |
| | |
MDIO | CLK | | ETH
| | |
v | v
+--------------------------------+
| cascade port |-------
| |-------
| |-------
| MDIO-controlled |------- External
| switch #2 |------- ETH
| |------- ports
| |-------
| |-------
+--------------------------------+
where we have a DSA switch tree with 2 chips, both have their registers
accessible over MDIO. But chip #1 is accessed through the host's MDIO
controller, and chip #2 through chip #1's MDIO controller.
These chips are also going to be used with PTP, and the PTP timers of
the 2 switches don't feed off of individual oscillators as would be
usual, but instead, there is a single oscillator feeding into one
switch, and that switch forwards this as a clock to the other switch
(because board designers heard it's more trendy this way). So switch #2
provides a clock to switch #1.
With the current mainline DSA code structure, assuming a monolithically
written $vendor driver (as basically N-1 of them are), the above would
not work under any circumstance.
In the alternative in which we implement what you hinted at and nothing
more (a dsa_device composed of switches 1 and 2), how would this ever
come any closer to the conclusion that the 2 monolithic blocks above are
able to probe linearly, and finally bind the aggregate device?
In the alternative in which we implement what I proposed in this email
thread and nothing more, the drivers for the chips would have to not be
monolithically written. First to probe would be the SoC driver for switch
chip #1. That would probe its children: the MDIO controller, the PTP timer,
the Ethernet switching IP with its ports (the DSA driver proper).
Perhaps the DSA driver has some unmet dependencies, so it defers probe.
Doesn't affect the rest.
After chip #1 probes its MDIO controller, that MDIO bus probes its own
children. On that bus is the switch chip #2, which can probe its own SoC
driver. Same story: PTP timer, MDIO bus, Ethernet switching IP (DSA proper).
Let's say that this DSA driver had no dependencies and can probe
straight away. It enters dsa_tree_setup_routing_table(), the infamous
function which determines "hey, you have a cascade port towards another
DSA switching IP which hasn't probed yet, I can't fully initialize you
either" and basically finishes probing "early".
Then finally, the dependency for the DSA switching IP of chip #1 gets
resolved and it retries probing. It enters dsa_tree_setup_routing_table(),
and this figures out "ah, you're the last switch, the tree is complete,
we've been waiting for you!" and initializes the entire Ethernet fabric
from the probing context of this switch. So this ad-hoc mechanism only
affects the probing of the switching fabric and Ethernet ports, the rest
can go on with its business.
So I gave you an example in which I don't really understand what would
an aggregate device representing the entire switching fabric bring new
to the table.
Also, PCBs with more than 1 switch on them are relatively rare. Usually,
when you need more Ethernet ports, you just buy a larger switching chip.
DSA has no problem working with a single chip, and many chip vendors
don't actually support cascading. I think that for the volume of users
of the multi-chip feature, it actually works very well and with very
little code. I think that the problems of multi-chip setups are the same
as the problem of single-chip setups.
> Let's take an absolutely extreme (and hypothetical) example:
>
> +--------------------------------+
> | |
> | Host SoC running Linux |
> | |
> | +----------+ +----------+ |
> | | MDIO | | ETH | |
> | |controller| |controller| |
> | | | | (DSA | |
> | | | | master) | |
> +--+----------+---+----------+---+
> | |
> | |
> MDIO | | ETH
> | |
> v v
> +--------------------------------+
> | CPU port |-------
> | |-------
> | MDIO-controller |-------
> | +----------+ DSA switch #1 |------- External
> | | MDIO | |------- ETH
> | |controller| |------- ports
> | | | |-------
> | | | cascade port |-------
> +--+----------+------------------+
> | ^ |
> | | |
> MDIO | CLK | | ETH
> | | |
> v | v
> +--------------------------------+
> | cascade port |-------
> | |-------
> | |-------
> | MDIO-controlled |------- External
> | switch #2 |------- ETH
> | |------- ports
> | |-------
> | |-------
> +--------------------------------+
>
> where we have a DSA switch tree with 2 chips, both have their registers
> accessible over MDIO. But chip #1 is accessed through the host's MDIO
> controller, and chip #2 through chip #1's MDIO controller.
>
> These chips are also going to be used with PTP, and the PTP timers of
> the 2 switches don't feed off of individual oscillators as would be
> usual, but instead, there is a single oscillator feeding into one
> switch, and that switch forwards this as a clock to the other switch
> (because board designers heard it's more trendy this way). So switch #2
> provides a clock to switch #1.
>
>
> With the current mainline DSA code structure, assuming a monolithically
> written $vendor driver (as basically N-1 of them are), the above would
> not work under any circumstance.
I'm not sure that is true. The switches probe method should register
with the driver core any resources a switch provides. So switch #1
MDIO bus driver is registered during its probe, allowing the probe of
switch #2 to happen. When switch #2 probes, it should register its
clock with the common clock framework, etc.
However, the linking of resources together, the PTP clock in your
example, should happen in the switches setup() call, which only
happens once all the switches in the cluster have probed, so all the
needed resources should be available.
Because we have these two phases, i think the above setup would work.
Andrew
On Sat, Feb 11, 2023 at 04:50:39PM +0100, Andrew Lunn wrote:
> I'm not sure that is true. The switches probe method should register
> with the driver core any resources a switch provides. So switch #1
> MDIO bus driver is registered during its probe, allowing the probe of
> switch #2 to happen. When switch #2 probes, it should register its
> clock with the common clock framework, etc.
>
> However, the linking of resources together, the PTP clock in your
> example, should happen in the switches setup() call, which only
> happens once all the switches in the cluster have probed, so all the
> needed resources should be available.
>
> Because we have these two phases, i think the above setup would work.
I was thinking it wouldn't work because the PTP timer sub-driver would
first have to request the clock input from the CCF, before registering
its own clock output with the CCF. So the driver writer would be forced
to request a clock from the CCF in the probe() phase, if it wanted to
also register as a clock provider in the probe() phase.
I.o.w., switch #1 would get an -EPROBE_DEFER waiting for the clock from
switch #2, before it would get to call dsa_register_switch().
TLDR: the rule "provide all you have in stage 1, request all you need
in stage 2" would only work if you don't need anything to register your
providers.