2021-06-24 10:56:00

by Lukasz Majewski

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

Hi Andrew,

> > > Using volatile is generally wrong. Why do you need it?
> >
> > This was the code, which I took from the legacy driver. I will
> > adjust it.
>
> It is called 'vendor crap' for a reason.

:-)

>
> > > > + for_each_available_child_of_node(p, port) {
> > > > + if (of_property_read_u32(port, "reg",
> > > > &port_num))
> > > > + continue;
> > > > +
> > > > + priv->n_ports = port_num;
> > > > +
> > > > + fep_np = of_parse_phandle(port, "phy-handle",
> > > > 0);
> > >
> > > As i said, phy-handle points to a phy. It minimum, you need to
> > > call this mac-handle. But that then makes this switch driver very
> > > different to every other switch driver.
> >
> > Other drivers (DSA for example) use "ethernet" or "link" properties.
> > Maybe those can be reused?
>
> Not really. They have well known meanings and they are nothing like
> what you are trying to do. You need a new name. Maybe 'mac-handle'?

Ok.

>
>
> > > > + pdev = of_find_device_by_node(fep_np);
> > > > + ndev = platform_get_drvdata(pdev);
> > > > + priv->fep[port_num - 1] = netdev_priv(ndev);
> > > >
> > >
> > > What happens when somebody puts reg=<42>; in DT?
> >
> > I do guess that this will break the code.
> >
> > However, DSA DT descriptions also rely on the exact numbering [1]
> > (via e.g. reg property) of the ports. I've followed this paradigm.
>
> DSA does a range check:
>
> for_each_available_child_of_node(ports, port) {
> err = of_property_read_u32(port, "reg", &reg);
> if (err)
> goto out_put_node;
>
> if (reg >= ds->num_ports) {
> dev_err(ds->dev, "port %pOF index %u exceeds
> num_ports (%zu)\n", port, reg, ds->num_ports);
> err = -EINVAL;
> goto out_put_node;
> }
>

Ok.

> > > I would say, your basic structure needs to change, to make it more
> > > like other switchdev drivers. You need to replace the two FEC
> > > device instances with one switchdev driver.
> >
> > I've used the cpsw_new.c as the example.
> >
> > > The switchdev driver will then
> > > instantiate the two netdevs for the two external MACs.
> >
> > Then there is a question - what about eth[01], which already
> > exists?
>
> They don't exist. cpsw_new is not used at the same time as cpsw, it
> replaces it. This new driver would replace the FEC driver.

I see.

> cpsw_new
> makes use of some of the code in the cpsw driver to implement two
> netdevs. This new FEC switch driver would do the same, make use of
> some of the low level code, e.g. for DMA access, MDIO bus, etc.

I'm not sure if the imx28 switch is similar to one from TI (cpsw-3g)
- it looks to me that the bypass mode for both seems to be very
different. For example, on NXP when switch is disabled we need to
handle two DMA[01]. When it is enabled, only one is used. The approach
with two DMAs is best handled with FEC driver instantiation.

>
> > To be honest - such driver for L2 switch already has been forward
> > ported by me [2] to v4.19.
>
> Which is fine, you can do whatever you want in your own fork. But for
> mainline, we need a clean architecture.

This code is a forward port of vendor's (Freescale) old driver. It uses
the _wrong_ approach, but it can (still) be used in production after
some adjustments.

> I'm not convinced your code is
> that clean,

The code from [2] needs some vendor ioctl based tool (or hardcode) to
configure the switch.

> and how well future features can be added. Do you have
> support for VLANS? Adding and removing entries to the lookup tables?
> How will IGMP snooping work? How will STP work?

This can be easily added with serving netstack hooks (as it is already
done with cpsw_new) in the new switchdev based version [3] (based on
v5.12).

>
> Andrew

Links:

[3] -
https://source.denx.de/linux/linux-imx28-l2switch/-/commits/imx28-v5.12-L2-upstream-switchdev-RFC_v1


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: [email protected]


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2021-06-24 13:36:50

by Andrew Lunn

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

> I'm not sure if the imx28 switch is similar to one from TI (cpsw-3g)
> - it looks to me that the bypass mode for both seems to be very
> different. For example, on NXP when switch is disabled we need to
> handle two DMA[01]. When it is enabled, only one is used. The approach
> with two DMAs is best handled with FEC driver instantiation.

I don't know if it applies to the FEC, but switches often have
registers which control which egress port an ingress port can send
packets to. So by default, you allow CPU to port0, CPU to port1, but
block between port0 to port1. This would give you two independent
interface, the switch enabled, and using one DMA. When the bridge is
configured, you simply allow port0 and send/receive packets to/from
port1. No change to the DMA setup, etc.

> The code from [2] needs some vendor ioctl based tool (or hardcode) to
> configure the switch.

This would not be allowed. You configure switches in Linux using the
existing user space tools. No vendor tools are used.

> > and how well future features can be added. Do you have
> > support for VLANS? Adding and removing entries to the lookup tables?
> > How will IGMP snooping work? How will STP work?
>
> This can be easily added with serving netstack hooks (as it is already
> done with cpsw_new) in the new switchdev based version [3] (based on
> v5.12).

Here i'm less convinced. I expect a fully functioning switch driver is
going to need switch specific versions of some of the netdev ops
functions, maybe the ethtool ops as well. It is going to want to add
devlink ops. By hacking around with the FEC driver in the way you are,
you might get very basic switch operation working. But as we have seen
with cpsw, going from very basic to a fully functioning switchdev
driver required a new driver, cpsw_new. It was getting more and more
difficult to add features because its structure was just wrong. We
don't want to add code to the kernel which is probably a dead end.

Andrew

2021-06-24 14:40:02

by Lukasz Majewski

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

Hi Andrew,

> > I'm not sure if the imx28 switch is similar to one from TI (cpsw-3g)
> > - it looks to me that the bypass mode for both seems to be very
> > different. For example, on NXP when switch is disabled we need to
> > handle two DMA[01]. When it is enabled, only one is used. The
> > approach with two DMAs is best handled with FEC driver
> > instantiation.
>
> I don't know if it applies to the FEC, but switches often have
> registers which control which egress port an ingress port can send
> packets to. So by default, you allow CPU to port0, CPU to port1, but
> block between port0 to port1. This would give you two independent
> interface, the switch enabled, and using one DMA. When the bridge is
> configured, you simply allow port0 and send/receive packets to/from
> port1. No change to the DMA setup, etc.

Please correct me if I misunderstood this concept - but it seems like
you refer to the use case where the switch is enabled, and by changing
it's "allowed internal port's" mapping it decides if frames are passed
between engress ports (port1 and port2).

----------
DMA0 -> |P0 P1| -> ENET-MAC (PHY control) -> eth0 (lan1)
|L2 SW |
| P2| -> ENET-MAC (PHY control) -> eth1 (lan2)
----------

DMA1 (not used)

We can use this approach when we keep always enabled L2 switch.

However now in FEC we use the "bypass" mode, where:
DMA0 -> ENET-MAC (FEC instance driver 1) -> eth0
DMA1 -> ENET-MAC (FEC instance driver 2) -> eth1

And the "bypass" mode is the default one.


I'm just concerned how we are going to gracefully "switch" between L2
switch and bypass configuration? In this patch series - I used the
"hook" corresponding to 'ip link set eth[01] master br0' command.

In other words - how we want to manage DMA0 and DMA1 when switch is
enabled and disabled (in "bypass mode").

>
> > The code from [2] needs some vendor ioctl based tool (or hardcode)
> > to configure the switch.
>
> This would not be allowed. You configure switches in Linux using the
> existing user space tools. No vendor tools are used.

Exactly - that was the rationale to bring support for L2 switch to
mainline kernel.

>
> > > and how well future features can be added. Do you have
> > > support for VLANS? Adding and removing entries to the lookup
> > > tables? How will IGMP snooping work? How will STP work?
> >
> > This can be easily added with serving netstack hooks (as it is
> > already done with cpsw_new) in the new switchdev based version [3]
> > (based on v5.12).
>
> Here i'm less convinced. I expect a fully functioning switch driver is
> going to need switch specific versions of some of the netdev ops
> functions, maybe the ethtool ops as well.

Definately, the current L2 switch driver would need more work.

> It is going to want to add
> devlink ops. By hacking around with the FEC driver

I believe that I will not touch fec_main.[hc] files more than I did in
the
"[RFC 3/3] net: imx: Adjust fec_main.c to provide support for L2
switch"

as the switch management (and hooks) are going to be added solely to
drivers/net/ethernet/freescale/mtipsw/fec_mtip.[hc]. [*]

This would separate L2 switch driver from the current FEC driver.

> in the way you are,
> you might get very basic switch operation working.

Yes, this is the current status - only simple L2 switching works.

> But as we have seen
> with cpsw, going from very basic to a fully functioning switchdev
> driver required a new driver, cpsw_new.

The new driver for L2 switch has been introduced in [*]. The legacy FEC
driver will also work without it.

> It was getting more and more
> difficult to add features because its structure was just wrong. We
> don't want to add code to the kernel which is probably a dead end.
>

I cannot say for sure, but all the switch/bridge related hooks can be
added to [*], so fec_main will not bloat.

> Andrew




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: [email protected]


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2021-06-24 16:12:36

by Andrew Lunn

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

On Thu, Jun 24, 2021 at 04:35:42PM +0200, Lukasz Majewski wrote:
> Hi Andrew,
>
> > > I'm not sure if the imx28 switch is similar to one from TI (cpsw-3g)
> > > - it looks to me that the bypass mode for both seems to be very
> > > different. For example, on NXP when switch is disabled we need to
> > > handle two DMA[01]. When it is enabled, only one is used. The
> > > approach with two DMAs is best handled with FEC driver
> > > instantiation.
> >
> > I don't know if it applies to the FEC, but switches often have
> > registers which control which egress port an ingress port can send
> > packets to. So by default, you allow CPU to port0, CPU to port1, but
> > block between port0 to port1. This would give you two independent
> > interface, the switch enabled, and using one DMA. When the bridge is
> > configured, you simply allow port0 and send/receive packets to/from
> > port1. No change to the DMA setup, etc.
>
> Please correct me if I misunderstood this concept - but it seems like
> you refer to the use case where the switch is enabled, and by changing
> it's "allowed internal port's" mapping it decides if frames are passed
> between engress ports (port1 and port2).

Correct.


> ----------
> DMA0 -> |P0 P1| -> ENET-MAC (PHY control) -> eth0 (lan1)
> |L2 SW |
> | P2| -> ENET-MAC (PHY control) -> eth1 (lan2)
> ----------
>
> DMA1 (not used)
>
> We can use this approach when we keep always enabled L2 switch.
>
> However now in FEC we use the "bypass" mode, where:
> DMA0 -> ENET-MAC (FEC instance driver 1) -> eth0
> DMA1 -> ENET-MAC (FEC instance driver 2) -> eth1
>
> And the "bypass" mode is the default one.

Which is not a problem, when you refactor the FEC into a library and a
driver, plus add a new switch driver. When the FEC loads, it uses
bypass mode, the switch disabled. When the new switch driver loads, it
always enables the switch, but disables communication between the two
ports until they both join the same bridge.

But i doubt we are actually getting anywhere. You say you don't have
time to write a new driver. I'm not convinced you can hack the FEC
like you are suggesting and not end up in the mess the cpsw had,
before they wrote a new driver.

Andrew

2021-06-25 10:03:39

by Lukasz Majewski

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

Hi Andrew,

> On Thu, Jun 24, 2021 at 04:35:42PM +0200, Lukasz Majewski wrote:
> > Hi Andrew,
> >
> > > > I'm not sure if the imx28 switch is similar to one from TI
> > > > (cpsw-3g)
> > > > - it looks to me that the bypass mode for both seems to be very
> > > > different. For example, on NXP when switch is disabled we need
> > > > to handle two DMA[01]. When it is enabled, only one is used. The
> > > > approach with two DMAs is best handled with FEC driver
> > > > instantiation.
> > >
> > > I don't know if it applies to the FEC, but switches often have
> > > registers which control which egress port an ingress port can send
> > > packets to. So by default, you allow CPU to port0, CPU to port1,
> > > but block between port0 to port1. This would give you two
> > > independent interface, the switch enabled, and using one DMA.
> > > When the bridge is configured, you simply allow port0 and
> > > send/receive packets to/from port1. No change to the DMA setup,
> > > etc.
> >
> > Please correct me if I misunderstood this concept - but it seems
> > like you refer to the use case where the switch is enabled, and by
> > changing it's "allowed internal port's" mapping it decides if
> > frames are passed between engress ports (port1 and port2).
>
> Correct.
>
>
> > ----------
> > DMA0 -> |P0 P1| -> ENET-MAC (PHY control) -> eth0 (lan1)
> > |L2 SW |
> > | P2| -> ENET-MAC (PHY control) -> eth1 (lan2)
> > ----------
> >
> > DMA1 (not used)
> >
> > We can use this approach when we keep always enabled L2 switch.
> >
> > However now in FEC we use the "bypass" mode, where:
> > DMA0 -> ENET-MAC (FEC instance driver 1) -> eth0
> > DMA1 -> ENET-MAC (FEC instance driver 2) -> eth1
> >
> > And the "bypass" mode is the default one.
>
> Which is not a problem, when you refactor the FEC into a library and a
> driver, plus add a new switch driver. When the FEC loads, it uses
> bypass mode, the switch disabled. When the new switch driver loads, it
> always enables the switch, but disables communication between the two
> ports until they both join the same bridge.

Ok, the proposed idea would be to use FEC (refactored) on devices which
are not equipped with the switch.

On devices, which have this IP block (like vf610, imx287) we would use
the driver with switch enabled and then in switch either bridge or
separate the traffic?

>
> But i doubt we are actually getting anywhere. You say you don't have
> time to write a new driver.

Yes, I believe that this would be a very time consuming task. Joakim
also pointed out that the rewrite from NXP will not happen anytime soon.

> I'm not convinced you can hack the FEC
> like you are suggesting

I do believe that I can just extend the L2 switch driver (fec_mtip.c
file to be precise) to provide full blown L2 switch functionality
without touching the legacy FEC more than in this patch set.

Would you consider applying this patch series then?

> and not end up in the mess the cpsw had,
> before they wrote a new driver.

I do see some conceptual differences between those two drivers.

>
> Andrew


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: [email protected]


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2021-06-25 14:42:14

by Andrew Lunn

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

> I do believe that I can just extend the L2 switch driver (fec_mtip.c
> file to be precise) to provide full blown L2 switch functionality
> without touching the legacy FEC more than in this patch set.
>
> Would you consider applying this patch series then?

What is most important is the ABI. If something is merged now, we need
to ensure it does not block later refactoring to a clean new
driver. The DT binding is considered ABI. So the DT binding needs to
be like a traditional switchdev driver. Florian already pointed out,
you can use a binding very similar to DSA. ti,cpsw-switch.yaml is
another good example.

So before considering merging your changes, i would like to see a
usable binding.

I also don't remember seeing support for STP. Without that, your
network has broadcast storm problems when there are loops. So i would
like to see the code needed to put ports into blocking, listening,
learning, and forwarding states.

Andrew

2021-06-28 12:35:22

by Lukasz Majewski

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

Hi Andrew,

> > I do believe that I can just extend the L2 switch driver (fec_mtip.c
> > file to be precise) to provide full blown L2 switch functionality
> > without touching the legacy FEC more than in this patch set.
> >
> > Would you consider applying this patch series then?
>
> What is most important is the ABI. If something is merged now, we need
> to ensure it does not block later refactoring to a clean new
> driver. The DT binding is considered ABI. So the DT binding needs to
> be like a traditional switchdev driver. Florian already pointed out,
> you can use a binding very similar to DSA. ti,cpsw-switch.yaml is
> another good example.

The best I could get would be:

&eth_switch {
compatible = "imx,mtip-l2switch";
reg = <0x800f8000 0x400>, <0x800fC000 0x4000>;

interrupts = <100>;
status = "okay";

ethernet-ports {
port1@1 {
reg = <1>;
label = "eth0";
phys = <&mac0 0>;
};

port2@2 {
reg = <2>;
label = "eth1";
phys = <&mac1 1>;
};
};
};

Which would abuse the "phys" properties usages - as 'mac[01]' are
referring to ethernet controllers.

On TI SoCs (e.g. am33xx-l4.dtsi) phys refer to some separate driver
responsible for PHY management. On NXP this is integrated with FEC
driver itself.

>
> So before considering merging your changes, i would like to see a
> usable binding.
>
> I also don't remember seeing support for STP. Without that, your
> network has broadcast storm problems when there are loops. So i would
> like to see the code needed to put ports into blocking, listening,
> learning, and forwarding states.
>
> Andrew




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: [email protected]


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2021-06-28 12:50:21

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

On Mon, Jun 28, 2021 at 02:05:26PM +0200, Lukasz Majewski wrote:
> Hi Andrew,
>
> > > I do believe that I can just extend the L2 switch driver (fec_mtip.c
> > > file to be precise) to provide full blown L2 switch functionality
> > > without touching the legacy FEC more than in this patch set.
> > >
> > > Would you consider applying this patch series then?
> >
> > What is most important is the ABI. If something is merged now, we need
> > to ensure it does not block later refactoring to a clean new
> > driver. The DT binding is considered ABI. So the DT binding needs to
> > be like a traditional switchdev driver. Florian already pointed out,
> > you can use a binding very similar to DSA. ti,cpsw-switch.yaml is
> > another good example.
>
> The best I could get would be:
>
> &eth_switch {
> compatible = "imx,mtip-l2switch";
> reg = <0x800f8000 0x400>, <0x800fC000 0x4000>;
>
> interrupts = <100>;
> status = "okay";
>
> ethernet-ports {
> port1@1 {
> reg = <1>;
> label = "eth0";
> phys = <&mac0 0>;
> };
>
> port2@2 {
> reg = <2>;
> label = "eth1";
> phys = <&mac1 1>;
> };
> };
> };
>
> Which would abuse the "phys" properties usages - as 'mac[01]' are
> referring to ethernet controllers.
>
> On TI SoCs (e.g. am33xx-l4.dtsi) phys refer to some separate driver
> responsible for PHY management. On NXP this is integrated with FEC
> driver itself.

If we were really honest, the binding would need to be called

port@0 {
puppet = <&mac0>;
};

port@1 {
puppet = <&mac1>;
};

which speaks for itself as to why accepting "puppet master" drivers is
not really very compelling. I concur with the recommendation given by
Andrew and Florian to refactor FEC as a multi-port single driver.

> >
> > So before considering merging your changes, i would like to see a
> > usable binding.
> >
> > I also don't remember seeing support for STP. Without that, your
> > network has broadcast storm problems when there are loops. So i would
> > like to see the code needed to put ports into blocking, listening,
> > learning, and forwarding states.
> >
> > Andrew

I cannot stress enough how important it is for us to see STP support and
consequently the ndo_start_xmit procedure for switch ports.
Let me see if I understand correctly. When the switch is enabled, eth0
sends packets towards both physical switch ports, and eth1 sends packets
towards none, but eth0 handles the link state of switch port 0, and eth1
handles the link state of switch port 1?

2021-06-28 13:25:42

by Andrew Lunn

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

> The best I could get would be:
>
> &eth_switch {
> compatible = "imx,mtip-l2switch";
> reg = <0x800f8000 0x400>, <0x800fC000 0x4000>;
>
> interrupts = <100>;
> status = "okay";
>
> ethernet-ports {
> port1@1 {
> reg = <1>;
> label = "eth0";
> phys = <&mac0 0>;
> };
>
> port2@2 {
> reg = <2>;
> label = "eth1";
> phys = <&mac1 1>;
> };
> };
> };
>
> Which would abuse the "phys" properties usages - as 'mac[01]' are
> referring to ethernet controllers.

This is not how a dedicated driver would have its binding. We should
not establish this as ABI.

So, sorry, but no.

Andrew

2021-06-28 14:19:38

by Lukasz Majewski

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

Hi Vladimir,

> On Mon, Jun 28, 2021 at 02:05:26PM +0200, Lukasz Majewski wrote:
> > Hi Andrew,
> >
> > > > I do believe that I can just extend the L2 switch driver
> > > > (fec_mtip.c file to be precise) to provide full blown L2 switch
> > > > functionality without touching the legacy FEC more than in this
> > > > patch set.
> > > >
> > > > Would you consider applying this patch series then?
> > >
> > > What is most important is the ABI. If something is merged now, we
> > > need to ensure it does not block later refactoring to a clean new
> > > driver. The DT binding is considered ABI. So the DT binding needs
> > > to be like a traditional switchdev driver. Florian already
> > > pointed out, you can use a binding very similar to DSA.
> > > ti,cpsw-switch.yaml is another good example.
> >
> > The best I could get would be:
> >
> > &eth_switch {
> > compatible = "imx,mtip-l2switch";
> > reg = <0x800f8000 0x400>, <0x800fC000 0x4000>;
> >
> > interrupts = <100>;
> > status = "okay";
> >
> > ethernet-ports {
> > port1@1 {
> > reg = <1>;
> > label = "eth0";
> > phys = <&mac0 0>;
> > };
> >
> > port2@2 {
> > reg = <2>;
> > label = "eth1";
> > phys = <&mac1 1>;
> > };
> > };
> > };
> >
> > Which would abuse the "phys" properties usages - as 'mac[01]' are
> > referring to ethernet controllers.
> >
> > On TI SoCs (e.g. am33xx-l4.dtsi) phys refer to some separate driver
> > responsible for PHY management. On NXP this is integrated with FEC
> > driver itself.
>
> If we were really honest, the binding would need to be called
>
> port@0 {
> puppet = <&mac0>;
> };
>
> port@1 {
> puppet = <&mac1>;
> };
>
> which speaks for itself as to why accepting "puppet master" drivers is
> not really very compelling. I concur with the recommendation given by
> Andrew and Florian to refactor FEC as a multi-port single driver.

Ok.

>
> > >
> > > So before considering merging your changes, i would like to see a
> > > usable binding.
> > >
> > > I also don't remember seeing support for STP. Without that, your
> > > network has broadcast storm problems when there are loops. So i
> > > would like to see the code needed to put ports into blocking,
> > > listening, learning, and forwarding states.
> > >
> > > Andrew
>
> I cannot stress enough how important it is for us to see STP support
> and consequently the ndo_start_xmit procedure for switch ports.

Ok.

> Let me see if I understand correctly. When the switch is enabled, eth0
> sends packets towards both physical switch ports, and eth1 sends
> packets towards none, but eth0 handles the link state of switch port
> 0, and eth1 handles the link state of switch port 1?

Exactly, this is how FEC driver is utilized for this switch.


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: [email protected]


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2021-06-28 14:19:39

by Lukasz Majewski

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

Hi Andrew,

> > The best I could get would be:
> >
> > &eth_switch {
> > compatible = "imx,mtip-l2switch";
> > reg = <0x800f8000 0x400>, <0x800fC000 0x4000>;
> >
> > interrupts = <100>;
> > status = "okay";
> >
> > ethernet-ports {
> > port1@1 {
> > reg = <1>;
> > label = "eth0";
> > phys = <&mac0 0>;
> > };
> >
> > port2@2 {
> > reg = <2>;
> > label = "eth1";
> > phys = <&mac1 1>;
> > };
> > };
> > };
> >
> > Which would abuse the "phys" properties usages - as 'mac[01]' are
> > referring to ethernet controllers.
>
> This is not how a dedicated driver would have its binding. We should
> not establish this as ABI.
>
> So, sorry, but no.

Thanks for the clear statement about upstream requirements.

>
> Andrew




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: [email protected]


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2021-06-28 14:41:07

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

On Mon, Jun 28, 2021 at 04:13:14PM +0200, Lukasz Majewski wrote:
> > > > So before considering merging your changes, i would like to see a
> > > > usable binding.
> > > >
> > > > I also don't remember seeing support for STP. Without that, your
> > > > network has broadcast storm problems when there are loops. So i
> > > > would like to see the code needed to put ports into blocking,
> > > > listening, learning, and forwarding states.
> > > >
> > > > Andrew
> >
> > I cannot stress enough how important it is for us to see STP support
> > and consequently the ndo_start_xmit procedure for switch ports.
>
> Ok.
>
> > Let me see if I understand correctly. When the switch is enabled, eth0
> > sends packets towards both physical switch ports, and eth1 sends
> > packets towards none, but eth0 handles the link state of switch port
> > 0, and eth1 handles the link state of switch port 1?
>
> Exactly, this is how FEC driver is utilized for this switch.

This is a much bigger problem than anything which has to do with code
organization. Linux does not have any sort of support for unmanaged
switches. Please try to find out if your switch is supposed to be able
to be managed (run control protocols on the CPU). If not, well, I don't
know what to suggest.

2021-06-29 08:12:43

by Lukasz Majewski

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

Hi Vladimir,

> On Mon, Jun 28, 2021 at 04:13:14PM +0200, Lukasz Majewski wrote:
> > > > > So before considering merging your changes, i would like to
> > > > > see a usable binding.
> > > > >
> > > > > I also don't remember seeing support for STP. Without that,
> > > > > your network has broadcast storm problems when there are
> > > > > loops. So i would like to see the code needed to put ports
> > > > > into blocking, listening, learning, and forwarding states.
> > > > >
> > > > > Andrew
> > >
> > > I cannot stress enough how important it is for us to see STP
> > > support and consequently the ndo_start_xmit procedure for switch
> > > ports.
> >
> > Ok.
> >
> > > Let me see if I understand correctly. When the switch is enabled,
> > > eth0 sends packets towards both physical switch ports, and eth1
> > > sends packets towards none, but eth0 handles the link state of
> > > switch port 0, and eth1 handles the link state of switch port 1?
> >
> > Exactly, this is how FEC driver is utilized for this switch.
>
> This is a much bigger problem than anything which has to do with code
> organization. Linux does not have any sort of support for unmanaged
> switches.

My impression is similar. This switch cannot easily fit into DSA (lack
of appending tags) nor to switchdev.

The latter is caused by two modes of operation:

- Bypass mode (no switch) -> DMA1 and DMA0 are used
- Switch mode -> only DMA0 is used


Moreover, from my understanding of the CPSW - looks like it uses always
just a single DMA, and the switching seems to be the default operation
for two ethernet ports.

The "bypass mode" from NXP's L2 switch seems to be achieved inside the
CPSW switch, by configuring it to not pass packets between those ports.

> Please try to find out if your switch is supposed to be able
> to be managed (run control protocols on the CPU).

It can support all the "normal" set of L2 switch features:

- VLANs, lookup table (with learning), filtering and forwarding
(Multicast, Broadcast, Unicast), priority queues, IP snooping, etc.

Frames for BPDU are recognized by the switch and can be used to
implement support for RSTP. However, this switch has a separate address
space (not covered and accessed by FEC address).

> If not, well, I
> don't know what to suggest.

For me it looks like the NXP's L2 switch shall be treated _just_ as
offloading IP block to accelerate switching (NXP already support
dpaa[2] for example).

The idea with having it configured on demand, when:
ip link add name br0 type bridge; ip link set br0 up;
ip link set eth0 master br0;
ip link set eth1 master br0;

Seems to be a reasonable one. In the above scenario it would work hand
by hand with FEC drivers (as those would handle PHY communication
setup and link up/down events).

It would be welcome if the community could come up with some rough idea
how to proceed with this IP block support (especially that for example
imx287 is used in many embedded devices and is going to be in active
production for next 10+ years).


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: [email protected]


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2021-06-29 09:33:48

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

On Tue, Jun 29, 2021 at 10:09:37AM +0200, Lukasz Majewski wrote:
> Hi Vladimir,
>
> > On Mon, Jun 28, 2021 at 04:13:14PM +0200, Lukasz Majewski wrote:
> > > > > > So before considering merging your changes, i would like to
> > > > > > see a usable binding.
> > > > > >
> > > > > > I also don't remember seeing support for STP. Without that,
> > > > > > your network has broadcast storm problems when there are
> > > > > > loops. So i would like to see the code needed to put ports
> > > > > > into blocking, listening, learning, and forwarding states.
> > > > > >
> > > > > > Andrew
> > > >
> > > > I cannot stress enough how important it is for us to see STP
> > > > support and consequently the ndo_start_xmit procedure for switch
> > > > ports.
> > >
> > > Ok.
> > >
> > > > Let me see if I understand correctly. When the switch is enabled,
> > > > eth0 sends packets towards both physical switch ports, and eth1
> > > > sends packets towards none, but eth0 handles the link state of
> > > > switch port 0, and eth1 handles the link state of switch port 1?
> > >
> > > Exactly, this is how FEC driver is utilized for this switch.
> >
> > This is a much bigger problem than anything which has to do with code
> > organization. Linux does not have any sort of support for unmanaged
> > switches.
>
> My impression is similar. This switch cannot easily fit into DSA (lack
> of appending tags)

No, this is not why the switch does not fit the DSA model.
DSA assumes that the master interface and the switch are two completely
separate devices which manage themselves independently. Their boundary
is typically at the level of a MAC-to-MAC connection, although vendors
have sometimes blurred this line a bit in the case of integrated
switches. But the key point is that if there are 2 external ports going
to the switch, these should be managed by the switch driver. But when
the switch is sandwiched between the Ethernet controller of the "DSA
master" (the DMA engine of fec0) and the DSA master's MAC (still owned
by fec), the separation isn't quite what DSA expects, is it? Remember
that in the case of the MTIP switch, the fec driver needs to put the
MACs going to the switch in promiscuous mode such that the switch
behaves as a switch and actually forwards packets by MAC DA instead of
dropping them. So the system is much more tightly coupled.

+---------------------------------------------------------------------------+
| |
| +--------------+ +--------------------+--------+ +------------+
| | | | MTIP switch | | | |
| | fec 1 DMA |---x | | Port 2 |------| fec 1 MAC |
| | | | \ / | | | |
| +--------------+ | \/ +--------+ +------------+
| | /\ | |
| +--------------+ +--------+ / \ +--------+ +------------+
| | | | | | | | |
| | fec 0 DMA |--------| Port 0 | | Port 1 |------| fec 0 MAC |
| | | | | | | | |
| +--------------+ +--------+-----------+--------+ +------------+
| |
+---------------------------------------------------------------------------+

Is this DSA? I don't really think so, but you could still try to argue
otherwise.

The opposite is also true. DSA supports switches that don't append tags
to packets (see sja1105). This doesn't make them "less DSA", just more
of a pain to work with.

> nor to switchdev.
>
> The latter is caused by two modes of operation:
>
> - Bypass mode (no switch) -> DMA1 and DMA0 are used
> - Switch mode -> only DMA0 is used
>
>
> Moreover, from my understanding of the CPSW - looks like it uses always
> just a single DMA, and the switching seems to be the default operation
> for two ethernet ports.
>
> The "bypass mode" from NXP's L2 switch seems to be achieved inside the
> CPSW switch, by configuring it to not pass packets between those ports.

I don't exactly see the point you're trying to make here. At the end of
the day, the only thing that matters is what you expose to the user.
With no way (when the switch is enabled) for a socket opened on eth0 to
send/receive packets coming only from the first port, and a socket
opened on eth1 to send/receive packets coming only from the second port,
I think this driver attempt is a pretty far cry from what a switch
driver in Linux is expected to offer, be it modeled as switchdev or DSA.

> > Please try to find out if your switch is supposed to be able
> > to be managed (run control protocols on the CPU).
>
> It can support all the "normal" set of L2 switch features:
>
> - VLANs, lookup table (with learning), filtering and forwarding
> (Multicast, Broadcast, Unicast), priority queues, IP snooping, etc.
>
> Frames for BPDU are recognized by the switch and can be used to
> implement support for RSTP. However, this switch has a separate address
> space (not covered and accessed by FEC address).
>
> > If not, well, I
> > don't know what to suggest.
>
> For me it looks like the NXP's L2 switch shall be treated _just_ as
> offloading IP block to accelerate switching (NXP already support
> dpaa[2] for example).
>
> The idea with having it configured on demand, when:
> ip link add name br0 type bridge; ip link set br0 up;
> ip link set eth0 master br0;
> ip link set eth1 master br0;
>
> Seems to be a reasonable one. In the above scenario it would work hand
> by hand with FEC drivers (as those would handle PHY communication
> setup and link up/down events).

You seem to imply that we are suggesting something different.

> It would be welcome if the community could come up with some rough idea
> how to proceed with this IP block support

Ok, so what I would do if I really cared that much about mainline
support is I would refactor the FEC driver to offer its core
functionality to a new multi-port driver that is able to handle the FEC
DMA interfaces, the MACs and the switch. EXPORT_SYMBOL_GPL is your
friend.

This driver would probe on a device tree binding with 3 "reg" values: 1
for the fec@800f0000, 1 for the fec@800f4000 and 1 for the switch@800f8000.
No puppet master driver which coordinates other drivers, just a single
driver that, depending on the operating state, manages all the SoC
resources in a way that will offer a sane and consistent view of the
Ethernet ports.

So it will have a different .ndo_start_xmit implementation depending on
whether the switch is bypassed or not (if you need to send a packet on
eth1 and the switch is bypassed, you send it through the DMA interface
of eth1, otherwise you send it through the DMA interface of eth0 in a
way in which the switch will actually route it to the eth1 physical
port).

Then I would implement support for BPDU RX/TX (I haven't looked at the
documentation, but I expect that what this switch offers for control
traffic doesn't scale at high speeds (if it does, great, then send and
receive all your packets as control packets, to have precise port
identification). If it doesn't, you'll need a way to treat your data
plane packets differently from the control plane packets. For the data
plane, you can perhaps borrow some ideas from net/dsa/tag_8021q.c, or
even from Tobias Waldekranz's proposal to just let data plane packets
coming from the bridge slide into the switch with no precise control of
the destination port at all, just let the switch perform FDB lookups for
those packets because the switch hardware FDB is supposed to be more or
less in sync with the bridge software FDB:
https://patchwork.kernel.org/project/netdevbpf/cover/[email protected]/

> (especially that for example imx287 is used in many embedded devices
> and is going to be in active production for next 10+ years).

Well, I guess you have a plan then. There are still 10+ years left to
enjoy the benefits of a proper driver design...

2021-06-29 12:53:54

by Lukasz Majewski

[permalink] [raw]
Subject: Re: [RFC 2/3] net: Provide switchdev driver for NXP's More Than IP L2 switch

Hi Vladimir,

> On Tue, Jun 29, 2021 at 10:09:37AM +0200, Lukasz Majewski wrote:
> > Hi Vladimir,
> >
> > > On Mon, Jun 28, 2021 at 04:13:14PM +0200, Lukasz Majewski wrote:
> > > > > > > So before considering merging your changes, i would like
> > > > > > > to see a usable binding.
> > > > > > >
> > > > > > > I also don't remember seeing support for STP. Without
> > > > > > > that, your network has broadcast storm problems when
> > > > > > > there are loops. So i would like to see the code needed
> > > > > > > to put ports into blocking, listening, learning, and
> > > > > > > forwarding states.
> > > > > > >
> > > > > > > Andrew
> > > > >
> > > > > I cannot stress enough how important it is for us to see STP
> > > > > support and consequently the ndo_start_xmit procedure for
> > > > > switch ports.
> > > >
> > > > Ok.
> > > >
> > > > > Let me see if I understand correctly. When the switch is
> > > > > enabled, eth0 sends packets towards both physical switch
> > > > > ports, and eth1 sends packets towards none, but eth0 handles
> > > > > the link state of switch port 0, and eth1 handles the link
> > > > > state of switch port 1?
> > > >
> > > > Exactly, this is how FEC driver is utilized for this switch.
> > >
> > > This is a much bigger problem than anything which has to do with
> > > code organization. Linux does not have any sort of support for
> > > unmanaged switches.
> >
> > My impression is similar. This switch cannot easily fit into DSA
> > (lack of appending tags)
>
> No, this is not why the switch does not fit the DSA model.
> DSA assumes that the master interface and the switch are two
> completely separate devices which manage themselves independently.
> Their boundary is typically at the level of a MAC-to-MAC connection,
> although vendors have sometimes blurred this line a bit in the case
> of integrated switches. But the key point is that if there are 2
> external ports going to the switch, these should be managed by the
> switch driver. But when the switch is sandwiched between the Ethernet
> controller of the "DSA master" (the DMA engine of fec0) and the DSA
> master's MAC (still owned by fec), the separation isn't quite what
> DSA expects, is it? Remember that in the case of the MTIP switch, the
> fec driver needs to put the MACs going to the switch in promiscuous
> mode such that the switch behaves as a switch and actually forwards
> packets by MAC DA instead of dropping them. So the system is much
> more tightly coupled.
>
> +---------------------------------------------------------------------------+
> |
> | | +--------------+ +--------------------+--------+
> +------------+ | | | | MTIP switch |
> | | | | | fec 1 DMA |---x |
> | Port 2 |------| fec 1 MAC | | | | |
> \ / | | | | | +--------------+ |
> \/ +--------+ +------------+ |
> | /\ | | |
> +--------------+ +--------+ / \ +--------+
> +------------+ | | | | | |
> | | | | | fec 0 DMA |--------| Port 0 |
> | Port 1 |------| fec 0 MAC | | | | | |
> | | | | | +--------------+
> +--------+-----------+--------+ +------------+ |
> |
> +---------------------------------------------------------------------------+
>
> Is this DSA? I don't really think so, but you could still try to argue
> otherwise.
>
> The opposite is also true. DSA supports switches that don't append
> tags to packets (see sja1105). This doesn't make them "less DSA",
> just more of a pain to work with.
>
> > nor to switchdev.
> >
> > The latter is caused by two modes of operation:
> >
> > - Bypass mode (no switch) -> DMA1 and DMA0 are used
> > - Switch mode -> only DMA0 is used
> >
> >
> > Moreover, from my understanding of the CPSW - looks like it uses
> > always just a single DMA, and the switching seems to be the default
> > operation for two ethernet ports.
> >
> > The "bypass mode" from NXP's L2 switch seems to be achieved inside
> > the CPSW switch, by configuring it to not pass packets between
> > those ports.
>
> I don't exactly see the point you're trying to make here. At the end
> of the day, the only thing that matters is what you expose to the
> user. With no way (when the switch is enabled) for a socket opened on
> eth0 to send/receive packets coming only from the first port, and a
> socket opened on eth1 to send/receive packets coming only from the
> second port, I think this driver attempt is a pretty far cry from
> what a switch driver in Linux is expected to offer, be it modeled as
> switchdev or DSA.
>
> > > Please try to find out if your switch is supposed to be able
> > > to be managed (run control protocols on the CPU).
> >
> > It can support all the "normal" set of L2 switch features:
> >
> > - VLANs, lookup table (with learning), filtering and forwarding
> > (Multicast, Broadcast, Unicast), priority queues, IP snooping,
> > etc.
> >
> > Frames for BPDU are recognized by the switch and can be used to
> > implement support for RSTP. However, this switch has a separate
> > address space (not covered and accessed by FEC address).
> >
> > > If not, well, I
> > > don't know what to suggest.
> >
> > For me it looks like the NXP's L2 switch shall be treated _just_ as
> > offloading IP block to accelerate switching (NXP already support
> > dpaa[2] for example).
> >
> > The idea with having it configured on demand, when:
> > ip link add name br0 type bridge; ip link set br0 up;
> > ip link set eth0 master br0;
> > ip link set eth1 master br0;
> >
> > Seems to be a reasonable one. In the above scenario it would work
> > hand by hand with FEC drivers (as those would handle PHY
> > communication setup and link up/down events).
>
> You seem to imply that we are suggesting something different.
>
> > It would be welcome if the community could come up with some rough
> > idea how to proceed with this IP block support
>
> Ok, so what I would do if I really cared that much about mainline
> support is I would refactor the FEC driver to offer its core
> functionality to a new multi-port driver that is able to handle the
> FEC DMA interfaces, the MACs and the switch. EXPORT_SYMBOL_GPL is your
> friend.
>
> This driver would probe on a device tree binding with 3 "reg" values:
> 1 for the fec@800f0000, 1 for the fec@800f4000 and 1 for the
> switch@800f8000. No puppet master driver which coordinates other
> drivers, just a single driver that, depending on the operating state,
> manages all the SoC resources in a way that will offer a sane and
> consistent view of the Ethernet ports.
>
> So it will have a different .ndo_start_xmit implementation depending
> on whether the switch is bypassed or not (if you need to send a
> packet on eth1 and the switch is bypassed, you send it through the
> DMA interface of eth1, otherwise you send it through the DMA
> interface of eth0 in a way in which the switch will actually route it
> to the eth1 physical port).
>
> Then I would implement support for BPDU RX/TX (I haven't looked at the
> documentation, but I expect that what this switch offers for control
> traffic doesn't scale at high speeds (if it does, great, then send and
> receive all your packets as control packets, to have precise port
> identification). If it doesn't, you'll need a way to treat your data
> plane packets differently from the control plane packets. For the data
> plane, you can perhaps borrow some ideas from net/dsa/tag_8021q.c, or
> even from Tobias Waldekranz's proposal to just let data plane packets
> coming from the bridge slide into the switch with no precise control
> of the destination port at all, just let the switch perform FDB
> lookups for those packets because the switch hardware FDB is supposed
> to be more or less in sync with the bridge software FDB:
> https://patchwork.kernel.org/project/netdevbpf/cover/[email protected]/
>

Thanks for sketching and sharing such detailed plan.

> > (especially that for example imx287 is used in many embedded devices
> > and is going to be in active production for next 10+ years).
>
> Well, I guess you have a plan then. There are still 10+ years left to
> enjoy the benefits of a proper driver design...

:-)


Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: [email protected]


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature