2022-06-27 13:01:45

by Michael Walle

[permalink] [raw]
Subject: fwnode_for_each_child_node() and OF backend discrepancy

Hi,

I tired to iterate over all child nodes, regardless if they are
available
or not. Now there is that handy fwnode_for_each_child_node() (and the
fwnode_for_each_available_child_node()). The only thing is the OF
backend
already skips disabled nodes [1], making fwnode_for_each_child_node()
and
fwnode_for_each_available_child_node() behave the same with the OF
backend.

Doesn't seem to be noticed by anyone for now. I'm not sure how to fix
that
one. fwnode_for_each_child_node() and also fwnode_get_next_child_node()
are
used by a handful of drivers. I've looked at some, but couldn't decide
whether they really want to iterate over all child nodes or just the
enabled
ones.

Any thoughts?

-michael

[1]
https://elixir.bootlin.com/linux/v5.19-rc3/source/drivers/of/property.c#L960


2022-06-27 13:23:20

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

On 27/06/2022 14:49, Michael Walle wrote:
> Hi,
>
> I tired to iterate over all child nodes, regardless if they are
> available
> or not. Now there is that handy fwnode_for_each_child_node() (and the
> fwnode_for_each_available_child_node()). The only thing is the OF
> backend
> already skips disabled nodes [1], making fwnode_for_each_child_node()
> and
> fwnode_for_each_available_child_node() behave the same with the OF
> backend.
>
> Doesn't seem to be noticed by anyone for now. I'm not sure how to fix
> that
> one. fwnode_for_each_child_node() and also fwnode_get_next_child_node()
> are
> used by a handful of drivers. I've looked at some, but couldn't decide
> whether they really want to iterate over all child nodes or just the
> enabled
> ones.

If I get it correctly, this was introduced by 8a0662d9ed29 ("Driver
core: Unified interface for firmware node properties")
.

The question to Rafael - what was your intention when you added
device_get_next_child_node() looking only for available nodes?

My understanding is that this implementation should be consistent with
OF implementation, so fwnode_get_next_child_node=get any child.

However maybe ACPI treats it somehow differently?

Best regards,
Krzysztof

2022-06-27 13:46:25

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

On Mon, Jun 27, 2022 at 3:08 PM Krzysztof Kozlowski
<[email protected]> wrote:
>
> On 27/06/2022 14:49, Michael Walle wrote:
> > Hi,
> >
> > I tired to iterate over all child nodes, regardless if they are
> > available
> > or not. Now there is that handy fwnode_for_each_child_node() (and the
> > fwnode_for_each_available_child_node()). The only thing is the OF
> > backend
> > already skips disabled nodes [1], making fwnode_for_each_child_node()
> > and
> > fwnode_for_each_available_child_node() behave the same with the OF
> > backend.
> >
> > Doesn't seem to be noticed by anyone for now. I'm not sure how to fix
> > that
> > one. fwnode_for_each_child_node() and also fwnode_get_next_child_node()
> > are
> > used by a handful of drivers. I've looked at some, but couldn't decide
> > whether they really want to iterate over all child nodes or just the
> > enabled
> > ones.
>
> If I get it correctly, this was introduced by 8a0662d9ed29 ("Driver
> core: Unified interface for firmware node properties")
> .

Originally it was, but then it has been reworked a few times.

The backend callbacks were introduced by Sakari, in particular.

> The question to Rafael - what was your intention when you added
> device_get_next_child_node() looking only for available nodes?

That depends on the backend.

fwnode_for_each_available_child_node() is more specific and IIRC it
was introduced for fw_devlink (CC Saravana).

> My understanding is that this implementation should be consistent with
> OF implementation, so fwnode_get_next_child_node=get any child.

IIUC, the OF implementation is not consistent with the
fwnode_get_next_child_node=get any child thing.

> However maybe ACPI treats it somehow differently?

acpi_get_next_subnode() simply returns the next subnode it can find.

2022-06-28 11:02:32

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

On 27/06/2022 15:33, Rafael J. Wysocki wrote:
> On Mon, Jun 27, 2022 at 3:08 PM Krzysztof Kozlowski
> <[email protected]> wrote:
>>
>> On 27/06/2022 14:49, Michael Walle wrote:
>>> Hi,
>>>
>>> I tired to iterate over all child nodes, regardless if they are
>>> available
>>> or not. Now there is that handy fwnode_for_each_child_node() (and the
>>> fwnode_for_each_available_child_node()). The only thing is the OF
>>> backend
>>> already skips disabled nodes [1], making fwnode_for_each_child_node()
>>> and
>>> fwnode_for_each_available_child_node() behave the same with the OF
>>> backend.
>>>
>>> Doesn't seem to be noticed by anyone for now. I'm not sure how to fix
>>> that
>>> one. fwnode_for_each_child_node() and also fwnode_get_next_child_node()
>>> are
>>> used by a handful of drivers. I've looked at some, but couldn't decide
>>> whether they really want to iterate over all child nodes or just the
>>> enabled
>>> ones.
>>
>> If I get it correctly, this was introduced by 8a0662d9ed29 ("Driver
>> core: Unified interface for firmware node properties")
>> .
>
> Originally it was, but then it has been reworked a few times.
>
> The backend callbacks were introduced by Sakari, in particular.

I see you as an author of 8a0662d9ed29 which adds
device_get_next_child_node() and uses of_get_next_available_child()
instead of of_get_next_child(). Although it was back in 2014, so maybe
it will be tricky to get original intention. :)

Which commit do you mean when you refer to Sakari's work?

>
>> The question to Rafael - what was your intention when you added
>> device_get_next_child_node() looking only for available nodes?
>
> That depends on the backend.

We talk about OF backend. In your commit device_get_next_child_node for
OF uses explicitly available node, not any node.

> fwnode_for_each_available_child_node() is more specific and IIRC it
> was introduced for fw_devlink (CC Saravana).
>
>> My understanding is that this implementation should be consistent with
>> OF implementation, so fwnode_get_next_child_node=get any child.
>
> IIUC, the OF implementation is not consistent with the
> fwnode_get_next_child_node=get any child thing.
>
>> However maybe ACPI treats it somehow differently?
>
> acpi_get_next_subnode() simply returns the next subnode it can find.


Best regards,
Krzysztof

2022-06-28 11:19:04

by Andy Shevchenko

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

On Mon, Jun 27, 2022 at 02:49:51PM +0200, Michael Walle wrote:
> Hi,
>
> I tired to iterate over all child nodes, regardless if they are available
> or not. Now there is that handy fwnode_for_each_child_node() (and the
> fwnode_for_each_available_child_node()). The only thing is the OF backend
> already skips disabled nodes [1], making fwnode_for_each_child_node() and
> fwnode_for_each_available_child_node() behave the same with the OF backend.
>
> Doesn't seem to be noticed by anyone for now. I'm not sure how to fix that
> one. fwnode_for_each_child_node() and also fwnode_get_next_child_node() are
> used by a handful of drivers. I've looked at some, but couldn't decide
> whether they really want to iterate over all child nodes or just the enabled
> ones.
>
> Any thoughts?

It was discussed at least twice this year (in regard to some new IIO drivers)
and Rob told that iterating over disabled (not available) nodes in OF kinda
legacy/design mistake. That's why device_for_each_child_node() goes only
over available nodes only.

So, why do you need to iterate over disabled ones?

--
With Best Regards,
Andy Shevchenko


2022-06-28 12:24:15

by Michael Walle

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

Am 2022-06-28 13:10, schrieb Andy Shevchenko:
> On Mon, Jun 27, 2022 at 02:49:51PM +0200, Michael Walle wrote:
>> Hi,
>>
>> I tired to iterate over all child nodes, regardless if they are
>> available
>> or not. Now there is that handy fwnode_for_each_child_node() (and the
>> fwnode_for_each_available_child_node()). The only thing is the OF
>> backend
>> already skips disabled nodes [1], making fwnode_for_each_child_node()
>> and
>> fwnode_for_each_available_child_node() behave the same with the OF
>> backend.
>>
>> Doesn't seem to be noticed by anyone for now. I'm not sure how to fix
>> that
>> one. fwnode_for_each_child_node() and also
>> fwnode_get_next_child_node() are
>> used by a handful of drivers. I've looked at some, but couldn't decide
>> whether they really want to iterate over all child nodes or just the
>> enabled
>> ones.
>>
>> Any thoughts?
>
> It was discussed at least twice this year (in regard to some new IIO
> drivers)
> and Rob told that iterating over disabled (not available) nodes in OF
> kinda
> legacy/design mistake. That's why device_for_each_child_node() goes
> only
> over available nodes only.

Mh, but then the fwnode_for_each_child_node() is very misleading, esp.
with the presence of fwnode_for_each_available_child_node().

> So, why do you need to iterate over disabled ones?

I was trying to fix the lan966x driver [1] which doesn't work if there
are disabled nodes in between. My steps would have been:
(1) change fwnode_for_each_available_child_node() to
fwnode_for_each_child_node(), maybe with a fixes tag, as it's
easy to backport
(2) introduce new compatibles and deduce the number of ports
according to the compatible string and not by counting
the child nodes.
(3) keep the old behavior for the legacy compatible and mark it
as deprecated in the binding
(4) move the device tree over to the new compatible string

-michael

[1]
https://elixir.bootlin.com/linux/v5.19-rc4/source/drivers/net/ethernet/microchip/lan966x/lan966x_main.c#L1029

2022-06-28 13:17:45

by Andy Shevchenko

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

On Tue, Jun 28, 2022 at 1:54 PM Michael Walle <[email protected]> wrote:
>
> Am 2022-06-28 13:10, schrieb Andy Shevchenko:
> > On Mon, Jun 27, 2022 at 02:49:51PM +0200, Michael Walle wrote:
> >> Hi,
> >>
> >> I tired to iterate over all child nodes, regardless if they are
> >> available
> >> or not. Now there is that handy fwnode_for_each_child_node() (and the
> >> fwnode_for_each_available_child_node()). The only thing is the OF
> >> backend
> >> already skips disabled nodes [1], making fwnode_for_each_child_node()
> >> and
> >> fwnode_for_each_available_child_node() behave the same with the OF
> >> backend.
> >>
> >> Doesn't seem to be noticed by anyone for now. I'm not sure how to fix
> >> that
> >> one. fwnode_for_each_child_node() and also
> >> fwnode_get_next_child_node() are
> >> used by a handful of drivers. I've looked at some, but couldn't decide
> >> whether they really want to iterate over all child nodes or just the
> >> enabled
> >> ones.
> >>
> >> Any thoughts?
> >
> > It was discussed at least twice this year (in regard to some new IIO
> > drivers)
> > and Rob told that iterating over disabled (not available) nodes in OF
> > kinda
> > legacy/design mistake. That's why device_for_each_child_node() goes
> > only
> > over available nodes only.
>
> Mh, but then the fwnode_for_each_child_node() is very misleading, esp.
> with the presence of fwnode_for_each_available_child_node().
>
> > So, why do you need to iterate over disabled ones?
>
> I was trying to fix the lan966x driver [1] which doesn't work if there
> are disabled nodes in between.

Can you elaborate what's wrong now in the behaviour of the driver? In
the code it uses twice the _available variant.

> My steps would have been:
> (1) change fwnode_for_each_available_child_node() to
> fwnode_for_each_child_node(), maybe with a fixes tag, as it's
> easy to backport
> (2) introduce new compatibles and deduce the number of ports
> according to the compatible string and not by counting
> the child nodes.
> (3) keep the old behavior for the legacy compatible and mark it
> as deprecated in the binding
> (4) move the device tree over to the new compatible string

> [1]
> https://elixir.bootlin.com/linux/v5.19-rc4/source/drivers/net/ethernet/microchip/lan966x/lan966x_main.c


--
With Best Regards,
Andy Shevchenko

2022-06-28 13:33:58

by Michael Walle

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

>> I was trying to fix the lan966x driver [1] which doesn't work if there
>> are disabled nodes in between.
>
> Can you elaborate what's wrong now in the behaviour of the driver? In
> the code it uses twice the _available variant.

Imagine the following device tree snippet:
port0 {
reg = <0>;
status = "okay";
}
port1 {
reg = <1>;
status = "disabled";
}
port@2 {
reg = <2>;
status = "okay";
}

The driver will set num_phys_ports to 2. When port@2 is probed, it
will have the (correct!) physical port number 2. That will then
trigger various EINVAL checks with "port_num >= num_phys_ports" or
WARN()s.

So the easiest fix would be to actual count all the child nodes
(regardless if they are available or not), assuming there are as
many nodes as physical ports.

But num_phys_ports being a property of the hardware I don't
think it's good to deduce it by counting the child nodes anyway,
but it should rather be a (hardcoded) property of the driver.

-michael

[1]
https://elixir.bootlin.com/linux/v5.19-rc4/source/drivers/net/ethernet/microchip/lan966x/lan966x_main.c

2022-06-28 13:35:02

by Andy Shevchenko

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

On Tue, Jun 28, 2022 at 3:23 PM Michael Walle <[email protected]> wrote:
>
> >> I was trying to fix the lan966x driver [1] which doesn't work if there
> >> are disabled nodes in between.
> >
> > Can you elaborate what's wrong now in the behaviour of the driver? In
> > the code it uses twice the _available variant.
>
> Imagine the following device tree snippet:
> port0 {
> reg = <0>;
> status = "okay";
> }
> port1 {
> reg = <1>;
> status = "disabled";
> }
> port@2 {
> reg = <2>;
> status = "okay";
> }
>
> The driver will set num_phys_ports to 2. When port@2 is probed, it
> will have the (correct!) physical port number 2. That will then
> trigger various EINVAL checks with "port_num >= num_phys_ports" or
> WARN()s.

It means the above mentioned condition is wrong: it should be

"port_idx >= num_phys_ports" (if the port_idx doesn't exists, that's
the bug in the first place)

> So the easiest fix would be to actual count all the child nodes
> (regardless if they are available or not), assuming there are as
> many nodes as physical ports.
>
> But num_phys_ports being a property of the hardware

So, name is wrong, that's how I read it, it should be
num_of_acrive_phys_ports (or alike).

> I don't
> think it's good to deduce it by counting the child nodes anyway,

Right.

> but it should rather be a (hardcoded) property of the driver.

Also good to update.

> [1]
> https://elixir.bootlin.com/linux/v5.19-rc4/source/drivers/net/ethernet/microchip/lan966x/lan966x_main.c

--
With Best Regards,
Andy Shevchenko

2022-06-28 14:00:10

by Michael Walle

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

[adding Horatiu Vultur, because we now digress to the bug
in the switch, rather than that odd OF behavior]

Am 2022-06-28 15:29, schrieb Andy Shevchenko:
> On Tue, Jun 28, 2022 at 3:23 PM Michael Walle <[email protected]> wrote:
>>
>> >> I was trying to fix the lan966x driver [1] which doesn't work if there
>> >> are disabled nodes in between.
>> >
>> > Can you elaborate what's wrong now in the behaviour of the driver? In
>> > the code it uses twice the _available variant.
>>
>> Imagine the following device tree snippet:
>> port0 {
>> reg = <0>;
>> status = "okay";
>> }
>> port1 {
>> reg = <1>;
>> status = "disabled";
>> }
>> port@2 {
>> reg = <2>;
>> status = "okay";
>> }
>>
>> The driver will set num_phys_ports to 2. When port@2 is probed, it
>> will have the (correct!) physical port number 2. That will then
>> trigger various EINVAL checks with "port_num >= num_phys_ports" or
>> WARN()s.
>
> It means the above mentioned condition is wrong: it should be
>
> "port_idx >= num_phys_ports" (if the port_idx doesn't exists, that's
> the bug in the first place)

I can't follow you here. Please note, that you need the actual
physical port number. It's not a made up number, but corresponds
to a physical port on that ethernet switch. So you can't just skip
the disabled ones. port@2 must have port number 2.

>> So the easiest fix would be to actual count all the child nodes
>> (regardless if they are available or not), assuming there are as
>> many nodes as physical ports.
>>
>> But num_phys_ports being a property of the hardware
>
> So, name is wrong, that's how I read it, it should be
> num_of_acrive_phys_ports (or alike).

See above, it is not just an iterator but corresponds to
a hardware property.

>> I don't
>> think it's good to deduce it by counting the child nodes anyway,
>
> Right.
>
>> but it should rather be a (hardcoded) property of the driver.
>
> Also good to update.

Horatiu, can we determine the actual number of ports (or maybe
determine if its a LAN9668 or a LAN9662) from the hardware itself
in an easy way? That way we wouldn't need a new compatible string,
but could use the generic "lan966x" one.

-michael

[1]
https://elixir.bootlin.com/linux/v5.19-rc4/source/drivers/net/ethernet/microchip/lan966x/lan966x_main.c

2022-06-28 14:24:14

by Michael Walle

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

Am 2022-06-28 15:51, schrieb Krzysztof Kozlowski:
> On 28/06/2022 15:47, Michael Walle wrote:
>> [adding Horatiu Vultur, because we now digress to the bug
>> in the switch, rather than that odd OF behavior]
>>
>> Am 2022-06-28 15:29, schrieb Andy Shevchenko:
>>> On Tue, Jun 28, 2022 at 3:23 PM Michael Walle <[email protected]>
>>> wrote:
>>>>
>>>>>> I was trying to fix the lan966x driver [1] which doesn't work if
>>>>>> there
>>>>>> are disabled nodes in between.
>>>>>
>>>>> Can you elaborate what's wrong now in the behaviour of the driver?
>>>>> In
>>>>> the code it uses twice the _available variant.
>>>>
>>>> Imagine the following device tree snippet:
>>>> port0 {
>>>> reg = <0>;
>>>> status = "okay";
>>>> }
>>>> port1 {
>>>> reg = <1>;
>>>> status = "disabled";
>>>> }
>>>> port@2 {
>>>> reg = <2>;
>>>> status = "okay";
>>>> }
>>>>
>>>> The driver will set num_phys_ports to 2. When port@2 is probed, it
>>>> will have the (correct!) physical port number 2. That will then
>>>> trigger various EINVAL checks with "port_num >= num_phys_ports" or
>>>> WARN()s.
>>>
>>> It means the above mentioned condition is wrong: it should be
>>>
>>> "port_idx >= num_phys_ports" (if the port_idx doesn't exists, that's
>>> the bug in the first place)
>>
>> I can't follow you here. Please note, that you need the actual
>> physical port number. It's not a made up number, but corresponds
>> to a physical port on that ethernet switch. So you can't just skip
>> the disabled ones. port@2 must have port number 2.
>
> The number "2" you get from the reg property, so where is exactly the
> problem?

That you need to get the total number of ports of the hardware (which
is also used for things beyond validation, eg. during switch init
all ports will are disabled). And right now, that is done by counting
all the nodes - which is bad, I guess we agree here. But it works,
as long as no ports are disabled and all ports are described in the
device tree. But I have device trees where some are disabled.

I assume, you cannot read the hardware itself to get the number of
physical ports; and we have the compatible "microchip,lan966x-switch",
which is the generic one, so it could be the LAN9668 (with 8 ports)
or the LAN9662 (with 2 ports). We somehow have to retain backwards
compatibility. Thus my idea was to at least make the handling slightly
better and count *any* child nodes. So it doesn't fall apart with
disabled
nodes. Then introduce proper compatible strings
"microchip,lan9668-switch"
and use that to hardcode the num_phys_ports to 8. But there will be
device trees with microchip,lan966x-switch out there, which we do want
to support.

I see the following options:
(1) just don't care and get rid of the "microchip,lan966x-switch"
compatible
(2) quick fix for the older kernels by counting all the nodes and
proper fix for the newer kernels with dedicated compatibles
(3) no fix for older kernels, introduce new compatibles for new
kernels
(4) keep generic compatible if the hardware can be read out to get
the number of ports.

I think (1) isn't the way to go. (2) was my try until I noticed
that odd fwnode behavior. But judging by this thread, I don't think
thats possible. I don't know if (4) is possible at all. If not we'd
be left with (3).

> If you want to validate it against some maximum number of ports, based
> on DTS, it makes no sense... One can supply a DTS with port number
> 10000
> and 10000 nodes, or with specific property "maximum-port-number=10000".
> It would make sense if you validate it against driver-hard-coded values
> (which you later mentioned in your reply).

Yes, see above.

-michael

2022-06-28 14:34:01

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

On 28/06/2022 15:47, Michael Walle wrote:
> [adding Horatiu Vultur, because we now digress to the bug
> in the switch, rather than that odd OF behavior]
>
> Am 2022-06-28 15:29, schrieb Andy Shevchenko:
>> On Tue, Jun 28, 2022 at 3:23 PM Michael Walle <[email protected]> wrote:
>>>
>>>>> I was trying to fix the lan966x driver [1] which doesn't work if there
>>>>> are disabled nodes in between.
>>>>
>>>> Can you elaborate what's wrong now in the behaviour of the driver? In
>>>> the code it uses twice the _available variant.
>>>
>>> Imagine the following device tree snippet:
>>> port0 {
>>> reg = <0>;
>>> status = "okay";
>>> }
>>> port1 {
>>> reg = <1>;
>>> status = "disabled";
>>> }
>>> port@2 {
>>> reg = <2>;
>>> status = "okay";
>>> }
>>>
>>> The driver will set num_phys_ports to 2. When port@2 is probed, it
>>> will have the (correct!) physical port number 2. That will then
>>> trigger various EINVAL checks with "port_num >= num_phys_ports" or
>>> WARN()s.
>>
>> It means the above mentioned condition is wrong: it should be
>>
>> "port_idx >= num_phys_ports" (if the port_idx doesn't exists, that's
>> the bug in the first place)
>
> I can't follow you here. Please note, that you need the actual
> physical port number. It's not a made up number, but corresponds
> to a physical port on that ethernet switch. So you can't just skip
> the disabled ones. port@2 must have port number 2.

The number "2" you get from the reg property, so where is exactly the
problem?

If you want to validate it against some maximum number of ports, based
on DTS, it makes no sense... One can supply a DTS with port number 10000
and 10000 nodes, or with specific property "maximum-port-number=10000".
It would make sense if you validate it against driver-hard-coded values
(which you later mentioned in your reply).


Best regards,
Krzysztof

2022-06-28 14:47:38

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

On 28/06/2022 16:22, Michael Walle wrote:
>>> I can't follow you here. Please note, that you need the actual
>>> physical port number. It's not a made up number, but corresponds
>>> to a physical port on that ethernet switch. So you can't just skip
>>> the disabled ones. port@2 must have port number 2.
>>
>> The number "2" you get from the reg property, so where is exactly the
>> problem?
>
> That you need to get the total number of ports of the hardware (which
> is also used for things beyond validation, eg. during switch init
> all ports will are disabled). And right now, that is done by counting
> all the nodes - which is bad, I guess we agree here.

It's bad also from another reason - the DT node was explicitly disabled,
but you perform some operation on actual hardware representing this
node. I would assume that a disabled DT node means it is not
operational, e.g. hardware not present or missing clocks, so you should
not treat it as another meaning - power down/unused.

> But it works,
> as long as no ports are disabled and all ports are described in the
> device tree. But I have device trees where some are disabled.

I am not sure if I follow this. You have devices which
1. have unused ports, but all DT nodes are available/okay,
2. have unused ports, which are in DT status=disabled?

Doesn't case 2 break the bindings? If so, we don't care about such
out-of-tree users. We cannot support all of possible weird combinations
in out-of-tree DTS files...

>
> I assume, you cannot read the hardware itself to get the number of
> physical ports; and we have the compatible "microchip,lan966x-switch",
> which is the generic one, so it could be the LAN9668 (with 8 ports)
> or the LAN9662 (with 2 ports).

I'll keep that argument for future when I see again patches adding such
wildcard compatible. :) I had to discuss with some folks...

Although the compatible difference does not have to be important here,
because one could say the 9662 and 9668 programming model is exaclty the
same and they differ by number of ports. This number of ports can be a
dedicated property or counted from the children (if they were all
available).

> We somehow have to retain backwards
> compatibility. Thus my idea was to at least make the handling slightly
> better and count *any* child nodes. So it doesn't fall apart with
> disabled
> nodes. Then introduce proper compatible strings
> "microchip,lan9668-switch"
> and use that to hardcode the num_phys_ports to 8. But there will be
> device trees with microchip,lan966x-switch out there, which we do want
> to support.
>
> I see the following options:
> (1) just don't care and get rid of the "microchip,lan966x-switch"
> compatible
> (2) quick fix for the older kernels by counting all the nodes and
> proper fix for the newer kernels with dedicated compatibles
> (3) no fix for older kernels, introduce new compatibles for new
> kernels

I propose this one. I would not care about out-of-tree DTSes which
decided to disable random stuff and expect things working. :)

> (4) keep generic compatible if the hardware can be read out to get
> the number of ports.
>
> I think (1) isn't the way to go. (2) was my try until I noticed
> that odd fwnode behavior. But judging by this thread, I don't think
> thats possible. I don't know if (4) is possible at all. If not we'd
> be left with (3).


Best regards,
Krzysztof

2022-06-28 14:48:47

by Sakari Ailus

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

Hi Krzysztof, Rafael,

On Tue, Jun 28, 2022 at 12:32:12PM +0200, Krzysztof Kozlowski wrote:
> On 27/06/2022 15:33, Rafael J. Wysocki wrote:
> > On Mon, Jun 27, 2022 at 3:08 PM Krzysztof Kozlowski
> > <[email protected]> wrote:
> >>
> >> On 27/06/2022 14:49, Michael Walle wrote:
> >>> Hi,
> >>>
> >>> I tired to iterate over all child nodes, regardless if they are
> >>> available
> >>> or not. Now there is that handy fwnode_for_each_child_node() (and the
> >>> fwnode_for_each_available_child_node()). The only thing is the OF
> >>> backend
> >>> already skips disabled nodes [1], making fwnode_for_each_child_node()
> >>> and
> >>> fwnode_for_each_available_child_node() behave the same with the OF
> >>> backend.
> >>>
> >>> Doesn't seem to be noticed by anyone for now. I'm not sure how to fix
> >>> that
> >>> one. fwnode_for_each_child_node() and also fwnode_get_next_child_node()
> >>> are
> >>> used by a handful of drivers. I've looked at some, but couldn't decide
> >>> whether they really want to iterate over all child nodes or just the
> >>> enabled
> >>> ones.
> >>
> >> If I get it correctly, this was introduced by 8a0662d9ed29 ("Driver
> >> core: Unified interface for firmware node properties")
> >> .
> >
> > Originally it was, but then it has been reworked a few times.
> >
> > The backend callbacks were introduced by Sakari, in particular.
>
> I see you as an author of 8a0662d9ed29 which adds
> device_get_next_child_node() and uses of_get_next_available_child()
> instead of of_get_next_child(). Although it was back in 2014, so maybe
> it will be tricky to get original intention. :)
>
> Which commit do you mean when you refer to Sakari's work?
>
> >
> >> The question to Rafael - what was your intention when you added
> >> device_get_next_child_node() looking only for available nodes?
> >
> > That depends on the backend.
>
> We talk about OF backend. In your commit device_get_next_child_node for
> OF uses explicitly available node, not any node.

Well spotted.

I suppose that when the only function to get the next (available) child was
added, the expection was perhaps that only available child nodes did matter
in this API. On ACPI the two are almost always the same thing --- the
property API originates from OF and ACPI primarily uses different means to
work with what's under the level of devices.

What I'd perhaps do is to change the OF behaviour and switch callers to use
a different variant for drivers that do not appear solely ACPI-oriented.

acpi_fwnode_device_is_available() should return true for data nodes.
Otherwise getting the next available child isn't meaningful at all --- and
property API is primarily dealing with data nodes when it comes to ACPI.

I might not still try to backport the fixes as it matters more from API
consistency point of view than being an actual problem _right now_.

Also cc Heikki.

>
> > fwnode_for_each_available_child_node() is more specific and IIRC it
> > was introduced for fw_devlink (CC Saravana).
> >
> >> My understanding is that this implementation should be consistent with
> >> OF implementation, so fwnode_get_next_child_node=get any child.
> >
> > IIUC, the OF implementation is not consistent with the
> > fwnode_get_next_child_node=get any child thing.
> >
> >> However maybe ACPI treats it somehow differently?
> >
> > acpi_get_next_subnode() simply returns the next subnode it can find.

--
Kind regards,

Sakari Ailus

2022-06-28 15:16:01

by Michael Walle

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

Am 2022-06-28 16:36, schrieb Krzysztof Kozlowski:
> On 28/06/2022 16:22, Michael Walle wrote:
>>>> I can't follow you here. Please note, that you need the actual
>>>> physical port number. It's not a made up number, but corresponds
>>>> to a physical port on that ethernet switch. So you can't just skip
>>>> the disabled ones. port@2 must have port number 2.
>>>
>>> The number "2" you get from the reg property, so where is exactly the
>>> problem?
>>
>> That you need to get the total number of ports of the hardware (which
>> is also used for things beyond validation, eg. during switch init
>> all ports will are disabled). And right now, that is done by counting
>> all the nodes - which is bad, I guess we agree here.
>
> It's bad also from another reason - the DT node was explicitly
> disabled,
> but you perform some operation on actual hardware representing this
> node. I would assume that a disabled DT node means it is not
> operational, e.g. hardware not present or missing clocks, so you should
> not treat it as another meaning - power down/unused.

Mh. Assume a SoC with an integrated ethernet switch. Some ports
are externally connected, some don't. I'd think they should be disabled,
no? Until now, all bindings I know, treat them as disabled. But OTOH
you still need to do some configurations on them, like disable port
forwarding, disable them or whatever. So the hardware is present, but
it is not connected to anything.

>> But it works,
>> as long as no ports are disabled and all ports are described in the
>> device tree. But I have device trees where some are disabled.
>
> I am not sure if I follow this. You have devices which
> 1. have unused ports, but all DT nodes are available/okay,
> 2. have unused ports, which are in DT status=disabled?
>
> Doesn't case 2 break the bindings? If so, we don't care about such
> out-of-tree users. We cannot support all of possible weird combinations
> in out-of-tree DTS files...

Case 1 is invalid I think.

How does case 2 break the binding? It breaks the driver, yes. But not
the binding. I agree on the out-of-tree argument, *but* isn't that
what the binding is for, that out-of-tree device trees gonna work as
long as they follow the binding? And I don't see where it dictates that
all
nodes must be enabled; nor that it must either be 2 or 8 children nodes.

>> I assume, you cannot read the hardware itself to get the number of
>> physical ports; and we have the compatible "microchip,lan966x-switch",
>> which is the generic one, so it could be the LAN9668 (with 8 ports)
>> or the LAN9662 (with 2 ports).
>
> I'll keep that argument for future when I see again patches adding such
> wildcard compatible. :) I had to discuss with some folks...
>
> Although the compatible difference does not have to be important here,
> because one could say the 9662 and 9668 programming model is exaclty
> the
> same and they differ by number of ports. This number of ports can be a
> dedicated property or counted from the children (if they were all
> available).

Mh. Rob was always in favor of dedicated compatible strings. And I
think there are also subtle differences. Eg. the LAN9662 has some kind
of accelerating engine, if I'm not mistaken.

So what do you prefer:

compatible = "microchip,lan9668";
and
compatible = "microchip,lan9662";

or

compatible = "microchip,lan966x";
microchip,num-phys-ports = <8>;
and
compatible = "microchip,lan966x";
microchip,num-phys-ports = <2>;
microchip,accelerating-engine;
..

The argument here was always, we don't want too much properties if
it can be deduced by the compatible string.

>> We somehow have to retain backwards
>> compatibility. Thus my idea was to at least make the handling slightly
>> better and count *any* child nodes. So it doesn't fall apart with
>> disabled
>> nodes. Then introduce proper compatible strings
>> "microchip,lan9668-switch"
>> and use that to hardcode the num_phys_ports to 8. But there will be
>> device trees with microchip,lan966x-switch out there, which we do want
>> to support.
>>
>> I see the following options:
>> (1) just don't care and get rid of the "microchip,lan966x-switch"
>> compatible
>> (2) quick fix for the older kernels by counting all the nodes and
>> proper fix for the newer kernels with dedicated compatibles
>> (3) no fix for older kernels, introduce new compatibles for new
>> kernels
>
> I propose this one. I would not care about out-of-tree DTSes which
> decided to disable random stuff and expect things working. :)

I'd argue, that is the usual case for all the switch bindings I
know of; not some unusual config. E.g. the SoC dtsi disables all
ports by default and only the ones which are actually connected
by the board are then enabled in the board dts, see
arch/arm/boot/dts/lan966x.dtsi
arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi

That being said, I don't care too much about the older kernels.
So I'm fine with (3).

-michael

>
>> (4) keep generic compatible if the hardware can be read out to get
>> the number of ports.
>>
>> I think (1) isn't the way to go. (2) was my try until I noticed
>> that odd fwnode behavior. But judging by this thread, I don't think
>> thats possible. I don't know if (4) is possible at all. If not we'd
>> be left with (3).

2022-06-28 15:29:52

by Krzysztof Kozlowski

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

On 28/06/2022 17:09, Michael Walle wrote:
>> It's bad also from another reason - the DT node was explicitly
>> disabled,
>> but you perform some operation on actual hardware representing this
>> node. I would assume that a disabled DT node means it is not
>> operational, e.g. hardware not present or missing clocks, so you should
>> not treat it as another meaning - power down/unused.
>
> Mh. Assume a SoC with an integrated ethernet switch. Some ports
> are externally connected, some don't. I'd think they should be disabled,
> no? Until now, all bindings I know, treat them as disabled. But OTOH
> you still need to do some configurations on them, like disable port
> forwarding, disable them or whatever. So the hardware is present, but
> it is not connected to anything.

I see your point and the meaning is okay... except that drivers don't
touch disabled nodes. If a device (with some address space) is disabled,
you do not write there "please be power off". Here the case is a bit
different, because I think ports do not have their own address space.
Yet it contradicts the logic - something is disabled in DT and you
expect to perform actual operations on it.

>
>>> But it works,
>>> as long as no ports are disabled and all ports are described in the
>>> device tree. But I have device trees where some are disabled.
>>
>> I am not sure if I follow this. You have devices which
>> 1. have unused ports, but all DT nodes are available/okay,
>> 2. have unused ports, which are in DT status=disabled?
>>
>> Doesn't case 2 break the bindings? If so, we don't care about such
>> out-of-tree users. We cannot support all of possible weird combinations
>> in out-of-tree DTS files...
>
> Case 1 is invalid I think.
>
> How does case 2 break the binding? It breaks the driver, yes. But not
> the binding.

The binding asks to describe all the ports, not describe and disable them.

> I agree on the out-of-tree argument, *but* isn't that
> what the binding is for, that out-of-tree device trees gonna work as
> long as they follow the binding? And I don't see where it dictates that
> all
> nodes must be enabled; nor that it must either be 2 or 8 children nodes.

True, that's not specific, but as with any incomplete hardware
description in DTS, the binding cannot guarantee you
backwards-compatibility.

The hardware should be described fully in DTS and bindings expect that
as well.

>
>>> I assume, you cannot read the hardware itself to get the number of
>>> physical ports; and we have the compatible "microchip,lan966x-switch",
>>> which is the generic one, so it could be the LAN9668 (with 8 ports)
>>> or the LAN9662 (with 2 ports).
>>
>> I'll keep that argument for future when I see again patches adding such
>> wildcard compatible. :) I had to discuss with some folks...
>>
>> Although the compatible difference does not have to be important here,
>> because one could say the 9662 and 9668 programming model is exaclty
>> the
>> same and they differ by number of ports. This number of ports can be a
>> dedicated property or counted from the children (if they were all
>> available).
>
> Mh. Rob was always in favor of dedicated compatible strings. And I
> think there are also subtle differences. Eg. the LAN9662 has some kind
> of accelerating engine, if I'm not mistaken.
>
> So what do you prefer:
>
> compatible = "microchip,lan9668";
> and
> compatible = "microchip,lan9662";

This one.

>
> or
>
> compatible = "microchip,lan966x";
> microchip,num-phys-ports = <8>;
> and
> compatible = "microchip,lan966x";
> microchip,num-phys-ports = <2>;
> microchip,accelerating-engine;
> ..
>
> The argument here was always, we don't want too much properties if
> it can be deduced by the compatible string.
>
>>> We somehow have to retain backwards
>>> compatibility. Thus my idea was to at least make the handling slightly
>>> better and count *any* child nodes. So it doesn't fall apart with
>>> disabled
>>> nodes. Then introduce proper compatible strings
>>> "microchip,lan9668-switch"
>>> and use that to hardcode the num_phys_ports to 8. But there will be
>>> device trees with microchip,lan966x-switch out there, which we do want
>>> to support.
>>>
>>> I see the following options:
>>> (1) just don't care and get rid of the "microchip,lan966x-switch"
>>> compatible
>>> (2) quick fix for the older kernels by counting all the nodes and
>>> proper fix for the newer kernels with dedicated compatibles
>>> (3) no fix for older kernels, introduce new compatibles for new
>>> kernels
>>
>> I propose this one. I would not care about out-of-tree DTSes which
>> decided to disable random stuff and expect things working. :)
>
> I'd argue, that is the usual case for all the switch bindings I
> know of; not some unusual config. E.g. the SoC dtsi disables all
> ports by default and only the ones which are actually connected
> by the board are then enabled in the board dts, see
> arch/arm/boot/dts/lan966x.dtsi
> arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
>
> That being said, I don't care too much about the older kernels.
> So I'm fine with (3).


Best regards,
Krzysztof

2022-06-28 21:33:38

by Andy Shevchenko

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

On Tue, Jun 28, 2022 at 5:17 PM Krzysztof Kozlowski
<[email protected]> wrote:
> On 28/06/2022 17:09, Michael Walle wrote:

...

> > Mh. Assume a SoC with an integrated ethernet switch. Some ports
> > are externally connected, some don't. I'd think they should be disabled,
> > no? Until now, all bindings I know, treat them as disabled. But OTOH
> > you still need to do some configurations on them, like disable port
> > forwarding, disable them or whatever. So the hardware is present, but
> > it is not connected to anything.
>
> I see your point and the meaning is okay... except that drivers don't
> touch disabled nodes. If a device (with some address space) is disabled,
> you do not write there "please be power off". Here the case is a bit
> different, because I think ports do not have their own address space.
> Yet it contradicts the logic - something is disabled in DT and you
> expect to perform actual operations on it.

You beat me up to this comment, I also see a contradiction of what
"disabled" means in your, Michael, case and what it should be.

If you need to perform an operation on some piece of HW, it has not to
be disabled.

Or, you may deduce them by knowing how many ports in hardware (this is
usually done not by counting the nodes, but by a property) and do
whatever you want on ones, you have not listed (by port_num) in the
array of parsed children.

--
With Best Regards,
Andy Shevchenko

2022-06-28 21:43:53

by Michael Walle

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

Am 2022-06-28 22:52, schrieb Horatiu Vultur:
> The 06/28/2022 22:28, Andy Shevchenko wrote:
>> EXTERNAL EMAIL: Do not click links or open attachments unless you know
>> the content is safe
>>
>> On Tue, Jun 28, 2022 at 5:17 PM Krzysztof Kozlowski
>> <[email protected]> wrote:
>> > On 28/06/2022 17:09, Michael Walle wrote:
>
> Hi,
>
> Sorry for joint this late.
>
>>
>> ...
>>
>> > > Mh. Assume a SoC with an integrated ethernet switch. Some ports
>> > > are externally connected, some don't. I'd think they should be disabled,
>> > > no? Until now, all bindings I know, treat them as disabled. But OTOH
>> > > you still need to do some configurations on them, like disable port
>> > > forwarding, disable them or whatever. So the hardware is present, but
>> > > it is not connected to anything.
>> >
>> > I see your point and the meaning is okay... except that drivers don't
>> > touch disabled nodes. If a device (with some address space) is disabled,
>> > you do not write there "please be power off". Here the case is a bit
>> > different, because I think ports do not have their own address space.
>> > Yet it contradicts the logic - something is disabled in DT and you
>> > expect to perform actual operations on it.
>>
>> You beat me up to this comment, I also see a contradiction of what
>> "disabled" means in your, Michael, case and what it should be.
>>
>> If you need to perform an operation on some piece of HW, it has not to
>> be disabled.
>>
>> Or, you may deduce them by knowing how many ports in hardware (this is
>> usually done not by counting the nodes, but by a property) and do
>> whatever you want on ones, you have not listed (by port_num) in the
>> array of parsed children.
>
> It is not possible to have a defined for the MAX number of ports that
> supported by lan966x. Which is 8. And assigned that define to
> num_phys_ports instead of counting the entries in DT?

You mean also for the lan9662? I'm pretty sure that doesn't
work. Have a look where num_phys_ports is used. One random
example:
https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/microchip/lan966x/lan966x_main.c#L874

So if your switch only has 4 ports, then I'd guess you'll
access a non-existing register.

-michael

2022-06-28 22:34:28

by Vladimir Oltean

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

On Tue, Jun 28, 2022 at 03:47:59PM +0200, Michael Walle wrote:
> Horatiu, can we determine the actual number of ports (or maybe
> determine if its a LAN9668 or a LAN9662) from the hardware itself
> in an easy way? That way we wouldn't need a new compatible string,
> but could use the generic "lan966x" one.

Never seen a lan966x switch, but if it's anything like the Ocelot-1
family, you should be able to determine the port count by reading the
out-of-reset value of any register that contains a port mask which has
all ones by default (any of the PGIDs in the multicast/flooding
destinations range, or the VLAN table port masks for any VLAN ID).

Or you can read the size of the packet buffer and infer from that which
switch model it is, based on a driver hardcoded lookup table. I fully
expect a switch with fewer ports to have smaller packet buffer. See
ocelot_detect_features() for an example of registers I am talking about.
Maybe lan966x has something similar.

Of course these are just band aids and it would still be good to modify
the device trees with the proper
compatible = "microchip,lan9668-switch", "microchip,lan966x-switch";
rather than rely on educated guesswork (which is still guesswork, after all).

2022-06-29 11:04:55

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

On Tue, Jun 28, 2022 at 12:32 PM Krzysztof Kozlowski
<[email protected]> wrote:
>
> On 27/06/2022 15:33, Rafael J. Wysocki wrote:
> > On Mon, Jun 27, 2022 at 3:08 PM Krzysztof Kozlowski
> > <[email protected]> wrote:
> >>
> >> On 27/06/2022 14:49, Michael Walle wrote:
> >>> Hi,
> >>>
> >>> I tired to iterate over all child nodes, regardless if they are
> >>> available
> >>> or not. Now there is that handy fwnode_for_each_child_node() (and the
> >>> fwnode_for_each_available_child_node()). The only thing is the OF
> >>> backend
> >>> already skips disabled nodes [1], making fwnode_for_each_child_node()
> >>> and
> >>> fwnode_for_each_available_child_node() behave the same with the OF
> >>> backend.
> >>>
> >>> Doesn't seem to be noticed by anyone for now. I'm not sure how to fix
> >>> that
> >>> one. fwnode_for_each_child_node() and also fwnode_get_next_child_node()
> >>> are
> >>> used by a handful of drivers. I've looked at some, but couldn't decide
> >>> whether they really want to iterate over all child nodes or just the
> >>> enabled
> >>> ones.
> >>
> >> If I get it correctly, this was introduced by 8a0662d9ed29 ("Driver
> >> core: Unified interface for firmware node properties")
> >> .
> >
> > Originally it was, but then it has been reworked a few times.
> >
> > The backend callbacks were introduced by Sakari, in particular.
>
> I see you as an author of 8a0662d9ed29 which adds
> device_get_next_child_node() and uses of_get_next_available_child()
> instead of of_get_next_child(). Although it was back in 2014, so maybe
> it will be tricky to get original intention. :)

The OF part of this was based on Grant's suggestions. My
understanding at that time was that this was the right thing to do for
OF and nobody told me otherwise.

> Which commit do you mean when you refer to Sakari's work?

3708184afc77 device property: Move FW type specific functionality to
FW specific files

However, it didn't change the "available" vs "any" behavior for OF.

> >
> >> The question to Rafael - what was your intention when you added
> >> device_get_next_child_node() looking only for available nodes?
> >
> > That depends on the backend.
>
> We talk about OF backend. In your commit device_get_next_child_node for
> OF uses explicitly available node, not any node.

Yes, it does.

If that doesn't match the cases in which it is used, I guess it can be adjusted.

> > fwnode_for_each_available_child_node() is more specific and IIRC it
> > was introduced for fw_devlink (CC Saravana).
> >
> >> My understanding is that this implementation should be consistent with
> >> OF implementation, so fwnode_get_next_child_node=get any child.
> >
> > IIUC, the OF implementation is not consistent with the
> > fwnode_get_next_child_node=get any child thing.
> >
> >> However maybe ACPI treats it somehow differently?
> >
> > acpi_get_next_subnode() simply returns the next subnode it can find.

I guess that the confusion is related to what "available" means for ACPI and OF.

In the ACPI case it means "this a device object corresponding to a
device that is present". In OF it is related to the "status" property
AFAICS.

2022-06-29 13:53:56

by Grant Likely

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy



> On 29 Jun 2022, at 11:50, Rafael J. Wysocki <[email protected]> wrote:
>
> On Tue, Jun 28, 2022 at 12:32 PM Krzysztof Kozlowski
> <[email protected]> wrote:
>>
>> On 27/06/2022 15:33, Rafael J. Wysocki wrote:
>>> On Mon, Jun 27, 2022 at 3:08 PM Krzysztof Kozlowski
>>> <[email protected]> wrote:
>>>>
>>>> On 27/06/2022 14:49, Michael Walle wrote:
>>>>> Hi,
>>>>>
>>>>> I tired to iterate over all child nodes, regardless if they are
>>>>> available
>>>>> or not. Now there is that handy fwnode_for_each_child_node() (and the
>>>>> fwnode_for_each_available_child_node()). The only thing is the OF
>>>>> backend
>>>>> already skips disabled nodes [1], making fwnode_for_each_child_node()
>>>>> and
>>>>> fwnode_for_each_available_child_node() behave the same with the OF
>>>>> backend.
>>>>>
>>>>> Doesn't seem to be noticed by anyone for now. I'm not sure how to fix
>>>>> that
>>>>> one. fwnode_for_each_child_node() and also fwnode_get_next_child_node()
>>>>> are
>>>>> used by a handful of drivers. I've looked at some, but couldn't decide
>>>>> whether they really want to iterate over all child nodes or just the
>>>>> enabled
>>>>> ones.
>>>>
>>>> If I get it correctly, this was introduced by 8a0662d9ed29 ("Driver
>>>> core: Unified interface for firmware node properties")
>>>> .
>>>
>>> Originally it was, but then it has been reworked a few times.
>>>
>>> The backend callbacks were introduced by Sakari, in particular.
>>
>> I see you as an author of 8a0662d9ed29 which adds
>> device_get_next_child_node() and uses of_get_next_available_child()
>> instead of of_get_next_child(). Although it was back in 2014, so maybe
>> it will be tricky to get original intention. :)
>
> The OF part of this was based on Grant's suggestions. My
> understanding at that time was that this was the right thing to do for
> OF and nobody told me otherwise.
>
>> Which commit do you mean when you refer to Sakari's work?
>
> 3708184afc77 device property: Move FW type specific functionality to
> FW specific files
>
> However, it didn't change the "available" vs "any" behavior for OF.

Back in the mists of time indeed. I don’t remember anything specific about all/available variants of the fwnode_ helpers. Auditing the existing users is probably needed to decide whether or not it can be changed.

g.

>
>>>
>>>> The question to Rafael - what was your intention when you added
>>>> device_get_next_child_node() looking only for available nodes?
>>>
>>> That depends on the backend.
>>
>> We talk about OF backend. In your commit device_get_next_child_node for
>> OF uses explicitly available node, not any node.
>
> Yes, it does.
>
> If that doesn't match the cases in which it is used, I guess it can be adjusted.
>
>>> fwnode_for_each_available_child_node() is more specific and IIRC it
>>> was introduced for fw_devlink (CC Saravana).
>>>
>>>> My understanding is that this implementation should be consistent with
>>>> OF implementation, so fwnode_get_next_child_node=get any child.
>>>
>>> IIUC, the OF implementation is not consistent with the
>>> fwnode_get_next_child_node=get any child thing.
>>>
>>>> However maybe ACPI treats it somehow differently?
>>>
>>> acpi_get_next_subnode() simply returns the next subnode it can find.
>
> I guess that the confusion is related to what "available" means for ACPI and OF.
>
> In the ACPI case it means "this a device object corresponding to a
> device that is present". In OF it is related to the "status" property
> AFAICS.

2022-06-30 21:18:58

by Michael Walle

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

Am 2022-06-30 22:16, schrieb Horatiu Vultur:
> The 06/28/2022 23:07, Michael Walle wrote:
>> EXTERNAL EMAIL: Do not click links or open attachments unless you know
>> the content is safe
>>
>> Am 2022-06-28 22:52, schrieb Horatiu Vultur:
>> > The 06/28/2022 22:28, Andy Shevchenko wrote:
>> > > EXTERNAL EMAIL: Do not click links or open attachments unless you know
>> > > the content is safe
>> > >
>> > > On Tue, Jun 28, 2022 at 5:17 PM Krzysztof Kozlowski
>> > > <[email protected]> wrote:
>> > > > On 28/06/2022 17:09, Michael Walle wrote:
>> >
>> > Hi,
>> >
>> > Sorry for joint this late.
>> >
>> > >
>> > > ...
>> > >
>> > > > > Mh. Assume a SoC with an integrated ethernet switch. Some ports
>> > > > > are externally connected, some don't. I'd think they should be disabled,
>> > > > > no? Until now, all bindings I know, treat them as disabled. But OTOH
>> > > > > you still need to do some configurations on them, like disable port
>> > > > > forwarding, disable them or whatever. So the hardware is present, but
>> > > > > it is not connected to anything.
>> > > >
>> > > > I see your point and the meaning is okay... except that drivers don't
>> > > > touch disabled nodes. If a device (with some address space) is disabled,
>> > > > you do not write there "please be power off". Here the case is a bit
>> > > > different, because I think ports do not have their own address space.
>> > > > Yet it contradicts the logic - something is disabled in DT and you
>> > > > expect to perform actual operations on it.
>> > >
>> > > You beat me up to this comment, I also see a contradiction of what
>> > > "disabled" means in your, Michael, case and what it should be.
>> > >
>> > > If you need to perform an operation on some piece of HW, it has not to
>> > > be disabled.
>> > >
>> > > Or, you may deduce them by knowing how many ports in hardware (this is
>> > > usually done not by counting the nodes, but by a property) and do
>> > > whatever you want on ones, you have not listed (by port_num) in the
>> > > array of parsed children.
>> >
>> > It is not possible to have a defined for the MAX number of ports that
>> > supported by lan966x. Which is 8. And assigned that define to
>> > num_phys_ports instead of counting the entries in DT?
>>
>> You mean also for the lan9662? I'm pretty sure that doesn't
>> work. Have a look where num_phys_ports is used. One random
>> example:
>> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/microchip/lan966x/lan966x_main.c#L874
>>
>> So if your switch only has 4 ports, then I'd guess you'll
>> access a non-existing register.
>
> Underneath lan662 and lan668 is the same chip. The HW people disable
> some ports/features on each platform but from what I know you will
> still
> be able to access the registers.

I noticed that there are still 8 ports in the register description and
assumed that it was wrong [1]. But ok, that makes sense in some way.
OTOH that means, we cannot do the guesswork Vladimir proposed.

-michael

[1] https://microchip-ung.github.io/lan9662_reginfo/reginfo_LAN9662.html

2022-06-30 21:28:48

by Vladimir Oltean

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

On Thu, Jun 30, 2022 at 11:00:37PM +0200, Michael Walle wrote:
> > > > It is not possible to have a defined for the MAX number of ports that
> > > > supported by lan966x. Which is 8. And assigned that define to
> > > > num_phys_ports instead of counting the entries in DT?
> > >
> > > You mean also for the lan9662? I'm pretty sure that doesn't
> > > work. Have a look where num_phys_ports is used. One random
> > > example:
> > > https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/microchip/lan966x/lan966x_main.c#L874
> > >
> > > So if your switch only has 4 ports, then I'd guess you'll
> > > access a non-existing register.
> >
> > Underneath lan662 and lan668 is the same chip. The HW people disable
> > some ports/features on each platform but from what I know you will still
> > be able to access the registers.
>
> I noticed that there are still 8 ports in the register description and
> assumed that it was wrong [1]. But ok, that makes sense in some way.
> OTOH that means, we cannot do the guesswork Vladimir proposed.
>
> -michael
>
> [1] https://microchip-ung.github.io/lan9662_reginfo/reginfo_LAN9662.html

Are you 100% positive that the default values for the flooding PGIDs are
GENMASK(8, 0) for a 4-port switch? And that the packet buffer has the
same size for a switch with half as many ports? Ok...

But in that case, what exactly is the problem if the port count of 8 is
a synthesis time constant for lan966x, and if the CPU port module is
always at index 8 in the analyzer (with a gap between indices 4 and 7)?
Just hardcode lan966x->num_phys_ports to 8 and work with that throughout.
Allocate lan966x->ports as an array of 8 pointers to struct lan966x_port
(which they are already), and the pointers themselves are populated as
being the netdev_priv of the interfaces that are actually present and
used. Literally the only thing you need to fix is that you need to
hardcode num_phys_ports to 8, problem solved. It means that lan9662 is
nothing but a lan9668 where the last 4 ports have 'status = "disabled"'
in the device tree.

2022-06-30 21:57:54

by Michael Walle

[permalink] [raw]
Subject: Re: fwnode_for_each_child_node() and OF backend discrepancy

Am 2022-06-30 23:21, schrieb Vladimir Oltean:
> On Thu, Jun 30, 2022 at 11:00:37PM +0200, Michael Walle wrote:
>> > > > It is not possible to have a defined for the MAX number of ports that
>> > > > supported by lan966x. Which is 8. And assigned that define to
>> > > > num_phys_ports instead of counting the entries in DT?
>> > >
>> > > You mean also for the lan9662? I'm pretty sure that doesn't
>> > > work. Have a look where num_phys_ports is used. One random
>> > > example:
>> > > https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/microchip/lan966x/lan966x_main.c#L874
>> > >
>> > > So if your switch only has 4 ports, then I'd guess you'll
>> > > access a non-existing register.
>> >
>> > Underneath lan662 and lan668 is the same chip. The HW people disable
>> > some ports/features on each platform but from what I know you will still
>> > be able to access the registers.
>>
>> I noticed that there are still 8 ports in the register description and
>> assumed that it was wrong [1]. But ok, that makes sense in some way.
>> OTOH that means, we cannot do the guesswork Vladimir proposed.
>>
>> -michael
>>
>> [1]
>> https://microchip-ung.github.io/lan9662_reginfo/reginfo_LAN9662.html
>
> Are you 100% positive that the default values for the flooding PGIDs
> are
> GENMASK(8, 0) for a 4-port switch? And that the packet buffer has the
> same size for a switch with half as many ports? Ok...
>
> But in that case, what exactly is the problem if the port count of 8 is
> a synthesis time constant for lan966x, and if the CPU port module is
> always at index 8 in the analyzer (with a gap between indices 4 and 7)?
> Just hardcode lan966x->num_phys_ports to 8 and work with that
> throughout.
> Allocate lan966x->ports as an array of 8 pointers to struct
> lan966x_port
> (which they are already), and the pointers themselves are populated as
> being the netdev_priv of the interfaces that are actually present and
> used. Literally the only thing you need to fix is that you need to
> hardcode num_phys_ports to 8, problem solved. It means that lan9662 is
> nothing but a lan9668 where the last 4 ports have 'status = "disabled"'
> in the device tree.

If that's the case, sure. The last four ports can just be omitted no?
Of course you'll lose the possibility to validate the device tree ports
input, i.e. port@5 would perfectly valid although it doesn't make any
sense for the lan9662. (Regardless if it is disabled, or if it's
omitted)

-michael