2020-05-22 21:34:42

by Jeremy Linton

[permalink] [raw]
Subject: [RFC 00/11] Make C45 autoprobe more robust

It would be nice if we could depend on the c45 scanner
to identify standards complaint phys or fail cleanly
enough that we can turn around and continue probing the
bus for c22 devices.

In order to pull this off we should be looking at a larger
range of MMD addresses, as well as doing a better job of judging
if a phy appears to be responding correctly. Once that is in
place we then allow a MDIO bus to report that its c45 capable
with a c22 fallback.

So this set is really a heavy RFC, and I have my own set of
questions about it.

First, it seems like its ok to scan reserved parts of the MMD space,
given the existing code is scanning MMD 0. Should we do a better
job of blocking out the reserved areas? Or was this really what
the commit to sanitize the c22 capability was fixing (avoid a
probe at location 0).

Secondly, are there parts of the system that are depending on
"stub" MDIO devices being created?The DT code looks ok, but I
think the existing code path left open a number possibilities
where devices are created without valid IDs. The commit which
cleared the c22 capability registers from the device id list
doesn't make much sense to me except to create bugus devices
by avoiding breaking out of the devices loop early. There were
a couple other cases (all 0 device lists, device lists
reporting devices that respond as 0xFFFFFFFF to the id registers).

Do we want to probe some of the additional package id registers?

Do we want a more "strict" flag for fully compliant MDIO/PHY
combinations or are we ok with extra stub devices? The 3rd to last
set is just using the C45_FIRST flag for it.

What have I missed?

Jeremy Linton (11):
net: phy: Don't report success if devices weren't found
net: phy: Simplify MMD device list termination
net: phy: refactor c45 phy identification sequence
net: phy: Handle c22 regs presence better
net: phy: Scan the entire MMD device space
net: phy: Hoist no phy detected state
net: phy: reset invalid phy reads of 0 back to 0xffffffff
net: phy: Allow mdio buses to auto-probe c45 devices
net: phy: Refuse to consider phy_id=0 a valid phy
net: example acpize xgmac_mdio
net: example xgmac enable extended scanning

drivers/net/ethernet/freescale/xgmac_mdio.c | 28 +++--
drivers/net/phy/mdio_bus.c | 9 +-
drivers/net/phy/phy_device.c | 112 ++++++++++++--------
include/linux/phy.h | 7 +-
4 files changed, 100 insertions(+), 56 deletions(-)

--
2.26.2


2020-05-22 21:34:47

by Jeremy Linton

[permalink] [raw]
Subject: [RFC 04/11] net: phy: Handle c22 regs presence better

Until this point, we have been sanitizing the c22
regs presence bit out of all the MMD device lists.
This is incorrect as it causes the 0xFFFFFFFF checks
to incorrectly fail. Further, it turns out that we
want to utilize this flag to make a determination that
there is actually a phy at this location and we should
be accessing it using c22.

Signed-off-by: Jeremy Linton <[email protected]>
---
drivers/net/phy/phy_device.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index f0761fa5e40b..2d677490ecab 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
return -EIO;
*devices_in_package |= phy_reg;

- /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
- *devices_in_package &= ~BIT(0);
-
return 0;
}

@@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
int i;
const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
u32 *devs = &c45_ids->devices_in_package;
+ bool c22_present = false;
+ bool valid_id = false;

/* Find first non-zero Devices In package. Device zero is reserved
* for 802.3 c45 complied PHYs, so don't probe it at first.
@@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
return 0;
}

+ /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
+ c22_present = *devs & BIT(0);
+ *devs &= ~BIT(0);
+
/* Now probe Device Identifiers for each device present. */
for (i = 1; i < num_ids; i++) {
if (!(c45_ids->devices_in_package & (1 << i)))
@@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
if (ret < 0)
return ret;
+ if (valid_phy_id(c45_ids->device_ids[i]))
+ valid_id = true;
+ }
+
+ if (!valid_id && c22_present) {
+ *phy_id = 0xffffffff;
+ return 0;
}
*phy_id = 0;
return 0;
--
2.26.2

2020-05-22 21:35:01

by Jeremy Linton

[permalink] [raw]
Subject: [RFC 03/11] net: phy: refactor c45 phy identification sequence

Lets factor out the phy id logic, and make it generic
so that it can be used for c22 and c45.

Signed-off-by: Jeremy Linton <[email protected]>
---
drivers/net/phy/phy_device.c | 65 +++++++++++++++++++-----------------
1 file changed, 35 insertions(+), 30 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 7746c07b97fe..f0761fa5e40b 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -695,6 +695,29 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
return 0;
}

+static int _get_phy_id(struct mii_bus *bus, int addr, int dev_addr,
+ u32 *phy_id, bool c45)
+{
+ int phy_reg, reg_addr;
+
+ int reg_base = c45 ? (MII_ADDR_C45 | dev_addr << 16) : 0;
+
+ reg_addr = reg_base | MII_PHYSID1;
+ phy_reg = mdiobus_read(bus, addr, reg_addr);
+ if (phy_reg < 0)
+ return -EIO;
+
+ *phy_id = phy_reg << 16;
+
+ reg_addr = reg_base | MII_PHYSID2;
+ phy_reg = mdiobus_read(bus, addr, reg_addr);
+ if (phy_reg < 0)
+ return -EIO;
+ *phy_id |= phy_reg;
+
+ return 0;
+}
+
static bool valid_phy_id(int val)
{
return (val > 0 && ((val & 0x1fffffff) != 0x1fffffff));
@@ -715,17 +738,17 @@ static bool valid_phy_id(int val)
*/
static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
struct phy_c45_device_ids *c45_ids) {
- int phy_reg;
- int i, reg_addr;
+ int ret;
+ int i;
const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
u32 *devs = &c45_ids->devices_in_package;

/* Find first non-zero Devices In package. Device zero is reserved
* for 802.3 c45 complied PHYs, so don't probe it at first.
*/
- for (i = 1; i < num_ids && *devs == 0; i++) {
- phy_reg = get_phy_c45_devs_in_pkg(bus, addr, i, devs);
- if (phy_reg < 0)
+ for (i = 0; i < num_ids && *devs == 0; i++) {
+ ret = get_phy_c45_devs_in_pkg(bus, addr, i, devs);
+ if (ret < 0)
return -EIO;

if ((*devs & 0x1fffffff) == 0x1fffffff) {
@@ -752,17 +775,9 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
if (!(c45_ids->devices_in_package & (1 << i)))
continue;

- reg_addr = MII_ADDR_C45 | i << 16 | MII_PHYSID1;
- phy_reg = mdiobus_read(bus, addr, reg_addr);
- if (phy_reg < 0)
- return -EIO;
- c45_ids->device_ids[i] = phy_reg << 16;
-
- reg_addr = MII_ADDR_C45 | i << 16 | MII_PHYSID2;
- phy_reg = mdiobus_read(bus, addr, reg_addr);
- if (phy_reg < 0)
- return -EIO;
- c45_ids->device_ids[i] |= phy_reg;
+ ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
+ if (ret < 0)
+ return ret;
}
*phy_id = 0;
return 0;
@@ -787,27 +802,17 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
static int get_phy_id(struct mii_bus *bus, int addr, u32 *phy_id,
bool is_c45, struct phy_c45_device_ids *c45_ids)
{
- int phy_reg;
+ int ret;

if (is_c45)
return get_phy_c45_ids(bus, addr, phy_id, c45_ids);

- /* Grab the bits from PHYIR1, and put them in the upper half */
- phy_reg = mdiobus_read(bus, addr, MII_PHYSID1);
- if (phy_reg < 0) {
+ ret = _get_phy_id(bus, addr, 0, phy_id, false);
+ if (ret < 0) {
/* returning -ENODEV doesn't stop bus scanning */
- return (phy_reg == -EIO || phy_reg == -ENODEV) ? -ENODEV : -EIO;
+ return (ret == -EIO || ret == -ENODEV) ? -ENODEV : -EIO;
}

- *phy_id = phy_reg << 16;
-
- /* Grab the bits from PHYIR2, and put them in the lower half */
- phy_reg = mdiobus_read(bus, addr, MII_PHYSID2);
- if (phy_reg < 0)
- return -EIO;
-
- *phy_id |= phy_reg;
-
return 0;
}

--
2.26.2

2020-05-22 21:35:42

by Jeremy Linton

[permalink] [raw]
Subject: [RFC 08/11] net: phy: Allow mdio buses to auto-probe c45 devices

The mdiobus_scan logic is currently hardcoded to only
work with c22 devices. This works fairly well in most
cases, but its possible a c45 device doesn't respond
despite being a standard phy. If the parent hardware
is capable, it makes sense to scan for c45 devices before
falling back to c22.

As we want this to reflect the capabilities of the STA,
lets add a field to the mii_bus structure to represent
the capability. That way devices can opt into the extended
scanning. Existing users should continue to default to c22
only scanning as long as they are zero'ing the structure
before use.

At the moment its a yes/no option, but in the future
it may be useful to extend this to c45 only policy, or
add additional classes and policies.

Signed-off-by: Jeremy Linton <[email protected]>
---
drivers/net/phy/mdio_bus.c | 9 +++++++--
include/linux/phy.h | 5 +++++
2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 7a4eb3f2cb74..3ab618e15f2c 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -732,10 +732,15 @@ EXPORT_SYMBOL(mdiobus_free);
*/
struct phy_device *mdiobus_scan(struct mii_bus *bus, int addr)
{
- struct phy_device *phydev;
+ struct phy_device *phydev = ERR_PTR(-ENODEV);
int err;

- phydev = get_phy_device(bus, addr, false);
+ if (bus->probe_capabilities == MDIOBUS_C45_FIRST)
+ phydev = get_phy_device(bus, addr, true);
+
+ if (IS_ERR(phydev))
+ phydev = get_phy_device(bus, addr, false);
+
if (IS_ERR(phydev))
return phydev;

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 480a6b153227..d68e1ebab484 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -275,6 +275,11 @@ struct mii_bus {
int reset_delay_us;
/* RESET GPIO descriptor pointer */
struct gpio_desc *reset_gpiod;
+ /* bus capabilities, used for probing */
+ enum {
+ MDIOBUS_C22_ONLY = 0,
+ MDIOBUS_C45_FIRST,
+ } probe_capabilities;
};
#define to_mii_bus(d) container_of(d, struct mii_bus, dev)

--
2.26.2

2020-05-22 21:36:18

by Jeremy Linton

[permalink] [raw]
Subject: [RFC 02/11] net: phy: Simplify MMD device list termination

Since we are already checking for *devs == 0 after
the loop terminates, we can add a mostly F's check
as well. With that change we can simplify the return/break
sequence inside the loop.

Add a valid_phy_id() macro for this, since we will be using it
in a couple other places.

Signed-off-by: Jeremy Linton <[email protected]>
---
drivers/net/phy/phy_device.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 245899b58a7d..7746c07b97fe 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -695,6 +695,11 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
return 0;
}

+static bool valid_phy_id(int val)
+{
+ return (val > 0 && ((val & 0x1fffffff) != 0x1fffffff));
+}
+
/**
* get_phy_c45_ids - reads the specified addr for its 802.3-c45 IDs.
* @bus: the target MII bus
@@ -732,18 +737,12 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
phy_reg = get_phy_c45_devs_in_pkg(bus, addr, 0, devs);
if (phy_reg < 0)
return -EIO;
- /* no device there, let's get out of here */
- if ((*devs & 0x1fffffff) == 0x1fffffff) {
- *phy_id = 0xffffffff;
- return 0;
- } else {
- break;
- }
+ break;
}
}

/* no reported devices */
- if (*devs == 0) {
+ if (!valid_phy_id(*devs)) {
*phy_id = 0xffffffff;
return 0;
}
--
2.26.2

2020-05-22 21:37:02

by Jeremy Linton

[permalink] [raw]
Subject: [RFC 01/11] net: phy: Don't report success if devices weren't found

C45 devices are to return 0 for registers they haven't
implemented. This means in theory we can terminate the
device search loop without finding any MMDs. In that
case we want to immediately return indicating that
nothing was found rather than continuing to probe
and falling into the success state at the bottom.

Signed-off-by: Jeremy Linton <[email protected]>
---
drivers/net/phy/phy_device.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index ac2784192472..245899b58a7d 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -742,6 +742,12 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
}
}

+ /* no reported devices */
+ if (*devs == 0) {
+ *phy_id = 0xffffffff;
+ return 0;
+ }
+
/* Now probe Device Identifiers for each device present. */
for (i = 1; i < num_ids; i++) {
if (!(c45_ids->devices_in_package & (1 << i)))
--
2.26.2

2020-05-23 15:30:28

by Andrew Lunn

[permalink] [raw]
Subject: Re: [RFC 03/11] net: phy: refactor c45 phy identification sequence

On Fri, May 22, 2020 at 04:30:51PM -0500, Jeremy Linton wrote:
> Lets factor out the phy id logic, and make it generic
> so that it can be used for c22 and c45.
>
> Signed-off-by: Jeremy Linton <[email protected]>
> ---
> drivers/net/phy/phy_device.c | 65 +++++++++++++++++++-----------------
> 1 file changed, 35 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 7746c07b97fe..f0761fa5e40b 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -695,6 +695,29 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
> return 0;
> }
>
> +static int _get_phy_id(struct mii_bus *bus, int addr, int dev_addr,
> + u32 *phy_id, bool c45)

Hi Jeremy

How about read_phy_id() so you can avoid the _ prefix.

> static bool valid_phy_id(int val)
> {
> return (val > 0 && ((val & 0x1fffffff) != 0x1fffffff));
> @@ -715,17 +738,17 @@ static bool valid_phy_id(int val)
> */
> static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> struct phy_c45_device_ids *c45_ids) {
> - int phy_reg;
> - int i, reg_addr;
> + int ret;
> + int i;
> const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
> u32 *devs = &c45_ids->devices_in_package;
>
> /* Find first non-zero Devices In package. Device zero is reserved
> * for 802.3 c45 complied PHYs, so don't probe it at first.
> */
> - for (i = 1; i < num_ids && *devs == 0; i++) {
> - phy_reg = get_phy_c45_devs_in_pkg(bus, addr, i, devs);
> - if (phy_reg < 0)
> + for (i = 0; i < num_ids && *devs == 0; i++) {
> + ret = get_phy_c45_devs_in_pkg(bus, addr, i, devs);
> + if (ret < 0)
> return -EIO;

Renaming reg_addr to ret does not belong in this patch.

Andrew

2020-05-23 17:18:31

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 03/11] net: phy: refactor c45 phy identification sequence

Hi,

Thanks for taking a look at this!

On 5/23/20 10:28 AM, Andrew Lunn wrote:
> On Fri, May 22, 2020 at 04:30:51PM -0500, Jeremy Linton wrote:
>> Lets factor out the phy id logic, and make it generic
>> so that it can be used for c22 and c45.
>>
>> Signed-off-by: Jeremy Linton <[email protected]>
>> ---
>> drivers/net/phy/phy_device.c | 65 +++++++++++++++++++-----------------
>> 1 file changed, 35 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 7746c07b97fe..f0761fa5e40b 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -695,6 +695,29 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
>> return 0;
>> }
>>
>> +static int _get_phy_id(struct mii_bus *bus, int addr, int dev_addr,
>> + u32 *phy_id, bool c45)
>
> Hi Jeremy
>
> How about read_phy_id() so you can avoid the _ prefix.

Yes, that sounds good.

>
>> static bool valid_phy_id(int val)
>> {
>> return (val > 0 && ((val & 0x1fffffff) != 0x1fffffff));
>> @@ -715,17 +738,17 @@ static bool valid_phy_id(int val)
>> */
>> static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>> struct phy_c45_device_ids *c45_ids) {
>> - int phy_reg;
>> - int i, reg_addr;
>> + int ret;
>> + int i;
>> const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
>> u32 *devs = &c45_ids->devices_in_package;
>>
>> /* Find first non-zero Devices In package. Device zero is reserved
>> * for 802.3 c45 complied PHYs, so don't probe it at first.
>> */
>> - for (i = 1; i < num_ids && *devs == 0; i++) {
>> - phy_reg = get_phy_c45_devs_in_pkg(bus, addr, i, devs);
>> - if (phy_reg < 0)
>> + for (i = 0; i < num_ids && *devs == 0; i++) {
>> + ret = get_phy_c45_devs_in_pkg(bus, addr, i, devs);
>> + if (ret < 0)
>> return -EIO;
>
> Renaming reg_addr to ret does not belong in this patch.

Sure, that makes sense. The rename was a last min change when I was
shuffling the args around.

Thanks,


2020-05-23 17:35:07

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 03/11] net: phy: refactor c45 phy identification sequence

Hi,

On 5/23/20 10:28 AM, Andrew Lunn wrote:
> On Fri, May 22, 2020 at 04:30:51PM -0500, Jeremy Linton wrote:
>> Lets factor out the phy id logic, and make it generic
>> so that it can be used for c22 and c45.
>>
>> Signed-off-by: Jeremy Linton <[email protected]>
>> ---
>> drivers/net/phy/phy_device.c | 65 +++++++++++++++++++-----------------
>> 1 file changed, 35 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 7746c07b97fe..f0761fa5e40b 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -695,6 +695,29 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
>> return 0;
>> }
>>
>> +static int _get_phy_id(struct mii_bus *bus, int addr, int dev_addr,
>> + u32 *phy_id, bool c45)
>
> Hi Jeremy
>
> How about read_phy_id() so you can avoid the _ prefix.
>
>> static bool valid_phy_id(int val)
>> {
>> return (val > 0 && ((val & 0x1fffffff) != 0x1fffffff));
>> @@ -715,17 +738,17 @@ static bool valid_phy_id(int val)
>> */
>> static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>> struct phy_c45_device_ids *c45_ids) {
>> - int phy_reg;
>> - int i, reg_addr;
>> + int ret;
>> + int i;
>> const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
>> u32 *devs = &c45_ids->devices_in_package;
>>
>> /* Find first non-zero Devices In package. Device zero is reserved
>> * for 802.3 c45 complied PHYs, so don't probe it at first.
>> */
>> - for (i = 1; i < num_ids && *devs == 0; i++) {
>> - phy_reg = get_phy_c45_devs_in_pkg(bus, addr, i, devs);
>> - if (phy_reg < 0)
>> + for (i = 0; i < num_ids && *devs == 0; i++) {
>> + ret = get_phy_c45_devs_in_pkg(bus, addr, i, devs);
>> + if (ret < 0)
>> return -EIO;
>
> Renaming reg_addr to ret does not belong in this patch.
>

Looks like I changed the loop index in this patch while shuffling things
around yesterday too. The "for (i = 1/0.." change belongs in 5/11 as well.



2020-05-23 18:23:34

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 01/11] net: phy: Don't report success if devices weren't found

On Fri, May 22, 2020 at 04:30:49PM -0500, Jeremy Linton wrote:
> C45 devices are to return 0 for registers they haven't
> implemented. This means in theory we can terminate the
> device search loop without finding any MMDs. In that
> case we want to immediately return indicating that
> nothing was found rather than continuing to probe
> and falling into the success state at the bottom.

This is a little confusing when you read this comment:

/* If mostly Fs, there is no device there,
* then let's continue to probe more, as some
* 10G PHYs have zero Devices In package,
* e.g. Cortina CS4315/CS4340 PHY.
*/

Since it appears to be talking about the case of a PHY where *devs will
be zero. However, tracking down the original submission, it seems this
is not the case at all, and the comment is grossly misleading.

It seems these PHYs only report a valid data in the Devices In Package
registers for devad=0 - it has nothing to do with a zero Devices In
Package.

Can I suggest that this comment is fixed while we're changing the code
to explicitly reject this "zero Devices In package" so that it's not
confusing?

Thanks.

>
> Signed-off-by: Jeremy Linton <[email protected]>
> ---
> drivers/net/phy/phy_device.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index ac2784192472..245899b58a7d 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -742,6 +742,12 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> }
> }
>
> + /* no reported devices */
> + if (*devs == 0) {
> + *phy_id = 0xffffffff;
> + return 0;
> + }
> +
> /* Now probe Device Identifiers for each device present. */
> for (i = 1; i < num_ids; i++) {
> if (!(c45_ids->devices_in_package & (1 << i)))
> --
> 2.26.2
>
>

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-23 18:34:07

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 03/11] net: phy: refactor c45 phy identification sequence

On Fri, May 22, 2020 at 04:30:51PM -0500, Jeremy Linton wrote:
> Lets factor out the phy id logic, and make it generic
> so that it can be used for c22 and c45.
>
> Signed-off-by: Jeremy Linton <[email protected]>
> ---
> drivers/net/phy/phy_device.c | 65 +++++++++++++++++++-----------------
> 1 file changed, 35 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 7746c07b97fe..f0761fa5e40b 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -695,6 +695,29 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
> return 0;
> }
>
> +static int _get_phy_id(struct mii_bus *bus, int addr, int dev_addr,
> + u32 *phy_id, bool c45)
> +{
> + int phy_reg, reg_addr;
> +
> + int reg_base = c45 ? (MII_ADDR_C45 | dev_addr << 16) : 0;

int phy_reg, reg_addr, reg_base;

reg_base = c45 ? (MII_ADDR_C45 | dev_addr << 16) : 0;

I think would be preferable.

> +
> + reg_addr = reg_base | MII_PHYSID1;
> + phy_reg = mdiobus_read(bus, addr, reg_addr);
> + if (phy_reg < 0)
> + return -EIO;
> +
> + *phy_id = phy_reg << 16;
> +
> + reg_addr = reg_base | MII_PHYSID2;
> + phy_reg = mdiobus_read(bus, addr, reg_addr);
> + if (phy_reg < 0)
> + return -EIO;
> + *phy_id |= phy_reg;

The line spacing seems to be a little inconsistent between the above two
register reads.

> +
> + return 0;
> +}

So, _get_phy_id() returns either zero or -EIO.

> +
> static bool valid_phy_id(int val)
> {
> return (val > 0 && ((val & 0x1fffffff) != 0x1fffffff));
> @@ -715,17 +738,17 @@ static bool valid_phy_id(int val)
> */
> static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> struct phy_c45_device_ids *c45_ids) {
> - int phy_reg;
> - int i, reg_addr;
> + int ret;
> + int i;
> const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
> u32 *devs = &c45_ids->devices_in_package;

I feel a "reverse christmas tree" complaint brewing... yes, the original
code didn't follow it. Maybe a tidy up while touching this code?

>
> /* Find first non-zero Devices In package. Device zero is reserved
> * for 802.3 c45 complied PHYs, so don't probe it at first.
> */
> - for (i = 1; i < num_ids && *devs == 0; i++) {
> - phy_reg = get_phy_c45_devs_in_pkg(bus, addr, i, devs);
> - if (phy_reg < 0)
> + for (i = 0; i < num_ids && *devs == 0; i++) {
> + ret = get_phy_c45_devs_in_pkg(bus, addr, i, devs);
> + if (ret < 0)
> return -EIO;
>
> if ((*devs & 0x1fffffff) == 0x1fffffff) {
> @@ -752,17 +775,9 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> if (!(c45_ids->devices_in_package & (1 << i)))
> continue;
>
> - reg_addr = MII_ADDR_C45 | i << 16 | MII_PHYSID1;
> - phy_reg = mdiobus_read(bus, addr, reg_addr);
> - if (phy_reg < 0)
> - return -EIO;
> - c45_ids->device_ids[i] = phy_reg << 16;
> -
> - reg_addr = MII_ADDR_C45 | i << 16 | MII_PHYSID2;
> - phy_reg = mdiobus_read(bus, addr, reg_addr);
> - if (phy_reg < 0)
> - return -EIO;
> - c45_ids->device_ids[i] |= phy_reg;
> + ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
> + if (ret < 0)
> + return ret;

So here we can only propagate the -EIO, so this keeps the semantics.

> }
> *phy_id = 0;
> return 0;
> @@ -787,27 +802,17 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> static int get_phy_id(struct mii_bus *bus, int addr, u32 *phy_id,
> bool is_c45, struct phy_c45_device_ids *c45_ids)
> {
> - int phy_reg;
> + int ret;
>
> if (is_c45)
> return get_phy_c45_ids(bus, addr, phy_id, c45_ids);
>
> - /* Grab the bits from PHYIR1, and put them in the upper half */
> - phy_reg = mdiobus_read(bus, addr, MII_PHYSID1);
> - if (phy_reg < 0) {
> + ret = _get_phy_id(bus, addr, 0, phy_id, false);
> + if (ret < 0) {
> /* returning -ENODEV doesn't stop bus scanning */
> - return (phy_reg == -EIO || phy_reg == -ENODEV) ? -ENODEV : -EIO;
> + return (ret == -EIO || ret == -ENODEV) ? -ENODEV : -EIO;

Since ret will only ever be -EIO here, this can only ever return
-ENODEV, which is a functional change in the code (probably unintended.)
Nevertheless, it's likely introducing a bug if the intention is for
some other return from mdiobus_read() to be handled differently.

> }
>
> - *phy_id = phy_reg << 16;
> -
> - /* Grab the bits from PHYIR2, and put them in the lower half */
> - phy_reg = mdiobus_read(bus, addr, MII_PHYSID2);
> - if (phy_reg < 0)
> - return -EIO;

... whereas this one always returns -EIO on any error.

So, I think you have the potential in this patch to introduce a subtle
change of behaviour, which may lead to problems - have you closely
analysed why the code was the way it was, and whether your change of
behaviour is actually valid?

> -
> - *phy_id |= phy_reg;
> -
> return 0;
> }
>
> --
> 2.26.2
>
>

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-23 18:38:23

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 02/11] net: phy: Simplify MMD device list termination

On Fri, May 22, 2020 at 04:30:50PM -0500, Jeremy Linton wrote:
> Since we are already checking for *devs == 0 after
> the loop terminates, we can add a mostly F's check
> as well. With that change we can simplify the return/break
> sequence inside the loop.
>
> Add a valid_phy_id() macro for this, since we will be using it
> in a couple other places.

I'm not sure you have the name of this correct, and your usage layer
in your patch series is correct.

>
> Signed-off-by: Jeremy Linton <[email protected]>
> ---
> drivers/net/phy/phy_device.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 245899b58a7d..7746c07b97fe 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -695,6 +695,11 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
> return 0;
> }
>
> +static bool valid_phy_id(int val)
> +{
> + return (val > 0 && ((val & 0x1fffffff) != 0x1fffffff));
> +}
> +
> /**
> * get_phy_c45_ids - reads the specified addr for its 802.3-c45 IDs.
> * @bus: the target MII bus
> @@ -732,18 +737,12 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> phy_reg = get_phy_c45_devs_in_pkg(bus, addr, 0, devs);
> if (phy_reg < 0)
> return -EIO;
> - /* no device there, let's get out of here */
> - if ((*devs & 0x1fffffff) == 0x1fffffff) {
> - *phy_id = 0xffffffff;
> - return 0;
> - } else {
> - break;
> - }
> + break;
> }
> }
>
> /* no reported devices */
> - if (*devs == 0) {
> + if (!valid_phy_id(*devs)) {

You are using this to validate the "devices in package" value, not the
PHY ID value. So, IMHO this should be called "valid_devs_in_package()"
or similar.

> *phy_id = 0xffffffff;
> return 0;
> }
> --
> 2.26.2
>
>

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-23 18:40:10

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
> Until this point, we have been sanitizing the c22
> regs presence bit out of all the MMD device lists.
> This is incorrect as it causes the 0xFFFFFFFF checks
> to incorrectly fail. Further, it turns out that we
> want to utilize this flag to make a determination that
> there is actually a phy at this location and we should
> be accessing it using c22.
>
> Signed-off-by: Jeremy Linton <[email protected]>
> ---
> drivers/net/phy/phy_device.c | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index f0761fa5e40b..2d677490ecab 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
> return -EIO;
> *devices_in_package |= phy_reg;
>
> - /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> - *devices_in_package &= ~BIT(0);
> -
> return 0;
> }
>
> @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> int i;
> const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
> u32 *devs = &c45_ids->devices_in_package;
> + bool c22_present = false;
> + bool valid_id = false;
>
> /* Find first non-zero Devices In package. Device zero is reserved
> * for 802.3 c45 complied PHYs, so don't probe it at first.
> @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> return 0;
> }
>
> + /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> + c22_present = *devs & BIT(0);
> + *devs &= ~BIT(0);
> +
> /* Now probe Device Identifiers for each device present. */
> for (i = 1; i < num_ids; i++) {
> if (!(c45_ids->devices_in_package & (1 << i)))
> @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
> if (ret < 0)
> return ret;
> + if (valid_phy_id(c45_ids->device_ids[i]))
> + valid_id = true;

Here you are using your "devices in package" validator to validate the
PHY ID value. One of the things it does is mask this value with
0x1fffffff. That means you lose some of the vendor OUI. To me, this
looks completely wrong.

> + }
> +
> + if (!valid_id && c22_present) {
> + *phy_id = 0xffffffff;
> + return 0;
> }
> *phy_id = 0;
> return 0;
> --
> 2.26.2
>
>

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-23 19:15:37

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 03/11] net: phy: refactor c45 phy identification sequence

On Sat, May 23, 2020 at 12:32:59PM -0500, Jeremy Linton wrote:
> Hi,
>
> On 5/23/20 10:28 AM, Andrew Lunn wrote:
> > On Fri, May 22, 2020 at 04:30:51PM -0500, Jeremy Linton wrote:
> > > Lets factor out the phy id logic, and make it generic
> > > so that it can be used for c22 and c45.
> > >
> > > Signed-off-by: Jeremy Linton <[email protected]>
> > > ---
> > > drivers/net/phy/phy_device.c | 65 +++++++++++++++++++-----------------
> > > 1 file changed, 35 insertions(+), 30 deletions(-)
> > >
> > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > > index 7746c07b97fe..f0761fa5e40b 100644
> > > --- a/drivers/net/phy/phy_device.c
> > > +++ b/drivers/net/phy/phy_device.c
> > > @@ -695,6 +695,29 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
> > > return 0;
> > > }
> > > +static int _get_phy_id(struct mii_bus *bus, int addr, int dev_addr,
> > > + u32 *phy_id, bool c45)
> >
> > Hi Jeremy
> >
> > How about read_phy_id() so you can avoid the _ prefix.
> >
> > > static bool valid_phy_id(int val)
> > > {
> > > return (val > 0 && ((val & 0x1fffffff) != 0x1fffffff));
> > > @@ -715,17 +738,17 @@ static bool valid_phy_id(int val)
> > > */
> > > static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > struct phy_c45_device_ids *c45_ids) {
> > > - int phy_reg;
> > > - int i, reg_addr;
> > > + int ret;
> > > + int i;
> > > const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
> > > u32 *devs = &c45_ids->devices_in_package;
> > > /* Find first non-zero Devices In package. Device zero is reserved
> > > * for 802.3 c45 complied PHYs, so don't probe it at first.
> > > */
> > > - for (i = 1; i < num_ids && *devs == 0; i++) {
> > > - phy_reg = get_phy_c45_devs_in_pkg(bus, addr, i, devs);
> > > - if (phy_reg < 0)
> > > + for (i = 0; i < num_ids && *devs == 0; i++) {
> > > + ret = get_phy_c45_devs_in_pkg(bus, addr, i, devs);
> > > + if (ret < 0)
> > > return -EIO;
> >
> > Renaming reg_addr to ret does not belong in this patch.
> >
>
> Looks like I changed the loop index in this patch while shuffling things
> around yesterday too. The "for (i = 1/0.." change belongs in 5/11 as well.

Indeed; that's a change of behaviour - see the CS4315/CS4340 workaround.

Note that MMD 0 is explicitly marked "Reserved" in 802.3, so we
shouldn't be probing it in this way.

Also note that bit 0 of the "devices in package" does not mean that
MMD 0 is accessible; it means that the clause 22 registers are
present:

"Bit 5.0 is used to indicate that Clause 22 functionality has been
implemented within a Clause 45 electrical interface device."

which basically means Clause 22 protocol over Clause 45 electrical
interface.

So, we should be careful about poking in MMD 0 if 5.0 is set.

Some places that will break are:

- phylink_phy_read() / phylink_phy_write()

- phy_restart_aneg() / phy_config_aneg() if we have an clause 45
MDIO interface that can't issue clause 22 MDIO cycles with a PHY
that sets 5.0.

These comments apply more to your patch 4, but you brought up the
MMD 0 accesses in this patch, so I thought I'd provide it here.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-23 19:54:14

by Andrew Lunn

[permalink] [raw]
Subject: Re: [RFC 03/11] net: phy: refactor c45 phy identification sequence

> > static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > struct phy_c45_device_ids *c45_ids) {
> > - int phy_reg;
> > - int i, reg_addr;
> > + int ret;
> > + int i;
> > const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
> > u32 *devs = &c45_ids->devices_in_package;
>
> I feel a "reverse christmas tree" complaint brewing... yes, the original
> code didn't follow it. Maybe a tidy up while touching this code?

At minimum, a patch should not make it worse. ret and i should clearly
be after devs.

> > static int get_phy_id(struct mii_bus *bus, int addr, u32 *phy_id,
> > bool is_c45, struct phy_c45_device_ids *c45_ids)
> > {
> > - int phy_reg;
> > + int ret;
> >
> > if (is_c45)
> > return get_phy_c45_ids(bus, addr, phy_id, c45_ids);
> >
> > - /* Grab the bits from PHYIR1, and put them in the upper half */
> > - phy_reg = mdiobus_read(bus, addr, MII_PHYSID1);
> > - if (phy_reg < 0) {
> > + ret = _get_phy_id(bus, addr, 0, phy_id, false);
> > + if (ret < 0) {
> > /* returning -ENODEV doesn't stop bus scanning */
> > - return (phy_reg == -EIO || phy_reg == -ENODEV) ? -ENODEV : -EIO;
> > + return (ret == -EIO || ret == -ENODEV) ? -ENODEV : -EIO;
>
> Since ret will only ever be -EIO here, this can only ever return
> -ENODEV, which is a functional change in the code (probably unintended.)
> Nevertheless, it's likely introducing a bug if the intention is for
> some other return from mdiobus_read() to be handled differently.
>
> > }
> >
> > - *phy_id = phy_reg << 16;
> > -
> > - /* Grab the bits from PHYIR2, and put them in the lower half */
> > - phy_reg = mdiobus_read(bus, addr, MII_PHYSID2);
> > - if (phy_reg < 0)
> > - return -EIO;
>
> ... whereas this one always returns -EIO on any error.
>
> So, I think you have the potential in this patch to introduce a subtle
> change of behaviour, which may lead to problems - have you closely
> analysed why the code was the way it was, and whether your change of
> behaviour is actually valid?

I could be remembering this wrongly, but i think this is to do with
orion_mdio_xsmi_read() returning -ENODEV, not 0xffffffffff, if there
is no device on the bus at the given address. -EIO is fatal to the
scan, everything stops with the assumption the bus is broken. -ENODEV
should not be fatal to the scan.

Andrew

2020-05-23 20:04:31

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 03/11] net: phy: refactor c45 phy identification sequence

On Sat, May 23, 2020 at 09:51:31PM +0200, Andrew Lunn wrote:
> > > static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > struct phy_c45_device_ids *c45_ids) {
> > > - int phy_reg;
> > > - int i, reg_addr;
> > > + int ret;
> > > + int i;
> > > const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
> > > u32 *devs = &c45_ids->devices_in_package;
> >
> > I feel a "reverse christmas tree" complaint brewing... yes, the original
> > code didn't follow it. Maybe a tidy up while touching this code?
>
> At minimum, a patch should not make it worse. ret and i should clearly
> be after devs.
>
> > > static int get_phy_id(struct mii_bus *bus, int addr, u32 *phy_id,
> > > bool is_c45, struct phy_c45_device_ids *c45_ids)
> > > {
> > > - int phy_reg;
> > > + int ret;
> > >
> > > if (is_c45)
> > > return get_phy_c45_ids(bus, addr, phy_id, c45_ids);
> > >
> > > - /* Grab the bits from PHYIR1, and put them in the upper half */
> > > - phy_reg = mdiobus_read(bus, addr, MII_PHYSID1);
> > > - if (phy_reg < 0) {
> > > + ret = _get_phy_id(bus, addr, 0, phy_id, false);
> > > + if (ret < 0) {
> > > /* returning -ENODEV doesn't stop bus scanning */
> > > - return (phy_reg == -EIO || phy_reg == -ENODEV) ? -ENODEV : -EIO;
> > > + return (ret == -EIO || ret == -ENODEV) ? -ENODEV : -EIO;
> >
> > Since ret will only ever be -EIO here, this can only ever return
> > -ENODEV, which is a functional change in the code (probably unintended.)
> > Nevertheless, it's likely introducing a bug if the intention is for
> > some other return from mdiobus_read() to be handled differently.
> >
> > > }
> > >
> > > - *phy_id = phy_reg << 16;
> > > -
> > > - /* Grab the bits from PHYIR2, and put them in the lower half */
> > > - phy_reg = mdiobus_read(bus, addr, MII_PHYSID2);
> > > - if (phy_reg < 0)
> > > - return -EIO;
> >
> > ... whereas this one always returns -EIO on any error.
> >
> > So, I think you have the potential in this patch to introduce a subtle
> > change of behaviour, which may lead to problems - have you closely
> > analysed why the code was the way it was, and whether your change of
> > behaviour is actually valid?
>
> I could be remembering this wrongly, but i think this is to do with
> orion_mdio_xsmi_read() returning -ENODEV, not 0xffffffffff, if there
> is no device on the bus at the given address. -EIO is fatal to the
> scan, everything stops with the assumption the bus is broken. -ENODEV
> should not be fatal to the scan.

Maybe orion_mdio_xsmi_read() should be fixed then? Also, maybe
adding return code documentation for mdiobus_read() / mdiobus_write()
would help MDIO driver authors have some consistency in what
errors they are expected to return (does anyone know for certain?)

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-24 14:47:01

by Andrew Lunn

[permalink] [raw]
Subject: Re: [RFC 08/11] net: phy: Allow mdio buses to auto-probe c45 devices

> +++ b/include/linux/phy.h
> @@ -275,6 +275,11 @@ struct mii_bus {
> int reset_delay_us;
> /* RESET GPIO descriptor pointer */
> struct gpio_desc *reset_gpiod;
> + /* bus capabilities, used for probing */
> + enum {
> + MDIOBUS_C22_ONLY = 0,
> + MDIOBUS_C45_FIRST,
> + } probe_capabilities;
> };


I'm not so keen on _FIRST. It suggest _LAST would also be valid. But
that then suggests this is not a bus property, but a PHY property, and
some PHYs might need _FIRST and other phys need _LAST, and then you
have a bus which has both sorts of PHY on it, and you have a problem.

So i think it would be better to have

enum {
MDIOBUS_UNKNOWN = 0,
MDIOBUS_C22,
MDIOBUS_C45,
MDIOBUS_C45_C22,
} bus_capabilities;

Describe just what the bus master can support.

Andrew

2020-05-25 02:40:17

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 03/11] net: phy: refactor c45 phy identification sequence

Hi,

On 5/23/20 3:01 PM, Russell King - ARM Linux admin wrote:
> On Sat, May 23, 2020 at 09:51:31PM +0200, Andrew Lunn wrote:
>>>> static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>> struct phy_c45_device_ids *c45_ids) {
>>>> - int phy_reg;
>>>> - int i, reg_addr;
>>>> + int ret;
>>>> + int i;
>>>> const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
>>>> u32 *devs = &c45_ids->devices_in_package;
>>>
>>> I feel a "reverse christmas tree" complaint brewing... yes, the original
>>> code didn't follow it. Maybe a tidy up while touching this code?
>>
>> At minimum, a patch should not make it worse. ret and i should clearly
>> be after devs.
>>
>>>> static int get_phy_id(struct mii_bus *bus, int addr, u32 *phy_id,
>>>> bool is_c45, struct phy_c45_device_ids *c45_ids)
>>>> {
>>>> - int phy_reg;
>>>> + int ret;
>>>>
>>>> if (is_c45)
>>>> return get_phy_c45_ids(bus, addr, phy_id, c45_ids);
>>>>
>>>> - /* Grab the bits from PHYIR1, and put them in the upper half */
>>>> - phy_reg = mdiobus_read(bus, addr, MII_PHYSID1);
>>>> - if (phy_reg < 0) {
>>>> + ret = _get_phy_id(bus, addr, 0, phy_id, false);
>>>> + if (ret < 0) {
>>>> /* returning -ENODEV doesn't stop bus scanning */
>>>> - return (phy_reg == -EIO || phy_reg == -ENODEV) ? -ENODEV : -EIO;
>>>> + return (ret == -EIO || ret == -ENODEV) ? -ENODEV : -EIO;
>>>
>>> Since ret will only ever be -EIO here, this can only ever return
>>> -ENODEV, which is a functional change in the code (probably unintended.)
>>> Nevertheless, it's likely introducing a bug if the intention is for
>>> some other return from mdiobus_read() to be handled differently.
>>>
>>>> }
>>>>
>>>> - *phy_id = phy_reg << 16;
>>>> -
>>>> - /* Grab the bits from PHYIR2, and put them in the lower half */
>>>> - phy_reg = mdiobus_read(bus, addr, MII_PHYSID2);
>>>> - if (phy_reg < 0)
>>>> - return -EIO;
>>>
>>> ... whereas this one always returns -EIO on any error.
>>>
>>> So, I think you have the potential in this patch to introduce a subtle
>>> change of behaviour, which may lead to problems - have you closely
>>> analysed why the code was the way it was, and whether your change of
>>> behaviour is actually valid?
>>
>> I could be remembering this wrongly, but i think this is to do with
>> orion_mdio_xsmi_read() returning -ENODEV, not 0xffffffffff, if there
>> is no device on the bus at the given address. -EIO is fatal to the
>> scan, everything stops with the assumption the bus is broken. -ENODEV
>> should not be fatal to the scan.
>
> Maybe orion_mdio_xsmi_read() should be fixed then? Also, maybe
> adding return code documentation for mdiobus_read() / mdiobus_write()
> would help MDIO driver authors have some consistency in what
> errors they are expected to return (does anyone know for certain?)
>

My understanding at this point (which is mostly based on the xgmac
here), is that 0xffffffff is equivalent to "bus responding correctly,
phy failed to respond at this register location" while any -Eerror is
understood as "something wrong with bus", and the mdio core then makes a
choice about terminating just the current phy search (ENODEV), or
terminating the entire mdio bus (basically everything else) registration.

I will see about clarifying the docs. I need to see if that will end up
being a bit of a rabbit hole before committing to including that in this
set.

Which brings up the problem that at least xgmac_mdio doesn't appear to
handle being told "your bus registration failed" without OOPSing the
probe routine. I think Calvin is aware of this, and I believe he has
some additional xgmac/etc patches on top of this set. Although he pinged
me offline the other day to say that apparently all my hunk shuffling
broke some of the c45 phy detection I had working earlier in the week.

2020-05-25 02:53:08

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 02/11] net: phy: Simplify MMD device list termination

Hi,

On 5/23/20 1:36 PM, Russell King - ARM Linux admin wrote:
> On Fri, May 22, 2020 at 04:30:50PM -0500, Jeremy Linton wrote:
>> Since we are already checking for *devs == 0 after
>> the loop terminates, we can add a mostly F's check
>> as well. With that change we can simplify the return/break
>> sequence inside the loop.
>>
>> Add a valid_phy_id() macro for this, since we will be using it
>> in a couple other places.
>
> I'm not sure you have the name of this correct, and your usage layer
> in your patch series is correct.

Or the name is poor..

>
>>
>> Signed-off-by: Jeremy Linton <[email protected]>
>> ---
>> drivers/net/phy/phy_device.c | 15 +++++++--------
>> 1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index 245899b58a7d..7746c07b97fe 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -695,6 +695,11 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
>> return 0;
>> }
>>
>> +static bool valid_phy_id(int val)
>> +{
>> + return (val > 0 && ((val & 0x1fffffff) != 0x1fffffff));
>> +}
>> +
>> /**
>> * get_phy_c45_ids - reads the specified addr for its 802.3-c45 IDs.
>> * @bus: the target MII bus
>> @@ -732,18 +737,12 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>> phy_reg = get_phy_c45_devs_in_pkg(bus, addr, 0, devs);
>> if (phy_reg < 0)
>> return -EIO;
>> - /* no device there, let's get out of here */
>> - if ((*devs & 0x1fffffff) == 0x1fffffff) {
>> - *phy_id = 0xffffffff;
>> - return 0;
>> - } else {
>> - break;
>> - }
>> + break;
>> }
>> }
>>
>> /* no reported devices */
>> - if (*devs == 0) {
>> + if (!valid_phy_id(*devs)) {
>
> You are using this to validate the "devices in package" value, not the
> PHY ID value. So, IMHO this should be called "valid_devs_in_package()"
> or similar.

Hmmm, its more "valid_phy_reg()" since it ends up being used to validate
both the devs in package as well as phy id.



>
>> *phy_id = 0xffffffff;
>> return 0;
>> }
>> --
>> 2.26.2
>>
>>
>

2020-05-25 03:08:40

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 01/11] net: phy: Don't report success if devices weren't found

Hi,

Thanks for taking a look at this.

On 5/23/20 1:20 PM, Russell King - ARM Linux admin wrote:
> On Fri, May 22, 2020 at 04:30:49PM -0500, Jeremy Linton wrote:
>> C45 devices are to return 0 for registers they haven't
>> implemented. This means in theory we can terminate the
>> device search loop without finding any MMDs. In that
>> case we want to immediately return indicating that
>> nothing was found rather than continuing to probe
>> and falling into the success state at the bottom.
>
> This is a little confusing when you read this comment:
>
> /* If mostly Fs, there is no device there,
> * then let's continue to probe more, as some
> * 10G PHYs have zero Devices In package,
> * e.g. Cortina CS4315/CS4340 PHY.
> */
>
> Since it appears to be talking about the case of a PHY where *devs will
> be zero. However, tracking down the original submission, it seems this
> is not the case at all, and the comment is grossly misleading.
>
> It seems these PHYs only report a valid data in the Devices In Package
> registers for devad=0 - it has nothing to do with a zero Devices In
> Package.

Yes, this ended up being my understanding of this commit, and is part of
my justification for starting the devices search at the reserved address
0 rather than 1.

>
> Can I suggest that this comment is fixed while we're changing the code
> to explicitly reject this "zero Devices In package" so that it's not
> confusing?

Its probably better to kill it in 5/11 with a mention that we are
starting at a reserved address?

OTOH, I'm a bit concerned that reading at 0 as the first address will
cause problems because the original code was only triggering it after a
read returning 0xFFFFFFFF at a valid MMD address. It does
simplify/clarify the loop though. If it weren't for this 0 read, I would
have tried to avoid some of the additional MMD reserved addresses.

2020-05-25 03:39:42

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

Hi,

On 5/23/20 1:37 PM, Russell King - ARM Linux admin wrote:
> On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
>> Until this point, we have been sanitizing the c22
>> regs presence bit out of all the MMD device lists.
>> This is incorrect as it causes the 0xFFFFFFFF checks
>> to incorrectly fail. Further, it turns out that we
>> want to utilize this flag to make a determination that
>> there is actually a phy at this location and we should
>> be accessing it using c22.
>>
>> Signed-off-by: Jeremy Linton <[email protected]>
>> ---
>> drivers/net/phy/phy_device.c | 16 +++++++++++++---
>> 1 file changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>> index f0761fa5e40b..2d677490ecab 100644
>> --- a/drivers/net/phy/phy_device.c
>> +++ b/drivers/net/phy/phy_device.c
>> @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
>> return -EIO;
>> *devices_in_package |= phy_reg;
>>
>> - /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
>> - *devices_in_package &= ~BIT(0);
>> -
>> return 0;
>> }
>>
>> @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>> int i;
>> const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
>> u32 *devs = &c45_ids->devices_in_package;
>> + bool c22_present = false;
>> + bool valid_id = false;
>>
>> /* Find first non-zero Devices In package. Device zero is reserved
>> * for 802.3 c45 complied PHYs, so don't probe it at first.
>> @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>> return 0;
>> }
>>
>> + /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
>> + c22_present = *devs & BIT(0);
>> + *devs &= ~BIT(0);
>> +
>> /* Now probe Device Identifiers for each device present. */
>> for (i = 1; i < num_ids; i++) {
>> if (!(c45_ids->devices_in_package & (1 << i)))
>> @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>> ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
>> if (ret < 0)
>> return ret;
>> + if (valid_phy_id(c45_ids->device_ids[i]))
>> + valid_id = true;
>
> Here you are using your "devices in package" validator to validate the
> PHY ID value. One of the things it does is mask this value with
> 0x1fffffff. That means you lose some of the vendor OUI. To me, this
> looks completely wrong.

I think in this case I was just using it like the comment in
get_phy_device() "if the phy_id is mostly F's, there is no device here".

My understanding is that the code is trying to avoid the 0xFFFFFFFF
returns that seem to indicate "bus ok, phy didn't respond".

I just checked the OUI registration, and while there are a couple OUI's
registered that have a number of FFF's in them, none of those cases
seems to overlap sufficiently to cause this to throw them out. Plus a
phy would also have to have model+revision set to 'F's. So while might
be possible, if unlikely, at the moment I think the OUI registration
keeps this from being a problem. Particularly, if i'm reading the
mapping correctly, the OUI mapping guarantees that the field cannot be
all '1's due to the OUI having X & M bits cleared. It sort of looks like
the mapping is trying to lose those bits, by tossing bit 1 & 2, but the
X & M are in the wrong octet (AFAIK, I just read it three times cause it
didn't make any sense).


2020-05-25 05:21:51

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 08/11] net: phy: Allow mdio buses to auto-probe c45 devices

Hi,

On 5/24/20 9:44 AM, Andrew Lunn wrote:
>> +++ b/include/linux/phy.h
>> @@ -275,6 +275,11 @@ struct mii_bus {
>> int reset_delay_us;
>> /* RESET GPIO descriptor pointer */
>> struct gpio_desc *reset_gpiod;
>> + /* bus capabilities, used for probing */
>> + enum {
>> + MDIOBUS_C22_ONLY = 0,
>> + MDIOBUS_C45_FIRST,
>> + } probe_capabilities;
>> };
>
>
> I'm not so keen on _FIRST. It suggest _LAST would also be valid. But
> that then suggests this is not a bus property, but a PHY property, and
> some PHYs might need _FIRST and other phys need _LAST, and then you
> have a bus which has both sorts of PHY on it, and you have a problem.
>
> So i think it would be better to have
>
> enum {
> MDIOBUS_UNKNOWN = 0,
> MDIOBUS_C22,
> MDIOBUS_C45,
> MDIOBUS_C45_C22,
> } bus_capabilities;
>
> Describe just what the bus master can support.

Yes, the naming is reasonable and I will update it in the next patch. I
went around a bit myself with this naming early on, and the problem I
saw was that a C45 capable master, can have C45 electrical phy's that
only respond to c22 requests (AFAIK). So the MDIOBUS_C45 (I think I was
calling it C45_ONLY) is an invalid selection. Not, that it wouldn't be
helpful to have a C45_ONLY case, but that the assumption is that you
wouldn't try and probe c22 registers, which I thought was a mistake.


Thanks,

2020-05-25 08:25:54

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 02/11] net: phy: Simplify MMD device list termination

On Sun, May 24, 2020 at 09:48:55PM -0500, Jeremy Linton wrote:
> Hi,
>
> On 5/23/20 1:36 PM, Russell King - ARM Linux admin wrote:
> > On Fri, May 22, 2020 at 04:30:50PM -0500, Jeremy Linton wrote:
> > > Since we are already checking for *devs == 0 after
> > > the loop terminates, we can add a mostly F's check
> > > as well. With that change we can simplify the return/break
> > > sequence inside the loop.
> > >
> > > Add a valid_phy_id() macro for this, since we will be using it
> > > in a couple other places.
> >
> > I'm not sure you have the name of this correct, and your usage layer
> > in your patch series is correct.
>
> Or the name is poor..
>
> >
> > >
> > > Signed-off-by: Jeremy Linton <[email protected]>
> > > ---
> > > drivers/net/phy/phy_device.c | 15 +++++++--------
> > > 1 file changed, 7 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > > index 245899b58a7d..7746c07b97fe 100644
> > > --- a/drivers/net/phy/phy_device.c
> > > +++ b/drivers/net/phy/phy_device.c
> > > @@ -695,6 +695,11 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
> > > return 0;
> > > }
> > > +static bool valid_phy_id(int val)
> > > +{
> > > + return (val > 0 && ((val & 0x1fffffff) != 0x1fffffff));
> > > +}
> > > +
> > > /**
> > > * get_phy_c45_ids - reads the specified addr for its 802.3-c45 IDs.
> > > * @bus: the target MII bus
> > > @@ -732,18 +737,12 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > phy_reg = get_phy_c45_devs_in_pkg(bus, addr, 0, devs);
> > > if (phy_reg < 0)
> > > return -EIO;
> > > - /* no device there, let's get out of here */
> > > - if ((*devs & 0x1fffffff) == 0x1fffffff) {
> > > - *phy_id = 0xffffffff;
> > > - return 0;
> > > - } else {
> > > - break;
> > > - }
> > > + break;
> > > }
> > > }
> > > /* no reported devices */
> > > - if (*devs == 0) {
> > > + if (!valid_phy_id(*devs)) {
> >
> > You are using this to validate the "devices in package" value, not the
> > PHY ID value. So, IMHO this should be called "valid_devs_in_package()"
> > or similar.
>
> Hmmm, its more "valid_phy_reg()" since it ends up being used to validate
> both the devs in package as well as phy id.

I don't think that is a valid use of the code you've put in
valid_phy_id().

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-25 09:47:57

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 01/11] net: phy: Don't report success if devices weren't found

On Sun, May 24, 2020 at 09:46:55PM -0500, Jeremy Linton wrote:
> Hi,
>
> Thanks for taking a look at this.
>
> On 5/23/20 1:20 PM, Russell King - ARM Linux admin wrote:
> > On Fri, May 22, 2020 at 04:30:49PM -0500, Jeremy Linton wrote:
> > > C45 devices are to return 0 for registers they haven't
> > > implemented. This means in theory we can terminate the
> > > device search loop without finding any MMDs. In that
> > > case we want to immediately return indicating that
> > > nothing was found rather than continuing to probe
> > > and falling into the success state at the bottom.
> >
> > This is a little confusing when you read this comment:
> >
> > /* If mostly Fs, there is no device there,
> > * then let's continue to probe more, as some
> > * 10G PHYs have zero Devices In package,
> > * e.g. Cortina CS4315/CS4340 PHY.
> > */
> >
> > Since it appears to be talking about the case of a PHY where *devs will
> > be zero. However, tracking down the original submission, it seems this
> > is not the case at all, and the comment is grossly misleading.
> >
> > It seems these PHYs only report a valid data in the Devices In Package
> > registers for devad=0 - it has nothing to do with a zero Devices In
> > Package.
>
> Yes, this ended up being my understanding of this commit, and is part of my
> justification for starting the devices search at the reserved address 0
> rather than 1.
>
> >
> > Can I suggest that this comment is fixed while we're changing the code
> > to explicitly reject this "zero Devices In package" so that it's not
> > confusing?
>
> Its probably better to kill it in 5/11 with a mention that we are starting
> at a reserved address?
>
> OTOH, I'm a bit concerned that reading at 0 as the first address will cause
> problems because the original code was only triggering it after a read
> returning 0xFFFFFFFF at a valid MMD address. It does simplify/clarify the
> loop though. If it weren't for this 0 read, I would have tried to avoid some
> of the additional MMD reserved addresses.

Yes, that is the worry, and as MMD 0 is marked as reserved, I don't
think we should routinely probe it.

As I've already mentioned, note that bit 0 of devices-in-package does
not mean that there is a MMD 0 implemented, even if bit 0 is set. Bit
0 means that the clause 22 register set is available through clause 22
cycles. So, simplifying the loop to start at 0 and removing the work-
around could end up breaking Cortina PHYs if they don't set bit 0 in
their devices in package - but I don't see why we should depend on bit 0
being set for their workaround.

So, I think you're going to have to add a work-around to ignore bit 0,
which brings up the question whether this is worth it or not.

Hence, I think this is a "simplifcation" too far.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-25 10:09:13

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

On Sun, May 24, 2020 at 10:34:13PM -0500, Jeremy Linton wrote:
> Hi,
>
> On 5/23/20 1:37 PM, Russell King - ARM Linux admin wrote:
> > On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
> > > Until this point, we have been sanitizing the c22
> > > regs presence bit out of all the MMD device lists.
> > > This is incorrect as it causes the 0xFFFFFFFF checks
> > > to incorrectly fail. Further, it turns out that we
> > > want to utilize this flag to make a determination that
> > > there is actually a phy at this location and we should
> > > be accessing it using c22.
> > >
> > > Signed-off-by: Jeremy Linton <[email protected]>
> > > ---
> > > drivers/net/phy/phy_device.c | 16 +++++++++++++---
> > > 1 file changed, 13 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > > index f0761fa5e40b..2d677490ecab 100644
> > > --- a/drivers/net/phy/phy_device.c
> > > +++ b/drivers/net/phy/phy_device.c
> > > @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
> > > return -EIO;
> > > *devices_in_package |= phy_reg;
> > > - /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> > > - *devices_in_package &= ~BIT(0);
> > > -
> > > return 0;
> > > }
> > > @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > int i;
> > > const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
> > > u32 *devs = &c45_ids->devices_in_package;
> > > + bool c22_present = false;
> > > + bool valid_id = false;
> > > /* Find first non-zero Devices In package. Device zero is reserved
> > > * for 802.3 c45 complied PHYs, so don't probe it at first.
> > > @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > return 0;
> > > }
> > > + /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> > > + c22_present = *devs & BIT(0);
> > > + *devs &= ~BIT(0);
> > > +
> > > /* Now probe Device Identifiers for each device present. */
> > > for (i = 1; i < num_ids; i++) {
> > > if (!(c45_ids->devices_in_package & (1 << i)))
> > > @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
> > > if (ret < 0)
> > > return ret;
> > > + if (valid_phy_id(c45_ids->device_ids[i]))
> > > + valid_id = true;
> >
> > Here you are using your "devices in package" validator to validate the
> > PHY ID value. One of the things it does is mask this value with
> > 0x1fffffff. That means you lose some of the vendor OUI. To me, this
> > looks completely wrong.
>
> I think in this case I was just using it like the comment in
> get_phy_device() "if the phy_id is mostly F's, there is no device here".
>
> My understanding is that the code is trying to avoid the 0xFFFFFFFF returns
> that seem to indicate "bus ok, phy didn't respond".
>
> I just checked the OUI registration, and while there are a couple OUI's
> registered that have a number of FFF's in them, none of those cases seems to
> overlap sufficiently to cause this to throw them out. Plus a phy would also
> have to have model+revision set to 'F's. So while might be possible, if
> unlikely, at the moment I think the OUI registration keeps this from being a
> problem. Particularly, if i'm reading the mapping correctly, the OUI mapping
> guarantees that the field cannot be all '1's due to the OUI having X & M
> bits cleared. It sort of looks like the mapping is trying to lose those
> bits, by tossing bit 1 & 2, but the X & M are in the wrong octet (AFAIK, I
> just read it three times cause it didn't make any sense).

I should also note that we have at least one supported PHY where one
of the MMDs returns 0xfffe for even numbered registers and 0x0000 for
odd numbered registers in one of the vendor MMDs for addresses 0
through 0xefff - which has a bit set in the devices-in-package.

It also returns 0x0082 for almost every register in MMD 2, but MMD 2's
devices-in-package bit is clear in most of the valid MMDs, so we
shouldn't touch it.

These reveal the problem of randomly probing MMDs - they can return
unexpected values and not be as well behaved as we would like them to
be. Using register 8 to detect presence may be beneficial, but that
may also introduce problems as we haven't used that before (and we
don't know whether any PHY that wrong.) I know at least the 88x3310
gets it right for all except the vendor MMDs, where the low addresses
appear non-confromant to the 802.3 specs. Both vendor MMDs are
definitely implemented, just not with anything conforming to 802.3.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-25 16:26:59

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 08/11] net: phy: Allow mdio buses to auto-probe c45 devices

On Sun, May 24, 2020 at 11:28:52PM -0500, Jeremy Linton wrote:
> Hi,
>
> On 5/24/20 9:44 AM, Andrew Lunn wrote:
> > > +++ b/include/linux/phy.h
> > > @@ -275,6 +275,11 @@ struct mii_bus {
> > > int reset_delay_us;
> > > /* RESET GPIO descriptor pointer */
> > > struct gpio_desc *reset_gpiod;
> > > + /* bus capabilities, used for probing */
> > > + enum {
> > > + MDIOBUS_C22_ONLY = 0,
> > > + MDIOBUS_C45_FIRST,
> > > + } probe_capabilities;
> > > };
> >
> >
> > I'm not so keen on _FIRST. It suggest _LAST would also be valid. But
> > that then suggests this is not a bus property, but a PHY property, and
> > some PHYs might need _FIRST and other phys need _LAST, and then you
> > have a bus which has both sorts of PHY on it, and you have a problem.
> >
> > So i think it would be better to have
> >
> > enum {
> > MDIOBUS_UNKNOWN = 0,
> > MDIOBUS_C22,
> > MDIOBUS_C45,
> > MDIOBUS_C45_C22,
> > } bus_capabilities;
> >
> > Describe just what the bus master can support.
>
> Yes, the naming is reasonable and I will update it in the next patch. I went
> around a bit myself with this naming early on, and the problem I saw was
> that a C45 capable master, can have C45 electrical phy's that only respond
> to c22 requests (AFAIK).

If you have a master that can only generate clause 45 cycles, and
someone is daft enough to connect a clause 22 only PHY to it, the
result is hardware that doesn't work - there's no getting around
that. The MDIO interface can't generate the appropriate cycles to
access the clause 22 PHY. So, this is not something we need care
about.

> So the MDIOBUS_C45 (I think I was calling it
> C45_ONLY) is an invalid selection. Not, that it wouldn't be helpful to have
> a C45_ONLY case, but that the assumption is that you wouldn't try and probe
> c22 registers, which I thought was a mistake.

MDIOBUS_C45 means "I can generate clause 45 cycles".
MDIOBUS_C22 means "I can generate clause 22 cycles".
MDIOBUS_C45_C22 means "I can generate both clause 45 and clause 22
cycles."

Notice carefully the values these end up with - MDIOBUS_C22 = BIT(0),
MDIOBUS_C45 = BIT(1), MDIOBUS_C45_C22 = BIT(0) | BIT(1). I suspect
that was no coincidence in Andrew's suggestion.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-25 18:55:07

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

On Sun, May 24, 2020 at 10:34:13PM -0500, Jeremy Linton wrote:
> Hi,
>
> On 5/23/20 1:37 PM, Russell King - ARM Linux admin wrote:
> > On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
> > > Until this point, we have been sanitizing the c22
> > > regs presence bit out of all the MMD device lists.
> > > This is incorrect as it causes the 0xFFFFFFFF checks
> > > to incorrectly fail. Further, it turns out that we
> > > want to utilize this flag to make a determination that
> > > there is actually a phy at this location and we should
> > > be accessing it using c22.
> > >
> > > Signed-off-by: Jeremy Linton <[email protected]>
> > > ---
> > > drivers/net/phy/phy_device.c | 16 +++++++++++++---
> > > 1 file changed, 13 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > > index f0761fa5e40b..2d677490ecab 100644
> > > --- a/drivers/net/phy/phy_device.c
> > > +++ b/drivers/net/phy/phy_device.c
> > > @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
> > > return -EIO;
> > > *devices_in_package |= phy_reg;
> > > - /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> > > - *devices_in_package &= ~BIT(0);
> > > -
> > > return 0;
> > > }
> > > @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > int i;
> > > const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
> > > u32 *devs = &c45_ids->devices_in_package;
> > > + bool c22_present = false;
> > > + bool valid_id = false;
> > > /* Find first non-zero Devices In package. Device zero is reserved
> > > * for 802.3 c45 complied PHYs, so don't probe it at first.
> > > @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > return 0;
> > > }
> > > + /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> > > + c22_present = *devs & BIT(0);
> > > + *devs &= ~BIT(0);
> > > +
> > > /* Now probe Device Identifiers for each device present. */
> > > for (i = 1; i < num_ids; i++) {
> > > if (!(c45_ids->devices_in_package & (1 << i)))
> > > @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
> > > if (ret < 0)
> > > return ret;
> > > + if (valid_phy_id(c45_ids->device_ids[i]))
> > > + valid_id = true;
> >
> > Here you are using your "devices in package" validator to validate the
> > PHY ID value. One of the things it does is mask this value with
> > 0x1fffffff. That means you lose some of the vendor OUI. To me, this
> > looks completely wrong.
>
> I think in this case I was just using it like the comment in
> get_phy_device() "if the phy_id is mostly F's, there is no device here".

Yes, that is certainly an interesting comment. What's so magic about
this 0x1fffffff? If it's about the time taken for the bus to rise
to logic 1 when not being actively driven by a PHY, then it actually
makes little sense, because we perform two transations to read each half
of the field, and both should have the same behaviour. If this was the
issue, we should be masking and testing against 0x1fff1fff rather than
0x1fffffff.

> I just checked the OUI registration, and while there are a couple OUI's
> registered that have a number of FFF's in them, none of those cases seems to
> overlap sufficiently to cause this to throw them out. Plus a phy would also
> have to have model+revision set to 'F's. So while might be possible, if
> unlikely, at the moment I think the OUI registration keeps this from being a
> problem. Particularly, if i'm reading the mapping correctly, the OUI mapping
> guarantees that the field cannot be all '1's due to the OUI having X & M
> bits cleared. It sort of looks like the mapping is trying to lose those
> bits, by tossing bit 1 & 2, but the X & M are in the wrong octet (AFAIK, I
> just read it three times cause it didn't make any sense).

The most-bits-set OUI that is currently allocated is 5C-FF-FF. This
would result in a register value of 0x73fffc00 to 0x73ffffff, so as
you say, it should be safe.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-25 20:50:08

by Andrew Lunn

[permalink] [raw]
Subject: Re: [RFC 08/11] net: phy: Allow mdio buses to auto-probe c45 devices

> > > So i think it would be better to have
> > >
> > > enum {
> > > MDIOBUS_UNKNOWN = 0,
> > > MDIOBUS_C22,
> > > MDIOBUS_C45,
> > > MDIOBUS_C45_C22,
> > > } bus_capabilities;
> > >
> > > Describe just what the bus master can support.
> >
> > Yes, the naming is reasonable and I will update it in the next patch. I went
> > around a bit myself with this naming early on, and the problem I saw was
> > that a C45 capable master, can have C45 electrical phy's that only respond
> > to c22 requests (AFAIK).
>
> If you have a master that can only generate clause 45 cycles, and
> someone is daft enough to connect a clause 22 only PHY to it, the
> result is hardware that doesn't work - there's no getting around
> that. The MDIO interface can't generate the appropriate cycles to
> access the clause 22 PHY. So, this is not something we need care
> about.
>
> > So the MDIOBUS_C45 (I think I was calling it
> > C45_ONLY) is an invalid selection. Not, that it wouldn't be helpful to have
> > a C45_ONLY case, but that the assumption is that you wouldn't try and probe
> > c22 registers, which I thought was a mistake.
>
> MDIOBUS_C45 means "I can generate clause 45 cycles".
> MDIOBUS_C22 means "I can generate clause 22 cycles".
> MDIOBUS_C45_C22 means "I can generate both clause 45 and clause 22
> cycles."
>
> Notice carefully the values these end up with - MDIOBUS_C22 = BIT(0),
> MDIOBUS_C45 = BIT(1), MDIOBUS_C45_C22 = BIT(0) | BIT(1). I suspect
> that was no coincidence in Andrew's suggestion.

Hi Russell

What was a nice side affect. Since i doubt Jeremy is going to go
through every MDIO driver and set the capabilities correctly, i wanted
0 to have a safe meaning. In the code we should treat MDIOBUS_UNKNOWN
and MDIOBUS_C22 identically. But maybe some time in the distant
future, we can make 0 issue a warning.

Andrew

2020-05-25 21:58:17

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 01/11] net: phy: Don't report success if devices weren't found

Hi,

On 5/25/20 4:45 AM, Russell King - ARM Linux admin wrote:
> On Sun, May 24, 2020 at 09:46:55PM -0500, Jeremy Linton wrote:
>> Hi,
>>
>> Thanks for taking a look at this.
>>
>> On 5/23/20 1:20 PM, Russell King - ARM Linux admin wrote:
>>> On Fri, May 22, 2020 at 04:30:49PM -0500, Jeremy Linton wrote:
>>>> C45 devices are to return 0 for registers they haven't
>>>> implemented. This means in theory we can terminate the
>>>> device search loop without finding any MMDs. In that
>>>> case we want to immediately return indicating that
>>>> nothing was found rather than continuing to probe
>>>> and falling into the success state at the bottom.
>>>
>>> This is a little confusing when you read this comment:
>>>
>>> /* If mostly Fs, there is no device there,
>>> * then let's continue to probe more, as some
>>> * 10G PHYs have zero Devices In package,
>>> * e.g. Cortina CS4315/CS4340 PHY.
>>> */
>>>
>>> Since it appears to be talking about the case of a PHY where *devs will
>>> be zero. However, tracking down the original submission, it seems this
>>> is not the case at all, and the comment is grossly misleading.
>>>
>>> It seems these PHYs only report a valid data in the Devices In Package
>>> registers for devad=0 - it has nothing to do with a zero Devices In
>>> Package.
>>
>> Yes, this ended up being my understanding of this commit, and is part of my
>> justification for starting the devices search at the reserved address 0
>> rather than 1.
>>
>>>
>>> Can I suggest that this comment is fixed while we're changing the code
>>> to explicitly reject this "zero Devices In package" so that it's not
>>> confusing?
>>
>> Its probably better to kill it in 5/11 with a mention that we are starting
>> at a reserved address?
>>
>> OTOH, I'm a bit concerned that reading at 0 as the first address will cause
>> problems because the original code was only triggering it after a read
>> returning 0xFFFFFFFF at a valid MMD address. It does simplify/clarify the
>> loop though. If it weren't for this 0 read, I would have tried to avoid some
>> of the additional MMD reserved addresses.
>
> Yes, that is the worry, and as MMD 0 is marked as reserved, I don't
> think we should routinely probe it.

Hmm, ok, so it gets a bit more complex then. The loop could probe all
the valid MMD addresses, then if that doesn't appear to have yielded
anything try the reserved ones? Its actually not a big code change
because we could have a hardcoded bitfield of valid MMD addresses which
we check before trying the probe. Then make one pass through the loop
hitting the valid ones, and if we still didn't find anything, try the
reserved addresses.


>
> As I've already mentioned, note that bit 0 of devices-in-package does
> not mean that there is a MMD 0 implemented, even if bit 0 is set. Bit
> 0 means that the clause 22 register set is available through clause 22
> cycles. So, simplifying the loop to start at 0 and removing the work-
> around could end up breaking Cortina PHYs if they don't set bit 0 in
> their devices in package - but I don't see why we should depend on bit 0
> being set for their workaround.
Just to be clear this set probes MMD 0 except to ask for the device
list. That is the same behavior as before. That is because all the
subsequent id/etc loops are indexed to start at MMD 1. The primary
difference for the cortiana PHY's AFAIK, is the order we ask for the
devices list. Before it had to fail at a valid addr before reading 0,
now it just starts at 0 and continues to probe if that fails. Some of
this is required (continuing scan on failure) due to phys that "fail"
reading the devices list for the lower MMD's addresses but work on the
higher ones.


>
> So, I think you're going to have to add a work-around to ignore bit 0,
> which brings up the question whether this is worth it or not.

It does ignore bit 0, it gets turned into the C22 regs flag, and
cleared/ignored in the remainder of the code (do to MMD loop indexes
starting at 1).

>
> Hence, I think this is a "simplifcation" too far.
>

Ok, if i'm understanding correctly, avoid the reserved addresses unless
we fail to get a device list as before. That isn't a problem, I will
include that in the next revision.


Thanks,

2020-05-25 22:02:31

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

Hi,

On 5/25/20 5:06 AM, Russell King - ARM Linux admin wrote:
> On Sun, May 24, 2020 at 10:34:13PM -0500, Jeremy Linton wrote:
>> Hi,
>>
>> On 5/23/20 1:37 PM, Russell King - ARM Linux admin wrote:
>>> On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
>>>> Until this point, we have been sanitizing the c22
>>>> regs presence bit out of all the MMD device lists.
>>>> This is incorrect as it causes the 0xFFFFFFFF checks
>>>> to incorrectly fail. Further, it turns out that we
>>>> want to utilize this flag to make a determination that
>>>> there is actually a phy at this location and we should
>>>> be accessing it using c22.
>>>>
>>>> Signed-off-by: Jeremy Linton <[email protected]>
>>>> ---
>>>> drivers/net/phy/phy_device.c | 16 +++++++++++++---
>>>> 1 file changed, 13 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>> index f0761fa5e40b..2d677490ecab 100644
>>>> --- a/drivers/net/phy/phy_device.c
>>>> +++ b/drivers/net/phy/phy_device.c
>>>> @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
>>>> return -EIO;
>>>> *devices_in_package |= phy_reg;
>>>> - /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
>>>> - *devices_in_package &= ~BIT(0);
>>>> -
>>>> return 0;
>>>> }
>>>> @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>> int i;
>>>> const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
>>>> u32 *devs = &c45_ids->devices_in_package;
>>>> + bool c22_present = false;
>>>> + bool valid_id = false;
>>>> /* Find first non-zero Devices In package. Device zero is reserved
>>>> * for 802.3 c45 complied PHYs, so don't probe it at first.
>>>> @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>> return 0;
>>>> }
>>>> + /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
>>>> + c22_present = *devs & BIT(0);
>>>> + *devs &= ~BIT(0);
>>>> +
>>>> /* Now probe Device Identifiers for each device present. */
>>>> for (i = 1; i < num_ids; i++) {
>>>> if (!(c45_ids->devices_in_package & (1 << i)))
>>>> @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>> ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
>>>> if (ret < 0)
>>>> return ret;
>>>> + if (valid_phy_id(c45_ids->device_ids[i]))
>>>> + valid_id = true;
>>>
>>> Here you are using your "devices in package" validator to validate the
>>> PHY ID value. One of the things it does is mask this value with
>>> 0x1fffffff. That means you lose some of the vendor OUI. To me, this
>>> looks completely wrong.
>>
>> I think in this case I was just using it like the comment in
>> get_phy_device() "if the phy_id is mostly F's, there is no device here".
>>
>> My understanding is that the code is trying to avoid the 0xFFFFFFFF returns
>> that seem to indicate "bus ok, phy didn't respond".
>>
>> I just checked the OUI registration, and while there are a couple OUI's
>> registered that have a number of FFF's in them, none of those cases seems to
>> overlap sufficiently to cause this to throw them out. Plus a phy would also
>> have to have model+revision set to 'F's. So while might be possible, if
>> unlikely, at the moment I think the OUI registration keeps this from being a
>> problem. Particularly, if i'm reading the mapping correctly, the OUI mapping
>> guarantees that the field cannot be all '1's due to the OUI having X & M
>> bits cleared. It sort of looks like the mapping is trying to lose those
>> bits, by tossing bit 1 & 2, but the X & M are in the wrong octet (AFAIK, I
>> just read it three times cause it didn't make any sense).
>
> I should also note that we have at least one supported PHY where one
> of the MMDs returns 0xfffe for even numbered registers and 0x0000 for
> odd numbered registers in one of the vendor MMDs for addresses 0
> through 0xefff - which has a bit set in the devices-in-package.
>
> It also returns 0x0082 for almost every register in MMD 2, but MMD 2's
> devices-in-package bit is clear in most of the valid MMDs, so we
> shouldn't touch it.
>
> These reveal the problem of randomly probing MMDs - they can return
> unexpected values and not be as well behaved as we would like them to
> be. Using register 8 to detect presence may be beneficial, but that
> may also introduce problems as we haven't used that before (and we
> don't know whether any PHY that wrong.) I know at least the 88x3310
> gets it right for all except the vendor MMDs, where the low addresses
> appear non-confromant to the 802.3 specs. Both vendor MMDs are
> definitely implemented, just not with anything conforming to 802.3.

Yes, we know even for the NXP reference hardware, one of the phy's
doesn't probe out correctly because it doesn't respond to the ieee
defined registers. I think at this point, there really isn't anything we
can do about that unless we involve the (ACPI) firmware in currently
nonstandard behaviors.

So, my goals here have been to first, not break anything, and then do a
slightly better job finding phy's that are (mostly?) responding
correctly to the 802.3 spec. So we can say "if you hardware is ACPI
conformant, and you have IEEE conformant phy's you should be ok". So,
for your example phy, I guess the immediate answer is "use DT" or "find
a conformant phy", or even "abstract it in the firmware and use a
mailbox interface".



2020-05-25 22:03:42

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

On Mon, May 25, 2020 at 04:51:16PM -0500, Jeremy Linton wrote:
> Hi,
>
> On 5/25/20 5:06 AM, Russell King - ARM Linux admin wrote:
> > On Sun, May 24, 2020 at 10:34:13PM -0500, Jeremy Linton wrote:
> > > Hi,
> > >
> > > On 5/23/20 1:37 PM, Russell King - ARM Linux admin wrote:
> > > > On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
> > > > > Until this point, we have been sanitizing the c22
> > > > > regs presence bit out of all the MMD device lists.
> > > > > This is incorrect as it causes the 0xFFFFFFFF checks
> > > > > to incorrectly fail. Further, it turns out that we
> > > > > want to utilize this flag to make a determination that
> > > > > there is actually a phy at this location and we should
> > > > > be accessing it using c22.
> > > > >
> > > > > Signed-off-by: Jeremy Linton <[email protected]>
> > > > > ---
> > > > > drivers/net/phy/phy_device.c | 16 +++++++++++++---
> > > > > 1 file changed, 13 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > > > > index f0761fa5e40b..2d677490ecab 100644
> > > > > --- a/drivers/net/phy/phy_device.c
> > > > > +++ b/drivers/net/phy/phy_device.c
> > > > > @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
> > > > > return -EIO;
> > > > > *devices_in_package |= phy_reg;
> > > > > - /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> > > > > - *devices_in_package &= ~BIT(0);
> > > > > -
> > > > > return 0;
> > > > > }
> > > > > @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > int i;
> > > > > const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
> > > > > u32 *devs = &c45_ids->devices_in_package;
> > > > > + bool c22_present = false;
> > > > > + bool valid_id = false;
> > > > > /* Find first non-zero Devices In package. Device zero is reserved
> > > > > * for 802.3 c45 complied PHYs, so don't probe it at first.
> > > > > @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > return 0;
> > > > > }
> > > > > + /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> > > > > + c22_present = *devs & BIT(0);
> > > > > + *devs &= ~BIT(0);
> > > > > +
> > > > > /* Now probe Device Identifiers for each device present. */
> > > > > for (i = 1; i < num_ids; i++) {
> > > > > if (!(c45_ids->devices_in_package & (1 << i)))
> > > > > @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
> > > > > if (ret < 0)
> > > > > return ret;
> > > > > + if (valid_phy_id(c45_ids->device_ids[i]))
> > > > > + valid_id = true;
> > > >
> > > > Here you are using your "devices in package" validator to validate the
> > > > PHY ID value. One of the things it does is mask this value with
> > > > 0x1fffffff. That means you lose some of the vendor OUI. To me, this
> > > > looks completely wrong.
> > >
> > > I think in this case I was just using it like the comment in
> > > get_phy_device() "if the phy_id is mostly F's, there is no device here".
> > >
> > > My understanding is that the code is trying to avoid the 0xFFFFFFFF returns
> > > that seem to indicate "bus ok, phy didn't respond".
> > >
> > > I just checked the OUI registration, and while there are a couple OUI's
> > > registered that have a number of FFF's in them, none of those cases seems to
> > > overlap sufficiently to cause this to throw them out. Plus a phy would also
> > > have to have model+revision set to 'F's. So while might be possible, if
> > > unlikely, at the moment I think the OUI registration keeps this from being a
> > > problem. Particularly, if i'm reading the mapping correctly, the OUI mapping
> > > guarantees that the field cannot be all '1's due to the OUI having X & M
> > > bits cleared. It sort of looks like the mapping is trying to lose those
> > > bits, by tossing bit 1 & 2, but the X & M are in the wrong octet (AFAIK, I
> > > just read it three times cause it didn't make any sense).
> >
> > I should also note that we have at least one supported PHY where one
> > of the MMDs returns 0xfffe for even numbered registers and 0x0000 for
> > odd numbered registers in one of the vendor MMDs for addresses 0
> > through 0xefff - which has a bit set in the devices-in-package.
> >
> > It also returns 0x0082 for almost every register in MMD 2, but MMD 2's
> > devices-in-package bit is clear in most of the valid MMDs, so we
> > shouldn't touch it.
> >
> > These reveal the problem of randomly probing MMDs - they can return
> > unexpected values and not be as well behaved as we would like them to
> > be. Using register 8 to detect presence may be beneficial, but that
> > may also introduce problems as we haven't used that before (and we
> > don't know whether any PHY that wrong.) I know at least the 88x3310
> > gets it right for all except the vendor MMDs, where the low addresses
> > appear non-confromant to the 802.3 specs. Both vendor MMDs are
> > definitely implemented, just not with anything conforming to 802.3.
>
> Yes, we know even for the NXP reference hardware, one of the phy's doesn't
> probe out correctly because it doesn't respond to the ieee defined
> registers. I think at this point, there really isn't anything we can do
> about that unless we involve the (ACPI) firmware in currently nonstandard
> behaviors.
>
> So, my goals here have been to first, not break anything, and then do a
> slightly better job finding phy's that are (mostly?) responding correctly to
> the 802.3 spec. So we can say "if you hardware is ACPI conformant, and you
> have IEEE conformant phy's you should be ok". So, for your example phy, I
> guess the immediate answer is "use DT" or "find a conformant phy", or even
> "abstract it in the firmware and use a mailbox interface".

You haven't understood. The PHY does conform for most of the MMDs,
but there are a number that do not conform.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-25 22:03:57

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 01/11] net: phy: Don't report success if devices weren't found

Hi,

On 5/25/20 4:07 PM, Russell King - ARM Linux admin wrote:
> On Mon, May 25, 2020 at 04:02:13PM -0500, Jeremy Linton wrote:
>>> So, I think you're going to have to add a work-around to ignore bit 0,
>>> which brings up the question whether this is worth it or not.
>>
>> It does ignore bit 0, it gets turned into the C22 regs flag, and
>> cleared/ignored in the remainder of the code (do to MMD loop indexes
>> starting at 1).
>
> However, I've already pointed out that that isn't the case in a
> number of functions that I listed in another email, and I suspect
> was glossed over.
>

Hmm, right, I might not be understanding, I'm still considering your
comments in 4/11 and a couple others..

OTOH, the mmd 0 logic could be completely removed, as its actually been
broken for a year or so in linux (AFAIK) because the code triggering it
was disabled when the C22 sanitation patch was merged. OTOH, this patch
is still clearing the C22 flag from devices, so anything dependent
entirely on that should have the same behavior as before.

So, there is a bug in the is_valid_phy/device macro, because I messed it
up when I converted it to a function because its using a signed val,
when it should be unsigned. I don't think that is what you were hinting
in 4/11 though.

Thanks,

2020-05-25 22:27:54

by Andrew Lunn

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

> Yes, we know even for the NXP reference hardware, one of the phy's doesn't
> probe out correctly because it doesn't respond to the ieee defined
> registers. I think at this point, there really isn't anything we can do
> about that unless we involve the (ACPI) firmware in currently nonstandard
> behaviors.
>
> So, my goals here have been to first, not break anything, and then do a
> slightly better job finding phy's that are (mostly?) responding correctly to
> the 802.3 spec. So we can say "if you hardware is ACPI conformant, and you
> have IEEE conformant phy's you should be ok". So, for your example phy, I
> guess the immediate answer is "use DT" or "find a conformant phy", or even
> "abstract it in the firmware and use a mailbox interface".

Hi Jeremy

You need to be careful here, when you say "use DT". With a c45 PHY
of_mdiobus_register_phy() calls get_phy_device() to see if the device
exists on the bus. So your changes to get_phy_device() etc, needs to
continue to find devices it used to find, even if they are not fully
complient to 802.3.

Andrew

2020-05-25 22:29:58

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 08/11] net: phy: Allow mdio buses to auto-probe c45 devices

Hi,

On 5/25/20 3:25 AM, Russell King - ARM Linux admin wrote:
> On Sun, May 24, 2020 at 11:28:52PM -0500, Jeremy Linton wrote:
>> Hi,
>>
>> On 5/24/20 9:44 AM, Andrew Lunn wrote:
>>>> +++ b/include/linux/phy.h
>>>> @@ -275,6 +275,11 @@ struct mii_bus {
>>>> int reset_delay_us;
>>>> /* RESET GPIO descriptor pointer */
>>>> struct gpio_desc *reset_gpiod;
>>>> + /* bus capabilities, used for probing */
>>>> + enum {
>>>> + MDIOBUS_C22_ONLY = 0,
>>>> + MDIOBUS_C45_FIRST,
>>>> + } probe_capabilities;
>>>> };
>>>
>>>
>>> I'm not so keen on _FIRST. It suggest _LAST would also be valid. But
>>> that then suggests this is not a bus property, but a PHY property, and
>>> some PHYs might need _FIRST and other phys need _LAST, and then you
>>> have a bus which has both sorts of PHY on it, and you have a problem.
>>>
>>> So i think it would be better to have
>>>
>>> enum {
>>> MDIOBUS_UNKNOWN = 0,
>>> MDIOBUS_C22,
>>> MDIOBUS_C45,
>>> MDIOBUS_C45_C22,
>>> } bus_capabilities;
>>>
>>> Describe just what the bus master can support.
>>
>> Yes, the naming is reasonable and I will update it in the next patch. I went
>> around a bit myself with this naming early on, and the problem I saw was
>> that a C45 capable master, can have C45 electrical phy's that only respond
>> to c22 requests (AFAIK).
>
> If you have a master that can only generate clause 45 cycles, and
> someone is daft enough to connect a clause 22 only PHY to it, the
> result is hardware that doesn't work - there's no getting around
> that. The MDIO interface can't generate the appropriate cycles to
> access the clause 22 PHY. So, this is not something we need care
> about.
>
>> So the MDIOBUS_C45 (I think I was calling it
>> C45_ONLY) is an invalid selection. Not, that it wouldn't be helpful to have
>> a C45_ONLY case, but that the assumption is that you wouldn't try and probe
>> c22 registers, which I thought was a mistake.
>
> MDIOBUS_C45 means "I can generate clause 45 cycles".
> MDIOBUS_C22 means "I can generate clause 22 cycles".
> MDIOBUS_C45_C22 means "I can generate both clause 45 and clause 22
> cycles."

Hi, to be clear, we are talking about c45 controllers that can access
the c22 register space via c45, or controllers which are
electrically/level shifting to be compatible with c22 voltages/etc?

The nxp hardware in question has 1, 10 and 40Gbit phys on the same MDIO,
the 1gbit we fall back to c22 registers because it doesn't respond
correctly to c45 registers. Which is AFAIK what the bit0 C22 regs bit is
for..

The general logic right now for a C45_FIRST is attempt to detect a c45
phy and if nothing is detected fall back and attempt c22 register
access. That is whats picking up the 1G phys. If for whatever reason the
MDIO controller can't do the right thing to access the c22 regs, I guess
there really isn't anything we can do about it.


>
> Notice carefully the values these end up with - MDIOBUS_C22 = BIT(0),
> MDIOBUS_C45 = BIT(1), MDIOBUS_C45_C22 = BIT(0) | BIT(1). I suspect
> that was no coincidence in Andrew's suggestion.
>

2020-05-25 22:51:42

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

Hi,

On 5/25/20 5:06 PM, Andrew Lunn wrote:
>> Yes, we know even for the NXP reference hardware, one of the phy's doesn't
>> probe out correctly because it doesn't respond to the ieee defined
>> registers. I think at this point, there really isn't anything we can do
>> about that unless we involve the (ACPI) firmware in currently nonstandard
>> behaviors.
>>
>> So, my goals here have been to first, not break anything, and then do a
>> slightly better job finding phy's that are (mostly?) responding correctly to
>> the 802.3 spec. So we can say "if you hardware is ACPI conformant, and you
>> have IEEE conformant phy's you should be ok". So, for your example phy, I
>> guess the immediate answer is "use DT" or "find a conformant phy", or even
>> "abstract it in the firmware and use a mailbox interface".
>
> Hi Jeremy
>
> You need to be careful here, when you say "use DT". With a c45 PHY
> of_mdiobus_register_phy() calls get_phy_device() to see if the device
> exists on the bus. So your changes to get_phy_device() etc, needs to
> continue to find devices it used to find, even if they are not fully
> complient to 802.3.
>

Yes, that is my "don't break anything". But, in a number of cases I
can't tell if something is an intentional "bug", or what exactly the
intended side effect actually was. The c22 bit0 sanitation is in this
bucket, because its basically disabling the MMD0 probe..

I know for sure we find phys that previously weren't found. OTOH, i'm
not sure how many that were previously "found" are now getting kicked
out by because they are doing something "bad" that looked like a bug.


2020-05-25 22:55:42

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 08/11] net: phy: Allow mdio buses to auto-probe c45 devices

On Mon, May 25, 2020 at 05:09:56PM -0500, Jeremy Linton wrote:
> Hi,
>
> On 5/25/20 3:25 AM, Russell King - ARM Linux admin wrote:
> > On Sun, May 24, 2020 at 11:28:52PM -0500, Jeremy Linton wrote:
> > > Hi,
> > >
> > > On 5/24/20 9:44 AM, Andrew Lunn wrote:
> > > > > +++ b/include/linux/phy.h
> > > > > @@ -275,6 +275,11 @@ struct mii_bus {
> > > > > int reset_delay_us;
> > > > > /* RESET GPIO descriptor pointer */
> > > > > struct gpio_desc *reset_gpiod;
> > > > > + /* bus capabilities, used for probing */
> > > > > + enum {
> > > > > + MDIOBUS_C22_ONLY = 0,
> > > > > + MDIOBUS_C45_FIRST,
> > > > > + } probe_capabilities;
> > > > > };
> > > >
> > > >
> > > > I'm not so keen on _FIRST. It suggest _LAST would also be valid. But
> > > > that then suggests this is not a bus property, but a PHY property, and
> > > > some PHYs might need _FIRST and other phys need _LAST, and then you
> > > > have a bus which has both sorts of PHY on it, and you have a problem.
> > > >
> > > > So i think it would be better to have
> > > >
> > > > enum {
> > > > MDIOBUS_UNKNOWN = 0,
> > > > MDIOBUS_C22,
> > > > MDIOBUS_C45,
> > > > MDIOBUS_C45_C22,
> > > > } bus_capabilities;
> > > >
> > > > Describe just what the bus master can support.
> > >
> > > Yes, the naming is reasonable and I will update it in the next patch. I went
> > > around a bit myself with this naming early on, and the problem I saw was
> > > that a C45 capable master, can have C45 electrical phy's that only respond
> > > to c22 requests (AFAIK).
> >
> > If you have a master that can only generate clause 45 cycles, and
> > someone is daft enough to connect a clause 22 only PHY to it, the
> > result is hardware that doesn't work - there's no getting around
> > that. The MDIO interface can't generate the appropriate cycles to
> > access the clause 22 PHY. So, this is not something we need care
> > about.
> >
> > > So the MDIOBUS_C45 (I think I was calling it
> > > C45_ONLY) is an invalid selection. Not, that it wouldn't be helpful to have
> > > a C45_ONLY case, but that the assumption is that you wouldn't try and probe
> > > c22 registers, which I thought was a mistake.
> >
> > MDIOBUS_C45 means "I can generate clause 45 cycles".
> > MDIOBUS_C22 means "I can generate clause 22 cycles".
> > MDIOBUS_C45_C22 means "I can generate both clause 45 and clause 22
> > cycles."
>
> Hi, to be clear, we are talking about c45 controllers that can access the
> c22 register space via c45, or controllers which are electrically/level
> shifting to be compatible with c22 voltages/etc?

To me, Andrew was quite clear that these flags should describe what
the MDIO controller can support, and what I understand from Andrew's
email is:

If it can't support clause 45 accesses, then it should not advertise
MDIOBUS_C45 nor MDIOBUS_C45_C22.

If it can't support clause 22 accesses, then it should not advertise
MDIOBUS_C22.

And that's all there is to it.

If you want to talk about clause 45 access via the clause 22 register
set, that is a property of the PHY, not of the MDIO controller, and
the MDIO controller has no business attempting to describe that
property in any shape or form since it is a PHY property.

If you have access to clause 22 registers, then you likely have the
clause 22 ID registers, and the way phylib currently works, that will
also match a PHY driver.

Once we have a PHY and a driver bound, then even if it is a C45
driver, accesses using the phy_*_mmd() functions will _automatically_
switch to using C22 cycles to the indirect C45 access registers if
the PHY has not been detected as a C45 PHY (phydev->is_c45 is false.)

So, I'm coming to the conclusion that you're making way more work
here than is necessary, your changes to gratuitously change the way
stuff in phylib work which is not risk-free are completely unnecessary
and really not a risk worth taking when it's likely that we already
have the code mostly in place to be able to support your PHY.

I think there are some questionable uses of phydev->is_c45 that
your point about accessing C45 PHYs through C22 indirect registers
brings up which need to be resolved, but I really don't think that's
a completely separate issue.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-25 22:55:53

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

On 5/25/20 5:01 PM, Russell King - ARM Linux admin wrote:
> On Mon, May 25, 2020 at 04:51:16PM -0500, Jeremy Linton wrote:
>> Hi,
>>
>> On 5/25/20 5:06 AM, Russell King - ARM Linux admin wrote:
>>> On Sun, May 24, 2020 at 10:34:13PM -0500, Jeremy Linton wrote:
>>>> Hi,
>>>>
>>>> On 5/23/20 1:37 PM, Russell King - ARM Linux admin wrote:
>>>>> On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
>>>>>> Until this point, we have been sanitizing the c22
>>>>>> regs presence bit out of all the MMD device lists.
>>>>>> This is incorrect as it causes the 0xFFFFFFFF checks
>>>>>> to incorrectly fail. Further, it turns out that we
>>>>>> want to utilize this flag to make a determination that
>>>>>> there is actually a phy at this location and we should
>>>>>> be accessing it using c22.
>>>>>>
>>>>>> Signed-off-by: Jeremy Linton <[email protected]>
>>>>>> ---
>>>>>> drivers/net/phy/phy_device.c | 16 +++++++++++++---
>>>>>> 1 file changed, 13 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>>>> index f0761fa5e40b..2d677490ecab 100644
>>>>>> --- a/drivers/net/phy/phy_device.c
>>>>>> +++ b/drivers/net/phy/phy_device.c
>>>>>> @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
>>>>>> return -EIO;
>>>>>> *devices_in_package |= phy_reg;
>>>>>> - /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
>>>>>> - *devices_in_package &= ~BIT(0);
>>>>>> -
>>>>>> return 0;
>>>>>> }
>>>>>> @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>> int i;
>>>>>> const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
>>>>>> u32 *devs = &c45_ids->devices_in_package;
>>>>>> + bool c22_present = false;
>>>>>> + bool valid_id = false;
>>>>>> /* Find first non-zero Devices In package. Device zero is reserved
>>>>>> * for 802.3 c45 complied PHYs, so don't probe it at first.
>>>>>> @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>> return 0;
>>>>>> }
>>>>>> + /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
>>>>>> + c22_present = *devs & BIT(0);
>>>>>> + *devs &= ~BIT(0);
>>>>>> +
>>>>>> /* Now probe Device Identifiers for each device present. */
>>>>>> for (i = 1; i < num_ids; i++) {
>>>>>> if (!(c45_ids->devices_in_package & (1 << i)))
>>>>>> @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>> ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
>>>>>> if (ret < 0)
>>>>>> return ret;
>>>>>> + if (valid_phy_id(c45_ids->device_ids[i]))
>>>>>> + valid_id = true;
>>>>>
>>>>> Here you are using your "devices in package" validator to validate the
>>>>> PHY ID value. One of the things it does is mask this value with
>>>>> 0x1fffffff. That means you lose some of the vendor OUI. To me, this
>>>>> looks completely wrong.
>>>>
>>>> I think in this case I was just using it like the comment in
>>>> get_phy_device() "if the phy_id is mostly F's, there is no device here".
>>>>
>>>> My understanding is that the code is trying to avoid the 0xFFFFFFFF returns
>>>> that seem to indicate "bus ok, phy didn't respond".
>>>>
>>>> I just checked the OUI registration, and while there are a couple OUI's
>>>> registered that have a number of FFF's in them, none of those cases seems to
>>>> overlap sufficiently to cause this to throw them out. Plus a phy would also
>>>> have to have model+revision set to 'F's. So while might be possible, if
>>>> unlikely, at the moment I think the OUI registration keeps this from being a
>>>> problem. Particularly, if i'm reading the mapping correctly, the OUI mapping
>>>> guarantees that the field cannot be all '1's due to the OUI having X & M
>>>> bits cleared. It sort of looks like the mapping is trying to lose those
>>>> bits, by tossing bit 1 & 2, but the X & M are in the wrong octet (AFAIK, I
>>>> just read it three times cause it didn't make any sense).
>>>
>>> I should also note that we have at least one supported PHY where one
>>> of the MMDs returns 0xfffe for even numbered registers and 0x0000 for
>>> odd numbered registers in one of the vendor MMDs for addresses 0
>>> through 0xefff - which has a bit set in the devices-in-package.
>>>
>>> It also returns 0x0082 for almost every register in MMD 2, but MMD 2's
>>> devices-in-package bit is clear in most of the valid MMDs, so we
>>> shouldn't touch it.
>>>
>>> These reveal the problem of randomly probing MMDs - they can return
>>> unexpected values and not be as well behaved as we would like them to
>>> be. Using register 8 to detect presence may be beneficial, but that
>>> may also introduce problems as we haven't used that before (and we
>>> don't know whether any PHY that wrong.) I know at least the 88x3310
>>> gets it right for all except the vendor MMDs, where the low addresses
>>> appear non-confromant to the 802.3 specs. Both vendor MMDs are
>>> definitely implemented, just not with anything conforming to 802.3.
>>
>> Yes, we know even for the NXP reference hardware, one of the phy's doesn't
>> probe out correctly because it doesn't respond to the ieee defined
>> registers. I think at this point, there really isn't anything we can do
>> about that unless we involve the (ACPI) firmware in currently nonstandard
>> behaviors.
>>
>> So, my goals here have been to first, not break anything, and then do a
>> slightly better job finding phy's that are (mostly?) responding correctly to
>> the 802.3 spec. So we can say "if you hardware is ACPI conformant, and you
>> have IEEE conformant phy's you should be ok". So, for your example phy, I
>> guess the immediate answer is "use DT" or "find a conformant phy", or even
>> "abstract it in the firmware and use a mailbox interface".
>
> You haven't understood. The PHY does conform for most of the MMDs,
> but there are a number that do not conform.

Probably...

Except that i'm not sure how that is a problem at the moment, its still
going to trigger as a found phy, and walk the same mmd list as before
requesting drivers. Those drivers haven't changed their behavior so
where is the problem? If there is a problem its in 7/11 where things are
getting kicked due to seemingly invalid Ids.

The 1/11 devices=0 case actually appears to be a bug i'm fixing because
you won't get an ID or a MMD list from that (before or after).


2020-05-25 23:08:49

by Andrew Lunn

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

> I know for sure we find phys that previously weren't found.

That is in itself somewhat dangerous. Those using primitive
configuration systems are probably going to use phy_find_first(),
rather than an address on the bus. I always recommend against that,
because if another PHY suddenly pops up on the bus, bad things can
happen.

Andrew

2020-05-25 23:10:08

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

On Mon, May 25, 2020 at 05:17:27PM -0500, Jeremy Linton wrote:
> Hi,
>
> On 5/25/20 5:06 PM, Andrew Lunn wrote:
> > > Yes, we know even for the NXP reference hardware, one of the phy's doesn't
> > > probe out correctly because it doesn't respond to the ieee defined
> > > registers. I think at this point, there really isn't anything we can do
> > > about that unless we involve the (ACPI) firmware in currently nonstandard
> > > behaviors.
> > >
> > > So, my goals here have been to first, not break anything, and then do a
> > > slightly better job finding phy's that are (mostly?) responding correctly to
> > > the 802.3 spec. So we can say "if you hardware is ACPI conformant, and you
> > > have IEEE conformant phy's you should be ok". So, for your example phy, I
> > > guess the immediate answer is "use DT" or "find a conformant phy", or even
> > > "abstract it in the firmware and use a mailbox interface".
> > Hi Jeremy
> >
> > You need to be careful here, when you say "use DT". With a c45 PHY
> > of_mdiobus_register_phy() calls get_phy_device() to see if the device
> > exists on the bus. So your changes to get_phy_device() etc, needs to
> > continue to find devices it used to find, even if they are not fully
> > complient to 802.3.
> >
>
> Yes, that is my "don't break anything". But, in a number of cases I can't
> tell if something is an intentional "bug", or what exactly the intended side
> effect actually was. The c22 bit0 sanitation is in this bucket, because its
> basically disabling the MMD0 probe..

I'm really not sure it causes any problem what so ever. Have you read
the commit adding cortina.c to see what it says - there is an
interesting comment about what it requires in firmware. That is, it
calls for an explicit "ethernet-phy-id" compatible in DT naming the
PHY ID, but that can't be used for Clause 45 PHYs (it will be ignored.)
So, it will be treated by the kernel as a Clause 22 PHY.

It is presently in use:

arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
arch/powerpc/boot/dts/fsl/t2080rdb.dts: compatible = "ethernet-phy-id13e5.1002";
arch/powerpc/boot/dts/fsl/t2080rdb.dts: compatible = "ethernet-phy-id13e5.1002";

Now, given this, none of the Clause 45 PHY detection code will be
touched while probing for these PHYs, so really that work-around to
read form MMD 0 in the Clause 45 probing function really doesn't seem
to apply to these PHYs.

Next, when you read cortina.c, it becomes obvious that the PHY's MMD 0
doesn't even follow IEEE 802.3 - the ID registers are at register 0/1
not 2/3. So even if we did try to read the ID from MMD 0, we wouldn't
be reading the ID from the right registers.

Hence, I don't think anything has been broken at all by the commit
you refer to.

> I know for sure we find phys that previously weren't found. OTOH, i'm not
> sure how many that were previously "found" are now getting kicked out by
> because they are doing something "bad" that looked like a bug.

I don't think you've found any problem what so ever.

For these PHYs to be automatically probed, they need to have a DT
string identifying them as clause 45 compliant. From the DTS files
I've provided above, that isn't the case, this code path won't be
reached, so nothing has been broken. In any case, for the
reasons I mention above wrt non-standard register layout, it seems
it couldn't have worked via this probing code.

I would dig some more into the history of the change to
get_phy_c45_ids() and how it relates to the addition of the Cortina
driver, but unfortunately my machine is being painfully slow with
git log searching the history that it's way too time consuming for
me to do anything further now, but the conclusion I'm coming to is
that there has been no regression how you think there has been.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-25 23:12:17

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

On Mon, May 25, 2020 at 05:22:07PM -0500, Jeremy Linton wrote:
> On 5/25/20 5:01 PM, Russell King - ARM Linux admin wrote:
> > On Mon, May 25, 2020 at 04:51:16PM -0500, Jeremy Linton wrote:
> > > Hi,
> > >
> > > On 5/25/20 5:06 AM, Russell King - ARM Linux admin wrote:
> > > > On Sun, May 24, 2020 at 10:34:13PM -0500, Jeremy Linton wrote:
> > > > > Hi,
> > > > >
> > > > > On 5/23/20 1:37 PM, Russell King - ARM Linux admin wrote:
> > > > > > On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
> > > > > > > Until this point, we have been sanitizing the c22
> > > > > > > regs presence bit out of all the MMD device lists.
> > > > > > > This is incorrect as it causes the 0xFFFFFFFF checks
> > > > > > > to incorrectly fail. Further, it turns out that we
> > > > > > > want to utilize this flag to make a determination that
> > > > > > > there is actually a phy at this location and we should
> > > > > > > be accessing it using c22.
> > > > > > >
> > > > > > > Signed-off-by: Jeremy Linton <[email protected]>
> > > > > > > ---
> > > > > > > drivers/net/phy/phy_device.c | 16 +++++++++++++---
> > > > > > > 1 file changed, 13 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > > > > > > index f0761fa5e40b..2d677490ecab 100644
> > > > > > > --- a/drivers/net/phy/phy_device.c
> > > > > > > +++ b/drivers/net/phy/phy_device.c
> > > > > > > @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
> > > > > > > return -EIO;
> > > > > > > *devices_in_package |= phy_reg;
> > > > > > > - /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> > > > > > > - *devices_in_package &= ~BIT(0);
> > > > > > > -
> > > > > > > return 0;
> > > > > > > }
> > > > > > > @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > > > int i;
> > > > > > > const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
> > > > > > > u32 *devs = &c45_ids->devices_in_package;
> > > > > > > + bool c22_present = false;
> > > > > > > + bool valid_id = false;
> > > > > > > /* Find first non-zero Devices In package. Device zero is reserved
> > > > > > > * for 802.3 c45 complied PHYs, so don't probe it at first.
> > > > > > > @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > > > return 0;
> > > > > > > }
> > > > > > > + /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> > > > > > > + c22_present = *devs & BIT(0);
> > > > > > > + *devs &= ~BIT(0);
> > > > > > > +
> > > > > > > /* Now probe Device Identifiers for each device present. */
> > > > > > > for (i = 1; i < num_ids; i++) {
> > > > > > > if (!(c45_ids->devices_in_package & (1 << i)))
> > > > > > > @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > > > ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
> > > > > > > if (ret < 0)
> > > > > > > return ret;
> > > > > > > + if (valid_phy_id(c45_ids->device_ids[i]))
> > > > > > > + valid_id = true;
> > > > > >
> > > > > > Here you are using your "devices in package" validator to validate the
> > > > > > PHY ID value. One of the things it does is mask this value with
> > > > > > 0x1fffffff. That means you lose some of the vendor OUI. To me, this
> > > > > > looks completely wrong.
> > > > >
> > > > > I think in this case I was just using it like the comment in
> > > > > get_phy_device() "if the phy_id is mostly F's, there is no device here".
> > > > >
> > > > > My understanding is that the code is trying to avoid the 0xFFFFFFFF returns
> > > > > that seem to indicate "bus ok, phy didn't respond".
> > > > >
> > > > > I just checked the OUI registration, and while there are a couple OUI's
> > > > > registered that have a number of FFF's in them, none of those cases seems to
> > > > > overlap sufficiently to cause this to throw them out. Plus a phy would also
> > > > > have to have model+revision set to 'F's. So while might be possible, if
> > > > > unlikely, at the moment I think the OUI registration keeps this from being a
> > > > > problem. Particularly, if i'm reading the mapping correctly, the OUI mapping
> > > > > guarantees that the field cannot be all '1's due to the OUI having X & M
> > > > > bits cleared. It sort of looks like the mapping is trying to lose those
> > > > > bits, by tossing bit 1 & 2, but the X & M are in the wrong octet (AFAIK, I
> > > > > just read it three times cause it didn't make any sense).
> > > >
> > > > I should also note that we have at least one supported PHY where one
> > > > of the MMDs returns 0xfffe for even numbered registers and 0x0000 for
> > > > odd numbered registers in one of the vendor MMDs for addresses 0
> > > > through 0xefff - which has a bit set in the devices-in-package.
> > > >
> > > > It also returns 0x0082 for almost every register in MMD 2, but MMD 2's
> > > > devices-in-package bit is clear in most of the valid MMDs, so we
> > > > shouldn't touch it.
> > > >
> > > > These reveal the problem of randomly probing MMDs - they can return
> > > > unexpected values and not be as well behaved as we would like them to
> > > > be. Using register 8 to detect presence may be beneficial, but that
> > > > may also introduce problems as we haven't used that before (and we
> > > > don't know whether any PHY that wrong.) I know at least the 88x3310
> > > > gets it right for all except the vendor MMDs, where the low addresses
> > > > appear non-confromant to the 802.3 specs. Both vendor MMDs are
> > > > definitely implemented, just not with anything conforming to 802.3.
> > >
> > > Yes, we know even for the NXP reference hardware, one of the phy's doesn't
> > > probe out correctly because it doesn't respond to the ieee defined
> > > registers. I think at this point, there really isn't anything we can do
> > > about that unless we involve the (ACPI) firmware in currently nonstandard
> > > behaviors.
> > >
> > > So, my goals here have been to first, not break anything, and then do a
> > > slightly better job finding phy's that are (mostly?) responding correctly to
> > > the 802.3 spec. So we can say "if you hardware is ACPI conformant, and you
> > > have IEEE conformant phy's you should be ok". So, for your example phy, I
> > > guess the immediate answer is "use DT" or "find a conformant phy", or even
> > > "abstract it in the firmware and use a mailbox interface".
> >
> > You haven't understood. The PHY does conform for most of the MMDs,
> > but there are a number that do not conform.
>
> Probably...
>
> Except that i'm not sure how that is a problem at the moment, its still
> going to trigger as a found phy, and walk the same mmd list as before
> requesting drivers. Those drivers haven't changed their behavior so where is
> the problem? If there is a problem its in 7/11 where things are getting
> kicked due to seemingly invalid Ids.
>
> The 1/11 devices=0 case actually appears to be a bug i'm fixing because you
> won't get an ID or a MMD list from that (before or after).

I think I've just flattened that argument in my immediately preceding
reply on the Cortina situation; I think you've grossly misread that
through not fully researching the history and then finding the
existing users.

There is no bug that you are fixing from what I can see.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-25 23:15:13

by Andrew Lunn

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

> arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
> arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
> arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
> arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
> arch/powerpc/boot/dts/fsl/t2080rdb.dts: compatible = "ethernet-phy-id13e5.1002";
> arch/powerpc/boot/dts/fsl/t2080rdb.dts: compatible = "ethernet-phy-id13e5.1002";

Hi Jeremy

You are doing this work for NXP right? Maybe you can ask them to go
searching in the cellar and find you one of these boards?

Andrew

2020-05-25 23:20:28

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

Hi,

On 5/25/20 5:01 PM, Russell King - ARM Linux admin wrote:
> On Mon, May 25, 2020 at 04:51:16PM -0500, Jeremy Linton wrote:
>> Hi,
>>
>> On 5/25/20 5:06 AM, Russell King - ARM Linux admin wrote:
>>> On Sun, May 24, 2020 at 10:34:13PM -0500, Jeremy Linton wrote:
>>>> Hi,
>>>>
>>>> On 5/23/20 1:37 PM, Russell King - ARM Linux admin wrote:
>>>>> On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
>>>>>> Until this point, we have been sanitizing the c22
>>>>>> regs presence bit out of all the MMD device lists.
>>>>>> This is incorrect as it causes the 0xFFFFFFFF checks
>>>>>> to incorrectly fail. Further, it turns out that we
>>>>>> want to utilize this flag to make a determination that
>>>>>> there is actually a phy at this location and we should
>>>>>> be accessing it using c22.
>>>>>>
>>>>>> Signed-off-by: Jeremy Linton <[email protected]>
>>>>>> ---
>>>>>> drivers/net/phy/phy_device.c | 16 +++++++++++++---
>>>>>> 1 file changed, 13 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>>>> index f0761fa5e40b..2d677490ecab 100644
>>>>>> --- a/drivers/net/phy/phy_device.c
>>>>>> +++ b/drivers/net/phy/phy_device.c
>>>>>> @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
>>>>>> return -EIO;
>>>>>> *devices_in_package |= phy_reg;
>>>>>> - /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
>>>>>> - *devices_in_package &= ~BIT(0);
>>>>>> -
>>>>>> return 0;
>>>>>> }
>>>>>> @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>> int i;
>>>>>> const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
>>>>>> u32 *devs = &c45_ids->devices_in_package;
>>>>>> + bool c22_present = false;
>>>>>> + bool valid_id = false;
>>>>>> /* Find first non-zero Devices In package. Device zero is reserved
>>>>>> * for 802.3 c45 complied PHYs, so don't probe it at first.
>>>>>> @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>> return 0;
>>>>>> }
>>>>>> + /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
>>>>>> + c22_present = *devs & BIT(0);
>>>>>> + *devs &= ~BIT(0);
>>>>>> +
>>>>>> /* Now probe Device Identifiers for each device present. */
>>>>>> for (i = 1; i < num_ids; i++) {
>>>>>> if (!(c45_ids->devices_in_package & (1 << i)))
>>>>>> @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>> ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
>>>>>> if (ret < 0)
>>>>>> return ret;
>>>>>> + if (valid_phy_id(c45_ids->device_ids[i]))
>>>>>> + valid_id = true;
>>>>>
>>>>> Here you are using your "devices in package" validator to validate the
>>>>> PHY ID value. One of the things it does is mask this value with
>>>>> 0x1fffffff. That means you lose some of the vendor OUI. To me, this
>>>>> looks completely wrong.
>>>>
>>>> I think in this case I was just using it like the comment in
>>>> get_phy_device() "if the phy_id is mostly F's, there is no device here".
>>>>
>>>> My understanding is that the code is trying to avoid the 0xFFFFFFFF returns
>>>> that seem to indicate "bus ok, phy didn't respond".
>>>>
>>>> I just checked the OUI registration, and while there are a couple OUI's
>>>> registered that have a number of FFF's in them, none of those cases seems to
>>>> overlap sufficiently to cause this to throw them out. Plus a phy would also
>>>> have to have model+revision set to 'F's. So while might be possible, if
>>>> unlikely, at the moment I think the OUI registration keeps this from being a
>>>> problem. Particularly, if i'm reading the mapping correctly, the OUI mapping
>>>> guarantees that the field cannot be all '1's due to the OUI having X & M
>>>> bits cleared. It sort of looks like the mapping is trying to lose those
>>>> bits, by tossing bit 1 & 2, but the X & M are in the wrong octet (AFAIK, I
>>>> just read it three times cause it didn't make any sense).
>>>
>>> I should also note that we have at least one supported PHY where one
>>> of the MMDs returns 0xfffe for even numbered registers and 0x0000 for
>>> odd numbered registers in one of the vendor MMDs for addresses 0
>>> through 0xefff - which has a bit set in the devices-in-package.
>>>
>>> It also returns 0x0082 for almost every register in MMD 2, but MMD 2's
>>> devices-in-package bit is clear in most of the valid MMDs, so we
>>> shouldn't touch it.
>>>
>>> These reveal the problem of randomly probing MMDs - they can return
>>> unexpected values and not be as well behaved as we would like them to
>>> be. Using register 8 to detect presence may be beneficial, but that
>>> may also introduce problems as we haven't used that before (and we
>>> don't know whether any PHY that wrong.) I know at least the 88x3310
>>> gets it right for all except the vendor MMDs, where the low addresses
>>> appear non-confromant to the 802.3 specs. Both vendor MMDs are
>>> definitely implemented, just not with anything conforming to 802.3.
>>
>> Yes, we know even for the NXP reference hardware, one of the phy's doesn't
>> probe out correctly because it doesn't respond to the ieee defined
>> registers. I think at this point, there really isn't anything we can do
>> about that unless we involve the (ACPI) firmware in currently nonstandard
>> behaviors.
>>
>> So, my goals here have been to first, not break anything, and then do a
>> slightly better job finding phy's that are (mostly?) responding correctly to
>> the 802.3 spec. So we can say "if you hardware is ACPI conformant, and you
>> have IEEE conformant phy's you should be ok". So, for your example phy, I
>> guess the immediate answer is "use DT" or "find a conformant phy", or even
>> "abstract it in the firmware and use a mailbox interface".
>
> You haven't understood. The PHY does conform for most of the MMDs,
> but there are a number that do not conform.
>

Maybe I should clarify. This set is still terminating the search for a
valid MMD device list on the first MMD that responds. It then probes the
same ID registers of the flagged MMDs as before. What has changed is
that we search higher into the MMD address space for a device list. So
previously if a device didn't respond to MMD0-8 it was ignored. Now it
needs to fail all of 0-31 to be ignored. Similarly for the ID's, if we
find what appears to be a valid MMD device list, then we will probe not
only the original 1-8 MMDs for IDs, but 1-31 MMDs for IDs.

So any device which presented a non zero, non "mostly ff's" devices list
in 0-8 will get the same device list as before. Similarly we probe for
ids at the same MMD addresses as before, with additional MMD detection
>8. This change means we pick up additional phys, and we detect the
correct MMD ids for more devices.

The possible negative differences are in 7/11 where we transition a
device which responded with an ID=0 to 0xFFFFFFFF which will cause the
code to not request a phy/mmd driver for 00000000 (or potentially some
bogus OUI's due to the 1fffffff difference). Assuming a valid phy id
gets applied somewhere, the same phy driver will load, and presumably it
will know what to do with MMD's in the devices list.

So, I don't see anything in your example above which changes the
detected MMD devices. Potentially though, not only will you get the
previous MMD Id's, but you might get a few new ones. Whether MMD2 is
probed is going to depend on whether MMD0/1's devices list has the MMD2
device bit set (you weren't clear in the description above).

2020-05-25 23:32:26

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 01/11] net: phy: Don't report success if devices weren't found

On Mon, May 25, 2020 at 04:02:13PM -0500, Jeremy Linton wrote:
> > So, I think you're going to have to add a work-around to ignore bit 0,
> > which brings up the question whether this is worth it or not.
>
> It does ignore bit 0, it gets turned into the C22 regs flag, and
> cleared/ignored in the remainder of the code (do to MMD loop indexes
> starting at 1).

However, I've already pointed out that that isn't the case in a
number of functions that I listed in another email, and I suspect
was glossed over.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-25 23:34:45

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

On 5/25/20 6:09 PM, Russell King - ARM Linux admin wrote:
> On Mon, May 25, 2020 at 05:22:07PM -0500, Jeremy Linton wrote:
>> On 5/25/20 5:01 PM, Russell King - ARM Linux admin wrote:
>>> On Mon, May 25, 2020 at 04:51:16PM -0500, Jeremy Linton wrote:
>>>> Hi,
>>>>
>>>> On 5/25/20 5:06 AM, Russell King - ARM Linux admin wrote:
>>>>> On Sun, May 24, 2020 at 10:34:13PM -0500, Jeremy Linton wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 5/23/20 1:37 PM, Russell King - ARM Linux admin wrote:
>>>>>>> On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
>>>>>>>> Until this point, we have been sanitizing the c22
>>>>>>>> regs presence bit out of all the MMD device lists.
>>>>>>>> This is incorrect as it causes the 0xFFFFFFFF checks
>>>>>>>> to incorrectly fail. Further, it turns out that we
>>>>>>>> want to utilize this flag to make a determination that
>>>>>>>> there is actually a phy at this location and we should
>>>>>>>> be accessing it using c22.
>>>>>>>>
>>>>>>>> Signed-off-by: Jeremy Linton <[email protected]>
>>>>>>>> ---
>>>>>>>> drivers/net/phy/phy_device.c | 16 +++++++++++++---
>>>>>>>> 1 file changed, 13 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>>>>>> index f0761fa5e40b..2d677490ecab 100644
>>>>>>>> --- a/drivers/net/phy/phy_device.c
>>>>>>>> +++ b/drivers/net/phy/phy_device.c
>>>>>>>> @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
>>>>>>>> return -EIO;
>>>>>>>> *devices_in_package |= phy_reg;
>>>>>>>> - /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
>>>>>>>> - *devices_in_package &= ~BIT(0);
>>>>>>>> -
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>> @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>>>> int i;
>>>>>>>> const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
>>>>>>>> u32 *devs = &c45_ids->devices_in_package;
>>>>>>>> + bool c22_present = false;
>>>>>>>> + bool valid_id = false;
>>>>>>>> /* Find first non-zero Devices In package. Device zero is reserved
>>>>>>>> * for 802.3 c45 complied PHYs, so don't probe it at first.
>>>>>>>> @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>> + /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
>>>>>>>> + c22_present = *devs & BIT(0);
>>>>>>>> + *devs &= ~BIT(0);
>>>>>>>> +
>>>>>>>> /* Now probe Device Identifiers for each device present. */
>>>>>>>> for (i = 1; i < num_ids; i++) {
>>>>>>>> if (!(c45_ids->devices_in_package & (1 << i)))
>>>>>>>> @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>>>> ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
>>>>>>>> if (ret < 0)
>>>>>>>> return ret;
>>>>>>>> + if (valid_phy_id(c45_ids->device_ids[i]))
>>>>>>>> + valid_id = true;
>>>>>>>
>>>>>>> Here you are using your "devices in package" validator to validate the
>>>>>>> PHY ID value. One of the things it does is mask this value with
>>>>>>> 0x1fffffff. That means you lose some of the vendor OUI. To me, this
>>>>>>> looks completely wrong.
>>>>>>
>>>>>> I think in this case I was just using it like the comment in
>>>>>> get_phy_device() "if the phy_id is mostly F's, there is no device here".
>>>>>>
>>>>>> My understanding is that the code is trying to avoid the 0xFFFFFFFF returns
>>>>>> that seem to indicate "bus ok, phy didn't respond".
>>>>>>
>>>>>> I just checked the OUI registration, and while there are a couple OUI's
>>>>>> registered that have a number of FFF's in them, none of those cases seems to
>>>>>> overlap sufficiently to cause this to throw them out. Plus a phy would also
>>>>>> have to have model+revision set to 'F's. So while might be possible, if
>>>>>> unlikely, at the moment I think the OUI registration keeps this from being a
>>>>>> problem. Particularly, if i'm reading the mapping correctly, the OUI mapping
>>>>>> guarantees that the field cannot be all '1's due to the OUI having X & M
>>>>>> bits cleared. It sort of looks like the mapping is trying to lose those
>>>>>> bits, by tossing bit 1 & 2, but the X & M are in the wrong octet (AFAIK, I
>>>>>> just read it three times cause it didn't make any sense).
>>>>>
>>>>> I should also note that we have at least one supported PHY where one
>>>>> of the MMDs returns 0xfffe for even numbered registers and 0x0000 for
>>>>> odd numbered registers in one of the vendor MMDs for addresses 0
>>>>> through 0xefff - which has a bit set in the devices-in-package.
>>>>>
>>>>> It also returns 0x0082 for almost every register in MMD 2, but MMD 2's
>>>>> devices-in-package bit is clear in most of the valid MMDs, so we
>>>>> shouldn't touch it.
>>>>>
>>>>> These reveal the problem of randomly probing MMDs - they can return
>>>>> unexpected values and not be as well behaved as we would like them to
>>>>> be. Using register 8 to detect presence may be beneficial, but that
>>>>> may also introduce problems as we haven't used that before (and we
>>>>> don't know whether any PHY that wrong.) I know at least the 88x3310
>>>>> gets it right for all except the vendor MMDs, where the low addresses
>>>>> appear non-confromant to the 802.3 specs. Both vendor MMDs are
>>>>> definitely implemented, just not with anything conforming to 802.3.
>>>>
>>>> Yes, we know even for the NXP reference hardware, one of the phy's doesn't
>>>> probe out correctly because it doesn't respond to the ieee defined
>>>> registers. I think at this point, there really isn't anything we can do
>>>> about that unless we involve the (ACPI) firmware in currently nonstandard
>>>> behaviors.
>>>>
>>>> So, my goals here have been to first, not break anything, and then do a
>>>> slightly better job finding phy's that are (mostly?) responding correctly to
>>>> the 802.3 spec. So we can say "if you hardware is ACPI conformant, and you
>>>> have IEEE conformant phy's you should be ok". So, for your example phy, I
>>>> guess the immediate answer is "use DT" or "find a conformant phy", or even
>>>> "abstract it in the firmware and use a mailbox interface".
>>>
>>> You haven't understood. The PHY does conform for most of the MMDs,
>>> but there are a number that do not conform.
>>
>> Probably...
>>
>> Except that i'm not sure how that is a problem at the moment, its still
>> going to trigger as a found phy, and walk the same mmd list as before
>> requesting drivers. Those drivers haven't changed their behavior so where is
>> the problem? If there is a problem its in 7/11 where things are getting
>> kicked due to seemingly invalid Ids.
>>
>> The 1/11 devices=0 case actually appears to be a bug i'm fixing because you
>> won't get an ID or a MMD list from that (before or after).
>
> I think I've just flattened that argument in my immediately preceding
> reply on the Cortina situation; I think you've grossly misread that
> through not fully researching the history and then finding the
> existing users.
>
> There is no bug that you are fixing from what I can see.

One of us is missing something,

The "cortina" solution is broken in the current kernel. That is because
lines 726-742 are dead code due to line 693.

I believe I've understood the problem there, and corrected it in this
set along with a few others, but its distinctly possible that isn't true.


2020-05-25 23:36:01

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

On Mon, May 25, 2020 at 06:22:19PM -0500, Jeremy Linton wrote:
> On 5/25/20 6:09 PM, Russell King - ARM Linux admin wrote:
> > On Mon, May 25, 2020 at 05:22:07PM -0500, Jeremy Linton wrote:
> > > On 5/25/20 5:01 PM, Russell King - ARM Linux admin wrote:
> > > > On Mon, May 25, 2020 at 04:51:16PM -0500, Jeremy Linton wrote:
> > > > > Hi,
> > > > >
> > > > > On 5/25/20 5:06 AM, Russell King - ARM Linux admin wrote:
> > > > > > On Sun, May 24, 2020 at 10:34:13PM -0500, Jeremy Linton wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On 5/23/20 1:37 PM, Russell King - ARM Linux admin wrote:
> > > > > > > > On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
> > > > > > > > > Until this point, we have been sanitizing the c22
> > > > > > > > > regs presence bit out of all the MMD device lists.
> > > > > > > > > This is incorrect as it causes the 0xFFFFFFFF checks
> > > > > > > > > to incorrectly fail. Further, it turns out that we
> > > > > > > > > want to utilize this flag to make a determination that
> > > > > > > > > there is actually a phy at this location and we should
> > > > > > > > > be accessing it using c22.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Jeremy Linton <[email protected]>
> > > > > > > > > ---
> > > > > > > > > drivers/net/phy/phy_device.c | 16 +++++++++++++---
> > > > > > > > > 1 file changed, 13 insertions(+), 3 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > > > > > > > > index f0761fa5e40b..2d677490ecab 100644
> > > > > > > > > --- a/drivers/net/phy/phy_device.c
> > > > > > > > > +++ b/drivers/net/phy/phy_device.c
> > > > > > > > > @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
> > > > > > > > > return -EIO;
> > > > > > > > > *devices_in_package |= phy_reg;
> > > > > > > > > - /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> > > > > > > > > - *devices_in_package &= ~BIT(0);
> > > > > > > > > -
> > > > > > > > > return 0;
> > > > > > > > > }
> > > > > > > > > @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > > > > > int i;
> > > > > > > > > const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
> > > > > > > > > u32 *devs = &c45_ids->devices_in_package;
> > > > > > > > > + bool c22_present = false;
> > > > > > > > > + bool valid_id = false;
> > > > > > > > > /* Find first non-zero Devices In package. Device zero is reserved
> > > > > > > > > * for 802.3 c45 complied PHYs, so don't probe it at first.
> > > > > > > > > @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > > > > > return 0;
> > > > > > > > > }
> > > > > > > > > + /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> > > > > > > > > + c22_present = *devs & BIT(0);
> > > > > > > > > + *devs &= ~BIT(0);
> > > > > > > > > +
> > > > > > > > > /* Now probe Device Identifiers for each device present. */
> > > > > > > > > for (i = 1; i < num_ids; i++) {
> > > > > > > > > if (!(c45_ids->devices_in_package & (1 << i)))
> > > > > > > > > @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > > > > > ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
> > > > > > > > > if (ret < 0)
> > > > > > > > > return ret;
> > > > > > > > > + if (valid_phy_id(c45_ids->device_ids[i]))
> > > > > > > > > + valid_id = true;
> > > > > > > >
> > > > > > > > Here you are using your "devices in package" validator to validate the
> > > > > > > > PHY ID value. One of the things it does is mask this value with
> > > > > > > > 0x1fffffff. That means you lose some of the vendor OUI. To me, this
> > > > > > > > looks completely wrong.
> > > > > > >
> > > > > > > I think in this case I was just using it like the comment in
> > > > > > > get_phy_device() "if the phy_id is mostly F's, there is no device here".
> > > > > > >
> > > > > > > My understanding is that the code is trying to avoid the 0xFFFFFFFF returns
> > > > > > > that seem to indicate "bus ok, phy didn't respond".
> > > > > > >
> > > > > > > I just checked the OUI registration, and while there are a couple OUI's
> > > > > > > registered that have a number of FFF's in them, none of those cases seems to
> > > > > > > overlap sufficiently to cause this to throw them out. Plus a phy would also
> > > > > > > have to have model+revision set to 'F's. So while might be possible, if
> > > > > > > unlikely, at the moment I think the OUI registration keeps this from being a
> > > > > > > problem. Particularly, if i'm reading the mapping correctly, the OUI mapping
> > > > > > > guarantees that the field cannot be all '1's due to the OUI having X & M
> > > > > > > bits cleared. It sort of looks like the mapping is trying to lose those
> > > > > > > bits, by tossing bit 1 & 2, but the X & M are in the wrong octet (AFAIK, I
> > > > > > > just read it three times cause it didn't make any sense).
> > > > > >
> > > > > > I should also note that we have at least one supported PHY where one
> > > > > > of the MMDs returns 0xfffe for even numbered registers and 0x0000 for
> > > > > > odd numbered registers in one of the vendor MMDs for addresses 0
> > > > > > through 0xefff - which has a bit set in the devices-in-package.
> > > > > >
> > > > > > It also returns 0x0082 for almost every register in MMD 2, but MMD 2's
> > > > > > devices-in-package bit is clear in most of the valid MMDs, so we
> > > > > > shouldn't touch it.
> > > > > >
> > > > > > These reveal the problem of randomly probing MMDs - they can return
> > > > > > unexpected values and not be as well behaved as we would like them to
> > > > > > be. Using register 8 to detect presence may be beneficial, but that
> > > > > > may also introduce problems as we haven't used that before (and we
> > > > > > don't know whether any PHY that wrong.) I know at least the 88x3310
> > > > > > gets it right for all except the vendor MMDs, where the low addresses
> > > > > > appear non-confromant to the 802.3 specs. Both vendor MMDs are
> > > > > > definitely implemented, just not with anything conforming to 802.3.
> > > > >
> > > > > Yes, we know even for the NXP reference hardware, one of the phy's doesn't
> > > > > probe out correctly because it doesn't respond to the ieee defined
> > > > > registers. I think at this point, there really isn't anything we can do
> > > > > about that unless we involve the (ACPI) firmware in currently nonstandard
> > > > > behaviors.
> > > > >
> > > > > So, my goals here have been to first, not break anything, and then do a
> > > > > slightly better job finding phy's that are (mostly?) responding correctly to
> > > > > the 802.3 spec. So we can say "if you hardware is ACPI conformant, and you
> > > > > have IEEE conformant phy's you should be ok". So, for your example phy, I
> > > > > guess the immediate answer is "use DT" or "find a conformant phy", or even
> > > > > "abstract it in the firmware and use a mailbox interface".
> > > >
> > > > You haven't understood. The PHY does conform for most of the MMDs,
> > > > but there are a number that do not conform.
> > >
> > > Probably...
> > >
> > > Except that i'm not sure how that is a problem at the moment, its still
> > > going to trigger as a found phy, and walk the same mmd list as before
> > > requesting drivers. Those drivers haven't changed their behavior so where is
> > > the problem? If there is a problem its in 7/11 where things are getting
> > > kicked due to seemingly invalid Ids.
> > >
> > > The 1/11 devices=0 case actually appears to be a bug i'm fixing because you
> > > won't get an ID or a MMD list from that (before or after).
> >
> > I think I've just flattened that argument in my immediately preceding
> > reply on the Cortina situation; I think you've grossly misread that
> > through not fully researching the history and then finding the
> > existing users.
> >
> > There is no bug that you are fixing from what I can see.
>
> One of us is missing something,
>
> The "cortina" solution is broken in the current kernel. That is because
> lines 726-742 are dead code due to line 693.
>
> I believe I've understood the problem there, and corrected it in this set
> along with a few others, but its distinctly possible that isn't true.

The code you refer to above is NOT used on the platforms that I have
identified use the Cortina PHY. If this code is not used, it has not
caused any issue, and there is no breakage due to the change you are
referring to.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-25 23:37:19

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

On Mon, May 25, 2020 at 06:16:18PM -0500, Jeremy Linton wrote:
> Hi,
>
> On 5/25/20 5:01 PM, Russell King - ARM Linux admin wrote:
> > On Mon, May 25, 2020 at 04:51:16PM -0500, Jeremy Linton wrote:
> > > So, my goals here have been to first, not break anything, and then do a
> > > slightly better job finding phy's that are (mostly?) responding correctly to
> > > the 802.3 spec. So we can say "if you hardware is ACPI conformant, and you
> > > have IEEE conformant phy's you should be ok". So, for your example phy, I
> > > guess the immediate answer is "use DT" or "find a conformant phy", or even
> > > "abstract it in the firmware and use a mailbox interface".
> >
> > You haven't understood. The PHY does conform for most of the MMDs,
> > but there are a number that do not conform.
> >
>
> Maybe I should clarify. This set is still terminating the search for a valid
> MMD device list on the first MMD that responds. It then probes the same ID
> registers of the flagged MMDs as before. What has changed is that we search
> higher into the MMD address space for a device list. So previously if a
> device didn't respond to MMD0-8 it was ignored. Now it needs to fail all of
> 0-31 to be ignored. Similarly for the ID's, if we find what appears to be a
> valid MMD device list, then we will probe not only the original 1-8 MMDs for
> IDs, but 1-31 MMDs for IDs.

Clarification is not required; I understand what you're doing, but you
are not understanding my points.

For the 88x3310, your change means that the list of IDs for this PHY
will not only 0x002b09aX, 0x01410daX (the official IDs), but also
0x00000000 and 0xfffe0000 from MMD 30 and 31 respectively, which are
not real IDs. That's two incorrect IDs that should actually not be
there.

Here's what the first few registers from MMD 30 and 31 look like on
this PHY:

MMD30:
Addr Data
0000 0000 0000 0000 0000 0000 0000 0000 0000
0008 0000 0000 0000 0000 0000 0000 0000 0000
0010 0000 0000 0000 0000 0000 0000 0000 0000

MMD31:
0000 fffe 0000 fffe 0000 fffe 0000 fffe 0000
0008 fffe 0000 fffe 0000 fffe 0000 fffe 0000
0010 fffe 0000 fffe 0000 fffe 0000 fffe 0000

We've got away with it so far on several counts:
1. The code doesn't probe that high for IDs.
2. We have no driver that may match those IDs.

You're taking away (1), so now all it takes is for condition (2) to
be broken, and we can end up with a regression if another driver
appears that may match using those.

So, I would suggest that you avoid reading IDs from MMD 30/31, or
maybe only read the ID from MMDs > 8 if register 8 indicates that
there is a device present at that MMD. That would be compliant
with IEEE 802.3, preserve our existing behaviour, while allowing
us to expand the IDs that we probe for to have a better chance of
only detecting truely valid IDs.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-25 23:45:01

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

Hi,

On 5/25/20 6:33 PM, Russell King - ARM Linux admin wrote:
> On Mon, May 25, 2020 at 06:22:19PM -0500, Jeremy Linton wrote:
>> On 5/25/20 6:09 PM, Russell King - ARM Linux admin wrote:
>>> On Mon, May 25, 2020 at 05:22:07PM -0500, Jeremy Linton wrote:
>>>> On 5/25/20 5:01 PM, Russell King - ARM Linux admin wrote:
>>>>> On Mon, May 25, 2020 at 04:51:16PM -0500, Jeremy Linton wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 5/25/20 5:06 AM, Russell King - ARM Linux admin wrote:
>>>>>>> On Sun, May 24, 2020 at 10:34:13PM -0500, Jeremy Linton wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 5/23/20 1:37 PM, Russell King - ARM Linux admin wrote:
>>>>>>>>> On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
>>>>>>>>>> Until this point, we have been sanitizing the c22
>>>>>>>>>> regs presence bit out of all the MMD device lists.
>>>>>>>>>> This is incorrect as it causes the 0xFFFFFFFF checks
>>>>>>>>>> to incorrectly fail. Further, it turns out that we
>>>>>>>>>> want to utilize this flag to make a determination that
>>>>>>>>>> there is actually a phy at this location and we should
>>>>>>>>>> be accessing it using c22.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Jeremy Linton <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>> drivers/net/phy/phy_device.c | 16 +++++++++++++---
>>>>>>>>>> 1 file changed, 13 insertions(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>>>>>>>> index f0761fa5e40b..2d677490ecab 100644
>>>>>>>>>> --- a/drivers/net/phy/phy_device.c
>>>>>>>>>> +++ b/drivers/net/phy/phy_device.c
>>>>>>>>>> @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
>>>>>>>>>> return -EIO;
>>>>>>>>>> *devices_in_package |= phy_reg;
>>>>>>>>>> - /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
>>>>>>>>>> - *devices_in_package &= ~BIT(0);
>>>>>>>>>> -
>>>>>>>>>> return 0;
>>>>>>>>>> }
>>>>>>>>>> @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>>>>>> int i;
>>>>>>>>>> const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
>>>>>>>>>> u32 *devs = &c45_ids->devices_in_package;
>>>>>>>>>> + bool c22_present = false;
>>>>>>>>>> + bool valid_id = false;
>>>>>>>>>> /* Find first non-zero Devices In package. Device zero is reserved
>>>>>>>>>> * for 802.3 c45 complied PHYs, so don't probe it at first.
>>>>>>>>>> @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>>>>>> return 0;
>>>>>>>>>> }
>>>>>>>>>> + /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
>>>>>>>>>> + c22_present = *devs & BIT(0);
>>>>>>>>>> + *devs &= ~BIT(0);
>>>>>>>>>> +
>>>>>>>>>> /* Now probe Device Identifiers for each device present. */
>>>>>>>>>> for (i = 1; i < num_ids; i++) {
>>>>>>>>>> if (!(c45_ids->devices_in_package & (1 << i)))
>>>>>>>>>> @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>>>>>> ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
>>>>>>>>>> if (ret < 0)
>>>>>>>>>> return ret;
>>>>>>>>>> + if (valid_phy_id(c45_ids->device_ids[i]))
>>>>>>>>>> + valid_id = true;
>>>>>>>>>
>>>>>>>>> Here you are using your "devices in package" validator to validate the
>>>>>>>>> PHY ID value. One of the things it does is mask this value with
>>>>>>>>> 0x1fffffff. That means you lose some of the vendor OUI. To me, this
>>>>>>>>> looks completely wrong.
>>>>>>>>
>>>>>>>> I think in this case I was just using it like the comment in
>>>>>>>> get_phy_device() "if the phy_id is mostly F's, there is no device here".
>>>>>>>>
>>>>>>>> My understanding is that the code is trying to avoid the 0xFFFFFFFF returns
>>>>>>>> that seem to indicate "bus ok, phy didn't respond".
>>>>>>>>
>>>>>>>> I just checked the OUI registration, and while there are a couple OUI's
>>>>>>>> registered that have a number of FFF's in them, none of those cases seems to
>>>>>>>> overlap sufficiently to cause this to throw them out. Plus a phy would also
>>>>>>>> have to have model+revision set to 'F's. So while might be possible, if
>>>>>>>> unlikely, at the moment I think the OUI registration keeps this from being a
>>>>>>>> problem. Particularly, if i'm reading the mapping correctly, the OUI mapping
>>>>>>>> guarantees that the field cannot be all '1's due to the OUI having X & M
>>>>>>>> bits cleared. It sort of looks like the mapping is trying to lose those
>>>>>>>> bits, by tossing bit 1 & 2, but the X & M are in the wrong octet (AFAIK, I
>>>>>>>> just read it three times cause it didn't make any sense).
>>>>>>>
>>>>>>> I should also note that we have at least one supported PHY where one
>>>>>>> of the MMDs returns 0xfffe for even numbered registers and 0x0000 for
>>>>>>> odd numbered registers in one of the vendor MMDs for addresses 0
>>>>>>> through 0xefff - which has a bit set in the devices-in-package.
>>>>>>>
>>>>>>> It also returns 0x0082 for almost every register in MMD 2, but MMD 2's
>>>>>>> devices-in-package bit is clear in most of the valid MMDs, so we
>>>>>>> shouldn't touch it.
>>>>>>>
>>>>>>> These reveal the problem of randomly probing MMDs - they can return
>>>>>>> unexpected values and not be as well behaved as we would like them to
>>>>>>> be. Using register 8 to detect presence may be beneficial, but that
>>>>>>> may also introduce problems as we haven't used that before (and we
>>>>>>> don't know whether any PHY that wrong.) I know at least the 88x3310
>>>>>>> gets it right for all except the vendor MMDs, where the low addresses
>>>>>>> appear non-confromant to the 802.3 specs. Both vendor MMDs are
>>>>>>> definitely implemented, just not with anything conforming to 802.3.
>>>>>>
>>>>>> Yes, we know even for the NXP reference hardware, one of the phy's doesn't
>>>>>> probe out correctly because it doesn't respond to the ieee defined
>>>>>> registers. I think at this point, there really isn't anything we can do
>>>>>> about that unless we involve the (ACPI) firmware in currently nonstandard
>>>>>> behaviors.
>>>>>>
>>>>>> So, my goals here have been to first, not break anything, and then do a
>>>>>> slightly better job finding phy's that are (mostly?) responding correctly to
>>>>>> the 802.3 spec. So we can say "if you hardware is ACPI conformant, and you
>>>>>> have IEEE conformant phy's you should be ok". So, for your example phy, I
>>>>>> guess the immediate answer is "use DT" or "find a conformant phy", or even
>>>>>> "abstract it in the firmware and use a mailbox interface".
>>>>>
>>>>> You haven't understood. The PHY does conform for most of the MMDs,
>>>>> but there are a number that do not conform.
>>>>
>>>> Probably...
>>>>
>>>> Except that i'm not sure how that is a problem at the moment, its still
>>>> going to trigger as a found phy, and walk the same mmd list as before
>>>> requesting drivers. Those drivers haven't changed their behavior so where is
>>>> the problem? If there is a problem its in 7/11 where things are getting
>>>> kicked due to seemingly invalid Ids.
>>>>
>>>> The 1/11 devices=0 case actually appears to be a bug i'm fixing because you
>>>> won't get an ID or a MMD list from that (before or after).
>>>
>>> I think I've just flattened that argument in my immediately preceding
>>> reply on the Cortina situation; I think you've grossly misread that
>>> through not fully researching the history and then finding the
>>> existing users.
>>>
>>> There is no bug that you are fixing from what I can see.
>>
>> One of us is missing something,
>>
>> The "cortina" solution is broken in the current kernel. That is because
>> lines 726-742 are dead code due to line 693.
>>
>> I believe I've understood the problem there, and corrected it in this set
>> along with a few others, but its distinctly possible that isn't true.
>
> The code you refer to above is NOT used on the platforms that I have
> identified use the Cortina PHY. If this code is not used, it has not
> caused any issue, and there is no breakage due to the change you are
> referring to.
>
Right, which is what I sort of expected. Because its falling back to a
device list of 0xFFFFFFFF, which means probe every single MMD.

Combined with the lack of filtering means that your getting a bunch of
MMD IDs that potentially are invalid, along with any that happen to be
valid. Its that behavior (and some others) which were what blew this set
up from a couple lines of tweaks into this mess.

I don't really see a way to guess at all the "wrong" ids that are being
pushed into the system. Which is why I started to think about a "strict"
mode later in the set. Maybe at this point the only way around some of
these bugs/side effects/etc is just a second c45 probe routine if we
don't think its possible to implement a variable strictness scanner in
this code path without losing phys that previously were detected.


2020-05-25 23:48:23

by Jeremy Linton

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

Hi,


On 5/25/20 6:12 PM, Andrew Lunn wrote:
>> arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
>> arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
>> arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
>> arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
>> arch/powerpc/boot/dts/fsl/t2080rdb.dts: compatible = "ethernet-phy-id13e5.1002";
>> arch/powerpc/boot/dts/fsl/t2080rdb.dts: compatible = "ethernet-phy-id13e5.1002";
>
> Hi Jeremy
>
> You are doing this work for NXP right? Maybe you can ask them to go
> searching in the cellar and find you one of these boards?

Yes, thats a good idea. I've been talking to various parties to let me
run this code on some of their "odd" devices.


2020-05-25 23:49:16

by Andrew Lunn

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

> Right, which is what I sort of expected. Because its falling back to a
> device list of 0xFFFFFFFF, which means probe every single MMD.
>
> Combined with the lack of filtering means that your getting a bunch of MMD
> IDs that potentially are invalid, along with any that happen to be valid.
> Its that behavior (and some others) which were what blew this set up from a
> couple lines of tweaks into this mess.
>
> I don't really see a way to guess at all the "wrong" ids that are being
> pushed into the system. Which is why I started to think about a "strict"
> mode later in the set. Maybe at this point the only way around some of these
> bugs/side effects/etc is just a second c45 probe routine if we don't think
> its possible to implement a variable strictness scanner in this code path
> without losing phys that previously were detected.

Hi Jeremy

I really think we need to find one of those boards which have a
cortina and see what it actually does. Can you get in contact with NXP
and see if they can find one?

Thanks
Andrew

2020-05-25 23:50:12

by Andrew Lunn

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

On Mon, May 25, 2020 at 06:46:10PM -0500, Jeremy Linton wrote:
> Hi,
>
>
> On 5/25/20 6:12 PM, Andrew Lunn wrote:
> > > arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
> > > arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
> > > arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
> > > arch/powerpc/boot/dts/fsl/t4240rdb.dts: compatible = "ethernet-phy-id13e5.1002";
> > > arch/powerpc/boot/dts/fsl/t2080rdb.dts: compatible = "ethernet-phy-id13e5.1002";
> > > arch/powerpc/boot/dts/fsl/t2080rdb.dts: compatible = "ethernet-phy-id13e5.1002";
> >
> > Hi Jeremy
> >
> > You are doing this work for NXP right? Maybe you can ask them to go
> > searching in the cellar and find you one of these boards?
>
> Yes, thats a good idea. I've been talking to various parties to let me run
> this code on some of their "odd" devices.

O.K. great.

Then i think we should all stop emailing for a while until we know
more.

Andrew

2020-05-26 00:03:17

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

On Mon, May 25, 2020 at 06:42:50PM -0500, Jeremy Linton wrote:
> Hi,
>
> On 5/25/20 6:33 PM, Russell King - ARM Linux admin wrote:
> > On Mon, May 25, 2020 at 06:22:19PM -0500, Jeremy Linton wrote:
> > > On 5/25/20 6:09 PM, Russell King - ARM Linux admin wrote:
> > > > On Mon, May 25, 2020 at 05:22:07PM -0500, Jeremy Linton wrote:
> > > > > On 5/25/20 5:01 PM, Russell King - ARM Linux admin wrote:
> > > > > > On Mon, May 25, 2020 at 04:51:16PM -0500, Jeremy Linton wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On 5/25/20 5:06 AM, Russell King - ARM Linux admin wrote:
> > > > > > > > On Sun, May 24, 2020 at 10:34:13PM -0500, Jeremy Linton wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On 5/23/20 1:37 PM, Russell King - ARM Linux admin wrote:
> > > > > > > > > > On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
> > > > > > > > > > > Until this point, we have been sanitizing the c22
> > > > > > > > > > > regs presence bit out of all the MMD device lists.
> > > > > > > > > > > This is incorrect as it causes the 0xFFFFFFFF checks
> > > > > > > > > > > to incorrectly fail. Further, it turns out that we
> > > > > > > > > > > want to utilize this flag to make a determination that
> > > > > > > > > > > there is actually a phy at this location and we should
> > > > > > > > > > > be accessing it using c22.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Jeremy Linton <[email protected]>
> > > > > > > > > > > ---
> > > > > > > > > > > drivers/net/phy/phy_device.c | 16 +++++++++++++---
> > > > > > > > > > > 1 file changed, 13 insertions(+), 3 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > > > > > > > > > > index f0761fa5e40b..2d677490ecab 100644
> > > > > > > > > > > --- a/drivers/net/phy/phy_device.c
> > > > > > > > > > > +++ b/drivers/net/phy/phy_device.c
> > > > > > > > > > > @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
> > > > > > > > > > > return -EIO;
> > > > > > > > > > > *devices_in_package |= phy_reg;
> > > > > > > > > > > - /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> > > > > > > > > > > - *devices_in_package &= ~BIT(0);
> > > > > > > > > > > -
> > > > > > > > > > > return 0;
> > > > > > > > > > > }
> > > > > > > > > > > @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > > > > > > > int i;
> > > > > > > > > > > const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
> > > > > > > > > > > u32 *devs = &c45_ids->devices_in_package;
> > > > > > > > > > > + bool c22_present = false;
> > > > > > > > > > > + bool valid_id = false;
> > > > > > > > > > > /* Find first non-zero Devices In package. Device zero is reserved
> > > > > > > > > > > * for 802.3 c45 complied PHYs, so don't probe it at first.
> > > > > > > > > > > @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > > > > > > > return 0;
> > > > > > > > > > > }
> > > > > > > > > > > + /* Bit 0 doesn't represent a device, it indicates c22 regs presence */
> > > > > > > > > > > + c22_present = *devs & BIT(0);
> > > > > > > > > > > + *devs &= ~BIT(0);
> > > > > > > > > > > +
> > > > > > > > > > > /* Now probe Device Identifiers for each device present. */
> > > > > > > > > > > for (i = 1; i < num_ids; i++) {
> > > > > > > > > > > if (!(c45_ids->devices_in_package & (1 << i)))
> > > > > > > > > > > @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
> > > > > > > > > > > ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
> > > > > > > > > > > if (ret < 0)
> > > > > > > > > > > return ret;
> > > > > > > > > > > + if (valid_phy_id(c45_ids->device_ids[i]))
> > > > > > > > > > > + valid_id = true;
> > > > > > > > > >
> > > > > > > > > > Here you are using your "devices in package" validator to validate the
> > > > > > > > > > PHY ID value. One of the things it does is mask this value with
> > > > > > > > > > 0x1fffffff. That means you lose some of the vendor OUI. To me, this
> > > > > > > > > > looks completely wrong.
> > > > > > > > >
> > > > > > > > > I think in this case I was just using it like the comment in
> > > > > > > > > get_phy_device() "if the phy_id is mostly F's, there is no device here".
> > > > > > > > >
> > > > > > > > > My understanding is that the code is trying to avoid the 0xFFFFFFFF returns
> > > > > > > > > that seem to indicate "bus ok, phy didn't respond".
> > > > > > > > >
> > > > > > > > > I just checked the OUI registration, and while there are a couple OUI's
> > > > > > > > > registered that have a number of FFF's in them, none of those cases seems to
> > > > > > > > > overlap sufficiently to cause this to throw them out. Plus a phy would also
> > > > > > > > > have to have model+revision set to 'F's. So while might be possible, if
> > > > > > > > > unlikely, at the moment I think the OUI registration keeps this from being a
> > > > > > > > > problem. Particularly, if i'm reading the mapping correctly, the OUI mapping
> > > > > > > > > guarantees that the field cannot be all '1's due to the OUI having X & M
> > > > > > > > > bits cleared. It sort of looks like the mapping is trying to lose those
> > > > > > > > > bits, by tossing bit 1 & 2, but the X & M are in the wrong octet (AFAIK, I
> > > > > > > > > just read it three times cause it didn't make any sense).
> > > > > > > >
> > > > > > > > I should also note that we have at least one supported PHY where one
> > > > > > > > of the MMDs returns 0xfffe for even numbered registers and 0x0000 for
> > > > > > > > odd numbered registers in one of the vendor MMDs for addresses 0
> > > > > > > > through 0xefff - which has a bit set in the devices-in-package.
> > > > > > > >
> > > > > > > > It also returns 0x0082 for almost every register in MMD 2, but MMD 2's
> > > > > > > > devices-in-package bit is clear in most of the valid MMDs, so we
> > > > > > > > shouldn't touch it.
> > > > > > > >
> > > > > > > > These reveal the problem of randomly probing MMDs - they can return
> > > > > > > > unexpected values and not be as well behaved as we would like them to
> > > > > > > > be. Using register 8 to detect presence may be beneficial, but that
> > > > > > > > may also introduce problems as we haven't used that before (and we
> > > > > > > > don't know whether any PHY that wrong.) I know at least the 88x3310
> > > > > > > > gets it right for all except the vendor MMDs, where the low addresses
> > > > > > > > appear non-confromant to the 802.3 specs. Both vendor MMDs are
> > > > > > > > definitely implemented, just not with anything conforming to 802.3.
> > > > > > >
> > > > > > > Yes, we know even for the NXP reference hardware, one of the phy's doesn't
> > > > > > > probe out correctly because it doesn't respond to the ieee defined
> > > > > > > registers. I think at this point, there really isn't anything we can do
> > > > > > > about that unless we involve the (ACPI) firmware in currently nonstandard
> > > > > > > behaviors.
> > > > > > >
> > > > > > > So, my goals here have been to first, not break anything, and then do a
> > > > > > > slightly better job finding phy's that are (mostly?) responding correctly to
> > > > > > > the 802.3 spec. So we can say "if you hardware is ACPI conformant, and you
> > > > > > > have IEEE conformant phy's you should be ok". So, for your example phy, I
> > > > > > > guess the immediate answer is "use DT" or "find a conformant phy", or even
> > > > > > > "abstract it in the firmware and use a mailbox interface".
> > > > > >
> > > > > > You haven't understood. The PHY does conform for most of the MMDs,
> > > > > > but there are a number that do not conform.
> > > > >
> > > > > Probably...
> > > > >
> > > > > Except that i'm not sure how that is a problem at the moment, its still
> > > > > going to trigger as a found phy, and walk the same mmd list as before
> > > > > requesting drivers. Those drivers haven't changed their behavior so where is
> > > > > the problem? If there is a problem its in 7/11 where things are getting
> > > > > kicked due to seemingly invalid Ids.
> > > > >
> > > > > The 1/11 devices=0 case actually appears to be a bug i'm fixing because you
> > > > > won't get an ID or a MMD list from that (before or after).
> > > >
> > > > I think I've just flattened that argument in my immediately preceding
> > > > reply on the Cortina situation; I think you've grossly misread that
> > > > through not fully researching the history and then finding the
> > > > existing users.
> > > >
> > > > There is no bug that you are fixing from what I can see.
> > >
> > > One of us is missing something,
> > >
> > > The "cortina" solution is broken in the current kernel. That is because
> > > lines 726-742 are dead code due to line 693.
> > >
> > > I believe I've understood the problem there, and corrected it in this set
> > > along with a few others, but its distinctly possible that isn't true.
> >
> > The code you refer to above is NOT used on the platforms that I have
> > identified use the Cortina PHY. If this code is not used, it has not
> > caused any issue, and there is no breakage due to the change you are
> > referring to.
> >
> Right, which is what I sort of expected. Because its falling back to a
> device list of 0xFFFFFFFF, which means probe every single MMD.

No. no no no no no.

In the platforms that I have identified, the Cortina PHY will be
created by the DT code (drivers/of/of_mdio.c). The PHY device
will be created by phy_device_create() with is_c45 *false*.
phydev->c45_ids will actually end up containing all-zeros.
So, there is no list of MMDs in this case.

The phy_device_create() path does not call get_phy_c45_ids().
This code is not run for any of the platforms I've identified
for a Cortina PHY.

The workaround for the devices-in-package was added back in
2015, two years _before_ there was a Cortina PHY driver, when
the phylib support for Clause 45 PHYs was in its infancy - there
was something really basic that didn't care what ID the PHY was,
just assumed it was a 10G PHY, no configuration of it, and just
reported link up/down. So, back then IDs were mostly meaningless
for Clause 45 PHYs.

In 2017, a Cortina PHY driver was added, and we now have some
platforms that use this, and they _totally_ avoid this code path.

The workaround is likely obsolete and redundant, but we've no way
to know if removing it will create a regression.

In any case, for the reasons I've already clearly set out in one of
my previous emails analysing the Cortina situation (which you seem
to have ignored), pointing out that even the MMD 0 register set is
not compatible and would likely not reveal a valid ID, I think it's
highly likely that Cortina PHYs did not work for very long between
2015 and 2017, but continue to work fine today.

Please, rather than immediately writing yet another email trying to
"clarify" something in reply to this email, please go away and think
about some of the points I've raised, read the code as it stands,
not only in phylib but also in drivers/of. Look at the device tree
descriptions for the boards I've pointed out. Analyse what the code
would do. It will help you immensely to have that understanding,
and I'm sure you will come to the same conclusion I have that the
workaround we see in get_phy_c45_ids() is likely obsolete.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up