2020-12-18 03:25:50

by Saravana Kannan

Subject: [PATCH v1 0/5] Enable fw_devlink=on by default

As discussed at LPC 2020, cyclic dependencies in firmware that couldn't
be broken using logic were one of the last remaining reasons
fw_devlink=on couldn't be set by default.

This series changes fw_devlink so that when a cyclic dependency is found
in firmware, the links between those devices fall back to permissive mode
behavior. This way, the rest of the system still benefits from
fw_devlink, but the ambiguous cases fall back to permissive mode.

Setting fw_devlink=on by default brings a bunch of benefits (currently,
only for systems with device tree firmware):
* Significantly cuts down deferred probes.
* Device probe is effectively attempted in graph order.
* Makes it much easier to load drivers as modules without having to
worry about functional dependencies between modules (depmod is still
needed for symbol dependencies).

Greg/Rafael,

Can we get this pulled into 5.11-rc1 or -rc2 soon please? I expect to
see some issues due to device drivers that aren't following best
practices (they don't expose the device to driver core). Want to
identify those early on and try to have them fixed before 5.11 release.
See [1] for an example of such a case.

If we do end up having to revert anything, it'll just be Patch 5/5 (a
one-liner).

Marc,

You had hit issues with fw_devlink=on before on some of your systems.
Want to give this a shot?

Jisheng,

Want to fix up one of those gpio drivers you were having problems with?

Thanks,
Saravana

[1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/

Cc: Jisheng Zhang <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Nicolas Saenz Julienne <[email protected]>
Cc: Marc Zyngier <[email protected]>

Saravana Kannan (5):
driver core: Add debug logs for device link related probe deferrals
driver core: Add device link support for INFERRED flag
driver core: Have fw_devlink use DL_FLAG_INFERRED
driver core: Handle cycles in device links created by fw_devlink
driver core: Set fw_devlink=on by default

drivers/base/core.c | 101 +++++++++++++++++++++++++++++++++++------
include/linux/device.h | 2 +
2 files changed, 90 insertions(+), 13 deletions(-)

--
2.29.2.684.gfbc64c5ab5-goog


2020-12-18 03:25:56

by Saravana Kannan

Subject: [PATCH v1 2/5] driver core: Add device link support for INFERRED flag

This flag can never be added to a device link that already exists and
doesn't have the flag set. It can only be added when a device link is
created for the first time, or it can be maintained if the device link
already has it set.

This flag will be used for marking device links created ONLY by
inferring dependencies from data and NOT from explicit action by device
drivers/frameworks. This will be useful in the future when we need to
deal with cycles in dependencies inferred from firmware.
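
To illustrate the rule above, here is a minimal userspace sketch, not the
kernel code itself. The bit positions mirror the defines in this patch's
include/linux/device.h hunk; merge_inferred() and is_sync_state_only() are
hypothetical helper names made up for this example, and only the INFERRED
handling of device_link_add() is modeled.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Bit positions as shown in the include/linux/device.h hunk of this patch. */
#define DL_FLAG_MANAGED		BIT(6)
#define DL_FLAG_SYNC_STATE_ONLY	BIT(7)
#define DL_FLAG_INFERRED	BIT(8)

/*
 * Hypothetical helper: flags of an already existing link after a second
 * device_link_add()-style request arrives with @requested flags.
 * Only the INFERRED handling is modeled here.
 */
static uint32_t merge_inferred(uint32_t existing, uint32_t requested)
{
	/* MANAGED is internal to the core; callers never pass it in. */
	assert(!(requested & DL_FLAG_MANAGED));

	/* An explicit (non-inferred) request clears INFERRED on the link. */
	if ((existing & DL_FLAG_INFERRED) && !(requested & DL_FLAG_INFERRED))
		existing &= ~DL_FLAG_INFERRED;

	/* INFERRED is never set on a link that did not already have it. */
	return existing;
}

/* Hypothetical helper: managed SYNC_STATE_ONLY link, ignoring INFERRED. */
static int is_sync_state_only(uint32_t flags)
{
	return (flags & ~DL_FLAG_INFERRED) ==
	       (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED);
}
```

For example, merge_inferred(DL_FLAG_MANAGED | DL_FLAG_INFERRED, 0) drops the
INFERRED bit, while passing DL_FLAG_INFERRED again keeps it. The masked
comparison in is_sync_state_only() is the same shape as the checks this
patch updates in device_is_dependent() and device_reorder_to_tail().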

Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/base/core.c | 15 +++++++++++----
include/linux/device.h | 2 ++
2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index fe8601197b84..5827dbff7f21 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -229,7 +229,8 @@ int device_is_dependent(struct device *dev, void *target)
return ret;

list_for_each_entry(link, &dev->links.consumers, s_node) {
- if (link->flags == (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED))
+ if ((link->flags & ~DL_FLAG_INFERRED) ==
+ (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED))
continue;

if (link->consumer == target)
@@ -302,7 +303,8 @@ static int device_reorder_to_tail(struct device *dev, void *not_used)

device_for_each_child(dev, NULL, device_reorder_to_tail);
list_for_each_entry(link, &dev->links.consumers, s_node) {
- if (link->flags == (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED))
+ if ((link->flags & ~DL_FLAG_INFERRED) ==
+ (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED))
continue;
device_reorder_to_tail(link->consumer, NULL);
}
@@ -546,7 +548,8 @@ postcore_initcall(devlink_class_init);
#define DL_MANAGED_LINK_FLAGS (DL_FLAG_AUTOREMOVE_CONSUMER | \
DL_FLAG_AUTOREMOVE_SUPPLIER | \
DL_FLAG_AUTOPROBE_CONSUMER | \
- DL_FLAG_SYNC_STATE_ONLY)
+ DL_FLAG_SYNC_STATE_ONLY | \
+ DL_FLAG_INFERRED)

#define DL_ADD_VALID_FLAGS (DL_MANAGED_LINK_FLAGS | DL_FLAG_STATELESS | \
DL_FLAG_PM_RUNTIME | DL_FLAG_RPM_ACTIVE)
@@ -615,7 +618,7 @@ struct device_link *device_link_add(struct device *consumer,
if (!consumer || !supplier || flags & ~DL_ADD_VALID_FLAGS ||
(flags & DL_FLAG_STATELESS && flags & DL_MANAGED_LINK_FLAGS) ||
(flags & DL_FLAG_SYNC_STATE_ONLY &&
- flags != DL_FLAG_SYNC_STATE_ONLY) ||
+ (flags & ~DL_FLAG_INFERRED) != DL_FLAG_SYNC_STATE_ONLY) ||
(flags & DL_FLAG_AUTOPROBE_CONSUMER &&
flags & (DL_FLAG_AUTOREMOVE_CONSUMER |
DL_FLAG_AUTOREMOVE_SUPPLIER)))
@@ -671,6 +674,10 @@ struct device_link *device_link_add(struct device *consumer,
if (link->consumer != consumer)
continue;

+ if (link->flags & DL_FLAG_INFERRED &&
+ !(flags & DL_FLAG_INFERRED))
+ link->flags &= ~DL_FLAG_INFERRED;
+
if (flags & DL_FLAG_PM_RUNTIME) {
if (!(link->flags & DL_FLAG_PM_RUNTIME)) {
pm_runtime_new_link(consumer);
diff --git a/include/linux/device.h b/include/linux/device.h
index 89bb8b84173e..cb5eb2e58c25 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -323,6 +323,7 @@ enum device_link_state {
* AUTOPROBE_CONSUMER: Probe consumer driver automatically after supplier binds.
* MANAGED: The core tracks presence of supplier/consumer drivers (internal).
* SYNC_STATE_ONLY: Link only affects sync_state() behavior.
+ * INFERRED: Inferred from data (eg: firmware) and not from driver actions.
*/
#define DL_FLAG_STATELESS BIT(0)
#define DL_FLAG_AUTOREMOVE_CONSUMER BIT(1)
@@ -332,6 +333,7 @@ enum device_link_state {
#define DL_FLAG_AUTOPROBE_CONSUMER BIT(5)
#define DL_FLAG_MANAGED BIT(6)
#define DL_FLAG_SYNC_STATE_ONLY BIT(7)
+#define DL_FLAG_INFERRED BIT(8)

/**
* enum dl_dev_state - Device driver presence tracking information.
--
2.29.2.684.gfbc64c5ab5-goog

2020-12-18 03:26:12

by Saravana Kannan

Subject: [PATCH v1 4/5] driver core: Handle cycles in device links created by fw_devlink

Sometimes, firmware can have cyclic dependencies between devices. But
one or more of those dependencies in the cycle are false dependencies
that don't affect the probing of the device.

fw_devlink can detect some of these false dependencies using logic. But
when it can't, we don't want to block probing of the devices in this
cyclic dependency.

So, instead of using normal device links for the devices in this cycle,
we need to switch to SYNC_STATE_ONLY device links between these devices.
This is so that sync_state() callback correctness is still maintained
while we allow these devices to probe.

This is functionally similar to switching to fw_devlink=permissive just
for the devices in the cycle.
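
The idea can be sketched in plain userspace C as below. This is a toy model
under stated assumptions, not the driver core implementation: struct toy_dev,
struct toy_link, toy_link_add() and toy_relax_cycle() are hypothetical names,
child devices and locking are left out, and the existing ordering links are
assumed to be acyclic (as the real device_link_add() enforces).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LINKS 4

struct toy_dev;

/* One supplier -> consumer edge, stored on the supplier. */
struct toy_link {
	struct toy_dev *consumer;
	bool inferred;		/* created only from firmware data */
	bool sync_state_only;	/* relaxed: no longer orders probing */
};

struct toy_dev {
	const char *name;
	struct toy_link links[MAX_LINKS];	/* links where this device is the supplier */
	size_t nr_links;
};

/* Record that @con depends on @sup for probe ordering. */
static void toy_link_add(struct toy_dev *sup, struct toy_dev *con, bool inferred)
{
	assert(sup->nr_links < MAX_LINKS);
	sup->links[sup->nr_links++] = (struct toy_link){ con, inferred, false };
}

/*
 * Walk the consumers of @con looking for @sup. Returns true if @sup is
 * reachable, i.e. a new "@con depends on @sup" link would close a cycle.
 * Every inferred link on a path to @sup is relaxed to sync-state-only;
 * links created explicitly by drivers are left alone.
 */
static bool toy_relax_cycle(struct toy_dev *con, struct toy_dev *sup)
{
	bool found = false;

	if (con == sup)
		return true;

	for (size_t i = 0; i < con->nr_links; i++) {
		struct toy_link *link = &con->links[i];

		/* Already relaxed links do not order probing; skip them. */
		if (link->sync_state_only)
			continue;

		if (!toy_relax_cycle(link->consumer, sup))
			continue;

		found = true;

		if (link->inferred)
			link->sync_state_only = true;
	}
	return found;
}
```

With devices a, b and c, where b depends on a (inferred) and c depends on b
(explicit), toy_relax_cycle(&a, &c) reports a cycle and relaxes only the
inferred a -> b link. A second call then finds no cycle left through
ordering links.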

Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/base/core.c | 58 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 1107d03aa6b3..4cc030361165 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1505,6 +1505,53 @@ static void fw_devlink_parse_fwtree(struct fwnode_handle *fwnode)
 		fw_devlink_parse_fwtree(child);
 }
 
+/**
+ * fw_devlink_relax_cycle - Convert cyclic links to SYNC_STATE_ONLY links
+ * @con: Device to check dependencies for.
+ * @sup: Device to check against.
+ *
+ * Check if @sup depends on @con or any device dependent on it (its child or
+ * its consumer etc). When such a cyclic dependency is found, convert all
+ * device links created solely by fw_devlink into SYNC_STATE_ONLY device links.
+ * This is the equivalent of doing fw_devlink=permissive just between the
+ * devices in the cycle. We need to do this because, at this point, fw_devlink
+ * can't tell which of these dependencies is not a real dependency.
+ *
+ * Return 1 if a cycle is found. Otherwise, return 0.
+ */
+int fw_devlink_relax_cycle(struct device *con, void *sup)
+{
+	struct device_link *link;
+	int ret;
+
+	if (con == sup)
+		return 1;
+
+	ret = device_for_each_child(con, sup, fw_devlink_relax_cycle);
+	if (ret)
+		return ret;
+
+	list_for_each_entry(link, &con->links.consumers, s_node) {
+		if ((link->flags & ~DL_FLAG_INFERRED) ==
+		    (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED))
+			continue;
+
+		if (!fw_devlink_relax_cycle(link->consumer, sup))
+			continue;
+
+		ret = 1;
+
+		if (!(link->flags & DL_FLAG_INFERRED))
+			continue;
+
+		pm_runtime_drop_link(link);
+		link->flags = DL_FLAG_MANAGED | FW_DEVLINK_FLAGS_PERMISSIVE;
+		dev_dbg(link->consumer, "Relaxing link with %s\n",
+			dev_name(link->supplier));
+	}
+	return ret;
+}
+
 /**
  * fw_devlink_create_devlink - Create a device link from a consumer to fwnode
  * @con - Consumer device for the device link
@@ -1536,8 +1583,17 @@ static int fw_devlink_create_devlink(struct device *con,
 		 * If this fails, it is due to cycles in device links. Just
 		 * give up on this link and treat it as invalid.
 		 */
-		if (!device_link_add(con, sup_dev, flags))
+		if (!device_link_add(con, sup_dev, flags) &&
+		    !(flags & DL_FLAG_SYNC_STATE_ONLY)) {
+			dev_info(con, "Fixing up cyclic dependency with %s\n",
+				 dev_name(sup_dev));
+			device_links_write_lock();
+			fw_devlink_relax_cycle(con, sup_dev);
+			device_links_write_unlock();
+			device_link_add(con, sup_dev,
+					FW_DEVLINK_FLAGS_PERMISSIVE);
 			ret = -EINVAL;
+		}
 
 		goto out;
 	}
--
2.29.2.684.gfbc64c5ab5-goog

2020-12-18 03:26:34

by Saravana Kannan

Subject: [PATCH v1 3/5] driver core: Have fw_devlink use DL_FLAG_INFERRED

This will be useful in identifying device links created only due to
fw_devlink when we need to break cyclic dependencies due to fw_devlink.
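
As a rough userspace illustration of the flag sets this patch introduces,
the sketch below maps the fw_devlink= argument to a flag value. The
FW_DEVLINK_FLAGS_* macros follow the diff in this patch;
fw_devlink_flags_for() and flags_are_permissive() are hypothetical stand-ins
for fw_devlink_setup() and fw_devlink_is_permissive(), and the
DL_FLAG_PM_RUNTIME bit position is an assumption since it does not appear in
the hunks quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BIT(n) (1u << (n))

#define DL_FLAG_PM_RUNTIME		BIT(2)	/* assumed value, not shown in this thread */
#define DL_FLAG_AUTOPROBE_CONSUMER	BIT(5)
#define DL_FLAG_SYNC_STATE_ONLY		BIT(7)
#define DL_FLAG_INFERRED		BIT(8)

#define FW_DEVLINK_FLAGS_PERMISSIVE	(DL_FLAG_INFERRED | \
					 DL_FLAG_SYNC_STATE_ONLY)
#define FW_DEVLINK_FLAGS_ON		(DL_FLAG_INFERRED | \
					 DL_FLAG_AUTOPROBE_CONSUMER)
#define FW_DEVLINK_FLAGS_RPM		(FW_DEVLINK_FLAGS_ON | \
					 DL_FLAG_PM_RUNTIME)

/*
 * Hypothetical stand-in for fw_devlink_setup(): return the flag set for
 * @arg, or @current unchanged when @arg is not a recognized mode.
 */
static uint32_t fw_devlink_flags_for(const char *arg, uint32_t current)
{
	assert(arg != NULL);

	if (strcmp(arg, "off") == 0)
		return 0;
	if (strcmp(arg, "permissive") == 0)
		return FW_DEVLINK_FLAGS_PERMISSIVE;
	if (strcmp(arg, "on") == 0)
		return FW_DEVLINK_FLAGS_ON;
	if (strcmp(arg, "rpm") == 0)
		return FW_DEVLINK_FLAGS_RPM;
	return current;
}

/* Hypothetical stand-in for fw_devlink_is_permissive(). */
static int flags_are_permissive(uint32_t flags)
{
	return flags == FW_DEVLINK_FLAGS_PERMISSIVE;
}
```

Because every mode other than "off" now carries DL_FLAG_INFERRED, a bare
DL_FLAG_SYNC_STATE_ONLY value no longer compares equal to the permissive
set, which is why the comparison in fw_devlink_is_permissive() is updated in
the same patch.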

Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/base/core.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 5827dbff7f21..1107d03aa6b3 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1450,7 +1450,14 @@ static void device_links_purge(struct device *dev)
device_links_write_unlock();
}

-static u32 fw_devlink_flags = DL_FLAG_SYNC_STATE_ONLY;
+#define FW_DEVLINK_FLAGS_PERMISSIVE (DL_FLAG_INFERRED | \
+ DL_FLAG_SYNC_STATE_ONLY)
+#define FW_DEVLINK_FLAGS_ON (DL_FLAG_INFERRED | \
+ DL_FLAG_AUTOPROBE_CONSUMER)
+#define FW_DEVLINK_FLAGS_RPM (FW_DEVLINK_FLAGS_ON | \
+ DL_FLAG_PM_RUNTIME)
+
+static u32 fw_devlink_flags = FW_DEVLINK_FLAGS_PERMISSIVE;
static int __init fw_devlink_setup(char *arg)
{
if (!arg)
@@ -1459,12 +1466,11 @@ static int __init fw_devlink_setup(char *arg)
if (strcmp(arg, "off") == 0) {
fw_devlink_flags = 0;
} else if (strcmp(arg, "permissive") == 0) {
- fw_devlink_flags = DL_FLAG_SYNC_STATE_ONLY;
+ fw_devlink_flags = FW_DEVLINK_FLAGS_PERMISSIVE;
} else if (strcmp(arg, "on") == 0) {
- fw_devlink_flags = DL_FLAG_AUTOPROBE_CONSUMER;
+ fw_devlink_flags = FW_DEVLINK_FLAGS_ON;
} else if (strcmp(arg, "rpm") == 0) {
- fw_devlink_flags = DL_FLAG_AUTOPROBE_CONSUMER |
- DL_FLAG_PM_RUNTIME;
+ fw_devlink_flags = FW_DEVLINK_FLAGS_RPM;
}
return 0;
}
@@ -1477,7 +1483,7 @@ u32 fw_devlink_get_flags(void)

static bool fw_devlink_is_permissive(void)
{
- return fw_devlink_flags == DL_FLAG_SYNC_STATE_ONLY;
+ return fw_devlink_flags == FW_DEVLINK_FLAGS_PERMISSIVE;
}

static void fw_devlink_parse_fwnode(struct fwnode_handle *fwnode)
@@ -1624,7 +1630,7 @@ static void __fw_devlink_link_to_consumers(struct device *dev)
con_dev = NULL;
} else {
own_link = false;
- dl_flags = DL_FLAG_SYNC_STATE_ONLY;
+ dl_flags = FW_DEVLINK_FLAGS_PERMISSIVE;
}
}

@@ -1679,7 +1685,7 @@ static void __fw_devlink_link_to_suppliers(struct device *dev,
if (own_link)
dl_flags = fw_devlink_get_flags();
else
- dl_flags = DL_FLAG_SYNC_STATE_ONLY;
+ dl_flags = FW_DEVLINK_FLAGS_PERMISSIVE;

list_for_each_entry_safe(link, tmp, &fwnode->suppliers, c_hook) {
int ret;
--
2.29.2.684.gfbc64c5ab5-goog

2020-12-18 03:32:17

by Saravana Kannan

Subject: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Cyclic dependencies in some firmware were one of the last remaining
reasons fw_devlink=on couldn't be set by default. Now that cyclic
dependencies don't block probing, set fw_devlink=on by default.

Setting fw_devlink=on by default brings a bunch of benefits (currently,
only for systems with device tree firmware):
* Significantly cuts down deferred probes.
* Device probe is effectively attempted in graph order.
* Makes it much easier to load drivers as modules without having to
worry about functional dependencies between modules (depmod is still
needed for symbol dependencies).

If this patch prevents some devices from probing, it's very likely due
to the system having one or more device drivers that "probe"/set up a
device (DT node with compatible property) without creating a struct
device for it. If we hit such cases, the device drivers need to be
fixed so that they populate struct devices and probe them like normal
device drivers so that the driver core is aware of the devices and their
status. See [1] for an example of such a case.

[1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/base/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 4cc030361165..803bfa6eb823 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1457,7 +1457,7 @@ static void device_links_purge(struct device *dev)
#define FW_DEVLINK_FLAGS_RPM (FW_DEVLINK_FLAGS_ON | \
DL_FLAG_PM_RUNTIME)

-static u32 fw_devlink_flags = FW_DEVLINK_FLAGS_PERMISSIVE;
+static u32 fw_devlink_flags = FW_DEVLINK_FLAGS_ON;
static int __init fw_devlink_setup(char *arg)
{
if (!arg)
--
2.29.2.684.gfbc64c5ab5-goog

2020-12-18 06:42:42

by kernel test robot

Subject: [RFC PATCH] driver core: fw_devlink_relax_cycle() can be static


Reported-by: kernel test robot <[email protected]>
Signed-off-by: kernel test robot <[email protected]>
---
core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 4cc0303611650c..4e15193aafad6a 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1519,7 +1519,7 @@ static void fw_devlink_parse_fwtree(struct fwnode_handle *fwnode)
*
* Return 1 if a cycle is found. Otherwise, return 0.
*/
-int fw_devlink_relax_cycle(struct device *con, void *sup)
+static int fw_devlink_relax_cycle(struct device *con, void *sup)
{
struct device_link *link;
int ret;

2020-12-18 06:45:02

by kernel test robot

Subject: Re: [PATCH v1 4/5] driver core: Handle cycles in device links created by fw_devlink

Hi Saravana,

I love your patch! Perhaps something to improve:

[auto build test WARNING on driver-core/driver-core-testing]
[also build test WARNING on linus/master next-20201217]
[cannot apply to linux/master v5.10]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Saravana-Kannan/Enable-fw_devlink-on-by-default/20201218-112111
base: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git accefff5b547a9a1d959c7e76ad539bf2480e78b
config: i386-randconfig-s001-20201217 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.3-184-g1b896707-dirty
# https://github.com/0day-ci/linux/commit/7bdc87ea0400318d827410f454ec7e5fbaf470c3
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Saravana-Kannan/Enable-fw_devlink-on-by-default/20201218-112111
git checkout 7bdc87ea0400318d827410f454ec7e5fbaf470c3
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>


"sparse warnings: (new ones prefixed by >>)"
>> drivers/base/core.c:1522:5: sparse: sparse: symbol 'fw_devlink_relax_cycle' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2020-12-18 06:52:26

by kernel test robot

Subject: Re: [PATCH v1 4/5] driver core: Handle cycles in device links created by fw_devlink

Hi Saravana,

I love your patch! Perhaps something to improve:

[auto build test WARNING on driver-core/driver-core-testing]
[also build test WARNING on linus/master next-20201217]
[cannot apply to linux/master v5.10]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Saravana-Kannan/Enable-fw_devlink-on-by-default/20201218-112111
base: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git accefff5b547a9a1d959c7e76ad539bf2480e78b
config: mips-randconfig-r016-20201217 (attached as .config)
compiler: mips64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/7bdc87ea0400318d827410f454ec7e5fbaf470c3
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Saravana-Kannan/Enable-fw_devlink-on-by-default/20201218-112111
git checkout 7bdc87ea0400318d827410f454ec7e5fbaf470c3
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=mips

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> drivers/base/core.c:1522:5: warning: no previous prototype for 'fw_devlink_relax_cycle' [-Wmissing-prototypes]
1522 | int fw_devlink_relax_cycle(struct device *con, void *sup)
| ^~~~~~~~~~~~~~~~~~~~~~


vim +/fw_devlink_relax_cycle +1522 drivers/base/core.c

1507
1508 /**
1509 * fw_devlink_relax_cycle - Convert cyclic links to SYNC_STATE_ONLY links
1510 * @con: Device to check dependencies for.
1511 * @sup: Device to check against.
1512 *
1513 * Check if @sup depends on @con or any device dependent on it (its child or
1514 * its consumer etc). When such a cyclic dependency is found, convert all
1515 * device links created solely by fw_devlink into SYNC_STATE_ONLY device links.
1516 * This is the equivalent of doing fw_devlink=permissive just between the
1517 * devices in the cycle. We need to do this because, at this point, fw_devlink
1518 * can't tell which of these dependencies is not a real dependency.
1519 *
1520 * Return 1 if a cycle is found. Otherwise, return 0.
1521 */
> 1522 int fw_devlink_relax_cycle(struct device *con, void *sup)
1523 {
1524 struct device_link *link;
1525 int ret;
1526
1527 if (con == sup)
1528 return 1;
1529
1530 ret = device_for_each_child(con, sup, fw_devlink_relax_cycle);
1531 if (ret)
1532 return ret;
1533
1534 list_for_each_entry(link, &con->links.consumers, s_node) {
1535 if ((link->flags & ~DL_FLAG_INFERRED) ==
1536 (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED))
1537 continue;
1538
1539 if (!fw_devlink_relax_cycle(link->consumer, sup))
1540 continue;
1541
1542 ret = 1;
1543
1544 if (!(link->flags & DL_FLAG_INFERRED))
1545 continue;
1546
1547 pm_runtime_drop_link(link);
1548 link->flags = DL_FLAG_MANAGED | FW_DEVLINK_FLAGS_PERMISSIVE;
1549 dev_dbg(link->consumer, "Relaxing link with %s\n",
1550 dev_name(link->supplier));
1551 }
1552 return ret;
1553 }
1554

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2020-12-18 07:30:31

by kernel test robot

Subject: Re: [PATCH v1 4/5] driver core: Handle cycles in device links created by fw_devlink

Hi Saravana,

I love your patch! Perhaps something to improve:

[auto build test WARNING on driver-core/driver-core-testing]
[also build test WARNING on linus/master next-20201217]
[cannot apply to linux/master v5.10]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Saravana-Kannan/Enable-fw_devlink-on-by-default/20201218-112111
base: https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git accefff5b547a9a1d959c7e76ad539bf2480e78b
config: riscv-randconfig-r014-20201217 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project cee1e7d14f4628d6174b33640d502bff3b54ae45)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install riscv cross compiling tool for clang build
# apt-get install binutils-riscv64-linux-gnu
# https://github.com/0day-ci/linux/commit/7bdc87ea0400318d827410f454ec7e5fbaf470c3
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Saravana-Kannan/Enable-fw_devlink-on-by-default/20201218-112111
git checkout 7bdc87ea0400318d827410f454ec7e5fbaf470c3
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

~~~~~~~~~~ ^
arch/riscv/include/asm/mmio.h:87:48: note: expanded from macro 'readb_cpu'
#define readb_cpu(c) ({ u8 __r = __raw_readb(c); __r; })
^
In file included from drivers/base/core.c:27:
In file included from include/linux/netdevice.h:37:
In file included from include/net/net_namespace.h:39:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/riscv/include/asm/io.h:149:
include/asm-generic/io.h:564:9: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
return inw(addr);
^~~~~~~~~
arch/riscv/include/asm/io.h:56:76: note: expanded from macro 'inw'
#define inw(c) ({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
~~~~~~~~~~ ^
arch/riscv/include/asm/mmio.h:88:76: note: expanded from macro 'readw_cpu'
#define readw_cpu(c) ({ u16 __r = le16_to_cpu((__force __le16)__raw_readw(c)); __r; })
^
include/uapi/linux/byteorder/little_endian.h:36:51: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
^
In file included from drivers/base/core.c:27:
In file included from include/linux/netdevice.h:37:
In file included from include/net/net_namespace.h:39:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/riscv/include/asm/io.h:149:
include/asm-generic/io.h:572:9: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
return inl(addr);
^~~~~~~~~
arch/riscv/include/asm/io.h:57:76: note: expanded from macro 'inl'
#define inl(c) ({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
~~~~~~~~~~ ^
arch/riscv/include/asm/mmio.h:89:76: note: expanded from macro 'readl_cpu'
#define readl_cpu(c) ({ u32 __r = le32_to_cpu((__force __le32)__raw_readl(c)); __r; })
^
include/uapi/linux/byteorder/little_endian.h:34:51: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
^
In file included from drivers/base/core.c:27:
In file included from include/linux/netdevice.h:37:
In file included from include/net/net_namespace.h:39:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/riscv/include/asm/io.h:149:
include/asm-generic/io.h:580:2: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
outb(value, addr);
^~~~~~~~~~~~~~~~~
arch/riscv/include/asm/io.h:59:68: note: expanded from macro 'outb'
#define outb(v,c) ({ __io_pbw(); writeb_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
~~~~~~~~~~ ^
arch/riscv/include/asm/mmio.h:91:52: note: expanded from macro 'writeb_cpu'
#define writeb_cpu(v, c) ((void)__raw_writeb((v), (c)))
^
In file included from drivers/base/core.c:27:
In file included from include/linux/netdevice.h:37:
In file included from include/net/net_namespace.h:39:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/riscv/include/asm/io.h:149:
include/asm-generic/io.h:588:2: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
outw(value, addr);
^~~~~~~~~~~~~~~~~
arch/riscv/include/asm/io.h:60:68: note: expanded from macro 'outw'
#define outw(v,c) ({ __io_pbw(); writew_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
~~~~~~~~~~ ^
arch/riscv/include/asm/mmio.h:92:76: note: expanded from macro 'writew_cpu'
#define writew_cpu(v, c) ((void)__raw_writew((__force u16)cpu_to_le16(v), (c)))
^
In file included from drivers/base/core.c:27:
In file included from include/linux/netdevice.h:37:
In file included from include/net/net_namespace.h:39:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/riscv/include/asm/io.h:149:
include/asm-generic/io.h:596:2: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
outl(value, addr);
^~~~~~~~~~~~~~~~~
arch/riscv/include/asm/io.h:61:68: note: expanded from macro 'outl'
#define outl(v,c) ({ __io_pbw(); writel_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
~~~~~~~~~~ ^
arch/riscv/include/asm/mmio.h:93:76: note: expanded from macro 'writel_cpu'
#define writel_cpu(v, c) ((void)__raw_writel((__force u32)cpu_to_le32(v), (c)))
^
In file included from drivers/base/core.c:27:
In file included from include/linux/netdevice.h:37:
In file included from include/net/net_namespace.h:39:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/riscv/include/asm/io.h:149:
include/asm-generic/io.h:1005:55: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
return (port > MMIO_UPPER_LIMIT) ? NULL : PCI_IOBASE + port;
~~~~~~~~~~ ^
>> drivers/base/core.c:1522:5: warning: no previous prototype for function 'fw_devlink_relax_cycle' [-Wmissing-prototypes]
int fw_devlink_relax_cycle(struct device *con, void *sup)
^
drivers/base/core.c:1522:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int fw_devlink_relax_cycle(struct device *con, void *sup)
^
static
8 warnings generated.


vim +/fw_devlink_relax_cycle +1522 drivers/base/core.c

1507
1508 /**
1509 * fw_devlink_relax_cycle - Convert cyclic links to SYNC_STATE_ONLY links
1510 * @con: Device to check dependencies for.
1511 * @sup: Device to check against.
1512 *
1513 * Check if @sup depends on @con or any device dependent on it (its child or
1514 * its consumer etc). When such a cyclic dependency is found, convert all
1515 * device links created solely by fw_devlink into SYNC_STATE_ONLY device links.
1516 * This is the equivalent of doing fw_devlink=permissive just between the
1517 * devices in the cycle. We need to do this because, at this point, fw_devlink
1518 * can't tell which of these dependencies is not a real dependency.
1519 *
1520 * Return 1 if a cycle is found. Otherwise, return 0.
1521 */
> 1522 int fw_devlink_relax_cycle(struct device *con, void *sup)
1523 {
1524 struct device_link *link;
1525 int ret;
1526
1527 if (con == sup)
1528 return 1;
1529
1530 ret = device_for_each_child(con, sup, fw_devlink_relax_cycle);
1531 if (ret)
1532 return ret;
1533
1534 list_for_each_entry(link, &con->links.consumers, s_node) {
1535 if ((link->flags & ~DL_FLAG_INFERRED) ==
1536 (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED))
1537 continue;
1538
1539 if (!fw_devlink_relax_cycle(link->consumer, sup))
1540 continue;
1541
1542 ret = 1;
1543
1544 if (!(link->flags & DL_FLAG_INFERRED))
1545 continue;
1546
1547 pm_runtime_drop_link(link);
1548 link->flags = DL_FLAG_MANAGED | FW_DEVLINK_FLAGS_PERMISSIVE;
1549 dev_dbg(link->consumer, "Relaxing link with %s\n",
1550 dev_name(link->supplier));
1551 }
1552 return ret;
1553 }
1554

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2020-12-18 21:13:55

by Saravana Kannan

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, Dec 17, 2020 at 7:17 PM Saravana Kannan <[email protected]> wrote:
>
> As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
> be broken using logic was one of the last remaining reasons
> fw_devlink=on couldn't be set by default.
>
> This series changes fw_devlink so that when a cyclic dependency is found
> in firmware, the links between those devices fallback to permissive mode
> behavior. This way, the rest of the system still benefits from
> fw_devlink, but the ambiguous cases fallback to permissive mode.
>
> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> only for systems with device tree firmware):
> * Significantly cuts down deferred probes.
> * Device probe is effectively attempted in graph order.
> * Makes it much easier to load drivers as modules without having to
> worry about functional dependencies between modules (depmod is still
> needed for symbol dependencies).
>
> Greg/Rafael,
>
> Can we get this pulled into 5.11-rc1 or -rc2 soon please? I expect to
> see some issues due to device drivers that aren't following best
> practices (they don't expose the device to driver core). Want to
> identify those early on and try to have them fixed before 5.11 release.
> See [1] for an example of such a case.
>
> If we do end up have to revert anything, it'll just be Patch 5/5 (a one
> liner).
>
> Marc,
>
> You had hit issues with fw_devlink=on before on some of your systems.
> Want to give this a shot?

Marc,

If you decide to test this, please also pull in this patch. It should
fix all your interrupt issues.

https://lore.kernel.org/lkml/[email protected]/

-Saravana

2020-12-21 08:21:20

by Jisheng Zhang

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, 17 Dec 2020 19:16:58 -0800 Saravana Kannan wrote:


>
>
> As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
> be broken using logic was one of the last remaining reasons
> fw_devlink=on couldn't be set by default.
>
> This series changes fw_devlink so that when a cyclic dependency is found
> in firmware, the links between those devices fallback to permissive mode
> behavior. This way, the rest of the system still benefits from
> fw_devlink, but the ambiguous cases fallback to permissive mode.
>
> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> only for systems with device tree firmware):
> * Significantly cuts down deferred probes.
> * Device probe is effectively attempted in graph order.
> * Makes it much easier to load drivers as modules without having to
> worry about functional dependencies between modules (depmod is still
> needed for symbol dependencies).
>
> Greg/Rafael,
>
> Can we get this pulled into 5.11-rc1 or -rc2 soon please? I expect to
> see some issues due to device drivers that aren't following best
> practices (they don't expose the device to driver core). Want to
> identify those early on and try to have them fixed before 5.11 release.
> See [1] for an example of such a case.
>
> If we do end up have to revert anything, it'll just be Patch 5/5 (a one
> liner).
>
> Marc,
>
> You had hit issues with fw_devlink=on before on some of your systems.
> Want to give this a shot?
>
> Jisheng,
>
> Want to fix up one of those gpio drivers you were having problems with?
>

Hi Saravana,

I didn't send a fix for gpio-dwapb.c in the last development window, so I
can send a patch once 5.11-rc1 is released.

thanks

2020-12-21 09:51:31

by Rafael J. Wysocki

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Fri, Dec 18, 2020 at 4:17 AM Saravana Kannan <[email protected]> wrote:
>
> As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
> be broken using logic was one of the last remaining reasons
> fw_devlink=on couldn't be set by default.
>
> This series changes fw_devlink so that when a cyclic dependency is found
> in firmware, the links between those devices fallback to permissive mode
> behavior. This way, the rest of the system still benefits from
> fw_devlink, but the ambiguous cases fallback to permissive mode.
>
> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> only for systems with device tree firmware):
> * Significantly cuts down deferred probes.
> * Device probe is effectively attempted in graph order.
> * Makes it much easier to load drivers as modules without having to
> worry about functional dependencies between modules (depmod is still
> needed for symbol dependencies).
>
> Greg/Rafael,
>
> Can we get this pulled into 5.11-rc1 or -rc2 soon please?

Honestly, I'd rather not (but it's up to Greg).

This is a new series posted during the merge window, so it should not
be looked at even according to the rules.

Personally, I don't have the time to look at it now.

> I expect to see some issues due to device drivers that aren't following best
> practices (they don't expose the device to driver core). Want to
> identify those early on and try to have them fixed before 5.11 release.
> See [1] for an example of such a case.

So it should be posted right after -rc1 and spend a whole cycle in linux-next.

> If we do end up have to revert anything, it'll just be Patch 5/5 (a one
> liner).

Which totally doesn't matter IMV.

2021-01-07 20:06:26

by Greg Kroah-Hartman

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, Dec 17, 2020 at 07:16:58PM -0800, Saravana Kannan wrote:
> As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
> be broken using logic was one of the last remaining reasons
> fw_devlink=on couldn't be set by default.
>
> This series changes fw_devlink so that when a cyclic dependency is found
> in firmware, the links between those devices fallback to permissive mode
> behavior. This way, the rest of the system still benefits from
> fw_devlink, but the ambiguous cases fallback to permissive mode.
>
> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> only for systems with device tree firmware):
> * Significantly cuts down deferred probes.
> * Device probe is effectively attempted in graph order.
> * Makes it much easier to load drivers as modules without having to
> worry about functional dependencies between modules (depmod is still
> needed for symbol dependencies).
>
> Greg/Rafael,
>
> Can we get this pulled into 5.11-rc1 or -rc2 soon please? I expect to
> see some issues due to device drivers that aren't following best
> practices (they don't expose the device to driver core). Want to
> identify those early on and try to have them fixed before 5.11 release.
> See [1] for an example of such a case.

Now queued up in my tree, will show up in linux-next in a few days,
let's see what breaks! :)

And it is scheduled for 5.12-rc1, not 5.11, sorry.

thanks,

greg k-h

2021-01-07 21:56:21

by Saravana Kannan

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, Jan 7, 2021 at 12:04 PM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Thu, Dec 17, 2020 at 07:16:58PM -0800, Saravana Kannan wrote:
> > As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
> > be broken using logic was one of the last remaining reasons
> > fw_devlink=on couldn't be set by default.
> >
> > This series changes fw_devlink so that when a cyclic dependency is found
> > in firmware, the links between those devices fallback to permissive mode
> > behavior. This way, the rest of the system still benefits from
> > fw_devlink, but the ambiguous cases fallback to permissive mode.
> >
> > Setting fw_devlink=on by default brings a bunch of benefits (currently,
> > only for systems with device tree firmware):
> > * Significantly cuts down deferred probes.
> > * Device probe is effectively attempted in graph order.
> > * Makes it much easier to load drivers as modules without having to
> > worry about functional dependencies between modules (depmod is still
> > needed for symbol dependencies).
> >
> > Greg/Rafael,
> >
> > Can we get this pulled into 5.11-rc1 or -rc2 soon please? I expect to
> > see some issues due to device drivers that aren't following best
> > practices (they don't expose the device to driver core). Want to
> > identify those early on and try to have them fixed before 5.11 release.
> > See [1] for an example of such a case.
>
> Now queued up in my tree, will show up in linux-next in a few days,
> let's see what breaks! :)
>
> And it is scheduled for 5.12-rc1, not 5.11, sorry.

Thanks. Not too worried about the actual version. I just want things
to start breaking as soon as possible if they are going to break.

-Saravana

2021-01-11 13:09:01

by Marek Szyprowski

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Saravana,

On 18.12.2020 04:17, Saravana Kannan wrote:
> Cyclic dependencies in some firmware was one of the last remaining
> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> dependencies don't block probing, set fw_devlink=on by default.
>
> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> only for systems with device tree firmware):
> * Significantly cuts down deferred probes.
> * Device probe is effectively attempted in graph order.
> * Makes it much easier to load drivers as modules without having to
> worry about functional dependencies between modules (depmod is still
> needed for symbol dependencies).
>
> If this patch prevents some devices from probing, it's very likely due
> to the system having one or more device drivers that "probe"/set up a
> device (DT node with compatible property) without creating a struct
> device for it. If we hit such cases, the device drivers need to be
> fixed so that they populate struct devices and probe them like normal
> device drivers so that the driver core is aware of the devices and their
> status. See [1] for an example of such a case.
>
> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> Signed-off-by: Saravana Kannan <[email protected]>

This patch landed recently in linux next-20210111 as commit e590474768f1
("driver core: Set fw_devlink=on by default"). Sadly, it breaks Exynos
IOMMU operation, which causes lots of devices to be deferred and not
probed at all. I've briefly checked and noticed that
exynos_sysmmu_probe() is never called after this patch. This is really
strange to me, as the SYSMMU controllers on the Exynos platform are
regular platform devices registered by the OF code. The driver code is
here: drivers/iommu/exynos-iommu.c, example dts:
arch/arm/boot/dts/exynos3250.dtsi (compatible = "samsung,exynos-sysmmu").

> ---
> drivers/base/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 4cc030361165..803bfa6eb823 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1457,7 +1457,7 @@ static void device_links_purge(struct device *dev)
> #define FW_DEVLINK_FLAGS_RPM (FW_DEVLINK_FLAGS_ON | \
> DL_FLAG_PM_RUNTIME)
>
> -static u32 fw_devlink_flags = FW_DEVLINK_FLAGS_PERMISSIVE;
> +static u32 fw_devlink_flags = FW_DEVLINK_FLAGS_ON;
> static int __init fw_devlink_setup(char *arg)
> {
> if (!arg)

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2021-01-11 14:23:35

by Marek Szyprowski

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On 11.01.2021 12:12, Marek Szyprowski wrote:
> On 18.12.2020 04:17, Saravana Kannan wrote:
>> Cyclic dependencies in some firmware was one of the last remaining
>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
>> dependencies don't block probing, set fw_devlink=on by default.
>>
>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
>> only for systems with device tree firmware):
>> * Significantly cuts down deferred probes.
>> * Device probe is effectively attempted in graph order.
>> * Makes it much easier to load drivers as modules without having to
>>    worry about functional dependencies between modules (depmod is still
>>    needed for symbol dependencies).
>>
>> If this patch prevents some devices from probing, it's very likely due
>> to the system having one or more device drivers that "probe"/set up a
>> device (DT node with compatible property) without creating a struct
>> device for it.  If we hit such cases, the device drivers need to be
>> fixed so that they populate struct devices and probe them like normal
>> device drivers so that the driver core is aware of the devices and their
>> status. See [1] for an example of such a case.
>>
>> [1] -
>> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
>> Signed-off-by: Saravana Kannan <[email protected]>
>
> This patch landed recently in linux next-20210111 as commit
> e590474768f1 ("driver core: Set fw_devlink=on by default"). Sadly it
> breaks Exynos IOMMU operation, what causes lots of devices being
> deferred and not probed at all. I've briefly checked and noticed that
> exynos_sysmmu_probe() is never called after this patch. This is really
> strange for me, as the SYSMMU controllers on Exynos platform are
> regular platform devices registered by the OF code. The driver code is
> here: drivers/iommu/exynos-iommu.c, example dts:
> arch/arm/boot/dts/exynos3250.dtsi (compatible = "samsung,exynos-sysmmu").

Okay, I found the source of this problem. It is caused by the Exynos power
domain driver, which is not a platform driver yet. I will post a patch
that converts it to a platform driver.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2021-01-11 21:51:32

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Mon, Jan 11, 2021 at 6:18 AM Marek Szyprowski
<[email protected]> wrote:
>
> On 11.01.2021 12:12, Marek Szyprowski wrote:
> > On 18.12.2020 04:17, Saravana Kannan wrote:
> >> Cyclic dependencies in some firmware was one of the last remaining
> >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> >> dependencies don't block probing, set fw_devlink=on by default.
> >>
> >> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> >> only for systems with device tree firmware):
> >> * Significantly cuts down deferred probes.
> >> * Device probe is effectively attempted in graph order.
> >> * Makes it much easier to load drivers as modules without having to
> >> worry about functional dependencies between modules (depmod is still
> >> needed for symbol dependencies).
> >>
> >> If this patch prevents some devices from probing, it's very likely due
> >> to the system having one or more device drivers that "probe"/set up a
> >> device (DT node with compatible property) without creating a struct
> >> device for it. If we hit such cases, the device drivers need to be
> >> fixed so that they populate struct devices and probe them like normal
> >> device drivers so that the driver core is aware of the devices and their
> >> status. See [1] for an example of such a case.
> >>
> >> [1] -
> >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> >> Signed-off-by: Saravana Kannan <[email protected]>
> >
> > This patch landed recently in linux next-20210111 as commit
> > e590474768f1 ("driver core: Set fw_devlink=on by default"). Sadly it
> > breaks Exynos IOMMU operation, what causes lots of devices being
> > deferred and not probed at all. I've briefly checked and noticed that
> > exynos_sysmmu_probe() is never called after this patch. This is really
> > strange for me, as the SYSMMU controllers on Exynos platform are
> > regular platform devices registered by the OF code. The driver code is
> > here: drivers/iommu/exynos-iommu.c, example dts:
> > arch/arm/boot/dts/exynos3250.dtsi (compatible = "samsung,exynos-sysmmu").
>
> Okay, I found the source of this problem. It is caused by Exynos power
> domain driver, which is not platform driver yet. I will post a patch,
> which converts it to the platform driver.

Thanks Marek! Hopefully the debug logs I added were sufficient to
figure out the reason.

-Saravana

2021-01-12 23:04:34

by Marek Szyprowski

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On 11.01.2021 22:47, Saravana Kannan wrote:
> On Mon, Jan 11, 2021 at 6:18 AM Marek Szyprowski
> <[email protected]> wrote:
>> On 11.01.2021 12:12, Marek Szyprowski wrote:
>>> On 18.12.2020 04:17, Saravana Kannan wrote:
>>>> Cyclic dependencies in some firmware was one of the last remaining
>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
>>>> dependencies don't block probing, set fw_devlink=on by default.
>>>>
>>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
>>>> only for systems with device tree firmware):
>>>> * Significantly cuts down deferred probes.
>>>> * Device probe is effectively attempted in graph order.
>>>> * Makes it much easier to load drivers as modules without having to
>>>> worry about functional dependencies between modules (depmod is still
>>>> needed for symbol dependencies).
>>>>
>>>> If this patch prevents some devices from probing, it's very likely due
>>>> to the system having one or more device drivers that "probe"/set up a
>>>> device (DT node with compatible property) without creating a struct
>>>> device for it. If we hit such cases, the device drivers need to be
>>>> fixed so that they populate struct devices and probe them like normal
>>>> device drivers so that the driver core is aware of the devices and their
>>>> status. See [1] for an example of such a case.
>>>>
>>>> [1] -
>>>> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
>>>> Signed-off-by: Saravana Kannan <[email protected]>
>>> This patch landed recently in linux next-20210111 as commit
>>> e590474768f1 ("driver core: Set fw_devlink=on by default"). Sadly it
>>> breaks Exynos IOMMU operation, what causes lots of devices being
>>> deferred and not probed at all. I've briefly checked and noticed that
>>> exynos_sysmmu_probe() is never called after this patch. This is really
>>> strange for me, as the SYSMMU controllers on Exynos platform are
>>> regular platform devices registered by the OF code. The driver code is
>>> here: drivers/iommu/exynos-iommu.c, example dts:
>>> arch/arm/boot/dts/exynos3250.dtsi (compatible = "samsung,exynos-sysmmu").
>> Okay, I found the source of this problem. It is caused by Exynos power
>> domain driver, which is not platform driver yet. I will post a patch,
>> which converts it to the platform driver.
> Thanks Marek! Hopefully the debug logs I added were sufficient to
> figure out the reason.

Frankly, it took me a while to figure out that the device core waits for
the power domain devices. Maybe it would be possible to add some more
debug messages or hints? Like the reason for the deferred probe in
/sys/kernel/debug/devices_deferred?

Best regards

--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2021-01-13 04:04:26

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Mon, Jan 11, 2021 at 11:11 PM Marek Szyprowski
<[email protected]> wrote:
>
> On 11.01.2021 22:47, Saravana Kannan wrote:
> > On Mon, Jan 11, 2021 at 6:18 AM Marek Szyprowski
> > <[email protected]> wrote:
> >> On 11.01.2021 12:12, Marek Szyprowski wrote:
> >>> On 18.12.2020 04:17, Saravana Kannan wrote:
> >>>> Cyclic dependencies in some firmware was one of the last remaining
> >>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> >>>> dependencies don't block probing, set fw_devlink=on by default.
> >>>>
> >>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> >>>> only for systems with device tree firmware):
> >>>> * Significantly cuts down deferred probes.
> >>>> * Device probe is effectively attempted in graph order.
> >>>> * Makes it much easier to load drivers as modules without having to
> >>>> worry about functional dependencies between modules (depmod is still
> >>>> needed for symbol dependencies).
> >>>>
> >>>> If this patch prevents some devices from probing, it's very likely due
> >>>> to the system having one or more device drivers that "probe"/set up a
> >>>> device (DT node with compatible property) without creating a struct
> >>>> device for it. If we hit such cases, the device drivers need to be
> >>>> fixed so that they populate struct devices and probe them like normal
> >>>> device drivers so that the driver core is aware of the devices and their
> >>>> status. See [1] for an example of such a case.
> >>>>
> >>>> [1] -
> >>>> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> >>>> Signed-off-by: Saravana Kannan <[email protected]>
> >>> This patch landed recently in linux next-20210111 as commit
> >>> e590474768f1 ("driver core: Set fw_devlink=on by default"). Sadly it
> >>> breaks Exynos IOMMU operation, what causes lots of devices being
> >>> deferred and not probed at all. I've briefly checked and noticed that
> >>> exynos_sysmmu_probe() is never called after this patch. This is really
> >>> strange for me, as the SYSMMU controllers on Exynos platform are
> >>> regular platform devices registered by the OF code. The driver code is
> >>> here: drivers/iommu/exynos-iommu.c, example dts:
> >>> arch/arm/boot/dts/exynos3250.dtsi (compatible = "samsung,exynos-sysmmu").
> >> Okay, I found the source of this problem. It is caused by Exynos power
> >> domain driver, which is not platform driver yet. I will post a patch,
> >> which converts it to the platform driver.
> > Thanks Marek! Hopefully the debug logs I added were sufficient to
> > figure out the reason.
>
> Frankly, it took me a while to figure out that device core waits for the
> power domain devices. Maybe it would be possible to add some more debug
> messages or hints? Like the reason of the deferred probe in
> /sys/kernel/debug/devices_deferred ?

There's already a /sys/devices/.../<device>/waiting_for_supplier file
that tells you if the device is waiting for a supplier device to be
added. That file goes away once the device probes. If the file contains
1, then the device is waiting for a supplier device to be added (like
your case). If it contains 0, then the device is just waiting on one of
the existing suppliers to probe. You can find the existing suppliers
through /sys/devices/.../<device>/supplier:*/supplier. Also, flip
these dev_dbg() to dev_info() if you need more details about deferred
probing.

https://lore.kernel.org/lkml/[email protected]/

Hopefully this meets what you are looking for?
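
A rough userspace sketch of how a debugging tool might act on the two values described above. The enum and function names are made up for illustration and are not kernel APIs; only the waiting_for_supplier file name and the meaning of 0 and 1 come from the explanation above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names for the states described above. */
enum supplier_wait {
	WAIT_UNKNOWN = -1,       /* unreadable file or unexpected contents */
	WAIT_SUPPLIER_PROBE = 0, /* file reads 0: suppliers exist, one has not probed */
	WAIT_SUPPLIER_ADD = 1    /* file reads 1: a supplier device is not added yet */
};

/* Classify the text read from a waiting_for_supplier file ("0" or "1"). */
enum supplier_wait classify_waiting_for_supplier(const char *contents)
{
	if (!contents)
		return WAIT_UNKNOWN;
	if (strcmp(contents, "1\n") == 0 || strcmp(contents, "1") == 0)
		return WAIT_SUPPLIER_ADD;
	if (strcmp(contents, "0\n") == 0 || strcmp(contents, "0") == 0)
		return WAIT_SUPPLIER_PROBE;
	return WAIT_UNKNOWN;
}

/*
 * Read the file at path and classify it. A file that cannot be opened is
 * reported as WAIT_UNKNOWN; since the file goes away once the device
 * probes, a missing file is not necessarily a problem.
 */
enum supplier_wait read_waiting_for_supplier(const char *path)
{
	char buf[8] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return WAIT_UNKNOWN;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);
	return classify_waiting_for_supplier(buf);
}
```

A caller would pass the full sysfs path of one device's waiting_for_supplier file and treat WAIT_SUPPLIER_ADD as the hint that some supplier never got a struct device.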

-Saravana

2021-01-13 07:06:22

by Marek Szyprowski

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Saravana,

On 12.01.2021 21:51, Saravana Kannan wrote:
> On Mon, Jan 11, 2021 at 11:11 PM Marek Szyprowski
> <[email protected]> wrote:
>> On 11.01.2021 22:47, Saravana Kannan wrote:
>>> On Mon, Jan 11, 2021 at 6:18 AM Marek Szyprowski
>>> <[email protected]> wrote:
>>>> On 11.01.2021 12:12, Marek Szyprowski wrote:
>>>>> On 18.12.2020 04:17, Saravana Kannan wrote:
>>>>>> Cyclic dependencies in some firmware was one of the last remaining
>>>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
>>>>>> dependencies don't block probing, set fw_devlink=on by default.
>>>>>>
>>>>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
>>>>>> only for systems with device tree firmware):
>>>>>> * Significantly cuts down deferred probes.
>>>>>> * Device probe is effectively attempted in graph order.
>>>>>> * Makes it much easier to load drivers as modules without having to
>>>>>> worry about functional dependencies between modules (depmod is still
>>>>>> needed for symbol dependencies).
>>>>>>
>>>>>> If this patch prevents some devices from probing, it's very likely due
>>>>>> to the system having one or more device drivers that "probe"/set up a
>>>>>> device (DT node with compatible property) without creating a struct
>>>>>> device for it. If we hit such cases, the device drivers need to be
>>>>>> fixed so that they populate struct devices and probe them like normal
>>>>>> device drivers so that the driver core is aware of the devices and their
>>>>>> status. See [1] for an example of such a case.
>>>>>>
>>>>>> [1] -
>>>>>> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
>>>>>> Signed-off-by: Saravana Kannan <[email protected]>
>>>>> This patch landed recently in linux next-20210111 as commit
>>>>> e590474768f1 ("driver core: Set fw_devlink=on by default"). Sadly it
>>>>> breaks Exynos IOMMU operation, what causes lots of devices being
>>>>> deferred and not probed at all. I've briefly checked and noticed that
>>>>> exynos_sysmmu_probe() is never called after this patch. This is really
>>>>> strange for me, as the SYSMMU controllers on Exynos platform are
>>>>> regular platform devices registered by the OF code. The driver code is
>>>>> here: drivers/iommu/exynos-iommu.c, example dts:
>>>>> arch/arm/boot/dts/exynos3250.dtsi (compatible = "samsung,exynos-sysmmu").
>>>> Okay, I found the source of this problem. It is caused by Exynos power
>>>> domain driver, which is not platform driver yet. I will post a patch,
>>>> which converts it to the platform driver.
>>> Thanks Marek! Hopefully the debug logs I added were sufficient to
>>> figure out the reason.
>> Frankly, it took me a while to figure out that device core waits for the
>> power domain devices. Maybe it would be possible to add some more debug
>> messages or hints? Like the reason of the deferred probe in
>> /sys/kernel/debug/devices_deferred ?
> There's already a /sys/devices/.../<device>/waiting_for_supplier file
> that tells you if the device is waiting for a supplier device to be
> added. That file goes away once the device probes. If the file has 1,
> then it's waiting for the supplier device to be added (like your
> case). If it's 0, then the device is just waiting on one of the
> existing suppliers to probe. You can find the existing suppliers
> through /sys/devices/.../<device>/supplier:*/supplier. Also, flip
> these dev_dbg() to dev_info() if you need more details about deferred
> probing.

Frankly speaking, I doubt that anyone will find those. Even an experienced
developer might need some time to figure it out.

I expect that such information will at least be in the mentioned
/sys/kernel/debug/devices_deferred file. We already have infrastructure
for putting the deferred probe reason there, see the dev_err_probe()
function. Even such a simple change makes debugging this issue much
easier:

diff --git a/drivers/base/core.c b/drivers/base/core.c
index cd8e518fadd6..ceb5aed5a84c 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -937,12 +937,13 @@ int device_links_check_suppliers(struct device *dev)
        mutex_lock(&fwnode_link_lock);
        if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
            !fw_devlink_is_permissive()) {
-               dev_dbg(dev, "probe deferral - wait for supplier %pfwP\n",
+               ret = dev_err_probe(dev, -EPROBE_DEFER,
+                       "probe deferral - wait for supplier %pfwP\n",
                        list_first_entry(&dev->fwnode->suppliers,
                        struct fwnode_link,
                        c_hook)->supplier);
                mutex_unlock(&fwnode_link_lock);
-               return -EPROBE_DEFER;
+               return ret;
        }
        mutex_unlock(&fwnode_link_lock);

@@ -955,9 +956,9 @@ int device_links_check_suppliers(struct device *dev)
                if (link->status != DL_STATE_AVAILABLE &&
                    !(link->flags & DL_FLAG_SYNC_STATE_ONLY)) {
                        device_links_missing_supplier(dev);
-                       dev_dbg(dev, "probe deferral - supplier %s not ready\n",
+                       ret = dev_err_probe(dev, -EPROBE_DEFER,
+                               "probe deferral - supplier %s not ready\n",
                                dev_name(link->supplier));
-                       ret = -EPROBE_DEFER;
                        break;
                }
                WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE);


After such change:

# cat /sys/kernel/debug/devices_deferred
sound
13620000.sysmmu platform: probe deferral - supplier 10023c40.power-domain not ready
13630000.sysmmu platform: probe deferral - supplier 10023c40.power-domain not ready
12e20000.sysmmu platform: probe deferral - supplier 10023c20.power-domain not ready
11a20000.sysmmu platform: probe deferral - supplier 10023c00.power-domain not ready
11a30000.sysmmu platform: probe deferral - supplier 10023c00.power-domain not ready
11a40000.sysmmu platform: probe deferral - supplier 10023c00.power-domain not ready
11a50000.sysmmu platform: probe deferral - supplier 10023c00.power-domain not ready
11a60000.sysmmu platform: probe deferral - supplier 10023c00.power-domain not ready
11e20000.sysmmu platform: probe deferral - supplier 10023c80.power-domain not ready
12d00000.hdmi   platform: probe deferral - supplier 10023c20.power-domain not ready
10048000.clock-controller       platform: probe deferral - supplier 10023ca0.power-domain not ready
12260000.sysmmu platform: probe deferral - supplier 10048000.clock-controller not ready
12270000.sysmmu platform: probe deferral - supplier 10048000.clock-controller not ready
122a0000.sysmmu platform: probe deferral - supplier 10048000.clock-controller not ready
122b0000.sysmmu platform: probe deferral - supplier 10048000.clock-controller not ready
123b0000.sysmmu platform: probe deferral - supplier 10048000.clock-controller not ready
123c0000.sysmmu platform: probe deferral - supplier 10048000.clock-controller not ready
12c10000.mixer  platform: probe deferral - supplier 10023c20.power-domain not ready
13000000.gpu    platform: probe deferral - supplier 10023c60.power-domain not ready

The message could probably be adjusted a bit, but this would have
significantly helped me find the source of the problem.
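
For readers unfamiliar with dev_err_probe(), here is a small userspace model of the behaviour the change relies on: the error code is returned unchanged, and for -EPROBE_DEFER the formatted message is kept as the deferral reason instead of being printed as an error. This is only a sketch. struct model_device and model_dev_err_probe() are invented names, and the real kernel function stores the reason through driver-core internals that are not modelled here.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517 /* kernel-internal errno, absent from userspace headers */
#endif

/* Hypothetical stand-in for struct device: a name plus the saved reason. */
struct model_device {
	const char *name;
	char deferred_reason[128];
};

/*
 * Model of dev_err_probe(): on -EPROBE_DEFER, save the formatted message
 * as the deferral reason (the text devices_deferred would later show);
 * on any other error, print it. The error code is returned unchanged.
 */
int model_dev_err_probe(struct model_device *dev, int err, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	if (err == -EPROBE_DEFER) {
		vsnprintf(dev->deferred_reason, sizeof(dev->deferred_reason),
			  fmt, args);
	} else {
		fprintf(stderr, "%s: error %d: ", dev->name, err);
		vfprintf(stderr, fmt, args);
	}
	va_end(args);
	return err;
}
```

Because the helper returns the error it was given, a caller can write "ret = model_dev_err_probe(...)" in the same way the diff assigns the result of dev_err_probe() to ret.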

Best regards

--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2021-01-13 11:15:11

by Marc Zyngier

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On 2021-01-07 20:05, Greg Kroah-Hartman wrote:
> On Thu, Dec 17, 2020 at 07:16:58PM -0800, Saravana Kannan wrote:
>> As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
>> be broken using logic was one of the last remaining reasons
>> fw_devlink=on couldn't be set by default.
>>
>> This series changes fw_devlink so that when a cyclic dependency is found
>> in firmware, the links between those devices fallback to permissive mode
>> behavior. This way, the rest of the system still benefits from
>> fw_devlink, but the ambiguous cases fallback to permissive mode.
>>
>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
>> only for systems with device tree firmware):
>> * Significantly cuts down deferred probes.
>> * Device probe is effectively attempted in graph order.
>> * Makes it much easier to load drivers as modules without having to
>> worry about functional dependencies between modules (depmod is still
>> needed for symbol dependencies).
>>
>> Greg/Rafael,
>>
>> Can we get this pulled into 5.11-rc1 or -rc2 soon please? I expect to
>> see some issues due to device drivers that aren't following best
>> practices (they don't expose the device to driver core). Want to
>> identify those early on and try to have them fixed before 5.11 release.
>> See [1] for an example of such a case.
>
> Now queued up in my tree, will show up in linux-next in a few days,
> let's see what breaks! :)
>
> And it is scheduled for 5.12-rc1, not 5.11, sorry.

For the record, this breaks my rk3399 board (NanoPC-T4), as no mass
storage can be discovered (it lives on PCIe):

(initramfs) find /sys -name 'waiting_for_supplier'| xargs grep .| egrep -v ':0$'
/sys/devices/platform/ff3d0000.i2c/i2c-4/4-0022/waiting_for_supplier:1
/sys/devices/platform/f8000000.pcie/waiting_for_supplier:1
/sys/devices/platform/fe320000.mmc/waiting_for_supplier:1
/sys/devices/platform/sdio-pwrseq/waiting_for_supplier:1
/sys/devices/platform/ff3c0000.i2c/i2c-0/0-001b/waiting_for_supplier:1

Enabling the debug prints in device_links_check_suppliers(), I end up
with the dump below (apologies for the size).

This seems to all hang on the GPIO banks, but it is pretty unclear what
is wrong with them.

Happy to test things further.

M.

platform vcc3v3-sys: probe deferral - supplier vcc12v0-sys not ready
platform vcc5v0-sys: probe deferral - supplier vcc12v0-sys not ready
platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
platform vcc3v0-sd: probe deferral - supplier vcc3v3-sys not ready
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vbus-typec: probe deferral - supplier vcc5v0-sys not ready
platform vcc5v0-host0: probe deferral - supplier vcc5v0-sys not ready
platform f8000000.pcie: probe deferral - wait for supplier gpio2@ff780000
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform f8000000.pcie: probe deferral - wait for supplier gpio2@ff780000
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform f8000000.pcie: probe deferral - wait for supplier gpio2@ff780000
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform f8000000.pcie: probe deferral - wait for supplier gpio2@ff780000
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform f8000000.pcie: probe deferral - wait for supplier
gpio2@ff780000
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform f8000000.pcie: probe deferral - wait for supplier
gpio2@ff780000
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform fe320000.mmc: probe deferral - wait for supplier pmic@1b
platform f8000000.pcie: probe deferral - wait for supplier
gpio2@ff780000
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform fe320000.mmc: probe deferral - wait for supplier pmic@1b
platform f8000000.pcie: probe deferral - wait for supplier
gpio2@ff780000
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform f8000000.pcie: probe deferral - wait for supplier
gpio2@ff780000
platform fe320000.mmc: probe deferral - wait for supplier
gpio0@ff720000
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform f8000000.pcie: probe deferral - wait for supplier
gpio2@ff780000
platform fe320000.mmc: probe deferral - wait for supplier
gpio0@ff720000
platform fe300000.ethernet: probe deferral - supplier 0-001b not ready
platform fe380000.usb: probe deferral - supplier
ff770000.syscon:usb2-phy@e450 not ready
platform fe3c0000.usb: probe deferral - supplier
ff770000.syscon:usb2-phy@e460 not ready
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform f8000000.pcie: probe deferral - wait for supplier
gpio2@ff780000
platform fe3a0000.usb: probe deferral - supplier
ff770000.syscon:usb2-phy@e450 not ready
platform fe3e0000.usb: probe deferral - supplier
ff770000.syscon:usb2-phy@e460 not ready
platform fe320000.mmc: probe deferral - wait for supplier
gpio0@ff720000
platform fe300000.ethernet: probe deferral - supplier 0-001b not ready
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform f8000000.pcie: probe deferral - wait for supplier
gpio2@ff780000
i2c 0-001b: probe deferral - wait for supplier gpio1@ff730000
platform fe320000.mmc: probe deferral - wait for supplier
gpio0@ff720000
platform fe300000.ethernet: probe deferral - supplier 0-001b not ready
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform f8000000.pcie: probe deferral - wait for supplier
gpio2@ff780000
i2c 0-001b: probe deferral - wait for supplier gpio1@ff730000
platform fe320000.mmc: probe deferral - wait for supplier
gpio0@ff720000
platform fe300000.ethernet: probe deferral - supplier 0-001b not ready
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform f8000000.pcie: probe deferral - wait for supplier
gpio2@ff780000
i2c 0-001b: probe deferral - wait for supplier gpio1@ff730000
platform fe320000.mmc: probe deferral - wait for supplier
gpio0@ff720000
platform fe300000.ethernet: probe deferral - supplier 0-001b not ready
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform f8000000.pcie: probe deferral - wait for supplier
gpio2@ff780000
i2c 0-001b: probe deferral - wait for supplier gpio1@ff730000
platform fe320000.mmc: probe deferral - wait for supplier
gpio0@ff720000
platform fe300000.ethernet: probe deferral - supplier 0-001b not ready
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
platform f8000000.pcie: probe deferral - wait for supplier
gpio2@ff780000
i2c 0-001b: probe deferral - wait for supplier gpio1@ff730000
platform fe320000.mmc: probe deferral - wait for supplier
gpio0@ff720000
platform fe300000.ethernet: probe deferral - supplier 0-001b not ready
platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000

--
Jazz is not dead. It just smells funny...

2021-01-13 11:32:16

by Jon Hunter

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default


On 18/12/2020 03:16, Saravana Kannan wrote:
> As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
> be broken using logic was one of the last remaining reasons
> fw_devlink=on couldn't be set by default.
>
> This series changes fw_devlink so that when a cyclic dependency is found
> in firmware, the links between those devices fallback to permissive mode
> behavior. This way, the rest of the system still benefits from
> fw_devlink, but the ambiguous cases fallback to permissive mode.
>
> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> only for systems with device tree firmware):
> * Significantly cuts down deferred probes.
> * Device probe is effectively attempted in graph order.
> * Makes it much easier to load drivers as modules without having to
> worry about functional dependencies between modules (depmod is still
> needed for symbol dependencies).


One issue we have come across with this is the of_mdio.c driver. On
Tegra194 Jetson Xavier I am seeing the following ...

boot: logs: [ 4.194791] WARNING KERN WARNING: CPU: 0 PID: 1 at /dvs/git/dirty/git-master_l4t-upstream/kernel/drivers/base/core.c:1189 device_links_driver_bound+0x240/0x260
boot: logs: [ 4.207683] WARNING KERN Modules linked in:
boot: logs: [ 4.210691] WARNING KERN CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc3-next-20210112-gdf869cab4b35 #1
boot: logs: [ 4.219221] WARNING KERN Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
boot: logs: [ 4.225628] WARNING KERN pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
boot: logs: [ 4.231542] WARNING KERN pc : device_links_driver_bound+0x240/0x260
boot: logs: [ 4.236587] WARNING KERN lr : device_links_driver_bound+0xf8/0x260
boot: logs: [ 4.241560] WARNING KERN sp : ffff800011f4b980
boot: logs: [ 4.244819] WARNING KERN x29: ffff800011f4b980 x28: ffff00008208a0a0
boot: logs: [ 4.250051] WARNING KERN x27: ffff00008208a080 x26: 00000000ffffffff
boot: logs: [ 4.255271] WARNING KERN x25: 0000000000000003 x24: ffff800011b99000
boot: logs: [ 4.260489] WARNING KERN x23: 0000000000000001 x22: ffff800011df14f0
boot: logs: [ 4.265706] WARNING KERN x21: ffff800011f4b9f8 x20: ffff800011df1000
boot: logs: [ 4.270934] WARNING KERN x19: ffff00008208a000 x18: 0000000000000005
boot: logs: [ 4.276166] WARNING KERN x17: 0000000000000007 x16: 0000000000000001
boot: logs: [ 4.281382] WARNING KERN x15: ffff000080030c90 x14: ffff0000805c9df8
boot: logs: [ 4.286618] WARNING KERN x13: 0000000000000000 x12: ffff000080030c90
boot: logs: [ 4.291847] WARNING KERN x11: ffff0000805c9da8 x10: 0000000000000040
boot: logs: [ 4.297061] WARNING KERN x9 : ffff000080030c98 x8 : 0000000000000000
boot: logs: [ 4.302291] WARNING KERN x7 : 0000000000000009 x6 : 0000000000000000
boot: logs: [ 4.307509] WARNING KERN x5 : ffff000080100000 x4 : 0000000000000000
boot: logs: [ 4.312739] WARNING KERN x3 : ffff800011df1e38 x2 : ffff000080908c10
boot: logs: [ 4.317956] WARNING KERN x1 : 0000000000000001 x0 : ffff0000809ca400
boot: logs: [ 4.323183] WARNING KERN Call trace:
boot: logs: [ 4.325593] WARNING KERN device_links_driver_bound+0x240/0x260
boot: logs: [ 4.330301] WARNING KERN driver_bound+0x70/0xd0
boot: logs: [ 4.333740] WARNING KERN device_bind_driver+0x50/0x60
boot: logs: [ 4.337671] WARNING KERN phy_attach_direct+0x258/0x2e0
boot: logs: [ 4.341718] WARNING KERN phylink_of_phy_connect+0x7c/0x140
boot: logs: [ 4.346081] WARNING KERN stmmac_open+0xb04/0xc70
boot: logs: [ 4.349612] WARNING KERN __dev_open+0xe0/0x190
boot: logs: [ 4.352972] WARNING KERN __dev_change_flags+0x16c/0x1b8
boot: logs: [ 4.357081] WARNING KERN dev_change_flags+0x20/0x60
boot: logs: [ 4.360856] WARNING KERN ip_auto_config+0x2a0/0xfe8
boot: logs: [ 4.364633] WARNING KERN do_one_initcall+0x58/0x1b8
boot: logs: [ 4.368405] WARNING KERN kernel_init_freeable+0x1ec/0x240
boot: logs: [ 4.372698] WARNING KERN kernel_init+0x10/0x110
boot: logs: [ 4.376130] WARNING KERN ret_from_fork+0x10/0x18


So, looking at this change, does this mean that of_mdio needs to be
converted to a proper driver? I would have thought that this would be
seen on several platforms.

Cheers
Jon

--
nvpublic

2021-01-13 11:46:35

by Nicolas Saenz Julienne

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, 2020-12-17 at 19:16 -0800, Saravana Kannan wrote:
> As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
> be broken using logic was one of the last remaining reasons
> fw_devlink=on couldn't be set by default.
>
> This series changes fw_devlink so that when a cyclic dependency is found
> in firmware, the links between those devices fallback to permissive mode
> behavior. This way, the rest of the system still benefits from
> fw_devlink, but the ambiguous cases fallback to permissive mode.
>
> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> only for systems with device tree firmware):
> * Significantly cuts down deferred probes.
> * Device probe is effectively attempted in graph order.
> * Makes it much easier to load drivers as modules without having to
>   worry about functional dependencies between modules (depmod is still
>   needed for symbol dependencies).

FWIW I don't see any issues with this on Raspberry Pi 4 :).

Regards,
Nicolas



2021-01-13 11:51:21

by Marc Zyngier

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On 2021-01-13 11:44, Nicolas Saenz Julienne wrote:
> On Thu, 2020-12-17 at 19:16 -0800, Saravana Kannan wrote:
>> As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
>> be broken using logic was one of the last remaining reasons
>> fw_devlink=on couldn't be set by default.
>>
>> This series changes fw_devlink so that when a cyclic dependency is found
>> in firmware, the links between those devices fallback to permissive mode
>> behavior. This way, the rest of the system still benefits from
>> fw_devlink, but the ambiguous cases fallback to permissive mode.
>>
>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
>> only for systems with device tree firmware):
>> * Significantly cuts down deferred probes.
>> * Device probe is effectively attempted in graph order.
>> * Makes it much easier to load drivers as modules without having to
>>   worry about functional dependencies between modules (depmod is still
>>   needed for symbol dependencies).
>
> FWIW I don't see any issues with this on Raspberry Pi 4 :).

Keep bragging! ;-)

M.
--
Jazz is not dead. It just smells funny...

2021-01-13 15:29:26

by Jon Hunter

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default


On 13/01/2021 11:11, Marc Zyngier wrote:
> On 2021-01-07 20:05, Greg Kroah-Hartman wrote:
>> On Thu, Dec 17, 2020 at 07:16:58PM -0800, Saravana Kannan wrote:
>>> As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
>>> be broken using logic was one of the last remaining reasons
>>> fw_devlink=on couldn't be set by default.
>>>
>>> This series changes fw_devlink so that when a cyclic dependency is found
>>> in firmware, the links between those devices fallback to permissive mode
>>> behavior. This way, the rest of the system still benefits from
>>> fw_devlink, but the ambiguous cases fallback to permissive mode.
>>>
>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
>>> only for systems with device tree firmware):
>>> * Significantly cuts down deferred probes.
>>> * Device probe is effectively attempted in graph order.
>>> * Makes it much easier to load drivers as modules without having to
>>>   worry about functional dependencies between modules (depmod is still
>>>   needed for symbol dependencies).
>>>
>>> Greg/Rafael,
>>>
>>> Can we get this pulled into 5.11-rc1 or -rc2 soon please? I expect to
>>> see some issues due to device drivers that aren't following best
>>> practices (they don't expose the device to driver core). Want to
>>> identify those early on and try to have them fixed before 5.11 release.
>>> See [1] for an example of such a case.
>>
>> Now queued up in my tree, will show up in linux-next in a few days,
>> let's see what breaks!  :)
>>
>> And it is scheduled for 5.12-rc1, not 5.11, sorry.
>
> For the record, this breaks my rk3399 board, (NanoPC-T4) as no mass
> storage can be discovered (it lives on PCIe):
>
> (initramfs) find /sys -name 'waiting_for_supplier'| xargs grep .| egrep -v ':0$'
> /sys/devices/platform/ff3d0000.i2c/i2c-4/4-0022/waiting_for_supplier:1
> /sys/devices/platform/f8000000.pcie/waiting_for_supplier:1
> /sys/devices/platform/fe320000.mmc/waiting_for_supplier:1
> /sys/devices/platform/sdio-pwrseq/waiting_for_supplier:1
> /sys/devices/platform/ff3c0000.i2c/i2c-0/0-001b/waiting_for_supplier:1
>
> Enabling the debug prints in device_links_check_suppliers(), I end up with
> the dump below (apologies for the size).


I am seeing the same problem on Tegra30 Cardhu A04, where several regulators
are continuously deferred, which prevents the board from booting ...

[ 2.518334] platform panel: probe deferral - supplier regulator@11 not ready
[ 2.525503] platform regulator@1: probe deferral - supplier 4-002d not ready
[ 2.533141] platform regulator@3: probe deferral - supplier regulator@101 not ready
[ 2.540856] platform regulator@5: probe deferral - supplier regulator@101 not ready
[ 2.548589] platform regulator@6: probe deferral - supplier regulator@101 not ready
[ 2.556316] platform regulator@7: probe deferral - supplier regulator@101 not ready
[ 2.564041] platform regulator@8: probe deferral - supplier regulator@101 not ready
[ 2.571743] platform regulator@9: probe deferral - supplier regulator@101 not ready
[ 2.579463] platform regulator@10: probe deferral - supplier regulator@101 not ready
[ 2.587273] platform regulator@11: probe deferral - supplier regulator@101 not ready
[ 2.595088] platform regulator@12: probe deferral - supplier regulator@104 not ready
[ 2.603837] platform regulator@102: probe deferral - supplier regulator@104 not ready
[ 2.611726] platform regulator@103: probe deferral - supplier regulator@104 not ready
[ 2.620137] platform 3000.pcie: probe deferral - supplier regulator@5 not ready


Cheers
Jon

--
nvpublic

2021-01-13 19:25:51

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Tue, Jan 12, 2021 at 11:04 PM Marek Szyprowski
<[email protected]> wrote:
>
> Hi Saravana,
>
> On 12.01.2021 21:51, Saravana Kannan wrote:
> > On Mon, Jan 11, 2021 at 11:11 PM Marek Szyprowski
> > <[email protected]> wrote:
> >> On 11.01.2021 22:47, Saravana Kannan wrote:
> >>> On Mon, Jan 11, 2021 at 6:18 AM Marek Szyprowski
> >>> <[email protected]> wrote:
> >>>> On 11.01.2021 12:12, Marek Szyprowski wrote:
> >>>>> On 18.12.2020 04:17, Saravana Kannan wrote:
> >>>>>> Cyclic dependencies in some firmware was one of the last remaining
> >>>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> >>>>>> dependencies don't block probing, set fw_devlink=on by default.
> >>>>>>
> >>>>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> >>>>>> only for systems with device tree firmware):
> >>>>>> * Significantly cuts down deferred probes.
> >>>>>> * Device probe is effectively attempted in graph order.
> >>>>>> * Makes it much easier to load drivers as modules without having to
> >>>>>> worry about functional dependencies between modules (depmod is still
> >>>>>> needed for symbol dependencies).
> >>>>>>
> >>>>>> If this patch prevents some devices from probing, it's very likely due
> >>>>>> to the system having one or more device drivers that "probe"/set up a
> >>>>>> device (DT node with compatible property) without creating a struct
> >>>>>> device for it. If we hit such cases, the device drivers need to be
> >>>>>> fixed so that they populate struct devices and probe them like normal
> >>>>>> device drivers so that the driver core is aware of the devices and their
> >>>>>> status. See [1] for an example of such a case.
> >>>>>>
> >>>>>> [1] -
> >>>>>> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> >>>>>> Signed-off-by: Saravana Kannan <[email protected]>
> >>>>> This patch landed recently in linux next-20210111 as commit
> >>>>> e590474768f1 ("driver core: Set fw_devlink=on by default"). Sadly it
> >>>>> breaks Exynos IOMMU operation, what causes lots of devices being
> >>>>> deferred and not probed at all. I've briefly checked and noticed that
> >>>>> exynos_sysmmu_probe() is never called after this patch. This is really
> >>>>> strange for me, as the SYSMMU controllers on Exynos platform are
> >>>>> regular platform devices registered by the OF code. The driver code is
> >>>>> here: drivers/iommu/exynos-iommu.c, example dts:
> >>>>> arch/arm/boot/dts/exynos3250.dtsi (compatible = "samsung,exynos-sysmmu").
> >>>> Okay, I found the source of this problem. It is caused by Exynos power
> >>>> domain driver, which is not platform driver yet. I will post a patch,
> >>>> which converts it to the platform driver.
> >>> Thanks Marek! Hopefully the debug logs I added were sufficient to
> >>> figure out the reason.
> >> Frankly, it took me a while to figure out that device core waits for the
> >> power domain devices. Maybe it would be possible to add some more debug
> >> messages or hints? Like the reason of the deferred probe in
> >> /sys/kernel/debug/devices_deferred ?
> > There's already a /sys/devices/.../<device>/waiting_for_supplier file
> > that tells you if the device is waiting for a supplier device to be
> > added. That file goes away once the device probes. If the file has 1,
> > then it's waiting for the supplier device to be added (like your
> > case). If it's 0, then the device is just waiting on one of the
> > existing suppliers to probe. You can find the existing suppliers
> > through /sys/devices/.../<device>/supplier:*/supplier. Also, flip
> > these dev_dbg() to dev_info() if you need more details about deferred
> > probing.
>
> Frankly speaking I doubt that anyone will find those. Even experienced
> developer might need some time to figure it out.
>
> I expect that such information will be at least in the mentioned
> /sys/kernel/debug/devices_deferred file. We already have infrastructure
> for putting the deferred probe reason there, see dev_err_probe()
> function. Even such a simple change makes the debugging this issue much
> easier:
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index cd8e518fadd6..ceb5aed5a84c 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -937,12 +937,13 @@ int device_links_check_suppliers(struct device *dev)
> mutex_lock(&fwnode_link_lock);
> if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
> !fw_devlink_is_permissive()) {
> - dev_dbg(dev, "probe deferral - wait for supplier %pfwP\n",
> + ret = dev_err_probe(dev, -EPROBE_DEFER,
> + "probe deferral - wait for supplier %pfwP\n",
> list_first_entry(&dev->fwnode->suppliers,
> struct fwnode_link,
> c_hook)->supplier);
> mutex_unlock(&fwnode_link_lock);
> - return -EPROBE_DEFER;
> + return ret;
> }
> mutex_unlock(&fwnode_link_lock);
>
> @@ -955,9 +956,9 @@ int device_links_check_suppliers(struct device *dev)
> if (link->status != DL_STATE_AVAILABLE &&
> !(link->flags & DL_FLAG_SYNC_STATE_ONLY)) {
> device_links_missing_supplier(dev);
> - dev_dbg(dev, "probe deferral - supplier %s not ready\n",
> + ret = dev_err_probe(dev, -EPROBE_DEFER,
> + "probe deferral - supplier %s not ready\n",
> dev_name(link->supplier));
> - ret = -EPROBE_DEFER;
> break;
> }
> WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE);
>
>
> After such change:
>
> # cat /sys/kernel/debug/devices_deferred

Sweet! I wasn't aware of this file at all.

However, on a side note, one of my TODO items is to not add devices to
the deferred probe list if they can't probe yet (because their suppliers
haven't probed). On a board I tested on, it cut down really_probe()
calls by 75%! So the probe attempt itself effectively happens in graph
order (which I think is pretty cool). So that's going to conflict with
this file. I'll have to see what to do about that.
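
As a toy illustration of the TODO above, here is a small userspace model (not kernel code, and not taken from this series) that counts probe attempts with and without skipping devices whose suppliers have not probed. The names `model_dev`, `suppliers_ready()` and `count_probe_attempts()` are made up for this sketch, and its counts say nothing about the 75% figure, which was measured on real hardware.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUPPLIERS 4

/* Hypothetical stand-in for a device and its supplier links. */
struct model_dev {
	int suppliers[MAX_SUPPLIERS];	/* indices into the device array */
	size_t nr_suppliers;
	bool probed;
};

/* True once every supplier of @d has probed. */
static bool suppliers_ready(const struct model_dev *devs,
			    const struct model_dev *d)
{
	for (size_t i = 0; i < d->nr_suppliers; i++)
		if (!devs[d->suppliers[i]].probed)
			return false;
	return true;
}

/*
 * Retry unprobed devices in array order until a full pass makes no
 * progress, and return the number of probe attempts made.
 *
 * With @skip_unready false, every unprobed device is attempted on each
 * pass and the attempt fails (think -EPROBE_DEFER) if a supplier is
 * missing. With @skip_unready true, such devices are not attempted.
 */
static unsigned int count_probe_attempts(struct model_dev *devs, size_t n,
					 bool skip_unready)
{
	unsigned int attempts = 0;
	bool progress = true;

	for (size_t i = 0; i < n; i++)
		devs[i].probed = false;

	while (progress) {
		progress = false;
		for (size_t i = 0; i < n; i++) {
			bool ready;

			if (devs[i].probed)
				continue;

			ready = suppliers_ready(devs, &devs[i]);
			if (skip_unready && !ready)
				continue;

			attempts++;
			if (ready) {
				devs[i].probed = true;
				progress = true;
			}
		}
	}

	return attempts;
}
```

For a three-device chain listed in worst-case order (each device before its supplier), this model makes 6 attempts when every device is retried on each pass and 3 when unready devices are skipped.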

Thanks for this pointer. Let me sit on this for 2 weeks and see how I
can incorporate your suggestion while allowing for the above. And then
I'll send out a patch. Does that work?

-Saravana

> sound
> 13620000.sysmmu platform: probe deferral - supplier
> 10023c40.power-domain not ready
> 13630000.sysmmu platform: probe deferral - supplier
> 10023c40.power-domain not ready
> 12e20000.sysmmu platform: probe deferral - supplier
> 10023c20.power-domain not ready
> 11a20000.sysmmu platform: probe deferral - supplier
> 10023c00.power-domain not ready
> 11a30000.sysmmu platform: probe deferral - supplier
> 10023c00.power-domain not ready
> 11a40000.sysmmu platform: probe deferral - supplier
> 10023c00.power-domain not ready
> 11a50000.sysmmu platform: probe deferral - supplier
> 10023c00.power-domain not ready
> 11a60000.sysmmu platform: probe deferral - supplier
> 10023c00.power-domain not ready
> 11e20000.sysmmu platform: probe deferral - supplier
> 10023c80.power-domain not ready
> 12d00000.hdmi platform: probe deferral - supplier
> 10023c20.power-domain not ready
> 10048000.clock-controller platform: probe deferral - supplier
> 10023ca0.power-domain not ready
> 12260000.sysmmu platform: probe deferral - supplier
> 10048000.clock-controller not ready
> 12270000.sysmmu platform: probe deferral - supplier
> 10048000.clock-controller not ready
> 122a0000.sysmmu platform: probe deferral - supplier
> 10048000.clock-controller not ready
> 122b0000.sysmmu platform: probe deferral - supplier
> 10048000.clock-controller not ready
> 123b0000.sysmmu platform: probe deferral - supplier
> 10048000.clock-controller not ready
> 123c0000.sysmmu platform: probe deferral - supplier
> 10048000.clock-controller not ready
> 12c10000.mixer platform: probe deferral - supplier
> 10023c20.power-domain not ready
> 13000000.gpu platform: probe deferral - supplier
> 10023c60.power-domain not ready
>
> Probably the message can be adjusted a bit, this would significantly
> help me finding that is the source of the problem.
>
> Best regards
>
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>

2021-01-13 21:04:59

by Saravana Kannan

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Wed, Jan 13, 2021 at 3:11 AM Marc Zyngier <[email protected]> wrote:
>
> On 2021-01-07 20:05, Greg Kroah-Hartman wrote:
> > On Thu, Dec 17, 2020 at 07:16:58PM -0800, Saravana Kannan wrote:
> >> As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
> >> be broken using logic was one of the last remaining reasons
> >> fw_devlink=on couldn't be set by default.
> >>
> >> This series changes fw_devlink so that when a cyclic dependency is found
> >> in firmware, the links between those devices fallback to permissive mode
> >> behavior. This way, the rest of the system still benefits from
> >> fw_devlink, but the ambiguous cases fallback to permissive mode.
> >>
> >> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> >> only for systems with device tree firmware):
> >> * Significantly cuts down deferred probes.
> >> * Device probe is effectively attempted in graph order.
> >> * Makes it much easier to load drivers as modules without having to
> >> worry about functional dependencies between modules (depmod is still
> >> needed for symbol dependencies).
> >>
> >> Greg/Rafael,
> >>
> >> Can we get this pulled into 5.11-rc1 or -rc2 soon please? I expect to
> >> see some issues due to device drivers that aren't following best
> >> practices (they don't expose the device to driver core). Want to
> >> identify those early on and try to have them fixed before 5.11 release.
> >> See [1] for an example of such a case.
> >
> > Now queued up in my tree, will show up in linux-next in a few days,
> > let's see what breaks! :)
> >
> > And it is scheduled for 5.12-rc1, not 5.11, sorry.
>
> For the record, this breaks my rk3399 board, (NanoPC-T4) as no mass
> storage can be discovered (it lives on PCIe):
>
> (initramfs) find /sys -name 'waiting_for_supplier'| xargs grep .| egrep -v ':0$'
> /sys/devices/platform/ff3d0000.i2c/i2c-4/4-0022/waiting_for_supplier:1
> /sys/devices/platform/f8000000.pcie/waiting_for_supplier:1
> /sys/devices/platform/fe320000.mmc/waiting_for_supplier:1
> /sys/devices/platform/sdio-pwrseq/waiting_for_supplier:1
> /sys/devices/platform/ff3c0000.i2c/i2c-0/0-001b/waiting_for_supplier:1
>
> Enabling the debug prints in device_links_check_suppliers(), I end up with
> the dump below (apologies for the size).
>
> This seems to all hang on the GPIO banks, but it is pretty unclear what
> is wrong with them.
>
> Happy to test things further.

Thanks for the logs, Marc. Looks like most, if not all, of the issue is
due to gpio device nodes [1] being "probed" without a proper struct
device being created for them.

You can see here [2] how the driver for the parent device just loops
through the child DT nodes and initializes them. This would be okay if
the DT nodes for the gpio devices [1] didn't have a "compatible"
property. So to fix this, the driver [2] needs to be updated to properly
populate the child devices and then probe them using a proper driver
for "rockchip,gpio-bank". Most of the gpio init code would then move
into this new driver.

The DT implementation of fw_devlink has the expectation that device
tree nodes with "compatible" properties will have struct devices
created for them. Without this expectation, fw_devlink has no way to
know how far up the ancestor chain it needs to walk before it reaches a
node whose supplier device it can create a device link to.
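
The expectation described above can be sketched with a simplified userspace model. This is not the kernel's implementation: `struct dt_node`, `nearest_compatible()` and `supplier_device_missing()` are hypothetical names used only to show why a node that has "compatible" but never gets a struct device leaves its consumers waiting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a device tree node. */
struct dt_node {
	const char *name;
	const struct dt_node *parent;
	bool has_compatible;	/* node carries a "compatible" property */
	bool has_device;	/* a struct device was created for it */
};

/*
 * Starting at the node a consumer's phandle points to, walk up the
 * ancestor chain to the nearest node (the start node included) that
 * has a "compatible" property. That node is assumed to be the one a
 * struct device will be created for.
 */
static const struct dt_node *nearest_compatible(const struct dt_node *np)
{
	while (np && !np->has_compatible)
		np = np->parent;
	return np;
}

/*
 * True if the walk stops at a node that has no struct device, i.e.
 * the consumer would keep waiting for a supplier device to be added.
 */
static bool supplier_device_missing(const struct dt_node *target)
{
	const struct dt_node *sup = nearest_compatible(target);

	return sup && !sup->has_device;
}
```

In this model a gpio bank node with "compatible" but no device makes `supplier_device_missing()` return true, while the same node without "compatible" resolves to its pinctrl parent, which does have a device.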

Heiko,

Could you please refactor drivers/pinctrl/pinctrl-rockchip.c to create
and probe struct devices for "rockchip,gpio-bank" nodes? This allows
fw_devlink to work for these devices and makes sure devices probe in
the right order, suspend/resume in the right order, etc.

-Saravana

[1] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3399.dtsi#n1956
[2] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pinctrl/pinctrl-rockchip.c#n3566

>
> M.
>
> platform vcc3v3-sys: probe deferral - supplier vcc12v0-sys not ready
> platform vcc5v0-sys: probe deferral - supplier vcc12v0-sys not ready
> platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
> platform vcc3v0-sd: probe deferral - supplier vcc3v3-sys not ready
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vbus-typec: probe deferral - supplier vcc5v0-sys not ready
> platform vcc5v0-host0: probe deferral - supplier vcc5v0-sys not ready
> platform f8000000.pcie: probe deferral - wait for supplier
> gpio2@ff780000
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform f8000000.pcie: probe deferral - wait for supplier
> gpio2@ff780000
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform f8000000.pcie: probe deferral - wait for supplier
> gpio2@ff780000
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform f8000000.pcie: probe deferral - wait for supplier
> gpio2@ff780000
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform f8000000.pcie: probe deferral - wait for supplier
> gpio2@ff780000
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform f8000000.pcie: probe deferral - wait for supplier
> gpio2@ff780000
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform fe320000.mmc: probe deferral - wait for supplier pmic@1b
> platform f8000000.pcie: probe deferral - wait for supplier
> gpio2@ff780000
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - wait for supplier pmic@1b
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform fe320000.mmc: probe deferral - wait for supplier pmic@1b
> platform f8000000.pcie: probe deferral - wait for supplier
> gpio2@ff780000
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform f8000000.pcie: probe deferral - wait for supplier gpio2@ff780000
> platform fe320000.mmc: probe deferral - wait for supplier gpio0@ff720000
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform f8000000.pcie: probe deferral - wait for supplier gpio2@ff780000
> platform fe320000.mmc: probe deferral - wait for supplier gpio0@ff720000
> platform fe300000.ethernet: probe deferral - supplier 0-001b not ready
> platform fe380000.usb: probe deferral - supplier ff770000.syscon:usb2-phy@e450 not ready
> platform fe3c0000.usb: probe deferral - supplier ff770000.syscon:usb2-phy@e460 not ready
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform f8000000.pcie: probe deferral - wait for supplier gpio2@ff780000
> platform fe3a0000.usb: probe deferral - supplier ff770000.syscon:usb2-phy@e450 not ready
> platform fe3e0000.usb: probe deferral - supplier ff770000.syscon:usb2-phy@e460 not ready
> platform fe320000.mmc: probe deferral - wait for supplier gpio0@ff720000
> platform fe300000.ethernet: probe deferral - supplier 0-001b not ready
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform f8000000.pcie: probe deferral - wait for supplier gpio2@ff780000
> i2c 0-001b: probe deferral - wait for supplier gpio1@ff730000
> platform fe320000.mmc: probe deferral - wait for supplier gpio0@ff720000
> platform fe300000.ethernet: probe deferral - supplier 0-001b not ready
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform f8000000.pcie: probe deferral - wait for supplier gpio2@ff780000
> i2c 0-001b: probe deferral - wait for supplier gpio1@ff730000
> platform fe320000.mmc: probe deferral - wait for supplier gpio0@ff720000
> platform fe300000.ethernet: probe deferral - supplier 0-001b not ready
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform f8000000.pcie: probe deferral - wait for supplier gpio2@ff780000
> i2c 0-001b: probe deferral - wait for supplier gpio1@ff730000
> platform fe320000.mmc: probe deferral - wait for supplier gpio0@ff720000
> platform fe300000.ethernet: probe deferral - supplier 0-001b not ready
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform f8000000.pcie: probe deferral - wait for supplier gpio2@ff780000
> i2c 0-001b: probe deferral - wait for supplier gpio1@ff730000
> platform fe320000.mmc: probe deferral - wait for supplier gpio0@ff720000
> platform fe300000.ethernet: probe deferral - supplier 0-001b not ready
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
> platform vcc1v8-s3: probe deferral - supplier 0-001b not ready
> platform vcca0v9-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform vcca1v8-s3: probe deferral - supplier vcc1v8-s3 not ready
> platform f8000000.pcie: probe deferral - wait for supplier gpio2@ff780000
> i2c 0-001b: probe deferral - wait for supplier gpio1@ff730000
> platform fe320000.mmc: probe deferral - wait for supplier gpio0@ff720000
> platform fe300000.ethernet: probe deferral - supplier 0-001b not ready
> platform sdio-pwrseq: probe deferral - wait for supplier gpio0@ff720000
>
> --
> Jazz is not dead. It just smells funny...

2021-01-13 21:39:56

by Saravana Kannan

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Wed, Jan 13, 2021 at 3:30 AM Jon Hunter <[email protected]> wrote:
>
>
> On 18/12/2020 03:16, Saravana Kannan wrote:
> > As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
> > be broken using logic was one of the last remaining reasons
> > fw_devlink=on couldn't be set by default.
> >
> > This series changes fw_devlink so that when a cyclic dependency is found
> > in firmware, the links between those devices fallback to permissive mode
> > behavior. This way, the rest of the system still benefits from
> > fw_devlink, but the ambiguous cases fallback to permissive mode.
> >
> > Setting fw_devlink=on by default brings a bunch of benefits (currently,
> > only for systems with device tree firmware):
> > * Significantly cuts down deferred probes.
> > * Device probe is effectively attempted in graph order.
> > * Makes it much easier to load drivers as modules without having to
> > worry about functional dependencies between modules (depmod is still
> > needed for symbol dependencies).
>
>
> One issue we have come across with this is the of_mdio.c driver. On
> Tegra194 Jetson Xavier I am seeing the following ...
>
> boot: logs: [ 4.194791] WARNING KERN WARNING: CPU: 0 PID: 1 at /dvs/git/dirty/git-master_l4t-upstream/kernel/drivers/base/core.c:1189 device_links_driver_bound+0x240/0x260
> boot: logs: [ 4.207683] WARNING KERN Modules linked in:
> boot: logs: [ 4.210691] WARNING KERN CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc3-next-20210112-gdf869cab4b35 #1
> boot: logs: [ 4.219221] WARNING KERN Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
> boot: logs: [ 4.225628] WARNING KERN pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
> boot: logs: [ 4.231542] WARNING KERN pc : device_links_driver_bound+0x240/0x260
> boot: logs: [ 4.236587] WARNING KERN lr : device_links_driver_bound+0xf8/0x260
> boot: logs: [ 4.241560] WARNING KERN sp : ffff800011f4b980
> boot: logs: [ 4.244819] WARNING KERN x29: ffff800011f4b980 x28: ffff00008208a0a0
> boot: logs: [ 4.250051] WARNING KERN x27: ffff00008208a080 x26: 00000000ffffffff
> boot: logs: [ 4.255271] WARNING KERN x25: 0000000000000003 x24: ffff800011b99000
> boot: logs: [ 4.260489] WARNING KERN x23: 0000000000000001 x22: ffff800011df14f0
> boot: logs: [ 4.265706] WARNING KERN x21: ffff800011f4b9f8 x20: ffff800011df1000
> boot: logs: [ 4.270934] WARNING KERN x19: ffff00008208a000 x18: 0000000000000005
> boot: logs: [ 4.276166] WARNING KERN x17: 0000000000000007 x16: 0000000000000001
> boot: logs: [ 4.281382] WARNING KERN x15: ffff000080030c90 x14: ffff0000805c9df8
> boot: logs: [ 4.286618] WARNING KERN x13: 0000000000000000 x12: ffff000080030c90
> boot: logs: [ 4.291847] WARNING KERN x11: ffff0000805c9da8 x10: 0000000000000040
> boot: logs: [ 4.297061] WARNING KERN x9 : ffff000080030c98 x8 : 0000000000000000
> boot: logs: [ 4.302291] WARNING KERN x7 : 0000000000000009 x6 : 0000000000000000
> boot: logs: [ 4.307509] WARNING KERN x5 : ffff000080100000 x4 : 0000000000000000
> boot: logs: [ 4.312739] WARNING KERN x3 : ffff800011df1e38 x2 : ffff000080908c10
> boot: logs: [ 4.317956] WARNING KERN x1 : 0000000000000001 x0 : ffff0000809ca400
> boot: logs: [ 4.323183] WARNING KERN Call trace:
> boot: logs: [ 4.325593] WARNING KERN device_links_driver_bound+0x240/0x260
> boot: logs: [ 4.330301] WARNING KERN driver_bound+0x70/0xd0
> boot: logs: [ 4.333740] WARNING KERN device_bind_driver+0x50/0x60
> boot: logs: [ 4.337671] WARNING KERN phy_attach_direct+0x258/0x2e0
> boot: logs: [ 4.341718] WARNING KERN phylink_of_phy_connect+0x7c/0x140
> boot: logs: [ 4.346081] WARNING KERN stmmac_open+0xb04/0xc70
> boot: logs: [ 4.349612] WARNING KERN __dev_open+0xe0/0x190
> boot: logs: [ 4.352972] WARNING KERN __dev_change_flags+0x16c/0x1b8
> boot: logs: [ 4.357081] WARNING KERN dev_change_flags+0x20/0x60
> boot: logs: [ 4.360856] WARNING KERN ip_auto_config+0x2a0/0xfe8
> boot: logs: [ 4.364633] WARNING KERN do_one_initcall+0x58/0x1b8
> boot: logs: [ 4.368405] WARNING KERN kernel_init_freeable+0x1ec/0x240
> boot: logs: [ 4.372698] WARNING KERN kernel_init+0x10/0x110
> boot: logs: [ 4.376130] WARNING KERN ret_from_fork+0x10/0x18
>
>
> So looking at this change does this mean that the of_mdio needs to be
> converted to a proper driver?

Sorry, there's not enough context in this log for me to tell how this
is even related to of_mdio.c. My guess is this is related to the network
stack directly calling device_bind_driver() and not updating the device
link state correctly. See what device_links_check_suppliers() does in
the normal path. I think I know which warning this is, but can you
check your tree and tell me the code you see at
drivers/base/core.c:1189?

Also, can you give me a few more lines above and below this log and
also explain why you think this is related to of_mdio.c? Where is the
DT file for this board in case I need to look at it? And where is this
phy node defined in DT?

If there's an easy way to convert it to a proper driver, that's always
better than calling into driver core in a piecemeal fashion.

> I would have thought that this will be
> seen on several platforms.

I'm surprised you are seeing this issue only now. I'd have expected it
to have happened even without this series.

-Saravana

2021-01-13 21:44:58

by Saravana Kannan

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Wed, Jan 13, 2021 at 7:27 AM Jon Hunter <[email protected]> wrote:
>
>
> On 13/01/2021 11:11, Marc Zyngier wrote:
> > On 2021-01-07 20:05, Greg Kroah-Hartman wrote:
> >> On Thu, Dec 17, 2020 at 07:16:58PM -0800, Saravana Kannan wrote:
> >>> As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
> >>> be broken using logic was one of the last remaining reasons
> >>> fw_devlink=on couldn't be set by default.
> >>>
> >>> This series changes fw_devlink so that when a cyclic dependency is found
> >>> in firmware, the links between those devices fallback to permissive mode
> >>> behavior. This way, the rest of the system still benefits from
> >>> fw_devlink, but the ambiguous cases fallback to permissive mode.
> >>>
> >>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> >>> only for systems with device tree firmware):
> >>> * Significantly cuts down deferred probes.
> >>> * Device probe is effectively attempted in graph order.
> >>> * Makes it much easier to load drivers as modules without having to
> >>> worry about functional dependencies between modules (depmod is still
> >>> needed for symbol dependencies).
> >>>
> >>> Greg/Rafael,
> >>>
> >>> Can we get this pulled into 5.11-rc1 or -rc2 soon please? I expect to
> >>> see some issues due to device drivers that aren't following best
> >>> practices (they don't expose the device to driver core). Want to
> >>> identify those early on and try to have them fixed before 5.11 release.
> >>> See [1] for an example of such a case.
> >>
> >> Now queued up in my tree, will show up in linux-next in a few days,
> >> let's see what breaks! :)
> >>
> >> And it is scheduled for 5.12-rc1, not 5.11, sorry.
> >
> > For the record, this breaks my rk3399 board, (NanoPC-T4) as no mass
> > storage can be discovered (it lives on PCIe):
> >
> > (initramfs) find /sys -name 'waiting_for_supplier'| xargs grep .| egrep -v ':0$'
> > /sys/devices/platform/ff3d0000.i2c/i2c-4/4-0022/waiting_for_supplier:1
> > /sys/devices/platform/f8000000.pcie/waiting_for_supplier:1
> > /sys/devices/platform/fe320000.mmc/waiting_for_supplier:1
> > /sys/devices/platform/sdio-pwrseq/waiting_for_supplier:1
> > /sys/devices/platform/ff3c0000.i2c/i2c-0/0-001b/waiting_for_supplier:1
> >
> > Enabling the debug prints in device_links_check_suppliers(), I end up with
> > the dump below (apologies for the size).
>
>
> I am seeing the same problem on Tegra30 Cardhu A04 where several regulators
> are continuously deferred and prevents the board from booting ...
>
> [ 2.518334] platform panel: probe deferral - supplier regulator@11 not ready
> [ 2.525503] platform regulator@1: probe deferral - supplier 4-002d not ready
> [ 2.533141] platform regulator@3: probe deferral - supplier regulator@101 not ready
> [ 2.540856] platform regulator@5: probe deferral - supplier regulator@101 not ready
> [ 2.548589] platform regulator@6: probe deferral - supplier regulator@101 not ready
> [ 2.556316] platform regulator@7: probe deferral - supplier regulator@101 not ready
> [ 2.564041] platform regulator@8: probe deferral - supplier regulator@101 not ready
> [ 2.571743] platform regulator@9: probe deferral - supplier regulator@101 not ready
> [ 2.579463] platform regulator@10: probe deferral - supplier regulator@101 not ready
> [ 2.587273] platform regulator@11: probe deferral - supplier regulator@101 not ready
> [ 2.595088] platform regulator@12: probe deferral - supplier regulator@104 not ready
> [ 2.603837] platform regulator@102: probe deferral - supplier regulator@104 not ready
> [ 2.611726] platform regulator@103: probe deferral - supplier regulator@104 not ready
> [ 2.620137] platform 3000.pcie: probe deferral - supplier regulator@5 not ready

Looks like this is not the whole log? Do you see any "wait for
supplier" logs? That's what all these boot issues should boil down to.
And as usual, pointer to DT for this board please.

-Saravana
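
The two message forms in these logs point at different states, going by the explanation given elsewhere in this thread: "wait for supplier X" means the supplier device has not been added yet, while "supplier X not ready" means the supplier exists but has not probed. A minimal Python sketch for sorting a captured log into those two groups follows. It assumes the message formats quoted in this thread, with each message on a single unwrapped line; the helper name classify_deferrals is hypothetical and not part of any kernel tooling.

```python
import re
from collections import defaultdict

# Message formats as they appear in the logs quoted in this thread
# (assumption: each message sits on one unwrapped line).
WAIT_RE = re.compile(r"(\S+) (\S+): probe deferral - wait for supplier (\S+)")
NOT_READY_RE = re.compile(r"(\S+) (\S+): probe deferral - supplier (\S+) not ready")


def classify_deferrals(log_text):
    """Hypothetical helper: return (missing, not_ready).

    Each result maps a consumer device name to the set of suppliers named
    in its deferral messages.
    """
    missing = defaultdict(set)     # supplier device not added yet
    not_ready = defaultdict(set)   # supplier exists but has not probed
    for line in log_text.splitlines():
        m = WAIT_RE.search(line)
        if m:
            missing[m.group(2)].add(m.group(3))
            continue
        m = NOT_READY_RE.search(line)
        if m:
            not_ready[m.group(2)].add(m.group(3))
    return dict(missing), dict(not_ready)
```

Feeding it the output of dmesg and looking at the "missing" group first is one way to get to the "wait for supplier" lines asked about above.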

2021-01-14 02:15:57

by Saravana Kannan

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Wed, Jan 13, 2021 at 3:48 AM Marc Zyngier <[email protected]> wrote:
>
> On 2021-01-13 11:44, Nicolas Saenz Julienne wrote:
> > On Thu, 2020-12-17 at 19:16 -0800, Saravana Kannan wrote:
> >> As discussed in LPC 2020, cyclic dependencies in firmware that
> >> couldn't
> >> be broken using logic was one of the last remaining reasons
> >> fw_devlink=on couldn't be set by default.
> >>
> >> This series changes fw_devlink so that when a cyclic dependency is
> >> found
> >> in firmware, the links between those devices fallback to permissive
> >> mode
> >> behavior. This way, the rest of the system still benefits from
> >> fw_devlink, but the ambiguous cases fallback to permissive mode.
> >>
> >> Setting fw_devlink=on by default brings a bunch of benefits
> >> (currently,
> >> only for systems with device tree firmware):
> >> * Significantly cuts down deferred probes.
> >> * Device probe is effectively attempted in graph order.
> >> * Makes it much easier to load drivers as modules without having to
> >> worry about functional dependencies between modules (depmod is still
> >> needed for symbol dependencies).
> >
> > FWIW I don't see any issues with this on Raspberry Pi 4 :).
>
> Keep bragging! ;-)
>

Yay! Thanks for confirming Nicolas.

-Saravana

2021-01-14 07:39:20

by Marek Szyprowski

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Saravana,

On 13.01.2021 20:23, Saravana Kannan wrote:
> On Tue, Jan 12, 2021 at 11:04 PM Marek Szyprowski
> <[email protected]> wrote:
>> On 12.01.2021 21:51, Saravana Kannan wrote:
>>> On Mon, Jan 11, 2021 at 11:11 PM Marek Szyprowski
>>> <[email protected]> wrote:
>>>> On 11.01.2021 22:47, Saravana Kannan wrote:
>>>>> On Mon, Jan 11, 2021 at 6:18 AM Marek Szyprowski
>>>>> <[email protected]> wrote:
>>>>>> On 11.01.2021 12:12, Marek Szyprowski wrote:
>>>>>>> On 18.12.2020 04:17, Saravana Kannan wrote:
>>>>>>>> Cyclic dependencies in some firmware was one of the last remaining
>>>>>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
>>>>>>>> dependencies don't block probing, set fw_devlink=on by default.
>>>>>>>>
>>>>>>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
>>>>>>>> only for systems with device tree firmware):
>>>>>>>> * Significantly cuts down deferred probes.
>>>>>>>> * Device probe is effectively attempted in graph order.
>>>>>>>> * Makes it much easier to load drivers as modules without having to
>>>>>>>> worry about functional dependencies between modules (depmod is still
>>>>>>>> needed for symbol dependencies).
>>>>>>>>
>>>>>>>> If this patch prevents some devices from probing, it's very likely due
>>>>>>>> to the system having one or more device drivers that "probe"/set up a
>>>>>>>> device (DT node with compatible property) without creating a struct
>>>>>>>> device for it. If we hit such cases, the device drivers need to be
>>>>>>>> fixed so that they populate struct devices and probe them like normal
>>>>>>>> device drivers so that the driver core is aware of the devices and their
>>>>>>>> status. See [1] for an example of such a case.
>>>>>>>>
>>>>>>>> [1] -
>>>>>>>> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
>>>>>>>> Signed-off-by: Saravana Kannan <[email protected]>
>>>>>>> This patch landed recently in linux next-20210111 as commit
>>>>>>> e590474768f1 ("driver core: Set fw_devlink=on by default"). Sadly it
>>>>>>> breaks Exynos IOMMU operation, what causes lots of devices being
>>>>>>> deferred and not probed at all. I've briefly checked and noticed that
>>>>>>> exynos_sysmmu_probe() is never called after this patch. This is really
>>>>>>> strange for me, as the SYSMMU controllers on Exynos platform are
>>>>>>> regular platform devices registered by the OF code. The driver code is
>>>>>>> here: drivers/iommu/exynos-iommu.c, example dts:
>>>>>>> arch/arm/boot/dts/exynos3250.dtsi (compatible = "samsung,exynos-sysmmu").
>>>>>> Okay, I found the source of this problem. It is caused by Exynos power
>>>>>> domain driver, which is not platform driver yet. I will post a patch,
>>>>>> which converts it to the platform driver.
>>>>> Thanks Marek! Hopefully the debug logs I added were sufficient to
>>>>> figure out the reason.
>>>> Frankly, it took me a while to figure out that device core waits for the
>>>> power domain devices. Maybe it would be possible to add some more debug
>>>> messages or hints? Like the reason of the deferred probe in
>>>> /sys/kernel/debug/devices_deferred ?
>>> There's already a /sys/devices/.../<device>/waiting_for_supplier file
>>> that tells you if the device is waiting for a supplier device to be
>>> added. That file goes away once the device probes. If the file has 1,
>>> then it's waiting for the supplier device to be added (like your
>>> case). If it's 0, then the device is just waiting on one of the
>>> existing suppliers to probe. You can find the existing suppliers
>>> through /sys/devices/.../<device>/supplier:*/supplier. Also, flip
>>> these dev_dbg() to dev_info() if you need more details about deferred
>>> probing.
>> Frankly speaking I doubt that anyone will find those. Even experienced
>> developer might need some time to figure it out.
>>
>> I expect that such information will be at least in the mentioned
>> /sys/kernel/debug/devices_deferred file. We already have infrastructure
>> for putting the deferred probe reason there, see dev_err_probe()
>> function. Even such a simple change makes the debugging this issue much
>> easier:
>>
>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>> index cd8e518fadd6..ceb5aed5a84c 100644
>> --- a/drivers/base/core.c
>> +++ b/drivers/base/core.c
>> @@ -937,12 +937,13 @@ int device_links_check_suppliers(struct device *dev)
>> mutex_lock(&fwnode_link_lock);
>> if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
>> !fw_devlink_is_permissive()) {
>> - dev_dbg(dev, "probe deferral - wait for supplier %pfwP\n",
>> + ret = dev_err_probe(dev, -EPROBE_DEFER,
>> + "probe deferral - wait for supplier %pfwP\n",
>> list_first_entry(&dev->fwnode->suppliers,
>> struct fwnode_link,
>> c_hook)->supplier);
>> mutex_unlock(&fwnode_link_lock);
>> - return -EPROBE_DEFER;
>> + return ret;
>> }
>> mutex_unlock(&fwnode_link_lock);
>>
>> @@ -955,9 +956,9 @@ int device_links_check_suppliers(struct device *dev)
>> if (link->status != DL_STATE_AVAILABLE &&
>> !(link->flags & DL_FLAG_SYNC_STATE_ONLY)) {
>> device_links_missing_supplier(dev);
>> - dev_dbg(dev, "probe deferral - supplier %s not ready\n",
>> + ret = dev_err_probe(dev, -EPROBE_DEFER,
>> + "probe deferral - supplier %s not ready\n",
>> dev_name(link->supplier));
>> - ret = -EPROBE_DEFER;
>> break;
>> }
>> WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE);
>>
>>
>> After such change:
>>
>> # cat /sys/kernel/debug/devices_deferred
> Sweet! I wasn't aware of this file at all.
>
> However, on a side note, one of my TODO items is to not add devices to
> the deferred probe list if they'll never probe yet (due to suppliers
> not having probed). On a board I tested on, it cut down really_probe()
> calls by 75%! So the probe attempt itself effectively happens in graph
> order (which I think is pretty cool). So that's going to conflict with
> this file. I'll have to see what to do about that.
>
> Thanks for this pointer. Let me sit on this for 2 weeks and see how I
> can incorporate your suggestion while allowing for the above. And then
> I'll send out a patch. Does that work?

Fine for me.

Even if you want to change the core not to probe devices that miss their
suppliers (which is good imho), the 'devices_deferred' file might still
contain all of them. For the user it is just a list of devices that are not
yet available in the system, with the optional reasons for that.

Best regards

--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
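
For reading that file from a script, a minimal Python sketch follows. It assumes each line of devices_deferred is a device name, optionally followed by a tab and the recorded reason; that layout is an assumption and should be checked against the kernel in use. Both function names are hypothetical.

```python
def parse_devices_deferred(text):
    """Hypothetical helper: map device name -> deferral reason.

    Assumes one device per line, with an optional tab-separated reason.
    The reason is '' when none was recorded.
    """
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, reason = line.partition("\t")
        entries[name.strip()] = reason.strip()
    return entries


def read_devices_deferred(path="/sys/kernel/debug/devices_deferred"):
    # Needs debugfs mounted and, usually, root privileges.
    with open(path) as f:
        return parse_devices_deferred(f.read())
```

Devices that map to an empty string are deferred without a recorded reason; those are the cases the dev_err_probe() change above is meant to reduce.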

2021-01-14 11:38:45

by Jon Hunter

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default


On 13/01/2021 21:29, Saravana Kannan wrote:

...

>> I am seeing the same problem on Tegra30 Cardhu A04 where several regulators
>> are continuously deferred and prevents the board from booting ...
>>
>> [ 2.518334] platform panel: probe deferral - supplier regulator@11 not ready
>> [ 2.525503] platform regulator@1: probe deferral - supplier 4-002d not ready
>> [ 2.533141] platform regulator@3: probe deferral - supplier regulator@101 not ready
>> [ 2.540856] platform regulator@5: probe deferral - supplier regulator@101 not ready
>> [ 2.548589] platform regulator@6: probe deferral - supplier regulator@101 not ready
>> [ 2.556316] platform regulator@7: probe deferral - supplier regulator@101 not ready
>> [ 2.564041] platform regulator@8: probe deferral - supplier regulator@101 not ready
>> [ 2.571743] platform regulator@9: probe deferral - supplier regulator@101 not ready
>> [ 2.579463] platform regulator@10: probe deferral - supplier regulator@101 not ready
>> [ 2.587273] platform regulator@11: probe deferral - supplier regulator@101 not ready
>> [ 2.595088] platform regulator@12: probe deferral - supplier regulator@104 not ready
>> [ 2.603837] platform regulator@102: probe deferral - supplier regulator@104 not ready
>> [ 2.611726] platform regulator@103: probe deferral - supplier regulator@104 not ready
>> [ 2.620137] platform 3000.pcie: probe deferral - supplier regulator@5 not ready
>
> Looks like this is not the whole log? Do you see any "wait for
> supplier" logs? That's what all these boot issues should boil down to.
> And as usual, pointer to DT for this board please.

Ah yes I see ...

platform regulator@1: probe deferral - wait for supplier tps65911@2d

Yes the device-tree for this board can be found here [0]. Looks like
there is a circular dependency between the vddctrl_reg and vddcore_reg.
This is part of coupled regulators which have a two-way linkage [1]. So
this change appears to conflict with this.

Jon

[0]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/tegra30-cardhu-a04.dts
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/regulator/regulator.yaml#n129

--
nvpublic
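
To illustrate the kind of circular dependency described above, here is a minimal Python sketch that finds one cycle in a consumer-to-supplier map using a depth-first search. It is only an illustration and not how the driver core detects cycles. The node names are taken from the message above, and the function name find_cycle is hypothetical.

```python
def find_cycle(suppliers):
    """Return one dependency cycle as a list of nodes, or None.

    `suppliers` maps each consumer to an iterable of its suppliers.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    colour = {}
    stack = []                     # nodes on the current DFS path

    def visit(node):
        colour[node] = GREY
        stack.append(node)
        for sup in suppliers.get(node, ()):
            state = colour.get(sup, WHITE)
            if state == GREY:
                # Back edge: the path from `sup` to here closes a cycle.
                return stack[stack.index(sup):] + [sup]
            if state == WHITE:
                found = visit(sup)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return None

    for node in list(suppliers):
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Two-way linkage between coupled regulators, as in the message above.
links = {"vddctrl_reg": ["vddcore_reg"], "vddcore_reg": ["vddctrl_reg"]}
print(find_cycle(links))
```

With a two-way linkage like this, neither device can be strictly ordered before the other, which is the situation the series handles by falling back to permissive behavior for the links involved.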

2021-01-14 16:13:23

by Jon Hunter

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default


On 13/01/2021 21:26, Saravana Kannan wrote:
> On Wed, Jan 13, 2021 at 3:30 AM Jon Hunter <[email protected]> wrote:
>>
>>
>> On 18/12/2020 03:16, Saravana Kannan wrote:
>>> As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
>>> be broken using logic was one of the last remaining reasons
>>> fw_devlink=on couldn't be set by default.
>>>
>>> This series changes fw_devlink so that when a cyclic dependency is found
>>> in firmware, the links between those devices fallback to permissive mode
>>> behavior. This way, the rest of the system still benefits from
>>> fw_devlink, but the ambiguous cases fallback to permissive mode.
>>>
>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
>>> only for systems with device tree firmware):
>>> * Significantly cuts down deferred probes.
>>> * Device probe is effectively attempted in graph order.
>>> * Makes it much easier to load drivers as modules without having to
>>> worry about functional dependencies between modules (depmod is still
>>> needed for symbol dependencies).
>>
>>
>> One issue we have come across with this is the of_mdio.c driver. On
>> Tegra194 Jetson Xavier I am seeing the following ...
>>
>> boot: logs: [ 4.194791] WARNING KERN WARNING: CPU: 0 PID: 1 at /dvs/git/dirty/git-master_l4t-upstream/kernel/drivers/base/core.c:1189 device_links_driver_bound+0x240/0x260
>> boot: logs: [ 4.207683] WARNING KERN Modules linked in:
>> boot: logs: [ 4.210691] WARNING KERN CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc3-next-20210112-gdf869cab4b35 #1
>> boot: logs: [ 4.219221] WARNING KERN Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
>> boot: logs: [ 4.225628] WARNING KERN pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
>> boot: logs: [ 4.231542] WARNING KERN pc : device_links_driver_bound+0x240/0x260
>> boot: logs: [ 4.236587] WARNING KERN lr : device_links_driver_bound+0xf8/0x260
>> boot: logs: [ 4.241560] WARNING KERN sp : ffff800011f4b980
>> boot: logs: [ 4.244819] WARNING KERN x29: ffff800011f4b980 x28: ffff00008208a0a0
>> boot: logs: [ 4.250051] WARNING KERN x27: ffff00008208a080 x26: 00000000ffffffff
>> boot: logs: [ 4.255271] WARNING KERN x25: 0000000000000003 x24: ffff800011b99000
>> boot: logs: [ 4.260489] WARNING KERN x23: 0000000000000001 x22: ffff800011df14f0
>> boot: logs: [ 4.265706] WARNING KERN x21: ffff800011f4b9f8 x20: ffff800011df1000
>> boot: logs: [ 4.270934] WARNING KERN x19: ffff00008208a000 x18: 0000000000000005
>> boot: logs: [ 4.276166] WARNING KERN x17: 0000000000000007 x16: 0000000000000001
>> boot: logs: [ 4.281382] WARNING KERN x15: ffff000080030c90 x14: ffff0000805c9df8
>> boot: logs: [ 4.286618] WARNING KERN x13: 0000000000000000 x12: ffff000080030c90
>> boot: logs: [ 4.291847] WARNING KERN x11: ffff0000805c9da8 x10: 0000000000000040
>> boot: logs: [ 4.297061] WARNING KERN x9 : ffff000080030c98 x8 : 0000000000000000
>> boot: logs: [ 4.302291] WARNING KERN x7 : 0000000000000009 x6 : 0000000000000000
>> boot: logs: [ 4.307509] WARNING KERN x5 : ffff000080100000 x4 : 0000000000000000
>> boot: logs: [ 4.312739] WARNING KERN x3 : ffff800011df1e38 x2 : ffff000080908c10
>> boot: logs: [ 4.317956] WARNING KERN x1 : 0000000000000001 x0 : ffff0000809ca400
>> boot: logs: [ 4.323183] WARNING KERN Call trace:
>> boot: logs: [ 4.325593] WARNING KERN device_links_driver_bound+0x240/0x260
>> boot: logs: [ 4.330301] WARNING KERN driver_bound+0x70/0xd0
>> boot: logs: [ 4.333740] WARNING KERN device_bind_driver+0x50/0x60
>> boot: logs: [ 4.337671] WARNING KERN phy_attach_direct+0x258/0x2e0
>> boot: logs: [ 4.341718] WARNING KERN phylink_of_phy_connect+0x7c/0x140
>> boot: logs: [ 4.346081] WARNING KERN stmmac_open+0xb04/0xc70
>> boot: logs: [ 4.349612] WARNING KERN __dev_open+0xe0/0x190
>> boot: logs: [ 4.352972] WARNING KERN __dev_change_flags+0x16c/0x1b8
>> boot: logs: [ 4.357081] WARNING KERN dev_change_flags+0x20/0x60
>> boot: logs: [ 4.360856] WARNING KERN ip_auto_config+0x2a0/0xfe8
>> boot: logs: [ 4.364633] WARNING KERN do_one_initcall+0x58/0x1b8
>> boot: logs: [ 4.368405] WARNING KERN kernel_init_freeable+0x1ec/0x240
>> boot: logs: [ 4.372698] WARNING KERN kernel_init+0x10/0x110
>> boot: logs: [ 4.376130] WARNING KERN ret_from_fork+0x10/0x18
>>
>>
>> So looking at this change does this mean that the of_mdio needs to be
>> converted to a proper driver?
>
> Sorry, there's not enough context in this log for me to tell how this
> is even related to of_mdio.c. My guess is this is related to network
> stack directly calling device_bind_driver() and not updating device
> link state correctly. See what device_links_check_suppliers() does in
> the normal path. I think I know which warning this is, but can you
> check your tree and tell me the code you see in
> drivers/base/core.c:1189 ?

Yes this is the warning shown here [0] and this is coming from
the 'Generic PHY stmmac-0:00' device.

> Also, can you give me a few more lines above and below this log and
> also explain why you think this is related to of_mdio.c? Where is the
> DT file for this board in case I need to look at it? And where is this
> phy node defined in DT?

[ 4.179760] dwc-eth-dwmac 2490000.ethernet: User ID: 0x10, Synopsys ID: 0x50
[ 4.186743] dwc-eth-dwmac 2490000.ethernet: DWMAC4/5
[ 4.191755] dwc-eth-dwmac 2490000.ethernet: DMA HW capability register supported
[ 4.199062] dwc-eth-dwmac 2490000.ethernet: RX Checksum Offload Engine supported
[ 4.206379] dwc-eth-dwmac 2490000.ethernet: TX Checksum insertion supported
[ 4.213247] dwc-eth-dwmac 2490000.ethernet: Wake-Up On Lan supported
[ 4.219617] dwc-eth-dwmac 2490000.ethernet: TSO supported
[ 4.224954] dwc-eth-dwmac 2490000.ethernet: Enable RX Mitigation via HW Watchdog Timer
[ 4.232800] dwc-eth-dwmac 2490000.ethernet: device MAC address 4a:48:a7:a2:2e:d6
[ 4.240115] dwc-eth-dwmac 2490000.ethernet: Enabled Flow TC (entries=8)
[ 4.246638] dwc-eth-dwmac 2490000.ethernet: TSO feature enabled
[ 4.252499] dwc-eth-dwmac 2490000.ethernet: SPH feature enabled
[ 4.258383] dwc-eth-dwmac 2490000.ethernet: Using 40 bits DMA width
[ 4.265058] libphy: stmmac: probed
[ 4.269421] irq: IRQ63: trimming hierarchy from :bus@0:pmc@c360000
[ 4.276957] platform 3610000.usb: probe deferral - supplier 3520000.padctl not ready
[ 4.286759] platform 31c0000.i2c: probe deferral - wait for supplier dpaux@155e0000
[ 4.295970] cpufreq: cpufreq_online: CPU0: Running at unlisted initial frequency: 1305000 KHz, changing to: 1344000 KHz
[ 4.308146] cpufreq: cpufreq_online: CPU2: Running at unlisted initial frequency: 1306000 KHz, changing to: 1344000 KHz
[ 4.320108] cpufreq: cpufreq_online: CPU4: Running at unlisted initial frequency: 1305000 KHz, changing to: 1344000 KHz
[ 4.332191] cpufreq: cpufreq_online: CPU6: Running at unlisted initial frequency: 1305000 KHz, changing to: 1344000 KHz
[ 4.349276] sdhci-tegra 3400000.mmc: Got CD GPIO
[ 4.360405] mmc0: CQHCI version 5.10
[ 4.363006] tegra-xusb 3610000.usb: Firmware timestamp: 2019-07-24 05:47:34 UTC
[ 4.371278] tegra-xusb 3610000.usb: xHCI Host Controller
[ 4.371298] tegra-xusb 3610000.usb: new USB bus registered, assigned bus number 1
[ 4.371958] tegra-xusb 3610000.usb: hcc params 0x0184ff25 hci version 0x110 quirks 0x0000000000010810
[ 4.372001] tegra-xusb 3610000.usb: irq 29, io mem 0x03610000
[ 4.372522] hub 1-0:1.0: USB hub found
[ 4.372546] hub 1-0:1.0: 4 ports detected
[ 4.372887] tegra-xusb 3610000.usb: xHCI Host Controller
[ 4.372894] tegra-xusb 3610000.usb: new USB bus registered, assigned bus number 2
[ 4.372900] tegra-xusb 3610000.usb: Host supports USB 3.1 Enhanced SuperSpeed
[ 4.373227] hub 2-0:1.0: USB hub found
[ 4.373251] hub 2-0:1.0: 4 ports detected
[ 4.376437] platform 31c0000.i2c: probe deferral - wait for supplier dpaux@155e0000
[ 4.447782] platform 31c0000.i2c: probe deferral - wait for supplier dpaux@155e0000
[ 4.457409] irq: IRQ64: trimming hierarchy from :bus@0:pmc@c360000
[ 4.463735] irq: IRQ65: trimming hierarchy from :bus@0:interrupt-controller@3881000
[ 4.471401] input: gpio-keys as /devices/platform/gpio-keys/input/input0
[ 4.476701] mmc0: SDHCI controller on 3460000.mmc [3460000.mmc] using ADMA 64-bit
[ 4.485440] irq: IRQ66: trimming hierarchy from :bus@0:pmc@c360000
[ 4.486043] platform 31c0000.i2c: probe deferral - wait for supplier dpaux@155e0000
[ 4.492120] mmc1: SDHCI controller on 3400000.mmc [3400000.mmc] using ADMA 64-bit
[ 4.507063] platform 31c0000.i2c: probe deferral - wait for supplier dpaux@155e0000
[ 4.514674] ------------[ cut here ]------------
[ 4.524876] WARNING: CPU: 3 PID: 1 at /local/workdir/tegra/mlt-linux_next/kernel/drivers/base/core.c:1188 device_links_driver_bound+0x29c/0x2d8
[ 4.537563] Modules linked in:
[ 4.540602] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc3-next-20210113-dirty #1
[ 4.548545] Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
[ 4.555019] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 4.560938] pc : device_links_driver_bound+0x29c/0x2d8
[ 4.566050] lr : device_links_driver_bound+0x29c/0x2d8
[ 4.571171] sp : ffff800011f4b980
[ 4.574467] x29: ffff800011f4b980 x28: ffff00008208a080
[ 4.579732] x27: ffff00008208a0a0 x26: ffff000080908c10
[ 4.585036] x25: ffff800011e4af73 x24: ffff800011b99000
[ 4.590347] x23: ffff800011df1428 x22: ffff800011f4b9f8
[ 4.595634] x21: ffff800011df1000 x20: ffff00008208a000
[ 4.600916] x19: ffff0000809ca400 x18: ffffffffffffffff
[ 4.606236] x17: 0000000000000007 x16: 0000000000000001
[ 4.611479] x15: 0000000000000613 x14: ffff800011f4b610
[ 4.616780] x13: 00000000ffffffea x12: ffff800011c0a320
[ 4.622027] x11: 0000000000000001 x10: 0000000000000001
[ 4.627304] x9 : 0000000000000003 x8 : ffff800011bb2378
[ 4.632589] x7 : ffff800011c0a378 x6 : c0000000ffffefff
[ 4.637831] x5 : 0000000000017fe8 x4 : 0000000000000000
[ 4.643124] x3 : 00000000ffffffff x2 : ffff800011bb22e8
[ 4.645339] mmc0: Command Queue Engine enabled
[ 4.648397] x1 : a13f0a1c9773d600 x0 : 0000000000000000
[ 4.648414] Call trace:
[ 4.648424] device_links_driver_bound+0x29c/0x2d8
[ 4.648446] driver_bound+0x6c/0xf8
[ 4.648455] device_bind_driver+0x50/0x60
[ 4.648462] phy_attach_direct+0x258/0x2e0
[ 4.648473] phylink_of_phy_connect+0x7c/0x140
[ 4.652967] mmc0: new HS200 MMC card at address 0001
[ 4.658075] stmmac_open+0xb04/0xc70
[ 4.658093] __dev_open+0xe0/0x190
[ 4.658142] __dev_change_flags+0x16c/0x1b8
[ 4.665285] dev_change_flags+0x20/0x60
[ 4.665326] ip_auto_config+0x2a0/0xfe8
[ 4.665340] do_one_initcall+0x58/0x1b8
[ 4.672731] kernel_init_freeable+0x1ec/0x240
[ 4.672746] kernel_init+0x10/0x110
[ 4.716301] ret_from_fork+0x10/0x18
[ 4.719865] ---[ end trace 819cead1701ad8da ]---
[ 4.724955] platform 31c0000.i2c: probe deferral - wait for supplier dpaux@155e0000
[ 4.725143] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
[ 4.725260] dwc-eth-dwmac 2490000.ethernet eth0: PHY [stmmac-0:00] driver [Generic PHY] (irq=POLL)
[ 4.726387] dwmac4: Master AXI performs any burst length
[ 4.726410] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found
[ 4.726840] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
[ 4.727011] dwc-eth-dwmac 2490000.ethernet eth0: registered PTP clock
[ 4.737024] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii-id link mode


The warning is occurring when device_bind_driver() is called in
phy_attach_direct() [1]. The device-tree ethernet node for this
board can be found here [2].

> If there's an easy way to convert it to a proper driver, that's always
> better than calling into driver core in a piecemeal fashion.

So this is a generic phy driver that has been around for quite some
time AFAICT.

>> I would have thought that this will be
>> seen on several platforms.
>
> I'm surprised you are seeing this issue only now. I'd have expected it
> to have happened even without this series.

We have automated testing that checks for new warnings with -next and
this is definitely new and the bisect points to this change.

Cheers
Jon


[0] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/base/core.c?h=next-20210112#n1189
[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/net/phy/phy_device.c?h=next-20210112#n1357
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi?h=next-20210112#n31

--
nvpublic

2021-01-14 16:43:59

by Saravana Kannan

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, Jan 14, 2021 at 3:35 AM Jon Hunter <[email protected]> wrote:
>
>
> On 13/01/2021 21:29, Saravana Kannan wrote:
>
> ...
>
> >> I am seeing the same problem on Tegra30 Cardhu A04 where several regulators
> >> are continuously deferred and prevents the board from booting ...
> >>
> >> [ 2.518334] platform panel: probe deferral - supplier regulator@11 not ready
> >>
> >> [ 2.525503] platform regulator@1: probe deferral - supplier 4-002d not ready
> >>
> >> [ 2.533141] platform regulator@3: probe deferral - supplier regulator@101 not ready
> >>
> >> [ 2.540856] platform regulator@5: probe deferral - supplier regulator@101 not ready
> >>
> >> [ 2.548589] platform regulator@6: probe deferral - supplier regulator@101 not ready
> >>
> >> [ 2.556316] platform regulator@7: probe deferral - supplier regulator@101 not ready
> >>
> >> [ 2.564041] platform regulator@8: probe deferral - supplier regulator@101 not ready
> >>
> >> [ 2.571743] platform regulator@9: probe deferral - supplier regulator@101 not ready
> >>
> >> [ 2.579463] platform regulator@10: probe deferral - supplier regulator@101 not ready
> >>
> >> [ 2.587273] platform regulator@11: probe deferral - supplier regulator@101 not ready
> >>
> >> [ 2.595088] platform regulator@12: probe deferral - supplier regulator@104 not ready
> >>
> >> [ 2.603837] platform regulator@102: probe deferral - supplier regulator@104 not ready
> >>
> >> [ 2.611726] platform regulator@103: probe deferral - supplier regulator@104 not ready
> >>
> >> [ 2.620137] platform 3000.pcie: probe deferral - supplier regulator@5 not ready
> >
> > Looks like this is not the whole log? Do you see any "wait for
> > supplier" logs? That's what all these boot issues should boil down to.
> > And as usual, pointer to DT for this board please.
>
> Ah yes I see ...
>
> platform regulator@1: probe deferral - wait for supplier tps65911@2d

Do you mind sharing the full log please? It's hard to tell you
anything useful with bits and pieces of logs.

> Yes the device-tree for this board can be found here [0]. Looks like
> there is a circular dependency between the vddctrl_reg and vddcore_reg.
> This is part of coupled regulators which have a two-way linkage [1]. So
> this change appears to conflict with this.

fw_devlink doesn't track "regulator-coupled-with". So that's probably
not it. Also, this patch series was made to handle simple cycles
properly. It'll functionally disable the device links it created when
it comes to probe ordering. Only two overlapping cycles might cause
issues -- and even that, not all the time. So yeah, full log please.
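
To make the cycle handling above concrete, here is a minimal user-space sketch in C. It is not the driver core code from this series: struct fw_graph, relax_cyclic_links() and must_wait_for() are hypothetical names, and the adjacency-matrix model is an assumption chosen for brevity. It only illustrates the idea that a link sitting on a dependency cycle stops constraining probe order, while every other link still does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 16

/* Hypothetical model: dep[c][s] is true when consumer c lists supplier s. */
struct fw_graph {
	size_t ndevs;
	bool dep[MAX_DEVS][MAX_DEVS];
	bool permissive[MAX_DEVS][MAX_DEVS]; /* link no longer orders probing */
};

/* Depth-first search: can 'from' reach 'to' by following supplier links? */
static bool reaches(const struct fw_graph *g, size_t from, size_t to,
		    bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (size_t next = 0; next < g->ndevs; next++) {
		if (g->dep[from][next] && !seen[next] &&
		    reaches(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * A link c -> s is part of a cycle exactly when s can get back to c.
 * Such links are flagged permissive; returns how many were flagged.
 */
static size_t relax_cyclic_links(struct fw_graph *g)
{
	size_t relaxed = 0;

	assert(g->ndevs <= MAX_DEVS);
	for (size_t c = 0; c < g->ndevs; c++) {
		for (size_t s = 0; s < g->ndevs; s++) {
			bool seen[MAX_DEVS] = { false };

			g->permissive[c][s] = false;
			if (g->dep[c][s] && reaches(g, s, c, seen)) {
				g->permissive[c][s] = true;
				relaxed++;
			}
		}
	}
	return relaxed;
}

/* A consumer only has to wait for suppliers behind non-permissive links. */
static bool must_wait_for(const struct fw_graph *g, size_t c, size_t s)
{
	return g->dep[c][s] && !g->permissive[c][s];
}
```

With three devices where 0 and 1 depend on each other and 2 depends on 0, relax_cyclic_links() flags the two links between 0 and 1, and must_wait_for() still reports that 2 has to wait for 0. Overlapping cycles, which the mail singles out as the harder case, are not modelled here beyond flagging every link that lies on some cycle.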

-Saravana

> Jon
>
> [0]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/tegra30-cardhu-a04.dts
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/regulator/regulator.yaml#n129
>
> --
> nvpublic

2021-01-14 16:51:10

by Jon Hunter

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default


On 14/01/2021 16:40, Saravana Kannan wrote:
> On Thu, Jan 14, 2021 at 3:35 AM Jon Hunter <[email protected]> wrote:
>>
>>
>> On 13/01/2021 21:29, Saravana Kannan wrote:
>>
>> ...
>>
>>>> I am seeing the same problem on Tegra30 Cardhu A04 where several regulators
>>>> are continuously deferred and prevents the board from booting ...
>>>>
>>>> [ 2.518334] platform panel: probe deferral - supplier regulator@11 not ready
>>>>
>>>> [ 2.525503] platform regulator@1: probe deferral - supplier 4-002d not ready
>>>>
>>>> [ 2.533141] platform regulator@3: probe deferral - supplier regulator@101 not ready
>>>>
>>>> [ 2.540856] platform regulator@5: probe deferral - supplier regulator@101 not ready
>>>>
>>>> [ 2.548589] platform regulator@6: probe deferral - supplier regulator@101 not ready
>>>>
>>>> [ 2.556316] platform regulator@7: probe deferral - supplier regulator@101 not ready
>>>>
>>>> [ 2.564041] platform regulator@8: probe deferral - supplier regulator@101 not ready
>>>>
>>>> [ 2.571743] platform regulator@9: probe deferral - supplier regulator@101 not ready
>>>>
>>>> [ 2.579463] platform regulator@10: probe deferral - supplier regulator@101 not ready
>>>>
>>>> [ 2.587273] platform regulator@11: probe deferral - supplier regulator@101 not ready
>>>>
>>>> [ 2.595088] platform regulator@12: probe deferral - supplier regulator@104 not ready
>>>>
>>>> [ 2.603837] platform regulator@102: probe deferral - supplier regulator@104 not ready
>>>>
>>>> [ 2.611726] platform regulator@103: probe deferral - supplier regulator@104 not ready
>>>>
>>>> [ 2.620137] platform 3000.pcie: probe deferral - supplier regulator@5 not ready
>>>
>>> Looks like this is not the whole log? Do you see any "wait for
>>> supplier" logs? That's what all these boot issues should boil down to.
>>> And as usual, pointer to DT for this board please.
>>
>> Ah yes I see ...
>>
>> platform regulator@1: probe deferral - wait for supplier tps65911@2d
>
> Do you mind sharing the full log please? It's hard to tell you
> anything useful with bits and pieces of logs.
>
>> Yes the device-tree for this board can be found here [0]. Looks like
>> there is a circular dependency between the vddctrl_reg and vddcore_reg.
>> This is part of coupled regulators which have a two-way linkage [1]. So
>> this change appears to conflict with this.
>
> fw_devlink doesn't track "regulator-coupled-with". So that's probably
> not it. Also, this patch series was made to handle simple cycles
> properly. It'll functionally disable the device links it created when
> it comes to probe ordering. Only two overlapping cycles might cause
> issues -- and even that, not all the time. So yeah, full log please.


No problem. Please find attached.

Cheers
Jon


--
nvpublic


Attachments:
tegra30-cardhu-a04-bootlog.txt (101.65 kB)

2021-01-14 16:52:04

by Saravana Kannan

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, Jan 14, 2021 at 8:11 AM Jon Hunter <[email protected]> wrote:
>
>
> On 13/01/2021 21:26, Saravana Kannan wrote:
> > On Wed, Jan 13, 2021 at 3:30 AM Jon Hunter <[email protected]> wrote:
> >>
> >>
> >> On 18/12/2020 03:16, Saravana Kannan wrote:
> >>> As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
> >>> be broken using logic was one of the last remaining reasons
> >>> fw_devlink=on couldn't be set by default.
> >>>
> >>> This series changes fw_devlink so that when a cyclic dependency is found
> >>> in firmware, the links between those devices fallback to permissive mode
> >>> behavior. This way, the rest of the system still benefits from
> >>> fw_devlink, but the ambiguous cases fallback to permissive mode.
> >>>
> >>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> >>> only for systems with device tree firmware):
> >>> * Significantly cuts down deferred probes.
> >>> * Device probe is effectively attempted in graph order.
> >>> * Makes it much easier to load drivers as modules without having to
> >>> worry about functional dependencies between modules (depmod is still
> >>> needed for symbol dependencies).
> >>
> >>
> >> One issue we have come across with this is the of_mdio.c driver. On
> >> Tegra194 Jetson Xavier I am seeing the following ...
> >>
> >> boot: logs: [ 4.194791] WARNING KERN WARNING: CPU: 0 PID: 1 at /dvs/git/dirty/git-master_l4t-upstream/kernel/drivers/base/core.c:1189 device_links_driver_bound+0x240/0x260
> >> boot: logs: [ 4.207683] WARNING KERN Modules linked in:
> >> boot: logs: [ 4.210691] WARNING KERN CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc3-next-20210112-gdf869cab4b35 #1
> >> boot: logs: [ 4.219221] WARNING KERN Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
> >> boot: logs: [ 4.225628] WARNING KERN pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
> >> boot: logs: [ 4.231542] WARNING KERN pc : device_links_driver_bound+0x240/0x260
> >> boot: logs: [ 4.236587] WARNING KERN lr : device_links_driver_bound+0xf8/0x260
> >> boot: logs: [ 4.241560] WARNING KERN sp : ffff800011f4b980
> >> boot: logs: [ 4.244819] WARNING KERN x29: ffff800011f4b980 x28: ffff00008208a0a0
> >> boot: logs: [ 4.250051] WARNING KERN x27: ffff00008208a080 x26: 00000000ffffffff
> >> boot: logs: [ 4.255271] WARNING KERN x25: 0000000000000003 x24: ffff800011b99000
> >> boot: logs: [ 4.260489] WARNING KERN x23: 0000000000000001 x22: ffff800011df14f0
> >> boot: logs: [ 4.265706] WARNING KERN x21: ffff800011f4b9f8 x20: ffff800011df1000
> >> boot: logs: [ 4.270934] WARNING KERN x19: ffff00008208a000 x18: 0000000000000005
> >> boot: logs: [ 4.276166] WARNING KERN x17: 0000000000000007 x16: 0000000000000001
> >> boot: logs: [ 4.281382] WARNING KERN x15: ffff000080030c90 x14: ffff0000805c9df8
> >> boot: logs: [ 4.286618] WARNING KERN x13: 0000000000000000 x12: ffff000080030c90
> >> boot: logs: [ 4.291847] WARNING KERN x11: ffff0000805c9da8 x10: 0000000000000040
> >> boot: logs: [ 4.297061] WARNING KERN x9 : ffff000080030c98 x8 : 0000000000000000
> >> boot: logs: [ 4.302291] WARNING KERN x7 : 0000000000000009 x6 : 0000000000000000
> >> boot: logs: [ 4.307509] WARNING KERN x5 : ffff000080100000 x4 : 0000000000000000
> >> boot: logs: [ 4.312739] WARNING KERN x3 : ffff800011df1e38 x2 : ffff000080908c10
> >> boot: logs: [ 4.317956] WARNING KERN x1 : 0000000000000001 x0 : ffff0000809ca400
> >> boot: logs: [ 4.323183] WARNING KERN Call trace:
> >> boot: logs: [ 4.325593] WARNING KERN device_links_driver_bound+0x240/0x260
> >> boot: logs: [ 4.330301] WARNING KERN driver_bound+0x70/0xd0
> >> boot: logs: [ 4.333740] WARNING KERN device_bind_driver+0x50/0x60
> >> boot: logs: [ 4.337671] WARNING KERN phy_attach_direct+0x258/0x2e0
> >> boot: logs: [ 4.341718] WARNING KERN phylink_of_phy_connect+0x7c/0x140
> >> boot: logs: [ 4.346081] WARNING KERN stmmac_open+0xb04/0xc70
> >> boot: logs: [ 4.349612] WARNING KERN __dev_open+0xe0/0x190
> >> boot: logs: [ 4.352972] WARNING KERN __dev_change_flags+0x16c/0x1b8
> >> boot: logs: [ 4.357081] WARNING KERN dev_change_flags+0x20/0x60
> >> boot: logs: [ 4.360856] WARNING KERN ip_auto_config+0x2a0/0xfe8
> >> boot: logs: [ 4.364633] WARNING KERN do_one_initcall+0x58/0x1b8
> >> boot: logs: [ 4.368405] WARNING KERN kernel_init_freeable+0x1ec/0x240
> >> boot: logs: [ 4.372698] WARNING KERN kernel_init+0x10/0x110
> >> boot: logs: [ 4.376130] WARNING KERN ret_from_fork+0x10/0x18
> >>
> >>
> >> So looking at this change does this mean that the of_mdio needs to be
> >> converted to a proper driver?
> >
> > Sorry, there's not enough context in this log for me to tell how this
> > is even related to of_mdio.c. My guess is this is related to network
> > stack directly calling device_bind_driver() and not updating device
> > link state correctly. See what device_links_check_suppliers() does in
> > the normal path. I think I know which warning this is, but can you
> > check your tree and tell me the code you see in
> > drivers/base/core.c:1189 ?
>
> Yes this is the warning shown here [0] and this is coming from
> the 'Generic PHY stmmac-0:00' device.

Can you print the supplier and consumer device when this warning is
happening and let me know? That'd help too. I'm guessing the phy is
the consumer.

>
> > Also, can you give me a few more lines above and below this log and
> > also explain why you think this is related to of_mdio.c? Where is the
> > DT file for this board in case I need to look at it? And where is this
> > phy node defined in DT?
>
> [ 4.179760] dwc-eth-dwmac 2490000.ethernet: User ID: 0x10, Synopsys ID: 0x50
> [ 4.186743] dwc-eth-dwmac 2490000.ethernet: DWMAC4/5
> [ 4.191755] dwc-eth-dwmac 2490000.ethernet: DMA HW capability register supported
> [ 4.199062] dwc-eth-dwmac 2490000.ethernet: RX Checksum Offload Engine supported
> [ 4.206379] dwc-eth-dwmac 2490000.ethernet: TX Checksum insertion supported
> [ 4.213247] dwc-eth-dwmac 2490000.ethernet: Wake-Up On Lan supported
> [ 4.219617] dwc-eth-dwmac 2490000.ethernet: TSO supported
> [ 4.224954] dwc-eth-dwmac 2490000.ethernet: Enable RX Mitigation via HW Watchdog Timer
> [ 4.232800] dwc-eth-dwmac 2490000.ethernet: device MAC address 4a:48:a7:a2:2e:d6
> [ 4.240115] dwc-eth-dwmac 2490000.ethernet: Enabled Flow TC (entries=8)
> [ 4.246638] dwc-eth-dwmac 2490000.ethernet: TSO feature enabled
> [ 4.252499] dwc-eth-dwmac 2490000.ethernet: SPH feature enabled
> [ 4.258383] dwc-eth-dwmac 2490000.ethernet: Using 40 bits DMA width
> [ 4.265058] libphy: stmmac: probed
> [ 4.269421] irq: IRQ63: trimming hierarchy from :bus@0:pmc@c360000
> [ 4.276957] platform 3610000.usb: probe deferral - supplier 3520000.padctl not ready
> [ 4.286759] platform 31c0000.i2c: probe deferral - wait for supplier dpaux@155e0000
> [ 4.295970] cpufreq: cpufreq_online: CPU0: Running at unlisted initial frequency: 1305000 KHz, changing to: 1344000 KHz
> [ 4.308146] cpufreq: cpufreq_online: CPU2: Running at unlisted initial frequency: 1306000 KHz, changing to: 1344000 KHz
> [ 4.320108] cpufreq: cpufreq_online: CPU4: Running at unlisted initial frequency: 1305000 KHz, changing to: 1344000 KHz
> [ 4.332191] cpufreq: cpufreq_online: CPU6: Running at unlisted initial frequency: 1305000 KHz, changing to: 1344000 KHz
> [ 4.349276] sdhci-tegra 3400000.mmc: Got CD GPIO
> [ 4.360405] mmc0: CQHCI version 5.10
> [ 4.363006] tegra-xusb 3610000.usb: Firmware timestamp: 2019-07-24 05:47:34 UTC
> [ 4.371278] tegra-xusb 3610000.usb: xHCI Host Controller
> [ 4.371298] tegra-xusb 3610000.usb: new USB bus registered, assigned bus number 1
> [ 4.371958] tegra-xusb 3610000.usb: hcc params 0x0184ff25 hci version 0x110 quirks 0x0000000000010810
> [ 4.372001] tegra-xusb 3610000.usb: irq 29, io mem 0x03610000
> [ 4.372522] hub 1-0:1.0: USB hub found
> [ 4.372546] hub 1-0:1.0: 4 ports detected
> [ 4.372887] tegra-xusb 3610000.usb: xHCI Host Controller
> [ 4.372894] tegra-xusb 3610000.usb: new USB bus registered, assigned bus number 2
> [ 4.372900] tegra-xusb 3610000.usb: Host supports USB 3.1 Enhanced SuperSpeed
> [ 4.373227] hub 2-0:1.0: USB hub found
> [ 4.373251] hub 2-0:1.0: 4 ports detected
> [ 4.376437] platform 31c0000.i2c: probe deferral - wait for supplier dpaux@155e0000
> [ 4.447782] platform 31c0000.i2c: probe deferral - wait for supplier dpaux@155e0000
> [ 4.457409] irq: IRQ64: trimming hierarchy from :bus@0:pmc@c360000
> [ 4.463735] irq: IRQ65: trimming hierarchy from :bus@0:interrupt-controller@3881000
> [ 4.471401] input: gpio-keys as /devices/platform/gpio-keys/input/input0
> [ 4.476701] mmc0: SDHCI controller on 3460000.mmc [3460000.mmc] using ADMA 64-bit
> [ 4.485440] irq: IRQ66: trimming hierarchy from :bus@0:pmc@c360000
> [ 4.486043] platform 31c0000.i2c: probe deferral - wait for supplier dpaux@155e0000
> [ 4.492120] mmc1: SDHCI controller on 3400000.mmc [3400000.mmc] using ADMA 64-bit
> [ 4.507063] platform 31c0000.i2c: probe deferral - wait for supplier dpaux@155e0000
> [ 4.514674] ------------[ cut here ]------------
> [ 4.524876] WARNING: CPU: 3 PID: 1 at /local/workdir/tegra/mlt-linux_next/kernel/drivers/base/core.c:1188 device_links_driver_bound+0x29c/0x2d8
> [ 4.537563] Modules linked in:
> [ 4.540602] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc3-next-20210113-dirty #1
> [ 4.548545] Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
> [ 4.555019] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> [ 4.560938] pc : device_links_driver_bound+0x29c/0x2d8
> [ 4.566050] lr : device_links_driver_bound+0x29c/0x2d8
> [ 4.571171] sp : ffff800011f4b980
> [ 4.574467] x29: ffff800011f4b980 x28: ffff00008208a080
> [ 4.579732] x27: ffff00008208a0a0 x26: ffff000080908c10
> [ 4.585036] x25: ffff800011e4af73 x24: ffff800011b99000
> [ 4.590347] x23: ffff800011df1428 x22: ffff800011f4b9f8
> [ 4.595634] x21: ffff800011df1000 x20: ffff00008208a000
> [ 4.600916] x19: ffff0000809ca400 x18: ffffffffffffffff
> [ 4.606236] x17: 0000000000000007 x16: 0000000000000001
> [ 4.611479] x15: 0000000000000613 x14: ffff800011f4b610
> [ 4.616780] x13: 00000000ffffffea x12: ffff800011c0a320
> [ 4.622027] x11: 0000000000000001 x10: 0000000000000001
> [ 4.627304] x9 : 0000000000000003 x8 : ffff800011bb2378
> [ 4.632589] x7 : ffff800011c0a378 x6 : c0000000ffffefff
> [ 4.637831] x5 : 0000000000017fe8 x4 : 0000000000000000
> [ 4.643124] x3 : 00000000ffffffff x2 : ffff800011bb22e8
> [ 4.645339] mmc0: Command Queue Engine enabled
> [ 4.648397] x1 : a13f0a1c9773d600 x0 : 0000000000000000
> [ 4.648414] Call trace:
> [ 4.648424] device_links_driver_bound+0x29c/0x2d8
> [ 4.648446] driver_bound+0x6c/0xf8
> [ 4.648455] device_bind_driver+0x50/0x60
> [ 4.648462] phy_attach_direct+0x258/0x2e0
> [ 4.648473] phylink_of_phy_connect+0x7c/0x140
> [ 4.652967] mmc0: new HS200 MMC card at address 0001
> [ 4.658075] stmmac_open+0xb04/0xc70
> [ 4.658093] __dev_open+0xe0/0x190
> [ 4.658142] __dev_change_flags+0x16c/0x1b8
> [ 4.665285] dev_change_flags+0x20/0x60
> [ 4.665326] ip_auto_config+0x2a0/0xfe8
> [ 4.665340] do_one_initcall+0x58/0x1b8
> [ 4.672731] kernel_init_freeable+0x1ec/0x240
> [ 4.672746] kernel_init+0x10/0x110
> [ 4.716301] ret_from_fork+0x10/0x18
> [ 4.719865] ---[ end trace 819cead1701ad8da ]---
> [ 4.724955] platform 31c0000.i2c: probe deferral - wait for supplier dpaux@155e0000
> [ 4.725143] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
> [ 4.725260] dwc-eth-dwmac 2490000.ethernet eth0: PHY [stmmac-0:00] driver [Generic PHY] (irq=POLL)
> [ 4.726387] dwmac4: Master AXI performs any burst length
> [ 4.726410] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found
> [ 4.726840] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
> [ 4.727011] dwc-eth-dwmac 2490000.ethernet eth0: registered PTP clock
> [ 4.737024] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii-id link mode
>
>
> The warning is occurring when device_bind_driver() is called in
> phy_attach_direct() [1]. The device-tree ethernet node for this
> board can be found here [2].

So the warning itself isn't a problem -- it's not breaking anything or
leaking memory or anything like that. But the device link is jumping
states in an incorrect manner. With enough context of this code (why
the device_bind_driver() is being called directly instead of going
through the normal probe path), it should be easy to fix (I'll just
need to fix up the device link state).
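
The "jumping states" remark can be pictured with a small user-space model in C. This is a sketch under assumptions, not the code in drivers/base/core.c: the enum mirrors the ordering of enum device_link_state as recalled from include/linux/device.h in kernels of this era, and struct model_link plus the model_*() helpers are hypothetical. The bad_jumps counter stands in for the WARN seen in the log.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror enum device_link_state; treat the values as a guess. */
enum link_state {
	LINK_NONE = -1,
	LINK_DORMANT = 0,	/* supplier not bound yet */
	LINK_AVAILABLE,		/* supplier bound, consumer not probing */
	LINK_CONSUMER_PROBE,	/* consumer probe in progress */
	LINK_ACTIVE,		/* supplier and consumer both bound */
	LINK_SUPPLIER_UNBIND,
};

struct model_link {
	enum link_state status;
	unsigned int bad_jumps;	/* stands in for the WARN in the log */
};

/* Normal probe path, step 1: suppliers are checked before probe() runs. */
static bool model_check_suppliers(struct model_link *link)
{
	assert(link);
	if (link->status != LINK_AVAILABLE)
		return false;	/* would show up as a probe deferral */
	link->status = LINK_CONSUMER_PROBE;
	return true;
}

/* Step 2: the consumer got bound; the link should go PROBE -> ACTIVE. */
static void model_driver_bound(struct model_link *link)
{
	assert(link);
	if (link->status != LINK_CONSUMER_PROBE)
		link->bad_jumps++;	/* a state was skipped: the warning case */
	link->status = LINK_ACTIVE;
}
```

Calling model_check_suppliers() and then model_driver_bound() on a link that starts in LINK_AVAILABLE ends in LINK_ACTIVE with no bad jump. Calling model_driver_bound() alone, which is roughly what a direct device_bind_driver() call amounts to in this model, also ends in LINK_ACTIVE but records one skipped state.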

> > If there's an easy way to convert it to a proper driver, that's always
> > better than calling into driver core in a piecemeal fashion.
>
> So this is a generic phy driver that has been around for quite some
> time AFAICT.
>
> >> I would have thought that this will be
> >> seen on several platforms.
> >
> > I'm surprised you are seeing this issue only now. I'd have expected it
> > to have happened even without this series.
>
> We have automated testing that checks for new warnings with -next and
> this is definitely new and the bisect points to this change.

Yeah, after I sent the email, I figured out why you are only starting
to see it now. With fw_devlink=on, device links get created that track
the status of suppliers/consumers a bit differently.

-Saravana

>
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/base/core.c?h=next-20210112#n1189
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/net/phy/phy_device.c?h=next-20210112#n1357
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi?h=next-20210112#n31
>
> --
> nvpublic

2021-01-14 16:52:51

by Saravana Kannan

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Wed, Jan 13, 2021 at 11:48 PM Andy Shevchenko
<[email protected]> wrote:
>
>
>
> On Monday, December 21, 2020, Jisheng Zhang <[email protected]> wrote:
>>
>> On Thu, 17 Dec 2020 19:16:58 -0800 Saravana Kannan wrote:
>>
>>
>> >
>> >
>> > As discussed in LPC 2020, cyclic dependencies in firmware that couldn't
>> > be broken using logic was one of the last remaining reasons
>> > fw_devlink=on couldn't be set by default.
>> >
>> > This series changes fw_devlink so that when a cyclic dependency is found
>> > in firmware, the links between those devices fallback to permissive mode
>> > behavior. This way, the rest of the system still benefits from
>> > fw_devlink, but the ambiguous cases fallback to permissive mode.
>> >
>> > Setting fw_devlink=on by default brings a bunch of benefits (currently,
>> > only for systems with device tree firmware):
>> > * Significantly cuts down deferred probes.
>> > * Device probe is effectively attempted in graph order.
>> > * Makes it much easier to load drivers as modules without having to
>> > worry about functional dependencies between modules (depmod is still
>> > needed for symbol dependencies).
>> >
>> > Greg/Rafael,
>> >
>> > Can we get this pulled into 5.11-rc1 or -rc2 soon please? I expect to
>> > see some issues due to device drivers that aren't following best
>> > practices (they don't expose the device to driver core). Want to
>> > identify those early on and try to have them fixed before 5.11 release.
>> > See [1] for an example of such a case.
>> >
>> > If we do end up have to revert anything, it'll just be Patch 5/5 (a one
>> > liner).
>> >
>> > Marc,
>> >
>> > You had hit issues with fw_devlink=on before on some of your systems.
>> > Want to give this a shot?
>> >
>> > Jisheng,
>> >
>> > Want to fix up one of those gpio drivers you were having problems with?
>> >
>>
>> Hi Saravana,
>>
>> I didn't send fix for the gpio-dwapb.c in last development window, so can
>> send patch once 5.11-rc1 is released.
>
>
> If you are going to do anything with that GPIO driver, it should be removal of compatible strings from the device child nodes. The driver IIRC never used them anyhow anyway.

We already discussed this in a different thread. Just deleting the
compatible strings from the DT is not okay. That breaks a new kernel +
old DT combo. Upgrading the kernel shouldn't break a board.

-Saravana

2021-01-14 16:56:46

by Saravana Kannan

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, Jan 14, 2021 at 8:48 AM Jon Hunter <[email protected]> wrote:
>
>
> On 14/01/2021 16:40, Saravana Kannan wrote:
> > On Thu, Jan 14, 2021 at 3:35 AM Jon Hunter <[email protected]> wrote:
> >>
> >>
> >> On 13/01/2021 21:29, Saravana Kannan wrote:
> >>
> >> ...
> >>
> >>>> I am seeing the same problem on Tegra30 Cardhu A04 where several regulators
> >>>> are continuously deferred and prevents the board from booting ...
> >>>>
> >>>> [ 2.518334] platform panel: probe deferral - supplier regulator@11 not ready
> >>>>
> >>>> [ 2.525503] platform regulator@1: probe deferral - supplier 4-002d not ready
> >>>>
> >>>> [ 2.533141] platform regulator@3: probe deferral - supplier regulator@101 not ready
> >>>>
> >>>> [ 2.540856] platform regulator@5: probe deferral - supplier regulator@101 not ready
> >>>>
> >>>> [ 2.548589] platform regulator@6: probe deferral - supplier regulator@101 not ready
> >>>>
> >>>> [ 2.556316] platform regulator@7: probe deferral - supplier regulator@101 not ready
> >>>>
> >>>> [ 2.564041] platform regulator@8: probe deferral - supplier regulator@101 not ready
> >>>>
> >>>> [ 2.571743] platform regulator@9: probe deferral - supplier regulator@101 not ready
> >>>>
> >>>> [ 2.579463] platform regulator@10: probe deferral - supplier regulator@101 not ready
> >>>>
> >>>> [ 2.587273] platform regulator@11: probe deferral - supplier regulator@101 not ready
> >>>>
> >>>> [ 2.595088] platform regulator@12: probe deferral - supplier regulator@104 not ready
> >>>>
> >>>> [ 2.603837] platform regulator@102: probe deferral - supplier regulator@104 not ready
> >>>>
> >>>> [ 2.611726] platform regulator@103: probe deferral - supplier regulator@104 not ready
> >>>>
> >>>> [ 2.620137] platform 3000.pcie: probe deferral - supplier regulator@5 not ready
> >>>
> >>> Looks like this is not the whole log? Do you see any "wait for
> >>> supplier" logs? That's what all these boot issues should boil down to.
> >>> And as usual, pointer to DT for this board please.
> >>
> >> Ah yes I see ...
> >>
> >> platform regulator@1: probe deferral - wait for supplier tps65911@2d
> >
> > Do you mind sharing the full log please? It's hard to tell you
> > anything useful with bits and pieces of logs.
> >
> >> Yes the device-tree for this board can be found here [0]. Looks like
> >> there is a circular dependency between the vddctrl_reg and vddcore_reg.
> >> This is part of coupled regulators which have a two-way linkage [1]. So
> >> this change appears to conflict with this.
> >
> > fw_devlink doesn't track "regulator-coupled-with". So that's probably
> > not it. Also, this patch series was made to handle simple cycles
> > properly. It'll functionally disable the device links it created when
> > it comes to probe ordering. Only two overlapping cycles might cause
> > issues -- and even that, not all the time. So yeah, full log please.
>
>
> No problem. Please find attached.

Thanks! I think you forgot to enable those logs though. Also, while
you are at it, maybe enable the logs in device_link_add() too please?

-Saravana

>
> Cheers
> Jon
>
>
> --
> nvpublic

2021-01-14 16:59:10

by Jon Hunter

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default


On 14/01/2021 16:47, Saravana Kannan wrote:

...

>> Yes this is the warning shown here [0] and this is coming from
>> the 'Generic PHY stmmac-0:00' device.
>
> Can you print the supplier and consumer device when this warning is
> happening and let me know? That'd help too. I'm guessing the phy is
> the consumer.


Sorry, I should have included that. I added a print to dump this on
another build but failed to include it here.

WARNING KERN Generic PHY stmmac-0:00: supplier 2200000.gpio (status 1)

The status is link->status, and it looks like the supplier is the
gpio controller. I have verified that the gpio controller is probed
successfully before this.

> So the warning itself isn't a problem -- it's not breaking anything or
> leaking memory or anything like that. But the device link is jumping
> states in an incorrect manner. With enough context of this code (why
> the device_bind_driver() is being called directly instead of going
> through the normal probe path), it should be easy to fix (I'll just
> need to fix up the device link state).


Correct, the board seems to boot fine; we just get this warning.

Cheers
Jon

--
nvpublic

2021-01-14 18:12:57

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Wed, Jan 13, 2021 at 11:36 PM Marek Szyprowski
<[email protected]> wrote:
>
> Hi Saravana,
>
> On 13.01.2021 20:23, Saravana Kannan wrote:
> > On Tue, Jan 12, 2021 at 11:04 PM Marek Szyprowski
> > <[email protected]> wrote:
> >> On 12.01.2021 21:51, Saravana Kannan wrote:
> >>> On Mon, Jan 11, 2021 at 11:11 PM Marek Szyprowski
> >>> <[email protected]> wrote:
> >>>> On 11.01.2021 22:47, Saravana Kannan wrote:
> >>>>> On Mon, Jan 11, 2021 at 6:18 AM Marek Szyprowski
> >>>>> <[email protected]> wrote:
> >>>>>> On 11.01.2021 12:12, Marek Szyprowski wrote:
> >>>>>>> On 18.12.2020 04:17, Saravana Kannan wrote:
> >>>>>>>> Cyclic dependencies in some firmware was one of the last remaining
> >>>>>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> >>>>>>>> dependencies don't block probing, set fw_devlink=on by default.
> >>>>>>>>
> >>>>>>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> >>>>>>>> only for systems with device tree firmware):
> >>>>>>>> * Significantly cuts down deferred probes.
> >>>>>>>> * Device probe is effectively attempted in graph order.
> >>>>>>>> * Makes it much easier to load drivers as modules without having to
> >>>>>>>> worry about functional dependencies between modules (depmod is still
> >>>>>>>> needed for symbol dependencies).
> >>>>>>>>
> >>>>>>>> If this patch prevents some devices from probing, it's very likely due
> >>>>>>>> to the system having one or more device drivers that "probe"/set up a
> >>>>>>>> device (DT node with compatible property) without creating a struct
> >>>>>>>> device for it. If we hit such cases, the device drivers need to be
> >>>>>>>> fixed so that they populate struct devices and probe them like normal
> >>>>>>>> device drivers so that the driver core is aware of the devices and their
> >>>>>>>> status. See [1] for an example of such a case.
> >>>>>>>>
> >>>>>>>> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> >>>>>>>> Signed-off-by: Saravana Kannan <[email protected]>
> >>>>>>> This patch landed recently in linux next-20210111 as commit
> >>>>>>> e590474768f1 ("driver core: Set fw_devlink=on by default"). Sadly it
> >>>>>>> breaks Exynos IOMMU operation, what causes lots of devices being
> >>>>>>> deferred and not probed at all. I've briefly checked and noticed that
> >>>>>>> exynos_sysmmu_probe() is never called after this patch. This is really
> >>>>>>> strange for me, as the SYSMMU controllers on Exynos platform are
> >>>>>>> regular platform devices registered by the OF code. The driver code is
> >>>>>>> here: drivers/iommu/exynos-iommu.c, example dts:
> >>>>>>> arch/arm/boot/dts/exynos3250.dtsi (compatible = "samsung,exynos-sysmmu").
> >>>>>> Okay, I found the source of this problem. It is caused by Exynos power
> >>>>>> domain driver, which is not platform driver yet. I will post a patch,
> >>>>>> which converts it to the platform driver.
> >>>>> Thanks Marek! Hopefully the debug logs I added were sufficient to
> >>>>> figure out the reason.
> >>>> Frankly, it took me a while to figure out that device core waits for the
> >>>> power domain devices. Maybe it would be possible to add some more debug
> >>>> messages or hints? Like the reason of the deferred probe in
> >>>> /sys/kernel/debug/devices_deferred ?
> >>> There's already a /sys/devices/.../<device>/waiting_for_supplier file
> >>> that tells you if the device is waiting for a supplier device to be
> >>> added. That file goes away once the device probes. If the file has 1,
> >>> then it's waiting for the supplier device to be added (like your
> >>> case). If it's 0, then the device is just waiting on one of the
> >>> existing suppliers to probe. You can find the existing suppliers
> >>> through /sys/devices/.../<device>/supplier:*/supplier. Also, flip
> >>> these dev_dbg() to dev_info() if you need more details about deferred
> >>> probing.
> >> Frankly speaking I doubt that anyone will find those. Even experienced
> >> developer might need some time to figure it out.
> >>
> >> I expect that such information will be at least in the mentioned
> >> /sys/kernel/debug/devices_deferred file. We already have infrastructure
> >> for putting the deferred probe reason there, see dev_err_probe()
> >> function. Even such a simple change makes the debugging this issue much
> >> easier:
> >>
> >> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >> index cd8e518fadd6..ceb5aed5a84c 100644
> >> --- a/drivers/base/core.c
> >> +++ b/drivers/base/core.c
> >> @@ -937,12 +937,13 @@ int device_links_check_suppliers(struct device *dev)
> >> mutex_lock(&fwnode_link_lock);
> >> if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
> >> !fw_devlink_is_permissive()) {
> >> - dev_dbg(dev, "probe deferral - wait for supplier %pfwP\n",
> >> + ret = dev_err_probe(dev, -EPROBE_DEFER,
> >> + "probe deferral - wait for supplier %pfwP\n",
> >> list_first_entry(&dev->fwnode->suppliers,
> >> struct fwnode_link,
> >> c_hook)->supplier);
> >> mutex_unlock(&fwnode_link_lock);
> >> - return -EPROBE_DEFER;
> >> + return ret;
> >> }
> >> mutex_unlock(&fwnode_link_lock);
> >>
> >> @@ -955,9 +956,9 @@ int device_links_check_suppliers(struct device *dev)
> >> if (link->status != DL_STATE_AVAILABLE &&
> >> !(link->flags & DL_FLAG_SYNC_STATE_ONLY)) {
> >> device_links_missing_supplier(dev);
> > >> - dev_dbg(dev, "probe deferral - supplier %s not ready\n",
> >> + ret = dev_err_probe(dev, -EPROBE_DEFER,
> >> + "probe deferral - supplier %s not ready\n",
> >> dev_name(link->supplier));
> >> - ret = -EPROBE_DEFER;
> >> break;
> >> }
> >> WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE);
> >>
> >>
> >> After such change:
> >>
> >> # cat /sys/kernel/debug/devices_deferred
> > Sweet! I wasn't aware of this file at all.
> >
> > However, on a side note, one of my TODO items is to not add devices to
> > the deferred probe list if they'll never probe yet (due to suppliers
> > not having probed). On a board I tested on, it cut down really_probe()
> > calls by 75%! So the probe attempt itself effectively happens in graph
> > order (which I think is pretty cool). So that's going to conflict with
> > this file. I'll have to see what to do about that.
> >
> > Thanks for this pointer. Let me sit on this for 2 weeks and see how I
> > can incorporate your suggestion while allowing for the above. And then
> > I'll send out a patch. Does that work?
>
> Fine for me.
>
> Even if you want to change the core not to probe devices that miss their
> suppliers (what's good imho), the 'devices_deferred' file might still
> contain all of them. For user it is just a list of devices that are not
> yet available in the system with the optional reasons for that.

Right, I understood that :) My point was that I'm assuming the debugfs
file loops through the deferred devices list. But with my
optimization, it won't find all the devices. So, we might need YET
another list. :-(

-Saravana

2021-01-14 19:01:57

by Jon Hunter

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default


On 14/01/2021 16:52, Saravana Kannan wrote:

...

> Thanks! I think you forgot to enable those logs though. Also, while
> you are at it, maybe enable the logs in device_link_add() too please?


Sorry try this one.

Cheers
Jon

--
nvpublic


Attachments:
tegra30-cardhu-a04-bootlog.txt (375.50 kB)

2021-01-14 21:54:56

by Saravana Kannan

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, Jan 14, 2021 at 10:55 AM Jon Hunter <[email protected]> wrote:
>
>
> On 14/01/2021 16:52, Saravana Kannan wrote:
>
> ...
>
> > Thanks! I think you forgot to enable those logs though. Also, while
> > you are at it, maybe enable the logs in device_link_add() too please?
>
>
> Sorry try this one.
>
> Cheers
> Jon

Phew! That took almost 4 hours to debug on the side! I think I figured
it out. Can you try this patch? If it works or improves things, I'll
explain why it helps.

-Saravana

diff --git a/drivers/of/property.c b/drivers/of/property.c
index 5f9eed79a8aa..1c8c65c4a887 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1258,6 +1258,8 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
+DEFINE_SIMPLE_PROP(gpio_compat, "gpio", "#gpio-cells")
+DEFINE_SIMPLE_PROP(gpios_compat, "gpios", "#gpio-cells")
DEFINE_SUFFIX_PROP(regulators, "-supply", NULL)
DEFINE_SUFFIX_PROP(gpio, "-gpio", "#gpio-cells")
DEFINE_SUFFIX_PROP(gpios, "-gpios", "#gpio-cells")
@@ -1296,6 +1298,8 @@ static const struct supplier_bindings of_supplier_bindings[] = {
{ .parse_prop = parse_pinctrl6, },
{ .parse_prop = parse_pinctrl7, },
{ .parse_prop = parse_pinctrl8, },
+ { .parse_prop = parse_gpio_compat, },
+ { .parse_prop = parse_gpios_compat, },
{ .parse_prop = parse_regulators, },
{ .parse_prop = parse_gpio, },
{ .parse_prop = parse_gpios, },

2021-01-15 16:15:35

by Jon Hunter

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default


On 14/01/2021 21:50, Saravana Kannan wrote:
> On Thu, Jan 14, 2021 at 10:55 AM Jon Hunter <[email protected]> wrote:
>>
>>
>> On 14/01/2021 16:52, Saravana Kannan wrote:
>>
>> ...
>>
>>> Thanks! I think you forgot to enable those logs though. Also, while
>>> you are at it, maybe enable the logs in device_link_add() too please?
>>
>>
>> Sorry try this one.
>>
>> Cheers
>> Jon
>
> Phew! That took almost 4 hours to debug on the side! I think I figured
> it out. Can you try this patch? If it works or improves things, I'll
> explain why it helps.
>
> -Saravana
>
> diff --git a/drivers/of/property.c b/drivers/of/property.c
> index 5f9eed79a8aa..1c8c65c4a887 100644
> --- a/drivers/of/property.c
> +++ b/drivers/of/property.c
> @@ -1258,6 +1258,8 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
> DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
> DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
> DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
> +DEFINE_SIMPLE_PROP(gpio_compat, "gpio", "#gpio-cells")
> +DEFINE_SIMPLE_PROP(gpios_compat, "gpios", "#gpio-cells")
> DEFINE_SUFFIX_PROP(regulators, "-supply", NULL)
> DEFINE_SUFFIX_PROP(gpio, "-gpio", "#gpio-cells")
> DEFINE_SUFFIX_PROP(gpios, "-gpios", "#gpio-cells")
> @@ -1296,6 +1298,8 @@ static const struct supplier_bindings of_supplier_bindings[] = {
> { .parse_prop = parse_pinctrl6, },
> { .parse_prop = parse_pinctrl7, },
> { .parse_prop = parse_pinctrl8, },
> + { .parse_prop = parse_gpio_compat, },
> + { .parse_prop = parse_gpios_compat, },
> { .parse_prop = parse_regulators, },
> { .parse_prop = parse_gpio, },
> { .parse_prop = parse_gpios, },
>

Thanks, that worked!

Tested-by: Jon Hunter <[email protected]>

Thanks for digging into that one. Would have taken me more than 4 hours!

Jon

--
nvpublic

2021-01-15 17:46:53

by Saravana Kannan

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Fri, Jan 15, 2021 at 8:13 AM Jon Hunter <[email protected]> wrote:
>
>
> On 14/01/2021 21:50, Saravana Kannan wrote:
> > On Thu, Jan 14, 2021 at 10:55 AM Jon Hunter <[email protected]> wrote:
> >>
> >>
> >> On 14/01/2021 16:52, Saravana Kannan wrote:
> >>
> >> ...
> >>
> >>> Thanks! I think you forgot to enable those logs though. Also, while
> >>> you are at it, maybe enable the logs in device_link_add() too please?
> >>
> >>
> >> Sorry try this one.
> >>
> >> Cheers
> >> Jon
> >
> > Phew! That took almost 4 hours to debug on the side! I think I figured
> > it out. Can you try this patch? If it works or improves things, I'll
> > explain why it helps.
> >
> > -Saravana
> >
> > diff --git a/drivers/of/property.c b/drivers/of/property.c
> > index 5f9eed79a8aa..1c8c65c4a887 100644
> > --- a/drivers/of/property.c
> > +++ b/drivers/of/property.c
> > @@ -1258,6 +1258,8 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
> > DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
> > DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
> > DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
> > +DEFINE_SIMPLE_PROP(gpio_compat, "gpio", "#gpio-cells")
> > +DEFINE_SIMPLE_PROP(gpios_compat, "gpios", "#gpio-cells")
> > DEFINE_SUFFIX_PROP(regulators, "-supply", NULL)
> > DEFINE_SUFFIX_PROP(gpio, "-gpio", "#gpio-cells")
> > DEFINE_SUFFIX_PROP(gpios, "-gpios", "#gpio-cells")
> > @@ -1296,6 +1298,8 @@ static const struct supplier_bindings of_supplier_bindings[] = {
> > { .parse_prop = parse_pinctrl6, },
> > { .parse_prop = parse_pinctrl7, },
> > { .parse_prop = parse_pinctrl8, },
> > + { .parse_prop = parse_gpio_compat, },
> > + { .parse_prop = parse_gpios_compat, },
> > { .parse_prop = parse_regulators, },
> > { .parse_prop = parse_gpio, },
> > { .parse_prop = parse_gpios, },
> >
>
> Thanks, that worked!
>
> Tested-by: Jon Hunter <[email protected]>
>
> Thanks for digging into that one. Would have taken me more than 4 hours!

Thanks for testing. What was happening was that there was a cycle of
2-3 devices. A -(depends on)-> B -> C -> A.

fw_devlink only understood A -> B, because the remaining links come
from the gpio bindings I added above, which it wasn't parsing before.
Since fw_devlink couldn't see the cycle, it couldn't apply its cycle
workarounds. So C's driver kept deferring probe waiting on A, and none
of them probed.

Once I added these and made the cycle visible to fw_devlink, it
handled it fine (basically between A, B and C, the device links don't
affect probe order anymore).


-Saravana

2021-01-17 23:48:05

by Michael Walle

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Saravana, again ;)

> Cyclic dependencies in some firmware was one of the last remaining
> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> dependencies don't block probing, set fw_devlink=on by default.
>
> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> only for systems with device tree firmware):
> * Significantly cuts down deferred probes.
> * Device probe is effectively attempted in graph order.
> * Makes it much easier to load drivers as modules without having to
> worry about functional dependencies between modules (depmod is still
> needed for symbol dependencies).
>
> If this patch prevents some devices from probing, it's very likely due
> to the system having one or more device drivers that "probe"/set up a
> device (DT node with compatible property) without creating a struct
> device for it. If we hit such cases, the device drivers need to be
> fixed so that they populate struct devices and probe them like normal
> device drivers so that the driver core is aware of the devices and their
> status. See [1] for an example of such a case.
>
> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> Signed-off-by: Saravana Kannan <[email protected]>

This breaks (at least) probing of the PCIe controllers of my board. The
driver in question is
drivers/pci/controller/dwc/pci-layerscape.c
I've also put the maintainers of this driver on CC. Looks like it uses a
proper struct device. But it uses builtin_platform_driver_probe() and
apparently it waits for the iommu which uses module_platform_driver().
Dunno if that will work together.

The board device tree can be found here:
arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28-var3-ads2.dts

Attached is the log with "probe deferral" messages enabled.

[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[ 0.000000] Linux version 5.11.0-rc3-next-20210115-00013-g43ea1c90dcc8-dirty (mwalle@mwalle01) (aarch64-linux-gnu-gcc (Debian 8.3.0-2) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #357 SMP PREEMPT Sun Jan 17 23:46:11 CET 2021
[ 0.000000] Machine model: Kontron SMARC-sAL28 (Single PHY) on SMARC Eval 2.0 carrier
[ 0.000000] efi: UEFI not found.
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000080000000-0x00000020ffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x20ff7d9200-0x20ff7dafff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000080000000-0x00000000ffffffff]
[ 0.000000] DMA32 empty
[ 0.000000] Normal [mem 0x0000000100000000-0x00000020ffffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000080000000-0x00000000ffffffff]
[ 0.000000] node 0: [mem 0x0000002080000000-0x00000020ffffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000020ffffffff]
[ 0.000000] On node 0 totalpages: 1048576
[ 0.000000] DMA zone: 8192 pages used for memmap
[ 0.000000] DMA zone: 0 pages reserved
[ 0.000000] DMA zone: 524288 pages, LIFO batch:63
[ 0.000000] Normal zone: 8192 pages used for memmap
[ 0.000000] Normal zone: 524288 pages, LIFO batch:63
[ 0.000000] cma: Reserved 32 MiB at 0x00000000fcc00000
[ 0.000000] percpu: Embedded 31 pages/cpu s89752 r8192 d29032 u126976
[ 0.000000] pcpu-alloc: s89752 r8192 d29032 u126976 alloc=31*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1
[ 0.000000] Detected PIPT I-cache on CPU0
[ 0.000000] CPU features: detected: GIC system register CPU interface
[ 0.000000] CPU features: detected: Spectre-v3a
[ 0.000000] CPU features: detected: Spectre-v2
[ 0.000000] CPU features: detected: Spectre-v4
[ 0.000000] CPU features: detected: ARM errata 1165522, 1319367, or 1530923
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 1032192
[ 0.000000] Policy zone: Normal
[ 0.000000] Kernel command line: debug root=/dev/mmcblk0p2 rootwait
[ 0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] software IO TLB: mapped [mem 0x00000000f8c00000-0x00000000fcc00000] (64MB)
[ 0.000000] Memory: 3987204K/4194304K available (14592K kernel code, 2024K rwdata, 5776K rodata, 4736K init, 848K bss, 174332K reserved, 32768K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.000000] ftrace: allocating 51093 entries in 200 pages
[ 0.000000] ftrace: allocated 200 pages with 3 groups
[ 0.000000] rcu: Preemptible hierarchical RCU implementation.
[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=2.
[ 0.000000] Trampoline variant of Tasks RCU enabled.
[ 0.000000] Rude variant of Tasks RCU enabled.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv3: GIC: Using split EOI/Deactivate mode
[ 0.000000] GICv3: 256 SPIs implemented
[ 0.000000] GICv3: 0 Extended SPIs implemented
[ 0.000000] GICv3: Distributor has no Range Selector support
[ 0.000000] GICv3: 16 PPIs implemented
[ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x0000000006040000
[ 0.000000] ITS [mem 0x06020000-0x0603ffff]
[ 0.000000] ITS@0x0000000006020000: allocated 65536 Devices @2080180000 (flat, esz 8, psz 64K, shr 0)
[ 0.000000] ITS: using cache flushing for cmd queue
[ 0.000000] GICv3: using LPI property table @0x0000002080200000
[ 0.000000] GIC: using cache flushing for LPI property table
[ 0.000000] GICv3: CPU0: using allocated LPI pending table @0x0000002080210000
[ 0.000000] random: get_random_bytes called from start_kernel+0x668/0x830 with crng_init=0
[ 0.000000] arch_timer: cp15 timer(s) running at 25.00MHz (phys).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x5c40939b5, max_idle_ns: 440795202646 ns
[ 0.000000] sched_clock: 56 bits at 25MHz, resolution 40ns, wraps every 4398046511100ns
[ 0.000120] Console: colour dummy device 80x25
[ 0.000389] printk: console [tty0] enabled
[ 0.000439] Calibrating delay loop (skipped), value calculated using timer frequency.. 50.00 BogoMIPS (lpj=100000)
[ 0.000454] pid_max: default: 32768 minimum: 301
[ 0.000500] LSM: Security Framework initializing
[ 0.000553] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[ 0.000587] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[ 0.001510] rcu: Hierarchical SRCU implementation.
[ 0.001667] Platform MSI: gic-its@6020000 domain created
[ 0.001745] PCI/MSI: /interrupt-controller@6000000/gic-its@6020000 domain created
[ 0.001980] EFI services will not be available.
[ 0.002073] smp: Bringing up secondary CPUs ...
[ 0.002344] Detected PIPT I-cache on CPU1
[ 0.002365] GICv3: CPU1: found redistributor 1 region 0:0x0000000006060000
[ 0.002375] GICv3: CPU1: using allocated LPI pending table @0x0000002080220000
[ 0.002405] CPU1: Booted secondary processor 0x0000000001 [0x410fd083]
[ 0.002471] smp: Brought up 1 node, 2 CPUs
[ 0.002499] SMP: Total of 2 processors activated.
[ 0.002507] CPU features: detected: 32-bit EL0 Support
[ 0.002515] CPU features: detected: CRC32 instructions
[ 0.002524] CPU features: detected: 32-bit EL1 Support
[ 0.011735] CPU: All CPU(s) started at EL2
[ 0.011761] alternatives: patching kernel code
[ 0.012508] devtmpfs: initialized
[ 0.014832] KASLR disabled due to lack of seed
[ 0.020761] DMA-API: preallocated 65536 debug entries
[ 0.020786] DMA-API: debugging enabled by kernel config
[ 0.020795] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.020815] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
[ 0.021457] pinctrl core: initialized pinctrl subsystem

[ 0.021715] *************************************************************
[ 0.021722] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
[ 0.021729] ** **
[ 0.021736] ** IOMMU DebugFS SUPPORT HAS BEEN ENABLED IN THIS KERNEL **
[ 0.021742] ** **
[ 0.021749] ** This means that this kernel is built to expose internal **
[ 0.021756] ** IOMMU data structures, which may compromise security on **
[ 0.021762] ** your system. **
[ 0.021769] ** **
[ 0.021776] ** If you see this message and you are not debugging the **
[ 0.021782] ** kernel, report this immediately to your vendor! **
[ 0.021789] ** **
[ 0.021795] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
[ 0.021802] *************************************************************
[ 0.021884] DMI not present or invalid.
[ 0.022167] NET: Registered protocol family 16
[ 0.023001] DMA: preallocated 512 KiB GFP_KERNEL pool for atomic allocations
[ 0.023114] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[ 0.023286] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[ 0.023317] audit: initializing netlink subsys (disabled)
[ 0.023434] audit: type=2000 audit(0.020:1): state=initialized audit_enabled=0 res=1
[ 0.023717] thermal_sys: Registered thermal governor 'step_wise'
[ 0.023953] cpuidle: using governor menu
[ 0.024076] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[ 0.024112] ASID allocator initialised with 65536 entries
[ 0.024394] Serial: AMBA PL011 UART driver
[ 0.026053] Machine: Kontron SMARC-sAL28 (Single PHY) on SMARC Eval 2.0 carrier
[ 0.026065] SoC family: QorIQ LS1028A
[ 0.026071] SoC ID: svr:0x870b0110, Revision: 1.0
[ 0.038310] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[ 0.038332] HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages
[ 0.038342] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 0.038351] HugeTLB registered 64.0 KiB page size, pre-allocated 0 pages
[ 0.039157] cryptd: max_cpu_qlen set to 1000
[ 0.040473] ACPI: Interpreter disabled.
[ 0.040536] platform 22c0000.dma-controller: probe deferral - supplier 5000000.iommu not ready
[ 0.040768] iommu: Default domain type: Translated
[ 0.040845] vgaarb: loaded
[ 0.041009] SCSI subsystem initialized
[ 0.041097] libata version 3.00 loaded.
[ 0.041222] usbcore: registered new interface driver usbfs
[ 0.041248] usbcore: registered new interface driver hub
[ 0.041273] usbcore: registered new device driver usb
[ 0.041554] imx-i2c 2000000.i2c: can't get pinctrl, bus recovery not supported
[ 0.041843] i2c i2c-0: IMX I2C adapter registered
[ 0.042000] imx-i2c 2030000.i2c: can't get pinctrl, bus recovery not supported
[ 0.042104] i2c i2c-1: IMX I2C adapter registered
[ 0.042208] imx-i2c 2040000.i2c: can't get pinctrl, bus recovery not supported
[ 0.042391] i2c i2c-2: IMX I2C adapter registered
[ 0.042516] mc: Linux media interface: v0.10
[ 0.042539] videodev: Linux video capture interface: v2.00
[ 0.042566] pps_core: LinuxPPS API ver. 1 registered
[ 0.042573] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <[email protected]>
[ 0.042588] PTP clock support registered
[ 0.042664] EDAC MC: Ver: 3.0.0
[ 0.043046] FPGA manager framework
[ 0.043090] Advanced Linux Sound Architecture Driver Initialized.
[ 0.043543] clocksource: Switched to clocksource arch_sys_counter
[ 0.066221] VFS: Disk quotas dquot_6.6.0
[ 0.066273] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.066402] pnp: PnP ACPI: disabled
[ 0.074668] NET: Registered protocol family 2
[ 0.074996] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes, linear)
[ 0.075026] TCP established hash table entries: 32768 (order: 6, 262144 bytes, linear)
[ 0.075122] TCP bind hash table entries: 32768 (order: 7, 524288 bytes, linear)
[ 0.075454] TCP: Hash tables configured (established 32768 bind 32768)
[ 0.075571] UDP hash table entries: 2048 (order: 4, 65536 bytes, linear)
[ 0.075604] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes, linear)
[ 0.075708] NET: Registered protocol family 1
[ 0.076019] RPC: Registered named UNIX socket transport module.
[ 0.076031] RPC: Registered udp transport module.
[ 0.076038] RPC: Registered tcp transport module.
[ 0.076045] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 0.076055] PCI: CLS 0 bytes, default 64
[ 0.076425] hw perfevents: enabled with armv8_cortex_a72 PMU driver, 7 counters available
[ 0.076625] kvm [1]: IPA Size Limit: 44 bits
[ 0.077159] kvm [1]: GICv3: no GICV resource entry
[ 0.077170] kvm [1]: disabling GICv2 emulation
[ 0.077189] kvm [1]: GIC system register CPU interface enabled
[ 0.077228] kvm [1]: vgic interrupt IRQ9
[ 0.077288] kvm [1]: Hyp mode initialized successfully
[ 0.088537] Initialise system trusted keyrings
[ 0.088653] workingset: timestamp_bits=44 max_order=20 bucket_order=0
[ 0.092096] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 0.092507] NFS: Registering the id_resolver key type
[ 0.092532] Key type id_resolver registered
[ 0.092540] Key type id_legacy registered
[ 0.092590] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[ 0.092691] 9p: Installing v9fs 9p2000 file system support
[ 0.123462] Key type asymmetric registered
[ 0.123473] Asymmetric key parser 'x509' registered
[ 0.123499] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245)
[ 0.123510] io scheduler mq-deadline registered
[ 0.123517] io scheduler kyber registered
[ 0.124604] platform 1f0000000.pcie: probe deferral - supplier 5000000.iommu not ready
[ 0.124746] platform 3400000.pcie: probe deferral - supplier 5000000.iommu not ready
[ 0.124761] platform 3500000.pcie: probe deferral - supplier 5000000.iommu not ready
[ 0.125145] EINJ: ACPI disabled.
[ 0.128791] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.129769] 21c0500.serial: ttyS0 at MMIO 0x21c0500 (irq = 24, base_baud = 12500000) is a 16550A
[ 1.276625] printk: console [ttyS0] enabled
[ 1.281262] 21c0600.serial: ttyS1 at MMIO 0x21c0600 (irq = 24, base_baud = 12500000) is a 16550A
[ 1.290396] platform 2270000.serial: probe deferral - supplier 22c0000.dma-controller not ready
[ 1.299780] arm-smmu 5000000.iommu: probing hardware configuration...
[ 1.306253] arm-smmu 5000000.iommu: SMMUv2 with:
[ 1.310895] arm-smmu 5000000.iommu: stage 1 translation
[ 1.316232] arm-smmu 5000000.iommu: stage 2 translation
[ 1.321566] arm-smmu 5000000.iommu: nested translation
[ 1.326815] arm-smmu 5000000.iommu: stream matching with 128 register groups
[ 1.333984] arm-smmu 5000000.iommu: 64 context banks (0 stage-2 only)
[ 1.340542] arm-smmu 5000000.iommu: Supported page sizes: 0x61311000
[ 1.347011] arm-smmu 5000000.iommu: Stage-1: 48-bit VA -> 48-bit IPA
[ 1.353481] arm-smmu 5000000.iommu: Stage-2: 48-bit IPA -> 48-bit PA
[ 1.364504] loop: module loaded
[ 1.367804] at24 0-0050: supply vcc not found, using dummy regulator
[ 1.375049] at24 0-0050: 4096 byte 24c32 EEPROM, writable, 32 bytes/write
[ 1.381959] at24 1-0057: supply vcc not found, using dummy regulator
[ 1.389165] at24 1-0057: 8192 byte 24c64 EEPROM, writable, 32 bytes/write
[ 1.396063] at24 2-0050: supply vcc not found, using dummy regulator
[ 1.403265] at24 2-0050: 4096 byte 24c32 EEPROM, writable, 32 bytes/write
[ 1.424830] platform 2120000.spi: probe deferral - supplier 22c0000.dma-controller not ready
[ 1.433801] spi-nor spi0.0: w25q32dw (4096 Kbytes)
[ 1.438888] 8 fixed-partitions partitions found on MTD device 20c0000.spi
[ 1.445729] Creating 8 MTD partitions on "20c0000.spi":
[ 1.450984] 0x000000000000-0x000000010000 : "rcw"
[ 1.459943] 0x000000010000-0x000000100000 : "failsafe bootloader"
[ 1.467909] 0x000000100000-0x000000140000 : "failsafe DP firmware"
[ 1.475904] 0x000000140000-0x0000001e0000 : "failsafe trusted firmware"
[ 1.483901] 0x0000001e0000-0x000000200000 : "reserved"
[ 1.491903] 0x000000200000-0x000000210000 : "configuration store"
[ 1.499897] 0x000000210000-0x0000003e0000 : "bootloader"
[ 1.507909] 0x0000003e0000-0x000000400000 : "bootloader environment"
[ 1.516489] libphy: Fixed MDIO Bus: probed
[ 1.521010] tun: Universal TUN/TAP device driver, 1.6
[ 1.526180] CAN device driver interface
[ 1.530778] thunder_xcv, ver 1.0
[ 1.534051] thunder_bgx, ver 1.0
[ 1.537311] nicpf, ver 1.0
[ 1.540304] hclge is initializing
[ 1.543646] hns3: Hisilicon Ethernet Network Driver for Hip08 Family - version
[ 1.550901] hns3: Copyright (c) 2017 Huawei Corporation.
[ 1.556270] igb: Intel(R) Gigabit Ethernet Network Driver
[ 1.561694] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 1.567359] sky2: driver version 1.30
[ 1.571307] VFIO - User Level meta-driver version: 0.3
[ 1.576745] dwc3 3100000.usb: Adding to iommu group 0
[ 1.582262] dwc3 3110000.usb: Adding to iommu group 1
[ 1.588251] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 1.594825] ehci-pci: EHCI PCI platform driver
[ 1.599307] ehci-platform: EHCI generic platform driver
[ 1.604627] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 1.610844] ohci-pci: OHCI PCI platform driver
[ 1.615324] ohci-platform: OHCI generic platform driver
[ 1.620846] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[ 1.626370] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
[ 1.634228] xhci-hcd xhci-hcd.0.auto: hcc params 0x0220f66d hci version 0x100 quirks 0x0000000002010010
[ 1.643698] xhci-hcd xhci-hcd.0.auto: irq 29, io mem 0x03100000
[ 1.650061] hub 1-0:1.0: USB hub found
[ 1.653850] hub 1-0:1.0: 1 port detected
[ 1.657959] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[ 1.663477] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2
[ 1.671174] xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0 SuperSpeed
[ 1.677758] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[ 1.686134] hub 2-0:1.0: USB hub found
[ 1.689917] hub 2-0:1.0: 1 port detected
[ 1.694085] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
[ 1.699607] xhci-hcd xhci-hcd.1.auto: new USB bus registered, assigned bus number 3
[ 1.707473] xhci-hcd xhci-hcd.1.auto: hcc params 0x0220f66d hci version 0x100 quirks 0x0000000002010010
[ 1.716936] xhci-hcd xhci-hcd.1.auto: irq 30, io mem 0x03110000
[ 1.723244] hub 3-0:1.0: USB hub found
[ 1.727026] hub 3-0:1.0: 1 port detected
[ 1.731114] xhci-hcd xhci-hcd.1.auto: xHCI Host Controller
[ 1.736633] xhci-hcd xhci-hcd.1.auto: new USB bus registered, assigned bus number 4
[ 1.744329] xhci-hcd xhci-hcd.1.auto: Host supports USB 3.0 SuperSpeed
[ 1.750919] usb usb4: We don't know the algorithms for LPM for this host, disabling LPM.
[ 1.759300] hub 4-0:1.0: USB hub found
[ 1.763090] hub 4-0:1.0: 1 port detected
[ 1.767271] usbcore: registered new interface driver usb-storage
[ 1.774135] udc-core: couldn't find an available UDC - added [g_ether] to list of pending drivers
[ 1.783054] udc-core: couldn't find an available UDC - added [g_mass_storage] to list of pending drivers
[ 1.792577] udc-core: couldn't find an available UDC - added [g_serial] to list of pending drivers
[ 1.801840] input: buttons1 as /devices/platform/buttons1/input/input0
[ 1.809670] ftm-alarm 2800000.timer: registered as rtc1
[ 1.815661] rtc-rv8803 0-0032: Voltage low, temperature compensation stopped.
[ 1.822837] rtc-rv8803 0-0032: Voltage low, data loss detected.
[ 1.829877] rtc-rv8803 0-0032: Voltage low, data is invalid.
[ 1.835639] rtc-rv8803 0-0032: registered as rtc0
[ 1.840966] rtc-rv8803 0-0032: Voltage low, data is invalid.
[ 1.846654] rtc-rv8803 0-0032: hctosys: unable to read the hardware clock
[ 1.853617] i2c /dev entries driver
[ 1.868940] sp805-wdt c000000.watchdog: registration successful
[ 1.875000] sp805-wdt c010000.watchdog: registration successful
[ 1.882900] sl28cpld-wdt 2000000.i2c:sl28cpld@4a:watchdog@4: initial timeout 6 sec
[ 1.890998] qoriq-cpufreq qoriq-cpufreq: Freescale QorIQ CPU frequency scaling driver
[ 1.899245] sdhci: Secure Digital Host Controller Interface driver
[ 1.905461] sdhci: Copyright(c) Pierre Ossman
[ 1.909933] Synopsys Designware Multimedia Card Interface Driver
[ 1.916222] sdhci-pltfm: SDHCI platform and OF driver helper
[ 1.922189] sdhci-esdhc 2140000.mmc: Adding to iommu group 2
[ 1.927922] sdhci-esdhc 2150000.mmc: Adding to iommu group 3
[ 1.933859] ledtrig-cpu: registered to indicate activity on CPUs
[ 1.940481] usbcore: registered new interface driver usbhid
[ 1.946088] usbhid: USB HID core driver
[ 1.951040] wm8904 2-001a: supply DCVDD not found, using dummy regulator
[ 1.957849] wm8904 2-001a: supply DBVDD not found, using dummy regulator
[ 1.959412] mmc0: SDHCI controller on 2150000.mmc [2150000.mmc] using ADMA
[ 1.964608] wm8904 2-001a: supply AVDD not found, using dummy regulator
[ 1.971498] mmc1: SDHCI controller on 2140000.mmc [2140000.mmc] using ADMA
[ 1.978161] wm8904 2-001a: supply CPVDD not found, using dummy regulator
[ 1.991795] wm8904 2-001a: supply MICVDD not found, using dummy regulator
[ 2.000066] wm8904 2-001a: revision A
[ 2.011006] platform f140000.audio-controller: probe deferral - supplier 22c0000.dma-controller not ready
[ 2.020630] platform f150000.audio-controller: probe deferral - supplier 22c0000.dma-controller not ready
[ 2.030314] drop_monitor: Initializing network drop monitor service
[ 2.036934] NET: Registered protocol family 10
[ 2.041875] Segment Routing with IPv6
[ 2.045615] NET: Registered protocol family 17
[ 2.050199] can: controller area network core
[ 2.054614] NET: Registered protocol family 29
[ 2.059081] can: raw protocol
[ 2.062060] can: broadcast manager protocol
[ 2.066276] can: netlink gateway - max_hops=1
[ 2.070785] 9pnet: Installing 9P2000 support
[ 2.071171] random: fast init done
[ 2.075110] Key type dns_resolver registered
[ 2.078504] usb 3-1: new high-speed USB device number 2 using xhci-hcd
[ 2.082977] registered taskstats version 1
[ 2.093467] Loading compiled-in X.509 certificates
[ 2.096902] mmc0: new HS400 MMC card at address 0001
[ 2.100484] fsl-edma 22c0000.dma-controller: Adding to iommu group 4
[ 2.103805] mmcblk0: mmc0:0001 S0J58X 29.6 GiB
[ 2.110777] pci-host-generic 1f0000000.pcie: host bridge /soc/pcie@1f0000000 ranges:
[ 2.114498] mmcblk0boot0: mmc0:0001 S0J58X partition 1 31.5 MiB
[ 2.122014] pci-host-generic 1f0000000.pcie: MEM 0x01f8000000..0x01f815ffff -> 0x0000000000
[ 2.128314] mmcblk0boot1: mmc0:0001 S0J58X partition 2 31.5 MiB
[ 2.136776] pci-host-generic 1f0000000.pcie: MEM 0x01f8160000..0x01f81cffff -> 0x0000000000
[ 2.147614] mmcblk0rpmb: mmc0:0001 S0J58X partition 3 4.00 MiB, chardev (241:0)
[ 2.151551] pci-host-generic 1f0000000.pcie: MEM 0x01f81d0000..0x01f81effff -> 0x0000000000
[ 2.163146] mmcblk0: p1 p2
[ 2.167727] pci-host-generic 1f0000000.pcie: MEM 0x01f81f0000..0x01f820ffff -> 0x0000000000
[ 2.179375] pci-host-generic 1f0000000.pcie: MEM 0x01f8210000..0x01f822ffff -> 0x0000000000
[ 2.179619] mmc1: new ultra high speed SDR104 SDHC card at address 5048
[ 2.188227] pci-host-generic 1f0000000.pcie: MEM 0x01f8230000..0x01f824ffff -> 0x0000000000
[ 2.203691] pci-host-generic 1f0000000.pcie: MEM 0x01fc000000..0x01fc3fffff -> 0x0000000000
[ 2.203959] mmcblk1: mmc1:5048 SD16G 14.4 GiB
[ 2.212582] pci-host-generic 1f0000000.pcie: ECAM at [mem 0x1f0000000-0x1f00fffff] for [bus 00]
[ 2.225049] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 2.225832] pci-host-generic 1f0000000.pcie: PCI host bridge to bus 0000:00
[ 2.233193] GPT:266272 != 30253055
[ 2.240157] pci_bus 0000:00: root bus resource [bus 00]
[ 2.240164] pci_bus 0000:00: root bus resource [mem 0x1f8000000-0x1f815ffff] (bus address [0x00000000-0x0015ffff])
[ 2.243589] GPT:Alternate GPT header not at the end of the disk.
[ 2.248824] pci_bus 0000:00: root bus resource [mem 0x1f8160000-0x1f81cffff pref] (bus address [0x00000000-0x0006ffff])
[ 2.259211] GPT:266272 != 30253055
[ 2.265242] pci_bus 0000:00: root bus resource [mem 0x1f81d0000-0x1f81effff] (bus address [0x00000000-0x0001ffff])
[ 2.276103] GPT: Use GNU Parted to correct GPT errors.
[ 2.279483] pci_bus 0000:00: root bus resource [mem 0x1f81f0000-0x1f820ffff pref] (bus address [0x00000000-0x0001ffff])
[ 2.279489] pci_bus 0000:00: root bus resource [mem 0x1f8210000-0x1f822ffff] (bus address [0x00000000-0x0001ffff])
[ 2.279494] pci_bus 0000:00: root bus resource [mem 0x1f8230000-0x1f824ffff pref] (bus address [0x00000000-0x0001ffff])
[ 2.279499] pci_bus 0000:00: root bus resource [mem 0x1fc000000-0x1fc3fffff] (bus address [0x00000000-0x003fffff])
[ 2.279521] pci 0000:00:00.0: [1957:e100] type 00 class 0x020001
[ 2.289971] mmcblk1: p1
[ 2.295081] pci 0000:00:00.0: BAR 0: [mem 0x1f8000000-0x1f803ffff 64bit] (from Enhanced Allocation, properties 0x0)
[ 2.356551] pci 0000:00:00.0: BAR 2: [mem 0x1f8160000-0x1f816ffff 64bit pref] (from Enhanced Allocation, properties 0x1)
[ 2.359017] hub 3-1:1.0: USB hub found
[ 2.367482] pci 0000:00:00.0: VF BAR 0: [mem 0x1f81d0000-0x1f81dffff 64bit] (from Enhanced Allocation, properties 0x4)
[ 2.371308] hub 3-1:1.0: 7 ports detected
[ 2.381993] pci 0000:00:00.0: VF BAR 2: [mem 0x1f81f0000-0x1f81fffff 64bit pref] (from Enhanced Allocation, properties 0x3)
[ 2.397218] pci 0000:00:00.0: PME# supported from D0 D3hot
[ 2.402737] pci 0000:00:00.0: VF(n) BAR0 space: [mem 0x1f81d0000-0x1f81effff 64bit] (contains BAR0 for 2 VFs)
[ 2.412699] pci 0000:00:00.0: VF(n) BAR2 space: [mem 0x1f81f0000-0x1f820ffff 64bit pref] (contains BAR2 for 2 VFs)
[ 2.423260] pci 0000:00:00.1: [1957:e100] type 00 class 0x020001
[ 2.429323] pci 0000:00:00.1: BAR 0: [mem 0x1f8040000-0x1f807ffff 64bit] (from Enhanced Allocation, properties 0x0)
[ 2.439810] pci 0000:00:00.1: BAR 2: [mem 0x1f8170000-0x1f817ffff 64bit pref] (from Enhanced Allocation, properties 0x1)
[ 2.450731] pci 0000:00:00.1: VF BAR 0: [mem 0x1f8210000-0x1f821ffff 64bit] (from Enhanced Allocation, properties 0x4)
[ 2.461478] pci 0000:00:00.1: VF BAR 2: [mem 0x1f8230000-0x1f823ffff 64bit pref] (from Enhanced Allocation, properties 0x3)
[ 2.472674] pci 0000:00:00.1: PME# supported from D0 D3hot
[ 2.478193] pci 0000:00:00.1: VF(n) BAR0 space: [mem 0x1f8210000-0x1f822ffff 64bit] (contains BAR0 for 2 VFs)
[ 2.488163] pci 0000:00:00.1: VF(n) BAR2 space: [mem 0x1f8230000-0x1f824ffff 64bit pref] (contains BAR2 for 2 VFs)
[ 2.498689] pci 0000:00:00.2: [1957:e100] type 00 class 0x020001
[ 2.504747] pci 0000:00:00.2: BAR 0: [mem 0x1f8080000-0x1f80bffff 64bit] (from Enhanced Allocation, properties 0x0)
[ 2.515234] pci 0000:00:00.2: BAR 2: [mem 0x1f8180000-0x1f818ffff 64bit pref] (from Enhanced Allocation, properties 0x1)
[ 2.526167] pci 0000:00:00.2: PME# supported from D0 D3hot
[ 2.531802] pci 0000:00:00.3: [1957:ee01] type 00 class 0x088001
[ 2.537868] pci 0000:00:00.3: BAR 0: [mem 0x1f8100000-0x1f811ffff 64bit] (from Enhanced Allocation, properties 0x0)
[ 2.548357] pci 0000:00:00.3: BAR 2: [mem 0x1f8190000-0x1f819ffff 64bit pref] (from Enhanced Allocation, properties 0x1)
[ 2.559290] pci 0000:00:00.3: PME# supported from D0 D3hot
[ 2.564915] pci 0000:00:00.4: [1957:ee02] type 00 class 0x088001
[ 2.570972] pci 0000:00:00.4: BAR 0: [mem 0x1f8120000-0x1f813ffff 64bit] (from Enhanced Allocation, properties 0x0)
[ 2.581457] pci 0000:00:00.4: BAR 2: [mem 0x1f81a0000-0x1f81affff 64bit pref] (from Enhanced Allocation, properties 0x1)
[ 2.592395] pci 0000:00:00.4: PME# supported from D0 D3hot
[ 2.598030] pci 0000:00:00.5: [1957:eef0] type 00 class 0x020801
[ 2.604092] pci 0000:00:00.5: BAR 0: [mem 0x1f8140000-0x1f815ffff 64bit] (from Enhanced Allocation, properties 0x0)
[ 2.614583] pci 0000:00:00.5: BAR 2: [mem 0x1f81b0000-0x1f81bffff 64bit pref] (from Enhanced Allocation, properties 0x1)
[ 2.625505] pci 0000:00:00.5: BAR 4: [mem 0x1fc000000-0x1fc3fffff 64bit] (from Enhanced Allocation, properties 0x0)
[ 2.636003] pci 0000:00:00.5: PME# supported from D0 D3hot
[ 2.641641] pci 0000:00:00.6: [1957:e100] type 00 class 0x020001
[ 2.647697] pci 0000:00:00.6: BAR 0: [mem 0x1f80c0000-0x1f80fffff 64bit] (from Enhanced Allocation, properties 0x0)
[ 2.658184] pci 0000:00:00.6: BAR 2: [mem 0x1f81c0000-0x1f81cffff 64bit pref] (from Enhanced Allocation, properties 0x1)
[ 2.669118] pci 0000:00:00.6: PME# supported from D0 D3hot
[ 2.675519] pci 0000:00:1f.0: [1957:e001] type 00 class 0x080700
[ 2.681596] OF: /soc/pcie@1f0000000: no msi-map translation for id 0xf8 on (null)
[ 2.689371] fsl_enetc 0000:00:00.0: Adding to iommu group 5
[ 2.707554] usb 3-1.6: new full-speed USB device number 3 using xhci-hcd
[ 2.799556] fsl_enetc 0000:00:00.0: enabling device (0400 -> 0402)
[ 2.805802] fsl_enetc 0000:00:00.0: no MAC address specified for SI1, using 82:f0:96:19:76:9c
[ 2.814373] fsl_enetc 0000:00:00.0: no MAC address specified for SI2, using 5e:fb:ae:4d:83:1f
[ 2.823365] libphy: Freescale ENETC MDIO Bus: probed
[ 2.829583] libphy: Freescale ENETC internal MDIO Bus: probed
[ 2.836011] fsl_enetc 0000:00:00.1: Adding to iommu group 6
[ 2.841724] fsl_enetc 0000:00:00.1: device is disabled, skipping
[ 2.847885] fsl_enetc 0000:00:00.2: Adding to iommu group 7
[ 2.853568] fsl_enetc 0000:00:00.2: device is disabled, skipping
[ 2.859709] fsl_enetc_mdio 0000:00:00.3: Adding to iommu group 8
[ 2.872274] hid-generic 0003:064F:2AF9.0001: device has no listeners, quitting
[ 2.971555] fsl_enetc_mdio 0000:00:00.3: enabling device (0400 -> 0402)
[ 2.978403] libphy: FSL PCIe IE Central MDIO Bus: probed
[ 2.983940] mscc_felix 0000:00:00.5: Adding to iommu group 9
[ 2.989781] mscc_felix 0000:00:00.5: device is disabled, skipping
[ 2.996017] fsl_enetc 0000:00:00.6: Adding to iommu group 10
[ 3.001802] fsl_enetc 0000:00:00.6: device is disabled, skipping
[ 3.007871] OF: /soc/pcie@1f0000000: no iommu-map translation for id 0xf8 on (null)
[ 3.015686] pcieport 0000:00:1f.0: PME: Signaling with IRQ 123
[ 3.021749] pcieport 0000:00:1f.0: AER: enabled with IRQ 123
[ 3.028170] 2270000.serial: ttyLP2 at MMIO 0x2270000 (irq = 25, base_baud = 12500000) is a FSL_LPUART
[ 3.038551] spi-nor spi1.0: at25sl321 (4096 Kbytes)
[ 3.047761] asoc-simple-card sound: ASoC: no DMI vendor name!
[ 3.056720] input: buttons0 as /devices/platform/buttons0/input/input1
[ 3.063570] ALSA device list:
[ 3.066547] #0: f150000.audio-controller-wm8904-hifi
[ 3.075682] EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem
[ 3.083049] EXT4-fs (mmcblk0p2): write access will be enabled during recovery
[ 3.095936] EXT4-fs (mmcblk0p2): recovery complete
[ 3.102318] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[ 3.103561] usb 3-1.3: new high-speed USB device number 4 using xhci-hcd
[ 3.112153] VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
[ 3.125913] devtmpfs: mounted
[ 3.133476] Freeing unused kernel memory: 4736K
[ 3.147609] Missing param value! Expected 'debug=...value...'
[ 3.153388] Missing param value! Expected 'rootwait=...value...'
[ 3.159421] Run /sbin/init as init process
[ 3.163540] with arguments:
[ 3.166513] /sbin/init
[ 3.169228] with environment:
[ 3.172378] HOME=/
[ 3.174741] TERM=linux
[ 3.205526] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null). Quota mode: none.
[ 3.223479] usb-storage 3-1.3:1.0: USB Mass Storage device detected
[ 3.230192] scsi host0: usb-storage 3-1.3:1.0
[ 3.277936] udevd[139]: starting version 3.2.8
[ 3.283246] random: udevd: uninitialized urandom read (16 bytes read)
[ 3.290280] random: udevd: uninitialized urandom read (16 bytes read)
[ 3.296805] random: udevd: uninitialized urandom read (16 bytes read)
[ 3.304858] udevd[139]: specified group 'kvm' unknown
[ 3.317735] udevd[140]: starting eudev-3.2.8
[ 4.634039] scsi 0:0:0:0: Direct-Access JetFlash Transcend 32GB 1100 PQ: 0 ANSI: 6
[ 4.646337] sd 0:0:0:0: [sda] 61702144 512-byte logical blocks: (31.6 GB/29.4 GiB)
[ 4.657126] sd 0:0:0:0: [sda] Write Protect is off
[ 4.662029] sd 0:0:0:0: [sda] Mode Sense: 43 00 00 00
[ 4.673461] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 4.710188] sda: sda1
[ 4.715321] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 4.753103] fsl_enetc 0000:00:00.0 gbe0: renamed from eth0
[ 6.006576] urandom_read: 3 callbacks suppressed
[ 6.006587] random: dd: uninitialized urandom read (512 bytes read)
[ 6.108830] fsl_enetc 0000:00:00.0 gbe0: PHY [0000:00:00.0:05] driver [Qualcomm Atheros AR8031/AR8033] (irq=POLL)
[ 6.127032] fsl_enetc 0000:00:00.0 gbe0: configuring for inband/sgmii link mode
[ 6.153943] random: dropbear: uninitialized urandom read (32 bytes read)
[ 10.212329] fsl_enetc 0000:00:00.0 gbe0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 10.220233] IPv6: ADDRCONF(NETDEV_CHANGE): gbe0: link becomes ready
[ 164.963579] random: crng init done

HTH,
-michael

2021-01-18 17:46:08

by Geert Uytterhoeven

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Saravana,

On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <[email protected]> wrote:
> Cyclic dependencies in some firmware was one of the last remaining
> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> dependencies don't block probing, set fw_devlink=on by default.
>
> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> only for systems with device tree firmware):
> * Significantly cuts down deferred probes.
> * Device probe is effectively attempted in graph order.
> * Makes it much easier to load drivers as modules without having to
> worry about functional dependencies between modules (depmod is still
> needed for symbol dependencies).
>
> If this patch prevents some devices from probing, it's very likely due
> to the system having one or more device drivers that "probe"/set up a
> device (DT node with compatible property) without creating a struct
> device for it. If we hit such cases, the device drivers need to be
> fixed so that they populate struct devices and probe them like normal
> device drivers so that the driver core is aware of the devices and their
> status. See [1] for an example of such a case.
>
> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> Signed-off-by: Saravana Kannan <[email protected]>

Shimoda-san reported that next-20210111 and later fail to boot
on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
is enabled.

I have bisected this to commit e590474768f1cc04 ("driver core: Set
fw_devlink=on by default").

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-01-18 17:48:14

by Geert Uytterhoeven

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Saravana,

On Wed, Jan 13, 2021 at 3:34 AM Saravana Kannan <[email protected]> wrote:
> On Mon, Jan 11, 2021 at 11:11 PM Marek Szyprowski
> <[email protected]> wrote:
> > On 11.01.2021 22:47, Saravana Kannan wrote:
> > > On Mon, Jan 11, 2021 at 6:18 AM Marek Szyprowski
> > > <[email protected]> wrote:
> > >> On 11.01.2021 12:12, Marek Szyprowski wrote:
> > >>> On 18.12.2020 04:17, Saravana Kannan wrote:
> > >>>> Cyclic dependencies in some firmware was one of the last remaining
> > >>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > >>>> dependencies don't block probing, set fw_devlink=on by default.
> > >>>>
> > >>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> > >>>> only for systems with device tree firmware):
> > >>>> * Significantly cuts down deferred probes.
> > >>>> * Device probe is effectively attempted in graph order.
> > >>>> * Makes it much easier to load drivers as modules without having to
> > >>>> worry about functional dependencies between modules (depmod is still
> > >>>> needed for symbol dependencies).
> > >>>>
> > >>>> If this patch prevents some devices from probing, it's very likely due
> > >>>> to the system having one or more device drivers that "probe"/set up a
> > >>>> device (DT node with compatible property) without creating a struct
> > >>>> device for it. If we hit such cases, the device drivers need to be
> > >>>> fixed so that they populate struct devices and probe them like normal
> > >>>> device drivers so that the driver core is aware of the devices and their
> > >>>> status. See [1] for an example of such a case.
> > >>>>
> > >>>> [1] -
> > >>>> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > >>>> Signed-off-by: Saravana Kannan <[email protected]>
> > >>> This patch landed recently in linux next-20210111 as commit
> > >>> e590474768f1 ("driver core: Set fw_devlink=on by default"). Sadly it
> > >>> breaks Exynos IOMMU operation, what causes lots of devices being
> > >>> deferred and not probed at all. I've briefly checked and noticed that
> > >>> exynos_sysmmu_probe() is never called after this patch. This is really
> > >>> strange for me, as the SYSMMU controllers on Exynos platform are
> > >>> regular platform devices registered by the OF code. The driver code is
> > >>> here: drivers/iommu/exynos-iommu.c, example dts:
> > >>> arch/arm/boot/dts/exynos3250.dtsi (compatible = "samsung,exynos-sysmmu").
> > >> Okay, I found the source of this problem. It is caused by Exynos power
> > >> domain driver, which is not platform driver yet. I will post a patch,
> > >> which converts it to the platform driver.
> > > Thanks Marek! Hopefully the debug logs I added were sufficient to
> > > figure out the reason.
> >
> > Frankly, it took me a while to figure out that device core waits for the
> > power domain devices. Maybe it would be possible to add some more debug
> > messages or hints? Like the reason of the deferred probe in
> > /sys/kernel/debug/devices_deferred ?
>
> There's already a /sys/devices/.../<device>/waiting_for_supplier file
> that tells you if the device is waiting for a supplier device to be
> added. That file goes away once the device probes. If the file has 1,
> then it's waiting for the supplier device to be added (like your
> case). If it's 0, then the device is just waiting on one of the
> existing suppliers to probe. You can find the existing suppliers
> through /sys/devices/.../<device>/supplier:*/supplier. Also, flip
> these dev_dbg() to dev_info() if you need more details about deferred
> probing.

How are we supposed to check the contents of that file, if the system
doesn't even boot into userspace with a ramdisk? All hardware drivers
fail to probe. The only thing that works is "earlycon keep_bootcon",
and kernel output just stops after a while.
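
On a system that does reach a shell, the sysfs checks described in the quoted text could be scripted roughly as follows. This is a minimal sketch: the function name supplier_status and the state strings are made up for illustration, and the only paths it relies on are the waiting_for_supplier file and the supplier:*/supplier links mentioned above.

```python
import glob
import os


def supplier_status(device_dir):
    """Summarise the fw_devlink state of one device directory under /sys/devices.

    Hypothetical helper: it reads only the sysfs entries named in the thread.
    """
    flag_path = os.path.join(device_dir, "waiting_for_supplier")
    if not os.path.exists(flag_path):
        # Per the thread, the file goes away once the device has probed.
        return {"state": "probed", "suppliers": []}

    with open(flag_path) as f:
        waiting = f.read().strip() == "1"

    # Each supplier:* entry holds a "supplier" link to the supplier device.
    pattern = os.path.join(glob.escape(device_dir), "supplier:*", "supplier")
    suppliers = sorted(
        os.path.basename(os.path.realpath(link)) for link in glob.glob(pattern)
    )

    if waiting:
        state = "waiting for a supplier device to be added"
    else:
        state = "suppliers exist; possibly waiting for one to probe"
    return {"state": state, "suppliers": suppliers}
```

Calling supplier_status() on a device directory returns the state together with the names of the suppliers that already exist. It does not help in the case above, where the boot never gets as far as userspace.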

Thanks for your suggestions!

Gr{oetje,eeting}s,

Geert


2021-01-18 18:03:45

by Marc Zyngier

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Geert,

On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> Hi Saravana,
>
> On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <[email protected]>
> wrote:
>> Cyclic dependencies in some firmware was one of the last remaining
>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
>> dependencies don't block probing, set fw_devlink=on by default.
>>
>> Setting fw_devlink=on by default brings a bunch of benefits
>> (currently,
>> only for systems with device tree firmware):
>> * Significantly cuts down deferred probes.
>> * Device probe is effectively attempted in graph order.
>> * Makes it much easier to load drivers as modules without having to
>> worry about functional dependencies between modules (depmod is still
>> needed for symbol dependencies).
>>
>> If this patch prevents some devices from probing, it's very likely due
>> to the system having one or more device drivers that "probe"/set up a
>> device (DT node with compatible property) without creating a struct
>> device for it. If we hit such cases, the device drivers need to be
>> fixed so that they populate struct devices and probe them like normal
>> device drivers so that the driver core is aware of the devices and
>> their
>> status. See [1] for an example of such a case.
>>
>> [1] -
>> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
>> Signed-off-by: Saravana Kannan <[email protected]>
>
> Shimoda-san reported that next-20210111 and later fail to boot
> on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> is enabled.
>
> I have bisected this to commit e590474768f1cc04 ("driver core: Set
> fw_devlink=on by default").

There is a tentative patch from Saravana here[1], which works around
some issues on my RK3399 platform, and it'd be interesting to find
out whether that helps on your system.

Thanks,

M.

[1]
https://lore.kernel.org/r/[email protected]
--
Jazz is not dead. It just smells funny...

2021-01-18 19:25:44

by Geert Uytterhoeven

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Marc,

On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <[email protected]> wrote:
> On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <[email protected]>
> > wrote:
> >> Cyclic dependencies in some firmware was one of the last remaining
> >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> >> dependencies don't block probing, set fw_devlink=on by default.
> >>
> >> Setting fw_devlink=on by default brings a bunch of benefits
> >> (currently,
> >> only for systems with device tree firmware):
> >> * Significantly cuts down deferred probes.
> >> * Device probe is effectively attempted in graph order.
> >> * Makes it much easier to load drivers as modules without having to
> >> worry about functional dependencies between modules (depmod is still
> >> needed for symbol dependencies).
> >>
> >> If this patch prevents some devices from probing, it's very likely due
> >> to the system having one or more device drivers that "probe"/set up a
> >> device (DT node with compatible property) without creating a struct
> >> device for it. If we hit such cases, the device drivers need to be
> >> fixed so that they populate struct devices and probe them like normal
> >> device drivers so that the driver core is aware of the devices and
> >> their
> >> status. See [1] for an example of such a case.
> >>
> >> [1] -
> >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> >> Signed-off-by: Saravana Kannan <[email protected]>
> >
> > Shimoda-san reported that next-20210111 and later fail to boot
> > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > is enabled.
> >
> > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > fw_devlink=on by default").
>
> There is a tentative patch from Saravana here[1], which works around
> some issues on my RK3399 platform, and it'd be interesting to find
> out whether that helps on your system.
>
> Thanks,
>
> M.
>
> [1]
> https://lore.kernel.org/r/[email protected]

Thanks for the suggestion, but given no devices probe (incl. GPIO
providers), I'm afraid it won't help. [testing] Indeed.

With the debug prints in device_links_check_suppliers enabled, and
some postprocessing, I get:

255 supplier e6180000.system-controller not ready
9 supplier fe990000.iommu not ready
9 supplier fe980000.iommu not ready
6 supplier febd0000.iommu not ready
6 supplier ec670000.iommu not ready
3 supplier febe0000.iommu not ready
3 supplier e7740000.iommu not ready
3 supplier e6740000.iommu not ready
3 supplier e65ee000.usb-phy not ready
3 supplier e6570000.iommu not ready
3 supplier e6054000.gpio not ready
3 supplier e6053000.gpio not ready
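
A summary like the one above can be produced by counting the "supplier ... not ready" messages per supplier. The following is only a sketch of such a postprocessing step, not the script actually used here; the function name count_unready_suppliers is made up, and the message format is assumed to match the "probe deferral - supplier ... not ready" lines in the boot log earlier in this thread.

```python
import re
from collections import Counter

# Assumed message format, taken from the "probe deferral" lines in this thread.
SUPPLIER_RE = re.compile(r"supplier (\S+) not ready")


def count_unready_suppliers(lines):
    """Return (supplier, count) pairs, most frequently blocking supplier first."""
    counts = Counter()
    for line in lines:
        match = SUPPLIER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common()


# Two deferral lines and one unrelated line from the boot log earlier in the thread.
sample = [
    "[ 2.011006] platform f140000.audio-controller: probe deferral - supplier 22c0000.dma-controller not ready",
    "[ 2.020630] platform f150000.audio-controller: probe deferral - supplier 22c0000.dma-controller not ready",
    "[ 2.030314] drop_monitor: Initializing network drop monitor service",
]
for name, count in count_unready_suppliers(sample):
    print(f"{count:7d} supplier {name} not ready")
```

Feeding it a captured console log (for example, open("boot.log") with a file name of your choosing) gives one line per supplier, which makes a single dominant blocker easy to spot.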

As everything is part of a PM Domain, the (lack of the) system controller
must be the culprit. What's wrong with it? It is registered very early in
the boot:

[ 0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0

Thanks!

Gr{oetje,eeting}s,

Geert


2021-01-18 21:13:35

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Sun, Jan 17, 2021 at 3:01 PM Michael Walle <[email protected]> wrote:
>
> Hi Saravana, again ;)

Hi again! :)

>
> > Cyclic dependencies in some firmware was one of the last remaining
> > reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > dependencies don't block probing, set fw_devlink=on by default.
> >
> > Setting fw_devlink=on by default brings a bunch of benefits (currently,
> > only for systems with device tree firmware):
> > * Significantly cuts down deferred probes.
> > * Device probe is effectively attempted in graph order.
> > * Makes it much easier to load drivers as modules without having to
> > worry about functional dependencies between modules (depmod is still
> > needed for symbol dependencies).
> >
> > If this patch prevents some devices from probing, it's very likely due
> > to the system having one or more device drivers that "probe"/set up a
> > device (DT node with compatible property) without creating a struct
> > device for it. If we hit such cases, the device drivers need to be
> > fixed so that they populate struct devices and probe them like normal
> > device drivers so that the driver core is aware of the devices and their
> > status. See [1] for an example of such a case.
> >
> > [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > Signed-off-by: Saravana Kannan <[email protected]>
>
> This breaks (at least) probing of the PCIe controllers of my board. The
> driver in question is
> drivers/pci/controller/dwc/pci-layerscape.c
> I've also put the maintainers of this driver on CC. Looks like it uses a
> proper struct device. But it uses builtin_platform_driver_probe() and
> apparently it waits for the iommu which uses module_platform_driver().
> Dunno if that will work together.

Yeah, the builtin vs module doesn't matter. I've had fw_devlink work
multiple times with the consumer driver being built in and the
supplier actually loaded as a module. Making that work is one of the
goals of fw_devlink.

> The board device tree can be found here:
> arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28-var3-ads2.dts
>
> Attached is the log with "probe deferral" messages enabled.

I took a look at the logs. As you said, pci seems to be waiting on
iommu, but it's not clear why the iommu didn't probe by then. Can you
add initcall_debug=1 and enable the logs in device_link_add()? Btw, I
realize one compromise on the logs is to send them as an attachment
instead of inline. That way, it's still archived in the list, but I
don't have to deal with log lines getting wrapped, etc.

Thanks for reporting the issues. Also, could you try picking up all of
these changes and giving them a shot? It's unlikely to help, but I want
to rule out issues related to fixes in progress.

https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/lkml/[email protected]/

Thanks,
Saravana

2021-01-19 05:10:34

by Marc Zyngier

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On 2021-01-18 19:16, Geert Uytterhoeven wrote:
> Hi Marc,
>
> On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <[email protected]> wrote:
>> On 2021-01-18 17:39, Geert Uytterhoeven wrote:
>> > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <[email protected]>
>> > wrote:
>> >> Cyclic dependencies in some firmware was one of the last remaining
>> >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
>> >> dependencies don't block probing, set fw_devlink=on by default.
>> >>
>> >> Setting fw_devlink=on by default brings a bunch of benefits
>> >> (currently,
>> >> only for systems with device tree firmware):
>> >> * Significantly cuts down deferred probes.
>> >> * Device probe is effectively attempted in graph order.
>> >> * Makes it much easier to load drivers as modules without having to
>> >> worry about functional dependencies between modules (depmod is still
>> >> needed for symbol dependencies).
>> >>
>> >> If this patch prevents some devices from probing, it's very likely due
>> >> to the system having one or more device drivers that "probe"/set up a
>> >> device (DT node with compatible property) without creating a struct
>> >> device for it. If we hit such cases, the device drivers need to be
>> >> fixed so that they populate struct devices and probe them like normal
>> >> device drivers so that the driver core is aware of the devices and
>> >> their
>> >> status. See [1] for an example of such a case.
>> >>
>> >> [1] -
>> >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
>> >> Signed-off-by: Saravana Kannan <[email protected]>
>> >
>> > Shimoda-san reported that next-20210111 and later fail to boot
>> > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
>> > is enabled.
>> >
>> > I have bisected this to commit e590474768f1cc04 ("driver core: Set
>> > fw_devlink=on by default").
>>
>> There is a tentative patch from Saravana here[1], which works around
>> some issues on my RK3399 platform, and it'd be interesting to find
>> out whether that helps on your system.
>>
>> Thanks,
>>
>> M.
>>
>> [1]
>> https://lore.kernel.org/r/[email protected]
>
> Thanks for the suggestion, but given no devices probe (incl. GPIO
> providers), I'm afraid it won't help. [testing] Indeed.
>
> With the debug prints in device_links_check_suppliers enabled, and
> some postprocessing, I get:
>
> 255 supplier e6180000.system-controller not ready
> 9 supplier fe990000.iommu not ready
> 9 supplier fe980000.iommu not ready
> 6 supplier febd0000.iommu not ready
> 6 supplier ec670000.iommu not ready
> 3 supplier febe0000.iommu not ready
> 3 supplier e7740000.iommu not ready
> 3 supplier e6740000.iommu not ready
> 3 supplier e65ee000.usb-phy not ready
> 3 supplier e6570000.iommu not ready
> 3 supplier e6054000.gpio not ready
> 3 supplier e6053000.gpio not ready
>
> As everything is part of a PM Domain, the (lack of the) system controller
> must be the culprit. What's wrong with it? It is registered very early in
> the boot:
>
> [ 0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0

Yeah, this looks like the exact same problem. The devlink stuff assumes
that because there is a "compatible" property, there will be a driver
directly associated with the node containing this property.

If any other node has a reference to that first node, the dependency
will only get resolved if/when that first node is bound to a driver.
Trouble is, there is *tons* of code in the tree that invalidates
this heuristic, and each occurrence gives us another failure.

The patch I referred to papers over it by registering a dummy driver,
but that doesn't scale easily...
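
To make the heuristic described above concrete, here is a minimal
userspace sketch in C. It is not kernel code: struct model_node and
model_can_probe() are hypothetical names invented for illustration, and
the model assumes only what is stated above, namely that a supplier node
with a "compatible" property is expected to be bound to a driver before
its consumers may probe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUPPLIERS 4

/* Hypothetical stand-in for a DT node; not a kernel structure. */
struct model_node {
	const char *name;
	bool has_compatible;	/* node carries a "compatible" property */
	bool driver_bound;	/* a driver has bound to a device for this node */
	const struct model_node *suppliers[MAX_SUPPLIERS];
	size_t num_suppliers;
};

/*
 * Model of the heuristic: a consumer may probe only once every supplier
 * that has a "compatible" property is bound to a driver. A supplier that
 * is set up from an initcall, without ever binding a driver, therefore
 * blocks all of its consumers indefinitely.
 */
static bool model_can_probe(const struct model_node *consumer)
{
	assert(consumer->num_suppliers <= MAX_SUPPLIERS);

	for (size_t i = 0; i < consumer->num_suppliers; i++) {
		const struct model_node *s = consumer->suppliers[i];

		if (s->has_compatible && !s->driver_bound)
			return false;
	}
	return true;
}
```

In this model, a consumer whose only supplier has a "compatible"
property but never binds a driver stays blocked, even if the supplier's
hardware was fully initialized by other means. Marking the supplier as
bound (which is what the dummy driver workaround effectively does) is
the only thing that unblocks it.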

M.
--
Jazz is not dead. It just smells funny...

2021-01-19 05:21:41

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
<[email protected]> wrote:
>
> Hi Marc,
>
> On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <[email protected]> wrote:
> > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <[email protected]>
> > > wrote:
> > >> Cyclic dependencies in some firmware was one of the last remaining
> > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > >> dependencies don't block probing, set fw_devlink=on by default.
> > >>
> > >> Setting fw_devlink=on by default brings a bunch of benefits
> > >> (currently,
> > >> only for systems with device tree firmware):
> > >> * Significantly cuts down deferred probes.
> > >> * Device probe is effectively attempted in graph order.
> > >> * Makes it much easier to load drivers as modules without having to
> > >> worry about functional dependencies between modules (depmod is still
> > >> needed for symbol dependencies).
> > >>
> > >> If this patch prevents some devices from probing, it's very likely due
> > >> to the system having one or more device drivers that "probe"/set up a
> > >> device (DT node with compatible property) without creating a struct
> > >> device for it. If we hit such cases, the device drivers need to be
> > >> fixed so that they populate struct devices and probe them like normal
> > >> device drivers so that the driver core is aware of the devices and
> > >> their
> > >> status. See [1] for an example of such a case.
> > >>
> > >> [1] -
> > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > >> Signed-off-by: Saravana Kannan <[email protected]>
> > >
> > > Shimoda-san reported that next-20210111 and later fail to boot
> > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > is enabled.
> > >
> > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > fw_devlink=on by default").
> >
> > There is a tentative patch from Saravana here[1], which works around
> > some issues on my RK3399 platform, and it'd be interesting to find
> > out whether that helps on your system.
> >
> > Thanks,
> >
> > M.
> >
> > [1]
> > https://lore.kernel.org/r/[email protected]
>
> Thanks for the suggestion, but given no devices probe (incl. GPIO
> providers), I'm afraid it won't help. [testing] Indeed.
>
> With the debug prints in device_links_check_suppliers enabled, and
> some postprocessing, I get:
>
> 255 supplier e6180000.system-controller not ready
> 9 supplier fe990000.iommu not ready
> 9 supplier fe980000.iommu not ready
> 6 supplier febd0000.iommu not ready
> 6 supplier ec670000.iommu not ready
> 3 supplier febe0000.iommu not ready
> 3 supplier e7740000.iommu not ready
> 3 supplier e6740000.iommu not ready
> 3 supplier e65ee000.usb-phy not ready
> 3 supplier e6570000.iommu not ready
> 3 supplier e6054000.gpio not ready
> 3 supplier e6053000.gpio not ready
>
> As everything is part of a PM Domain, the (lack of the) system controller
> must be the culprit. What's wrong with it? It is registered very early in
> the boot:
>
> [ 0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0

Hi Geert,

Thanks for reporting the issue.

Looks like you found the important logs. Can you please enable all
these logs and send the earlycon logs as an attachment (so I don't
need to deal with lines getting wrapped)?
1. The ones in device_links_check_suppliers()
2. The ones in device_link_add()
3. initcall_debug=1

That should help us figure out what's going on. Also, what's the DT
that corresponds to one of the boards that see this issue?

Lastly, can you please pick up these 3 patches (some need clean up
before they merge) to make sure it's not an issue being worked on from
other bug reports?
https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/lkml/[email protected]/

I have a strong hunch the 2nd one will fix your issues. fw_devlink can
handle cyclic dependencies now (it basically reverts to
fw_devlink=permissive mode for devices in the cycle), but it needs to
"see" all the dependencies to know there's a cycle. So I want to make
sure it "sees" the "gpios" binding used all over some of the Renesas
DT files.
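
As a rough illustration of why the cycle handling needs to "see" every
dependency, here is a small userspace model in C. It is only a sketch
under my own assumptions, not the driver core implementation: struct
dep_graph and the model_* functions are hypothetical names, and the rule
modeled is simply "a link that is part of a cycle does not block
probing, every other link does".

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEVS 8

/* Hypothetical dependency graph: depends_on[consumer][supplier]. */
struct dep_graph {
	int num_devs;
	bool depends_on[MAX_DEVS][MAX_DEVS];
};

/* Depth-first search: can "to" be reached from "from" via known links? */
static bool model_reaches(const struct dep_graph *g, int from, int to,
			  bool *visited)
{
	if (from == to)
		return true;

	visited[from] = true;
	for (int next = 0; next < g->num_devs; next++) {
		if (g->depends_on[from][next] && !visited[next] &&
		    model_reaches(g, next, to, visited))
			return true;
	}
	return false;
}

/* A consumer->supplier link is cyclic if the supplier depends back on it. */
static bool model_link_is_cyclic(const struct dep_graph *g, int consumer,
				 int supplier)
{
	bool visited[MAX_DEVS] = { false };

	assert(g->depends_on[consumer][supplier]);
	return model_reaches(g, supplier, consumer, visited);
}

/* Links in a cycle fall back to permissive behavior and do not block. */
static bool model_link_blocks_probe(const struct dep_graph *g, int consumer,
				    int supplier)
{
	return g->depends_on[consumer][supplier] &&
	       !model_link_is_cyclic(g, consumer, supplier);
}
```

If one edge of a real cycle is missing from the graph (for example
because a binding is not parsed), model_link_is_cyclic() returns false
for the remaining edge, and that link keeps blocking the consumer.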

Thanks,
Saravana

2021-01-19 12:01:24

by Michael Walle

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Am 2021-01-18 22:01, schrieb Saravana Kannan:
> On Sun, Jan 17, 2021 at 3:01 PM Michael Walle <[email protected]> wrote:
>> > Cyclic dependencies in some firmware was one of the last remaining
>> > reasons fw_devlink=on couldn't be set by default. Now that cyclic
>> > dependencies don't block probing, set fw_devlink=on by default.
>> >
>> > Setting fw_devlink=on by default brings a bunch of benefits (currently,
>> > only for systems with device tree firmware):
>> > * Significantly cuts down deferred probes.
>> > * Device probe is effectively attempted in graph order.
>> > * Makes it much easier to load drivers as modules without having to
>> > worry about functional dependencies between modules (depmod is still
>> > needed for symbol dependencies).
>> >
>> > If this patch prevents some devices from probing, it's very likely due
>> > to the system having one or more device drivers that "probe"/set up a
>> > device (DT node with compatible property) without creating a struct
>> > device for it. If we hit such cases, the device drivers need to be
>> > fixed so that they populate struct devices and probe them like normal
>> > device drivers so that the driver core is aware of the devices and their
>> > status. See [1] for an example of such a case.
>> >
>> > [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
>> > Signed-off-by: Saravana Kannan <[email protected]>
>>
>> This breaks (at least) probing of the PCIe controllers of my board. The
>> driver in question is
>> drivers/pci/controller/dwc/pci-layerscape.c
>> I've also put the maintainers of this driver on CC. Looks like it uses a
>> proper struct device. But it uses builtin_platform_driver_probe() and
>> apparently it waits for the iommu which uses module_platform_driver().
>> Dunno if that will work together.
>
> Yeah, the builtin vs module doesn't matter. I've had fw_devlink work
> multiple times with the consumer driver being built in and the
> supplier actually loaded as a module. Making that work is one of the
> goals of fw_devlink.

Ok.

>> The board device tree can be found here:
>> arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28-var3-ads2.dts
>>
>> Attached is the log with "probe deferral" messages enabled.
>
> I took a look at the logs. As you said, pci seems to be waiting on
> iommu, but it's not clear why the iommu didn't probe by then. Can you
> add initcall_debug=1 and enable the logs in device_link_add()? Btw, I
> realize one compromise on the logs is to send them as an attachment
> instead of inline. That way, it's still archived in the list, but I
> don't have to deal with log lines getting wrapped, etc.
>
> Thanks for reporting the issues. Also, could you try picking up all of
> these changes and giving it a shot. It's unlikely to help, but I want
> to rule out issues related to fixes in progress.
>
> https://lore.kernel.org/lkml/[email protected]/
> https://lore.kernel.org/lkml/[email protected]/
> https://lore.kernel.org/lkml/[email protected]/

I did pick them up; the last one had a conflict due to some superfluous
lines. Maybe they got reordered in that array.

The issue still persists. I've enabled the debug output in
device_link_add() and in device_links_check_suppliers(), and booted with
initcall_debug. Please see the attached log. Let's see how that goes ;)

[ 0.132687] calling ls_pcie_driver_init+0x0/0x38 @ 1
[ 0.132762] platform 3400000.pcie: probe deferral - supplier 5000000.iommu not ready
[ 0.132777] platform 3500000.pcie: probe deferral - supplier 5000000.iommu not ready
[ 0.132818] initcall ls_pcie_driver_init+0x0/0x38 returned -19 after 119 usecs

After that, ls_pcie_driver_init() is never called again.

-michael


Attachments:
boot.log (146.16 kB)

2021-01-19 18:58:57

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Saravana,
>
> On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <[email protected]> wrote:
> > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > <[email protected]> wrote:
> > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <[email protected]> wrote:
> > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <[email protected]>
> > > > > wrote:
> > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > >>
> > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > >> (currently,
> > > > >> only for systems with device tree firmware):
> > > > >> * Significantly cuts down deferred probes.
> > > > >> * Device probe is effectively attempted in graph order.
> > > > >> * Makes it much easier to load drivers as modules without having to
> > > > >> worry about functional dependencies between modules (depmod is still
> > > > >> needed for symbol dependencies).
> > > > >>
> > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > >> device (DT node with compatible property) without creating a struct
> > > > >> device for it. If we hit such cases, the device drivers need to be
> > > > >> fixed so that they populate struct devices and probe them like normal
> > > > >> device drivers so that the driver core is aware of the devices and
> > > > >> their
> > > > >> status. See [1] for an example of such a case.
> > > > >>
> > > > >> [1] -
> > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > >> Signed-off-by: Saravana Kannan <[email protected]>
> > > > >
> > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > is enabled.
> > > > >
> > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > fw_devlink=on by default").
> > > >
> > > > There is a tentative patch from Saravana here[1], which works around
> > > > some issues on my RK3399 platform, and it'd be interesting to find
> > > > out whether that helps on your system.
> > > >
> > > > Thanks,
> > > >
> > > > M.
> > > >
> > > > [1]
> > > > https://lore.kernel.org/r/[email protected]
> > >
> > > Thanks for the suggestion, but given no devices probe (incl. GPIO
> > > providers), I'm afraid it won't help. [testing] Indeed.
> > >
> > > With the debug prints in device_links_check_suppliers enabled, and
> > > some postprocessing, I get:
> > >
> > > 255 supplier e6180000.system-controller not ready
> > > 9 supplier fe990000.iommu not ready
> > > 9 supplier fe980000.iommu not ready
> > > 6 supplier febd0000.iommu not ready
> > > 6 supplier ec670000.iommu not ready
> > > 3 supplier febe0000.iommu not ready
> > > 3 supplier e7740000.iommu not ready
> > > 3 supplier e6740000.iommu not ready
> > > 3 supplier e65ee000.usb-phy not ready
> > > 3 supplier e6570000.iommu not ready
> > > 3 supplier e6054000.gpio not ready
> > > 3 supplier e6053000.gpio not ready
> > >
> > > As everything is part of a PM Domain, the (lack of the) system controller
> > > must be the culprit. What's wrong with it? It is registered very early in
> > > the boot:
> > >
> > > [ 0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0
>
> > Looks like you found the important logs. Can you please enable all
> > these logs and send the early con logs as an attachment (so I don't
> > need to deal with lines getting wrapped)?
> > 1. The ones in device_links_check_suppliers()
> > 2. The ones in device_link_add()
> > 3. initcall_debug=1
>
> I have attached[*] the requested log.
>
> > That should help us figure out what's going on. Also, what's the DT
> > that corresponds to one of the boards that see this issue?
>
> arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
>
> > Lastly, can you please pick up these 3 patches (some need clean up
> > before they merge) to make sure it's not an issue being worked on from
> > other bug reports?
> > https://lore.kernel.org/lkml/[email protected]/
> > https://lore.kernel.org/lkml/[email protected]/
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > I have a strong hunch the 2nd one will fix your issues. fw_devlink can
> > handle cyclic dependencies now (it basically reverts to
> > fw_devlink=permissive mode for devices in the cycle), but it needs to
> > "see" all the dependencies to know there's a cycle. So want to make
> > sure it "sees" the "gpios" binding used all over some of the Renesas
> > DT files.
>
> These patches don't help.
> The 2nd one actually introduces a new failure:
>
> OF: /soc/i2c@e66d8000/gpio@20/pcie-sata-switch-hog: could not get #gpio-cells for /cpus/cpu@102
>
> Note that my issues don't seem to be GPIO-related at all.
>
> BTW, you are aware IOMMUs and DMA controllers are optional?
> I.e. device drivers with iommus and/or dmas DT properties where the
> targets of these properties do not have a driver should still be probed,
> eventually. But if the IOMMU or DMA drivers are present, they should be
> probed first, so the device drivers can make use of them.

Thanks for the logs and details.

Yeah, this is going to be a problem then. How is this handled in
static kernels today? Do we just try to make sure the IOMMU driver
probes the IOMMU device before the consumers? And then the consumers
simply don't defer probe on failure to get the IOMMU?

I can make this work if modules are not enabled (needs some code
changes), but it's not going to work when there are modules. There's
no way to tell if an iommu module won't be loaded soon. Also, device
links doing this behavior only for iommu/dma is probably not a good
idea. So, whatever we do will have to be common behavior. :(

Another intermediate option I was thinking of is having a
CONFIG_FW_DEVLINK_OFF/PERMISSIVE/ON choice, defaulting it to ON for ARM64,
and turning it off in the defconfig for boards for which this doesn't
work. That way, we can incrementally enable fw_devlink.

This is a very hectic week for me, so please bear with slow
responses from me for the rest of this week. Let me think about this a bit
to see if I can come up with a better solution than what I have in
mind.

Also, can you try deleting the "iommu" and "dma" parsing in
of_supplier_bindings[] in drivers/of/property.c and see if it helps?
Then we'd know this is the reason for things not working in your case.

> Thanks!
>
> [*] Although attaching means people like myself cannot read and comment
> on the log easily, without saving the attachment first.
> That's also the reason why patches should be submitted inline...

Yeah, I see your concern. If you want to add comments to logs when
sending them, yeah, please go ahead and put it inline. Or if someone
wants to add comments to what you attached, they could copy paste the
relevant sections and add comments.

Thanks,
Saravana

2021-01-19 21:55:49

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <[email protected]> wrote:
>
> On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <[email protected]> wrote:
> >
> > Hi Saravana,
> >
> > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <[email protected]> wrote:
> > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > <[email protected]> wrote:
> > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <[email protected]> wrote:
> > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <[email protected]>
> > > > > > wrote:
> > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > >>
> > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > >> (currently,
> > > > > >> only for systems with device tree firmware):
> > > > > >> * Significantly cuts down deferred probes.
> > > > > >> * Device probe is effectively attempted in graph order.
> > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > >> worry about functional dependencies between modules (depmod is still
> > > > > >> needed for symbol dependencies).
> > > > > >>
> > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > >> device (DT node with compatible property) without creating a struct
> > > > > >> device for it. If we hit such cases, the device drivers need to be
> > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > >> their
> > > > > >> status. See [1] for an example of such a case.
> > > > > >>
> > > > > >> [1] -
> > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > > >> Signed-off-by: Saravana Kannan <[email protected]>
> > > > > >
> > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > is enabled.
> > > > > >
> > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > fw_devlink=on by default").
> > > > >
> > > > > There is a tentative patch from Saravana here[1], which works around
> > > > > some issues on my RK3399 platform, and it'd be interesting to find
> > > > > out whether that helps on your system.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > M.
> > > > >
> > > > > [1]
> > > > > https://lore.kernel.org/r/[email protected]
> > > >
> > > > Thanks for the suggestion, but given no devices probe (incl. GPIO
> > > > providers), I'm afraid it won't help. [testing] Indeed.
> > > >
> > > > With the debug prints in device_links_check_suppliers enabled, and
> > > > some postprocessing, I get:
> > > >
> > > > 255 supplier e6180000.system-controller not ready
> > > > 9 supplier fe990000.iommu not ready
> > > > 9 supplier fe980000.iommu not ready
> > > > 6 supplier febd0000.iommu not ready
> > > > 6 supplier ec670000.iommu not ready
> > > > 3 supplier febe0000.iommu not ready
> > > > 3 supplier e7740000.iommu not ready
> > > > 3 supplier e6740000.iommu not ready
> > > > 3 supplier e65ee000.usb-phy not ready
> > > > 3 supplier e6570000.iommu not ready
> > > > 3 supplier e6054000.gpio not ready
> > > > 3 supplier e6053000.gpio not ready
> > > >
> > > > As everything is part of a PM Domain, the (lack of the) system controller
> > > > must be the culprit. What's wrong with it? It is registered very early in
> > > > the boot:
> > > >
> > > > [ 0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0
> >
> > > Looks like you found the important logs. Can you please enable all
> > > these logs and send the early con logs as an attachment (so I don't
> > > need to deal with lines getting wrapped)?
> > > 1. The ones in device_links_check_suppliers()
> > > 2. The ones in device_link_add()
> > > 3. initcall_debug=1
> >
> > I have attached[*] the requested log.
> >
> > > That should help us figure out what's going on. Also, what's the DT
> > > that corresponds to one of the boards that see this issue?
> >
> > arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
> >
> > > Lastly, can you please pick up these 3 patches (some need clean up
> > > before they merge) to make sure it's not an issue being worked on from
> > > other bug reports?
> > > https://lore.kernel.org/lkml/[email protected]/
> > > https://lore.kernel.org/lkml/[email protected]/
> > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > > I have a strong hunch the 2nd one will fix your issues. fw_devlink can
> > > handle cyclic dependencies now (it basically reverts to
> > > fw_devlink=permissive mode for devices in the cycle), but it needs to
> > > "see" all the dependencies to know there's a cycle. So want to make
> > > sure it "sees" the "gpios" binding used all over some of the Renesas
> > > DT files.
> >
> > These patches don't help.
> > The 2nd one actually introduces a new failure:
> >
> > OF: /soc/i2c@e66d8000/gpio@20/pcie-sata-switch-hog: could not get #gpio-cells for /cpus/cpu@102
> >
> > Note that my issues don't seem to be GPIO-related at all.
> >
> > BTW, you are aware IOMMUs and DMA controllers are optional?
> > I.e. device drivers with iommus and/or dmas DT properties where the
> > targets of these properties do not have a driver should still be probed,
> > eventually. But if the IOMMU or DMA drivers are present, they should be
> > probed first, so the device drivers can make use of them.
>
> Thanks for the logs and details.
>
> Yeah, this is going to be a problem then. How is this handled in
> static kernels today? Do we just try to make sure the iommus driver
> probes the iommu device before the consumers? And then the consumers
> simply don't defer probe on failure to get iommu?
>
> I can make this work if modules are not enabled (needs some code
> changes), but it's not going to work when there are modules. There's
> no way to tell if an iommu module won't be loaded soon. Also, device
> links doing this behavior only for iommu/dma is probably not a good
> idea. So, whatever we do will have to be common behavior. :(
>
> Another intermediate option I was thinking was having a
> CONFIG_FW_DEVLINK_OFF/PERMISSIVE/ON and defaulting it to ON for ARM64
> and turning it off in the defconfig for boards for which this doesn't
> work. That way, we can incrementally enable fw_devlink.
>
> This week is a very hectic week for me. So, please bear with slow
> responses from me for rest of this week. Let me think about this a bit
> to see if I can come up with a better solution than what I have in
> mind.
>
> Also, can you try deleting "iommu" and "dma" parsing in
> of_supplier_bindings[] in driver/of/property.c and see if it helps?
> Then we'd know this is the reason for things not working in your case.

Hi Geert,

I took a look at your logs. It looks like your guess is right. It's at
least one of the issues.

You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
driver. You already have a platform device created for it. So just go
ahead and probe it with a platform driver. See what Marek did here
[1].

You probably had to implement it as an "initcall based driver"
because you had to play initcall chicken to make sure the PD hardware
was initialized before the consumers. With fw_devlink=on you won't
have to worry about that. As an added benefit of implementing a proper
platform driver, you can actually implement runtime PM now, your
suspend/resume would be more robust, etc.

[1] - https://lore.kernel.org/lkml/[email protected]/

-Saravana

>
> > Thanks!
> >
> > [*] Although attaching means people like myself cannot read and comment
> > on the log easily, without saving the attachment first.
> > That's also the reason why patches should be submitted inline...
>
> Yeah, I see your concern. If you want to add comments to logs when
> sending them, yeah, please go ahead and put it inline. Or if someone
> wants to add comments to what you attached, they could copy paste the
> relevant sections and add comments.
>
> Thanks,
> Saravana

2021-01-20 00:04:18

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Tue, Jan 19, 2021 at 2:41 AM Michael Walle <[email protected]> wrote:
>
> Am 2021-01-18 22:01, schrieb Saravana Kannan:
> > On Sun, Jan 17, 2021 at 3:01 PM Michael Walle <[email protected]> wrote:
> >> > Cyclic dependencies in some firmware was one of the last remaining
> >> > reasons fw_devlink=on couldn't be set by default. Now that cyclic
> >> > dependencies don't block probing, set fw_devlink=on by default.
> >> >
> >> > Setting fw_devlink=on by default brings a bunch of benefits (currently,
> >> > only for systems with device tree firmware):
> >> > * Significantly cuts down deferred probes.
> >> > * Device probe is effectively attempted in graph order.
> >> > * Makes it much easier to load drivers as modules without having to
> >> > worry about functional dependencies between modules (depmod is still
> >> > needed for symbol dependencies).
> >> >
> >> > If this patch prevents some devices from probing, it's very likely due
> >> > to the system having one or more device drivers that "probe"/set up a
> >> > device (DT node with compatible property) without creating a struct
> >> > device for it. If we hit such cases, the device drivers need to be
> >> > fixed so that they populate struct devices and probe them like normal
> >> > device drivers so that the driver core is aware of the devices and their
> >> > status. See [1] for an example of such a case.
> >> >
> >> > [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> >> > Signed-off-by: Saravana Kannan <[email protected]>
> >>
> >> This breaks (at least) probing of the PCIe controllers of my board.
> >> The
> >> driver in question is
> >> drivers/pci/controller/dwc/pci-layerscape.c
> >> I've also put the maintainers of this driver on CC. Looks like it uses
> >> a
> >> proper struct device. But it uses builtin_platform_driver_probe() and
> >> apparently it waits for the iommu which uses module_platform_driver().
> >> Dunno if that will work together.
> >
> > Yeah, the builtin vs module doesn't matter. I've had fw_devlink work
> > multiple times with the consumer driver being built in and the
> > supplier actually loaded as a module. Making that work is one of the
> > goals of fw_devlink.
>
> Ok.

Hi Michael,

My bad, I spoke too soon. I thought you were talking about builtin_ vs
module_. My response is correct in that context. But the problem here
is related to builtin_platform_driver_probe(). That macro expects the
device (PCI) to be added and ready to probe by the time it's called.
If not, it just gives up and frees the code. That's why it's not
getting called after the first attempt. Can you please convert it into
builtin_platform_driver()? It should be a pretty trivial change.
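
To show the difference in behavior, here is a simplified userspace
model in C. It is a sketch only, not kernel code: struct model_device,
struct model_driver and the model_* functions are hypothetical, and the
model assumes just what is described above, i.e. that the one-shot
registration tries to bind exactly once and never retries, while a
normally registered driver gets another attempt once the supplier is
ready. The -ENODEV return matches the -19 seen in the log.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device: can bind only once its supplier is ready. */
struct model_device {
	bool supplier_ready;
	bool bound;
};

/* Hypothetical driver: one_shot models the "probe once" registration. */
struct model_driver {
	bool one_shot;
};

/* Bind every unbound device whose supplier is ready; return the count. */
static int model_bind_ready(struct model_device *devs, int n)
{
	int newly_bound = 0;

	for (int i = 0; i < n; i++) {
		if (!devs[i].bound && devs[i].supplier_ready) {
			devs[i].bound = true;
			newly_bound++;
		}
	}
	return newly_bound;
}

/* Registration: a one-shot driver that binds nothing gives up for good. */
static int model_driver_register(const struct model_driver *drv,
				 struct model_device *devs, int n)
{
	int bound;

	assert(n >= 0);
	bound = model_bind_ready(devs, n);
	if (drv->one_shot && bound == 0)
		return -ENODEV;
	return 0;
}

/* Deferred retry: only a normally registered driver is tried again. */
static int model_retry_deferred(const struct model_driver *drv,
				struct model_device *devs, int n)
{
	if (drv->one_shot)
		return 0;
	return model_bind_ready(devs, n);
}
```

With both PCIe devices waiting on the IOMMU at registration time, the
one-shot driver returns -ENODEV and stays unbound even after the IOMMU
shows up, whereas the normally registered driver binds both devices on
the retry.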

-Saravana

>
> >> The board device tree can be found here:
> >> arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-sl28-var3-ads2.dts
> >>
> >> Attached is the log with "probe deferral" messages enabled.
> >
> > I took a look at the logs. As you said, pci seems to be waiting on
> > iommu, but it's not clear why the iommu didn't probe by then. Can you
> > add initcall_debug=1 and enable the logs in device_link_add()? Btw, I
> > realize one compromise on the logs is to send them as an attachment
> > instead of inline. That way, it's still archived in the list, but I
> > don't have to deal with log lines getting wrapped, etc.
> >
> > Thanks for reporting the issues. Also, could you try picking up all of
> > these changes and giving it a shot. It's unlikely to help, but I want
> > to rule out issues related to fixes in progress.
> >
> > https://lore.kernel.org/lkml/[email protected]/
> > https://lore.kernel.org/lkml/[email protected]/
> > https://lore.kernel.org/lkml/[email protected]/
>
> Did pick them up, the last one had a conflict due some superfluous
> lines.
> Maybe they got reordered in that arrray.
>
> Issue still persist. I've enabled the debug in device_link_add(), in
> device_links_check_suppliers() and booted with initcall_debug. Please
> see attached log. Lets see how that goes ;)
>
> [ 0.132687] calling ls_pcie_driver_init+0x0/0x38 @ 1
> [ 0.132762] platform 3400000.pcie: probe deferral - supplier 5000000.iommu not ready
> [ 0.132777] platform 3500000.pcie: probe deferral - supplier 5000000.iommu not ready
> [ 0.132818] initcall ls_pcie_driver_init+0x0/0x38 returned -19 after 119 usecs
>
> After that, ls_pcie_driver_init() is never called again.
>
> -michael

2021-01-20 06:06:55

by Geert Uytterhoeven

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Saravana,

On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <[email protected]> wrote:
> On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> <[email protected]> wrote:
> > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <[email protected]> wrote:
> > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <[email protected]>
> > > > wrote:
> > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > >>
> > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > >> (currently,
> > > >> only for systems with device tree firmware):
> > > >> * Significantly cuts down deferred probes.
> > > >> * Device probe is effectively attempted in graph order.
> > > >> * Makes it much easier to load drivers as modules without having to
> > > >> worry about functional dependencies between modules (depmod is still
> > > >> needed for symbol dependencies).
> > > >>
> > > >> If this patch prevents some devices from probing, it's very likely due
> > > >> to the system having one or more device drivers that "probe"/set up a
> > > >> device (DT node with compatible property) without creating a struct
> > > >> device for it. If we hit such cases, the device drivers need to be
> > > >> fixed so that they populate struct devices and probe them like normal
> > > >> device drivers so that the driver core is aware of the devices and
> > > >> their
> > > >> status. See [1] for an example of such a case.
> > > >>
> > > >> [1] -
> > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > >> Signed-off-by: Saravana Kannan <[email protected]>
> > > >
> > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > is enabled.
> > > >
> > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > fw_devlink=on by default").
> > >
> > > There is a tentative patch from Saravana here[1], which works around
> > > some issues on my RK3399 platform, and it'd be interesting to find
> > > out whether that helps on your system.
> > >
> > > Thanks,
> > >
> > > M.
> > >
> > > [1]
> > > https://lore.kernel.org/r/[email protected]
> >
> > Thanks for the suggestion, but given no devices probe (incl. GPIO
> > providers), I'm afraid it won't help. [testing] Indeed.
> >
> > With the debug prints in device_links_check_suppliers enabled, and
> > some postprocessing, I get:
> >
> > 255 supplier e6180000.system-controller not ready
> > 9 supplier fe990000.iommu not ready
> > 9 supplier fe980000.iommu not ready
> > 6 supplier febd0000.iommu not ready
> > 6 supplier ec670000.iommu not ready
> > 3 supplier febe0000.iommu not ready
> > 3 supplier e7740000.iommu not ready
> > 3 supplier e6740000.iommu not ready
> > 3 supplier e65ee000.usb-phy not ready
> > 3 supplier e6570000.iommu not ready
> > 3 supplier e6054000.gpio not ready
> > 3 supplier e6053000.gpio not ready
> >
> > As everything is part of a PM Domain, the (lack of the) system controller
> > must be the culprit. What's wrong with it? It is registered very early in
> > the boot:
> >
> > [ 0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0

> Looks like you found the important logs. Can you please enable all
> these logs and send the early con logs as an attachment (so I don't
> need to deal with lines getting wrapped)?
> 1. The ones in device_links_check_suppliers()
> 2. The ones in device_link_add()
> 3. initcall_debug=1

I have attached[*] the requested log.

> That should help us figure out what's going on. Also, what's the DT
> that corresponds to one of the boards that see this issue?

arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts

> Lastly, can you please pick up these 3 patches (some need clean up
> before they merge) to make sure it's not an issue being worked on from
> other bug reports?
> https://lore.kernel.org/lkml/[email protected]/
> https://lore.kernel.org/lkml/[email protected]/
> https://lore.kernel.org/lkml/[email protected]/
>
> I have a strong hunch the 2nd one will fix your issues. fw_devlink can
> handle cyclic dependencies now (it basically reverts to
> fw_devlink=permissive mode for devices in the cycle), but it needs to
> "see" all the dependencies to know there's a cycle. So want to make
> sure it "sees" the "gpios" binding used all over some of the Renesas
> DT files.

These patches don't help.
The 2nd one actually introduces a new failure:

OF: /soc/i2c@e66d8000/gpio@20/pcie-sata-switch-hog: could not get
#gpio-cells for /cpus/cpu@102

Note that my issues don't seem to be GPIO-related at all.

BTW, you are aware IOMMUs and DMA controllers are optional?
I.e. device drivers with iommus and/or dmas DT properties where the
targets of these properties do not have a driver should still be probed,
eventually. But if the IOMMU or DMA drivers are present, they should be
probed first, so the device drivers can make use of them.

Thanks!

[*] Although attaching means people like myself cannot read and comment
on the log easily, without saving the attachment first.
That's also the reason why patches should be submitted inline...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Attachments:
dmesg-5.11.0-rc2-salvator-x-00011-g7b0c4737861f-dirty (120.93 kB)

2021-01-20 10:47:28

by Geert Uytterhoeven

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Saravana,

On Tue, Jan 19, 2021 at 10:51 PM Saravana Kannan <[email protected]> wrote:
> On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <[email protected]> wrote:
> > On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <[email protected]> wrote:
> > > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <[email protected]> wrote:
> > > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > > <[email protected]> wrote:
> > > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <[email protected]> wrote:
> > > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <[email protected]>
> > > > > > > wrote:
> > > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > > >>
> > > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > > >> (currently,
> > > > > > >> only for systems with device tree firmware):
> > > > > > >> * Significantly cuts down deferred probes.
> > > > > > >> * Device probe is effectively attempted in graph order.
> > > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > > >> worry about functional dependencies between modules (depmod is still
> > > > > > >> needed for symbol dependencies).
> > > > > > >>
> > > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > > >> device (DT node with compatible property) without creating a struct
> > > > > > >> device for it. If we hit such cases, the device drivers need to be
> > > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > > >> their
> > > > > > >> status. See [1] for an example of such a case.
> > > > > > >>
> > > > > > >> [1] -
> > > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > > > >> Signed-off-by: Saravana Kannan <[email protected]>
> > > > > > >
> > > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > > is enabled.
> > > > > > >
> > > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > > fw_devlink=on by default").
> > > > > >
> > > > > > There is a tentative patch from Saravana here[1], which works around
> > > > > > some issues on my RK3399 platform, and it'd be interesting to find
> > > > > > out whether that helps on your system.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > M.
> > > > > >
> > > > > > [1]
> > > > > > https://lore.kernel.org/r/[email protected]
> > > > >
> > > > > Thanks for the suggestion, but given no devices probe (incl. GPIO
> > > > > providers), I'm afraid it won't help. [testing] Indeed.
> > > > >
> > > > > With the debug prints in device_links_check_suppliers enabled, and
> > > > > some postprocessing, I get:
> > > > >
> > > > > 255 supplier e6180000.system-controller not ready
> > > > > 9 supplier fe990000.iommu not ready
> > > > > 9 supplier fe980000.iommu not ready
> > > > > 6 supplier febd0000.iommu not ready
> > > > > 6 supplier ec670000.iommu not ready
> > > > > 3 supplier febe0000.iommu not ready
> > > > > 3 supplier e7740000.iommu not ready
> > > > > 3 supplier e6740000.iommu not ready
> > > > > 3 supplier e65ee000.usb-phy not ready
> > > > > 3 supplier e6570000.iommu not ready
> > > > > 3 supplier e6054000.gpio not ready
> > > > > 3 supplier e6053000.gpio not ready
> > > > >
> > > > > As everything is part of a PM Domain, the (lack of the) system controller
> > > > > must be the culprit. What's wrong with it? It is registered very early in
> > > > > the boot:
> > > > >
> > > > > [ 0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0
> > >
> > > > Looks like you found the important logs. Can you please enable all
> > > > these logs and send the early con logs as an attachment (so I don't
> > > > need to deal with lines getting wrapped)?
> > > > 1. The ones in device_links_check_suppliers()
> > > > 2. The ones in device_link_add()
> > > > 3. initcall_debug=1
> > >
> > > I have attached[*] the requested log.
> > >
> > > > That should help us figure out what's going on. Also, what's the DT
> > > > that corresponds to one of the boards that see this issue?
> > >
> > > arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
> > >
> > > > Lastly, can you please pick up these 3 patches (some need clean up
> > > > before they merge) to make sure it's not an issue being worked on from
> > > > other bug reports?
> > > > https://lore.kernel.org/lkml/[email protected]/
> > > > https://lore.kernel.org/lkml/[email protected]/
> > > > https://lore.kernel.org/lkml/[email protected]/
> > > >
> > > > I have a strong hunch the 2nd one will fix your issues. fw_devlink can
> > > > handle cyclic dependencies now (it basically reverts to
> > > > fw_devlink=permissive mode for devices in the cycle), but it needs to
> > > > "see" all the dependencies to know there's a cycle. So want to make
> > > > sure it "sees" the "gpios" binding used all over some of the Renesas
> > > > DT files.
> > >
> > > These patches don't help.
> > > The 2nd one actually introduces a new failure:
> > >
> > > OF: /soc/i2c@e66d8000/gpio@20/pcie-sata-switch-hog: could not get
> > > #gpio-cells for /cpus/cpu@102
> > >
> > > Note that my issues don't seem to be GPIO-related at all.

> I took a look at your logs. It looks like your guess is right. It's at
> least one of the issues.
>
> You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
> driver. You already have a platform device created for it. So just go
> ahead and probe it with a platform driver. See what Marek did here
> [1].
>
> You probably had to implement it as an "initcall based driver"
> because you had to play initcall chicken to make sure the PD hardware
> was initialized before the consumers. With fw_devlink=on you won't
> have to worry about that. As an added benefit of implementing a proper
> platform driver, you can actually implement runtime PM now, your
> suspend/resume would be more robust, etc.

On R-Car H1, the system controller driver needs to be active before
secondary CPU setup, hence the early_initcall().
platform_bus_init() is called after that, so this is gonna need a split
initialization. Or a dummy platform driver to make devlinks think
everything is fine ;-)

So basically all producer DT drivers not using a platform (or e.g. i2c)
driver are now broken?
Including all clock drivers using CLK_OF_DECLARE()?

$ git grep -L "\<[a-z0-9]*_driver\>" -- $(git grep -l "\.compatible\>") | wc -l
249

(includes false positives)

I doubt they'll all get fixed for v5.12, as we're already at rc4...

Gr{oetje,eeting}s,

Geert


2021-01-20 10:51:27

by Geert Uytterhoeven

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Saravana,

On Tue, Jan 19, 2021 at 7:09 PM Saravana Kannan <[email protected]> wrote:
> On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <[email protected]> wrote:
> > BTW, you are aware IOMMUs and DMA controllers are optional?
> > I.e. device drivers with iommus and/or dmas DT properties where the
> > targets of these properties do not have a driver should still be probed,
> > eventually. But if the IOMMU or DMA drivers are present, they should be
> > probed first, so the device drivers can make use of them.
>
> Yeah, this is going to be a problem then. How is this handled in
> static kernels today? Do we just try to make sure the iommus driver
> probes the iommu device before the consumers? And then the consumers
> simply don't defer probe on failure to get iommu?

IOMMUs are handled by the IOMMU framework, not by the consumer driver.
So the framework decides if/when it's OK to probe a device tied to an
IOMMU. Hence the consumers' drivers don't return -EPROBE_DEFER; the
framework takes care of that before the drivers' probe() functions are
called.

DMA is handled by consumer drivers, and is driver-specific. Many consumer
drivers consider DMA optional, and fall back to PIO if getting the DMA
channel fails. Some drivers retry getting the DMA channel when the
device is used, and thus may start using DMA when the DMAC driver
appears and probes.
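
The "optional DMA" pattern described above can be sketched as a small
userspace model. This is only an illustration under stated assumptions:
struct dmac, struct consumer and the consumer_*() helpers are
hypothetical names, not kernel APIs, and a real driver would go through
the dmaengine API instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these names are kernel APIs. */
enum xfer_mode { XFER_PIO, XFER_DMA };

struct dmac {
	bool probed;		/* has the (hypothetical) DMAC driver bound yet? */
};

struct consumer {
	struct dmac *dmac;	/* DMAC named by the "dmas" property, or NULL */
	bool have_chan;		/* true once a DMA channel has been obtained */
};

/* Try to obtain a channel; fails while the DMAC has no bound driver. */
static bool consumer_request_chan(struct consumer *c)
{
	if (c->dmac && c->dmac->probed)
		c->have_chan = true;
	return c->have_chan;
}

/* Probe never defers on a missing DMAC: DMA is treated as optional. */
static int consumer_probe(struct consumer *c)
{
	consumer_request_chan(c);	/* failure is not an error here */
	return 0;
}

/* Retry on each transfer so DMA is picked up once the DMAC appears. */
static enum xfer_mode consumer_transfer(struct consumer *c)
{
	if (!c->have_chan)
		consumer_request_chan(c);
	return c->have_chan ? XFER_DMA : XFER_PIO;
}
```

In this model the consumer probes successfully with no DMAC, transfers
in PIO mode, and switches to DMA on the first transfer after the DMAC
has probed. A hard supplier link from "dmas" would instead keep the
consumer from probing at all until the DMAC is bound.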

> I can make this work if modules are not enabled (needs some code
> changes), but it's not going to work when there are modules. There's
> no way to tell if an iommu module won't be loaded soon. Also, device
> links doing this behavior only for iommu/dma is probably not a good
> idea. So, whatever we do will have to be common behavior. :(

The iommu driver definitely needs to be built-in.
Modular DMAC drivers currently work with consumer drivers that
either consider DMA mandatory, or retry obtaining DMA channels.

> Also, can you try deleting "iommu" and "dma" parsing in
> of_supplier_bindings[] in drivers/of/property.c and see if it helps?
> Then we'd know this is the reason for things not working in your case.

It also fails on another system without "iommus" properties:

182 supplier e6180000.system-controller not ready
18 supplier e6055400.gpio not ready
15 supplier e6055800.gpio not ready
15 supplier e6052000.gpio not ready
6 supplier e6055000.gpio not ready

The system controller is the culprit, and is a dependency for all
devices due to power-domains.

Gr{oetje,eeting}s,

Geert


2021-01-20 15:00:26

by Geert Uytterhoeven

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Saravana,

On Wed, Jan 20, 2021 at 10:40 AM Geert Uytterhoeven
<[email protected]> wrote:
> On Tue, Jan 19, 2021 at 10:51 PM Saravana Kannan <[email protected]> wrote:
> > On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <[email protected]> wrote:
> > > On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <[email protected]> wrote:
> > > > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <[email protected]> wrote:
> > > > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > > > <[email protected]> wrote:
> > > > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <[email protected]> wrote:
> > > > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <[email protected]>
> > > > > > > > wrote:
> > > > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > > > >>
> > > > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > > > >> (currently,
> > > > > > > >> only for systems with device tree firmware):
> > > > > > > >> * Significantly cuts down deferred probes.
> > > > > > > >> * Device probe is effectively attempted in graph order.
> > > > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > > > >> worry about functional dependencies between modules (depmod is still
> > > > > > > >> needed for symbol dependencies).
> > > > > > > >>
> > > > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > > > >> device (DT node with compatible property) without creating a struct
> > > > > > > >> device for it. If we hit such cases, the device drivers need to be
> > > > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > > > >> their
> > > > > > > >> status. See [1] for an example of such a case.
> > > > > > > >>
> > > > > > > >> [1] -
> > > > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > > > > >> Signed-off-by: Saravana Kannan <[email protected]>
> > > > > > > >
> > > > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > > > is enabled.
> > > > > > > >
> > > > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > > > fw_devlink=on by default").
> > > > > > >
> > > > > > > There is a tentative patch from Saravana here[1], which works around
> > > > > > > some issues on my RK3399 platform, and it'd be interesting to find
> > > > > > > out whether that helps on your system.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > M.
> > > > > > >
> > > > > > > [1]
> > > > > > > https://lore.kernel.org/r/[email protected]
> > > > > >
> > > > > > Thanks for the suggestion, but given no devices probe (incl. GPIO
> > > > > > providers), I'm afraid it won't help. [testing] Indeed.
> > > > > >
> > > > > > With the debug prints in device_links_check_suppliers enabled, and
> > > > > > some postprocessing, I get:
> > > > > >
> > > > > > 255 supplier e6180000.system-controller not ready
> > > > > > 9 supplier fe990000.iommu not ready
> > > > > > 9 supplier fe980000.iommu not ready
> > > > > > 6 supplier febd0000.iommu not ready
> > > > > > 6 supplier ec670000.iommu not ready
> > > > > > 3 supplier febe0000.iommu not ready
> > > > > > 3 supplier e7740000.iommu not ready
> > > > > > 3 supplier e6740000.iommu not ready
> > > > > > 3 supplier e65ee000.usb-phy not ready
> > > > > > 3 supplier e6570000.iommu not ready
> > > > > > 3 supplier e6054000.gpio not ready
> > > > > > 3 supplier e6053000.gpio not ready
> > > > > >
> > > > > > As everything is part of a PM Domain, the (lack of the) system controller
> > > > > > must be the culprit. What's wrong with it? It is registered very early in
> > > > > > the boot:
> > > > > >
> > > > > > [ 0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0
> > > >
> > > > > Looks like you found the important logs. Can you please enable all
> > > > > these logs and send the early con logs as an attachment (so I don't
> > > > > need to deal with lines getting wrapped)?
> > > > > 1. The ones in device_links_check_suppliers()
> > > > > 2. The ones in device_link_add()
> > > > > 3. initcall_debug=1
> > > >
> > > > I have attached[*] the requested log.
> > > >
> > > > > That should help us figure out what's going on. Also, what's the DT
> > > > > that corresponds to one of the boards that see this issue?
> > > >
> > > > arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
> > > >
> > > > > Lastly, can you please pick up these 3 patches (some need clean up
> > > > > before they merge) to make sure it's not an issue being worked on from
> > > > > other bug reports?
> > > > > https://lore.kernel.org/lkml/[email protected]/
> > > > > https://lore.kernel.org/lkml/[email protected]/
> > > > > https://lore.kernel.org/lkml/[email protected]/
> > > > >
> > > > > I have a strong hunch the 2nd one will fix your issues. fw_devlink can
> > > > > handle cyclic dependencies now (it basically reverts to
> > > > > fw_devlink=permissive mode for devices in the cycle), but it needs to
> > > > > "see" all the dependencies to know there's a cycle. So want to make
> > > > > sure it "sees" the "gpios" binding used all over some of the Renesas
> > > > > DT files.
> > > >
> > > > These patches don't help.
> > > > The 2nd one actually introduces a new failure:
> > > >
> > > > OF: /soc/i2c@e66d8000/gpio@20/pcie-sata-switch-hog: could not get
> > > > #gpio-cells for /cpus/cpu@102
> > > >
> > > > Note that my issues don't seem to be GPIO-related at all.
>
> > I took a look at your logs. It looks like your guess is right. It's at
> > least one of the issues.
> >
> > You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
> > driver. You already have a platform device created for it. So just go
> > ahead and probe it with a platform driver. See what Marek did here
> > [1].
> >
> > You probably had to implement it as an "initcall based driver"
> > because you had to play initcall chicken to make sure the PD hardware
> > was initialized before the consumers. With fw_devlink=on you won't
> > have to worry about that. As an added benefit of implementing a proper
> > platform driver, you can actually implement runtime PM now, your
> > suspend/resume would be more robust, etc.
>
> On R-Car H1, the system controller driver needs to be active before
> secondary CPU setup, hence the early_initcall().
> platform_bus_init() is called after that, so this is gonna need a split
> initialization. Or a dummy platform driver to make devlinks think
> everything is fine ;-)

Note that adding a dummy platform driver does work.

> So basically all producer DT drivers not using a platform (or e.g. i2c)
> driver are now broken?
> Including all clock drivers using CLK_OF_DECLARE()?

Oh, of_link_to_phandle() ignores device nodes where OF_POPULATED
is set, and of_clk_init() sets that flag. So rcar-sysc should do so, too.
Patch sent.
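
As a rough illustration of the behaviour described above, here is a
simplified userspace model. It is not the kernel's code: MODEL_POPULATED
stands in for OF_POPULATED, and struct model_node, struct model_consumer
and the model_*() helpers are hypothetical names chosen for this sketch.
It assumes the check amounts to "populated flag set, but no struct
device".

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_POPULATED 0x1u	/* stand-in for the kernel's OF_POPULATED flag */

struct model_node {
	unsigned int flags;
	bool has_struct_device;	/* was a struct device created for the node? */
};

struct model_consumer {
	int nr_suppliers;	/* links that will gate this consumer's probe */
};

/* Early init code (e.g. an initcall) marks the node it has set up itself. */
static void model_early_init(struct model_node *np)
{
	np->flags |= MODEL_POPULATED;
}

/*
 * Returns true if a supplier link was created. A node that is marked
 * populated but has no struct device is treated as already handled,
 * so no link is created and the consumer is not blocked on it.
 */
static bool model_link_to_supplier(struct model_consumer *con,
				   const struct model_node *sup)
{
	if (!sup->has_struct_device && (sup->flags & MODEL_POPULATED))
		return false;
	con->nr_suppliers++;
	return true;
}
```

With the flag unset, every consumer of the node gains a supplier link
that can never become ready. Once the early init code has marked the
node, linking is skipped and consumers are free to probe.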

> $ git grep -L "\<[a-z0-9]*_driver\>" -- $(git grep -l "\.compatible\>") | wc -l
> 249
>
> (includes false positives)
>
> I doubt they'll all get fixed for v5.12, as we're already at rc4...

Still more than 100 drivers to fix?

Gr{oetje,eeting}s,

Geert


2021-01-20 17:36:34

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Wed, Jan 20, 2021 at 6:27 AM Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Saravana,
>
> On Wed, Jan 20, 2021 at 10:40 AM Geert Uytterhoeven
> <[email protected]> wrote:
> > On Tue, Jan 19, 2021 at 10:51 PM Saravana Kannan <[email protected]> wrote:
> > > On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <[email protected]> wrote:
> > > > On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <[email protected]> wrote:
> > > > > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <[email protected]> wrote:
> > > > > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > > > > <[email protected]> wrote:
> > > > > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <[email protected]> wrote:
> > > > > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <[email protected]>
> > > > > > > > > wrote:
> > > > > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > > > > >>
> > > > > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > > > > >> (currently,
> > > > > > > > >> only for systems with device tree firmware):
> > > > > > > > >> * Significantly cuts down deferred probes.
> > > > > > > > >> * Device probe is effectively attempted in graph order.
> > > > > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > > > > >> worry about functional dependencies between modules (depmod is still
> > > > > > > > >> needed for symbol dependencies).
> > > > > > > > >>
> > > > > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > > > > >> device (DT node with compatible property) without creating a struct
> > > > > > > > >> device for it. If we hit such cases, the device drivers need to be
> > > > > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > > > > >> their
> > > > > > > > >> status. See [1] for an example of such a case.
> > > > > > > > >>
> > > > > > > > >> [1] -
> > > > > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > > > > > >> Signed-off-by: Saravana Kannan <[email protected]>
> > > > > > > > >
> > > > > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > > > > is enabled.
> > > > > > > > >
> > > > > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > > > > fw_devlink=on by default").
> > > > > > > >
> > > > > > > > There is a tentative patch from Saravana here[1], which works around
> > > > > > > > some issues on my RK3399 platform, and it'd be interesting to find
> > > > > > > > out whether that helps on your system.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > M.
> > > > > > > >
> > > > > > > > [1]
> > > > > > > > https://lore.kernel.org/r/[email protected]
> > > > > > >
> > > > > > > Thanks for the suggestion, but given no devices probe (incl. GPIO
> > > > > > > providers), I'm afraid it won't help. [testing] Indeed.
> > > > > > >
> > > > > > > With the debug prints in device_links_check_suppliers enabled, and
> > > > > > > some postprocessing, I get:
> > > > > > >
> > > > > > > 255 supplier e6180000.system-controller not ready
> > > > > > > 9 supplier fe990000.iommu not ready
> > > > > > > 9 supplier fe980000.iommu not ready
> > > > > > > 6 supplier febd0000.iommu not ready
> > > > > > > 6 supplier ec670000.iommu not ready
> > > > > > > 3 supplier febe0000.iommu not ready
> > > > > > > 3 supplier e7740000.iommu not ready
> > > > > > > 3 supplier e6740000.iommu not ready
> > > > > > > 3 supplier e65ee000.usb-phy not ready
> > > > > > > 3 supplier e6570000.iommu not ready
> > > > > > > 3 supplier e6054000.gpio not ready
> > > > > > > 3 supplier e6053000.gpio not ready
> > > > > > >
> > > > > > > As everything is part of a PM Domain, the (lack of the) system controller
> > > > > > > must be the culprit. What's wrong with it? It is registered very early in
> > > > > > > the boot:
> > > > > > >
> > > > > > > [ 0.142096] rcar_sysc_pd_init:442: of_genpd_add_provider_onecell() returned 0
> > > > >
> > > > > > Looks like you found the important logs. Can you please enable all
> > > > > > these logs and send the early con logs as an attachment (so I don't
> > > > > > need to deal with lines getting wrapped)?
> > > > > > 1. The ones in device_links_check_suppliers()
> > > > > > 2. The ones in device_link_add()
> > > > > > 3. initcall_debug=1
> > > > >
> > > > > I have attached[*] the requested log.
> > > > >
> > > > > > That should help us figure out what's going on. Also, what's the DT
> > > > > > that corresponds to one of the boards that see this issue?
> > > > >
> > > > > arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
> > > > >
> > > > > > Lastly, can you please pick up these 3 patches (some need clean up
> > > > > > before they merge) to make sure it's not an issue being worked on from
> > > > > > other bug reports?
> > > > > > https://lore.kernel.org/lkml/[email protected]/
> > > > > > https://lore.kernel.org/lkml/[email protected]/
> > > > > > https://lore.kernel.org/lkml/[email protected]/
> > > > > >
> > > > > > I have a strong hunch the 2nd one will fix your issues. fw_devlink can
> > > > > > handle cyclic dependencies now (it basically reverts to
> > > > > > fw_devlink=permissive mode for devices in the cycle), but it needs to
> > > > > > "see" all the dependencies to know there's a cycle. So want to make
> > > > > > sure it "sees" the "gpios" binding used all over some of the Renesas
> > > > > > DT files.
> > > > >
> > > > > These patches don't help.
> > > > > The 2nd one actually introduces a new failure:
> > > > >
> > > > > OF: /soc/i2c@e66d8000/gpio@20/pcie-sata-switch-hog: could not get
> > > > > #gpio-cells for /cpus/cpu@102
> > > > >
> > > > > Note that my issues don't seem to be GPIO-related at all.
> >
> > > I took a look at your logs. It looks like your guess is right. It's at
> > > least one of the issues.
> > >
> > > You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
> > > driver. You already have a platform device created for it. So just go
> > > ahead and probe it with a platform driver. See what Marek did here
> > > [1].
> > >
> > > You probably had to implement it as an "initcall based driver"
> > > because you had to play initcall chicken to make sure the PD hardware
> > > was initialized before the consumers. With fw_devlink=on you won't
> > > have to worry about that. As an added benefit of implementing a proper
> > > platform driver, you can actually implement runtime PM now, your
> > > suspend/resume would be more robust, etc.
> >
> > On R-Car H1, the system controller driver needs to be active before
> > secondary CPU setup, hence the early_initcall().
> > platform_bus_init() is called after that, so this is gonna need a split
> > initialization. Or a dummy platform driver to make devlinks think
> > everything is fine ;-)

I was wondering if you could still probe the "not needed by CPU" power
domains (if there are any) as devices. Using driver-core brings you
good things :)

>
> Note that adding a dummy platform driver does work.
>
> > So basically all producer DT drivers not using a platform (or e.g. i2c)
> > driver are now broken?
> > Including all clock drivers using CLK_OF_DECLARE()?
>
> Oh, of_link_to_phandle() ignores device nodes where OF_POPULATED
> is set, and of_clk_init() sets that flag. So rcar-sysc should do so, too.
> Patch sent.
>
> > $ git grep -L "\<[a-z0-9]*_driver\>" -- $(git grep -l "\.compatible\>") | wc -l
> > 249
> >
> > (includes false positives)
> >
> > I doubt they'll all get fixed for v5.12, as we're already at rc4...
>
> Still more than 100 drivers to fix?

I'm not fully sure what the grep is trying to catch, but fw_devlink
supports devices on any bus (i2c, platform, pci, etc.), so that's not a
problem. It becomes a problem when a struct device is never created for
a real device, or when one is created but never probed.
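
To make that failure mode concrete, here is a minimal userspace model of
supplier-gated probing. It is only a sketch under stated assumptions:
struct model_dev, suppliers_ready() and probe_all() are hypothetical
names, and the single "registered" flag stands in for "a struct device
exists and a driver will probe it".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUPPLIERS 4

struct model_dev {
	bool registered;	/* a struct device exists and a driver can bind */
	bool bound;		/* probe has succeeded */
	struct model_dev *suppliers[MAX_SUPPLIERS];
	size_t nr_suppliers;
};

/* A consumer may only probe once every supplier has been bound. */
static bool suppliers_ready(const struct model_dev *dev)
{
	for (size_t i = 0; i < dev->nr_suppliers; i++)
		if (!dev->suppliers[i]->bound)
			return false;	/* "supplier ... not ready" */
	return true;
}

/* Repeat probe passes until no progress; return how many devices bound. */
static size_t probe_all(struct model_dev **devs, size_t n)
{
	size_t bound = 0;
	bool progress = true;

	while (progress) {
		progress = false;
		for (size_t i = 0; i < n; i++) {
			struct model_dev *d = devs[i];

			if (d->bound || !d->registered || !suppliers_ready(d))
				continue;
			d->bound = true;
			bound++;
			progress = true;
		}
	}
	return bound;
}
```

If one supplier at the root of the graph (such as a system controller
that provides power domains) is never registered, nothing that depends
on it, directly or indirectly, ever binds in this model. Once it is
registered, the remaining devices bind in dependency order over
successive passes.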

I'm also looking into a bunch of other options for fallback when
fw_devlink=on doesn't work. Too much to explain here -- patches are
easier :)

-Saravana

2021-01-21 08:40:56

by Saravana Kannan

Subject: [TEST PATCH v1] driver: core: Make fw_devlink=on more forgiving

This patch is for test purposes only and pretty experimental. Code might
not be optimized, clean, formatted properly, etc.

Please review it only for functional bugs like locking bugs, wrong
logic, etc.

It basically tries to figure out which devices will never probe and
ignores them. It might not always work.

Marek, Geert, Marc,

Can you please try this patch INSTEAD of the other workarounds we found?

Jon, Michael,

I'm explicitly not including you in the "To" because this patch won't
work for your issues.

Cc: Marek Szyprowski <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Marc Zyngier <[email protected]>
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/base/base.h | 3 ++
drivers/base/core.c | 117 +++++++++++++++++++++++++++++++++++++++++++-
drivers/base/dd.c | 24 +++++++++
3 files changed, 142 insertions(+), 2 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index f5600a83124f..8d5fd95fa147 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -106,6 +106,9 @@ struct device_private {
#define to_device_private_class(obj) \
container_of(obj, struct device_private, knode_class)

+bool fw_devlink_is_permissive(void);
+bool fw_devlink_unblock_probe(struct device *dev);
+
/* initialisation functions */
extern int devices_init(void);
extern int buses_init(void);
diff --git a/drivers/base/core.c b/drivers/base/core.c
index e61e62b624ce..8528704bbb40 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -49,7 +49,6 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
static LIST_HEAD(deferred_sync);
static unsigned int defer_sync_state_count = 1;
static DEFINE_MUTEX(fwnode_link_lock);
-static bool fw_devlink_is_permissive(void);

/**
* fwnode_link_add - Create a link between two fwnode_handles.
@@ -1481,7 +1480,7 @@ u32 fw_devlink_get_flags(void)
return fw_devlink_flags;
}

-static bool fw_devlink_is_permissive(void)
+bool fw_devlink_is_permissive(void)
{
return fw_devlink_flags == FW_DEVLINK_FLAGS_PERMISSIVE;
}
@@ -1552,6 +1551,120 @@ static int fw_devlink_relax_cycle(struct device *con, void *sup)
return ret;
}

+static int __device_links_suppliers_available(struct device *dev)
+{
+	struct device_link *link;
+	int ret = 0;
+
+	if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
+	    !fw_devlink_is_permissive()) {
+		return -EPROBE_DEFER;
+	}
+
+	list_for_each_entry(link, &dev->links.suppliers, c_node) {
+		if (!(link->flags & DL_FLAG_MANAGED))
+			continue;
+
+		if (link->status != DL_STATE_AVAILABLE &&
+		    !(link->flags & DL_FLAG_SYNC_STATE_ONLY)) {
+			ret = -EPROBE_DEFER;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+bool fw_devlink_unblock_probe(struct device *dev)
+{
+	struct fwnode_link *link, *tmp;
+	struct device_link *dev_link, *dev_ln;
+	struct fwnode_handle *fwnode = dev->fwnode;
+	bool unblocked = false;
+
+	if (!fw_devlink_get_flags() || fw_devlink_is_permissive())
+		return false;
+
+	if (!fwnode)
+		return false;
+
+	mutex_lock(&fwnode_link_lock);
+
+	/* Delete questionable fwnode links */
+	list_for_each_entry_safe(link, tmp, &fwnode->suppliers, c_hook) {
+		struct device *par_dev;
+		struct fwnode_handle *par;
+		bool bound;
+
+		/*
+		 * Walk up fwnode tree of supplier till we find a parent device
+		 * that has been added or a parent fwnode that has fwnode links
+		 * (this is a firmware node that is expected to be added as a
+		 * device in the future).
+		 */
+		par = fwnode_get_parent(link->supplier);
+		while (par && list_empty(&par->suppliers) && !par->dev)
+			par = fwnode_get_next_parent(par);
+
+		/* Supplier is waiting on parent device to be added. */
+		if (par && !par->dev) {
+			fwnode_handle_put(par);
+			continue;
+		}
+
+		if (par && par->dev) {
+			par_dev = get_dev_from_fwnode(par);
+			device_lock(par_dev);
+			bound = device_is_bound(par_dev);
+			device_unlock(par_dev);
+			put_device(par_dev);
+
+			/* Supplier is waiting on parent device to be bound. */
+			if (!bound)
+				continue;
+		}
+
+		/*
+		 * Supplier has no parent or the immediate parent device has
+		 * been bound to a driver. It should have been added by now.
+		 * So, this link is spurious. Delete it.
+		 */
+		dev_info(dev, "Deleting fwnode link to %pfwP\n",
+			 link->supplier);
+		list_del(&link->s_hook);
+		list_del(&link->c_hook);
+		kfree(link);
+		unblocked = true;
+	}
+
+	if (IS_ENABLED(CONFIG_MODULES))
+		goto out;
+
+	device_links_write_lock();
+
+	list_for_each_entry_safe(dev_link, dev_ln, &dev->links.suppliers,
+				 c_node) {
+		if (!(dev_link->flags & DL_FLAG_INFERRED) ||
+		    dev_link->flags & DL_FLAG_SYNC_STATE_ONLY ||
+		    dev_link->status != DL_STATE_DORMANT)
+			continue;
+
+		/* This supplier should have probed by now. */
+		if (!__device_links_suppliers_available(dev_link->supplier)) {
+			dev_info(dev, "Deleting dev link to %s\n",
+				 dev_name(dev_link->supplier));
+			device_link_drop_managed(dev_link);
+			unblocked = true;
+		}
+	}
+
+	device_links_write_unlock();
+
+out:
+	mutex_unlock(&fwnode_link_lock);
+	return unblocked;
+}
+
/**
* fw_devlink_create_devlink - Create a device link from a consumer to fwnode
* @con - Consumer device for the device link
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 2f32f38a11ed..d4ccd2a2b6a4 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -301,6 +301,25 @@ static void deferred_probe_timeout_work_func(struct work_struct *work)
}
static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_work_func);

+static bool deferred_probe_fw_devlink_unblock(void)
+{
+	struct device *dev;
+	struct device_private *private;
+	bool unblocked = false;
+
+	if (!fw_devlink_get_flags() || fw_devlink_is_permissive())
+		return false;
+
+	mutex_lock(&deferred_probe_mutex);
+	list_for_each_entry(private, &deferred_probe_pending_list, deferred_probe) {
+		dev = private->device;
+		unblocked |= fw_devlink_unblock_probe(dev);
+	}
+	mutex_unlock(&deferred_probe_mutex);
+
+	return unblocked;
+}
+
/**
* deferred_probe_initcall() - Enable probing of deferred devices
*
@@ -317,6 +336,11 @@ static int deferred_probe_initcall(void)
driver_deferred_probe_trigger();
/* Sort as many dependencies as possible before exiting initcalls */
flush_work(&deferred_probe_work);
+
+	while (deferred_probe_fw_devlink_unblock()) {
+		driver_deferred_probe_trigger();
+		flush_work(&deferred_probe_work);
+	}
initcalls_done = true;

/*
--
2.30.0.296.g2bfb1c46d8-goog
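
For readers following the fwnode-link pruning in fw_devlink_unblock_probe()
in the patch above, here is a minimal userspace sketch of the decision it
appears to make for each supplier. Everything prefixed toy_ or TOY_ (the
struct, its fields, the enum) is a hypothetical stand-in and not a kernel
API; locking and reference counting are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct fwnode_handle; all fields are hypothetical. */
struct toy_fwnode {
	struct toy_fwnode *parent;
	bool has_device;	/* a struct device was added for this node */
	bool device_bound;	/* ...and a driver has bound to it */
	bool has_suppliers;	/* node still has pending fwnode links */
};

enum toy_verdict {
	TOY_KEEP_LINK,	/* supplier may still show up; keep waiting */
	TOY_DROP_LINK,	/* supplier should exist by now; link is spurious */
};

static enum toy_verdict toy_check_supplier(const struct toy_fwnode *supplier)
{
	const struct toy_fwnode *par;

	assert(supplier != NULL);

	/* Walk up until an ancestor has a device or pending suppliers. */
	par = supplier->parent;
	while (par && !par->has_suppliers && !par->has_device)
		par = par->parent;

	/* Ancestor not yet added as a device: the supplier waits on it. */
	if (par && !par->has_device)
		return TOY_KEEP_LINK;

	/* Ancestor device exists but no driver is bound yet: still waiting. */
	if (par && !par->device_bound)
		return TOY_KEEP_LINK;

	/* No relevant ancestor, or it is bound: the supplier is overdue. */
	return TOY_DROP_LINK;
}
```

In this model a supplier whose parent device has bound (or that has no
parent at all) is treated as "should have been added by now", which is the
case where the patch deletes the fwnode link.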

2021-01-21 10:38:24

by Marek Szyprowski

Subject: Re: [TEST PATCH v1] driver: core: Make fw_devlink=on more forgiving

Hi Saravana,

On 21.01.2021 09:22, Saravana Kannan wrote:
> This patch is for test purposes only and pretty experimental. Code might
> not be optimized, clean, formatted properly, etc.
>
> Please review it only for functional bugs like locking bugs, wrong
> logic, etc.
>
> It's basically trying to figure out which devices will never probe and
> ignore them. Might not always work.
>
> Marek, Geert, Marc,
>
> Can you please try this patch INSTEAD of the other workarounds we found?

I've checked the latest linux-next with this patch and commit
c09a3e6c97f0 ("soc: samsung: pm_domains: Convert to regular platform
driver") reverted. Sadly, it doesn't help. None of the devices that belong
to the Exynos power domains are probed at all ("supplier
10023cXX.power-domain not ready").

> Jon, Michael,
>
> I'm explicitly not including you in the "To" because this patch won't
> work for your issues.
>
> Cc: Marek Szyprowski <[email protected]>
> Cc: Geert Uytterhoeven <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> Signed-off-by: Saravana Kannan <[email protected]>
> ---
> drivers/base/base.h | 3 ++
> drivers/base/core.c | 117 +++++++++++++++++++++++++++++++++++++++++++-
> drivers/base/dd.c | 24 +++++++++
> 3 files changed, 142 insertions(+), 2 deletions(-)
>
> > [...]

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2021-01-21 16:09:51

by Geert Uytterhoeven

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Saravana,

On Wed, Jan 20, 2021 at 6:23 PM Saravana Kannan <[email protected]> wrote:
> On Wed, Jan 20, 2021 at 6:27 AM Geert Uytterhoeven <[email protected]> wrote:
> > On Wed, Jan 20, 2021 at 10:40 AM Geert Uytterhoeven
> > <[email protected]> wrote:
> > > On Tue, Jan 19, 2021 at 10:51 PM Saravana Kannan <[email protected]> wrote:
> > > > On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <[email protected]> wrote:
> > > > > On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <[email protected]> wrote:
> > > > > > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <[email protected]> wrote:
> > > > > > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > > > > > <[email protected]> wrote:
> > > > > > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <[email protected]> wrote:
> > > > > > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <[email protected]>
> > > > > > > > > > wrote:
> > > > > > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > > > > > >>
> > > > > > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > > > > > >> (currently,
> > > > > > > > > >> only for systems with device tree firmware):
> > > > > > > > > >> * Significantly cuts down deferred probes.
> > > > > > > > > >> * Device probe is effectively attempted in graph order.
> > > > > > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > > > > > >> worry about functional dependencies between modules (depmod is still
> > > > > > > > > >> needed for symbol dependencies).
> > > > > > > > > >>
> > > > > > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > > > > > >> device (DT node with compatible property) without creating a struct
> > > > > > > > > >> device for it. If we hit such cases, the device drivers need to be
> > > > > > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > > > > > >> their
> > > > > > > > > >> status. See [1] for an example of such a case.
> > > > > > > > > >>
> > > > > > > > > >> [1] -
> > > > > > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > > > > > > >> Signed-off-by: Saravana Kannan <[email protected]>
> > > > > > > > > >
> > > > > > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > > > > > is enabled.
> > > > > > > > > >
> > > > > > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > > > > > fw_devlink=on by default").

> > > > You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
> > > > driver. You already have a platform device created for it. So just go
> > > > ahead and probe it with a platform driver. See what Marek did here
> > > > [1].
> > > >
> > > > You probably had to implement it as an "initcall based driver"
> > > > because you had to play initcall chicken to make sure the PD hardware
> > > > was initialized before the consumers. With fw_devlink=on you won't
> > > > have to worry about that. As an added benefit of implementing a proper
> > > > platform driver, you can actually implement runtime PM now, your
> > > > suspend/resume would be more robust, etc.
> > >
> > > On R-Car H1, the system controller driver needs to be active before
> > > secondary CPU setup, hence the early_initcall().
> > > platform_bus_init() is called after that, so this is gonna need a split
> > > initialization. Or a dummy platform driver to make devlinks think
> > > everything is fine ;-)
>
> I was wondering if you could still probe the "not needed by CPU" power
> domains (if there are any) as devices. Using driver-core brings you
> good things :)

1. That would mean splitting the driver in two parts, looping over the
tables twice, while everything can just be done in the first pass?

2. Which "good things" do you have in mind? Making the driver modular?
Ignoring the dependency for secondary CPU setup on R-Car H1, this
driver could indeed be modular on R-Car Gen2 and Gen3, as long as
the boot loader would pass a ramdisk with the module to the kernel.
The ramdisk could not be loaded in any other way, as all I/O
devices are part of a PM Domain, and thus depend on the SYSC driver.
Note that on some (non-R-Car) SoCs, the timers may be part of a PM
Domain, too.

> > > So basically all producer DT drivers not using a platform (or e.g. i2c)
> > > driver are now broken?
> > > Including all clock drivers using CLK_OF_DECLARE()?
> >
> > Oh, of_link_to_phandle() ignores device nodes where OF_POPULATED
> > is set, and of_clk_init() sets that flag. So rcar-sysc should do so, too.
> > Patch sent.
> > > $ git grep -L "\<[a-z0-9]*_driver\>" -- $(git grep -l
> > > "\.compatible\>") | wc -l
> > > 249
> > >
> > > (includes false positives)
> > >
> > > I doubt they'll all get fixed for v5.12, as we're already at rc4...
> >
> > Still more than 100 drivers to fix?
>
> Not fully sure what the grep is trying to catch, but fw_devlink
> supports devices on any bus (i2c, platform, pci, etc). So that's not a
> problem. It'll be a problem when a struct device is never created for
> a real device. Or if it's created, but never probed.

The grep tries to catch drivers using DT matching (i.e. matching ".compatible")
and not using a driver model driver (i.e. not matching "*_driver").

> I'm also looking into a bunch of other options for fallback when
> fw_devlink=on doesn't work. Too much to explain here -- patches are
> easier :)

I gave it a try on all Renesas platforms I have local access to:

- R-Car Gen2/Gen3:
Setting OF_POPULATED in the rcar-sysc driver[1] made my standard
config boot again. Remaining issues:
- CONFIG_IPMMU_VMSA=n hangs: supplier fe990000.iommu not ready
- CONFIG_RCAR_DMAC=n hangs: supplier e7310000.dma-controller not ready
Note that Ethernet does not use the R-Car DMAC, so DHCP works.
Nevertheless, after that everything hangs, and the board does not
respond to pings anymore.
Both IOMMU and DMAC dependencies are optional, hence should be dropped
at late boot (late_initcall?).

- SH-Mobile AG5 and R-Mobile APE6:
The rmobile-sysc driver is similar to the rcar-sysc driver, and does
not use a platform device.
Still, it works, because all dependencies on the System Controller
become unblocked when the rmobile-reset driver binds against the
"renesas,sysc-rmobile" device. Obviously it would fail if no
support for that driver is included in your kernel...

- R-Mobile A1:
Also using the rmobile-sysc driver.
However, this is a single core Cortex-A9, i.e. it does not have an
ARM architected timer (like R-Mobile APE6) or Cortex-A9 Global
Timer (like SH-Mobile AG5). The timer used (TMU) is located in a PM
Domain controlled by the rmobile-sysc driver, and driver
initialization is postponed beyond the point where something relies
on a working timer, causing a hang.

Setting OF_POPULATED (like in my fix for the rcar-sysc driver) fixes
this, but prevents the rmobile-reset driver from binding against the
same device node, so the reset handling will have to be incorporated
into the rmobile-sysc driver (and will thus be registered very
early).

- RZ/A1 and RZ/A2:
These are not affected, as the timer used (OSTM) is not a platform
driver, but uses TIMER_OF_DECLARE().
Note that the RZ/A2 clock driver uses split initialization:
1. Early (timer) clocks are initialized from CLK_OF_DECLARE_DRIVER,
2. Other clocks are initialized by platform_driver_probe() from a
subsys_initcall.
If the OSTM driver were a platform_driver, it would block on the
clock dependency. Setting the OF_POPULATED flag in the clock driver
would not work: while that flag would unblock probing of the timer
driver, it would also prevent the second part of the clock driver
initialization.
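
The trade-off around OF_POPULATED described in the list above can be
modeled as follows. This is a userspace sketch only: TOY_POPULATED and the
toy_ helpers are hypothetical stand-ins for the flag, for the check that
(per this thread) of_link_to_phandle() performs, and for whatever keeps a
second driver from binding to a node that is already marked as populated.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_POPULATED (1u << 0)	/* stands in for OF_POPULATED */

/* Toy stand-in for a device tree node. */
struct toy_node {
	unsigned int flags;
};

/* Early-init code (e.g. an initcall-based driver) claims the node. */
static void toy_claim_node_early(struct toy_node *np)
{
	assert(np);
	np->flags |= TOY_POPULATED;
}

/* Consumers only wait on suppliers that nobody has marked as populated. */
static bool toy_consumer_must_wait(const struct toy_node *supplier)
{
	return !(supplier->flags & TOY_POPULATED);
}

/* A bus only creates a device for a node that is not yet populated. */
static bool toy_bus_creates_device(const struct toy_node *np)
{
	return !(np->flags & TOY_POPULATED);
}
```

Setting the flag flips both answers at once: consumers stop waiting, but no
device is created for a later driver to bind to. That is the conflict noted
for rmobile-sysc versus rmobile-reset, and for the second stage of the RZ/A2
clock driver.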

Now, back to the things I was supposed to work on this week ;-)

[1] https://lore.kernel.org/linux-arm-kernel/[email protected]/


Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-01-25 17:09:38

by Tudor Ambarus

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi, Saravana,

On 12/18/20 5:17 AM, Saravana Kannan wrote:
> Cyclic dependencies in some firmware was one of the last remaining
> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> dependencies don't block probing, set fw_devlink=on by default.
>
> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> only for systems with device tree firmware):
> * Significantly cuts down deferred probes.
> * Device probe is effectively attempted in graph order.
> * Makes it much easier to load drivers as modules without having to
> worry about functional dependencies between modules (depmod is still
> needed for symbol dependencies).
>
> If this patch prevents some devices from probing, it's very likely due
> to the system having one or more device drivers that "probe"/set up a
> device (DT node with compatible property) without creating a struct
> device for it. If we hit such cases, the device drivers need to be
> fixed so that they populate struct devices and probe them like normal
> device drivers so that the driver core is aware of the devices and their
> status. See [1] for an example of such a case.
>
> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> Signed-off-by: Saravana Kannan <[email protected]>

next-20210125 fails to boot on at91 sama5d2 platforms. No output is
seen, unless earlyprintk is enabled.

I have bisected this to commit e590474768f1cc04 ("driver core: Set
fw_devlink=on by default").

I've attached a log that I'm seeing on a sama5d2_xplained (sama5_defconfig
and arch/arm/boot/dts/at91-sama5d2_xplained.dts). I enabled the
following logs:
1. The ones in device_links_check_suppliers()
2. The ones in device_link_add()
3. initcall_debug=1

There seem to be some probe failures due to the pmc supplier not being ready:
calling at_xdmac_init+0x0/0x18 @ 1
platform f0010000.dma-controller: probe deferral - supplier f0014000.pmc not ready
platform f0004000.dma-controller: probe deferral - supplier f0014000.pmc not ready
initcall at_xdmac_init+0x0/0x18 returned -19 after 19531 usecs

calling udc_driver_init+0x0/0x18 @ 1
platform 300000.gadget: probe deferral - supplier f0014000.pmc not ready
initcall udc_driver_init+0x0/0x18 returned -19 after 7524 usecs

There are others too. I'm checking them.
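
As a reading aid for the initcall_debug lines above: on Linux, -19 is
-ENODEV, which is distinct from the kernel-internal -EPROBE_DEFER (517)
behind the "probe deferral" messages. One plausible reading is that the
initcall found no device it could bind at that point, while the devices
themselves were deferred. A small userspace sketch, where TOY_EPROBE_DEFER
and toy_describe_ret() are names made up for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Kernel-internal value; it is not exported through userspace errno.h. */
#define TOY_EPROBE_DEFER 517

/* Turn an initcall return value into a short human-readable string. */
static const char *toy_describe_ret(int ret)
{
	if (ret == 0)
		return "success";
	if (ret == -TOY_EPROBE_DEFER)
		return "probe deferred";
	/* Everything else is assumed to be a negated errno value. */
	return strerror(-ret);
}
```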

Cheers,
ta


Attachments:
at91-sama5d2_xplained.log (53.78 kB)

2021-01-25 18:21:38

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Mon, Jan 25, 2021 at 9:05 AM <[email protected]> wrote:
>
> Hi, Saravana,
>
> On 12/18/20 5:17 AM, Saravana Kannan wrote:
> > Cyclic dependencies in some firmware was one of the last remaining
> > reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > dependencies don't block probing, set fw_devlink=on by default.
> >
> > Setting fw_devlink=on by default brings a bunch of benefits (currently,
> > only for systems with device tree firmware):
> > * Significantly cuts down deferred probes.
> > * Device probe is effectively attempted in graph order.
> > * Makes it much easier to load drivers as modules without having to
> > worry about functional dependencies between modules (depmod is still
> > needed for symbol dependencies).
> >
> > If this patch prevents some devices from probing, it's very likely due
> > to the system having one or more device drivers that "probe"/set up a
> > device (DT node with compatible property) without creating a struct
> > device for it. If we hit such cases, the device drivers need to be
> > fixed so that they populate struct devices and probe them like normal
> > device drivers so that the driver core is aware of the devices and their
> > status. See [1] for an example of such a case.
> >
> > [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > Signed-off-by: Saravana Kannan <[email protected]>
>
> next-20210125 fails to boot on at91 sama5d2 platforms. No output is
> seen, unless earlyprintk is enabled.
>
> I have bisected this to commit e590474768f1cc04 ("driver core: Set
> fw_devlink=on by default").
>
> I've attached a log that I'm seeing on a sama5d2_xplained (sama5_defconfig
> and arch/arm/boot/dts/at91-sama5d2_xplained.dts). I enabled the
> following logs:
> 1. The ones in device_links_check_suppliers()
> 2. The ones in device_link_add()
> 3. initcall_debug=1
>
> There seem to be some probe failures due to the pmc supplier not being ready:
> calling at_xdmac_init+0x0/0x18 @ 1
> platform f0010000.dma-controller: probe deferral - supplier f0014000.pmc not ready
> platform f0004000.dma-controller: probe deferral - supplier f0014000.pmc not ready
> initcall at_xdmac_init+0x0/0x18 returned -19 after 19531 usecs
>
> calling udc_driver_init+0x0/0x18 @ 1
> platform 300000.gadget: probe deferral - supplier f0014000.pmc not ready
> initcall udc_driver_init+0x0/0x18 returned -19 after 7524 usecs
>
> There are others too. I'm checking them.

Thanks Tudor. I'll look into this within a few days. I'm also looking
into coming up with a more generic solution.

-Saravana

2021-01-25 23:34:33

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Thu, Jan 21, 2021 at 8:04 AM Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Saravana,
>
> On Wed, Jan 20, 2021 at 6:23 PM Saravana Kannan <[email protected]> wrote:
> > On Wed, Jan 20, 2021 at 6:27 AM Geert Uytterhoeven <[email protected]> wrote:
> > > On Wed, Jan 20, 2021 at 10:40 AM Geert Uytterhoeven
> > > <[email protected]> wrote:
> > > > On Tue, Jan 19, 2021 at 10:51 PM Saravana Kannan <[email protected]> wrote:
> > > > > On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <[email protected]> wrote:
> > > > > > On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <[email protected]> wrote:
> > > > > > > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <[email protected]> wrote:
> > > > > > > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > > > > > > <[email protected]> wrote:
> > > > > > > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <[email protected]> wrote:
> > > > > > > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <[email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > > > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > > > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > > > > > > >>
> > > > > > > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > > > > > > >> (currently,
> > > > > > > > > > >> only for systems with device tree firmware):
> > > > > > > > > > >> * Significantly cuts down deferred probes.
> > > > > > > > > > >> * Device probe is effectively attempted in graph order.
> > > > > > > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > > > > > > >> worry about functional dependencies between modules (depmod is still
> > > > > > > > > > >> needed for symbol dependencies).
> > > > > > > > > > >>
> > > > > > > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > > > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > > > > > > >> device (DT node with compatible property) without creating a struct
> > > > > > > > > > >> device for it. If we hit such cases, the device drivers need to be
> > > > > > > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > > > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > > > > > > >> their
> > > > > > > > > > >> status. See [1] for an example of such a case.
> > > > > > > > > > >>
> > > > > > > > > > >> [1] -
> > > > > > > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > > > > > > > >> Signed-off-by: Saravana Kannan <[email protected]>
> > > > > > > > > > >
> > > > > > > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > > > > > > is enabled.
> > > > > > > > > > >
> > > > > > > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > > > > > > fw_devlink=on by default").
>
> > > > > You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
> > > > > driver. You already have a platform device created for it. So just go
> > > > > ahead and probe it with a platform driver. See what Marek did here
> > > > > [1].
> > > > >
> > > > > You probably had to implement it as an "initcall based driver"
> > > > > because you had to play initcall chicken to make sure the PD hardware
> > > > > was initialized before the consumers. With fw_devlink=on you won't
> > > > > have to worry about that. As an added benefit of implementing a proper
> > > > > platform driver, you can actually implement runtime PM now, your
> > > > > suspend/resume would be more robust, etc.
> > > >
> > > > On R-Car H1, the system controller driver needs to be active before
> > > > secondary CPU setup, hence the early_initcall().
> > > > platform_bus_init() is called after that, so this is gonna need a split
> > > > initialization. Or a dummy platform driver to make devlinks think
> > > > everything is fine ;-)
> >
> > I was wondering if you could still probe the "not needed by CPU" power
> > domains (if there are any) as devices. Using driver-core brings you
> > good things :)
>
> 1. That would mean splitting the driver in two parts, looping over the
> tables twice, while everything can just be done in the first pass?
>
> 2. Which "good things" do you have in mind? Making the driver modular?
> Ignoring the dependency for secondary CPU setup on R-Car H1, this
> driver could indeed be modular on R-Car Gen2 and Gen3, as long as
> the boot loader would pass a ramdisk with the module to the kernel.
> The ramdisk could not be loaded in any other way, as all I/O
> devices are part of a PM Domain, and thus depend on the SYSC driver.
> Note that on some (non-R-Car) SoCs, the timers may be part of a PM
> Domain, too.

"Good things" like being able to implement runtime pm, suspend/resume
robustness (due to device links). There were a few more benefits I had
in mind when I wrote it, but I don't remember what they were.

The double pass itself is not that big of a deal IMHO. It probably
adds less than a millisecond.

>
> > > > So basically all producer DT drivers not using a platform (or e.g. i2c)
> > > > driver are now broken?
> > > > Including all clock drivers using CLK_OF_DECLARE()?
> > >
> > > Oh, of_link_to_phandle() ignores device nodes where OF_POPULATED
> > > is set, and of_clk_init() sets that flag. So rcar-sysc should do so, too.
> > > Patch sent.
> > > > $ git grep -L "\<[a-z0-9]*_driver\>" -- $(git grep -l
> > > > "\.compatible\>") | wc -l
> > > > 249
> > > >
> > > > (includes false positives)
> > > >
> > > > I doubt they'll all get fixed for v5.12, as we're already at rc4...
> > >
> > > Still more than 100 drivers to fix?
> >
> > Not fully sure what the grep is trying to catch, but fw_devlink
> > supports devices on any bus (i2c, platform, pci, etc). So that's not a
> > problem. It'll be a problem when a struct device is never created for
> > a real device. Or if it's created, but never probed.
>
> The grep tries to catch drivers using DT matching (i.e. matching ".compatible")
> and not using a driver model driver (i.e. not matching "*_driver").

Ah TIL about -L and -l. Thanks.

> > I'm also looking into a bunch of other options for fallback when
> > fw_devlink=on doesn't work. Too much to explain here -- patches are
> > easier :)
>
> I gave it a try on all Renesas platforms I have local access to:

Thanks a lot! Really appreciate the testing and reporting.

>
> - R-Car Gen2/Gen3:
> Setting OF_POPULATED in the rcar-sysc driver[1] made my standard
> config boot again. Remaining issues:
> - CONFIG_IPMMU_VMSA=n hangs: supplier fe990000.iommu not ready
> - CONFIG_RCAR_DMAC=n hangs: supplier e7310000.dma-controller not ready
> Note that Ethernet does not use the R-Car DMAC, so DHCP works.
> Nevertheless, after that everything hangs, and the board does not
> respond to pings anymore.
> Both IOMMU and DMAC dependencies are optional, hence should be dropped
> at late boot (late_initcall?).

Yeah, I'm looking into a good/clean way of handling optional
suppliers. There are a bunch of corner cases I need to consider. But
in the end, I need to have it behave as closely as possible to
fw_devlink=permissive.
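
To make the "optional supplier" problem concrete, here is a toy userspace
model of the retry loop that the TEST PATCH earlier in this thread adds to
deferred_probe_initcall(): probe until nothing changes, drop links to
suppliers that can never probe, then probe again. It is only a sketch under
simplifying assumptions (one supplier per device, "driver not built in"
means "will never probe"); all toy_ names are hypothetical, and it is not
the solution being designed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NO_SUPPLIER (-1)

struct toy_dev {
	int supplier;		/* index of the supplier, or TOY_NO_SUPPLIER */
	bool has_driver;	/* a driver for this device is built in */
	bool probed;
};

/* One deferred-probe pass; returns true if any device probed. */
static bool toy_probe_pass(struct toy_dev *devs, size_t n)
{
	bool progress = false;

	for (size_t i = 0; i < n; i++) {
		struct toy_dev *d = &devs[i];

		if (d->probed || !d->has_driver)
			continue;
		if (d->supplier != TOY_NO_SUPPLIER && !devs[d->supplier].probed)
			continue;	/* models -EPROBE_DEFER */
		d->probed = true;
		progress = true;
	}
	return progress;
}

/* Drop links to suppliers that can never probe; true if any was dropped. */
static bool toy_unblock(struct toy_dev *devs, size_t n)
{
	bool unblocked = false;

	for (size_t i = 0; i < n; i++) {
		struct toy_dev *d = &devs[i];

		if (d->probed || d->supplier == TOY_NO_SUPPLIER)
			continue;
		if (!devs[d->supplier].has_driver) {
			d->supplier = TOY_NO_SUPPLIER;
			unblocked = true;
		}
	}
	return unblocked;
}

/* Returns the number of devices that ended up probed. */
static size_t toy_boot(struct toy_dev *devs, size_t n)
{
	size_t count = 0;

	assert(devs != NULL);
	while (toy_probe_pass(devs, n))
		;
	while (toy_unblock(devs, n))
		while (toy_probe_pass(devs, n))
			;
	for (size_t i = 0; i < n; i++)
		count += devs[i].probed;
	return count;
}
```

With a consumer whose only supplier has no driver (think CONFIG_RCAR_DMAC=n),
the first loop leaves the consumer deferred forever; the second loop drops
the link and lets it probe, which is roughly how fw_devlink=permissive
would have behaved for that device.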

>
> - SH-Mobile AG5 and R-Mobile APE6:
> The rmobile-sysc driver is similar to the rcar-sysc driver, and does
> not use a platform device.
> Still, it works, because all dependencies on the System Controller
> become unblocked when the rmobile-reset driver binds against the
> "renesas,sysc-rmobile" device. Obviously it would fail if no
> support for that driver is included in your kernel...

Yeah, IMHO two real drivers (not stubs) for a single device tree node
is wrong/weird at a high level. I'd think one should be a child of the
other. But too late to fix that DT now.

Does it make sense for the rmobile-sysc driver to create a new
platform device and have the rmobile-reset bind to that instead? And
then you can bind a stub driver to the "renesas,sysc-rmobile" device?
I know this can be handled by whatever solution I come up with for the
IOMMU case, but that doesn't seem right for this case. We don't have
to decide on this now, but that's my current view.

> - R-Mobile A1:
> Also using the rmobile-sysc driver.
> However, this is a single core Cortex-A9, i.e. it does not have an
> ARM architected timer (like R-Mobile APE6) or Cortex-A9 Global
> Timer (like SH-Mobile AG5). The timer used (TMU) is located in a PM
> Domain controlled by the rmobile-sysc driver, and driver
> initialization is postponed beyond the point where something relies
> on a working timer, causing a hang.
>
> Setting OF_POPULATED (like in my fix for the rcar-sysc driver) fixes
> this, but prevents the rmobile-reset driver from binding against the
> same device node, so the reset handling will have to be incorporated
> into the rmobile-sysc driver (and will thus be registered very
> early).

Or you can do the "create a child device" option I suggested above.

> - RZ/A1 and RZ/A2:
> These are not affected, as the timer used (OSTM) is not a platform
> driver, but uses TIMER_OF_DECLARE().
> Note that the RZ/A2 clock driver uses split initialization:
> 1. Early (timer) clocks are initialized from CLK_OF_DECLARE_DRIVER,
> 2. Other clocks are initialized by platform_driver_probe() from a
> subsys_initcall.
> If the OSTM driver were a platform_driver, it would block on the
> clock dependency. Setting the OF_POPULATED flag in the clock driver
> would not work: while that flag would unblock probing of the timer
> driver, it would also prevent the second part of the clock driver
> initialization.

So this looks like it's all working fine, right? Yeah, I already took
into account the *OF*_DECLARE macros when I wrote this and was aware
of the split driver implementations. So hopefully this all works out
fine.

> Now, back to the things I was supposed to work on this week ;-)

Really appreciate all this testing and feedback!

-Saravana

2021-01-27 06:02:31

by Geert Uytterhoeven

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Saravana,

On Tue, Jan 26, 2021 at 12:31 AM Saravana Kannan <[email protected]> wrote:
> On Thu, Jan 21, 2021 at 8:04 AM Geert Uytterhoeven <[email protected]> wrote:
> > On Wed, Jan 20, 2021 at 6:23 PM Saravana Kannan <[email protected]> wrote:
> > > On Wed, Jan 20, 2021 at 6:27 AM Geert Uytterhoeven <[email protected]> wrote:
> > > > On Wed, Jan 20, 2021 at 10:40 AM Geert Uytterhoeven
> > > > <[email protected]> wrote:
> > > > > On Tue, Jan 19, 2021 at 10:51 PM Saravana Kannan <[email protected]> wrote:
> > > > > > On Tue, Jan 19, 2021 at 10:08 AM Saravana Kannan <[email protected]> wrote:
> > > > > > > On Tue, Jan 19, 2021 at 1:05 AM Geert Uytterhoeven <[email protected]> wrote:
> > > > > > > > On Mon, Jan 18, 2021 at 10:19 PM Saravana Kannan <[email protected]> wrote:
> > > > > > > > > On Mon, Jan 18, 2021 at 11:16 AM Geert Uytterhoeven
> > > > > > > > > <[email protected]> wrote:
> > > > > > > > > > On Mon, Jan 18, 2021 at 6:59 PM Marc Zyngier <[email protected]> wrote:
> > > > > > > > > > > On 2021-01-18 17:39, Geert Uytterhoeven wrote:
> > > > > > > > > > > > On Fri, Dec 18, 2020 at 4:34 AM Saravana Kannan <[email protected]>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >> Cyclic dependencies in some firmware was one of the last remaining
> > > > > > > > > > > >> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > > > > > > > > > >> dependencies don't block probing, set fw_devlink=on by default.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Setting fw_devlink=on by default brings a bunch of benefits
> > > > > > > > > > > >> (currently,
> > > > > > > > > > > >> only for systems with device tree firmware):
> > > > > > > > > > > >> * Significantly cuts down deferred probes.
> > > > > > > > > > > >> * Device probe is effectively attempted in graph order.
> > > > > > > > > > > >> * Makes it much easier to load drivers as modules without having to
> > > > > > > > > > > >> worry about functional dependencies between modules (depmod is still
> > > > > > > > > > > >> needed for symbol dependencies).
> > > > > > > > > > > >>
> > > > > > > > > > > >> If this patch prevents some devices from probing, it's very likely due
> > > > > > > > > > > >> to the system having one or more device drivers that "probe"/set up a
> > > > > > > > > > > >> device (DT node with compatible property) without creating a struct
> > > > > > > > > > > >> device for it. If we hit such cases, the device drivers need to be
> > > > > > > > > > > >> fixed so that they populate struct devices and probe them like normal
> > > > > > > > > > > >> device drivers so that the driver core is aware of the devices and
> > > > > > > > > > > >> their
> > > > > > > > > > > >> status. See [1] for an example of such a case.
> > > > > > > > > > > >>
> > > > > > > > > > > >> [1] -
> > > > > > > > > > > >> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > > > > > > > > > >> Signed-off-by: Saravana Kannan <[email protected]>
> > > > > > > > > > > >
> > > > > > > > > > > > Shimoda-san reported that next-20210111 and later fail to boot
> > > > > > > > > > > > on Renesas R-Car Gen3 platforms. No output is seen, unless earlycon
> > > > > > > > > > > > is enabled.
> > > > > > > > > > > >
> > > > > > > > > > > > I have bisected this to commit e590474768f1cc04 ("driver core: Set
> > > > > > > > > > > > fw_devlink=on by default").
> >
> > > > > > You'll need to convert drivers/soc/renesas/rcar-sysc.c into a platform
> > > > > > driver. You already have a platform device created for it. So just go
> > > > > > ahead and probe it with a platform driver. See what Marek did here
> > > > > > [1].
> > > > > >
> > > > > > You probably had to implement it as an "initcall based driver"
> > > > > > because you had to play initcall chicken to make sure the PD hardware
> > > > > > was initialized before the consumers. With fw_devlink=on you won't
> > > > > > have to worry about that. As an added benefit of implementing a proper
> > > > > > platform driver, you can actually implement runtime PM now, your
> > > > > > suspend/resume would be more robust, etc.
> > > > >
> > > > > On R-Car H1, the system controller driver needs to be active before
> > > > > secondary CPU setup, hence the early_initcall().
> > > > > platform_bus_init() is called after that, so this is gonna need a split
> > > > > initialization. Or a dummy platform driver to make devlinks think
> > > > > everything is fine ;-)
> > >
> > > I was wondering if you could still probe the "not needed by CPU" power
> > > domains (if there are any) as devices. Using driver-core brings you
> > > good things :)
> >
> > 1. That would mean splitting the driver in two parts, looping over the
> > tables twice, while everything can just be done in the first pass?
> >
> > 2. Which "good things" do you have in mind? Making the driver modular?
> > Ignoring the dependency for secondary CPU setup on R-Car H1, this
> > driver could indeed be modular on R-Car Gen2 and Gen3, as long as
> > the boot loader would pass a ramdisk with the module to the kernel.
> > The ramdisk could not be loaded in any other way, as all I/O
> > devices are part of a PM Domain, and thus depend on the SYSC driver.
> > Note that on some (non-R-Car) SoCs, the timers may be part of a PM
> > Domain, too.
>
> "Good things" like being able to implement runtime pm, suspend/resume
> robustness (due to device links). There were a few more benefits I had
> in mind when I wrote it, but I don't remember what they were.

While that is valid for I/O devices, the System Controller is a power
provider, and thus provides Runtime PM services itself. It does not use
Runtime PM itself, as it is always-on.

Note that, in theory, you can have a power provider that can be
powered-down, and thus would use (need) Runtime PM, but then you need to
have a second power provider that is always-on to control the first one,
and the problem would just shift to the second one.

> The double pass itself is not that big of a deal IMHO. It probably
> adds less than a millisecond.

Not all embedded systems run at multi-GHz speed...

> > > > > So basically all producer DT drivers not using a platform (or e.g. i2c)
> > > > > driver are now broken?
> > > > > Including all clock drivers using CLK_OF_DECLARE()?
> > > >
> > > > Oh, of_link_to_phandle() ignores device nodes where OF_POPULATED
> > > > is set, and of_clk_init() sets that flag. So rcar-sysc should do so, too.
> > > > Patch sent.
> > > > > $ git grep -L "\<[a-z0-9]*_driver\>" -- $(git grep -l
> > > > > "\.compatible\>") | wc -l
> > > > > 249
> > > > >
> > > > > (includes false positives)
> > > > >
> > > > > I doubt they'll all get fixed for v5.12, as we're already at rc4...
> > > >
> > > > Still more than 100 drivers to fix?
> > >
> > > Not fully sure what the grep is trying to catch, but fw_devlink
> > > supports devices on any bus (i2c, platform, pci, etc). So that's not a
> > > problem. It'll be a problem when a struct device is never created for
> > > a real device. Or if it's created, but never probed.
> >
> > The grep tries to catch drivers using DT matching (i.e. matching ".compatible")
> > and not using a driver model driver (i.e. not matching "*_driver").
>
> Ah TIL about -L and -l. Thanks.
>
> > > I'm also looking into a bunch of other options for fallback when
> > > fw_devlink=on doesn't work. Too much to explain here -- patches are
> > > easier :)
> >
> > I gave it a try on all Renesas platforms I have local access to:
>
> Thanks a lot! Really appreciate the testing and reporting.
>
> >
> > - R-Car Gen2/Gen3:
> > Setting OF_POPULATED in the rcar-sysc driver[1] made my standard
> > config boot again. Remaining issues:
> > - CONFIG_IPMMU_VMSA=n hangs: supplier fe990000.iommu not ready
> > - CONFIG_RCAR_DMAC=n hangs: supplier e7310000.dma-controller not ready
> > Note that Ethernet does not use the R-Car DMAC, so DHCP works.
> > Nevertheless, after that everything hangs, and the board does not
> > respond to pings anymore
> > Both IOMMU and DMAC dependencies are optional, hence should be dropped
> > at late boot (late_initcall?).
>
> Yeah, I'm looking into a good/clean way of handling optional
> suppliers. There are a bunch of corner cases I need to consider. But
> in the end, I need to have it behave as closely as possible to
> fw_devlink=permissive.

OK.

> > - SH-Mobile AG5 and R-Mobile APE6:
> > The rmobile-sysc driver is similar to the rcar-sysc driver, and does
> > not use a platform device.
> > Still, it works, because all dependencies on the System Controller
> > become unblocked when the rmobile-reset driver binds against the
> > "renesas,sysc-rmobile" device. Obviously it would fail if no
> > support for that driver is included in your kernel...
>
> Yeah, IMHO two real drivers (not stubs) for a single device tree node
> is wrong/weird at a high level. I'd think one should be a child of the
> other. But too late to fix that DT now.
>
> Does it make sense for the rmobile-sysc driver to create a new
> platform device and have the rmobile-reset bind to that instead? And
> then you can bind a stub driver to the "renesas,sysc-rmobile" device?
> I know this can be handled by whatever solution I come up with for the
> IOMMU case, but that doesn't seem right for this case. We don't have
> to decide on this now, but that's my current view.

I guess registering the (system) reset handler in the rmobile-sysc driver
is the simplest solution. We already have clock drivers
registering (device) reset support, as module clock and module reset
are typically combined in the same hardware block.

> > - R-Mobile A1:
> > Also using the rmobile-sysc driver.
> > However, this is a single core Cortex-A9, i.e. it does not have an
> > ARM architected timer (like R-Mobile APE6) or Cortex-A9 Global
> > Timer (like SH-Mobile AG5). The timer used (TMU) is located in a PM
> > Domain controlled by the rmobile-sysc driver, and driver
> > initialization is postponed beyond the point where something relies
> > on a working timer, causing a hang.
> >
> > Setting OF_POPULATED (like in my fix for the rcar-sysc driver) fixes
> > this, but prevents the rmobile-reset driver from binding against the
> > same device node, so the reset handling will have to be incorporated
> > into the rmobile-sysc driver (and will thus be registered very
> > early).

So the rmobile-sysc driver has to stay a DT driver.

> Or you can do the "create a child device" option I suggested above.

Registering a reset handler from the rmobile-sysc driver is fine.

> > - RZ/A1 and RZ/A2:
> > These are not affected, as the timer used (OSTM) is not a platform
> > driver, but uses TIMER_OF_DECLARE().
> > Note that the RZ/A2 clock driver uses split initialization:
> > 1. Early (timer) clocks are initialized from CLK_OF_DECLARE_DRIVER,
> > 2. Other clocks are initialized by platform_driver_probe() from a
> > subsys_initcall.
> > If the OSTM driver would be a platform_driver, it would block on the
> > clock dependency. Setting the OF_POPULATED flag in the clock driver
> > would not work: while that flag would unblock probing of the timer
> > driver, it would also prevent the second part of the clock driver
> > initialization.
>
> So this looks like it's all working fine, right? Yeah, I already took
> into account the *OF*_DECLARE macros when I wrote this and was aware
> of the split driver implementations. So hopefully this all works out
> fine.

Some of it is working by accident.

I expect there are systems where the timer driver has been converted
from TIMER_OF_DECLARE() to a platform driver (which people are
recommending, as it is needed for Runtime PM support etc.), and that
will break. It's hard to predict.

I tested on all Renesas boards I had, as I expected to discover
breakage. But what exactly broke, and why, was sometimes a bit of a
surprise to me ;-)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
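
To make the OF_POPULATED trade-off discussed in the message above concrete,
here is a minimal userspace sketch in C. It is a toy model, not kernel code:
struct toy_node and the toy_* functions are hypothetical names invented for
illustration. The only behaviour taken from the thread is that fw_devlink
ignores supplier nodes that have OF_POPULATED set, and that setting the flag
also keeps a driver-model driver from binding to that node later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a DT node; NOT the kernel's struct device_node. */
struct toy_node {
	const char *name;
	bool populated;    /* models OF_POPULATED being set by early init code */
	bool driver_bound; /* models a driver-model driver having bound */
};

/*
 * Models the decision made for one consumer -> supplier reference:
 * returns true if the consumer has to wait for this supplier.
 */
static bool toy_must_wait_for(const struct toy_node *supplier)
{
	assert(supplier != NULL);
	/* Node marked populated: no link is created, so nothing to wait for. */
	if (supplier->populated)
		return false;
	/* Otherwise the consumer waits until a driver binds to the supplier. */
	return !supplier->driver_bound;
}

/* Returns true if the consumer may probe, given all of its suppliers. */
static bool toy_can_probe(const struct toy_node *const *suppliers, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (toy_must_wait_for(suppliers[i]))
			return false;
	return true;
}

/*
 * The cost of the flag: once a node is marked populated, this model assumes
 * no driver-model driver can bind to it later (the rmobile-reset and RZ/A2
 * second-stage cases described above).
 */
static bool toy_driver_can_bind_later(const struct toy_node *node)
{
	assert(node != NULL);
	return !node->populated;
}
```

In this model a System Controller node that is initialized from an initcall
(populated = false, driver_bound = false) blocks every consumer forever.
Setting populated = true unblocks the consumers, at the price that
toy_driver_can_bind_later() becomes false for that node.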

2021-01-28 11:03:19

by Tudor Ambarus

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi, Saravana,

On 1/25/21 8:16 PM, Saravana Kannan wrote:
> On Mon, Jan 25, 2021 at 9:05 AM <[email protected]> wrote:
>>
>> Hi, Saravana,
>>
>> On 12/18/20 5:17 AM, Saravana Kannan wrote:
>>> Cyclic dependencies in some firmware was one of the last remaining
>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
>>> dependencies don't block probing, set fw_devlink=on by default.
>>>
>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
>>> only for systems with device tree firmware):
>>> * Significantly cuts down deferred probes.
>>> * Device probe is effectively attempted in graph order.
>>> * Makes it much easier to load drivers as modules without having to
>>> worry about functional dependencies between modules (depmod is still
>>> needed for symbol dependencies).
>>>
>>> If this patch prevents some devices from probing, it's very likely due
>>> to the system having one or more device drivers that "probe"/set up a
>>> device (DT node with compatible property) without creating a struct
>>> device for it. If we hit such cases, the device drivers need to be
>>> fixed so that they populate struct devices and probe them like normal
>>> device drivers so that the driver core is aware of the devices and their
>>> status. See [1] for an example of such a case.
>>>
>>> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
>>> Signed-off-by: Saravana Kannan <[email protected]>
>>
>> next-20210125 fails to boot on at91 sama5d2 platforms. No output is
>> seen, unless earlyprintk is enabled.
>>
>> I have bisected this to commit e590474768f1cc04 ("driver core: Set
>> fw_devlink=on by default").
>>
>> I've attached a log that I'm seeing on a sama5d2_xplained (sama5_defconfig
>> and arch/arm/boot/dts/at91-sama5d2_xplained.dts). I enabled the
>> following logs:
>> 1. The ones in device_links_check_suppliers()
>> 2. The ones in device_link_add()
>> 3. initcall_debug=1
>>
>> There seem to be some probe failures due to the pmc supplier not being ready:
>> calling at_xdmac_init+0x0/0x18 @ 1
>> platform f0010000.dma-controller: probe deferral - supplier f0014000.pmc not ready
>> platform f0004000.dma-controller: probe deferral - supplier f0014000.pmc not ready
>> initcall at_xdmac_init+0x0/0x18 returned -19 after 19531 usecs
>>
>> calling udc_driver_init+0x0/0x18 @ 1
>> platform 300000.gadget: probe deferral - supplier f0014000.pmc not ready
>> initcall udc_driver_init+0x0/0x18 returned -19 after 7524 usecs
>>
>> There are others too. I'm checking them.
>
> Thanks Tudor. I'll look into this within a few days. I'm also looking
> into coming up with a more generic solution.
>

I've sent a patch addressing this at:
https://lore.kernel.org/lkml/[email protected]/T/#u

Can you please take a look?
Cheers,
ta
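
For readers decoding the initcall_debug lines quoted above: -19 is -ENODEV
on Linux, which is different from the kernel-internal probe deferral code
EPROBE_DEFER (517). The small C helper below is only a sketch for decoding
such values in userspace. toy_describe_ret(), toy_is_probe_defer() and
TOY_EPROBE_DEFER are hypothetical names, and the thread itself does not say
why these initcalls return -ENODEV rather than a deferral code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* EPROBE_DEFER is kernel-internal and absent from userspace <errno.h>. */
#define TOY_EPROBE_DEFER 517

/* True if a kernel-style negative return code means "probe deferred". */
static bool toy_is_probe_defer(int ret)
{
	return ret == -TOY_EPROBE_DEFER;
}

/*
 * Formats a kernel-style return code as "<code> (<text>)".
 * Returns the snprintf() result, i.e. the length that was needed.
 */
static int toy_describe_ret(int ret, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (toy_is_probe_defer(ret))
		return snprintf(buf, len, "%d (probe deferred)", ret);
	return snprintf(buf, len, "%d (%s)", ret,
			strerror(ret < 0 ? -ret : ret));
}
```

Passing -19 to toy_describe_ret() yields a string that starts with "-19 ("
followed by the C library's text for ENODEV.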

2021-01-28 15:10:48

by Jon Hunter

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default


On 14/01/2021 16:56, Jon Hunter wrote:
>
> On 14/01/2021 16:47, Saravana Kannan wrote:
>
> ...
>
>>> Yes this is the warning shown here [0] and this is coming from
>>> the 'Generic PHY stmmac-0:00' device.
>>
>> Can you print the supplier and consumer device when this warning is
>> happening and let me know? That'd help too. I'm guessing the phy is
>> the consumer.
>
>
> Sorry I should have included that. I added a print to dump this on
> another build but failed to include here.
>
> WARNING KERN Generic PHY stmmac-0:00: supplier 2200000.gpio (status 1)
>
> The status is the link->status and looks like the supplier is the
> gpio controller. I have verified that the gpio controller is probed
> before this successfully.
>
>> So the warning itself isn't a problem -- it's not breaking anything or
>> leaking memory or anything like that. But the device link is jumping
>> states in an incorrect manner. With enough context of this code (why
>> the device_bind_driver() is being called directly instead of going
>> through the normal probe path), it should be easy to fix (I'll just
>> need to fix up the device link state).
>
> Correct, the board seems to boot fine, we just get this warning.


Have you had a chance to look at this further?

The following does appear to avoid the warning, but I am not sure if
this is the correct thing to do ...

index 9179825ff646..095aba84f7c2 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -456,6 +456,10 @@ int device_bind_driver(struct device *dev)
 {
 	int ret;

+	ret = device_links_check_suppliers(dev);
+	if (ret)
+		return ret;
+
 	ret = driver_sysfs_add(dev);
 	if (!ret)
 		driver_bound(dev);


Cheers
Jon


2021-01-28 17:08:44

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Thu, Jan 28, 2021 at 2:59 AM <[email protected]> wrote:
>
> Hi, Saravana,
>
> On 1/25/21 8:16 PM, Saravana Kannan wrote:
> > On Mon, Jan 25, 2021 at 9:05 AM <[email protected]> wrote:
> >>
> >> Hi, Saravana,
> >>
> >> On 12/18/20 5:17 AM, Saravana Kannan wrote:
> >>> Cyclic dependencies in some firmware was one of the last remaining
> >>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> >>> dependencies don't block probing, set fw_devlink=on by default.
> >>>
> >>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> >>> only for systems with device tree firmware):
> >>> * Significantly cuts down deferred probes.
> >>> * Device probe is effectively attempted in graph order.
> >>> * Makes it much easier to load drivers as modules without having to
> >>> worry about functional dependencies between modules (depmod is still
> >>> needed for symbol dependencies).
> >>>
> >>> If this patch prevents some devices from probing, it's very likely due
> >>> to the system having one or more device drivers that "probe"/set up a
> >>> device (DT node with compatible property) without creating a struct
> >>> device for it. If we hit such cases, the device drivers need to be
> >>> fixed so that they populate struct devices and probe them like normal
> >>> device drivers so that the driver core is aware of the devices and their
> >>> status. See [1] for an example of such a case.
> >>>
> >>> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> >>> Signed-off-by: Saravana Kannan <[email protected]>
> >>
> >> next-20210125 fails to boot on at91 sama5d2 platforms. No output is
> >> seen, unless earlyprintk is enabled.
> >>
> >> I have bisected this to commit e590474768f1cc04 ("driver core: Set
> >> fw_devlink=on by default").
> >>
> >> I've attached a log that I'm seeing on a sama5d2_xplained (sama5_defconfig
> >> and arch/arm/boot/dts/at91-sama5d2_xplained.dts). I enabled the
> >> following logs:
> >> 1. The ones in device_links_check_suppliers()
> >> 2. The ones in device_link_add()
> >> 3. initcall_debug=1
> >>
> >> There seem to be some probe failures due to the pmc supplier not being ready:
> >> calling at_xdmac_init+0x0/0x18 @ 1
> >> platform f0010000.dma-controller: probe deferral - supplier f0014000.pmc not ready
> >> platform f0004000.dma-controller: probe deferral - supplier f0014000.pmc not ready
> >> initcall at_xdmac_init+0x0/0x18 returned -19 after 19531 usecs
> >>
> >> calling udc_driver_init+0x0/0x18 @ 1
> >> platform 300000.gadget: probe deferral - supplier f0014000.pmc not ready
> >> initcall udc_driver_init+0x0/0x18 returned -19 after 7524 usecs
> >>
> >> There are others too. I'm checking them.
> >
> > Thanks Tudor. I'll look into this within a few days. I'm also looking
> > into coming up with a more generic solution.
> >
>
> I've sent a patch addressing this at:
> https://lore.kernel.org/lkml/[email protected]/T/#u
>
> Can you please take a look?

Thanks for taking a look at this. I responded in that thread.

-Saravana

2021-01-28 17:34:01

by Saravana Kannan

Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, Jan 28, 2021 at 7:03 AM Jon Hunter <[email protected]> wrote:
>
>
> On 14/01/2021 16:56, Jon Hunter wrote:
> >
> > On 14/01/2021 16:47, Saravana Kannan wrote:
> >
> > ...
> >
> >>> Yes this is the warning shown here [0] and this is coming from
> >>> the 'Generic PHY stmmac-0:00' device.
> >>
> >> Can you print the supplier and consumer device when this warning is
> >> happening and let me know? That'd help too. I'm guessing the phy is
> >> the consumer.
> >
> >
> > Sorry I should have included that. I added a print to dump this on
> > another build but failed to include here.
> >
> > WARNING KERN Generic PHY stmmac-0:00: supplier 2200000.gpio (status 1)
> >
> > The status is the link->status and looks like the supplier is the
> > gpio controller. I have verified that the gpio controller is probed
> > before this successfully.
> >
> >> So the warning itself isn't a problem -- it's not breaking anything or
> >> leaking memory or anything like that. But the device link is jumping
> >> states in an incorrect manner. With enough context of this code (why
> >> the device_bind_driver() is being called directly instead of going
> >> through the normal probe path), it should be easy to fix (I'll just
> >> need to fix up the device link state).
> >
> > Correct, the board seems to boot fine, we just get this warning.
>

Hi Jon,

>
> Have you had a chance to look at this further?

No, I feel like I'm just spending all my "upstream time" just
replying to email :)

>
> The following does appear to avoid the warning, but I am not sure if
> this is the correct thing to do ...
>
> index 9179825ff646..095aba84f7c2 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -456,6 +456,10 @@ int device_bind_driver(struct device *dev)
>  {
>  	int ret;
>
> +	ret = device_links_check_suppliers(dev);
> +	if (ret)
> +		return ret;
> +

Yeah I knew calling this function (where device_bind_driver() was
called) would take away the warning, but I first want to understand
why the caller wasn't going through the typical device/driver probe
path before I started adding more of the typical device/driver probe
path code in. I don't want to add in code they might have been
explicitly trying to avoid.

Also, once you do this, you'll need the reverse of this (deleting
links/unsetting state change) somewhere.

Also, device_bind_driver() is used in a bunch of places. Need to check
if it's right to call device_links_check_suppliers() in those
instances.

Feel free to look at those items above. I'll try to get to this once I
take care of the "my device is not working!" issues.

Thanks,
Saravana

>  	ret = driver_sysfs_add(dev);
>  	if (!ret)
>  		driver_bound(dev);
>
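
The state jump behind the warning, and the "reverse" step mentioned above,
can be sketched as a tiny userspace state machine in C. This is a toy model
under stated assumptions, not the driver core: the toy_* names are
hypothetical, and the numeric values are meant to mirror the kernel's
enum device_link_state as far as I know (status 1 in the reported warning
would then be the "supplier bound, consumer not bound" state).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's enum device_link_state values. */
enum toy_link_state {
	TOY_DL_DORMANT = 0,        /* neither side has a driver bound */
	TOY_DL_AVAILABLE = 1,      /* supplier bound, consumer not bound */
	TOY_DL_CONSUMER_PROBE = 2, /* consumer is in the middle of probing */
	TOY_DL_ACTIVE = 3,         /* supplier and consumer both bound */
};

struct toy_link {
	enum toy_link_state status;
};

/*
 * Models the supplier check done on the normal probe path:
 * returns 0 and marks the link as "consumer probing" if the supplier is
 * ready, or -1 (standing in for a probe deferral) otherwise.
 */
static int toy_check_suppliers(struct toy_link *link)
{
	assert(link != 0);
	if (link->status != TOY_DL_AVAILABLE)
		return -1;
	link->status = TOY_DL_CONSUMER_PROBE;
	return 0;
}

/*
 * Models the consumer side of "driver bound". Returns true if the link was
 * not in the expected CONSUMER_PROBE state, i.e. where a warning would fire.
 */
static bool toy_consumer_bound(struct toy_link *link)
{
	assert(link != 0);
	bool would_warn = link->status != TOY_DL_CONSUMER_PROBE;

	link->status = TOY_DL_ACTIVE;
	return would_warn;
}

/* The reverse step: undo the probe-time transition if binding fails later. */
static void toy_consumer_bind_failed(struct toy_link *link)
{
	assert(link != 0);
	if (link->status == TOY_DL_CONSUMER_PROBE)
		link->status = TOY_DL_AVAILABLE;
}
```

Calling toy_consumer_bound() on a link that is still TOY_DL_AVAILABLE models
a direct device_bind_driver() call and reports the warning condition.
Calling toy_check_suppliers() first avoids it, but then every failure path
after that call needs toy_consumer_bind_failed(), which is the extra
bookkeeping the reply above is cautious about adding.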

2021-02-10 08:34:41

by Guenter Roeck

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Thu, Dec 17, 2020 at 07:17:03PM -0800, Saravana Kannan wrote:
> Cyclic dependencies in some firmware was one of the last remaining
> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> dependencies don't block probing, set fw_devlink=on by default.
>
> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> only for systems with device tree firmware):
> * Significantly cuts down deferred probes.
> * Device probe is effectively attempted in graph order.
> * Makes it much easier to load drivers as modules without having to
> worry about functional dependencies between modules (depmod is still
> needed for symbol dependencies).
>
> If this patch prevents some devices from probing, it's very likely due
> to the system having one or more device drivers that "probe"/set up a
> device (DT node with compatible property) without creating a struct
> device for it. If we hit such cases, the device drivers need to be
> fixed so that they populate struct devices and probe them like normal
> device drivers so that the driver core is aware of the devices and their
> status. See [1] for an example of such a case.
>
> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> Signed-off-by: Saravana Kannan <[email protected]>

This patch breaks nios2 boot tests in qemu. The system gets stuck when
trying to reboot. Reverting this patch fixes the problem. Bisect log
is attached.

It may also break a variety of other boot tests, but with 115 of 430
boot tests failing in -next it is difficult to identify all culprits.

Guenter

---
Bisect log:

# bad: [a4bfd8d46ac357c12529e4eebb6c89502b03ecc9] Add linux-next specific files for 20210209
# good: [92bf22614b21a2706f4993b278017e437f7785b3] Linux 5.11-rc7
git bisect start 'HEAD' 'v5.11-rc7'
# good: [a8eb921ba7e8e77d994a1c6c69c8ef08456ecf53] Merge remote-tracking branch 'crypto/master'
git bisect good a8eb921ba7e8e77d994a1c6c69c8ef08456ecf53
# good: [21d507c41bdf83f6afc0e02976e43c10badfc6cd] Merge remote-tracking branch 'spi/for-next'
git bisect good 21d507c41bdf83f6afc0e02976e43c10badfc6cd
# bad: [30cd4c688a3bcf324f011d7716044b1a4681efc1] Merge remote-tracking branch 'soundwire/next'
git bisect bad 30cd4c688a3bcf324f011d7716044b1a4681efc1
# good: [c43d2173d3eb4047bb62a7a393a298a1032cce18] Merge remote-tracking branch 'drivers-x86/for-next'
git bisect good c43d2173d3eb4047bb62a7a393a298a1032cce18
# bad: [4dd66c506de68a592f2dd4ef64cc9b0d7c0f3117] Merge remote-tracking branch 'usb-chipidea-next/for-usb-next'
git bisect bad 4dd66c506de68a592f2dd4ef64cc9b0d7c0f3117
# good: [29b01295a829fba7399ee84afff4e64660e49f04] usb: typec: Add typec_partner_set_pd_revision
git bisect good 29b01295a829fba7399ee84afff4e64660e49f04
# bad: [dac8ab120e531bf7c358b85750338e1b3d3ca0b9] Merge remote-tracking branch 'usb/usb-next'
git bisect bad dac8ab120e531bf7c358b85750338e1b3d3ca0b9
# good: [678481467d2e1460a49e626d8e9ba0c7e9742f53] usb: dwc3: core: Check maximum_speed SSP genXxY
git bisect good 678481467d2e1460a49e626d8e9ba0c7e9742f53
# good: [5d3d0a61479847a8729ffdda33867f6b3443c15f] Merge remote-tracking branch 'ipmi/for-next'
git bisect good 5d3d0a61479847a8729ffdda33867f6b3443c15f
# bad: [e13f5b7a130f7b6d4d34be27a87393890b5ee2ba] of: property: Add fw_devlink support for "gpio" and "gpios" binding
git bisect bad e13f5b7a130f7b6d4d34be27a87393890b5ee2ba
# good: [b0e2fa4f611bb9ab22928605d5b1c7fd44e73955] driver core: Handle cycles in device links created by fw_devlink
git bisect good b0e2fa4f611bb9ab22928605d5b1c7fd44e73955
# bad: [0fab972eef49ef8d30eb91d6bd98861122d083d1] drivers: core: Detach device from power domain on shutdown
git bisect bad 0fab972eef49ef8d30eb91d6bd98861122d083d1
# bad: [e590474768f1cc04852190b61dec692411b22e2a] driver core: Set fw_devlink=on by default
git bisect bad e590474768f1cc04852190b61dec692411b22e2a
# good: [c13b827927112ba6170bea31c638a8573c127461] driver core: fw_devlink_relax_cycle() can be static
git bisect good c13b827927112ba6170bea31c638a8573c127461
# first bad commit: [e590474768f1cc04852190b61dec692411b22e2a] driver core: Set fw_devlink=on by default

2021-02-10 08:45:15

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Tue, Feb 9, 2021 at 9:54 PM Guenter Roeck <[email protected]> wrote:
>
> On Thu, Dec 17, 2020 at 07:17:03PM -0800, Saravana Kannan wrote:
> > Cyclic dependencies in some firmware was one of the last remaining
> > reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > dependencies don't block probing, set fw_devlink=on by default.
> >
> > Setting fw_devlink=on by default brings a bunch of benefits (currently,
> > only for systems with device tree firmware):
> > * Significantly cuts down deferred probes.
> > * Device probe is effectively attempted in graph order.
> > * Makes it much easier to load drivers as modules without having to
> > worry about functional dependencies between modules (depmod is still
> > needed for symbol dependencies).
> >
> > If this patch prevents some devices from probing, it's very likely due
> > to the system having one or more device drivers that "probe"/set up a
> > device (DT node with compatible property) without creating a struct
> > device for it. If we hit such cases, the device drivers need to be
> > fixed so that they populate struct devices and probe them like normal
> > device drivers so that the driver core is aware of the devices and their
> > status. See [1] for an example of such a case.
> >
> > [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > Signed-off-by: Saravana Kannan <[email protected]>
>
> This patch breaks nios2 boot tests in qemu. The system gets stuck when
> trying to reboot. Reverting this patch fixes the problem. Bisect log
> is attached.

Thanks for the report Guenter. Can you please try this series?
https://lore.kernel.org/lkml/[email protected]/

It's in driver-core-testing too if that's easier.

-Saravana

2021-02-10 15:14:43

by Guenter Roeck

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On 2/10/21 12:20 AM, Saravana Kannan wrote:
> On Tue, Feb 9, 2021 at 9:54 PM Guenter Roeck <[email protected]> wrote:
>>
>> On Thu, Dec 17, 2020 at 07:17:03PM -0800, Saravana Kannan wrote:
>>> Cyclic dependencies in some firmware was one of the last remaining
>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
>>> dependencies don't block probing, set fw_devlink=on by default.
>>>
>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
>>> only for systems with device tree firmware):
>>> * Significantly cuts down deferred probes.
>>> * Device probe is effectively attempted in graph order.
>>> * Makes it much easier to load drivers as modules without having to
>>> worry about functional dependencies between modules (depmod is still
>>> needed for symbol dependencies).
>>>
>>> If this patch prevents some devices from probing, it's very likely due
>>> to the system having one or more device drivers that "probe"/set up a
>>> device (DT node with compatible property) without creating a struct
>>> device for it. If we hit such cases, the device drivers need to be
>>> fixed so that they populate struct devices and probe them like normal
>>> device drivers so that the driver core is aware of the devices and their
>>> status. See [1] for an example of such a case.
>>>
>>> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
>>> Signed-off-by: Saravana Kannan <[email protected]>
>>
>> This patch breaks nios2 boot tests in qemu. The system gets stuck when
>> trying to reboot. Reverting this patch fixes the problem. Bisect log
>> is attached.
>
> Thanks for the report Guenter. Can you please try this series?
> https://lore.kernel.org/lkml/[email protected]/
>

Not this week. I have lots of reviews to complete before the end of the week,
with the 5.12 commit window coming up.

Given the number of problems observed, I personally think that it is way
too early for this patch. We'll have no end of problems if it is applied
to the upstream kernel in the next commit window. Of course, that is just
my personal opinion.

Thanks,
Guenter

2021-02-10 20:56:59

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Wed, Feb 10, 2021 at 7:10 AM Guenter Roeck <[email protected]> wrote:
>
> On 2/10/21 12:20 AM, Saravana Kannan wrote:
> > On Tue, Feb 9, 2021 at 9:54 PM Guenter Roeck <[email protected]> wrote:
> >>
> >> On Thu, Dec 17, 2020 at 07:17:03PM -0800, Saravana Kannan wrote:
> >>> Cyclic dependencies in some firmware was one of the last remaining
> >>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> >>> dependencies don't block probing, set fw_devlink=on by default.
> >>>
> >>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> >>> only for systems with device tree firmware):
> >>> * Significantly cuts down deferred probes.
> >>> * Device probe is effectively attempted in graph order.
> >>> * Makes it much easier to load drivers as modules without having to
> >>> worry about functional dependencies between modules (depmod is still
> >>> needed for symbol dependencies).
> >>>
> >>> If this patch prevents some devices from probing, it's very likely due
> >>> to the system having one or more device drivers that "probe"/set up a
> >>> device (DT node with compatible property) without creating a struct
> >>> device for it. If we hit such cases, the device drivers need to be
> >>> fixed so that they populate struct devices and probe them like normal
> >>> device drivers so that the driver core is aware of the devices and their
> >>> status. See [1] for an example of such a case.
> >>>
> >>> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> >>> Signed-off-by: Saravana Kannan <[email protected]>
> >>
> >> This patch breaks nios2 boot tests in qemu. The system gets stuck when
> >> trying to reboot. Reverting this patch fixes the problem. Bisect log
> >> is attached.
> >
> > Thanks for the report Guenter. Can you please try this series?
> > https://lore.kernel.org/lkml/[email protected]/
> >
>
> Not this week. I have lots of reviews to complete before the end of the week,
> with the 5.12 commit window coming up.

Ok. By next week, all the fixes should be in linux-next too. So it
should be easier if you choose to test.

> Given the number of problems observed, I personally think that it is way
> too early for this patch. We'll have no end of problems if it is applied
> to the upstream kernel in the next commit window. Of course, that is just
> my personal opinion.

You had said "with 115 of 430 boot tests failing in -next" earlier.
Just to be sure I understand it right: you are not saying this patch
caused them all, right? You are just saying that there are 115 general
boot failures, and that some of them might mask fw_devlink issues, right?

Thanks,
Saravana

2021-02-10 21:22:44

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On 2/10/21 12:52 PM, Saravana Kannan wrote:
> On Wed, Feb 10, 2021 at 7:10 AM Guenter Roeck <[email protected]> wrote:
>>
>> On 2/10/21 12:20 AM, Saravana Kannan wrote:
>>> On Tue, Feb 9, 2021 at 9:54 PM Guenter Roeck <[email protected]> wrote:
>>>>
>>>> On Thu, Dec 17, 2020 at 07:17:03PM -0800, Saravana Kannan wrote:
>>>>> Cyclic dependencies in some firmware was one of the last remaining
>>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
>>>>> dependencies don't block probing, set fw_devlink=on by default.
>>>>>
>>>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
>>>>> only for systems with device tree firmware):
>>>>> * Significantly cuts down deferred probes.
>>>>> * Device probe is effectively attempted in graph order.
>>>>> * Makes it much easier to load drivers as modules without having to
>>>>> worry about functional dependencies between modules (depmod is still
>>>>> needed for symbol dependencies).
>>>>>
>>>>> If this patch prevents some devices from probing, it's very likely due
>>>>> to the system having one or more device drivers that "probe"/set up a
>>>>> device (DT node with compatible property) without creating a struct
>>>>> device for it. If we hit such cases, the device drivers need to be
>>>>> fixed so that they populate struct devices and probe them like normal
>>>>> device drivers so that the driver core is aware of the devices and their
>>>>> status. See [1] for an example of such a case.
>>>>>
>>>>> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
>>>>> Signed-off-by: Saravana Kannan <[email protected]>
>>>>
>>>> This patch breaks nios2 boot tests in qemu. The system gets stuck when
>>>> trying to reboot. Reverting this patch fixes the problem. Bisect log
>>>> is attached.
>>>
>>> Thanks for the report Guenter. Can you please try this series?
>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>
>> Not this week. I have lots of reviews to complete before the end of the week,
>> with the 5.12 commit window coming up.
>
> Ok. By next week, all the fixes should be in linux-next too. So it
> should be easier if you choose to test.
>
>> Given the number of problems observed, I personally think that it is way
>> too early for this patch. We'll have no end of problems if it is applied
>> to the upstream kernel in the next commit window. Of course, that is just
>> my personal opinion.
>
> You had said "with 115 of 430 boot tests failing in -next" earlier.
> Just to be sure I understand it right: you are not saying this patch
> caused them all, right? You are just saying that there are 115 general
> boot failures, and that some of them might mask fw_devlink issues, right?
>

Correct.

Guenter

2021-02-11 00:07:16

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, Jan 28, 2021 at 7:03 AM Jon Hunter <[email protected]> wrote:
>
>
> On 14/01/2021 16:56, Jon Hunter wrote:
> >
> > On 14/01/2021 16:47, Saravana Kannan wrote:
> >
> > ...
> >
> >>> Yes this is the warning shown here [0] and this is coming from
> >>> the 'Generic PHY stmmac-0:00' device.
> >>
> >> Can you print the supplier and consumer device when this warning is
> >> happening and let me know? That'd help too. I'm guessing the phy is
> >> the consumer.
> >
> >
> > Sorry I should have included that. I added a print to dump this on
> > another build but failed to include here.
> >
> > WARNING KERN Generic PHY stmmac-0:00: supplier 2200000.gpio (status 1)
> >
> > The status is the link->status and looks like the supplier is the
> > gpio controller. I have verified that the gpio controller is probed
> > before this successfully.
> >
> >> So the warning itself isn't a problem -- it's not breaking anything or
> >> leaking memory or anything like that. But the device link is jumping
> >> states in an incorrect manner. With enough context of this code (why
> >> the device_bind_driver() is being called directly instead of going
> >> through the normal probe path), it should be easy to fix (I'll just
> >> need to fix up the device link state).
> >
> > Correct, the board seems to boot fine, we just get this warning.
>
>
> Have you had chance to look at this further?

Hi Jon,

I finally got around to looking into this. Here's the email[1] that
describes why it's done this way.

[1] - https://lore.kernel.org/lkml/[email protected]/

>
> The following does appear to avoid the warning, but I am not sure if
> this is the correct thing to do ...
>
> index 9179825ff646..095aba84f7c2 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -456,6 +456,10 @@ int device_bind_driver(struct device *dev)
> {
> int ret;
>
> + ret = device_links_check_suppliers(dev);
> + if (ret)
> + return ret;
> +
> ret = driver_sysfs_add(dev);
> if (!ret)
> driver_bound(dev);

So digging deeper into the usage of device_bind_driver and looking at
[1], it doesn't look like returning an error here is a good option.
When device_bind_driver() is called, the driver's probe function isn't
even called. So, there's no way for the driver to even defer probing
based on any of the suppliers. So, we have a couple of options:

1. Delete all the links to suppliers that haven't bound. We'll still
leave the links to active suppliers alone in case it helps with
suspend/resume correctness.
2. Fix the warning to not warn on suppliers that haven't probed if the
device's driver has no probe function. But this will also need fixing
up the cleanup part when device_release_driver() is called. Also, I'm
not sure if device_bind_driver() is ever called when the driver
actually has a probe() function.
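
To make the state jump behind the warning concrete, here is a minimal
user-space sketch. It assumes (this is an assumption, not something
confirmed in this thread) that the normal probe path moves a link into the
consumer-probe state before the "driver bound" step, while a direct bind
goes straight to that step. All names (model_link, model_direct_bind, etc.)
are hypothetical and this is not the driver core code. The numeric values
are chosen so that 1 lines up with the "status 1" reported above, which is
also an assumption about the kernel's enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical link states; values assumed to mirror the kernel's enum. */
enum link_state {
	LINK_DORMANT = 0,        /* supplier not bound */
	LINK_AVAILABLE = 1,      /* supplier bound, consumer not bound */
	LINK_CONSUMER_PROBE = 2, /* consumer probe in progress */
	LINK_ACTIVE = 3,         /* supplier and consumer both bound */
};

struct model_link {
	enum link_state status;
	bool warned;             /* stands in for a WARN_ON() firing */
};

/* Models the supplier check done before probe on the normal path. */
static void model_check_suppliers(struct model_link *link)
{
	assert(link);
	if (link->status == LINK_AVAILABLE)
		link->status = LINK_CONSUMER_PROBE;
}

/* Models the "driver bound" step that both paths end up in. */
static void model_driver_bound(struct model_link *link)
{
	assert(link);
	if (link->status != LINK_CONSUMER_PROBE)
		link->warned = true;
	link->status = LINK_ACTIVE;
}

/* Normal probe path: supplier check first, then the bound step. */
static void model_normal_probe(struct model_link *link)
{
	model_check_suppliers(link);
	model_driver_bound(link);
}

/* Direct bind: no supplier check, straight to the bound step. */
static void model_direct_bind(struct model_link *link)
{
	model_driver_bound(link);
}
```

In this model both paths leave the link active, but only the direct bind
skips the intermediate state, which is what trips the warning flag.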

Rafael,

Option 1 above is pretty straightforward.
Option 2 would look something like what's at the end of this email +
caveat about whether the probe check is sufficient.

Do you have a preference between Option 1 vs 2? Or do you have some
other option in mind?

Thanks,
Saravana

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 5481b6940a02..8102b3c48bbc 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1247,7 +1247,8 @@ void device_links_driver_bound(struct device *dev)
 			 */
 			device_link_drop_managed(link);
 		} else {
-			WARN_ON(link->status != DL_STATE_CONSUMER_PROBE);
+			WARN_ON(link->status != DL_STATE_CONSUMER_PROBE &&
+				dev->driver->probe);
 			WRITE_ONCE(link->status, DL_STATE_ACTIVE);
 		}

@@ -1302,7 +1303,8 @@ static void __device_links_no_driver(struct device *dev)
 		if (link->supplier->links.status == DL_DEV_DRIVER_BOUND) {
 			WRITE_ONCE(link->status, DL_STATE_AVAILABLE);
 		} else {
-			WARN_ON(!(link->flags & DL_FLAG_SYNC_STATE_ONLY));
+			WARN_ON(!(link->flags & DL_FLAG_SYNC_STATE_ONLY) &&
+				dev->driver->probe);
 			WRITE_ONCE(link->status, DL_STATE_DORMANT);
 		}
 	}
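
The effect of the first hunk above on the warning condition can be
illustrated with a small stand-alone predicate. This is only a sketch of
the boolean logic in the diff: the types and helpers (model_driver,
warn_before, warn_after, example_probe) are hypothetical, the state values
are assumed, and none of this is the driver core code itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of link states; numeric values are assumed. */
enum link_state {
	LINK_AVAILABLE = 1,
	LINK_CONSUMER_PROBE = 2,
};

struct model_driver {
	int (*probe)(void);   /* NULL when the driver has no probe function */
};

/* Placeholder probe function, used only to build a driver that has one. */
static int example_probe(void)
{
	return 0;
}

/* Condition before the diff: warn if the link is not in CONSUMER_PROBE. */
static bool warn_before(enum link_state status)
{
	return status != LINK_CONSUMER_PROBE;
}

/* Condition after the diff: additionally require a probe function. */
static bool warn_after(enum link_state status, const struct model_driver *drv)
{
	return status != LINK_CONSUMER_PROBE && drv->probe != NULL;
}
```

With a driver whose probe pointer is NULL, warn_after() is false for every
state, so the warning is suppressed for such drivers. Whether checking the
probe pointer is a sufficient test is the caveat raised in the email.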

2021-02-11 15:32:24

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, Feb 11, 2021 at 1:02 AM Saravana Kannan <[email protected]> wrote:
>
> On Thu, Jan 28, 2021 at 7:03 AM Jon Hunter <[email protected]> wrote:
> >
> >
> > On 14/01/2021 16:56, Jon Hunter wrote:
> > >
> > > On 14/01/2021 16:47, Saravana Kannan wrote:
> > >
> > > ...
> > >
> > >>> Yes this is the warning shown here [0] and this is coming from
> > >>> the 'Generic PHY stmmac-0:00' device.
> > >>
> > >> Can you print the supplier and consumer device when this warning is
> > >> happening and let me know? That'd help too. I'm guessing the phy is
> > >> the consumer.
> > >
> > >
> > > Sorry I should have included that. I added a print to dump this on
> > > another build but failed to include here.
> > >
> > > WARNING KERN Generic PHY stmmac-0:00: supplier 2200000.gpio (status 1)
> > >
> > > The status is the link->status and looks like the supplier is the
> > > gpio controller. I have verified that the gpio controller is probed
> > > before this successfully.
> > >
> > >> So the warning itself isn't a problem -- it's not breaking anything or
> > >> leaking memory or anything like that. But the device link is jumping
> > >> states in an incorrect manner. With enough context of this code (why
> > >> the device_bind_driver() is being called directly instead of going
> > >> through the normal probe path), it should be easy to fix (I'll just
> > >> need to fix up the device link state).
> > >
> > > Correct, the board seems to boot fine, we just get this warning.
> >
> >
> > Have you had chance to look at this further?
>
> Hi Jon,
>
> I finally got around to looking into this. Here's the email[1] that
> describes why it's done this way.
>
> [1] - https://lore.kernel.org/lkml/[email protected]/
>
> >
> > The following does appear to avoid the warning, but I am not sure if
> > this is the correct thing to do ...
> >
> > index 9179825ff646..095aba84f7c2 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -456,6 +456,10 @@ int device_bind_driver(struct device *dev)
> > {
> > int ret;
> >
> > + ret = device_links_check_suppliers(dev);
> > + if (ret)
> > + return ret;
> > +
> > ret = driver_sysfs_add(dev);
> > if (!ret)
> > driver_bound(dev);
>
> So digging deeper into the usage of device_bind_driver and looking at
> [1], it doesn't look like returning an error here is a good option.
> When device_bind_driver() is called, the driver's probe function isn't
> even called. So, there's no way for the driver to even defer probing
> based on any of the suppliers. So, we have a couple of options:
>
> 1. Delete all the links to suppliers that haven't bound.

Or maybe convert them to stateless links? Would that be doable at all?

> We'll still leave the links to active suppliers alone in case it helps with
> suspend/resume correctness.
> 2. Fix the warning to not warn on suppliers that haven't probed if the
> device's driver has no probe function. But this will also need fixing
> up the cleanup part when device_release_driver() is called. Also, I'm
> not sure if device_bind_driver() is ever called when the driver
> actually has a probe() function.
>
> Rafael,
>
> Option 1 above is pretty straightforward.

I would prefer this ->

> Option 2 would look something like what's at the end of this email +
> caveat about whether the probe check is sufficient.

-> because "fix the warning" really means that we haven't got the
device link state machine right and getting it right may imply a major
redesign.

Overall, I'd prefer to take a step back and allow things to stabilize
for a while to let people catch up with this.

> Do you have a preference between Option 1 vs 2? Or do you have some
> other option in mind?
>
> Thanks,
> Saravana
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 5481b6940a02..8102b3c48bbc 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1247,7 +1247,8 @@ void device_links_driver_bound(struct device *dev)
> */
> device_link_drop_managed(link);
> } else {
> - WARN_ON(link->status != DL_STATE_CONSUMER_PROBE);
> + WARN_ON(link->status != DL_STATE_CONSUMER_PROBE &&
> + dev->driver->probe);
> WRITE_ONCE(link->status, DL_STATE_ACTIVE);
> }
>
> @@ -1302,7 +1303,8 @@ static void __device_links_no_driver(struct device *dev)
> if (link->supplier->links.status == DL_DEV_DRIVER_BOUND) {
> WRITE_ONCE(link->status, DL_STATE_AVAILABLE);
> } else {
> - WARN_ON(!(link->flags & DL_FLAG_SYNC_STATE_ONLY));
> + WARN_ON(!(link->flags & DL_FLAG_SYNC_STATE_ONLY) &&
> + dev->driver->probe);
> WRITE_ONCE(link->status, DL_STATE_DORMANT);
> }
> }

2021-02-11 17:53:15

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, Feb 11, 2021 at 7:03 AM Rafael J. Wysocki <[email protected]> wrote:
>
> On Thu, Feb 11, 2021 at 1:02 AM Saravana Kannan <[email protected]> wrote:
> >
> > On Thu, Jan 28, 2021 at 7:03 AM Jon Hunter <[email protected]> wrote:
> > >
> > >
> > > On 14/01/2021 16:56, Jon Hunter wrote:
> > > >
> > > > On 14/01/2021 16:47, Saravana Kannan wrote:
> > > >
> > > > ...
> > > >
> > > >>> Yes this is the warning shown here [0] and this is coming from
> > > >>> the 'Generic PHY stmmac-0:00' device.
> > > >>
> > > >> Can you print the supplier and consumer device when this warning is
> > > >> happening and let me know? That'd help too. I'm guessing the phy is
> > > >> the consumer.
> > > >
> > > >
> > > > Sorry I should have included that. I added a print to dump this on
> > > > another build but failed to include here.
> > > >
> > > > WARNING KERN Generic PHY stmmac-0:00: supplier 2200000.gpio (status 1)
> > > >
> > > > The status is the link->status and looks like the supplier is the
> > > > gpio controller. I have verified that the gpio controller is probed
> > > > before this successfully.
> > > >
> > > >> So the warning itself isn't a problem -- it's not breaking anything or
> > > >> leaking memory or anything like that. But the device link is jumping
> > > >> states in an incorrect manner. With enough context of this code (why
> > > >> the device_bind_driver() is being called directly instead of going
> > > >> through the normal probe path), it should be easy to fix (I'll just
> > > >> need to fix up the device link state).
> > > >
> > > > Correct, the board seems to boot fine, we just get this warning.
> > >
> > >
> > > Have you had chance to look at this further?
> >
> > Hi Jon,
> >
> > I finally got around to looking into this. Here's the email[1] that
> > describes why it's done this way.
> >
> > [1] - https://lore.kernel.org/lkml/[email protected]/
> >
> > >
> > > The following does appear to avoid the warning, but I am not sure if
> > > this is the correct thing to do ...
> > >
> > > index 9179825ff646..095aba84f7c2 100644
> > > --- a/drivers/base/dd.c
> > > +++ b/drivers/base/dd.c
> > > @@ -456,6 +456,10 @@ int device_bind_driver(struct device *dev)
> > > {
> > > int ret;
> > >
> > > + ret = device_links_check_suppliers(dev);
> > > + if (ret)
> > > + return ret;
> > > +
> > > ret = driver_sysfs_add(dev);
> > > if (!ret)
> > > driver_bound(dev);
> >
> > So digging deeper into the usage of device_bind_driver and looking at
> > [1], it doesn't look like returning an error here is a good option.
> > When device_bind_driver() is called, the driver's probe function isn't
> > even called. So, there's no way for the driver to even defer probing
> > based on any of the suppliers. So, we have a couple of options:
> >
> > 1. Delete all the links to suppliers that haven't bound.
>
> Or maybe convert them to stateless links? Would that be doable at all?

Yeah, I think it should be doable.

>
> > We'll still leave the links to active suppliers alone in case it helps with
> > suspend/resume correctness.
> > 2. Fix the warning to not warn on suppliers that haven't probed if the
> > device's driver has no probe function. But this will also need fixing
> > up the cleanup part when device_release_driver() is called. Also, I'm
> > not sure if device_bind_driver() is ever called when the driver
> > actually has a probe() function.
> >
> > Rafael,
> >
> > Option 1 above is pretty straightforward.
>
> I would prefer this ->

Ok

>
> > Option 2 would look something like what's at the end of this email +
> > caveat about whether the probe check is sufficient.
>
> -> because "fix the warning" really means that we haven't got the
> device link state machine right and getting it right may imply a major
> redesign.
>
> Overall, I'd prefer to take a step back and allow things to stabilize
> for a while to let people catch up with this.

Are you referring to if/when we implement Option 2? Or do you want to
step back for a while even before implementing Option 1?


-Saravana

>
> > Do you have a preference between Option 1 vs 2? Or do you have some
> > other option in mind?
> >
> > Thanks,
> > Saravana
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 5481b6940a02..8102b3c48bbc 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -1247,7 +1247,8 @@ void device_links_driver_bound(struct device *dev)
> > */
> > device_link_drop_managed(link);
> > } else {
> > - WARN_ON(link->status != DL_STATE_CONSUMER_PROBE);
> > + WARN_ON(link->status != DL_STATE_CONSUMER_PROBE &&
> > + dev->driver->probe);
> > WRITE_ONCE(link->status, DL_STATE_ACTIVE);
> > }
> >
> > @@ -1302,7 +1303,8 @@ static void __device_links_no_driver(struct device *dev)
> > if (link->supplier->links.status == DL_DEV_DRIVER_BOUND) {
> > WRITE_ONCE(link->status, DL_STATE_AVAILABLE);
> > } else {
> > - WARN_ON(!(link->flags & DL_FLAG_SYNC_STATE_ONLY));
> > + WARN_ON(!(link->flags & DL_FLAG_SYNC_STATE_ONLY) &&
> > + dev->driver->probe);
> > WRITE_ONCE(link->status, DL_STATE_DORMANT);
> > }
> > }

2021-02-11 18:12:48

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, Feb 11, 2021 at 6:15 PM Saravana Kannan <[email protected]> wrote:
>
> On Thu, Feb 11, 2021 at 7:03 AM Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Thu, Feb 11, 2021 at 1:02 AM Saravana Kannan <[email protected]> wrote:
> > >
> > > On Thu, Jan 28, 2021 at 7:03 AM Jon Hunter <[email protected]> wrote:
> > > >
> > > >
> > > > On 14/01/2021 16:56, Jon Hunter wrote:
> > > > >
> > > > > On 14/01/2021 16:47, Saravana Kannan wrote:
> > > > >
> > > > > ...
> > > > >
> > > > >>> Yes this is the warning shown here [0] and this is coming from
> > > > >>> the 'Generic PHY stmmac-0:00' device.
> > > > >>
> > > > >> Can you print the supplier and consumer device when this warning is
> > > > >> happening and let me know? That'd help too. I'm guessing the phy is
> > > > >> the consumer.
> > > > >
> > > > >
> > > > > Sorry I should have included that. I added a print to dump this on
> > > > > another build but failed to include here.
> > > > >
> > > > > WARNING KERN Generic PHY stmmac-0:00: supplier 2200000.gpio (status 1)
> > > > >
> > > > > The status is the link->status and looks like the supplier is the
> > > > > gpio controller. I have verified that the gpio controller is probed
> > > > > before this successfully.
> > > > >
> > > > >> So the warning itself isn't a problem -- it's not breaking anything or
> > > > >> leaking memory or anything like that. But the device link is jumping
> > > > >> states in an incorrect manner. With enough context of this code (why
> > > > >> the device_bind_driver() is being called directly instead of going
> > > > >> through the normal probe path), it should be easy to fix (I'll just
> > > > >> need to fix up the device link state).
> > > > >
> > > > > Correct, the board seems to boot fine, we just get this warning.
> > > >
> > > >
> > > > Have you had chance to look at this further?
> > >
> > > Hi Jon,
> > >
> > > I finally got around to looking into this. Here's the email[1] that
> > > describes why it's done this way.
> > >
> > > [1] - https://lore.kernel.org/lkml/[email protected]/
> > >
> > > >
> > > > The following does appear to avoid the warning, but I am not sure if
> > > > this is the correct thing to do ...
> > > >
> > > > index 9179825ff646..095aba84f7c2 100644
> > > > --- a/drivers/base/dd.c
> > > > +++ b/drivers/base/dd.c
> > > > @@ -456,6 +456,10 @@ int device_bind_driver(struct device *dev)
> > > > {
> > > > int ret;
> > > >
> > > > + ret = device_links_check_suppliers(dev);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > ret = driver_sysfs_add(dev);
> > > > if (!ret)
> > > > driver_bound(dev);
> > >
> > > So digging deeper into the usage of device_bind_driver and looking at
> > > [1], it doesn't look like returning an error here is a good option.
> > > When device_bind_driver() is called, the driver's probe function isn't
> > > even called. So, there's no way for the driver to even defer probing
> > > based on any of the suppliers. So, we have a couple of options:
> > >
> > > 1. Delete all the links to suppliers that haven't bound.
> >
> > Or maybe convert them to stateless links? Would that be doable at all?
>
> Yeah, I think it should be doable.
>
> >
> > > We'll still leave the links to active suppliers alone in case it helps with
> > > suspend/resume correctness.
> > > 2. Fix the warning to not warn on suppliers that haven't probed if the
> > > device's driver has no probe function. But this will also need fixing
> > > up the cleanup part when device_release_driver() is called. Also, I'm
> > > not sure if device_bind_driver() is ever called when the driver
> > > actually has a probe() function.
> > >
> > > Rafael,
> > >
> > > Option 1 above is pretty straightforward.
> >
> > I would prefer this ->
>
> Ok
>
> >
> > > Option 2 would look something like what's at the end of this email +
> > > caveat about whether the probe check is sufficient.
> >
> > -> because "fix the warning" really means that we haven't got the
> > device link state machine right and getting it right may imply a major
> > redesign.
> >
> > Overall, I'd prefer to take a step back and allow things to stabilize
> > for a while to let people catch up with this.
>
> Are you referring to if/when we implement Option 2? Or do you want to
> step back for a while even before implementing Option 1?

I would do option 1 and then see what happens, and maybe go back
from there if need be until we get a reasonably stable situation
(that is, all of the systems that used to work before still work, at
least).

2021-02-12 03:09:00

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v1 0/5] Enable fw_devlink=on by default

On Thu, Feb 11, 2021 at 9:48 AM Rafael J. Wysocki <[email protected]> wrote:
>
> On Thu, Feb 11, 2021 at 6:15 PM Saravana Kannan <[email protected]> wrote:
> >
> > On Thu, Feb 11, 2021 at 7:03 AM Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Thu, Feb 11, 2021 at 1:02 AM Saravana Kannan <[email protected]> wrote:
> > > >
> > > > On Thu, Jan 28, 2021 at 7:03 AM Jon Hunter <[email protected]> wrote:
> > > > >
> > > > >
> > > > > On 14/01/2021 16:56, Jon Hunter wrote:
> > > > > >
> > > > > > On 14/01/2021 16:47, Saravana Kannan wrote:
> > > > > >
> > > > > > ...
> > > > > >
> > > > > >>> Yes this is the warning shown here [0] and this is coming from
> > > > > >>> the 'Generic PHY stmmac-0:00' device.
> > > > > >>
> > > > > >> Can you print the supplier and consumer device when this warning is
> > > > > >> happening and let me know? That'd help too. I'm guessing the phy is
> > > > > >> the consumer.
> > > > > >
> > > > > >
> > > > > > Sorry I should have included that. I added a print to dump this on
> > > > > > another build but failed to include here.
> > > > > >
> > > > > > WARNING KERN Generic PHY stmmac-0:00: supplier 2200000.gpio (status 1)
> > > > > >
> > > > > > The status is the link->status and looks like the supplier is the
> > > > > > gpio controller. I have verified that the gpio controller is probed
> > > > > > before this successfully.
> > > > > >
> > > > > >> So the warning itself isn't a problem -- it's not breaking anything or
> > > > > >> leaking memory or anything like that. But the device link is jumping
> > > > > >> states in an incorrect manner. With enough context of this code (why
> > > > > >> the device_bind_driver() is being called directly instead of going
> > > > > >> through the normal probe path), it should be easy to fix (I'll just
> > > > > >> need to fix up the device link state).
> > > > > >
> > > > > > Correct, the board seems to boot fine, we just get this warning.
> > > > >
> > > > >
> > > > > Have you had chance to look at this further?
> > > >
> > > > Hi Jon,
> > > >
> > > > I finally got around to looking into this. Here's the email[1] that
> > > > describes why it's done this way.
> > > >
> > > > [1] - https://lore.kernel.org/lkml/[email protected]/
> > > >
> > > > >
> > > > > The following does appear to avoid the warning, but I am not sure if
> > > > > this is the correct thing to do ...
> > > > >
> > > > > index 9179825ff646..095aba84f7c2 100644
> > > > > --- a/drivers/base/dd.c
> > > > > +++ b/drivers/base/dd.c
> > > > > @@ -456,6 +456,10 @@ int device_bind_driver(struct device *dev)
> > > > > {
> > > > > int ret;
> > > > >
> > > > > + ret = device_links_check_suppliers(dev);
> > > > > + if (ret)
> > > > > + return ret;
> > > > > +
> > > > > ret = driver_sysfs_add(dev);
> > > > > if (!ret)
> > > > > driver_bound(dev);
> > > >
> > > > So digging deeper into the usage of device_bind_driver and looking at
> > > > [1], it doesn't look like returning an error here is a good option.
> > > > When device_bind_driver() is called, the driver's probe function isn't
> > > > even called. So, there's no way for the driver to even defer probing
> > > > based on any of the suppliers. So, we have a couple of options:
> > > >
> > > > 1. Delete all the links to suppliers that haven't bound.
> > >
> > > Or maybe convert them to stateless links? Would that be doable at all?
> >
> > Yeah, I think it should be doable.
> >
> > >
> > > > We'll still leave the links to active suppliers alone in case it helps with
> > > > suspend/resume correctness.
> > > > 2. Fix the warning to not warn on suppliers that haven't probed if the
> > > > device's driver has no probe function. But this will also need fixing
> > > > up the cleanup part when device_release_driver() is called. Also, I'm
> > > > not sure if device_bind_driver() is ever called when the driver
> > > > actually has a probe() function.
> > > >
> > > > Rafael,
> > > >
> > > > Option 1 above is pretty straightforward.
> > >
> > > I would prefer this ->
> >
> > Ok
> >
> > >
> > > > Option 2 would look something like what's at the end of this email +
> > > > caveat about whether the probe check is sufficient.
> > >
> > > -> because "fix the warning" really means that we haven't got the
> > > device link state machine right and getting it right may imply a major
> > > redesign.
> > >
> > > Overall, I'd prefer to take a step back and allow things to stabilize
> > > for a while to let people catch up with this.
> >
> > Are you referring to if/when we implement Option 2? Or do you want to
> > step back for a while even before implementing Option 1?
>
> I would do option 1 and then see what happens, and maybe go back
> from there if need be until we get a reasonably stable situation
> (that is, all of the systems that used to work before still work, at
> least).

Ok, I'll implement Option 1 soon. Also, thinking more about it, I
don't like converting them into STATELESS links. It's easy to do, but it
doesn't feel right for the driver core to "create" a STATELESS link
and then "forget" about it. So, when a device is force bound, I'll
just delete the links where the suppliers haven't probed yet.

-Saravana
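
As a rough illustration of the approach settled on above (drop the links
whose suppliers haven't probed, keep the rest), here is a user-space sketch
over a plain array. The struct, the function name and the supplier names
other than the gpio one quoted earlier are hypothetical; the real driver
core keeps links on lists and has its own locking and cleanup, none of
which is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a consumer's link to one supplier. */
struct model_link {
	const char *supplier;   /* supplier name, for illustration only */
	bool supplier_bound;    /* has the supplier's driver bound yet? */
};

/*
 * Compact the array in place, keeping only links whose supplier is
 * bound. Returns the number of links kept; order is preserved.
 */
static size_t drop_unbound_supplier_links(struct model_link *links, size_t n)
{
	size_t kept = 0;

	assert(links || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (links[i].supplier_bound)
			links[kept++] = links[i];
	}
	return kept;
}
```

For example, given links to "2200000.gpio" (bound), a made-up "example-clk"
(not bound) and a made-up "example-regulator" (bound), the function keeps
the first and third entries and returns 2.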

2021-02-17 03:12:00

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Wed, Feb 10, 2021 at 1:21 PM Guenter Roeck <[email protected]> wrote:
>
> On 2/10/21 12:52 PM, Saravana Kannan wrote:
> > On Wed, Feb 10, 2021 at 7:10 AM Guenter Roeck <[email protected]> wrote:
> >>
> >> On 2/10/21 12:20 AM, Saravana Kannan wrote:
> >>> On Tue, Feb 9, 2021 at 9:54 PM Guenter Roeck <[email protected]> wrote:
> >>>>
> >>>> On Thu, Dec 17, 2020 at 07:17:03PM -0800, Saravana Kannan wrote:
> >>>>> Cyclic dependencies in some firmware was one of the last remaining
> >>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> >>>>> dependencies don't block probing, set fw_devlink=on by default.
> >>>>>
> >>>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> >>>>> only for systems with device tree firmware):
> >>>>> * Significantly cuts down deferred probes.
> >>>>> * Device probe is effectively attempted in graph order.
> >>>>> * Makes it much easier to load drivers as modules without having to
> >>>>> worry about functional dependencies between modules (depmod is still
> >>>>> needed for symbol dependencies).
> >>>>>
> >>>>> If this patch prevents some devices from probing, it's very likely due
> >>>>> to the system having one or more device drivers that "probe"/set up a
> >>>>> device (DT node with compatible property) without creating a struct
> >>>>> device for it. If we hit such cases, the device drivers need to be
> >>>>> fixed so that they populate struct devices and probe them like normal
> >>>>> device drivers so that the driver core is aware of the devices and their
> >>>>> status. See [1] for an example of such a case.
> >>>>>
> >>>>> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> >>>>> Signed-off-by: Saravana Kannan <[email protected]>
> >>>>
> >>>> This patch breaks nios2 boot tests in qemu. The system gets stuck when
> >>>> trying to reboot. Reverting this patch fixes the problem. Bisect log
> >>>> is attached.
> >>>
> >>> Thanks for the report Guenter. Can you please try this series?
> >>> https://lore.kernel.org/lkml/[email protected]/
> >>>
> >>
> >> Not this week. I have lots of reviews to complete before the end of the week,
> >> with the 5.12 commit window coming up.
> >
> > Ok. By next week, all the fixes should be in linux-next too. So it
> > should be easier if you choose to test.
> >
> >> Given the number of problems observed, I personally think that it is way
> >> too early for this patch. We'll have no end of problems if it is applied
> >> to the upstream kernel in the next commit window. Of course, that is just
> >> my personal opinion.
> >
> > You had said "with 115 of 430 boot tests failing in -next" earlier.
> > > Just to be sure I understand it right: you are not saying this patch
> > > caused them all, right? You are just saying that there are 115 general
> > > boot failures, and that some of them might mask fw_devlink issues, right?
> >
>
> Correct.

Is it right to assume [1] fixed all known boot issues due to fw_devlink=on?
[1] - https://lore.kernel.org/lkml/[email protected]/

-Saravana

2021-02-17 03:32:42

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Tue, Feb 16, 2021 at 06:39:55PM -0800, Saravana Kannan wrote:
> On Wed, Feb 10, 2021 at 1:21 PM Guenter Roeck <[email protected]> wrote:
> >
> > On 2/10/21 12:52 PM, Saravana Kannan wrote:
> > > On Wed, Feb 10, 2021 at 7:10 AM Guenter Roeck <[email protected]> wrote:
> > >>
> > >> On 2/10/21 12:20 AM, Saravana Kannan wrote:
> > >>> On Tue, Feb 9, 2021 at 9:54 PM Guenter Roeck <[email protected]> wrote:
> > >>>>
> > >>>> On Thu, Dec 17, 2020 at 07:17:03PM -0800, Saravana Kannan wrote:
> > >>>>> Cyclic dependencies in some firmware was one of the last remaining
> > >>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > >>>>> dependencies don't block probing, set fw_devlink=on by default.
> > >>>>>
> > >>>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> > >>>>> only for systems with device tree firmware):
> > >>>>> * Significantly cuts down deferred probes.
> > >>>>> * Device probe is effectively attempted in graph order.
> > >>>>> * Makes it much easier to load drivers as modules without having to
> > >>>>> worry about functional dependencies between modules (depmod is still
> > >>>>> needed for symbol dependencies).
> > >>>>>
> > >>>>> If this patch prevents some devices from probing, it's very likely due
> > >>>>> to the system having one or more device drivers that "probe"/set up a
> > >>>>> device (DT node with compatible property) without creating a struct
> > >>>>> device for it. If we hit such cases, the device drivers need to be
> > >>>>> fixed so that they populate struct devices and probe them like normal
> > >>>>> device drivers so that the driver core is aware of the devices and their
> > >>>>> status. See [1] for an example of such a case.
> > >>>>>
> > >>>>> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > >>>>> Signed-off-by: Saravana Kannan <[email protected]>
> > >>>>
> > >>>> This patch breaks nios2 boot tests in qemu. The system gets stuck when
> > >>>> trying to reboot. Reverting this patch fixes the problem. Bisect log
> > >>>> is attached.
> > >>>
> > >>> Thanks for the report Guenter. Can you please try this series?
> > >>> https://lore.kernel.org/lkml/[email protected]/
> > >>>
> > >>
> > >> Not this week. I have lots of reviews to complete before the end of the week,
> > >> with the 5.12 commit window coming up.
> > >
> > > Ok. By next week, all the fixes should be in linux-next too. So it
> > > should be easier if you choose to test.
> > >
> > >> Given the number of problems observed, I personally think that it is way
> > >> too early for this patch. We'll have no end of problems if it is applied
> > >> to the upstream kernel in the next commit window. Of course, that is just
> > >> my personal opinion.
> > >
> > > You had said "with 115 of 430 boot tests failing in -next" earlier.
> > > Just to be sure I understand it right, you are not saying this patch
> > > caused them all right? You are just saying that 115 general boot
> > > failures that might mask fw_devlink issues in some of them, right?
> > >
> >
> > Correct.
>
> Is it right to assume [1] fixed all known boot issues due to fw_devlink=on?
> [1] - https://lore.kernel.org/lkml/[email protected]/
>

I honestly don't know. Current status of -next in my tests is:

Build results:
total: 149 pass: 144 fail: 5
Qemu test results:
total: 432 pass: 371 fail: 61

This is for next-20210216. Newly introduced failures keep popping up. Some
of the failures have been persistent for weeks, so it is all but impossible
to say if affected platforms experience more than one failure.

Also, please keep in mind that my boot tests are very shallow, along the
lines of "it boots, therefore it works". They only test hardware which is
emulated by qemu and is needed for booting. They probably test much less
than 1% of driver code. They cannot and should not be used for any useful
fw_devlink related test coverage.

Thanks,
Guenter

2021-02-17 03:34:23

by Saravana Kannan

Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

On Tue, Feb 16, 2021 at 7:05 PM Guenter Roeck <[email protected]> wrote:
>
> On Tue, Feb 16, 2021 at 06:39:55PM -0800, Saravana Kannan wrote:
> > On Wed, Feb 10, 2021 at 1:21 PM Guenter Roeck <[email protected]> wrote:
> > >
> > > On 2/10/21 12:52 PM, Saravana Kannan wrote:
> > > > On Wed, Feb 10, 2021 at 7:10 AM Guenter Roeck <[email protected]> wrote:
> > > >>
> > > >> On 2/10/21 12:20 AM, Saravana Kannan wrote:
> > > >>> On Tue, Feb 9, 2021 at 9:54 PM Guenter Roeck <[email protected]> wrote:
> > > >>>>
> > > >>>> On Thu, Dec 17, 2020 at 07:17:03PM -0800, Saravana Kannan wrote:
> > > >>>>> Cyclic dependencies in some firmware was one of the last remaining
> > > >>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> > > >>>>> dependencies don't block probing, set fw_devlink=on by default.
> > > >>>>>
> > > >>>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> > > >>>>> only for systems with device tree firmware):
> > > >>>>> * Significantly cuts down deferred probes.
> > > >>>>> * Device probe is effectively attempted in graph order.
> > > >>>>> * Makes it much easier to load drivers as modules without having to
> > > >>>>> worry about functional dependencies between modules (depmod is still
> > > >>>>> needed for symbol dependencies).
> > > >>>>>
> > > >>>>> If this patch prevents some devices from probing, it's very likely due
> > > >>>>> to the system having one or more device drivers that "probe"/set up a
> > > >>>>> device (DT node with compatible property) without creating a struct
> > > >>>>> device for it. If we hit such cases, the device drivers need to be
> > > >>>>> fixed so that they populate struct devices and probe them like normal
> > > >>>>> device drivers so that the driver core is aware of the devices and their
> > > >>>>> status. See [1] for an example of such a case.
> > > >>>>>
> > > >>>>> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
> > > >>>>> Signed-off-by: Saravana Kannan <[email protected]>
> > > >>>>
> > > >>>> This patch breaks nios2 boot tests in qemu. The system gets stuck when
> > > >>>> trying to reboot. Reverting this patch fixes the problem. Bisect log
> > > >>>> is attached.
> > > >>>
> > > >>> Thanks for the report Guenter. Can you please try this series?
> > > >>> https://lore.kernel.org/lkml/[email protected]/
> > > >>>
> > > >>
> > > >> Not this week. I have lots of reviews to complete before the end of the week,
> > > >> with the 5.12 commit window coming up.
> > > >
> > > > Ok. By next week, all the fixes should be in linux-next too. So it
> > > > should be easier if you choose to test.
> > > >
> > > >> Given the number of problems observed, I personally think that it is way
> > > >> too early for this patch. We'll have no end of problems if it is applied
> > > >> to the upstream kernel in the next commit window. Of course, that is just
> > > >> my personal opinion.
> > > >
> > > > You had said "with 115 of 430 boot tests failing in -next" earlier.
> > > > Just to be sure I understand it right, you are not saying this patch
> > > > caused them all right? You are just saying that 115 general boot
> > > > failures that might mask fw_devlink issues in some of them, right?
> > > >
> > >
> > > Correct.
> >
> > Is it right to assume [1] fixed all known boot issues due to fw_devlink=on?
> > [1] - https://lore.kernel.org/lkml/[email protected]/
> >
>
> I honestly don't know. Current status of -next in my tests is:
>
> Build results:
> total: 149 pass: 144 fail: 5
> Qemu test results:
> total: 432 pass: 371 fail: 61
>
> This is for next-20210216. Newly introduced failures keep popping up. Some
> of the failures have been persistent for weeks, so it is all but impossible
> to say if affected platforms experience more than one failure.
>
> Also, please keep in mind that my boot tests are very shallow, along the
> lines of "it boots, therefore it works". They only test hardware which is
> emulated by qemu and is needed for booting. They probably test much less
> than 1% of driver code. They cannot and should not be used for any useful
> fw_devlink related test coverage.

Agreed. I'm not using this for fw_devlink=on test coverage. Just
checking to make sure I've addressed any issues you've seen.

FYI, you can change it at boot time using the kernel command-line parameter
fw_devlink=permissive. So you don't have to rebuild all these kernels
to test whether fw_devlink=on is making things worse.
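
For a test harness that wants to record which mode a given boot actually
used, a rough sketch along these lines could parse the command line. The
helper name fw_devlink_mode is hypothetical. The default of "on" assumes
this patch is applied. Letting the last occurrence win and ignoring
unrecognized values are simplifying assumptions, not a statement of the
kernel's exact parsing.

```python
# Hypothetical helper: report the fw_devlink mode implied by a kernel
# command line string (for example, the contents of /proc/cmdline).

# Values accepted by the fw_devlink= kernel parameter.
VALID_MODES = {"off", "permissive", "on", "rpm"}


def fw_devlink_mode(cmdline, default="on"):
    """Return the fw_devlink mode named on the command line.

    Assumptions: the last valid fw_devlink= token wins, invalid values
    are skipped, and `default` is returned when the parameter is absent.
    """
    mode = default
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "fw_devlink" and sep and value in VALID_MODES:
            mode = value
    return mode
```

On a running system the input could come from open("/proc/cmdline").read();
under qemu it would be whatever string the harness passes via -append.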

-Saravana