2021-02-06 03:46:14

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v4 0/8] Make fw_devlink=on more forgiving

There are a lot of devices/drivers where they never have a struct device
created for them or the driver initializes the hardware without ever
binding to the struct device.

This series is intended to avoid any boot regressions due to such
devices/drivers when fw_devlink=on and also address the handling of
optional suppliers.

Patch 1 and 2 addresses the issue of firmware nodes that look like
they'll have struct devices created for them, but will never actually
have struct devices added for them. For example, DT nodes with a
compatible property that don't have devices added for them.

Patch 3 and 4 allow for handling optional DT bindings.

Patch 5 sets up a generic API to handle drivers that never bind with
their devices.

Patch 6 through 8 update different frameworks to use the new API.

Thanks,
Saravana

Saravana Kannan (8):
driver core: fw_devlink: Detect supplier devices that will never be
added
of: property: Don't add links to absent suppliers
driver core: Add fw_devlink.strict kernel param
of: property: Add fw_devlink support for optional properties
driver core: fw_devlink: Handle suppliers that don't use driver core
irqdomain: Mark fwnodes when their irqdomain is added/removed
PM: domains: Mark fwnodes when their powerdomain is added/removed
clk: Mark fwnodes when their clock provider is added/removed

.../admin-guide/kernel-parameters.txt | 5 ++
drivers/base/core.c | 58 ++++++++++++++++++-
drivers/base/power/domain.c | 2 +
drivers/clk/clk.c | 3 +
drivers/of/property.c | 16 +++--
include/linux/fwnode.h | 20 ++++++-
kernel/irq/irqdomain.c | 2 +
7 files changed, 98 insertions(+), 8 deletions(-)

--
2.30.0.478.g8a0d178c01-goog


2021-02-06 03:47:13

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v4 3/8] driver core: Add fw_devlink.strict kernel param

This param allows forcing all dependencies to be treated as mandatory.
This will be useful for boards in which all optional dependencies like
IOMMUs and DMAs need to be treated as mandatory dependencies.

Signed-off-by: Saravana Kannan <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 5 +++++
drivers/base/core.c | 12 ++++++++++++
include/linux/fwnode.h | 1 +
3 files changed, 18 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a10b545c2070..692b63644133 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1433,6 +1433,11 @@
to enforce probe and suspend/resume ordering.
rpm -- Like "on", but also use to order runtime PM.

+ fw_devlink.strict=<bool>
+ [KNL] Treat all inferred dependencies as mandatory
+ dependencies. This only applies for fw_devlink=on|rpm.
+ Format: <bool>
+
gamecon.map[2|3]=
[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
support via parallel port (up to 5 devices per port)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index c95b1daabac7..f466ab4f1c35 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1521,6 +1521,13 @@ static int __init fw_devlink_setup(char *arg)
}
early_param("fw_devlink", fw_devlink_setup);

+static bool fw_devlink_strict;
+static int __init fw_devlink_strict_setup(char *arg)
+{
+ return strtobool(arg, &fw_devlink_strict);
+}
+early_param("fw_devlink.strict", fw_devlink_strict_setup);
+
u32 fw_devlink_get_flags(void)
{
return fw_devlink_flags;
@@ -1531,6 +1538,11 @@ static bool fw_devlink_is_permissive(void)
return fw_devlink_flags == FW_DEVLINK_FLAGS_PERMISSIVE;
}

+bool fw_devlink_is_strict(void)
+{
+ return fw_devlink_strict && !fw_devlink_is_permissive();
+}
+
static void fw_devlink_parse_fwnode(struct fwnode_handle *fwnode)
{
if (fwnode->flags & FWNODE_FLAG_LINKS_ADDED)
diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
index 21082f11473f..d5caefe39d93 100644
--- a/include/linux/fwnode.h
+++ b/include/linux/fwnode.h
@@ -162,6 +162,7 @@ static inline void fwnode_init(struct fwnode_handle *fwnode,
}

extern u32 fw_devlink_get_flags(void);
+extern bool fw_devlink_is_strict(void);
int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup);
void fwnode_links_purge(struct fwnode_handle *fwnode);

--
2.30.0.478.g8a0d178c01-goog

2021-02-06 03:54:33

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v4 2/8] of: property: Don't add links to absent suppliers

If driver core marks a firmware node as not a device, don't add fwnode
links where it's a supplier.

Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default")
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/of/property.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/of/property.c b/drivers/of/property.c
index 6287c6d60bb7..53d163c8d39b 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1103,7 +1103,9 @@ static int of_link_to_phandle(struct device_node *con_np,
* created for them.
*/
sup_dev = get_dev_from_fwnode(&sup_np->fwnode);
- if (!sup_dev && of_node_check_flag(sup_np, OF_POPULATED)) {
+ if (!sup_dev &&
+ (of_node_check_flag(sup_np, OF_POPULATED) ||
+ sup_np->fwnode.flags & FWNODE_FLAG_NOT_DEVICE)) {
pr_debug("Not linking %pOFP to %pOFP - No struct device\n",
con_np, sup_np);
of_node_put(sup_np);
--
2.30.0.478.g8a0d178c01-goog

2021-02-06 04:02:18

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

On Fri, Feb 5, 2021 at 2:26 PM Saravana Kannan <[email protected]> wrote:
>
> There are a lot of devices/drivers where they never have a struct device
> created for them or the driver initializes the hardware without ever
> binding to the struct device.
>
> This series is intended to avoid any boot regressions due to such
> devices/drivers when fw_devlink=on and also address the handling of
> optional suppliers.
>
> Patch 1 and 2 addresses the issue of firmware nodes that look like
> they'll have struct devices created for them, but will never actually
> have struct devices added for them. For example, DT nodes with a
> compatible property that don't have devices added for them.
>
> Patch 3 and 4 allow for handling optional DT bindings.
>
> Patch 5 sets up a generic API to handle drivers that never bind with
> their devices.
>
> Patch 6 through 8 update different frameworks to use the new API.
>
> Thanks,
> Saravana
>

Forgot to add version history:

v1 -> v2:
Patch 1: Added a flag to fwnodes that aren't devices.
Patch 3: New patch to ise the flag set in patch 1 to not create bad links.

v2 -> v3:
- Patch 1: Added Rafael's Ack
- New patches 3 and 4

v3 -> v4:
- No changes to patches 1-4.
- New patches 5-8.

-Saravana

> Saravana Kannan (8):
> driver core: fw_devlink: Detect supplier devices that will never be
> added
> of: property: Don't add links to absent suppliers
> driver core: Add fw_devlink.strict kernel param
> of: property: Add fw_devlink support for optional properties
> driver core: fw_devlink: Handle suppliers that don't use driver core
> irqdomain: Mark fwnodes when their irqdomain is added/removed
> PM: domains: Mark fwnodes when their powerdomain is added/removed
> clk: Mark fwnodes when their clock provider is added/removed
>
> .../admin-guide/kernel-parameters.txt | 5 ++
> drivers/base/core.c | 58 ++++++++++++++++++-
> drivers/base/power/domain.c | 2 +
> drivers/clk/clk.c | 3 +
> drivers/of/property.c | 16 +++--
> include/linux/fwnode.h | 20 ++++++-
> kernel/irq/irqdomain.c | 2 +
> 7 files changed, 98 insertions(+), 8 deletions(-)
>
> --
> 2.30.0.478.g8a0d178c01-goog
>

2021-02-06 19:45:47

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

Hi Saravana,

On Fri, Feb 5, 2021 at 11:26 PM Saravana Kannan <[email protected]> wrote:
> There are a lot of devices/drivers where they never have a struct device
> created for them or the driver initializes the hardware without ever
> binding to the struct device.
>
> This series is intended to avoid any boot regressions due to such
> devices/drivers when fw_devlink=on and also address the handling of
> optional suppliers.

Thanks for your series!

> Patch 5 sets up a generic API to handle drivers that never bind with
> their devices.
>
> Patch 6 through 8 update different frameworks to use the new API.

> driver core: fw_devlink: Handle suppliers that don't use driver core
> irqdomain: Mark fwnodes when their irqdomain is added/removed
> PM: domains: Mark fwnodes when their powerdomain is added/removed
> clk: Mark fwnodes when their clock provider is added/removed

I take it this is an automatic alternative for letting drivers set the
OF_POPULATED flag manually?

Is this actually safe? It's not uncommon for a driver to register
multiple providers, sometimes even of different types (clock, genpd,
irq, reset[1], ...).
Can you be sure consumer drivers do not start probing while their
dependency is still busy registering providers?

[1] Which brings my attention to the fact that devlink does not consider
"resets" properties yet.

Gr{oetje,eeting}s,

Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-02-06 20:51:52

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

On Sat, Feb 6, 2021 at 11:41 AM Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Saravana,
>
> On Fri, Feb 5, 2021 at 11:26 PM Saravana Kannan <[email protected]> wrote:
> > There are a lot of devices/drivers where they never have a struct device
> > created for them or the driver initializes the hardware without ever
> > binding to the struct device.
> >
> > This series is intended to avoid any boot regressions due to such
> > devices/drivers when fw_devlink=on and also address the handling of
> > optional suppliers.
>
> Thanks for your series!
>
> > Patch 5 sets up a generic API to handle drivers that never bind with
> > their devices.
> >
> > Patch 6 through 8 update different frameworks to use the new API.
>
> > driver core: fw_devlink: Handle suppliers that don't use driver core
> > irqdomain: Mark fwnodes when their irqdomain is added/removed
> > PM: domains: Mark fwnodes when their powerdomain is added/removed
> > clk: Mark fwnodes when their clock provider is added/removed
>
> I take it this is an automatic alternative for letting drivers set the
> OF_POPULATED flag manually?

The frameworks can still continue setting it to avoid creating dead
"struct devices" that'll never be used. This new flag handles cases
where the device is already created, but will never bind to a driver.
So, they are meant to do slightly different things, but the end result
is removing the need for individual drivers to set OF_POPULATED (and
Rob hates that too).

> Is this actually safe? It's not uncommon for a driver to register
> multiple providers, sometimes even of different types (clock, genpd,
> irq, reset[1], ...).

This flag is just an indication that the fwnode has been initialized
by a driver. It's okay if the flag gets set multiple times when a
driver is registering with multiple frameworks. It's also okay if the
flag is cleared multiple times as the driver is uninitializing the
hardware (although, this is very unlikely for drivers that don't use
device-driver model). When we actually try to create device links, we
just check if this happened without a driver actually binding to this
device. There's no "probing" race because the "status" I check goes
through NO_DRIVER -> PROBING -(registering happens)-> BOUND ->
UNBINDING -(deregistering happens) -> NO_DRIVER. So if the fwnode flag
is getting set as part of the driver's probe function, the "status"
value will never be NO_DRIVER.

> Can you be sure consumer drivers do not start probing while their
> dependency is still busy registering providers?

The code only acts on that flag when trying to create device links
from the consumer to the supplier. This is just a way to tell "hey,
don't bother creating a device link, this supplier will never bind".
So it just avoids blocking the consumer. Doesn't really make the
consumers probe earlier than they would have.

> [1] Which brings my attention to the fact that devlink does not consider
> "resets" properties yet.
>

Yeah, we can add that and other bindings as we go.

-Saravana

2021-02-08 08:50:38

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

Hi Saravana,

On 05.02.2021 23:26, Saravana Kannan wrote:
> There are a lot of devices/drivers where they never have a struct device
> created for them or the driver initializes the hardware without ever
> binding to the struct device.
>
> This series is intended to avoid any boot regressions due to such
> devices/drivers when fw_devlink=on and also address the handling of
> optional suppliers.
>
> Patch 1 and 2 addresses the issue of firmware nodes that look like
> they'll have struct devices created for them, but will never actually
> have struct devices added for them. For example, DT nodes with a
> compatible property that don't have devices added for them.
>
> Patch 3 and 4 allow for handling optional DT bindings.
>
> Patch 5 sets up a generic API to handle drivers that never bind with
> their devices.
>
> Patch 6 through 8 update different frameworks to use the new API.

This patchset fixes probing issue observed on various Exynos based
boards even with commit c09a3e6c97f0 ("soc: samsung: pm_domains: Convert
to regular platform driver") reverted. Thanks!

Tested-by: Marek Szyprowski <[email protected]>

Best regards

--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2021-02-09 00:02:04

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

On Mon, Feb 8, 2021 at 12:40 AM Marek Szyprowski
<[email protected]> wrote:
>
> Hi Saravana,
>
> On 05.02.2021 23:26, Saravana Kannan wrote:
> > There are a lot of devices/drivers where they never have a struct device
> > created for them or the driver initializes the hardware without ever
> > binding to the struct device.
> >
> > This series is intended to avoid any boot regressions due to such
> > devices/drivers when fw_devlink=on and also address the handling of
> > optional suppliers.
> >
> > Patch 1 and 2 addresses the issue of firmware nodes that look like
> > they'll have struct devices created for them, but will never actually
> > have struct devices added for them. For example, DT nodes with a
> > compatible property that don't have devices added for them.
> >
> > Patch 3 and 4 allow for handling optional DT bindings.
> >
> > Patch 5 sets up a generic API to handle drivers that never bind with
> > their devices.
> >
> > Patch 6 through 8 update different frameworks to use the new API.
>
> This patchset fixes probing issue observed on various Exynos based
> boards even with commit c09a3e6c97f0 ("soc: samsung: pm_domains: Convert
> to regular platform driver") reverted. Thanks!
>
> Tested-by: Marek Szyprowski <[email protected]>
>

Thanks for testing!

-Saravana

2021-02-10 01:00:13

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH v4 2/8] of: property: Don't add links to absent suppliers

On Fri, 05 Feb 2021 14:26:38 -0800, Saravana Kannan wrote:
> If driver core marks a firmware node as not a device, don't add fwnode
> links where it's a supplier.
>
> Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default")
> Signed-off-by: Saravana Kannan <[email protected]>
> ---
> drivers/of/property.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>

Acked-by: Rob Herring <[email protected]>

2021-02-10 09:09:42

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

On Wed, Feb 10, 2021 at 12:19 AM <[email protected]> wrote:
>
> Hi, Saravana,
>
> On 2/6/21 12:26 AM, Saravana Kannan wrote:
> > There are a lot of devices/drivers where they never have a struct device
> > created for them or the driver initializes the hardware without ever
> > binding to the struct device.
> >
> > This series is intended to avoid any boot regressions due to such
> > devices/drivers when fw_devlink=on and also address the handling of
> > optional suppliers.
> >
> > Patch 1 and 2 addresses the issue of firmware nodes that look like
> > they'll have struct devices created for them, but will never actually
> > have struct devices added for them. For example, DT nodes with a
> > compatible property that don't have devices added for them.
> >
> > Patch 3 and 4 allow for handling optional DT bindings.
> >
> > Patch 5 sets up a generic API to handle drivers that never bind with
> > their devices.
> >
> > Patch 6 through 8 update different frameworks to use the new API.
> >
> > Thanks,
> > Saravana
> >
> > Saravana Kannan (8):
> > driver core: fw_devlink: Detect supplier devices that will never be
> > added
> > of: property: Don't add links to absent suppliers
> > driver core: Add fw_devlink.strict kernel param
> > of: property: Add fw_devlink support for optional properties
> > driver core: fw_devlink: Handle suppliers that don't use driver core
> > irqdomain: Mark fwnodes when their irqdomain is added/removed
> > PM: domains: Mark fwnodes when their powerdomain is added/removed
> > clk: Mark fwnodes when their clock provider is added/removed
> >
> > .../admin-guide/kernel-parameters.txt | 5 ++
> > drivers/base/core.c | 58 ++++++++++++++++++-
> > drivers/base/power/domain.c | 2 +
> > drivers/clk/clk.c | 3 +
> > drivers/of/property.c | 16 +++--
> > include/linux/fwnode.h | 20 ++++++-
> > kernel/irq/irqdomain.c | 2 +
> > 7 files changed, 98 insertions(+), 8 deletions(-)
> >
>
> Even with this patch set applied, sama5d2_xplained can not boot.
> Patch at [1] makes sama5d2_xplained boot again. Stephen applied it
> to clk-next.

I'm glad you won't actually have any boot issues in 5.12, but the fact
you need [1] with this series doesn't make a lot of sense to me
because:

1. The FWNODE_FLAG_INITIALIZED flag will be set for the clock fwnode
in question way before any consumer devices are added.
2. Any consumer device added after (1) will stop trying to link to the
clock device.

Are you somehow adding a consumer to the clock fwnode before (1)?

Can you try this patch without your clk fix? I was trying to avoid
looping through a list, but looks like your case might somehow need
it?

-Saravana

+++ b/drivers/base/core.c
@@ -943,6 +943,31 @@ static void device_links_missing_supplier(struct
device *dev)
}
}

+static int fw_devlink_check_suppliers(struct device *dev)
+{
+ struct fwnode_link *link;
+ int ret = 0;
+
+ if (!dev->fwnode ||fw_devlink_is_permissive())
+ return 0;
+
+ /*
+ * Device waiting for supplier to become available is not allowed to
+ * probe.
+ */
+ mutex_lock(&fwnode_link_lock);
+ list_for_each_entry(link, &dev->fwnode->suppliers, c_hook) {
+ if (link->supplier->flags & FWNODE_FLAG_INITIALIZED)
+ continue;
+
+ ret = -EPROBE_DEFER;
+ break;
+ }
+ mutex_unlock(&fwnode_link_lock);
+
+ return ret;
+}
+
/**
* device_links_check_suppliers - Check presence of supplier drivers.
* @dev: Consumer device.
@@ -964,21 +989,13 @@ int device_links_check_suppliers(struct device *dev)
struct device_link *link;
int ret = 0;

- /*
- * Device waiting for supplier to become available is not allowed to
- * probe.
- */
- mutex_lock(&fwnode_link_lock);
- if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
- !fw_devlink_is_permissive()) {
+ if (fw_devlink_check_suppliers(dev)) {
dev_dbg(dev, "probe deferral - wait for supplier %pfwP\n",
list_first_entry(&dev->fwnode->suppliers,
struct fwnode_link,
c_hook)->supplier);
- mutex_unlock(&fwnode_link_lock);
return -EPROBE_DEFER;
}
- mutex_unlock(&fwnode_link_lock);

device_links_write_lock();



>
> Cheers,
> ta
>
> [1] https://lore.kernel.org/lkml/[email protected]/

2021-02-11 13:24:03

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

Hi Saravana,

On Fri, Feb 5, 2021 at 11:26 PM Saravana Kannan <[email protected]> wrote:
> There are a lot of devices/drivers where they never have a struct device
> created for them or the driver initializes the hardware without ever
> binding to the struct device.
>
> This series is intended to avoid any boot regressions due to such
> devices/drivers when fw_devlink=on and also address the handling of
> optional suppliers.
>
> Patch 1 and 2 addresses the issue of firmware nodes that look like
> they'll have struct devices created for them, but will never actually
> have struct devices added for them. For example, DT nodes with a
> compatible property that don't have devices added for them.
>
> Patch 3 and 4 allow for handling optional DT bindings.
>
> Patch 5 sets up a generic API to handle drivers that never bind with
> their devices.
>
> Patch 6 through 8 update different frameworks to use the new API.
>
> Thanks,
> Saravana
>
> Saravana Kannan (8):
> driver core: fw_devlink: Detect supplier devices that will never be
> added
> of: property: Don't add links to absent suppliers
> driver core: Add fw_devlink.strict kernel param
> of: property: Add fw_devlink support for optional properties
> driver core: fw_devlink: Handle suppliers that don't use driver core
> irqdomain: Mark fwnodes when their irqdomain is added/removed
> PM: domains: Mark fwnodes when their powerdomain is added/removed
> clk: Mark fwnodes when their clock provider is added/removed

Thanks for your series, which is now part of driver-core-next.
I gave driver-core-next + [1] a try on various Renesas boards.
Test results are below.
In general, the result looks much better than before.

[1] - https://lore.kernel.org/lkml/[email protected]/

1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).

- Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device
node OF_POPULATED after init") is no longer needed (but already
queued for v5.12 anyway)

- Some devices are reprobed, despite their drivers returning
a real error code, and not -EPROBE_DEFER:

renesas_wdt e6020000.watchdog: Watchdog blacklisted on r8a7791 ES1.*
(rwdt_probe() returns -ENODEV)

sh-pfc e6060000.pinctrl: pin GP_7_23 already requested by
ee090000.pci; cannot claim for e6590000.usb
sh-pfc e6060000.pinctrl: pin-247 (e6590000.usb) status -22
sh-pfc e6060000.pinctrl: could not request pin 247
(GP_7_23) from group usb0 on device sh-pfc
renesas_usbhs e6590000.usb: Error applying setting,
reverse things back
renesas_usbhs: probe of e6590000.usb failed with error -22

rcar-pcie fe000000.pcie: host bridge /soc/pcie@fe000000 ranges:
rcar-pcie fe000000.pcie: IO
0x00fe100000..0x00fe1fffff -> 0x0000000000
rcar-pcie fe000000.pcie: MEM
0x00fe200000..0x00fe3fffff -> 0x00fe200000
rcar-pcie fe000000.pcie: MEM
0x0030000000..0x0037ffffff -> 0x0030000000
rcar-pcie fe000000.pcie: MEM
0x0038000000..0x003fffffff -> 0x0038000000
rcar-pcie fe000000.pcie: IB MEM
0x0040000000..0x00bfffffff -> 0x0040000000
rcar-pcie fe000000.pcie: IB MEM
0x0200000000..0x02ffffffff -> 0x0200000000
rcar-pcie fe000000.pcie: PCIe link down
(rcar_pcie_probe() returns -ENODEV)

xhci-hcd ee000000.usb: xHCI Host Controller
xhci-hcd ee000000.usb: new USB bus registered, assigned bus number 7
xhci-hcd ee000000.usb: Direct firmware load for
r8a779x_usb3_v3.dlmem failed with error -2
xhci-hcd ee000000.usb: can't setup: -2
xhci-hcd ee000000.usb: USB bus 7 deregistered
xhci-hcd: probe of ee000000.usb failed with error -2

- The PCI reprobing leads to a memory leak, for which I've sent a fix
"[PATCH] PCI: Fix memory leak in pci_register_io_range()"
https://lore.kernel.org/linux-pci/[email protected]/

- I2C on R-Car Gen3 does not seem to use DMA, according to
/sys/kernel/debug/dmaengine/summary:

-dma4chan0 | e66d8000.i2c:tx
-dma4chan1 | e66d8000.i2c:rx
-dma5chan0 | e6510000.i2c:tx

- Disabling CONFIG_IPMMU_VMSA (IOMMU) now works, good!

ignoring dependency for device, assuming no driver

- Disabling CONFIG_RCAR_DMAC works for most devices, except for
sound:

-rcar_sound ec500000.sound: probed

ALSA device list:
- #0: rcar-sound
+ No soundcards found.

# cat /sys/kernel/debug/devices_deferred
2-0010
sound
ec500000.sound

platform e6510000.i2c: Linked as a sync state only
consumer to ec500000.sound
platform ec500000.sound: Linked as a consumer to e6060000.pinctrl
platform ec500000.sound: Linked as a consumer to
e6150000.clock-controller
i2c 2-0010: Linked as a consumer to ec500000.sound
platform ec500000.sound: Linked as a consumer to 2-004f
cs2000-cp 2-004f: revision - C1
i2c-rcar e6510000.i2c: probed
i2c-rcar e6510000.i2c: Dropping the link to ec500000.sound
i2c 2-0010: probe deferral - supplier ec500000.sound not ready

With CONFIG_RCAR_DMAC=y, ec500000.sound is probed quite early.

arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts

ak4613: codec@10 {
clocks = <&rcar_sound 3>;

port {
ak4613_endpoint: endpoint {
remote-endpoint = <&rsnd_endpoint0>;
};
};
};

sound_card: sound {
dais = <&rsnd_port0 /* ak4613 */
&rsnd_port1 /* HDMI0 */
&rsnd_port2>; /* HDMI1 */
};

rcar_sound: sound@ec500000 {
ports {
rsnd_port0: port@0 {
rsnd_endpoint0: endpoint {
remote-endpoint =
<&ak4613_endpoint>;
}
}
}
};


2. SH/R-Mobile AG5 (kzm9g), APE6 (ape6evm), A1 (armadillo800-eva)

- "PATCH] soc: renesas: rmobile-sysc: Set OF_POPULATED and absorb
reset handling" is no longer needed
https://lore.kernel.org/linux-arm-kernel/[email protected]/

- On R-Mobile A1, I get a BUG and a memory leak:

BUG: spinlock bad magic on CPU#0, swapper/1
lock: lcdc0_device+0x10c/0x308, .magic: 00000000, .owner:
<none>/-1, .owner_cpu: 0
CPU: 0 PID: 1 Comm: swapper Not tainted
5.11.0-rc5-armadillo-00032-gf0a85c26907e #266
Hardware name: Generic R8A7740 (Flattened Device Tree)
[<c010c3c8>] (unwind_backtrace) from [<c010a49c>]
(show_stack+0x10/0x14)
[<c010a49c>] (show_stack) from [<c0159534>]
(do_raw_spin_lock+0x20/0x94)
[<c0159534>] (do_raw_spin_lock) from [<c04089d8>]
(dev_pm_get_subsys_data+0x30/0xa0)
[<c04089d8>] (dev_pm_get_subsys_data) from [<c0413698>]
(genpd_add_device+0x34/0x1c0)
[<c0413698>] (genpd_add_device) from [<c041389c>]
(of_genpd_add_device+0x34/0x4c)
[<c041389c>] (of_genpd_add_device) from [<c0a1e9bc>]
(board_staging_register_device+0xf8/0x118)
[<c0a1e9bc>] (board_staging_register_device) from
[<c0a1ea00>] (board_staging_register_devices+0x24/0x28)
[<c0a1ea00>] (board_staging_register_devices) from
[<c0a1ea30>] (runtime_board_check+0x2c/0x40)
[<c0a1ea30>] (runtime_board_check) from [<c0101fac>]
(do_one_initcall+0xe0/0x278)
[<c0101fac>] (do_one_initcall) from [<c0a01034>]
(kernel_init_freeable+0x174/0x1c0)
[<c0a01034>] (kernel_init_freeable) from [<c05fd568>]
(kernel_init+0x8/0x118)
[<c05fd568>] (kernel_init) from [<c010011c>]
(ret_from_fork+0x14/0x38)
Exception stack(0xc19c9fb0 to 0xc19c9ff8)
9fa0: 00000000
00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000

unreferenced object 0xc4134e00 (size 512):
comm "swapper", pid 1, jiffies 4294937296 (age 3541.930s)
hex dump (first 32 bytes):
00 4e 13 c4 00 4e 13 c4 ff ff ff 7f ff ff ff 7f
.N...N..........
ff ff ff 7f 02 00 00 00 00 5f 13 c4 1c 4e 13 c4
........._...N..
backtrace:
[<de1a3c34>] dev_pm_qos_constraints_allocate+0x10/0xcc
[<d21cf6e4>] dev_pm_qos_add_notifier+0x6c/0xd0
[<e04bbc90>] genpd_add_device+0x178/0x1c0
[<95067303>] of_genpd_add_device+0x34/0x4c
[<c334b97a>] board_staging_register_device+0xf8/0x118
[<01bd495a>] board_staging_register_devices+0x24/0x28
[<fb25a5d8>] runtime_board_check+0x2c/0x40
[<65aed679>] do_one_initcall+0xe0/0x278
[<97e3f4f7>] kernel_init_freeable+0x174/0x1c0
[<63c8fed0>] kernel_init+0x8/0x118
[<f704d96c>] ret_from_fork+0x14/0x38
[<00000000>] 0x0

3. RZ/A1 and RZ/A2: No issues.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-02-11 13:30:44

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

Hi Saravana,

On Thu, Feb 11, 2021 at 2:00 PM Geert Uytterhoeven <[email protected]> wrote:
> - Disabling CONFIG_RCAR_DMAC works for most devices, except for
> sound:

Please ignore. DMA is mandatory for sound, and thus fails in the same
way on v5.11-rc5.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-02-12 03:04:47

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Saravana,
>
> On Fri, Feb 5, 2021 at 11:26 PM Saravana Kannan <[email protected]> wrote:
> > There are a lot of devices/drivers where they never have a struct device
> > created for them or the driver initializes the hardware without ever
> > binding to the struct device.
> >
> > This series is intended to avoid any boot regressions due to such
> > devices/drivers when fw_devlink=on and also address the handling of
> > optional suppliers.
> >
> > Patch 1 and 2 addresses the issue of firmware nodes that look like
> > they'll have struct devices created for them, but will never actually
> > have struct devices added for them. For example, DT nodes with a
> > compatible property that don't have devices added for them.
> >
> > Patch 3 and 4 allow for handling optional DT bindings.
> >
> > Patch 5 sets up a generic API to handle drivers that never bind with
> > their devices.
> >
> > Patch 6 through 8 update different frameworks to use the new API.
> >
> > Thanks,
> > Saravana
> >
> > Saravana Kannan (8):
> > driver core: fw_devlink: Detect supplier devices that will never be
> > added
> > of: property: Don't add links to absent suppliers
> > driver core: Add fw_devlink.strict kernel param
> > of: property: Add fw_devlink support for optional properties
> > driver core: fw_devlink: Handle suppliers that don't use driver core
> > irqdomain: Mark fwnodes when their irqdomain is added/removed
> > PM: domains: Mark fwnodes when their powerdomain is added/removed
> > clk: Mark fwnodes when their clock provider is added/removed
>
> Thanks for your series, which is now part of driver-core-next.
> I gave driver-core-next + [1] a try on various Renesas boards.

Thanks!

> Test results are below.
> In general, the result looks much better than before.

Ah, good to hear this.

> [1] - https://lore.kernel.org/lkml/[email protected]/
>
> 1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).
>
> - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device
> node OF_POPULATED after init") is no longer needed (but already
> queued for v5.12 anyway)

Rob doesn't like the proliferation of OF_POPULATED and we don't need
it anymore, so maybe work it out with him? It's a balance between some
wasted memory (struct device(s)) vs not proliferating OF_POPULATED.

> - Some devices are reprobed, despite their drivers returning
> a real error code, and not -EPROBE_DEFER:

Sorry, it's not obvious from the logs below where "reprobing" is
happening. Can you give more pointers please?

Also, thinking more about this, the only way I could see this happen is:
1. Device fails with error that's not -EPROBE_DEFER
2. It somehow gets added to a device link (with AUTOPROBE_CONSUMER
flag) where it's a consumer.
3. The supplier probes and the device gets added to the deferred probe
list again.

But I can't see how this sequence can happen. Device links are created
only when a device is added. And is the supplier isn't added yet, the
consumer wouldn't have probed in the first place.

Other than "annoying waste of time" is this causing any other problems?

> renesas_wdt e6020000.watchdog: Watchdog blacklisted on r8a7791 ES1.*
> (rwdt_probe() returns -ENODEV)
>
> sh-pfc e6060000.pinctrl: pin GP_7_23 already requested by
> ee090000.pci; cannot claim for e6590000.usb
> sh-pfc e6060000.pinctrl: pin-247 (e6590000.usb) status -22
> sh-pfc e6060000.pinctrl: could not request pin 247
> (GP_7_23) from group usb0 on device sh-pfc
> renesas_usbhs e6590000.usb: Error applying setting,
> reverse things back
> renesas_usbhs: probe of e6590000.usb failed with error -22
>
> rcar-pcie fe000000.pcie: host bridge /soc/pcie@fe000000 ranges:
> rcar-pcie fe000000.pcie: IO
> 0x00fe100000..0x00fe1fffff -> 0x0000000000
> rcar-pcie fe000000.pcie: MEM
> 0x00fe200000..0x00fe3fffff -> 0x00fe200000
> rcar-pcie fe000000.pcie: MEM
> 0x0030000000..0x0037ffffff -> 0x0030000000
> rcar-pcie fe000000.pcie: MEM
> 0x0038000000..0x003fffffff -> 0x0038000000
> rcar-pcie fe000000.pcie: IB MEM
> 0x0040000000..0x00bfffffff -> 0x0040000000
> rcar-pcie fe000000.pcie: IB MEM
> 0x0200000000..0x02ffffffff -> 0x0200000000
> rcar-pcie fe000000.pcie: PCIe link down
> (rcar_pcie_probe() returns -ENODEV)
>
> xhci-hcd ee000000.usb: xHCI Host Controller
> xhci-hcd ee000000.usb: new USB bus registered, assigned bus number 7
> xhci-hcd ee000000.usb: Direct firmware load for
> r8a779x_usb3_v3.dlmem failed with error -2
> xhci-hcd ee000000.usb: can't setup: -2
> xhci-hcd ee000000.usb: USB bus 7 deregistered
> xhci-hcd: probe of ee000000.usb failed with error -2
>
> - The PCI reprobing leads to a memory leak, for which I've sent a fix
> "[PATCH] PCI: Fix memory leak in pci_register_io_range()"
> https://lore.kernel.org/linux-pci/[email protected]/

Wrt PCI reprobing,
1. Is this PCI never expected to probe, but it's being reattempted
despite the NOT EPROBE_DEFER error? Or
2. The PCI was deferred probe when it should have probed and then when
it's finally reattemped and it could succeed, we are hitting this mem
leak issue?

I'm basically trying to distinguish between "this stuff should never
be retried" vs "this/it's suppliers got probe deferred with
fw_devlink=on vs but didn't get probe deferred with
fw_devlink=permissive and that's causing issues"

> - I2C on R-Car Gen3 does not seem to use DMA, according to
> /sys/kernel/debug/dmaengine/summary:
>
> -dma4chan0 | e66d8000.i2c:tx
> -dma4chan1 | e66d8000.i2c:rx
> -dma5chan0 | e6510000.i2c:tx

I think I need more context on the problem before I can try to fix it.
I'm also very unfamiliar with that file. With fw_devlink=permissive,
I2C was using DMA? If so, the next step is to see if the I2C relative
probe order with DMA is getting changed and if so, why.

> - Disabling CONFIG_IPMMU_VMSA (IOMMU) now works, good!
>
> ignoring dependency for device, assuming no driver
>
> - Disabling CONFIG_RCAR_DMAC works for most devices, except for
> sound:
>
> -rcar_sound ec500000.sound: probed
>
> ALSA device list:
> - #0: rcar-sound
> + No soundcards found.
>
> # cat /sys/kernel/debug/devices_deferred
> 2-0010
> sound
> ec500000.sound
>
> platform e6510000.i2c: Linked as a sync state only
> consumer to ec500000.sound
> platform ec500000.sound: Linked as a consumer to e6060000.pinctrl
> platform ec500000.sound: Linked as a consumer to
> e6150000.clock-controller
> i2c 2-0010: Linked as a consumer to ec500000.sound
> platform ec500000.sound: Linked as a consumer to 2-004f
> cs2000-cp 2-004f: revision - C1
> i2c-rcar e6510000.i2c: probed
> i2c-rcar e6510000.i2c: Dropping the link to ec500000.sound
> i2c 2-0010: probe deferral - supplier ec500000.sound not ready
>
> With CONFIG_RCAR_DMAC=y, ec500000.sound is probed quite early.

I saw your other reply, so I'll ignore this sound/DMA issue.

>
> arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dts
>
> ak4613: codec@10 {
> clocks = <&rcar_sound 3>;
>
> port {
> ak4613_endpoint: endpoint {
> remote-endpoint = <&rsnd_endpoint0>;
> };
> };
> };
>
> sound_card: sound {
> dais = <&rsnd_port0 /* ak4613 */
> &rsnd_port1 /* HDMI0 */
> &rsnd_port2>; /* HDMI1 */
> };
>
> rcar_sound: sound@ec500000 {
> ports {
> rsnd_port0: port@0 {
> rsnd_endpoint0: endpoint {
> remote-endpoint =
> <&ak4613_endpoint>;
> }
> }
> }
> };
>
>
> 2. SH/R-Mobile AG5 (kzm9g), APE6 (ape6evm), A1 (armadillo800-eva)
>
> - "PATCH] soc: renesas: rmobile-sysc: Set OF_POPULATED and absorb
> reset handling" is no longer needed
> https://lore.kernel.org/linux-arm-kernel/[email protected]/

Good to see more evidence that this series is fixing things at a more
generic level.

> - On R-Mobile A1, I get a BUG and a memory leak:
>
> BUG: spinlock bad magic on CPU#0, swapper/1
> lock: lcdc0_device+0x10c/0x308, .magic: 00000000, .owner:
> <none>/-1, .owner_cpu: 0
> CPU: 0 PID: 1 Comm: swapper Not tainted
> 5.11.0-rc5-armadillo-00032-gf0a85c26907e #266
> Hardware name: Generic R8A7740 (Flattened Device Tree)
> [<c010c3c8>] (unwind_backtrace) from [<c010a49c>]
> (show_stack+0x10/0x14)
> [<c010a49c>] (show_stack) from [<c0159534>]
> (do_raw_spin_lock+0x20/0x94)
> [<c0159534>] (do_raw_spin_lock) from [<c04089d8>]
> (dev_pm_get_subsys_data+0x30/0xa0)
> [<c04089d8>] (dev_pm_get_subsys_data) from [<c0413698>]
> (genpd_add_device+0x34/0x1c0)
> [<c0413698>] (genpd_add_device) from [<c041389c>]
> (of_genpd_add_device+0x34/0x4c)
> [<c041389c>] (of_genpd_add_device) from [<c0a1e9bc>]
> (board_staging_register_device+0xf8/0x118)
> [<c0a1e9bc>] (board_staging_register_device) from
> [<c0a1ea00>] (board_staging_register_devices+0x24/0x28)
> [<c0a1ea00>] (board_staging_register_devices) from
> [<c0a1ea30>] (runtime_board_check+0x2c/0x40)
> [<c0a1ea30>] (runtime_board_check) from [<c0101fac>]
> (do_one_initcall+0xe0/0x278)
> [<c0101fac>] (do_one_initcall) from [<c0a01034>]
> (kernel_init_freeable+0x174/0x1c0)
> [<c0a01034>] (kernel_init_freeable) from [<c05fd568>]
> (kernel_init+0x8/0x118)
> [<c05fd568>] (kernel_init) from [<c010011c>]
> (ret_from_fork+0x14/0x38)
> Exception stack(0xc19c9fb0 to 0xc19c9ff8)
> 9fa0: 00000000
> 00000000 00000000 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>
> unreferenced object 0xc4134e00 (size 512):
> comm "swapper", pid 1, jiffies 4294937296 (age 3541.930s)
> hex dump (first 32 bytes):
> 00 4e 13 c4 00 4e 13 c4 ff ff ff 7f ff ff ff 7f
> .N...N..........
> ff ff ff 7f 02 00 00 00 00 5f 13 c4 1c 4e 13 c4
> ........._...N..
> backtrace:
> [<de1a3c34>] dev_pm_qos_constraints_allocate+0x10/0xcc
> [<d21cf6e4>] dev_pm_qos_add_notifier+0x6c/0xd0
> [<e04bbc90>] genpd_add_device+0x178/0x1c0
> [<95067303>] of_genpd_add_device+0x34/0x4c
> [<c334b97a>] board_staging_register_device+0xf8/0x118
> [<01bd495a>] board_staging_register_devices+0x24/0x28
> [<fb25a5d8>] runtime_board_check+0x2c/0x40
> [<65aed679>] do_one_initcall+0xe0/0x278
> [<97e3f4f7>] kernel_init_freeable+0x174/0x1c0
> [<63c8fed0>] kernel_init+0x8/0x118
> [<f704d96c>] ret_from_fork+0x14/0x38
> [<00000000>] 0x0

Hmm... I looked at this in bits and pieces throughout the day. At
least spent an hour looking at this. This doesn't make a lot of sense
to me. I don't even touch anything in this code path AFAICT. Are
modules/kernel mixed up somehow? I need more info before I can help.
Does reverting my pm domain change make any difference (assume it
boots this far without it).

>
> 3. RZ/A1 and RZ/A2: No issues.

Great!

-Saravana

2021-02-12 08:19:11

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

Hi Saravana,

On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <[email protected]> wrote:
> On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <[email protected]> wrote:
> > 1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).
> >
> > - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device
> > node OF_POPULATED after init") is no longer needed (but already
> > queued for v5.12 anyway)
>
> Rob doesn't like the proliferation of OF_POPULATED and we don't need
> it anymore, so maybe work it out with him? It's a balance between some
> wasted memory (struct device(s)) vs not proliferating OF_POPULATED.

Rob: should it be reverted? For v5.13?
I guess other similar "fixes" went in in the mean time.

> > - Some devices are reprobed, despite their drivers returning
> > a real error code, and not -EPROBE_DEFER:
>
> Sorry, it's not obvious from the logs below where "reprobing" is
> happening. Can you give more pointers please?

My log was indeed not a full log, but just the reprobes happening.
I'll send you a full log by private email.

> Also, thinking more about this, the only way I could see this happen is:
> 1. Device fails with error that's not -EPROBE_DEFER
> 2. It somehow gets added to a device link (with AUTOPROBE_CONSUMER
> flag) where it's a consumer.
> 3. The supplier probes and the device gets added to the deferred probe
> list again.
>
> But I can't see how this sequence can happen. Device links are created
> only when a device is added. And is the supplier isn't added yet, the
> consumer wouldn't have probed in the first place.

The full log doesn't show any evidence of the device being added
to a list in between the two probes.

> Other than "annoying waste of time" is this causing any other problems?

Probably not. But see below.

> > - The PCI reprobing leads to a memory leak, for which I've sent a fix
> > "[PATCH] PCI: Fix memory leak in pci_register_io_range()"
> > https://lore.kernel.org/linux-pci/[email protected]/
>
> Wrt PCI reprobing,
> 1. Is this PCI never expected to probe, but it's being reattempted
> despite the NOT EPROBE_DEFER error? Or

There is no PCIe card present, so the failure is expected.
Later it is reprobed, which of course fails again.

> 2. The PCI was deferred probe when it should have probed and then when
> it's finally reattemped and it could succeed, we are hitting this mem
> leak issue?

I think the leak has always been there, but it was just exposed by
this unneeded reprobe. I don't think a reprobe after that specific
error path had ever happened before.

> I'm basically trying to distinguish between "this stuff should never
> be retried" vs "this/it's suppliers got probe deferred with
> fw_devlink=on vs but didn't get probe deferred with
> fw_devlink=permissive and that's causing issues"

There should not be a probe deferral, as no -EPROBE_DEFER was
returned.

> > - I2C on R-Car Gen3 does not seem to use DMA, according to
> > /sys/kernel/debug/dmaengine/summary:
> >
> > -dma4chan0 | e66d8000.i2c:tx
> > -dma4chan1 | e66d8000.i2c:rx
> > -dma5chan0 | e6510000.i2c:tx
>
> I think I need more context on the problem before I can try to fix it.
> I'm also very unfamiliar with that file. With fw_devlink=permissive,
> I2C was using DMA? If so, the next step is to see if the I2C relative
> probe order with DMA is getting changed and if so, why.

Yes, I plan to dig deeper to see what really happens...

> > - On R-Mobile A1, I get a BUG and a memory leak:
> >
> > BUG: spinlock bad magic on CPU#0, swapper/1

>
> Hmm... I looked at this in bits and pieces throughout the day. At
> least spent an hour looking at this. This doesn't make a lot of sense
> to me. I don't even touch anything in this code path AFAICT. Are
> modules/kernel mixed up somehow? I need more info before I can help.
> Does reverting my pm domain change make any difference (assume it
> boots this far without it).

I plan to dig deeper to see what really happens...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-02-12 21:03:38

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

On Fri, Feb 12, 2021 at 12:15 AM Geert Uytterhoeven
<[email protected]> wrote:
>
> Hi Saravana,
>
> On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <[email protected]> wrote:
> > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <[email protected]> wrote:
> > > 1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).
> > >
> > > - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device
> > > node OF_POPULATED after init") is no longer needed (but already
> > > queued for v5.12 anyway)
> >
> > Rob doesn't like the proliferation of OF_POPULATED and we don't need
> > it anymore, so maybe work it out with him? It's a balance between some
> > wasted memory (struct device(s)) vs not proliferating OF_POPULATED.
>
> Rob: should it be reverted? For v5.13?
> I guess other similar "fixes" went in in the mean time.
>
> > > - Some devices are reprobed, despite their drivers returning
> > > a real error code, and not -EPROBE_DEFER:
> >
> > Sorry, it's not obvious from the logs below where "reprobing" is
> > happening. Can you give more pointers please?
>
> My log was indeed not a full log, but just the reprobes happening.
> I'll send you a full log by private email.
>
> > Also, thinking more about this, the only way I could see this happen is:
> > 1. Device fails with error that's not -EPROBE_DEFER
> > 2. It somehow gets added to a device link (with AUTOPROBE_CONSUMER
> > flag) where it's a consumer.
> > 3. The supplier probes and the device gets added to the deferred probe
> > list again.
> >
> > But I can't see how this sequence can happen. Device links are created
> > only when a device is added. And is the supplier isn't added yet, the
> > consumer wouldn't have probed in the first place.
>
> The full log doesn't show any evidence of the device being added
> to a list in between the two probes.
>
> > Other than "annoying waste of time" is this causing any other problems?
>
> Probably not. But see below.
>
> > > - The PCI reprobing leads to a memory leak, for which I've sent a fix
> > > "[PATCH] PCI: Fix memory leak in pci_register_io_range()"
> > > https://lore.kernel.org/linux-pci/[email protected]/
> >
> > Wrt PCI reprobing,
> > 1. Is this PCI never expected to probe, but it's being reattempted
> > despite the NOT EPROBE_DEFER error? Or
>
> There is no PCIe card present, so the failure is expected.
> Later it is reprobed, which of course fails again.
>
> > 2. The PCI was deferred probe when it should have probed and then when
> > it's finally reattemped and it could succeed, we are hitting this mem
> > leak issue?
>
> I think the leak has always been there, but it was just exposed by
> this unneeded reprobe. I don't think a reprobe after that specific
> error path had ever happened before.
>
> > I'm basically trying to distinguish between "this stuff should never
> > be retried" vs "this/it's suppliers got probe deferred with
> > fw_devlink=on vs but didn't get probe deferred with
> > fw_devlink=permissive and that's causing issues"
>
> There should not be a probe deferral, as no -EPROBE_DEFER was
> returned.
>
> > > - I2C on R-Car Gen3 does not seem to use DMA, according to
> > > /sys/kernel/debug/dmaengine/summary:
> > >
> > > -dma4chan0 | e66d8000.i2c:tx
> > > -dma4chan1 | e66d8000.i2c:rx
> > > -dma5chan0 | e6510000.i2c:tx
> >
> > I think I need more context on the problem before I can try to fix it.
> > I'm also very unfamiliar with that file. With fw_devlink=permissive,
> > I2C was using DMA? If so, the next step is to see if the I2C relative
> > probe order with DMA is getting changed and if so, why.
>
> Yes, I plan to dig deeper to see what really happens...

Try fw_devlink.strict (you'll need IOMMU enabled too). If that fixes
it and you also don't see this issue with fw_devlink=permissive, then
it means there's probably some unnecessary probe deferral that we
should try to avoid. At least, that's my hunch right now.

Thanks,
Saravana

>
> > > - On R-Mobile A1, I get a BUG and a memory leak:
> > >
> > > BUG: spinlock bad magic on CPU#0, swapper/1
>
> >
> > Hmm... I looked at this in bits and pieces throughout the day. At
> > least spent an hour looking at this. This doesn't make a lot of sense
> > to me. I don't even touch anything in this code path AFAICT. Are
> > modules/kernel mixed up somehow? I need more info before I can help.
> > Does reverting my pm domain change make any difference (assume it
> > boots this far without it).
>
> I plan to dig deeper to see what really happens...
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds

2021-02-15 11:24:32

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

On Thu, Feb 11, 2021 at 2:00 PM Geert Uytterhoeven <[email protected]> wrote:
> On Fri, Feb 5, 2021 at 11:26 PM Saravana Kannan <[email protected]> wrote:
> > There are a lot of devices/drivers where they never have a struct device
> > created for them or the driver initializes the hardware without ever
> > binding to the struct device.
> >
> > This series is intended to avoid any boot regressions due to such
> > devices/drivers when fw_devlink=on and also address the handling of
> > optional suppliers.

> - Some devices are reprobed, despite their drivers returning
> a real error code, and not -EPROBE_DEFER:
>
> renesas_wdt e6020000.watchdog: Watchdog blacklisted on r8a7791 ES1.*
> (rwdt_probe() returns -ENODEV)
>
> sh-pfc e6060000.pinctrl: pin GP_7_23 already requested by
> ee090000.pci; cannot claim for e6590000.usb
> sh-pfc e6060000.pinctrl: pin-247 (e6590000.usb) status -22
> sh-pfc e6060000.pinctrl: could not request pin 247
> (GP_7_23) from group usb0 on device sh-pfc
> renesas_usbhs e6590000.usb: Error applying setting,
> reverse things back
> renesas_usbhs: probe of e6590000.usb failed with error -22
>
> rcar-pcie fe000000.pcie: host bridge /soc/pcie@fe000000 ranges:
> rcar-pcie fe000000.pcie: IO
> 0x00fe100000..0x00fe1fffff -> 0x0000000000
> rcar-pcie fe000000.pcie: MEM
> 0x00fe200000..0x00fe3fffff -> 0x00fe200000
> rcar-pcie fe000000.pcie: MEM
> 0x0030000000..0x0037ffffff -> 0x0030000000
> rcar-pcie fe000000.pcie: MEM
> 0x0038000000..0x003fffffff -> 0x0038000000
> rcar-pcie fe000000.pcie: IB MEM
> 0x0040000000..0x00bfffffff -> 0x0040000000
> rcar-pcie fe000000.pcie: IB MEM
> 0x0200000000..0x02ffffffff -> 0x0200000000
> rcar-pcie fe000000.pcie: PCIe link down
> (rcar_pcie_probe() returns -ENODEV)
>
> xhci-hcd ee000000.usb: xHCI Host Controller
> xhci-hcd ee000000.usb: new USB bus registered, assigned bus number 7
> xhci-hcd ee000000.usb: Direct firmware load for
> r8a779x_usb3_v3.dlmem failed with error -2
> xhci-hcd ee000000.usb: can't setup: -2
> xhci-hcd ee000000.usb: USB bus 7 deregistered
> xhci-hcd: probe of ee000000.usb failed with error -2

Consumers are added to the deferred probe pending list before
they are probed, but not removed on probe failure.
Patch sent
"[PATCH] driver core: Fix double failed probing with fw_devlink=on"
https://lore.kernel.org/linux-renesas-soc/[email protected]/

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-02-15 12:41:43

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

Hi Saravana,

On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <[email protected]> wrote:
> On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <[email protected]> wrote:
> > - I2C on R-Car Gen3 does not seem to use DMA, according to
> > /sys/kernel/debug/dmaengine/summary:
> >
> > -dma4chan0 | e66d8000.i2c:tx
> > -dma4chan1 | e66d8000.i2c:rx
> > -dma5chan0 | e6510000.i2c:tx
>
> I think I need more context on the problem before I can try to fix it.
> I'm also very unfamiliar with that file. With fw_devlink=permissive,
> I2C was using DMA? If so, the next step is to see if the I2C relative
> probe order with DMA is getting changed and if so, why.

More detailed log:

platform e66d8000.i2c: Linked as a consumer to e6150000.clock-controller
platform e66d8000.i2c: Linked as a sync state only consumer to e6055400.gpio

Why is e66d8000.i2c not linked as a consumer to e6700000.dma-controller?

platform e6700000.dma-controller: Linked as a consumer to
e6150000.clock-controller
platform e66d8000.i2c: Added to deferred list
platform e6700000.dma-controller: Added to deferred list

bus: 'platform': driver_probe_device: matched device
e6700000.dma-controller with driver rcar-dmac
bus: 'platform': really_probe: probing driver rcar-dmac with
device e6700000.dma-controller
platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral

bus: 'platform': driver_probe_device: matched device e66d8000.i2c
with driver i2c-rcar
bus: 'platform': really_probe: probing driver i2c-rcar with device
e66d8000.i2c

I2C becomes available...

i2c-rcar e66d8000.i2c: request_channel failed for tx (-517)
[...]

but DMA is not available yet, so the driver falls back to PIO.

driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c'
bus: 'platform': really_probe: bound device e66d8000.i2c to driver i2c-rcar

platform e6700000.dma-controller: Retrying from deferred list
bus: 'platform': driver_probe_device: matched device
e6700000.dma-controller with driver rcar-dmac
bus: 'platform': really_probe: probing driver rcar-dmac with
device e6700000.dma-controller
platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral
platform e6700000.dma-controller: Added to deferred list
platform e6700000.dma-controller: Retrying from deferred list
bus: 'platform': driver_probe_device: matched device
e6700000.dma-controller with driver rcar-dmac
bus: 'platform': really_probe: probing driver rcar-dmac with
device e6700000.dma-controller
driver: 'rcar-dmac': driver_bound: bound to device 'e6700000.dma-controller'
bus: 'platform': really_probe: bound device
e6700000.dma-controller to driver rcar-dmac

DMA becomes available.

Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows
that the I2C controllers do not have DMA channels allocated, as the
kernel has performed no more I2C transfers after DMA became available.

Using i2cdetect shows that DMA is used, which is good:

i2c-rcar e66d8000.i2c: got DMA channel for rx

With permissive devlinks, the clock controller consumers are not added
to the deferred probing list, and probe order is slightly different.
The I2C controllers are still probed before the DMA controllers.
But DMA becomes available a bit earlier, before the probing of the last
I2C slave driver. Hence /sys/kernel/debug/dmaengine/summary shows that
some I2C transfers did use DMA.

So the real issue is that e66d8000.i2c not linked as a consumer to
e6700000.dma-controller.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-02-15 15:22:53

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

Hi Saravana,

On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <[email protected]> wrote:
> On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <[email protected]> wrote:
> > 1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).
> >
> > - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device
> > node OF_POPULATED after init") is no longer needed (but already
> > queued for v5.12 anyway)
>
> Rob doesn't like the proliferation of OF_POPULATED and we don't need
> it anymore, so maybe work it out with him? It's a balance between some
> wasted memory (struct device(s)) vs not proliferating OF_POPULATED.

> > 2. SH/R-Mobile AG5 (kzm9g), APE6 (ape6evm), A1 (armadillo800-eva)
> >
> > - "PATCH] soc: renesas: rmobile-sysc: Set OF_POPULATED and absorb
> > reset handling" is no longer needed
> > https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
> Good to see more evidence that this series is fixing things at a more
> generic level.

I spoke too soon: if CONFIG_POWER_RESET_RMOBILE=n,
booting fails again, as everything is waiting on the system controller,
which never becomes available.
Rcar-sysc doesn't suffer from this problem, cfr. above.
Perhaps because the rmobile-sysc bindings use a hierarchical instead
of a linear PM domain description, and thus consumers point to the
children of the system controller node?
Cfr. system-controller@e6180000 in arch/arm/boot/dts/r8a7740.dtsi.

> > - On R-Mobile A1, I get a BUG and a memory leak:
> >
> > BUG: spinlock bad magic on CPU#0, swapper/1
> > lock: lcdc0_device+0x10c/0x308, .magic: 00000000, .owner:
> > <none>/-1, .owner_cpu: 0
> > CPU: 0 PID: 1 Comm: swapper Not tainted
> > 5.11.0-rc5-armadillo-00032-gf0a85c26907e #266
> > Hardware name: Generic R8A7740 (Flattened Device Tree)
> > [<c010c3c8>] (unwind_backtrace) from [<c010a49c>]
> > (show_stack+0x10/0x14)
> > [<c010a49c>] (show_stack) from [<c0159534>]
> > (do_raw_spin_lock+0x20/0x94)
> > [<c0159534>] (do_raw_spin_lock) from [<c04089d8>]
> > (dev_pm_get_subsys_data+0x30/0xa0)
> > [<c04089d8>] (dev_pm_get_subsys_data) from [<c0413698>]
> > (genpd_add_device+0x34/0x1c0)
> > [<c0413698>] (genpd_add_device) from [<c041389c>]
> > (of_genpd_add_device+0x34/0x4c)
> > [<c041389c>] (of_genpd_add_device) from [<c0a1e9bc>]
> > (board_staging_register_device+0xf8/0x118)
> > [<c0a1e9bc>] (board_staging_register_device) from

This is indeed a pre-existing problem.
of_genpd_add_device() is called before platform_device_register(),
as it needs to attach the genpd before the device is probed.
But the spinlock is only initialized when the device is registered.
This was masked before due to an unrelated wait context check failure,
which disabled any further spinlock checks, and exposed by fw_devlinks
changing probe order.
Patch sent.
"[PATCH] staging: board: Fix uninitialized spinlock when attaching genpd"
https://lore.kernel.org/r/[email protected]



Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-02-15 21:29:07

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

On Mon, Feb 15, 2021 at 4:38 AM Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Saravana,
>
> On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <[email protected]> wrote:
> > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <[email protected]> wrote:
> > > - I2C on R-Car Gen3 does not seem to use DMA, according to
> > > /sys/kernel/debug/dmaengine/summary:
> > >
> > > -dma4chan0 | e66d8000.i2c:tx
> > > -dma4chan1 | e66d8000.i2c:rx
> > > -dma5chan0 | e6510000.i2c:tx
> >
> > I think I need more context on the problem before I can try to fix it.
> > I'm also very unfamiliar with that file. With fw_devlink=permissive,
> > I2C was using DMA? If so, the next step is to see if the I2C relative
> > probe order with DMA is getting changed and if so, why.
>
> More detailed log:
>
> platform e66d8000.i2c: Linked as a consumer to e6150000.clock-controller
> platform e66d8000.i2c: Linked as a sync state only consumer to e6055400.gpio
>
> Why is e66d8000.i2c not linked as a consumer to e6700000.dma-controller?

Because fw_devlink.strict=1 is not set and dma/iommu is considered an
"optional"/"driver decides" dependency.

> platform e6700000.dma-controller: Linked as a consumer to
> e6150000.clock-controller

Is this the only supplier of dma-controller?

> platform e66d8000.i2c: Added to deferred list
> platform e6700000.dma-controller: Added to deferred list
>
> bus: 'platform': driver_probe_device: matched device
> e6700000.dma-controller with driver rcar-dmac
> bus: 'platform': really_probe: probing driver rcar-dmac with
> device e6700000.dma-controller
> platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral
>
> bus: 'platform': driver_probe_device: matched device e66d8000.i2c
> with driver i2c-rcar
> bus: 'platform': really_probe: probing driver i2c-rcar with device
> e66d8000.i2c
>
> I2C becomes available...
>
> i2c-rcar e66d8000.i2c: request_channel failed for tx (-517)
> [...]
>
> but DMA is not available yet, so the driver falls back to PIO.
>
> driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c'
> bus: 'platform': really_probe: bound device e66d8000.i2c to driver i2c-rcar
>
> platform e6700000.dma-controller: Retrying from deferred list
> bus: 'platform': driver_probe_device: matched device
> e6700000.dma-controller with driver rcar-dmac
> bus: 'platform': really_probe: probing driver rcar-dmac with
> device e6700000.dma-controller
> platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral
> platform e6700000.dma-controller: Added to deferred list
> platform e6700000.dma-controller: Retrying from deferred list
> bus: 'platform': driver_probe_device: matched device
> e6700000.dma-controller with driver rcar-dmac
> bus: 'platform': really_probe: probing driver rcar-dmac with
> device e6700000.dma-controller
> driver: 'rcar-dmac': driver_bound: bound to device 'e6700000.dma-controller'
> bus: 'platform': really_probe: bound device
> e6700000.dma-controller to driver rcar-dmac
>
> DMA becomes available.
>
> Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows
> that the I2C controllers do not have DMA channels allocated, as the
> kernel has performed no more I2C transfers after DMA became available.
>
> Using i2cdetect shows that DMA is used, which is good:
>
> i2c-rcar e66d8000.i2c: got DMA channel for rx
>
> With permissive devlinks, the clock controller consumers are not added
> to the deferred probing list, and probe order is slightly different.
> The I2C controllers are still probed before the DMA controllers.
> But DMA becomes available a bit earlier, before the probing of the last
> I2C slave driver.

This seems like a race? I'm guessing it's two different threads
probing those two devices? And it just happens to work for
"permissive" assuming the boot timing doesn't change?

> Hence /sys/kernel/debug/dmaengine/summary shows that
> some I2C transfers did use DMA.
>
> So the real issue is that e66d8000.i2c not linked as a consumer to
> e6700000.dma-controller.

That's because fw_devlink.strict=1 isn't set. If you need DMA to be
treated as a mandatory supplier, you'll need to set the flag.

Is fw_devlink=on really breaking anything here? It just seems like
"permissive" got lucky with the timing and it could break at any point
in the future. Thought?

-Saravana

2021-02-15 21:59:35

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

Hi Geert,

On Mon, Feb 15, 2021 at 7:16 AM Geert Uytterhoeven <[email protected]> wrote:
>
> Hi Saravana,
>
> On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <[email protected]> wrote:
> > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <[email protected]> wrote:
> > > 1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).
> > >
> > > - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device
> > > node OF_POPULATED after init") is no longer needed (but already
> > > queued for v5.12 anyway)
> >
> > Rob doesn't like the proliferation of OF_POPULATED and we don't need
> > it anymore, so maybe work it out with him? It's a balance between some
> > wasted memory (struct device(s)) vs not proliferating OF_POPULATED.
>
> > > 2. SH/R-Mobile AG5 (kzm9g), APE6 (ape6evm), A1 (armadillo800-eva)
> > >
> > > - "PATCH] soc: renesas: rmobile-sysc: Set OF_POPULATED and absorb
> > > reset handling" is no longer needed
> > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> >
> > Good to see more evidence that this series is fixing things at a more
> > generic level.
>
> I spoke too soon: if CONFIG_POWER_RESET_RMOBILE=n,
> booting fails again, as everything is waiting on the system controller,
> which never becomes available.
> Rcar-sysc doesn't suffer from this problem, cfr. above.
> Perhaps because the rmobile-sysc bindings use a hierarchical instead
> of a linear PM domain description, and thus consumers point to the
> children of the system controller node?
> Cfr. system-controller@e6180000 in arch/arm/boot/dts/r8a7740.dtsi.

Ok, I see what's going on. The problem is that the "power domain"
fwnode being registered is not the node that contains the "compatible"
property and becomes a device. So this patch[1] is not helping here.
Fix is to do something like this (to avoid using OF_POPULATED flag and
breaking reset):

diff --git a/drivers/soc/renesas/rmobile-sysc.c
b/drivers/soc/renesas/rmobile-sysc.c
index 9046b8c933cb..b7e66139ef7d 100644
--- a/drivers/soc/renesas/rmobile-sysc.c
+++ b/drivers/soc/renesas/rmobile-sysc.c
@@ -344,6 +344,7 @@ static int __init rmobile_init_pm_domains(void)
of_node_put(np);
break;
}
+ fwnode_dev_initialized(&np->fwnode, true);
}

put_special_pds();

Can you give it a shot?

[1] - https://lore.kernel.org/lkml/[email protected]/

> > > - On R-Mobile A1, I get a BUG and a memory leak:
> > >
> > > BUG: spinlock bad magic on CPU#0, swapper/1
> > > lock: lcdc0_device+0x10c/0x308, .magic: 00000000, .owner:
> > > <none>/-1, .owner_cpu: 0
> > > CPU: 0 PID: 1 Comm: swapper Not tainted
> > > 5.11.0-rc5-armadillo-00032-gf0a85c26907e #266
> > > Hardware name: Generic R8A7740 (Flattened Device Tree)
> > > [<c010c3c8>] (unwind_backtrace) from [<c010a49c>]
> > > (show_stack+0x10/0x14)
> > > [<c010a49c>] (show_stack) from [<c0159534>]
> > > (do_raw_spin_lock+0x20/0x94)
> > > [<c0159534>] (do_raw_spin_lock) from [<c04089d8>]
> > > (dev_pm_get_subsys_data+0x30/0xa0)
> > > [<c04089d8>] (dev_pm_get_subsys_data) from [<c0413698>]
> > > (genpd_add_device+0x34/0x1c0)
> > > [<c0413698>] (genpd_add_device) from [<c041389c>]
> > > (of_genpd_add_device+0x34/0x4c)
> > > [<c041389c>] (of_genpd_add_device) from [<c0a1e9bc>]
> > > (board_staging_register_device+0xf8/0x118)
> > > [<c0a1e9bc>] (board_staging_register_device) from
>
> This is indeed a pre-existing problem.
> of_genpd_add_device() is called before platform_device_register(),
> as it needs to attach the genpd before the device is probed.
> But the spinlock is only initialized when the device is registered.
> This was masked before due to an unrelated wait context check failure,
> which disabled any further spinlock checks, and exposed by fw_devlinks
> changing probe order.
> Patch sent.
> "[PATCH] staging: board: Fix uninitialized spinlock when attaching genpd"
> https://lore.kernel.org/r/[email protected]
>

Great!

Thanks,
Saravana

2021-02-16 08:07:45

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

Hi Saravana,

On Mon, Feb 15, 2021 at 10:27 PM Saravana Kannan <[email protected]> wrote:
> On Mon, Feb 15, 2021 at 4:38 AM Geert Uytterhoeven <[email protected]> wrote:
> > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <[email protected]> wrote:
> > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <[email protected]> wrote:
> > > > - I2C on R-Car Gen3 does not seem to use DMA, according to
> > > > /sys/kernel/debug/dmaengine/summary:
> > > >
> > > > -dma4chan0 | e66d8000.i2c:tx
> > > > -dma4chan1 | e66d8000.i2c:rx
> > > > -dma5chan0 | e6510000.i2c:tx
> > >
> > > I think I need more context on the problem before I can try to fix it.
> > > I'm also very unfamiliar with that file. With fw_devlink=permissive,
> > > I2C was using DMA? If so, the next step is to see if the I2C relative
> > > probe order with DMA is getting changed and if so, why.
> >
> > More detailed log:
> >
> > platform e66d8000.i2c: Linked as a consumer to e6150000.clock-controller
> > platform e66d8000.i2c: Linked as a sync state only consumer to e6055400.gpio
> >
> > Why is e66d8000.i2c not linked as a consumer to e6700000.dma-controller?
>
> Because fw_devlink.strict=1 is not set and dma/iommu is considered an
> "optional"/"driver decides" dependency.

Oh, I thought dma/iommu were considered mandatory initially,
but dropped as dependencies in the late boot process?

>
> > platform e6700000.dma-controller: Linked as a consumer to
> > e6150000.clock-controller
>
> Is this the only supplier of dma-controller?

No, e6180000.system-controller is also a supplier.

> > platform e66d8000.i2c: Added to deferred list
> > platform e6700000.dma-controller: Added to deferred list
> >
> > bus: 'platform': driver_probe_device: matched device
> > e6700000.dma-controller with driver rcar-dmac
> > bus: 'platform': really_probe: probing driver rcar-dmac with
> > device e6700000.dma-controller
> > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral
> >
> > bus: 'platform': driver_probe_device: matched device e66d8000.i2c
> > with driver i2c-rcar
> > bus: 'platform': really_probe: probing driver i2c-rcar with device
> > e66d8000.i2c
> >
> > I2C becomes available...
> >
> > i2c-rcar e66d8000.i2c: request_channel failed for tx (-517)
> > [...]
> >
> > but DMA is not available yet, so the driver falls back to PIO.
> >
> > driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c'
> > bus: 'platform': really_probe: bound device e66d8000.i2c to driver i2c-rcar
> >
> > platform e6700000.dma-controller: Retrying from deferred list
> > bus: 'platform': driver_probe_device: matched device
> > e6700000.dma-controller with driver rcar-dmac
> > bus: 'platform': really_probe: probing driver rcar-dmac with
> > device e6700000.dma-controller
> > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral
> > platform e6700000.dma-controller: Added to deferred list
> > platform e6700000.dma-controller: Retrying from deferred list
> > bus: 'platform': driver_probe_device: matched device
> > e6700000.dma-controller with driver rcar-dmac
> > bus: 'platform': really_probe: probing driver rcar-dmac with
> > device e6700000.dma-controller
> > driver: 'rcar-dmac': driver_bound: bound to device 'e6700000.dma-controller'
> > bus: 'platform': really_probe: bound device
> > e6700000.dma-controller to driver rcar-dmac
> >
> > DMA becomes available.
> >
> > Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows
> > that the I2C controllers do not have DMA channels allocated, as the
> > kernel has performed no more I2C transfers after DMA became available.
> >
> > Using i2cdetect shows that DMA is used, which is good:
> >
> > i2c-rcar e66d8000.i2c: got DMA channel for rx
> >
> > With permissive devlinks, the clock controller consumers are not added
> > to the deferred probing list, and probe order is slightly different.
> > The I2C controllers are still probed before the DMA controllers.
> > But DMA becomes available a bit earlier, before the probing of the last
> > I2C slave driver.
>
> This seems like a race? I'm guessing it's two different threads
> probing those two devices? And it just happens to work for
> "permissive" assuming the boot timing doesn't change?
>
> > Hence /sys/kernel/debug/dmaengine/summary shows that
> > some I2C transfers did use DMA.
> >
> > So the real issue is that e66d8000.i2c not linked as a consumer to
> > e6700000.dma-controller.
>
> That's because fw_devlink.strict=1 isn't set. If you need DMA to be
> treated as a mandatory supplier, you'll need to set the flag.
>
> Is fw_devlink=on really breaking anything here? It just seems like
> "permissive" got lucky with the timing and it could break at any point
> in the future. Thought?

I don't think there is a race. fw_devlinks calling driver_deferred_probe_add()
on all consumers has a big impact on probe order.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-02-16 13:01:42

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

Hi Saravana,

On Mon, Feb 15, 2021 at 10:57 PM Saravana Kannan <[email protected]> wrote:
> On Mon, Feb 15, 2021 at 7:16 AM Geert Uytterhoeven <[email protected]> wrote:
> > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <[email protected]> wrote:
> > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <[email protected]> wrote:
> > > > 1. R-Car Gen2 (Koelsch), R-Car Gen3 (Salvator-X(S), Ebisu).
> > > >
> > > > - Commit 2dfc564bda4a31bc ("soc: renesas: rcar-sysc: Mark device
> > > > node OF_POPULATED after init") is no longer needed (but already
> > > > queued for v5.12 anyway)
> > >
> > > Rob doesn't like the proliferation of OF_POPULATED and we don't need
> > > it anymore, so maybe work it out with him? It's a balance between some
> > > wasted memory (struct device(s)) vs not proliferating OF_POPULATED.
> >
> > > > 2. SH/R-Mobile AG5 (kzm9g), APE6 (ape6evm), A1 (armadillo800-eva)
> > > >
> > > > - "PATCH] soc: renesas: rmobile-sysc: Set OF_POPULATED and absorb
> > > > reset handling" is no longer needed
> > > > https://lore.kernel.org/linux-arm-kernel/[email protected]/
> > >
> > > Good to see more evidence that this series is fixing things at a more
> > > generic level.
> >
> > I spoke too soon: if CONFIG_POWER_RESET_RMOBILE=n,
> > booting fails again, as everything is waiting on the system controller,
> > which never becomes available.
> > Rcar-sysc doesn't suffer from this problem, cfr. above.
> > Perhaps because the rmobile-sysc bindings use a hierarchical instead
> > of a linear PM domain description, and thus consumers point to the
> > children of the system controller node?
> > Cfr. system-controller@e6180000 in arch/arm/boot/dts/r8a7740.dtsi.
>
> Ok, I see what's going on. The problem is that the "power domain"
> fwnode being registered is not the node that contains the "compatible"
> property and becomes a device. So this patch[1] is not helping here.
> Fix is to do something like this (to avoid using OF_POPULATED flag and
> breaking reset):
>
> diff --git a/drivers/soc/renesas/rmobile-sysc.c
> b/drivers/soc/renesas/rmobile-sysc.c
> index 9046b8c933cb..b7e66139ef7d 100644
> --- a/drivers/soc/renesas/rmobile-sysc.c
> +++ b/drivers/soc/renesas/rmobile-sysc.c
> @@ -344,6 +344,7 @@ static int __init rmobile_init_pm_domains(void)
> of_node_put(np);
> break;
> }
> + fwnode_dev_initialized(&np->fwnode, true);
> }
>
> put_special_pds();
>
> Can you give it a shot?

Thanks, works. Patch sent
"[PATCH v2] soc: renesas: rmobile-sysc: Mark fwnode when PM domain is added"
https://lore.kernel.org/linux-arm-kernel/[email protected]/

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-02-16 18:51:43

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

On Tue, Feb 16, 2021 at 12:05 AM Geert Uytterhoeven
<[email protected]> wrote:
>
> Hi Saravana,
>
> On Mon, Feb 15, 2021 at 10:27 PM Saravana Kannan <[email protected]> wrote:
> > On Mon, Feb 15, 2021 at 4:38 AM Geert Uytterhoeven <[email protected]> wrote:
> > > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <[email protected]> wrote:
> > > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <[email protected]> wrote:
> > > > > - I2C on R-Car Gen3 does not seem to use DMA, according to
> > > > > /sys/kernel/debug/dmaengine/summary:
> > > > >
> > > > > -dma4chan0 | e66d8000.i2c:tx
> > > > > -dma4chan1 | e66d8000.i2c:rx
> > > > > -dma5chan0 | e6510000.i2c:tx
> > > >
> > > > I think I need more context on the problem before I can try to fix it.
> > > > I'm also very unfamiliar with that file. With fw_devlink=permissive,
> > > > I2C was using DMA? If so, the next step is to see if the I2C relative
> > > > probe order with DMA is getting changed and if so, why.
> > >
> > > More detailed log:
> > >
> > > platform e66d8000.i2c: Linked as a consumer to e6150000.clock-controller
> > > platform e66d8000.i2c: Linked as a sync state only consumer to e6055400.gpio
> > >
> > > Why is e66d8000.i2c not linked as a consumer to e6700000.dma-controller?
> >
> > Because fw_devlink.strict=1 is not set and dma/iommu is considered an
> > "optional"/"driver decides" dependency.
>
> Oh, I thought dma/iommu were considered mandatory initially,
> but dropped as dependencies in the late boot process?

No, I didn't do that in case the drivers that didn't need the
IOMMU/DMA were sensitive to probe order.

My goal was for fw_devlink=on to not affect probe order for devices
that currently don't need to defer probe. But see below...

>
> >
> > > platform e6700000.dma-controller: Linked as a consumer to
> > > e6150000.clock-controller
> >
> > Is this the only supplier of dma-controller?
>
> No, e6180000.system-controller is also a supplier.
>
> > > platform e66d8000.i2c: Added to deferred list
> > > platform e6700000.dma-controller: Added to deferred list
> > >
> > > bus: 'platform': driver_probe_device: matched device
> > > e6700000.dma-controller with driver rcar-dmac
> > > bus: 'platform': really_probe: probing driver rcar-dmac with
> > > device e6700000.dma-controller
> > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral
> > >
> > > bus: 'platform': driver_probe_device: matched device e66d8000.i2c
> > > with driver i2c-rcar
> > > bus: 'platform': really_probe: probing driver i2c-rcar with device
> > > e66d8000.i2c
> > >
> > > I2C becomes available...
> > >
> > > i2c-rcar e66d8000.i2c: request_channel failed for tx (-517)
> > > [...]
> > >
> > > but DMA is not available yet, so the driver falls back to PIO.
> > >
> > > driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c'
> > > bus: 'platform': really_probe: bound device e66d8000.i2c to driver i2c-rcar
> > >
> > > platform e6700000.dma-controller: Retrying from deferred list
> > > bus: 'platform': driver_probe_device: matched device
> > > e6700000.dma-controller with driver rcar-dmac
> > > bus: 'platform': really_probe: probing driver rcar-dmac with
> > > device e6700000.dma-controller
> > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral
> > > platform e6700000.dma-controller: Added to deferred list
> > > platform e6700000.dma-controller: Retrying from deferred list
> > > bus: 'platform': driver_probe_device: matched device
> > > e6700000.dma-controller with driver rcar-dmac
> > > bus: 'platform': really_probe: probing driver rcar-dmac with
> > > device e6700000.dma-controller
> > > driver: 'rcar-dmac': driver_bound: bound to device 'e6700000.dma-controller'
> > > bus: 'platform': really_probe: bound device
> > > e6700000.dma-controller to driver rcar-dmac
> > >
> > > DMA becomes available.
> > >
> > > Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows
> > > that the I2C controllers do not have DMA channels allocated, as the
> > > kernel has performed no more I2C transfers after DMA became available.
> > >
> > > Using i2cdetect shows that DMA is used, which is good:
> > >
> > > i2c-rcar e66d8000.i2c: got DMA channel for rx
> > >
> > > With permissive devlinks, the clock controller consumers are not added
> > > to the deferred probing list, and probe order is slightly different.
> > > The I2C controllers are still probed before the DMA controllers.
> > > But DMA becomes available a bit earlier, before the probing of the last
> > > I2C slave driver.
> >
> > This seems like a race? I'm guessing it's two different threads
> > probing those two devices? And it just happens to work for
> > "permissive" assuming the boot timing doesn't change?
> >
> > > Hence /sys/kernel/debug/dmaengine/summary shows that
> > > some I2C transfers did use DMA.
> > >
> > > So the real issue is that e66d8000.i2c not linked as a consumer to
> > > e6700000.dma-controller.
> >
> > That's because fw_devlink.strict=1 isn't set. If you need DMA to be
> > treated as a mandatory supplier, you'll need to set the flag.
> >
> > Is fw_devlink=on really breaking anything here? It just seems like
> > "permissive" got lucky with the timing and it could break at any point
> > in the future. Thought?
>
> I don't think there is a race.

Can you explain more please? This below makes it sound like DMA just
sneaks in at the last minute.

> > > The I2C controllers are still probed before the DMA controllers.
> > > But DMA becomes available a bit earlier, before the probing of the last
> > > I2C slave driver.

> fw_devlinks calling driver_deferred_probe_add()
> on all consumers has a big impact on probe order.

Ugh... yeah. That's the real issue. This is really a device links
issue that fw_devlink is exposing. I already have a bunch of things in
my TODO list to improve deferred probing and probe ordering. Since
this is not causing boot issues (only DMA issue) with fw_devlink=on,
can we treat this as not a blocking item for fw_devlink=on? Once I go
through my TODO list, it should be fixed (by not changing probe
ordering unnecessarily). And if not, I can help find out a different
solution at that point.

Also, if you have IOMMU drivers, then fw_devlink.strict is also
another solution that's available. On a separate note (not a final
fix), I was wondering if we should have a config for fw_devlink.strict
default value and then have it selected when IOMMU drivers configs are
enabled.

-Saravana

2021-02-16 20:33:52

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

Hi Saravana,

On Tue, Feb 16, 2021 at 7:49 PM Saravana Kannan <[email protected]> wrote:
> On Tue, Feb 16, 2021 at 12:05 AM Geert Uytterhoeven
> <[email protected]> wrote:
> > On Mon, Feb 15, 2021 at 10:27 PM Saravana Kannan <[email protected]> wrote:
> > > On Mon, Feb 15, 2021 at 4:38 AM Geert Uytterhoeven <[email protected]> wrote:
> > > > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <[email protected]> wrote:
> > > > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <[email protected]> wrote:
> > > > > > - I2C on R-Car Gen3 does not seem to use DMA, according to
> > > > > > /sys/kernel/debug/dmaengine/summary:
> > > > > >
> > > > > > -dma4chan0 | e66d8000.i2c:tx
> > > > > > -dma4chan1 | e66d8000.i2c:rx
> > > > > > -dma5chan0 | e6510000.i2c:tx
> > > > >
> > > > > I think I need more context on the problem before I can try to fix it.
> > > > > I'm also very unfamiliar with that file. With fw_devlink=permissive,
> > > > > I2C was using DMA? If so, the next step is to see if the I2C relative
> > > > > probe order with DMA is getting changed and if so, why.
> > > >
> > > > More detailed log:
> > > >
> > > > platform e66d8000.i2c: Linked as a consumer to e6150000.clock-controller
> > > > platform e66d8000.i2c: Linked as a sync state only consumer to e6055400.gpio
> > > >
> > > > Why is e66d8000.i2c not linked as a consumer to e6700000.dma-controller?
> > >
> > > Because fw_devlink.strict=1 is not set and dma/iommu is considered an
> > > "optional"/"driver decides" dependency.
> >
> > Oh, I thought dma/iommu were considered mandatory initially,
> > but dropped as dependencies in the late boot process?
>
> No, I didn't do that in case the drivers that didn't need the
> IOMMU/DMA were sensitive to probe order.
>
> My goal was for fw_devlink=on to not affect probe order for devices
> that currently don't need to defer probe. But see below...
>
> >
> > >
> > > > platform e6700000.dma-controller: Linked as a consumer to
> > > > e6150000.clock-controller
> > >
> > > Is this the only supplier of dma-controller?
> >
> > No, e6180000.system-controller is also a supplier.
> >
> > > > platform e66d8000.i2c: Added to deferred list
> > > > platform e6700000.dma-controller: Added to deferred list
> > > >
> > > > bus: 'platform': driver_probe_device: matched device
> > > > e6700000.dma-controller with driver rcar-dmac
> > > > bus: 'platform': really_probe: probing driver rcar-dmac with
> > > > device e6700000.dma-controller
> > > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral
> > > >
> > > > bus: 'platform': driver_probe_device: matched device e66d8000.i2c
> > > > with driver i2c-rcar
> > > > bus: 'platform': really_probe: probing driver i2c-rcar with device
> > > > e66d8000.i2c
> > > >
> > > > I2C becomes available...
> > > >
> > > > i2c-rcar e66d8000.i2c: request_channel failed for tx (-517)
> > > > [...]
> > > >
> > > > but DMA is not available yet, so the driver falls back to PIO.
> > > >
> > > > driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c'
> > > > bus: 'platform': really_probe: bound device e66d8000.i2c to driver i2c-rcar
> > > >
> > > > platform e6700000.dma-controller: Retrying from deferred list
> > > > bus: 'platform': driver_probe_device: matched device
> > > > e6700000.dma-controller with driver rcar-dmac
> > > > bus: 'platform': really_probe: probing driver rcar-dmac with
> > > > device e6700000.dma-controller
> > > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral
> > > > platform e6700000.dma-controller: Added to deferred list
> > > > platform e6700000.dma-controller: Retrying from deferred list
> > > > bus: 'platform': driver_probe_device: matched device
> > > > e6700000.dma-controller with driver rcar-dmac
> > > > bus: 'platform': really_probe: probing driver rcar-dmac with
> > > > device e6700000.dma-controller
> > > > driver: 'rcar-dmac': driver_bound: bound to device 'e6700000.dma-controller'
> > > > bus: 'platform': really_probe: bound device
> > > > e6700000.dma-controller to driver rcar-dmac
> > > >
> > > > DMA becomes available.
> > > >
> > > > Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows
> > > > that the I2C controllers do not have DMA channels allocated, as the
> > > > kernel has performed no more I2C transfers after DMA became available.
> > > >
> > > > Using i2cdetect shows that DMA is used, which is good:
> > > >
> > > > i2c-rcar e66d8000.i2c: got DMA channel for rx
> > > >
> > > > With permissive devlinks, the clock controller consumers are not added
> > > > to the deferred probing list, and probe order is slightly different.
> > > > The I2C controllers are still probed before the DMA controllers.
> > > > But DMA becomes available a bit earlier, before the probing of the last
> > > > I2C slave driver.
> > >
> > > This seems like a race? I'm guessing it's two different threads
> > > probing those two devices? And it just happens to work for
> > > "permissive" assuming the boot timing doesn't change?
> > >
> > > > Hence /sys/kernel/debug/dmaengine/summary shows that
> > > > some I2C transfers did use DMA.
> > > >
> > > > So the real issue is that e66d8000.i2c not linked as a consumer to
> > > > e6700000.dma-controller.
> > >
> > > That's because fw_devlink.strict=1 isn't set. If you need DMA to be
> > > treated as a mandatory supplier, you'll need to set the flag.
> > >
> > > Is fw_devlink=on really breaking anything here? It just seems like
> > > "permissive" got lucky with the timing and it could break at any point
> > > in the future. Thought?
> >
> > I don't think there is a race.
>
> Can you explain more please? This below makes it sound like DMA just
> sneaks in at the last minute.

Yes it does, as the DMAC also has a consumer link to the IOMMU.
If you ignore the consumer link from I2C to DMAC, the I2C device has
less dependencies than the DMAC, so the I2C device, and the
devices on the I2C bus, are probed much earlier than the DMAC.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-02-25 10:00:14

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v4 0/8] Make fw_devlink=on more forgiving

Hi Saravana,

On Thu, Feb 18, 2021 at 12:57 AM Saravana Kannan <[email protected]> wrote:
> On Tue, Feb 16, 2021 at 12:31 PM Geert Uytterhoeven
> <[email protected]> wrote:
> > On Tue, Feb 16, 2021 at 7:49 PM Saravana Kannan <[email protected]> wrote:
> > > On Tue, Feb 16, 2021 at 12:05 AM Geert Uytterhoeven
> > > <[email protected]> wrote:
> > > > On Mon, Feb 15, 2021 at 10:27 PM Saravana Kannan <[email protected]> wrote:
> > > > > On Mon, Feb 15, 2021 at 4:38 AM Geert Uytterhoeven <[email protected]> wrote:
> > > > > > On Fri, Feb 12, 2021 at 4:00 AM Saravana Kannan <[email protected]> wrote:
> > > > > > > On Thu, Feb 11, 2021 at 5:00 AM Geert Uytterhoeven <[email protected]> wrote:
> > > > > > > > - I2C on R-Car Gen3 does not seem to use DMA, according to
> > > > > > > > /sys/kernel/debug/dmaengine/summary:
> > > > > > > >
> > > > > > > > -dma4chan0 | e66d8000.i2c:tx
> > > > > > > > -dma4chan1 | e66d8000.i2c:rx
> > > > > > > > -dma5chan0 | e6510000.i2c:tx
> > > > > > >
> > > > > > > I think I need more context on the problem before I can try to fix it.
> > > > > > > I'm also very unfamiliar with that file. With fw_devlink=permissive,
> > > > > > > I2C was using DMA? If so, the next step is to see if the I2C relative
> > > > > > > probe order with DMA is getting changed and if so, why.
> > > > > >
> > > > > > More detailed log:
> > > > > >
> > > > > > platform e66d8000.i2c: Linked as a consumer to e6150000.clock-controller
> > > > > > platform e66d8000.i2c: Linked as a sync state only consumer to e6055400.gpio
> > > > > >
> > > > > > Why is e66d8000.i2c not linked as a consumer to e6700000.dma-controller?
> > > > >
> > > > > Because fw_devlink.strict=1 is not set and dma/iommu is considered an
> > > > > "optional"/"driver decides" dependency.
> > > >
> > > > Oh, I thought dma/iommu were considered mandatory initially,
> > > > but dropped as dependencies in the late boot process?
> > >
> > > No, I didn't do that in case the drivers that didn't need the
> > > IOMMU/DMA were sensitive to probe order.
> > >
> > > My goal was for fw_devlink=on to not affect probe order for devices
> > > that currently don't need to defer probe. But see below...
> > >
> > > >
> > > > >
> > > > > > platform e6700000.dma-controller: Linked as a consumer to
> > > > > > e6150000.clock-controller
> > > > >
> > > > > Is this the only supplier of dma-controller?
> > > >
> > > > No, e6180000.system-controller is also a supplier.
> > > >
> > > > > > platform e66d8000.i2c: Added to deferred list
> > > > > > platform e6700000.dma-controller: Added to deferred list
> > > > > >
> > > > > > bus: 'platform': driver_probe_device: matched device
> > > > > > e6700000.dma-controller with driver rcar-dmac
> > > > > > bus: 'platform': really_probe: probing driver rcar-dmac with
> > > > > > device e6700000.dma-controller
> > > > > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral
> > > > > >
> > > > > > bus: 'platform': driver_probe_device: matched device e66d8000.i2c
> > > > > > with driver i2c-rcar
> > > > > > bus: 'platform': really_probe: probing driver i2c-rcar with device
> > > > > > e66d8000.i2c
> > > > > >
> > > > > > I2C becomes available...
> > > > > >
> > > > > > i2c-rcar e66d8000.i2c: request_channel failed for tx (-517)
> > > > > > [...]
> > > > > >
> > > > > > but DMA is not available yet, so the driver falls back to PIO.
> > > > > >
> > > > > > driver: 'i2c-rcar': driver_bound: bound to device 'e66d8000.i2c'
> > > > > > bus: 'platform': really_probe: bound device e66d8000.i2c to driver i2c-rcar
> > > > > >
> > > > > > platform e6700000.dma-controller: Retrying from deferred list
> > > > > > bus: 'platform': driver_probe_device: matched device
> > > > > > e6700000.dma-controller with driver rcar-dmac
> > > > > > bus: 'platform': really_probe: probing driver rcar-dmac with
> > > > > > device e6700000.dma-controller
> > > > > > platform e6700000.dma-controller: Driver rcar-dmac requests probe deferral
> > > > > > platform e6700000.dma-controller: Added to deferred list
> > > > > > platform e6700000.dma-controller: Retrying from deferred list
> > > > > > bus: 'platform': driver_probe_device: matched device
> > > > > > e6700000.dma-controller with driver rcar-dmac
> > > > > > bus: 'platform': really_probe: probing driver rcar-dmac with
> > > > > > device e6700000.dma-controller
> > > > > > driver: 'rcar-dmac': driver_bound: bound to device 'e6700000.dma-controller'
> > > > > > bus: 'platform': really_probe: bound device
> > > > > > e6700000.dma-controller to driver rcar-dmac
> > > > > >
> > > > > > DMA becomes available.
> > > > > >
> > > > > > Here userspace is entered. /sys/kernel/debug/dmaengine/summary shows
> > > > > > that the I2C controllers do not have DMA channels allocated, as the
> > > > > > kernel has performed no more I2C transfers after DMA became available.
> > > > > >
> > > > > > Using i2cdetect shows that DMA is used, which is good:
> > > > > >
> > > > > > i2c-rcar e66d8000.i2c: got DMA channel for rx
> > > > > >
> > > > > > With permissive devlinks, the clock controller consumers are not added
> > > > > > to the deferred probing list, and probe order is slightly different.
> > > > > > The I2C controllers are still probed before the DMA controllers.
> > > > > > But DMA becomes available a bit earlier, before the probing of the last
> > > > > > I2C slave driver.
> > > > >
> > > > > This seems like a race? I'm guessing it's two different threads
> > > > > probing those two devices? And it just happens to work for
> > > > > "permissive" assuming the boot timing doesn't change?
> > > > >
> > > > > > Hence /sys/kernel/debug/dmaengine/summary shows that
> > > > > > some I2C transfers did use DMA.
> > > > > >
> > > > > > So the real issue is that e66d8000.i2c not linked as a consumer to
> > > > > > e6700000.dma-controller.
> > > > >
> > > > > That's because fw_devlink.strict=1 isn't set. If you need DMA to be
> > > > > treated as a mandatory supplier, you'll need to set the flag.
> > > > >
> > > > > Is fw_devlink=on really breaking anything here? It just seems like
> > > > > "permissive" got lucky with the timing and it could break at any point
> > > > > in the future. Thought?
> > > >
> > > > I don't think there is a race.
> > >
> > > Can you explain more please? This below makes it sound like DMA just
> > > sneaks in at the last minute.
> >
> > Yes it does, as the DMAC also has a consumer link to the IOMMU.
> > If you ignore the consumer link from I2C to DMAC, the I2C device has
> > less dependencies than the DMAC, so the I2C device, and the
> > devices on the I2C bus, are probed much earlier than the DMAC.
>
> Can you give this a shot?
> https://lore.kernel.org/lkml/[email protected]/T/#u
>
> It should make sure fw_devlink doesn't add a device to the deferred
> probe list too soon and change the probe ordering unnecessarily.

(FTR, to keep all info in this thread)
Yes, this makes I2C use DMA again on Salvator-XS during kernel boot-up.
I haven't run any more elaborate tests on other platforms.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds