2022-07-01 01:44:13

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v2 0/2] Fix console probe delay when stdout-path isn't set

These patches are on top of driver-core-next.

Even if stdout-path isn't set in DT, this patch should take console
probe times back to how they were before the deferred_probe_timeout
clean up series[1].

v1->v2:
- Fixed the accidental change that Tobias pointed out.
- Added Tested-by tag

[1] - https://lore.kernel.org/lkml/[email protected]/

-Saravana

cc: Rob Herring <[email protected]>
cc: sascha hauer <[email protected]>
cc: peng fan <[email protected]>
cc: kevin hilman <[email protected]>
cc: ulf hansson <[email protected]>
cc: len brown <[email protected]>
cc: pavel machek <[email protected]>
cc: joerg roedel <[email protected]>
cc: will deacon <[email protected]>
cc: andrew lunn <[email protected]>
cc: heiner kallweit <[email protected]>
cc: russell king <[email protected]>
cc: "david s. miller" <[email protected]>
cc: eric dumazet <[email protected]>
cc: jakub kicinski <[email protected]>
cc: paolo abeni <[email protected]>
cc: linus walleij <[email protected]>
cc: hideaki yoshifuji <[email protected]>
cc: david ahern <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Cc: [email protected]

Saravana Kannan (2):
driver core: Add probe_no_timeout flag for drivers
serial: Set probe_no_timeout for all DT based drivers

drivers/base/base.h | 1 +
drivers/base/core.c | 7 +++++++
drivers/base/dd.c | 3 +++
drivers/tty/ehv_bytechan.c | 1 +
drivers/tty/goldfish.c | 1 +
drivers/tty/hvc/hvc_opal.c | 1 +
drivers/tty/serial/8250/8250_aspeed_vuart.c | 1 +
drivers/tty/serial/8250/8250_bcm2835aux.c | 1 +
drivers/tty/serial/8250/8250_bcm7271.c | 1 +
drivers/tty/serial/8250/8250_dw.c | 1 +
drivers/tty/serial/8250/8250_em.c | 1 +
drivers/tty/serial/8250/8250_ingenic.c | 1 +
drivers/tty/serial/8250/8250_lpc18xx.c | 1 +
drivers/tty/serial/8250/8250_mtk.c | 1 +
drivers/tty/serial/8250/8250_of.c | 1 +
drivers/tty/serial/8250/8250_omap.c | 1 +
drivers/tty/serial/8250/8250_pxa.c | 1 +
drivers/tty/serial/8250/8250_tegra.c | 1 +
drivers/tty/serial/8250/8250_uniphier.c | 1 +
drivers/tty/serial/altera_jtaguart.c | 1 +
drivers/tty/serial/altera_uart.c | 1 +
drivers/tty/serial/amba-pl011.c | 1 +
drivers/tty/serial/apbuart.c | 1 +
drivers/tty/serial/ar933x_uart.c | 1 +
drivers/tty/serial/arc_uart.c | 1 +
drivers/tty/serial/atmel_serial.c | 1 +
drivers/tty/serial/bcm63xx_uart.c | 1 +
drivers/tty/serial/clps711x.c | 1 +
drivers/tty/serial/cpm_uart/cpm_uart_core.c | 1 +
drivers/tty/serial/digicolor-usart.c | 1 +
drivers/tty/serial/fsl_linflexuart.c | 1 +
drivers/tty/serial/fsl_lpuart.c | 1 +
drivers/tty/serial/imx.c | 1 +
drivers/tty/serial/lantiq.c | 1 +
drivers/tty/serial/liteuart.c | 1 +
drivers/tty/serial/lpc32xx_hs.c | 1 +
drivers/tty/serial/max310x.c | 1 +
drivers/tty/serial/meson_uart.c | 1 +
drivers/tty/serial/milbeaut_usio.c | 1 +
drivers/tty/serial/mpc52xx_uart.c | 1 +
drivers/tty/serial/mps2-uart.c | 1 +
drivers/tty/serial/msm_serial.c | 1 +
drivers/tty/serial/mvebu-uart.c | 1 +
drivers/tty/serial/mxs-auart.c | 1 +
drivers/tty/serial/omap-serial.c | 1 +
drivers/tty/serial/owl-uart.c | 1 +
drivers/tty/serial/pic32_uart.c | 1 +
drivers/tty/serial/pmac_zilog.c | 1 +
drivers/tty/serial/pxa.c | 1 +
drivers/tty/serial/qcom_geni_serial.c | 1 +
drivers/tty/serial/rda-uart.c | 1 +
drivers/tty/serial/samsung_tty.c | 1 +
drivers/tty/serial/sc16is7xx.c | 1 +
drivers/tty/serial/serial-tegra.c | 1 +
drivers/tty/serial/sh-sci.c | 1 +
drivers/tty/serial/sifive.c | 1 +
drivers/tty/serial/sprd_serial.c | 1 +
drivers/tty/serial/st-asc.c | 1 +
drivers/tty/serial/stm32-usart.c | 1 +
drivers/tty/serial/sunhv.c | 1 +
drivers/tty/serial/sunplus-uart.c | 1 +
drivers/tty/serial/sunsab.c | 1 +
drivers/tty/serial/sunsu.c | 1 +
drivers/tty/serial/sunzilog.c | 1 +
drivers/tty/serial/tegra-tcu.c | 1 +
drivers/tty/serial/uartlite.c | 1 +
drivers/tty/serial/ucc_uart.c | 1 +
drivers/tty/serial/vt8500_serial.c | 1 +
drivers/tty/serial/xilinx_uartps.c | 1 +
include/linux/device.h | 7 +++++++
include/linux/device/driver.h | 11 +++++++++++
71 files changed, 95 insertions(+)

--
2.37.0.rc0.161.g10f37bed90-goog


2022-07-01 01:50:42

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH v2 1/2] driver core: Add probe_no_timeout flag for drivers

This flag only needs to be set for drivers of devices that meet all the
following conditions:
- Need to probe successfully before userspace init in started
- Have optional suppliers
- Can't wait for deferred_probe_timeout to expire

fw_devlink=on uses this info, as needed, to ignore dependencies on supplier
devices that have not been added or supplier devices that don't have any
drivers. It's still up to the driver to decide which of the missing
suppliers are optional or not.

Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default")
Reported-by: Sascha Hauer <[email protected]>
Reported-by: Peng Fan <[email protected]>
Reported-by: Fabio Estevam <[email protected]>
Reported-by: Ahmad Fatoum <[email protected]>
Tested-by: Fabio Estevam <[email protected]>
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/base/base.h | 1 +
drivers/base/core.c | 7 +++++++
drivers/base/dd.c | 3 +++
include/linux/device.h | 7 +++++++
include/linux/device/driver.h | 11 +++++++++++
5 files changed, 29 insertions(+)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index b3a43a164dcd..149822d2086f 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -193,6 +193,7 @@ extern void device_links_no_driver(struct device *dev);
extern bool device_links_busy(struct device *dev);
extern void device_links_unbind_consumers(struct device *dev);
extern void fw_devlink_drivers_done(void);
+extern void fw_devlink_probe_no_timeout(void);

/* device pm support */
void device_pm_move_to_tail(struct device *dev);
diff --git a/drivers/base/core.c b/drivers/base/core.c
index ccdd5b4295de..8e18904a1584 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -54,6 +54,7 @@ static unsigned int defer_sync_state_count = 1;
static DEFINE_MUTEX(fwnode_link_lock);
static bool fw_devlink_is_permissive(void);
static bool fw_devlink_drv_reg_done;
+static bool fw_devlink_no_timeout;
static bool fw_devlink_best_effort;

/**
@@ -969,6 +970,7 @@ static void device_links_missing_supplier(struct device *dev)
static bool dev_is_best_effort(struct device *dev)
{
return (fw_devlink_best_effort && dev->can_match) ||
+ (fw_devlink_no_timeout && dev->probe_no_timeout) ||
(dev->fwnode && (dev->fwnode->flags & FWNODE_FLAG_BEST_EFFORT));
}

@@ -1688,6 +1690,11 @@ void fw_devlink_drivers_done(void)
device_links_write_unlock();
}

+void fw_devlink_probe_no_timeout(void)
+{
+ fw_devlink_no_timeout = true;
+}
+
/**
* wait_for_init_devices_probe - Try to probe any device needed for init
*
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 70f79fc71539..943b0363aaab 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -324,6 +324,8 @@ static int deferred_probe_initcall(void)

if (!IS_ENABLED(CONFIG_MODULES))
fw_devlink_drivers_done();
+ else
+ fw_devlink_probe_no_timeout();

/*
* Trigger deferred probe again, this time we won't defer anything
@@ -734,6 +736,7 @@ static int __driver_probe_device(struct device_driver *drv, struct device *dev)
return -EBUSY;

dev->can_match = true;
+ dev->probe_no_timeout = drv->probe_no_timeout;
pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
drv->bus->name, __func__, dev_name(dev), drv->name);

diff --git a/include/linux/device.h b/include/linux/device.h
index 424b55df0272..e6246b6cf6cf 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -536,6 +536,12 @@ struct device_physical_location {
* @can_match: The device has matched with a driver at least once or it is in
* a bus (like AMBA) which can't check for matching drivers until
* other devices probe successfully.
+ * @probe_no_timeout: Set by driver core to indicate that this device's probe
+ * can't wait till driver_probe_timeout expires. This information
+ * is used by fw_devlink=on to avoid deferring the probe of this
+ * device to wait on supplier devices that haven't been added or
+ * probed successfully.
+ * See also: probe_no_timeout in struct driver.
* @dma_coherent: this particular device is dma coherent, even if the
* architecture supports non-coherent devices.
* @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
@@ -642,6 +648,7 @@ struct device {
bool of_node_reused:1;
bool state_synced:1;
bool can_match:1;
+ bool probe_no_timeout:1;
#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index 7acaabde5396..2ce60e511504 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -55,6 +55,15 @@ enum probe_type {
* @owner: The module owner.
* @mod_name: Used for built-in modules.
* @suppress_bind_attrs: Disables bind/unbind via sysfs.
+ * @probe_no_timeout: Set to true by drivers that bind to devices that meet all
+ * these conditions:
+ * - Need to probe successfully before userspace init in started
+ * - Have optional suppliers
+ * - Can't wait for deferred_probe_timeout to expire
+ * fw_devlink=on uses this info, as needed, to ignore dependencies
+ * on supplier devices that have not been added or supplier devices
+ * that don't have any drivers. It's still up to the driver to
+ * decide which of the missing suppliers are optional or not.
* @probe_type: Type of the probe (synchronous or asynchronous) to use.
* @of_match_table: The open firmware table.
* @acpi_match_table: The ACPI match table.
@@ -101,6 +110,8 @@ struct device_driver {
const char *mod_name; /* used for built-in modules */

bool suppress_bind_attrs; /* disables bind/unbind via sysfs */
+ bool probe_no_timeout;
+
enum probe_type probe_type;

const struct of_device_id *of_match_table;
--
2.37.0.rc0.161.g10f37bed90-goog

2022-08-23 16:49:43

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] Fix console probe delay when stdout-path isn't set

On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote:
> These patches are on top of driver-core-next.
>
> Even if stdout-path isn't set in DT, this patch should take console
> probe times back to how they were before the deferred_probe_timeout
> clean up series[1].

Now dropped from my queue due to lack of a response to other reviewer's
questions.

thanks,

greg k-h

2022-08-24 02:20:06

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] Fix console probe delay when stdout-path isn't set

On Tue, Aug 23, 2022 at 4:25 AM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote:
> > These patches are on top of driver-core-next.
> >
> > Even if stdout-path isn't set in DT, this patch should take console
> > probe times back to how they were before the deferred_probe_timeout
> > clean up series[1].
>
> Now dropped from my queue due to lack of a response to other reviewer's
> questions.

Sorry, I somehow missed those emails. I'll respond later today/tomorrow.

-Saravana

2022-09-19 03:52:33

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] Fix console probe delay when stdout-path isn't set

On Tue, Aug 23, 2022 at 8:37 AM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote:
> > These patches are on top of driver-core-next.
> >
> > Even if stdout-path isn't set in DT, this patch should take console
> > probe times back to how they were before the deferred_probe_timeout
> > clean up series[1].
>
> Now dropped from my queue due to lack of a response to other reviewer's
> questions.

What happened to this patch? I have a 10 second timeout on console
probe on my SiFive Unmatched, and I don't see this flag being set for
the serial driver. In fact, I don't see it anywhere in-tree. I can't
seem to locate another patchset from Saravana around this though, so
I'm not sure where to look for a missing piece for the sifive serial
driver.

This is the second boot time regression (this one not fatal, unlike
the Layerscape PCIe one) from the fw_devlink patchset.

Greg, can you revert the whole set for 6.0, please? It's obviously
nowhere near tested enough to go in and I expect we'll see a bunch of
-stable fixups due to this if we let it remain in.

This seems to be one of the worst releases I've encountered in recent
years on my hardware here due to this patchset. :-(


-Olof

2022-09-19 09:14:35

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] Fix console probe delay when stdout-path isn't set

On Sun, Sep 18, 2022 at 08:44:27PM -0700, Olof Johansson wrote:
> On Tue, Aug 23, 2022 at 8:37 AM Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote:
> > > These patches are on top of driver-core-next.
> > >
> > > Even if stdout-path isn't set in DT, this patch should take console
> > > probe times back to how they were before the deferred_probe_timeout
> > > clean up series[1].
> >
> > Now dropped from my queue due to lack of a response to other reviewer's
> > questions.
>
> What happened to this patch? I have a 10 second timeout on console
> probe on my SiFive Unmatched, and I don't see this flag being set for
> the serial driver. In fact, I don't see it anywhere in-tree. I can't
> seem to locate another patchset from Saravana around this though, so
> I'm not sure where to look for a missing piece for the sifive serial
> driver.
>
> This is the second boot time regression (this one not fatal, unlike
> the Layerscape PCIe one) from the fw_devlink patchset.
>
> Greg, can you revert the whole set for 6.0, please? It's obviously
> nowhere near tested enough to go in and I expect we'll see a bunch of
> -stable fixups due to this if we let it remain in.

What exactly is "the whole set"? I have the default option fix queued
up and will send that to Linus later this week (am traveling back from
Plumbers still), but have not heard any problems about any other issues
at all other than your report.

thnaks,

greg k-h

2022-09-19 23:51:14

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] Fix console probe delay when stdout-path isn't set

On Mon, Sep 19, 2022 at 1:44 AM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Sun, Sep 18, 2022 at 08:44:27PM -0700, Olof Johansson wrote:
> > On Tue, Aug 23, 2022 at 8:37 AM Greg Kroah-Hartman
> > <[email protected]> wrote:
> > >
> > > On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote:
> > > > These patches are on top of driver-core-next.
> > > >
> > > > Even if stdout-path isn't set in DT, this patch should take console
> > > > probe times back to how they were before the deferred_probe_timeout
> > > > clean up series[1].
> > >
> > > Now dropped from my queue due to lack of a response to other reviewer's
> > > questions.
> >
> > What happened to this patch? I have a 10 second timeout on console
> > probe on my SiFive Unmatched, and I don't see this flag being set for
> > the serial driver. In fact, I don't see it anywhere in-tree. I can't
> > seem to locate another patchset from Saravana around this though, so
> > I'm not sure where to look for a missing piece for the sifive serial
> > driver.
> >
> > This is the second boot time regression (this one not fatal, unlike
> > the Layerscape PCIe one) from the fw_devlink patchset.
> >
> > Greg, can you revert the whole set for 6.0, please? It's obviously
> > nowhere near tested enough to go in and I expect we'll see a bunch of
> > -stable fixups due to this if we let it remain in.
>
> What exactly is "the whole set"? I have the default option fix queued
> up and will send that to Linus later this week (am traveling back from
> Plumbers still), but have not heard any problems about any other issues
> at all other than your report.

I stand corrected in this case, the issue on the Hifive Unmatched was
a regression due to a PWM clock change -- I just sent a patch for that
(serial driver fix).

So it seems like as long as the fw_devlink.strict=1 patch is reverted,
things are back to a working state here.

I still struggle with how the fw_devlink patchset is expected to work
though, since DT is expected to describe the hardware configuration,
and it has no knowledge of whether there are drivers that will be
bound to any referenced supplier devnodes. It's not going to work well
to assume that they will always be bound, and to add 10 second
timeouts for those cases isn't a good solution. Seems like the number
of special cases will keep adding up.

The whole design feels like it's falling short, and it's been patched
here and there to deal with the shortcomings, instead of revisiting
the full solution. (The patches are the console one, and another to
deal with nfsroot boots).

As long as it doesn't keep regressing others, I suppose the work to
redesign it can happen in-tree, but it's not usually how we try to do
it for new functionality. Especially since it's still being iterated
on (with active patch sets posted around -rc1 for improvements).

Oh, and one more thing for the future -- the main patch that changes
behavior due to dependency tracking is 2f8c3ae8288e, named "driver
core: Add wait_for_init_devices_probe helper function". It's easy to
overlook this when looking at a list of patches since it's said to
just introduce a helper.


-Olof

2022-09-26 18:39:22

by Rob Herring

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] Fix console probe delay when stdout-path isn't set

On Mon, Sep 19, 2022 at 5:56 PM Olof Johansson <[email protected]> wrote:
>
> On Mon, Sep 19, 2022 at 1:44 AM Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > On Sun, Sep 18, 2022 at 08:44:27PM -0700, Olof Johansson wrote:
> > > On Tue, Aug 23, 2022 at 8:37 AM Greg Kroah-Hartman
> > > <[email protected]> wrote:
> > > >
> > > > On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote:
> > > > > These patches are on top of driver-core-next.
> > > > >
> > > > > Even if stdout-path isn't set in DT, this patch should take console
> > > > > probe times back to how they were before the deferred_probe_timeout
> > > > > clean up series[1].
> > > >
> > > > Now dropped from my queue due to lack of a response to other reviewer's
> > > > questions.
> > >
> > > What happened to this patch? I have a 10 second timeout on console
> > > probe on my SiFive Unmatched, and I don't see this flag being set for
> > > the serial driver. In fact, I don't see it anywhere in-tree. I can't
> > > seem to locate another patchset from Saravana around this though, so
> > > I'm not sure where to look for a missing piece for the sifive serial
> > > driver.
> > >
> > > This is the second boot time regression (this one not fatal, unlike
> > > the Layerscape PCIe one) from the fw_devlink patchset.
> > >
> > > Greg, can you revert the whole set for 6.0, please? It's obviously
> > > nowhere near tested enough to go in and I expect we'll see a bunch of
> > > -stable fixups due to this if we let it remain in.
> >
> > What exactly is "the whole set"? I have the default option fix queued
> > up and will send that to Linus later this week (am traveling back from
> > Plumbers still), but have not heard any problems about any other issues
> > at all other than your report.
>
> I stand corrected in this case, the issue on the Hifive Unmatched was
> a regression due to a PWM clock change -- I just sent a patch for that
> (serial driver fix).
>
> So it seems like as long as the fw_devlink.strict=1 patch is reverted,
> things are back to a working state here.
>
> I still struggle with how the fw_devlink patchset is expected to work
> though, since DT is expected to describe the hardware configuration,
> and it has no knowledge of whether there are drivers that will be
> bound to any referenced supplier devnodes. It's not going to work well
> to assume that they will always be bound, and to add 10 second
> timeouts for those cases isn't a good solution. Seems like the number
> of special cases will keep adding up.

Since the introduction of deferred probe, the kernel has always
assumed if there is a device described, then there is or will be a
driver for it. The result is you can't use new DTs (if they add
providers) with older kernels.

We've ended up with a timeout because no one has come up with a better
way to handle it. What the kernel needs is userspace saying "I'm done
loading modules", but it's debatable whether that's a good solution
too.

Rob

2022-09-27 13:53:01

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] Fix console probe delay when stdout-path isn't set

On Mon, Sep 26, 2022 at 01:25:05PM -0500, Rob Herring wrote:
> On Mon, Sep 19, 2022 at 5:56 PM Olof Johansson <[email protected]> wrote:
> >
> > On Mon, Sep 19, 2022 at 1:44 AM Greg Kroah-Hartman
> > <[email protected]> wrote:
> > >
> > > On Sun, Sep 18, 2022 at 08:44:27PM -0700, Olof Johansson wrote:
> > > > On Tue, Aug 23, 2022 at 8:37 AM Greg Kroah-Hartman
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote:
> > > > > > These patches are on top of driver-core-next.
> > > > > >
> > > > > > Even if stdout-path isn't set in DT, this patch should take console
> > > > > > probe times back to how they were before the deferred_probe_timeout
> > > > > > clean up series[1].
> > > > >
> > > > > Now dropped from my queue due to lack of a response to other reviewer's
> > > > > questions.
> > > >
> > > > What happened to this patch? I have a 10 second timeout on console
> > > > probe on my SiFive Unmatched, and I don't see this flag being set for
> > > > the serial driver. In fact, I don't see it anywhere in-tree. I can't
> > > > seem to locate another patchset from Saravana around this though, so
> > > > I'm not sure where to look for a missing piece for the sifive serial
> > > > driver.
> > > >
> > > > This is the second boot time regression (this one not fatal, unlike
> > > > the Layerscape PCIe one) from the fw_devlink patchset.
> > > >
> > > > Greg, can you revert the whole set for 6.0, please? It's obviously
> > > > nowhere near tested enough to go in and I expect we'll see a bunch of
> > > > -stable fixups due to this if we let it remain in.
> > >
> > > What exactly is "the whole set"? I have the default option fix queued
> > > up and will send that to Linus later this week (am traveling back from
> > > Plumbers still), but have not heard any problems about any other issues
> > > at all other than your report.
> >
> > I stand corrected in this case, the issue on the Hifive Unmatched was
> > a regression due to a PWM clock change -- I just sent a patch for that
> > (serial driver fix).
> >
> > So it seems like as long as the fw_devlink.strict=1 patch is reverted,
> > things are back to a working state here.
> >
> > I still struggle with how the fw_devlink patchset is expected to work
> > though, since DT is expected to describe the hardware configuration,
> > and it has no knowledge of whether there are drivers that will be
> > bound to any referenced supplier devnodes. It's not going to work well
> > to assume that they will always be bound, and to add 10 second
> > timeouts for those cases isn't a good solution. Seems like the number
> > of special cases will keep adding up.
>
> Since the introduction of deferred probe, the kernel has always
> assumed if there is a device described, then there is or will be a
> driver for it. The result is you can't use new DTs (if they add
> providers) with older kernels.
>
> We've ended up with a timeout because no one has come up with a better
> way to handle it. What the kernel needs is userspace saying "I'm done
> loading modules", but it's debatable whether that's a good solution
> too.

In my opinion the deferred probe is a big hack and that is the root
cause of the issues we have here and there. It has to be redesigned
to be mathematically robust. It was an attempt by Andrzej Hajda to
solve this [1].

[1]: https://events19.linuxfoundation.org/wp-content/uploads/2017/12/Deferred-Problem-Issues-With-Complex-Dependencies-Between-Devices-in-Linux-Kernel-Andrzej-Hajda-Samsung.pdf

--
With Best Regards,
Andy Shevchenko


2022-09-27 14:20:31

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] Fix console probe delay when stdout-path isn't set

On Tue, Sep 27, 2022 at 03:17:59PM +0300, Andy Shevchenko wrote:
> On Mon, Sep 26, 2022 at 01:25:05PM -0500, Rob Herring wrote:
> > On Mon, Sep 19, 2022 at 5:56 PM Olof Johansson <[email protected]> wrote:
> > >
> > > On Mon, Sep 19, 2022 at 1:44 AM Greg Kroah-Hartman
> > > <[email protected]> wrote:
> > > >
> > > > On Sun, Sep 18, 2022 at 08:44:27PM -0700, Olof Johansson wrote:
> > > > > On Tue, Aug 23, 2022 at 8:37 AM Greg Kroah-Hartman
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, Jun 30, 2022 at 06:26:38PM -0700, Saravana Kannan wrote:
> > > > > > > These patches are on top of driver-core-next.
> > > > > > >
> > > > > > > Even if stdout-path isn't set in DT, this patch should take console
> > > > > > > probe times back to how they were before the deferred_probe_timeout
> > > > > > > clean up series[1].
> > > > > >
> > > > > > Now dropped from my queue due to lack of a response to other reviewer's
> > > > > > questions.
> > > > >
> > > > > What happened to this patch? I have a 10 second timeout on console
> > > > > probe on my SiFive Unmatched, and I don't see this flag being set for
> > > > > the serial driver. In fact, I don't see it anywhere in-tree. I can't
> > > > > seem to locate another patchset from Saravana around this though, so
> > > > > I'm not sure where to look for a missing piece for the sifive serial
> > > > > driver.
> > > > >
> > > > > This is the second boot time regression (this one not fatal, unlike
> > > > > the Layerscape PCIe one) from the fw_devlink patchset.
> > > > >
> > > > > Greg, can you revert the whole set for 6.0, please? It's obviously
> > > > > nowhere near tested enough to go in and I expect we'll see a bunch of
> > > > > -stable fixups due to this if we let it remain in.
> > > >
> > > > What exactly is "the whole set"? I have the default option fix queued
> > > > up and will send that to Linus later this week (am traveling back from
> > > > Plumbers still), but have not heard any problems about any other issues
> > > > at all other than your report.
> > >
> > > I stand corrected in this case, the issue on the Hifive Unmatched was
> > > a regression due to a PWM clock change -- I just sent a patch for that
> > > (serial driver fix).
> > >
> > > So it seems like as long as the fw_devlink.strict=1 patch is reverted,
> > > things are back to a working state here.
> > >
> > > I still struggle with how the fw_devlink patchset is expected to work
> > > though, since DT is expected to describe the hardware configuration,
> > > and it has no knowledge of whether there are drivers that will be
> > > bound to any referenced supplier devnodes. It's not going to work well
> > > to assume that they will always be bound, and to add 10 second
> > > timeouts for those cases isn't a good solution. Seems like the number
> > > of special cases will keep adding up.
> >
> > Since the introduction of deferred probe, the kernel has always
> > assumed if there is a device described, then there is or will be a
> > driver for it. The result is you can't use new DTs (if they add
> > providers) with older kernels.
> >
> > We've ended up with a timeout because no one has come up with a better
> > way to handle it. What the kernel needs is userspace saying "I'm done
> > loading modules", but it's debatable whether that's a good solution
> > too.
>
> In my opinion the deferred probe is a big hack and that is the root
> cause of the issues we have here and there. It has to be redesigned
> to be mathematically robust. It was an attempt by Andrzej Hajda to
> solve this [1].
>
> [1]: https://events19.linuxfoundation.org/wp-content/uploads/2017/12/Deferred-Problem-Issues-With-Complex-Dependencies-Between-Devices-in-Linux-Kernel-Andrzej-Hajda-Samsung.pdf

deferred probe has _ALWAYS_ been known to be a hack, way back when we
accepted it, but it was the best hack we had to solve a real problem
that we had, so it was accepted.

It's been polished over the years, but yes, it does break down at times,
due to the crazy complexity of hardware systems that we have no control
over.

If you have concrete solutions for how to solve the issue, wonderful,
please submit patches :)

thanks,

greg k-h