LinuxLists.cc - [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied

2020-03-24 12:21:02

Subject: [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied

Consider the following scenario.

The main driver of USB OTG controller (dwc3-pci), which has the following
functional dependencies on certain platform:
- ULPI (tusb1210)
- extcon (tested with extcon-intel-mrfld)

Note, that first driver, tusb1210, is available at the moment of
dwc3-pci probing, while extcon-intel-mrfld is built as a module and
won't appear till user space does something about it.

This is depicted by kernel configuration excerpt:

CONFIG_PHY_TUSB1210=y
CONFIG_USB_DWC3=y
CONFIG_USB_DWC3_ULPI=y
CONFIG_USB_DWC3_DUAL_ROLE=y
CONFIG_USB_DWC3_PCI=y
CONFIG_EXTCON_INTEL_MRFLD=m

In the Buildroot environment the modules are probed by alphabetical ordering
of their modaliases. The latter comes to the case when USB OTG driver will be
probed first followed by extcon one.

So, if the platform anticipates extcon device to be appeared, in the above case
we will get deferred probe of USB OTG, because of ordering.

Since current implementation, done by the commit 58b116bce136 ("drivercore:
deferral race condition fix") counts the amount of triggered deferred probe,
we never advance the situation -- the change makes it to be an infinite loop.

---8<---8<---

[ 22.187127] driver_deferred_probe_trigger <<< 1

...here is the late initcall triggers deferred probe...

[ 22.191725] platform dwc3.0.auto: deferred_probe_work_func in deferred list

...dwc3.0.auto is the only device in the deferred list...

[ 22.198727] platform dwc3.0.auto: deferred_probe_work_func 1 <<< counter 1

...the counter before mutex is unlocked is kept the same...

[ 22.205663] platform dwc3.0.auto: Retrying from deferred list

...mutes has been unlocked, we try to re-probe the driver...

[ 22.211487] bus: 'platform': driver_probe_device: matched device dwc3.0.auto with driver dwc3
[ 22.220060] bus: 'platform': really_probe: probing driver dwc3 with device dwc3.0.auto
[ 22.238735] bus: 'ulpi': driver_probe_device: matched device dwc3.0.auto.ulpi with driver tusb1210
[ 22.247743] bus: 'ulpi': really_probe: probing driver tusb1210 with device dwc3.0.auto.ulpi
[ 22.256292] driver: 'tusb1210': driver_bound: bound to device 'dwc3.0.auto.ulpi'
[ 22.263723] driver_deferred_probe_trigger <<< 2

...the dwc3.0.auto probes ULPI, we got successful bound and bumped counter...

[ 22.268304] bus: 'ulpi': really_probe: bound device dwc3.0.auto.ulpi to driver tusb1210
[ 22.276697] platform dwc3.0.auto: Driver dwc3 requests probe deferral

...but extcon driver is still missing...

[ 22.283174] platform dwc3.0.auto: Added to deferred list
[ 22.288513] platform dwc3.0.auto: driver_deferred_probe_add_trigger local counter: 1 new counter 2

...and since we had a successful probe, we got counter mismatch...

[ 22.297490] driver_deferred_probe_trigger <<< 3
[ 22.302074] platform dwc3.0.auto: deferred_probe_work_func 2 <<< counter 3

...at the end we have a new counter and loop repeats again, see 22.198727...

---8<---8<---

Revert of the commit helps, but it is probably not helpful for the initially
found regression. Artem Bityutskiy suggested to use counter of the successful
probes instead. This fixes above mentioned case and shouldn't prevent driver
to reprobe deferred ones.

Fixes: 58b116bce136 ("drivercore: deferral race condition fix")
Suggested-by: Artem Bityutskiy <[email protected]>
Cc: Grant Likely [email protected]
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Felipe Balbi <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Peter Ujfalusi <[email protected]>
Tested-by: Ferry Toth <[email protected]>
---
v2: picked up tags, update Grant's email (Peter)
drivers/base/dd.c | 39 +++++++++++++++++++++------------------
1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index b25bcab2a26b..43720beb5300 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -53,7 +53,6 @@
static DEFINE_MUTEX(deferred_probe_mutex);
static LIST_HEAD(deferred_probe_pending_list);
static LIST_HEAD(deferred_probe_active_list);
-static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
static struct dentry *deferred_devices;
static bool initcalls_done;

@@ -147,17 +146,6 @@ static bool driver_deferred_probe_enable = false;
* This functions moves all devices from the pending list to the active
* list and schedules the deferred probe workqueue to process them. It
* should be called anytime a driver is successfully bound to a device.
- *
- * Note, there is a race condition in multi-threaded probe. In the case where
- * more than one device is probing at the same time, it is possible for one
- * probe to complete successfully while another is about to defer. If the second
- * depends on the first, then it will get put on the pending list after the
- * trigger event has already occurred and will be stuck there.
- *
- * The atomic 'deferred_trigger_count' is used to determine if a successful
- * trigger has occurred in the midst of probing a driver. If the trigger count
- * changes in the midst of a probe, then deferred processing should be triggered
- * again.
*/
static void driver_deferred_probe_trigger(void)
{
@@ -170,7 +158,6 @@ static void driver_deferred_probe_trigger(void)
* into the active list so they can be retried by the workqueue
*/
mutex_lock(&deferred_probe_mutex);
- atomic_inc(&deferred_trigger_count);
list_splice_tail_init(&deferred_probe_pending_list,
&deferred_probe_active_list);
mutex_unlock(&deferred_probe_mutex);
@@ -350,6 +337,19 @@ static void __exit deferred_probe_exit(void)
}
__exitcall(deferred_probe_exit);

+/*
+ * Note, there is a race condition in multi-threaded probe. In the case where
+ * more than one device is probing at the same time, it is possible for one
+ * probe to complete successfully while another is about to defer. If the second
+ * depends on the first, then it will get put on the pending list after the
+ * trigger event has already occurred and will be stuck there.
+ *
+ * The atomic 'probe_okay' is used to determine if a successful probe has
+ * occurred in the midst of probing another driver. If the count changes in
+ * the midst of a probe, then deferred processing should be triggered again.
+ */
+static atomic_t probe_okay = ATOMIC_INIT(0);
+
/**
* device_is_bound() - Check if device is bound to a driver
* @dev: device to check
@@ -375,6 +375,7 @@ static void driver_bound(struct device *dev)
pr_debug("driver: '%s': %s: bound to device '%s'\n", dev->driver->name,
__func__, dev_name(dev));

+ atomic_inc(&probe_okay);
klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
device_links_driver_bound(dev);

@@ -481,18 +482,18 @@ static atomic_t probe_count = ATOMIC_INIT(0);
static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);

static void driver_deferred_probe_add_trigger(struct device *dev,
- int local_trigger_count)
+ int local_probe_okay_count)
{
driver_deferred_probe_add(dev);
/* Did a trigger occur while probing? Need to re-trigger if yes */
- if (local_trigger_count != atomic_read(&deferred_trigger_count))
+ if (local_probe_okay_count != atomic_read(&probe_okay))
driver_deferred_probe_trigger();
}

static int really_probe(struct device *dev, struct device_driver *drv)
{
int ret = -EPROBE_DEFER;
- int local_trigger_count = atomic_read(&deferred_trigger_count);
+ int local_probe_okay_count = atomic_read(&probe_okay);
bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
!drv->suppress_bind_attrs;

@@ -509,7 +510,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)

ret = device_links_check_suppliers(dev);
if (ret == -EPROBE_DEFER)
- driver_deferred_probe_add_trigger(dev, local_trigger_count);
+ driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
if (ret)
return ret;

@@ -619,7 +620,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
case -EPROBE_DEFER:
/* Driver requested deferred probing */
dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
- driver_deferred_probe_add_trigger(dev, local_trigger_count);
+ driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
break;
case -ENODEV:
case -ENXIO:
@@ -1148,6 +1149,8 @@ static void __device_release_driver(struct device *dev, struct device *parent)
dev_pm_set_driver_flags(dev, 0);

klist_remove(&dev->p->knode_driver);
+ atomic_dec(&probe_okay);
+
device_pm_check_callbacks(dev);
if (dev->bus)
blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
--
2.25.1

2020-03-24 12:21:24

by Andy Shevchenko

[permalink] [raw]

Subject: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()

There is a place in the code where open-coded version of list entry accessors
list_last_entry() is used.

Replace that with the standard macro.

Signed-off-by: Andy Shevchenko <[email protected]>
---
v2: no change
drivers/base/dd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index efd0e4c16ba5..27a4d51b5bba 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -1226,7 +1226,7 @@ void driver_detach(struct device_driver *drv)
spin_unlock(&drv->p->klist_devices.k_lock);
break;
}
- dev_prv = list_entry(drv->p->klist_devices.k_list.prev,
+ dev_prv = list_last_entry(&drv->p->klist_devices.k_list,
struct device_private,
knode_driver.n_node);
dev = dev_prv->device;
--
2.25.1

2020-03-24 12:21:25

by Andy Shevchenko

[permalink] [raw]

Subject: [PATCH v2 2/3] driver core: Read atomic counter once in driver_probe_done()

Between printing the debug message and actual check atomic counter can be
altered. For better debugging experience read atomic counter value only once.

Signed-off-by: Andy Shevchenko <[email protected]>
Tested-by: Ferry Toth <[email protected]>
---
v2: picked up tags
drivers/base/dd.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 43720beb5300..efd0e4c16ba5 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -669,9 +669,10 @@ static int really_probe_debug(struct device *dev, struct device_driver *drv)
*/
int driver_probe_done(void)
{
- pr_debug("%s: probe_count = %d\n", __func__,
- atomic_read(&probe_count));
- if (atomic_read(&probe_count))
+ int local_probe_count = atomic_read(&probe_count);
+
+ pr_debug("%s: probe_count = %d\n", __func__, local_probe_count);
+ if (local_probe_count)
return -EBUSY;
return 0;
}
--
2.25.1

2020-03-24 12:53:21

by Rafael J. Wysocki

[permalink] [raw]

Subject: Re: [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied

On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
<[email protected]> wrote:
>
> Consider the following scenario.
>
> The main driver of USB OTG controller (dwc3-pci), which has the following
> functional dependencies on certain platform:
> - ULPI (tusb1210)
> - extcon (tested with extcon-intel-mrfld)
>
> Note, that first driver, tusb1210, is available at the moment of
> dwc3-pci probing, while extcon-intel-mrfld is built as a module and
> won't appear till user space does something about it.
>
> This is depicted by kernel configuration excerpt:
>
> CONFIG_PHY_TUSB1210=y
> CONFIG_USB_DWC3=y
> CONFIG_USB_DWC3_ULPI=y
> CONFIG_USB_DWC3_DUAL_ROLE=y
> CONFIG_USB_DWC3_PCI=y
> CONFIG_EXTCON_INTEL_MRFLD=m
>
> In the Buildroot environment the modules are probed by alphabetical ordering
> of their modaliases. The latter comes to the case when USB OTG driver will be
> probed first followed by extcon one.
>
> So, if the platform anticipates extcon device to be appeared, in the above case
> we will get deferred probe of USB OTG, because of ordering.
>
> Since current implementation, done by the commit 58b116bce136 ("drivercore:
> deferral race condition fix") counts the amount of triggered deferred probe,
> we never advance the situation -- the change makes it to be an infinite loop.
>
> ---8<---8<---
>
> [ 22.187127] driver_deferred_probe_trigger <<< 1
>
> ...here is the late initcall triggers deferred probe...
>
> [ 22.191725] platform dwc3.0.auto: deferred_probe_work_func in deferred list
>
> ...dwc3.0.auto is the only device in the deferred list...
>
> [ 22.198727] platform dwc3.0.auto: deferred_probe_work_func 1 <<< counter 1
>
> ...the counter before mutex is unlocked is kept the same...
>
> [ 22.205663] platform dwc3.0.auto: Retrying from deferred list
>
> ...mutes has been unlocked, we try to re-probe the driver...
>
> [ 22.211487] bus: 'platform': driver_probe_device: matched device dwc3.0.auto with driver dwc3
> [ 22.220060] bus: 'platform': really_probe: probing driver dwc3 with device dwc3.0.auto
> [ 22.238735] bus: 'ulpi': driver_probe_device: matched device dwc3.0.auto.ulpi with driver tusb1210
> [ 22.247743] bus: 'ulpi': really_probe: probing driver tusb1210 with device dwc3.0.auto.ulpi
> [ 22.256292] driver: 'tusb1210': driver_bound: bound to device 'dwc3.0.auto.ulpi'
> [ 22.263723] driver_deferred_probe_trigger <<< 2
>
> ...the dwc3.0.auto probes ULPI, we got successful bound and bumped counter...
>
> [ 22.268304] bus: 'ulpi': really_probe: bound device dwc3.0.auto.ulpi to driver tusb1210
> [ 22.276697] platform dwc3.0.auto: Driver dwc3 requests probe deferral
>
> ...but extcon driver is still missing...
>
> [ 22.283174] platform dwc3.0.auto: Added to deferred list
> [ 22.288513] platform dwc3.0.auto: driver_deferred_probe_add_trigger local counter: 1 new counter 2
>
> ...and since we had a successful probe, we got counter mismatch...
>
> [ 22.297490] driver_deferred_probe_trigger <<< 3
> [ 22.302074] platform dwc3.0.auto: deferred_probe_work_func 2 <<< counter 3
>
> ...at the end we have a new counter and loop repeats again, see 22.198727...
>
> ---8<---8<---
>
> Revert of the commit helps, but it is probably not helpful for the initially
> found regression. Artem Bityutskiy suggested to use counter of the successful
> probes instead. This fixes above mentioned case and shouldn't prevent driver
> to reprobe deferred ones.
>
> Fixes: 58b116bce136 ("drivercore: deferral race condition fix")
> Suggested-by: Artem Bityutskiy <[email protected]>
> Cc: Grant Likely [email protected]
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Mark Brown <[email protected]>
> Cc: Felipe Balbi <[email protected]>
> Cc: Andrzej Hajda <[email protected]>
> Signed-off-by: Andy Shevchenko <[email protected]>
> Reviewed-by: Peter Ujfalusi <[email protected]>
> Tested-by: Ferry Toth <[email protected]>
> ---
> v2: picked up tags, update Grant's email (Peter)
> drivers/base/dd.c | 39 +++++++++++++++++++++------------------
> 1 file changed, 21 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index b25bcab2a26b..43720beb5300 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -53,7 +53,6 @@
> static DEFINE_MUTEX(deferred_probe_mutex);
> static LIST_HEAD(deferred_probe_pending_list);
> static LIST_HEAD(deferred_probe_active_list);
> -static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
> static struct dentry *deferred_devices;
> static bool initcalls_done;
>
> @@ -147,17 +146,6 @@ static bool driver_deferred_probe_enable = false;
> * This functions moves all devices from the pending list to the active
> * list and schedules the deferred probe workqueue to process them. It
> * should be called anytime a driver is successfully bound to a device.
> - *
> - * Note, there is a race condition in multi-threaded probe. In the case where
> - * more than one device is probing at the same time, it is possible for one
> - * probe to complete successfully while another is about to defer. If the second
> - * depends on the first, then it will get put on the pending list after the
> - * trigger event has already occurred and will be stuck there.
> - *
> - * The atomic 'deferred_trigger_count' is used to determine if a successful
> - * trigger has occurred in the midst of probing a driver. If the trigger count
> - * changes in the midst of a probe, then deferred processing should be triggered
> - * again.
> */
> static void driver_deferred_probe_trigger(void)
> {
> @@ -170,7 +158,6 @@ static void driver_deferred_probe_trigger(void)
> * into the active list so they can be retried by the workqueue
> */
> mutex_lock(&deferred_probe_mutex);
> - atomic_inc(&deferred_trigger_count);
> list_splice_tail_init(&deferred_probe_pending_list,
> &deferred_probe_active_list);
> mutex_unlock(&deferred_probe_mutex);
> @@ -350,6 +337,19 @@ static void __exit deferred_probe_exit(void)
> }
> __exitcall(deferred_probe_exit);
>
> +/*
> + * Note, there is a race condition in multi-threaded probe. In the case where
> + * more than one device is probing at the same time, it is possible for one
> + * probe to complete successfully while another is about to defer. If the second
> + * depends on the first, then it will get put on the pending list after the
> + * trigger event has already occurred and will be stuck there.
> + *
> + * The atomic 'probe_okay' is used to determine if a successful probe has
> + * occurred in the midst of probing another driver. If the count changes in
> + * the midst of a probe, then deferred processing should be triggered again.
> + */
> +static atomic_t probe_okay = ATOMIC_INIT(0);
> +
> /**
> * device_is_bound() - Check if device is bound to a driver
> * @dev: device to check
> @@ -375,6 +375,7 @@ static void driver_bound(struct device *dev)
> pr_debug("driver: '%s': %s: bound to device '%s'\n", dev->driver->name,
> __func__, dev_name(dev));
>
> + atomic_inc(&probe_okay);
> klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
> device_links_driver_bound(dev);
>
> @@ -481,18 +482,18 @@ static atomic_t probe_count = ATOMIC_INIT(0);
> static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
>
> static void driver_deferred_probe_add_trigger(struct device *dev,
> - int local_trigger_count)
> + int local_probe_okay_count)
> {
> driver_deferred_probe_add(dev);
> /* Did a trigger occur while probing? Need to re-trigger if yes */
> - if (local_trigger_count != atomic_read(&deferred_trigger_count))
> + if (local_probe_okay_count != atomic_read(&probe_okay))
> driver_deferred_probe_trigger();
> }
>
> static int really_probe(struct device *dev, struct device_driver *drv)
> {
> int ret = -EPROBE_DEFER;
> - int local_trigger_count = atomic_read(&deferred_trigger_count);
> + int local_probe_okay_count = atomic_read(&probe_okay);
> bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
> !drv->suppress_bind_attrs;
>
> @@ -509,7 +510,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>
> ret = device_links_check_suppliers(dev);
> if (ret == -EPROBE_DEFER)
> - driver_deferred_probe_add_trigger(dev, local_trigger_count);
> + driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
> if (ret)
> return ret;
>
> @@ -619,7 +620,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> case -EPROBE_DEFER:
> /* Driver requested deferred probing */
> dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
> - driver_deferred_probe_add_trigger(dev, local_trigger_count);
> + driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
> break;
> case -ENODEV:
> case -ENXIO:
> @@ -1148,6 +1149,8 @@ static void __device_release_driver(struct device *dev, struct device *parent)
> dev_pm_set_driver_flags(dev, 0);
>
> klist_remove(&dev->p->knode_driver);
> + atomic_dec(&probe_okay);
> +

Why is this needed?

> device_pm_check_callbacks(dev);
> if (dev->bus)
> blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
> --

2020-03-24 12:54:47

by Rafael J. Wysocki

[permalink] [raw]

Subject: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()

On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
<[email protected]> wrote:
>
> There is a place in the code where open-coded version of list entry accessors
> list_last_entry() is used.
>
> Replace that with the standard macro.
>
> Signed-off-by: Andy Shevchenko <[email protected]>

Reviewed-by: Rafael J. Wysocki <[email protected]>

> ---
> v2: no change
> drivers/base/dd.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index efd0e4c16ba5..27a4d51b5bba 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -1226,7 +1226,7 @@ void driver_detach(struct device_driver *drv)
> spin_unlock(&drv->p->klist_devices.k_lock);
> break;
> }
> - dev_prv = list_entry(drv->p->klist_devices.k_list.prev,
> + dev_prv = list_last_entry(&drv->p->klist_devices.k_list,
> struct device_private,
> knode_driver.n_node);
> dev = dev_prv->device;
> --
> 2.25.1
>

2020-03-24 12:55:44

by Rafael J. Wysocki

[permalink] [raw]

Subject: Re: [PATCH v2 2/3] driver core: Read atomic counter once in driver_probe_done()

On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
<[email protected]> wrote:
>
> Between printing the debug message and actual check atomic counter can be
> altered. For better debugging experience read atomic counter value only once.
>
> Signed-off-by: Andy Shevchenko <[email protected]>
> Tested-by: Ferry Toth <[email protected]>

Reviewed-by: Rafael J. Wysocki <[email protected]>

> ---
> v2: picked up tags
> drivers/base/dd.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 43720beb5300..efd0e4c16ba5 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -669,9 +669,10 @@ static int really_probe_debug(struct device *dev, struct device_driver *drv)
> */
> int driver_probe_done(void)
> {
> - pr_debug("%s: probe_count = %d\n", __func__,
> - atomic_read(&probe_count));
> - if (atomic_read(&probe_count))
> + int local_probe_count = atomic_read(&probe_count);
> +
> + pr_debug("%s: probe_count = %d\n", __func__, local_probe_count);
> + if (local_probe_count)
> return -EBUSY;
> return 0;
> }
> --
> 2.25.1
>

2020-03-24 13:39:55

by Andy Shevchenko

[permalink] [raw]

Subject: Re: [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied

On Tue, Mar 24, 2020 at 01:52:00PM +0100, Rafael J. Wysocki wrote:
> On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
> <[email protected]> wrote:
> >
> > Consider the following scenario.
> >
> > The main driver of USB OTG controller (dwc3-pci), which has the following
> > functional dependencies on certain platform:
> > - ULPI (tusb1210)
> > - extcon (tested with extcon-intel-mrfld)
> >
> > Note, that first driver, tusb1210, is available at the moment of
> > dwc3-pci probing, while extcon-intel-mrfld is built as a module and
> > won't appear till user space does something about it.
> >
> > This is depicted by kernel configuration excerpt:
> >
> > CONFIG_PHY_TUSB1210=y
> > CONFIG_USB_DWC3=y
> > CONFIG_USB_DWC3_ULPI=y
> > CONFIG_USB_DWC3_DUAL_ROLE=y
> > CONFIG_USB_DWC3_PCI=y
> > CONFIG_EXTCON_INTEL_MRFLD=m
> >
> > In the Buildroot environment the modules are probed by alphabetical ordering
> > of their modaliases. The latter comes to the case when USB OTG driver will be
> > probed first followed by extcon one.
> >
> > So, if the platform anticipates extcon device to be appeared, in the above case
> > we will get deferred probe of USB OTG, because of ordering.
> >
> > Since current implementation, done by the commit 58b116bce136 ("drivercore:
> > deferral race condition fix") counts the amount of triggered deferred probe,
> > we never advance the situation -- the change makes it to be an infinite loop.
> >
> > ---8<---8<---
> >
> > [ 22.187127] driver_deferred_probe_trigger <<< 1
> >
> > ...here is the late initcall triggers deferred probe...
> >
> > [ 22.191725] platform dwc3.0.auto: deferred_probe_work_func in deferred list
> >
> > ...dwc3.0.auto is the only device in the deferred list...
> >
> > [ 22.198727] platform dwc3.0.auto: deferred_probe_work_func 1 <<< counter 1
> >
> > ...the counter before mutex is unlocked is kept the same...
> >
> > [ 22.205663] platform dwc3.0.auto: Retrying from deferred list
> >
> > ...mutes has been unlocked, we try to re-probe the driver...
> >
> > [ 22.211487] bus: 'platform': driver_probe_device: matched device dwc3.0.auto with driver dwc3
> > [ 22.220060] bus: 'platform': really_probe: probing driver dwc3 with device dwc3.0.auto
> > [ 22.238735] bus: 'ulpi': driver_probe_device: matched device dwc3.0.auto.ulpi with driver tusb1210
> > [ 22.247743] bus: 'ulpi': really_probe: probing driver tusb1210 with device dwc3.0.auto.ulpi
> > [ 22.256292] driver: 'tusb1210': driver_bound: bound to device 'dwc3.0.auto.ulpi'
> > [ 22.263723] driver_deferred_probe_trigger <<< 2
> >
> > ...the dwc3.0.auto probes ULPI, we got successful bound and bumped counter...
> >
> > [ 22.268304] bus: 'ulpi': really_probe: bound device dwc3.0.auto.ulpi to driver tusb1210
> > [ 22.276697] platform dwc3.0.auto: Driver dwc3 requests probe deferral
> >
> > ...but extcon driver is still missing...
> >
> > [ 22.283174] platform dwc3.0.auto: Added to deferred list
> > [ 22.288513] platform dwc3.0.auto: driver_deferred_probe_add_trigger local counter: 1 new counter 2
> >
> > ...and since we had a successful probe, we got counter mismatch...
> >
> > [ 22.297490] driver_deferred_probe_trigger <<< 3
> > [ 22.302074] platform dwc3.0.auto: deferred_probe_work_func 2 <<< counter 3
> >
> > ...at the end we have a new counter and loop repeats again, see 22.198727...
> >
> > ---8<---8<---
> >
> > Revert of the commit helps, but it is probably not helpful for the initially
> > found regression. Artem Bityutskiy suggested to use counter of the successful
> > probes instead. This fixes above mentioned case and shouldn't prevent driver
> > to reprobe deferred ones.
> >
> > Fixes: 58b116bce136 ("drivercore: deferral race condition fix")
> > Suggested-by: Artem Bityutskiy <[email protected]>
> > Cc: Grant Likely [email protected]
> > Cc: Greg Kroah-Hartman <[email protected]>
> > Cc: Mark Brown <[email protected]>
> > Cc: Felipe Balbi <[email protected]>
> > Cc: Andrzej Hajda <[email protected]>
> > Signed-off-by: Andy Shevchenko <[email protected]>
> > Reviewed-by: Peter Ujfalusi <[email protected]>
> > Tested-by: Ferry Toth <[email protected]>
> > ---
> > v2: picked up tags, update Grant's email (Peter)
> > drivers/base/dd.c | 39 +++++++++++++++++++++------------------
> > 1 file changed, 21 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index b25bcab2a26b..43720beb5300 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -53,7 +53,6 @@
> > static DEFINE_MUTEX(deferred_probe_mutex);
> > static LIST_HEAD(deferred_probe_pending_list);
> > static LIST_HEAD(deferred_probe_active_list);
> > -static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
> > static struct dentry *deferred_devices;
> > static bool initcalls_done;
> >
> > @@ -147,17 +146,6 @@ static bool driver_deferred_probe_enable = false;
> > * This functions moves all devices from the pending list to the active
> > * list and schedules the deferred probe workqueue to process them. It
> > * should be called anytime a driver is successfully bound to a device.
> > - *
> > - * Note, there is a race condition in multi-threaded probe. In the case where
> > - * more than one device is probing at the same time, it is possible for one
> > - * probe to complete successfully while another is about to defer. If the second
> > - * depends on the first, then it will get put on the pending list after the
> > - * trigger event has already occurred and will be stuck there.
> > - *
> > - * The atomic 'deferred_trigger_count' is used to determine if a successful
> > - * trigger has occurred in the midst of probing a driver. If the trigger count
> > - * changes in the midst of a probe, then deferred processing should be triggered
> > - * again.
> > */
> > static void driver_deferred_probe_trigger(void)
> > {
> > @@ -170,7 +158,6 @@ static void driver_deferred_probe_trigger(void)
> > * into the active list so they can be retried by the workqueue
> > */
> > mutex_lock(&deferred_probe_mutex);
> > - atomic_inc(&deferred_trigger_count);
> > list_splice_tail_init(&deferred_probe_pending_list,
> > &deferred_probe_active_list);
> > mutex_unlock(&deferred_probe_mutex);
> > @@ -350,6 +337,19 @@ static void __exit deferred_probe_exit(void)
> > }
> > __exitcall(deferred_probe_exit);
> >
> > +/*
> > + * Note, there is a race condition in multi-threaded probe. In the case where
> > + * more than one device is probing at the same time, it is possible for one
> > + * probe to complete successfully while another is about to defer. If the second
> > + * depends on the first, then it will get put on the pending list after the
> > + * trigger event has already occurred and will be stuck there.
> > + *
> > + * The atomic 'probe_okay' is used to determine if a successful probe has
> > + * occurred in the midst of probing another driver. If the count changes in
> > + * the midst of a probe, then deferred processing should be triggered again.
> > + */
> > +static atomic_t probe_okay = ATOMIC_INIT(0);
> > +
> > /**
> > * device_is_bound() - Check if device is bound to a driver
> > * @dev: device to check
> > @@ -375,6 +375,7 @@ static void driver_bound(struct device *dev)
> > pr_debug("driver: '%s': %s: bound to device '%s'\n", dev->driver->name,
> > __func__, dev_name(dev));
> >
> > + atomic_inc(&probe_okay);
> > klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
> > device_links_driver_bound(dev);
> >
> > @@ -481,18 +482,18 @@ static atomic_t probe_count = ATOMIC_INIT(0);
> > static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
> >
> > static void driver_deferred_probe_add_trigger(struct device *dev,
> > - int local_trigger_count)
> > + int local_probe_okay_count)
> > {
> > driver_deferred_probe_add(dev);
> > /* Did a trigger occur while probing? Need to re-trigger if yes */
> > - if (local_trigger_count != atomic_read(&deferred_trigger_count))
> > + if (local_probe_okay_count != atomic_read(&probe_okay))
> > driver_deferred_probe_trigger();
> > }
> >
> > static int really_probe(struct device *dev, struct device_driver *drv)
> > {
> > int ret = -EPROBE_DEFER;
> > - int local_trigger_count = atomic_read(&deferred_trigger_count);
> > + int local_probe_okay_count = atomic_read(&probe_okay);
> > bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
> > !drv->suppress_bind_attrs;
> >
> > @@ -509,7 +510,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> >
> > ret = device_links_check_suppliers(dev);
> > if (ret == -EPROBE_DEFER)
> > - driver_deferred_probe_add_trigger(dev, local_trigger_count);
> > + driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
> > if (ret)
> > return ret;
> >
> > @@ -619,7 +620,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > case -EPROBE_DEFER:
> > /* Driver requested deferred probing */
> > dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
> > - driver_deferred_probe_add_trigger(dev, local_trigger_count);
> > + driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
> > break;
> > case -ENODEV:
> > case -ENXIO:
> > @@ -1148,6 +1149,8 @@ static void __device_release_driver(struct device *dev, struct device *parent)
> > dev_pm_set_driver_flags(dev, 0);
> >
> > klist_remove(&dev->p->knode_driver);
> > + atomic_dec(&probe_okay);
> > +
>
> Why is this needed?

Under successful probe the following is comprehended. When probe of the driver
happens it may be discarded (as in above case) as it was initiated by another
driver which got deferred.

We also discussed this with Peter [1] during his review.

[1]: https://lkml.org/lkml/2020/3/12/347

>
> > device_pm_check_callbacks(dev);
> > if (dev->bus)
> > blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
> > --

--
With Best Regards,
Andy Shevchenko

2020-03-24 15:39:07

by Rafael J. Wysocki

[permalink] [raw]

Subject: Re: [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied

On Tue, Mar 24, 2020 at 2:39 PM Andy Shevchenko
<[email protected]> wrote:
>
> On Tue, Mar 24, 2020 at 01:52:00PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
> > <[email protected]> wrote:
> > >
> > > Consider the following scenario.
> > >
> > > The main driver of USB OTG controller (dwc3-pci), which has the following
> > > functional dependencies on certain platform:
> > > - ULPI (tusb1210)
> > > - extcon (tested with extcon-intel-mrfld)
> > >
> > > Note, that first driver, tusb1210, is available at the moment of
> > > dwc3-pci probing, while extcon-intel-mrfld is built as a module and
> > > won't appear till user space does something about it.
> > >
> > > This is depicted by kernel configuration excerpt:
> > >
> > > CONFIG_PHY_TUSB1210=y
> > > CONFIG_USB_DWC3=y
> > > CONFIG_USB_DWC3_ULPI=y
> > > CONFIG_USB_DWC3_DUAL_ROLE=y
> > > CONFIG_USB_DWC3_PCI=y
> > > CONFIG_EXTCON_INTEL_MRFLD=m
> > >
> > > In the Buildroot environment the modules are probed by alphabetical ordering
> > > of their modaliases. The latter comes to the case when USB OTG driver will be
> > > probed first followed by extcon one.
> > >
> > > So, if the platform anticipates extcon device to be appeared, in the above case
> > > we will get deferred probe of USB OTG, because of ordering.
> > >
> > > Since current implementation, done by the commit 58b116bce136 ("drivercore:
> > > deferral race condition fix") counts the amount of triggered deferred probe,
> > > we never advance the situation -- the change makes it to be an infinite loop.
> > >
> > > ---8<---8<---
> > >
> > > [ 22.187127] driver_deferred_probe_trigger <<< 1
> > >
> > > ...here is the late initcall triggers deferred probe...
> > >
> > > [ 22.191725] platform dwc3.0.auto: deferred_probe_work_func in deferred list
> > >
> > > ...dwc3.0.auto is the only device in the deferred list...
> > >
> > > [ 22.198727] platform dwc3.0.auto: deferred_probe_work_func 1 <<< counter 1
> > >
> > > ...the counter before mutex is unlocked is kept the same...
> > >
> > > [ 22.205663] platform dwc3.0.auto: Retrying from deferred list
> > >
> > > ...mutes has been unlocked, we try to re-probe the driver...
> > >
> > > [ 22.211487] bus: 'platform': driver_probe_device: matched device dwc3.0.auto with driver dwc3
> > > [ 22.220060] bus: 'platform': really_probe: probing driver dwc3 with device dwc3.0.auto
> > > [ 22.238735] bus: 'ulpi': driver_probe_device: matched device dwc3.0.auto.ulpi with driver tusb1210
> > > [ 22.247743] bus: 'ulpi': really_probe: probing driver tusb1210 with device dwc3.0.auto.ulpi
> > > [ 22.256292] driver: 'tusb1210': driver_bound: bound to device 'dwc3.0.auto.ulpi'
> > > [ 22.263723] driver_deferred_probe_trigger <<< 2
> > >
> > > ...the dwc3.0.auto probes ULPI, we got successful bound and bumped counter...
> > >
> > > [ 22.268304] bus: 'ulpi': really_probe: bound device dwc3.0.auto.ulpi to driver tusb1210
> > > [ 22.276697] platform dwc3.0.auto: Driver dwc3 requests probe deferral
> > >
> > > ...but extcon driver is still missing...
> > >
> > > [ 22.283174] platform dwc3.0.auto: Added to deferred list
> > > [ 22.288513] platform dwc3.0.auto: driver_deferred_probe_add_trigger local counter: 1 new counter 2
> > >
> > > ...and since we had a successful probe, we got counter mismatch...
> > >
> > > [ 22.297490] driver_deferred_probe_trigger <<< 3
> > > [ 22.302074] platform dwc3.0.auto: deferred_probe_work_func 2 <<< counter 3
> > >
> > > ...at the end we have a new counter and loop repeats again, see 22.198727...
> > >
> > > ---8<---8<---
> > >
> > > Revert of the commit helps, but it is probably not helpful for the initially
> > > found regression. Artem Bityutskiy suggested to use counter of the successful
> > > probes instead. This fixes above mentioned case and shouldn't prevent driver
> > > to reprobe deferred ones.
> > >
> > > Fixes: 58b116bce136 ("drivercore: deferral race condition fix")
> > > Suggested-by: Artem Bityutskiy <[email protected]>
> > > Cc: Grant Likely [email protected]
> > > Cc: Greg Kroah-Hartman <[email protected]>
> > > Cc: Mark Brown <[email protected]>
> > > Cc: Felipe Balbi <[email protected]>
> > > Cc: Andrzej Hajda <[email protected]>
> > > Signed-off-by: Andy Shevchenko <[email protected]>
> > > Reviewed-by: Peter Ujfalusi <[email protected]>
> > > Tested-by: Ferry Toth <[email protected]>
> > > ---
> > > v2: picked up tags, update Grant's email (Peter)
> > > drivers/base/dd.c | 39 +++++++++++++++++++++------------------
> > > 1 file changed, 21 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > index b25bcab2a26b..43720beb5300 100644
> > > --- a/drivers/base/dd.c
> > > +++ b/drivers/base/dd.c
> > > @@ -53,7 +53,6 @@
> > > static DEFINE_MUTEX(deferred_probe_mutex);
> > > static LIST_HEAD(deferred_probe_pending_list);
> > > static LIST_HEAD(deferred_probe_active_list);
> > > -static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
> > > static struct dentry *deferred_devices;
> > > static bool initcalls_done;
> > >
> > > @@ -147,17 +146,6 @@ static bool driver_deferred_probe_enable = false;
> > > * This functions moves all devices from the pending list to the active
> > > * list and schedules the deferred probe workqueue to process them. It
> > > * should be called anytime a driver is successfully bound to a device.
> > > - *
> > > - * Note, there is a race condition in multi-threaded probe. In the case where
> > > - * more than one device is probing at the same time, it is possible for one
> > > - * probe to complete successfully while another is about to defer. If the second
> > > - * depends on the first, then it will get put on the pending list after the
> > > - * trigger event has already occurred and will be stuck there.
> > > - *
> > > - * The atomic 'deferred_trigger_count' is used to determine if a successful
> > > - * trigger has occurred in the midst of probing a driver. If the trigger count
> > > - * changes in the midst of a probe, then deferred processing should be triggered
> > > - * again.
> > > */
> > > static void driver_deferred_probe_trigger(void)
> > > {
> > > @@ -170,7 +158,6 @@ static void driver_deferred_probe_trigger(void)
> > > * into the active list so they can be retried by the workqueue
> > > */
> > > mutex_lock(&deferred_probe_mutex);
> > > - atomic_inc(&deferred_trigger_count);
> > > list_splice_tail_init(&deferred_probe_pending_list,
> > > &deferred_probe_active_list);
> > > mutex_unlock(&deferred_probe_mutex);
> > > @@ -350,6 +337,19 @@ static void __exit deferred_probe_exit(void)
> > > }
> > > __exitcall(deferred_probe_exit);
> > >
> > > +/*
> > > + * Note, there is a race condition in multi-threaded probe. In the case where
> > > + * more than one device is probing at the same time, it is possible for one
> > > + * probe to complete successfully while another is about to defer. If the second
> > > + * depends on the first, then it will get put on the pending list after the
> > > + * trigger event has already occurred and will be stuck there.
> > > + *
> > > + * The atomic 'probe_okay' is used to determine if a successful probe has
> > > + * occurred in the midst of probing another driver. If the count changes in
> > > + * the midst of a probe, then deferred processing should be triggered again.
> > > + */
> > > +static atomic_t probe_okay = ATOMIC_INIT(0);
> > > +
> > > /**
> > > * device_is_bound() - Check if device is bound to a driver
> > > * @dev: device to check
> > > @@ -375,6 +375,7 @@ static void driver_bound(struct device *dev)
> > > pr_debug("driver: '%s': %s: bound to device '%s'\n", dev->driver->name,
> > > __func__, dev_name(dev));
> > >
> > > + atomic_inc(&probe_okay);
> > > klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
> > > device_links_driver_bound(dev);
> > >
> > > @@ -481,18 +482,18 @@ static atomic_t probe_count = ATOMIC_INIT(0);
> > > static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
> > >
> > > static void driver_deferred_probe_add_trigger(struct device *dev,
> > > - int local_trigger_count)
> > > + int local_probe_okay_count)
> > > {
> > > driver_deferred_probe_add(dev);
> > > /* Did a trigger occur while probing? Need to re-trigger if yes */
> > > - if (local_trigger_count != atomic_read(&deferred_trigger_count))
> > > + if (local_probe_okay_count != atomic_read(&probe_okay))
> > > driver_deferred_probe_trigger();
> > > }
> > >
> > > static int really_probe(struct device *dev, struct device_driver *drv)
> > > {
> > > int ret = -EPROBE_DEFER;
> > > - int local_trigger_count = atomic_read(&deferred_trigger_count);
> > > + int local_probe_okay_count = atomic_read(&probe_okay);
> > > bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
> > > !drv->suppress_bind_attrs;
> > >
> > > @@ -509,7 +510,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > >
> > > ret = device_links_check_suppliers(dev);
> > > if (ret == -EPROBE_DEFER)
> > > - driver_deferred_probe_add_trigger(dev, local_trigger_count);
> > > + driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
> > > if (ret)
> > > return ret;
> > >
> > > @@ -619,7 +620,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > > case -EPROBE_DEFER:
> > > /* Driver requested deferred probing */
> > > dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
> > > - driver_deferred_probe_add_trigger(dev, local_trigger_count);
> > > + driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
> > > break;
> > > case -ENODEV:
> > > case -ENXIO:
> > > @@ -1148,6 +1149,8 @@ static void __device_release_driver(struct device *dev, struct device *parent)
> > > dev_pm_set_driver_flags(dev, 0);
> > >
> > > klist_remove(&dev->p->knode_driver);
> > > + atomic_dec(&probe_okay);
> > > +
> >
> > Why is this needed?
>
> Under successful probe the following is comprehended. When probe of the driver
> happens it may be discarded (as in above case) as it was initiated by another
> driver which got deferred.
>
> We also discussed this with Peter [1] during his review.
>
> [1]: https://lkml.org/lkml/2020/3/12/347

OK, but I would add a comment explaining that to the code.

Also it would be good to explain why probe_okay cannot go below zero
here in the changelog.

Cheers!

2020-03-24 15:47:35

by Andy Shevchenko

[permalink] [raw]

Subject: Re: [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied

On Tue, Mar 24, 2020 at 04:36:56PM +0100, Rafael J. Wysocki wrote:
> On Tue, Mar 24, 2020 at 2:39 PM Andy Shevchenko
> <[email protected]> wrote:
> > On Tue, Mar 24, 2020 at 01:52:00PM +0100, Rafael J. Wysocki wrote:
> > > On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
> > > <[email protected]> wrote:

...

> > > > + atomic_dec(&probe_okay);
> > >
> > > Why is this needed?
> >
> > Under successful probe the following is comprehended. When probe of the driver
> > happens it may be discarded (as in above case) as it was initiated by another
> > driver which got deferred.
> >
> > We also discussed this with Peter [1] during his review.
> >
> > [1]: https://lkml.org/lkml/2020/3/12/347
>
> OK, but I would add a comment explaining that to the code.
>
> Also it would be good to explain why probe_okay cannot go below zero
> here in the changelog.

Will do, thank you for review!

--
With Best Regards,
Andy Shevchenko

2020-03-27 17:57:20

by Naresh Kamboju

[permalink] [raw]

Subject: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()

The kernel warning noticed on arm64 juno-r2 device running linux
next-20200326 and next-20200327

[ 36.077086] ------------[ cut here ]------------
[ 36.081752] amba 20010000.etf: deferred probe timeout, ignoring dependency
[ 36.081859] WARNING: CPU: 1 PID: 42 at drivers/base/dd.c:270
driver_deferred_probe_check_state+0x54/0x80
[ 36.098242] Modules linked in: fuse
[ 36.101753] CPU: 1 PID: 42 Comm: kworker/1:1 Not tainted
5.6.0-rc7-next-20200327 #1
[ 36.109427] Hardware name: ARM Juno development board (r2) (DT)
[ 36.115372] Workqueue: events amba_deferred_retry_func
[ 36.120526] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 36.125334] pc : driver_deferred_probe_check_state+0x54/0x80
[ 36.131010] lr : driver_deferred_probe_check_state+0x54/0x80
[ 36.136680] sp : ffff000934e0fae0
[ 36.140001] x29: ffff000934e0fae0 x28: ffff000934db5608
[ 36.145337] x27: ffffa00013c63240 x26: ffff000934f2a800
[ 36.150668] x25: 0000000000000001 x24: fffffffffffffffe
[ 36.155996] x23: ffff000934c6ab80 x22: ffffa00011b39ea0
[ 36.161322] x21: ffff000934f2a800 x20: ffffa00011905fe0
[ 36.166649] x19: ffff000934f2a800 x18: 0000000000000000
[ 36.171974] x17: 0000000000000000 x16: 0000000000000000
[ 36.177299] x15: 0000000000000000 x14: 003d090000000000
[ 36.182625] x13: 00003d0900000000 x12: ffff9400027ef445
[ 36.187952] x11: 1ffff400027ef444 x10: ffff9400027ef444
[ 36.193278] x9 : dfffa00000000000 x8 : 0000000000000000
[ 36.198603] x7 : 0000000000000001 x6 : ffffa00013f7a220
[ 36.203929] x5 : 0000000000000004 x4 : dfffa00000000000
[ 36.209255] x3 : ffffa000101a74ec x2 : ffff8001269c1f26
[ 36.214581] x1 : da1107b7b6a8fb00 x0 : 0000000000000000
[ 36.219906] Call trace:
[ 36.222369] driver_deferred_probe_check_state+0x54/0x80
[ 36.227698] __genpd_dev_pm_attach+0x264/0x2a0
[ 36.232154] genpd_dev_pm_attach+0x68/0x78
[ 36.236265] dev_pm_domain_attach+0x6c/0x70
[ 36.240463] amba_device_try_add+0xec/0x3f8
[ 36.244659] amba_deferred_retry_func+0x84/0x158
[ 36.249301] process_one_work+0x3f0/0x660
[ 36.253326] worker_thread+0x74/0x698
[ 36.256997] kthread+0x218/0x220
[ 36.260236] ret_from_fork+0x10/0x1c
[ 36.263819] ---[ end trace c637c10e549bd716 ]---#

Full test log,
https://lkft.validation.linaro.org/scheduler/job/1317079#L981

On Tue, 24 Mar 2020 at 18:24, Rafael J. Wysocki <[email protected]> wrote:
>
> On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
> <[email protected]> wrote:
> >
> > There is a place in the code where open-coded version of list entry accessors
> > list_last_entry() is used.
> >
> > Replace that with the standard macro.
> >
> > Signed-off-by: Andy Shevchenko <[email protected]>
>
> Reviewed-by: Rafael J. Wysocki <[email protected]>
>
> > ---
> > v2: no change
> > drivers/base/dd.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index efd0e4c16ba5..27a4d51b5bba 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -1226,7 +1226,7 @@ void driver_detach(struct device_driver *drv)
> > spin_unlock(&drv->p->klist_devices.k_lock);
> > break;
> > }
> > - dev_prv = list_entry(drv->p->klist_devices.k_list.prev,
> > + dev_prv = list_last_entry(&drv->p->klist_devices.k_list,
> > struct device_private,
> > knode_driver.n_node);
> > dev = dev_prv->device;

metadata:
git branch: master
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
git describe: next-20200327
kernel-config:
https://builds.tuxbuild.com/nqmmxorUbC1qTWp42iEKjQ/kernel.config

--
Linaro LKFT
https://lkft.linaro.org

2020-03-27 19:41:00

by Robin Murphy

[permalink] [raw]

Subject: Re: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()

On 2020-03-27 5:56 pm, Naresh Kamboju wrote:
> The kernel warning noticed on arm64 juno-r2 device running linux
> next-20200326 and next-20200327

I suspect this is the correct expected behaviour manifesting. If you're
using the upstream juno-r2.dts, the power domain being waited for here
is provided by SCPI, however unless you're using an SCP firmware from at
least 3 years ago you won't actually have SCPI since they switched it to
the newer SCMI protocol, which is not yet supported upstream for Juno.
See what happened earlier in the log:

[ 2.741206] scpi_protocol scpi: incorrect or no SCP firmware found
[ 2.747586] scpi_protocol: probe of scpi failed with error -110

Thus this is the "waiting for a dependency which will never appear"
case, for which I assume the warning is intentional, since the system is
essentially broken (i.e. the hardware/firmware doesn't actually match
what the DT describes).

Robin.

> [ 36.077086] ------------[ cut here ]------------
> [ 36.081752] amba 20010000.etf: deferred probe timeout, ignoring dependency
> [ 36.081859] WARNING: CPU: 1 PID: 42 at drivers/base/dd.c:270
> driver_deferred_probe_check_state+0x54/0x80
> [ 36.098242] Modules linked in: fuse
> [ 36.101753] CPU: 1 PID: 42 Comm: kworker/1:1 Not tainted
> 5.6.0-rc7-next-20200327 #1
> [ 36.109427] Hardware name: ARM Juno development board (r2) (DT)
> [ 36.115372] Workqueue: events amba_deferred_retry_func
> [ 36.120526] pstate: 60000005 (nZCv daif -PAN -UAO)
> [ 36.125334] pc : driver_deferred_probe_check_state+0x54/0x80
> [ 36.131010] lr : driver_deferred_probe_check_state+0x54/0x80
> [ 36.136680] sp : ffff000934e0fae0
> [ 36.140001] x29: ffff000934e0fae0 x28: ffff000934db5608
> [ 36.145337] x27: ffffa00013c63240 x26: ffff000934f2a800
> [ 36.150668] x25: 0000000000000001 x24: fffffffffffffffe
> [ 36.155996] x23: ffff000934c6ab80 x22: ffffa00011b39ea0
> [ 36.161322] x21: ffff000934f2a800 x20: ffffa00011905fe0
> [ 36.166649] x19: ffff000934f2a800 x18: 0000000000000000
> [ 36.171974] x17: 0000000000000000 x16: 0000000000000000
> [ 36.177299] x15: 0000000000000000 x14: 003d090000000000
> [ 36.182625] x13: 00003d0900000000 x12: ffff9400027ef445
> [ 36.187952] x11: 1ffff400027ef444 x10: ffff9400027ef444
> [ 36.193278] x9 : dfffa00000000000 x8 : 0000000000000000
> [ 36.198603] x7 : 0000000000000001 x6 : ffffa00013f7a220
> [ 36.203929] x5 : 0000000000000004 x4 : dfffa00000000000
> [ 36.209255] x3 : ffffa000101a74ec x2 : ffff8001269c1f26
> [ 36.214581] x1 : da1107b7b6a8fb00 x0 : 0000000000000000
> [ 36.219906] Call trace:
> [ 36.222369] driver_deferred_probe_check_state+0x54/0x80
> [ 36.227698] __genpd_dev_pm_attach+0x264/0x2a0
> [ 36.232154] genpd_dev_pm_attach+0x68/0x78
> [ 36.236265] dev_pm_domain_attach+0x6c/0x70
> [ 36.240463] amba_device_try_add+0xec/0x3f8
> [ 36.244659] amba_deferred_retry_func+0x84/0x158
> [ 36.249301] process_one_work+0x3f0/0x660
> [ 36.253326] worker_thread+0x74/0x698
> [ 36.256997] kthread+0x218/0x220
> [ 36.260236] ret_from_fork+0x10/0x1c
> [ 36.263819] ---[ end trace c637c10e549bd716 ]---#
>
> Full test log,
> https://lkft.validation.linaro.org/scheduler/job/1317079#L981
>
> On Tue, 24 Mar 2020 at 18:24, Rafael J. Wysocki <[email protected]> wrote:
>>
>> On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
>> <[email protected]> wrote:
>>>
>>> There is a place in the code where open-coded version of list entry accessors
>>> list_last_entry() is used.
>>>
>>> Replace that with the standard macro.
>>>
>>> Signed-off-by: Andy Shevchenko <[email protected]>
>>
>> Reviewed-by: Rafael J. Wysocki <[email protected]>
>>
>>> ---
>>> v2: no change
>>> drivers/base/dd.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>>> index efd0e4c16ba5..27a4d51b5bba 100644
>>> --- a/drivers/base/dd.c
>>> +++ b/drivers/base/dd.c
>>> @@ -1226,7 +1226,7 @@ void driver_detach(struct device_driver *drv)
>>> spin_unlock(&drv->p->klist_devices.k_lock);
>>> break;
>>> }
>>> - dev_prv = list_entry(drv->p->klist_devices.k_list.prev,
>>> + dev_prv = list_last_entry(&drv->p->klist_devices.k_list,
>>> struct device_private,
>>> knode_driver.n_node);
>>> dev = dev_prv->device;
>
> metadata:
> git branch: master
> git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> git describe: next-20200327
> kernel-config:
> https://builds.tuxbuild.com/nqmmxorUbC1qTWp42iEKjQ/kernel.config
>
>

2020-03-28 07:25:45

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()

On Fri, Mar 27, 2020 at 11:26:13PM +0530, Naresh Kamboju wrote:
> The kernel warning noticed on arm64 juno-r2 device running linux
> next-20200326 and next-20200327
>
> [ 36.077086] ------------[ cut here ]------------
> [ 36.081752] amba 20010000.etf: deferred probe timeout, ignoring dependency
> [ 36.081859] WARNING: CPU: 1 PID: 42 at drivers/base/dd.c:270
> driver_deferred_probe_check_state+0x54/0x80
> [ 36.098242] Modules linked in: fuse
> [ 36.101753] CPU: 1 PID: 42 Comm: kworker/1:1 Not tainted
> 5.6.0-rc7-next-20200327 #1
> [ 36.109427] Hardware name: ARM Juno development board (r2) (DT)
> [ 36.115372] Workqueue: events amba_deferred_retry_func
> [ 36.120526] pstate: 60000005 (nZCv daif -PAN -UAO)
> [ 36.125334] pc : driver_deferred_probe_check_state+0x54/0x80
> [ 36.131010] lr : driver_deferred_probe_check_state+0x54/0x80
> [ 36.136680] sp : ffff000934e0fae0
> [ 36.140001] x29: ffff000934e0fae0 x28: ffff000934db5608
> [ 36.145337] x27: ffffa00013c63240 x26: ffff000934f2a800
> [ 36.150668] x25: 0000000000000001 x24: fffffffffffffffe
> [ 36.155996] x23: ffff000934c6ab80 x22: ffffa00011b39ea0
> [ 36.161322] x21: ffff000934f2a800 x20: ffffa00011905fe0
> [ 36.166649] x19: ffff000934f2a800 x18: 0000000000000000
> [ 36.171974] x17: 0000000000000000 x16: 0000000000000000
> [ 36.177299] x15: 0000000000000000 x14: 003d090000000000
> [ 36.182625] x13: 00003d0900000000 x12: ffff9400027ef445
> [ 36.187952] x11: 1ffff400027ef444 x10: ffff9400027ef444
> [ 36.193278] x9 : dfffa00000000000 x8 : 0000000000000000
> [ 36.198603] x7 : 0000000000000001 x6 : ffffa00013f7a220
> [ 36.203929] x5 : 0000000000000004 x4 : dfffa00000000000
> [ 36.209255] x3 : ffffa000101a74ec x2 : ffff8001269c1f26
> [ 36.214581] x1 : da1107b7b6a8fb00 x0 : 0000000000000000
> [ 36.219906] Call trace:
> [ 36.222369] driver_deferred_probe_check_state+0x54/0x80
> [ 36.227698] __genpd_dev_pm_attach+0x264/0x2a0
> [ 36.232154] genpd_dev_pm_attach+0x68/0x78
> [ 36.236265] dev_pm_domain_attach+0x6c/0x70
> [ 36.240463] amba_device_try_add+0xec/0x3f8
> [ 36.244659] amba_deferred_retry_func+0x84/0x158
> [ 36.249301] process_one_work+0x3f0/0x660
> [ 36.253326] worker_thread+0x74/0x698
> [ 36.256997] kthread+0x218/0x220
> [ 36.260236] ret_from_fork+0x10/0x1c
> [ 36.263819] ---[ end trace c637c10e549bd716 ]---#
>
> Full test log,
> https://lkft.validation.linaro.org/scheduler/job/1317079#L981
>
> On Tue, 24 Mar 2020 at 18:24, Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
> > <[email protected]> wrote:
> > >
> > > There is a place in the code where open-coded version of list entry accessors
> > > list_last_entry() is used.
> > >
> > > Replace that with the standard macro.
> > >
> > > Signed-off-by: Andy Shevchenko <[email protected]>
> >
> > Reviewed-by: Rafael J. Wysocki <[email protected]>
> >
> > > ---
> > > v2: no change
> > > drivers/base/dd.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > index efd0e4c16ba5..27a4d51b5bba 100644
> > > --- a/drivers/base/dd.c
> > > +++ b/drivers/base/dd.c
> > > @@ -1226,7 +1226,7 @@ void driver_detach(struct device_driver *drv)
> > > spin_unlock(&drv->p->klist_devices.k_lock);
> > > break;
> > > }
> > > - dev_prv = list_entry(drv->p->klist_devices.k_list.prev,
> > > + dev_prv = list_last_entry(&drv->p->klist_devices.k_list,
> > > struct device_private,
> > > knode_driver.n_node);
> > > dev = dev_prv->device;
>
> metadata:
> git branch: master
> git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> git describe: next-20200327
> kernel-config:
> https://builds.tuxbuild.com/nqmmxorUbC1qTWp42iEKjQ/kernel.config
>

And you bisected the warning to this patch? If you revert it, does it
go away?

confused,

greg k-h

2020-03-30 10:14:37

by Sudeep Holla

[permalink] [raw]

Subject: Re: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()

On Fri, Mar 27, 2020 at 07:40:25PM +0000, Robin Murphy wrote:
> On 2020-03-27 5:56 pm, Naresh Kamboju wrote:
> > The kernel warning noticed on arm64 juno-r2 device running linux
> > next-20200326 and next-20200327
>
> I suspect this is the correct expected behaviour manifesting. If you're
> using the upstream juno-r2.dts, the power domain being waited for here is
> provided by SCPI, however unless you're using an SCP firmware from at least
> 3 years ago you won't actually have SCPI since they switched it to the newer
> SCMI protocol, which is not yet supported upstream for Juno. See what
> happened earlier in the log:
>
> [ 2.741206] scpi_protocol scpi: incorrect or no SCP firmware found
> [ 2.747586] scpi_protocol: probe of scpi failed with error -110
>
> Thus this is the "waiting for a dependency which will never appear" case,
> for which I assume the warning is intentional,

Is that the case ?

Previously we used to get the warning:
"amba xx: ignoring dependency for device, assuming no driver"

Now we have the kernel warning in addition to the above.

> since the system is essentially broken (i.e. the hardware/firmware doesn't
> actually match what the DT describes).
>

Not sure if we can term it as "essentially broken". Definitely not 100%
functional but not broken if the situation like on Juno where SCP firmware
is fundamental for all OSPM but not essential for boot and other minimum
set of functionality.

Either way I am not against the warning, just wanted to get certain things
clarified myself.

--
Regards,
Sudeep

2020-03-30 10:44:38

by Andy Shevchenko

[permalink] [raw]

Subject: Re: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()

On Mon, Mar 30, 2020 at 11:13:21AM +0100, Sudeep Holla wrote:
> On Fri, Mar 27, 2020 at 07:40:25PM +0000, Robin Murphy wrote:
> > On 2020-03-27 5:56 pm, Naresh Kamboju wrote:
> > > The kernel warning noticed on arm64 juno-r2 device running linux
> > > next-20200326 and next-20200327
> >
> > I suspect this is the correct expected behaviour manifesting. If you're
> > using the upstream juno-r2.dts, the power domain being waited for here is
> > provided by SCPI, however unless you're using an SCP firmware from at least
> > 3 years ago you won't actually have SCPI since they switched it to the newer
> > SCMI protocol, which is not yet supported upstream for Juno. See what
> > happened earlier in the log:
> >
> > [ 2.741206] scpi_protocol scpi: incorrect or no SCP firmware found
> > [ 2.747586] scpi_protocol: probe of scpi failed with error -110
> >
> > Thus this is the "waiting for a dependency which will never appear" case,
> > for which I assume the warning is intentional,
>
> Is that the case ?
>
> Previously we used to get the warning:
> "amba xx: ignoring dependency for device, assuming no driver"
>
> Now we have the kernel warning in addition to the above.
>
> > since the system is essentially broken (i.e. the hardware/firmware doesn't
> > actually match what the DT describes).
> >
>
> Not sure if we can term it as "essentially broken". Definitely not 100%
> functional but not broken if the situation like on Juno where SCP firmware
> is fundamental for all OSPM but not essential for boot and other minimum
> set of functionality.
>
> Either way I am not against the warning, just wanted to get certain things
> clarified myself.

How this warning related to the patch in the subject? Does revert of the patch
gives you no warning? (I will be very surprised).

--
With Best Regards,
Andy Shevchenko

2020-03-30 12:49:09

by Robin Murphy

[permalink] [raw]

Subject: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()

On 2020-03-30 11:13 am, Sudeep Holla wrote:
> On Fri, Mar 27, 2020 at 07:40:25PM +0000, Robin Murphy wrote:
>> On 2020-03-27 5:56 pm, Naresh Kamboju wrote:
>>> The kernel warning noticed on arm64 juno-r2 device running linux
>>> next-20200326 and next-20200327
>>
>> I suspect this is the correct expected behaviour manifesting. If you're
>> using the upstream juno-r2.dts, the power domain being waited for here is
>> provided by SCPI, however unless you're using an SCP firmware from at least
>> 3 years ago you won't actually have SCPI since they switched it to the newer
>> SCMI protocol, which is not yet supported upstream for Juno. See what
>> happened earlier in the log:
>>
>> [ 2.741206] scpi_protocol scpi: incorrect or no SCP firmware found
>> [ 2.747586] scpi_protocol: probe of scpi failed with error -110
>>
>> Thus this is the "waiting for a dependency which will never appear" case,
>> for which I assume the warning is intentional,
>
> Is that the case ?
>
> Previously we used to get the warning:
> "amba xx: ignoring dependency for device, assuming no driver"
>
> Now we have the kernel warning in addition to the above.

AFAICS the difference is down to whether deferred_probe_timeout has
expired or not - I'm not familiar enough with this code to know
*exactly* what the difference is supposed to represent, nor which change
has actually pushed the Juno case from one state to the other (other
than it almost certainly can't be $SUBJECT - if this series is to blame
at all I'd assume it would be down to patch #1/3, but there's a bunch of
other rework previously queued in -next that is probably also interacting)

>> since the system is essentially broken (i.e. the hardware/firmware doesn't
>> actually match what the DT describes).
>>
>
> Not sure if we can term it as "essentially broken". Definitely not 100%
> functional but not broken if the situation like on Juno where SCP firmware
> is fundamental for all OSPM but not essential for boot and other minimum
> set of functionality.

It's "broken" in the sense that the underlying system is *not* the
system described in the DT. Yes, all the parts that still happen to line
up will mostly still function OK, but those that don't will
fundamentally not work as the kernel has been told to expect. I'm not
sure what you prefer to call "not working as the kernel expects", but I
call it "broken" ;)

Robin.

2020-03-30 13:13:15

by Andy Shevchenko

[permalink] [raw]

Subject: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()

On Mon, Mar 30, 2020 at 3:49 PM Robin Murphy <[email protected]> wrote:
> On 2020-03-30 11:13 am, Sudeep Holla wrote:
> > On Fri, Mar 27, 2020 at 07:40:25PM +0000, Robin Murphy wrote:

...

> AFAICS the difference is down to whether deferred_probe_timeout has
> expired or not - I'm not familiar enough with this code to know
> *exactly* what the difference is supposed to represent, nor which change
> has actually pushed the Juno case from one state to the other (other
> than it almost certainly can't be $SUBJECT - if this series is to blame
> at all I'd assume it would be down to patch #1/3, but there's a bunch of
> other rework previously queued in -next that is probably also interacting)

JFYI: patch #1/3 wasn't applied.

--
With Best Regards,
Andy Shevchenko

2020-03-30 13:17:46

by Sudeep Holla

[permalink] [raw]

Subject: Re: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()

On Mon, Mar 30, 2020 at 01:43:40PM +0300, Andy Shevchenko wrote:

[...]

>
> How this warning related to the patch in the subject? Does revert of the patch
> gives you no warning? (I will be very surprised).
>

I did a quick check reverting the $subject patch and it doesn't remove
extra warning. Sorry for the noise, I assumed so hastily, I was wrong.

--
Regards,
Sudeep

2020-03-30 13:31:13

by Robin Murphy

[permalink] [raw]

Subject: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()

On 2020-03-30 2:11 pm, Andy Shevchenko wrote:
> On Mon, Mar 30, 2020 at 3:49 PM Robin Murphy <[email protected]> wrote:
>> On 2020-03-30 11:13 am, Sudeep Holla wrote:
>>> On Fri, Mar 27, 2020 at 07:40:25PM +0000, Robin Murphy wrote:
>
> ...
>
>> AFAICS the difference is down to whether deferred_probe_timeout has
>> expired or not - I'm not familiar enough with this code to know
>> *exactly* what the difference is supposed to represent, nor which change
>> has actually pushed the Juno case from one state to the other (other
>> than it almost certainly can't be $SUBJECT - if this series is to blame
>> at all I'd assume it would be down to patch #1/3, but there's a bunch of
>> other rework previously queued in -next that is probably also interacting)
>
> JFYI: patch #1/3 wasn't applied.

OK, so if anyone's invested enough to want to investigate, it must be
something in John's earlier changes here:

https://lore.kernel.org/lkml/[email protected]/

Thanks,
Robin.

2020-03-30 13:59:27

by Sudeep Holla

[permalink] [raw]

Subject: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()

On Mon, Mar 30, 2020 at 01:45:32PM +0100, Robin Murphy wrote:
> On 2020-03-30 11:13 am, Sudeep Holla wrote:
> > On Fri, Mar 27, 2020 at 07:40:25PM +0000, Robin Murphy wrote:
> > > On 2020-03-27 5:56 pm, Naresh Kamboju wrote:
> > > > The kernel warning noticed on arm64 juno-r2 device running linux
> > > > next-20200326 and next-20200327
> > >
> > > I suspect this is the correct expected behaviour manifesting. If you're
> > > using the upstream juno-r2.dts, the power domain being waited for here is
> > > provided by SCPI, however unless you're using an SCP firmware from at least
> > > 3 years ago you won't actually have SCPI since they switched it to the newer
> > > SCMI protocol, which is not yet supported upstream for Juno. See what
> > > happened earlier in the log:
> > >
> > > [ 2.741206] scpi_protocol scpi: incorrect or no SCP firmware found
> > > [ 2.747586] scpi_protocol: probe of scpi failed with error -110
> > >
> > > Thus this is the "waiting for a dependency which will never appear" case,
> > > for which I assume the warning is intentional,
> >
> > Is that the case ?
> >
> > Previously we used to get the warning:
> > "amba xx: ignoring dependency for device, assuming no driver"
> >
> > Now we have the kernel warning in addition to the above.
>
> AFAICS the difference is down to whether deferred_probe_timeout has expired
> or not - I'm not familiar enough with this code to know *exactly* what the
> difference is supposed to represent, nor which change has actually pushed
> the Juno case from one state to the other

Me either

> (other than it almost certainly
> can't be $SUBJECT - if this series is to blame at all I'd assume it would be
> down to patch #1/3, but there's a bunch of other rework previously queued in
> -next that is probably also interacting)
>

I agree, I was assuming one of the patch in series but again I may be wrong.

> > > since the system is essentially broken (i.e. the hardware/firmware doesn't
> > > actually match what the DT describes).
> > >
> >
> > Not sure if we can term it as "essentially broken". Definitely not 100%
> > functional but not broken if the situation like on Juno where SCP firmware
> > is fundamental for all OSPM but not essential for boot and other minimum
> > set of functionality.
>
> It's "broken" in the sense that the underlying system is *not* the system
> described in the DT. Yes, all the parts that still happen to line up will
> mostly still function OK, but those that don't will fundamentally not work
> as the kernel has been told to expect. I'm not sure what you prefer to call
> "not working as the kernel expects", but I call it "broken" ;)
>

I agree with you in context of Juno and it's firmware story.

But I also have another development use-case. Unless the DT becomes the
integral part of firmware from start, we can end up having DT with full
DT components(e.g. all SCMI users) while the firmware can add the
features incremental way. I agree this is not common for most of the
kernel developer but practical for few.

--
Regards,
Sudeep

2020-03-30 20:03:27

by John Stultz

[permalink] [raw]

Subject: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()

On Mon, Mar 30, 2020 at 6:30 AM Robin Murphy <[email protected]> wrote:
>
> On 2020-03-30 2:11 pm, Andy Shevchenko wrote:
> > On Mon, Mar 30, 2020 at 3:49 PM Robin Murphy <[email protected]> wrote:
> >> On 2020-03-30 11:13 am, Sudeep Holla wrote:
> >>> On Fri, Mar 27, 2020 at 07:40:25PM +0000, Robin Murphy wrote:
> >
> > ...
> >
> >> AFAICS the difference is down to whether deferred_probe_timeout has
> >> expired or not - I'm not familiar enough with this code to know
> >> *exactly* what the difference is supposed to represent, nor which change
> >> has actually pushed the Juno case from one state to the other (other
> >> than it almost certainly can't be $SUBJECT - if this series is to blame
> >> at all I'd assume it would be down to patch #1/3, but there's a bunch of
> >> other rework previously queued in -next that is probably also interacting)
> >
> > JFYI: patch #1/3 wasn't applied.
>
> OK, so if anyone's invested enough to want to investigate, it must be
> something in John's earlier changes here:
>
> https://lore.kernel.org/lkml/[email protected]/

Hey all,
Sorry, I just got a heads up about this thread.

So yea, it looks like the change is due likely to the first patch in
my series. Previously, after initcall_done, (since
deferred_probe_timeout was -1 unless manually specified) if the driver
wasn't already loaded we'd print "ignoring dependency for device,
assuming no driver" and return ENODEV.

Now, if modules are enabled (as without modules enabled, I believe
you'd see the same behavior as previous), we wait 30 seconds (for
userspace to load any posssible modules that meet that dependency) and
then the driver_deferred_probe_timeout fires and we print "deferred
probe timeout, ignoring dependency".

It seems the issue here is the first message was printed with
dev_warn() and the second with dev_WARN() which provides the scary
backtrace.

I think functionally as mentioned above, there's no real behavioral
change here. But please correct me if that's wrong.

Since we are more likely to see the second message now, maybe we
should make both print via dev_warn()?

I'll spin up a patch.

thanks
-john