2014-04-09 10:52:17

by Peter Ujfalusi

[permalink] [raw]
Subject: [PATCH v2 0/2] drivercore: deferral race condition fix

Hi,

Changes since v1:
- The deferral race detection and handling has been moved to
driver_deferred_probe_trigger() and driver_deferred_probe_add() functions with
comment section.
- driver_deferred_probe_add/del function needed to be moved. Both of them placed
after driver_deferred_probe_trigger() so they remain together.
- I have not added the Acked-by from Mark and Grant since the patch itself
changed from v1, but the functionality remained.

Explanantion of the issue (from patch 2):
When the kernel is built with CONFIG_PREEMPT it is possible to reach a state
when all modules loaded but some driver still stuck in the deferred list
and there is a need for external event to kick the deferred queue to probe
these drivers.

The issue has been observed on embedded systems with CONFIG_PREEMPT enabled,
audio support built as modules and using nfsroot for root filesystem.

The following log fragment shows such sequence when all audio modules
were loaded but the sound card is not present since the machine driver has
failed to probe due to missing dependency during it's probe.
The board is am335x-evmsk (McASP<->tlv320aic3106 codec) with davinci-evm
machine driver:

...
[ 12.615118] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: ENTER
[ 12.719969] davinci_evm sound.3: davinci_evm_probe: ENTER
[ 12.725753] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card
[ 12.753846] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component
[ 12.922051] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component DONE
[ 12.950839] davinci_evm sound.3: ASoC: platform (null) not registered
[ 12.957898] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card DONE (-517)
[ 13.099026] davinci-mcasp 4803c000.mcasp: Kicking the deferred list
[ 13.177838] davinci-mcasp 4803c000.mcasp: really_probe: probe_count = 2
[ 13.194130] davinci_evm sound.3: snd_soc_register_card failed (-517)
[ 13.346755] davinci_mcasp_driver_init: LEAVE
[ 13.377446] platform sound.3: Driver davinci_evm requests probe deferral
[ 13.592527] platform sound.3: really_probe: probe_count = 0

In the log the machine driver enters it's probe at 12.719969 (this point it
has been removed from the deferred lists). McASP driver already executing
it's probing (since 12.615118).
The machine driver tries to construct the sound card (12.950839) but did
not found one of the components so it fails. After this McASP driver
registers all the ASoC components (the machine driver still in it's probe
function after it failed to construct the card) and the deferred work is
prepared at 13.099026 (note that this time the machine driver is not in the
lists so it is not going to be handled when the work is executing).
Lastly the machine driver exit from it's probe and the core places it to
the deferred list but there will be no other driver going to load and the
deferred queue is not going to be kicked again - till we have external event
like connecting USB stick, etc.

The proposed solution is to try the deferred queue once more when the last
driver is asking for deferring and we had drivers loaded while this last
driver was probing.

This way we can avoid drivers stuck in the deferred queue.

Regards,
Peter
---
Peter Ujfalusi (2):
drivercore: dd: Move driver_deferred_probe_add/del function down in
the code
drivercore: deferral race condition fix

drivers/base/dd.c | 71 +++++++++++++++++++++++++++++++++++++------------------
1 file changed, 48 insertions(+), 23 deletions(-)

--
1.9.1


2014-04-09 10:52:26

by Peter Ujfalusi

[permalink] [raw]
Subject: [PATCH v2 2/2] drivercore: deferral race condition fix

When the kernel is built with CONFIG_PREEMPT it is possible to reach a state
when all modules loaded but some driver still stuck in the deferred list
and there is a need for external event to kick the deferred queue to probe
these drivers.

The issue has been observed on embedded systems with CONFIG_PREEMPT enabled,
audio support built as modules and using nfsroot for root filesystem.

The following log fragment shows such sequence when all audio modules
were loaded but the sound card is not present since the machine driver has
failed to probe due to missing dependency during it's probe.
The board is am335x-evmsk (McASP<->tlv320aic3106 codec) with davinci-evm
machine driver:

...
[ 12.615118] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: ENTER
[ 12.719969] davinci_evm sound.3: davinci_evm_probe: ENTER
[ 12.725753] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card
[ 12.753846] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component
[ 12.922051] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component DONE
[ 12.950839] davinci_evm sound.3: ASoC: platform (null) not registered
[ 12.957898] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card DONE (-517)
[ 13.099026] davinci-mcasp 4803c000.mcasp: Kicking the deferred list
[ 13.177838] davinci-mcasp 4803c000.mcasp: really_probe: probe_count = 2
[ 13.194130] davinci_evm sound.3: snd_soc_register_card failed (-517)
[ 13.346755] davinci_mcasp_driver_init: LEAVE
[ 13.377446] platform sound.3: Driver davinci_evm requests probe deferral
[ 13.592527] platform sound.3: really_probe: probe_count = 0

In the log the machine driver enters it's probe at 12.719969 (this point it
has been removed from the deferred lists). McASP driver already executing
it's probing (since 12.615118).
The machine driver tries to construct the sound card (12.950839) but did
not found one of the components so it fails. After this McASP driver
registers all the ASoC components (the machine driver still in it's probe
function after it failed to construct the card) and the deferred work is
prepared at 13.099026 (note that this time the machine driver is not in the
lists so it is not going to be handled when the work is executing).
Lastly the machine driver exit from it's probe and the core places it to
the deferred list but there will be no other driver going to load and the
deferred queue is not going to be kicked again - till we have external event
like connecting USB stick, etc.

The proposed solution is to try the deferred queue once more when the last
driver is asking for deferring and we had drivers loaded while this last
driver was probing.

This way we can avoid drivers stuck in the deferred queue.

Signed-off-by: Peter Ujfalusi <[email protected]>
---
drivers/base/dd.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 43d573b960ba..22252aebbd3e 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -53,6 +53,10 @@ static LIST_HEAD(deferred_probe_pending_list);
static LIST_HEAD(deferred_probe_active_list);
static struct workqueue_struct *deferred_wq;

+static atomic_t probe_count = ATOMIC_INIT(0);
+static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
+static bool deferred_auto_retry;
+
/**
* deferred_probe_work_func() - Retry probing devices in the active list.
*/
@@ -127,6 +131,15 @@ static void driver_deferred_probe_trigger(void)
* into the active list so they can be retried by the workqueue
*/
mutex_lock(&deferred_probe_mutex);
+ /*
+ * In case we have more than one device probing at the same time we do
+ * not yet know if the other driver going to be deferred or not.
+ * The flag 'deferred_auto_retry' will be used when the last driver asks
+ * for deferral to trigger again the deferred list to check if it's
+ * dependencies have been satisfied already.
+ */
+ deferred_auto_retry = (atomic_read(&probe_count) > 1);
+
list_splice_tail_init(&deferred_probe_pending_list,
&deferred_probe_active_list);
mutex_unlock(&deferred_probe_mutex);
@@ -140,12 +153,27 @@ static void driver_deferred_probe_trigger(void)

static void driver_deferred_probe_add(struct device *dev)
{
+ bool auto_trigger = false;
+
mutex_lock(&deferred_probe_mutex);
if (list_empty(&dev->p->deferred_probe)) {
dev_dbg(dev, "Added to deferred list\n");
list_add_tail(&dev->p->deferred_probe, &deferred_probe_pending_list);
}
+
+ /*
+ * This is the last driver to load and asking to be deferred.
+ * If other driver(s) loaded while this driver was loading, we
+ * should try the deferred drivers (including this driver) again to
+ * avoid missing dependency for this driver.
+ */
+ if (atomic_read(&probe_count) == 1 && deferred_auto_retry)
+ auto_trigger = true;
+
mutex_unlock(&deferred_probe_mutex);
+
+ if (auto_trigger)
+ driver_deferred_probe_trigger();
}

void driver_deferred_probe_del(struct device *dev)
@@ -259,9 +287,6 @@ int device_bind_driver(struct device *dev)
}
EXPORT_SYMBOL_GPL(device_bind_driver);

-static atomic_t probe_count = ATOMIC_INIT(0);
-static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
-
static int really_probe(struct device *dev, struct device_driver *drv)
{
int ret = 0;
--
1.9.1

2014-04-09 10:52:47

by Peter Ujfalusi

[permalink] [raw]
Subject: [PATCH v2 1/2] drivercore: dd: Move driver_deferred_probe_add/del function down in the code

Move both functions after driver_deferred_probe_trigger() in the source file
since upcoming patch will need to call the _trigger from _add.
Move also the _del so the functions will be kept together.

Signed-off-by: Peter Ujfalusi <[email protected]>
---
drivers/base/dd.c | 40 ++++++++++++++++++++--------------------
1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 06051767393f..43d573b960ba 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -108,26 +108,6 @@ static void deferred_probe_work_func(struct work_struct *work)
}
static DECLARE_WORK(deferred_probe_work, deferred_probe_work_func);

-static void driver_deferred_probe_add(struct device *dev)
-{
- mutex_lock(&deferred_probe_mutex);
- if (list_empty(&dev->p->deferred_probe)) {
- dev_dbg(dev, "Added to deferred list\n");
- list_add_tail(&dev->p->deferred_probe, &deferred_probe_pending_list);
- }
- mutex_unlock(&deferred_probe_mutex);
-}
-
-void driver_deferred_probe_del(struct device *dev)
-{
- mutex_lock(&deferred_probe_mutex);
- if (!list_empty(&dev->p->deferred_probe)) {
- dev_dbg(dev, "Removed from deferred list\n");
- list_del_init(&dev->p->deferred_probe);
- }
- mutex_unlock(&deferred_probe_mutex);
-}
-
static bool driver_deferred_probe_enable = false;
/**
* driver_deferred_probe_trigger() - Kick off re-probing deferred devices
@@ -158,6 +138,26 @@ static void driver_deferred_probe_trigger(void)
queue_work(deferred_wq, &deferred_probe_work);
}

+static void driver_deferred_probe_add(struct device *dev)
+{
+ mutex_lock(&deferred_probe_mutex);
+ if (list_empty(&dev->p->deferred_probe)) {
+ dev_dbg(dev, "Added to deferred list\n");
+ list_add_tail(&dev->p->deferred_probe, &deferred_probe_pending_list);
+ }
+ mutex_unlock(&deferred_probe_mutex);
+}
+
+void driver_deferred_probe_del(struct device *dev)
+{
+ mutex_lock(&deferred_probe_mutex);
+ if (!list_empty(&dev->p->deferred_probe)) {
+ dev_dbg(dev, "Removed from deferred list\n");
+ list_del_init(&dev->p->deferred_probe);
+ }
+ mutex_unlock(&deferred_probe_mutex);
+}
+
/**
* deferred_probe_initcall() - Enable probing of deferred devices
*
--
1.9.1

2014-04-24 07:31:52

by Peter Ujfalusi

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] drivercore: deferral race condition fix

Hi Greg, Grant,

On 04/09/2014 01:52 PM, Peter Ujfalusi wrote:
> Hi,
>
> Changes since v1:
> - The deferral race detection and handling has been moved to
> driver_deferred_probe_trigger() and driver_deferred_probe_add() functions with
> comment section.
> - driver_deferred_probe_add/del function needed to be moved. Both of them placed
> after driver_deferred_probe_trigger() so they remain together.
> - I have not added the Acked-by from Mark and Grant since the patch itself
> changed from v1, but the functionality remained.

Is this solution sounds good? Do you want me to resend the two patch?

Regards,
P?ter

> Explanantion of the issue (from patch 2):
> When the kernel is built with CONFIG_PREEMPT it is possible to reach a state
> when all modules loaded but some driver still stuck in the deferred list
> and there is a need for external event to kick the deferred queue to probe
> these drivers.
>
> The issue has been observed on embedded systems with CONFIG_PREEMPT enabled,
> audio support built as modules and using nfsroot for root filesystem.
>
> The following log fragment shows such sequence when all audio modules
> were loaded but the sound card is not present since the machine driver has
> failed to probe due to missing dependency during it's probe.
> The board is am335x-evmsk (McASP<->tlv320aic3106 codec) with davinci-evm
> machine driver:
>
> ...
> [ 12.615118] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: ENTER
> [ 12.719969] davinci_evm sound.3: davinci_evm_probe: ENTER
> [ 12.725753] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card
> [ 12.753846] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component
> [ 12.922051] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component DONE
> [ 12.950839] davinci_evm sound.3: ASoC: platform (null) not registered
> [ 12.957898] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card DONE (-517)
> [ 13.099026] davinci-mcasp 4803c000.mcasp: Kicking the deferred list
> [ 13.177838] davinci-mcasp 4803c000.mcasp: really_probe: probe_count = 2
> [ 13.194130] davinci_evm sound.3: snd_soc_register_card failed (-517)
> [ 13.346755] davinci_mcasp_driver_init: LEAVE
> [ 13.377446] platform sound.3: Driver davinci_evm requests probe deferral
> [ 13.592527] platform sound.3: really_probe: probe_count = 0
>
> In the log the machine driver enters it's probe at 12.719969 (this point it
> has been removed from the deferred lists). McASP driver already executing
> it's probing (since 12.615118).
> The machine driver tries to construct the sound card (12.950839) but did
> not found one of the components so it fails. After this McASP driver
> registers all the ASoC components (the machine driver still in it's probe
> function after it failed to construct the card) and the deferred work is
> prepared at 13.099026 (note that this time the machine driver is not in the
> lists so it is not going to be handled when the work is executing).
> Lastly the machine driver exit from it's probe and the core places it to
> the deferred list but there will be no other driver going to load and the
> deferred queue is not going to be kicked again - till we have external event
> like connecting USB stick, etc.
>
> The proposed solution is to try the deferred queue once more when the last
> driver is asking for deferring and we had drivers loaded while this last
> driver was probing.
>
> This way we can avoid drivers stuck in the deferred queue.
>
> Regards,
> Peter
> ---
> Peter Ujfalusi (2):
> drivercore: dd: Move driver_deferred_probe_add/del function down in
> the code
> drivercore: deferral race condition fix
>
> drivers/base/dd.c | 71 +++++++++++++++++++++++++++++++++++++------------------
> 1 file changed, 48 insertions(+), 23 deletions(-)

2014-04-24 21:01:34

by Grant Likely

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] drivercore: deferral race condition fix

On Thu, Apr 24, 2014 at 8:31 AM, Peter Ujfalusi <[email protected]> wrote:
> Hi Greg, Grant,
>
> On 04/09/2014 01:52 PM, Peter Ujfalusi wrote:
>> Hi,
>>
>> Changes since v1:
>> - The deferral race detection and handling has been moved to
>> driver_deferred_probe_trigger() and driver_deferred_probe_add() functions with
>> comment section.
>> - driver_deferred_probe_add/del function needed to be moved. Both of them placed
>> after driver_deferred_probe_trigger() so they remain together.
>> - I have not added the Acked-by from Mark and Grant since the patch itself
>> changed from v1, but the functionality remained.
>
> Is this solution sounds good? Do you want me to resend the two patch?

You don't need to resend. I just got back from vacation. I will look
at it tomorrow.

g.

>
> Regards,
> P?ter
>
>> Explanantion of the issue (from patch 2):
>> When the kernel is built with CONFIG_PREEMPT it is possible to reach a state
>> when all modules loaded but some driver still stuck in the deferred list
>> and there is a need for external event to kick the deferred queue to probe
>> these drivers.
>>
>> The issue has been observed on embedded systems with CONFIG_PREEMPT enabled,
>> audio support built as modules and using nfsroot for root filesystem.
>>
>> The following log fragment shows such sequence when all audio modules
>> were loaded but the sound card is not present since the machine driver has
>> failed to probe due to missing dependency during it's probe.
>> The board is am335x-evmsk (McASP<->tlv320aic3106 codec) with davinci-evm
>> machine driver:
>>
>> ...
>> [ 12.615118] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: ENTER
>> [ 12.719969] davinci_evm sound.3: davinci_evm_probe: ENTER
>> [ 12.725753] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card
>> [ 12.753846] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component
>> [ 12.922051] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component DONE
>> [ 12.950839] davinci_evm sound.3: ASoC: platform (null) not registered
>> [ 12.957898] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card DONE (-517)
>> [ 13.099026] davinci-mcasp 4803c000.mcasp: Kicking the deferred list
>> [ 13.177838] davinci-mcasp 4803c000.mcasp: really_probe: probe_count = 2
>> [ 13.194130] davinci_evm sound.3: snd_soc_register_card failed (-517)
>> [ 13.346755] davinci_mcasp_driver_init: LEAVE
>> [ 13.377446] platform sound.3: Driver davinci_evm requests probe deferral
>> [ 13.592527] platform sound.3: really_probe: probe_count = 0
>>
>> In the log the machine driver enters it's probe at 12.719969 (this point it
>> has been removed from the deferred lists). McASP driver already executing
>> it's probing (since 12.615118).
>> The machine driver tries to construct the sound card (12.950839) but did
>> not found one of the components so it fails. After this McASP driver
>> registers all the ASoC components (the machine driver still in it's probe
>> function after it failed to construct the card) and the deferred work is
>> prepared at 13.099026 (note that this time the machine driver is not in the
>> lists so it is not going to be handled when the work is executing).
>> Lastly the machine driver exit from it's probe and the core places it to
>> the deferred list but there will be no other driver going to load and the
>> deferred queue is not going to be kicked again - till we have external event
>> like connecting USB stick, etc.
>>
>> The proposed solution is to try the deferred queue once more when the last
>> driver is asking for deferring and we had drivers loaded while this last
>> driver was probing.
>>
>> This way we can avoid drivers stuck in the deferred queue.
>>
>> Regards,
>> Peter
>> ---
>> Peter Ujfalusi (2):
>> drivercore: dd: Move driver_deferred_probe_add/del function down in
>> the code
>> drivercore: deferral race condition fix
>>
>> drivers/base/dd.c | 71 +++++++++++++++++++++++++++++++++++++------------------
>> 1 file changed, 48 insertions(+), 23 deletions(-)
>

2014-04-28 14:18:52

by Grant Likely

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] drivercore: deferral race condition fix

On Thu, 24 Apr 2014 22:01:02 +0100, Grant Likely <[email protected]> wrote:
> On Thu, Apr 24, 2014 at 8:31 AM, Peter Ujfalusi <[email protected]> wrote:
> > Hi Greg, Grant,
> >
> > On 04/09/2014 01:52 PM, Peter Ujfalusi wrote:
> >> Hi,
> >>
> >> Changes since v1:
> >> - The deferral race detection and handling has been moved to
> >> driver_deferred_probe_trigger() and driver_deferred_probe_add() functions with
> >> comment section.
> >> - driver_deferred_probe_add/del function needed to be moved. Both of them placed
> >> after driver_deferred_probe_trigger() so they remain together.
> >> - I have not added the Acked-by from Mark and Grant since the patch itself
> >> changed from v1, but the functionality remained.
> >
> > Is this solution sounds good? Do you want me to resend the two patch?
>
> You don't need to resend. I just got back from vacation. I will look
> at it tomorrow.

Hi Peter,

I think I have a cleaner solution. I think it avoids a lot of the corner
cases that I was concerned about with the flag approach. Can you look at
this and tell me if it works for you?

Basically, at the start of each probe, really_probe() checks how many
triggers have occurred, and then later in the defer failure path, it
checks to see if the trigger number has changed. If it has, then it is
possible that the dependencies are resolved, so it does a trigger again.

I could refine this a bit more. The local_trigger_count could be passed
into driver_deferred_probe_add() and the add function could be more
intelligent about only adding that one device to the active list instead
of the entire pending list. However, I don't think doing a full trigger
is actually going to be expensive in the conditions that will trigger
the race. Unless you can convince me otherwise, I think it is better to
keep it simple.

g.

---

drivercore: deferral race condition fix

When the kernel is built with CONFIG_PREEMPT it is possible to reach a state
when all modules loaded but some driver still stuck in the deferred list
and there is a need for external event to kick the deferred queue to probe
these drivers.

The issue has been observed on embedded systems with CONFIG_PREEMPT enabled,
audio support built as modules and using nfsroot for root filesystem.

The following log fragment shows such sequence when all audio modules
were loaded but the sound card is not present since the machine driver has
failed to probe due to missing dependency during it's probe.
The board is am335x-evmsk (McASP<->tlv320aic3106 codec) with davinci-evm
machine driver:

...
[ 12.615118] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: ENTER
[ 12.719969] davinci_evm sound.3: davinci_evm_probe: ENTER
[ 12.725753] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card
[ 12.753846] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component
[ 12.922051] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component DONE
[ 12.950839] davinci_evm sound.3: ASoC: platform (null) not registered
[ 12.957898] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card DONE (-517)
[ 13.099026] davinci-mcasp 4803c000.mcasp: Kicking the deferred list
[ 13.177838] davinci-mcasp 4803c000.mcasp: really_probe: probe_count = 2
[ 13.194130] davinci_evm sound.3: snd_soc_register_card failed (-517)
[ 13.346755] davinci_mcasp_driver_init: LEAVE
[ 13.377446] platform sound.3: Driver davinci_evm requests probe deferral
[ 13.592527] platform sound.3: really_probe: probe_count = 0

In the log the machine driver enters it's probe at 12.719969 (this point it
has been removed from the deferred lists). McASP driver already executing
it's probing (since 12.615118).
The machine driver tries to construct the sound card (12.950839) but did
not found one of the components so it fails. After this McASP driver
registers all the ASoC components (the machine driver still in it's probe
function after it failed to construct the card) and the deferred work is
prepared at 13.099026 (note that this time the machine driver is not in the
lists so it is not going to be handled when the work is executing).
Lastly the machine driver exit from it's probe and the core places it to
the deferred list but there will be no other driver going to load and the
deferred queue is not going to be kicked again - till we have external event
like connecting USB stick, etc.

The proposed solution is to check if a trigger has occured during probe, and
if so then trigger the deferred queue again in the defer path.

This way we can avoid drivers stuck in the deferred queue.

[commit description written by Peter Ujfalusi]
Signed-off-by: Grant Likely <[email protected]>

---

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 8986b9f22781..af76fcd8da64 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -52,6 +52,7 @@ static DEFINE_MUTEX(deferred_probe_mutex);
static LIST_HEAD(deferred_probe_pending_list);
static LIST_HEAD(deferred_probe_active_list);
static struct workqueue_struct *deferred_wq;
+static atomic_t deferred_trigger_count = ATOMIC_INIT(0);

/**
* deferred_probe_work_func() - Retry probing devices in the active list.
@@ -135,6 +136,17 @@ static bool driver_deferred_probe_enable = false;
* This functions moves all devices from the pending list to the active
* list and schedules the deferred probe workqueue to process them. It
* should be called anytime a driver is successfully bound to a device.
+ *
+ * Note, there is a race condition in multi-threaded probe. In the case where
+ * more than one device is probing at the same time, it is possible for one
+ * probe to complete successfully while another is about to defer. If the second
+ * depends on the first, then it will get put on the pending list after the
+ * trigger event has already occured and will be stuck there.
+ *
+ * The atomic 'deferred_trigger_count' is used to determine if a successful
+ * trigger has occurred in the midst of probing a driver. If the trigger count
+ * changes in the midst of a probe, then deferred processing should be triggered
+ * again.
*/
static void driver_deferred_probe_trigger(void)
{
@@ -147,6 +159,7 @@ static void driver_deferred_probe_trigger(void)
* into the active list so they can be retried by the workqueue
*/
mutex_lock(&deferred_probe_mutex);
+ atomic_inc(&deferred_trigger_count);
list_splice_tail_init(&deferred_probe_pending_list,
&deferred_probe_active_list);
mutex_unlock(&deferred_probe_mutex);
@@ -265,6 +278,7 @@ static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
static int really_probe(struct device *dev, struct device_driver *drv)
{
int ret = 0;
+ int local_trigger_count = atomic_read(&deferred_trigger_count);

atomic_inc(&probe_count);
pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
@@ -310,6 +324,8 @@ probe_failed:
/* Driver requested deferred probing */
dev_info(dev, "Driver %s requests probe deferral\n", drv->name);
driver_deferred_probe_add(dev);
+ if (local_trigger_count != atomic_read(&deferred_trigger_count))
+ driver_deferred_probe_trigger();
} else if (ret != -ENODEV && ret != -ENXIO) {
/* driver matched but the probe failed */
printk(KERN_WARNING

2014-04-29 08:04:08

by Peter Ujfalusi

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] drivercore: deferral race condition fix

On 04/28/2014 05:18 PM, Grant Likely wrote:
> On Thu, 24 Apr 2014 22:01:02 +0100, Grant Likely <[email protected]> wrote:
>> On Thu, Apr 24, 2014 at 8:31 AM, Peter Ujfalusi <[email protected]> wrote:
>>> Hi Greg, Grant,
>>>
>>> On 04/09/2014 01:52 PM, Peter Ujfalusi wrote:
>>>> Hi,
>>>>
>>>> Changes since v1:
>>>> - The deferral race detection and handling has been moved to
>>>> driver_deferred_probe_trigger() and driver_deferred_probe_add() functions with
>>>> comment section.
>>>> - driver_deferred_probe_add/del function needed to be moved. Both of them placed
>>>> after driver_deferred_probe_trigger() so they remain together.
>>>> - I have not added the Acked-by from Mark and Grant since the patch itself
>>>> changed from v1, but the functionality remained.
>>>
>>> Is this solution sounds good? Do you want me to resend the two patch?
>>
>> You don't need to resend. I just got back from vacation. I will look
>> at it tomorrow.
>
> Hi Peter,
>
> I think I have a cleaner solution. I think it avoids a lot of the corner
> cases that I was concerned about with the flag approach. Can you look at
> this and tell me if it works for you?
>
> Basically, at the start of each probe, really_probe() checks how many
> triggers have occurred, and then later in the defer failure path, it
> checks to see if the trigger number has changed. If it has, then it is
> possible that the dependencies are resolved, so it does a trigger again.
>
> I could refine this a bit more. The local_trigger_count could be passed
> into driver_deferred_probe_add() and the add function could be more
> intelligent about only adding that one device to the active list instead
> of the entire pending list. However, I don't think doing a full trigger
> is actually going to be expensive in the conditions that will trigger
> the race. Unless you can convince me otherwise, I think it is better to
> keep it simple.

I don't think we can be more clever with this. I just noticed that my patch
would fail to trigger in the following case (because it was trying to be clever):
Four drivers probing at the same time and they return from their probes with:
1st driver: deferred
2nd driver: probe OK
3rd driver: deferred
4th driver: Error

With my patch we would not have driver_deferred_probe_trigger() called to
check if the 3rd driver has it's dependency because when it was deferring it
was not the last driver and the last driver is returned with error.

Your patch would handle this fine since it is going to call
driver_deferred_probe_trigger() for the 3rd driver.

This is why I think we can not be too clever. Actually we can be clever but it
means we need to add more code and logic.

Reviewed-by: Peter Ujfalusi <[email protected]>
Tested-by: Peter Ujfalusi <[email protected]>

>
> g.
>
> ---
>
> drivercore: deferral race condition fix
>
> When the kernel is built with CONFIG_PREEMPT it is possible to reach a state
> when all modules loaded but some driver still stuck in the deferred list
> and there is a need for external event to kick the deferred queue to probe
> these drivers.
>
> The issue has been observed on embedded systems with CONFIG_PREEMPT enabled,
> audio support built as modules and using nfsroot for root filesystem.
>
> The following log fragment shows such sequence when all audio modules
> were loaded but the sound card is not present since the machine driver has
> failed to probe due to missing dependency during it's probe.
> The board is am335x-evmsk (McASP<->tlv320aic3106 codec) with davinci-evm
> machine driver:
>
> ...
> [ 12.615118] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: ENTER
> [ 12.719969] davinci_evm sound.3: davinci_evm_probe: ENTER
> [ 12.725753] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card
> [ 12.753846] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component
> [ 12.922051] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component DONE
> [ 12.950839] davinci_evm sound.3: ASoC: platform (null) not registered
> [ 12.957898] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card DONE (-517)
> [ 13.099026] davinci-mcasp 4803c000.mcasp: Kicking the deferred list
> [ 13.177838] davinci-mcasp 4803c000.mcasp: really_probe: probe_count = 2
> [ 13.194130] davinci_evm sound.3: snd_soc_register_card failed (-517)
> [ 13.346755] davinci_mcasp_driver_init: LEAVE
> [ 13.377446] platform sound.3: Driver davinci_evm requests probe deferral
> [ 13.592527] platform sound.3: really_probe: probe_count = 0
>
> In the log the machine driver enters it's probe at 12.719969 (this point it
> has been removed from the deferred lists). McASP driver already executing
> it's probing (since 12.615118).
> The machine driver tries to construct the sound card (12.950839) but did
> not found one of the components so it fails. After this McASP driver
> registers all the ASoC components (the machine driver still in it's probe
> function after it failed to construct the card) and the deferred work is
> prepared at 13.099026 (note that this time the machine driver is not in the
> lists so it is not going to be handled when the work is executing).
> Lastly the machine driver exit from it's probe and the core places it to
> the deferred list but there will be no other driver going to load and the
> deferred queue is not going to be kicked again - till we have external event
> like connecting USB stick, etc.
>
> The proposed solution is to check if a trigger has occured during probe, and
> if so then trigger the deferred queue again in the defer path.
>
> This way we can avoid drivers stuck in the deferred queue.
>
> [commit description written by Peter Ujfalusi]
> Signed-off-by: Grant Likely <[email protected]>
>
> ---
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 8986b9f22781..af76fcd8da64 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -52,6 +52,7 @@ static DEFINE_MUTEX(deferred_probe_mutex);
> static LIST_HEAD(deferred_probe_pending_list);
> static LIST_HEAD(deferred_probe_active_list);
> static struct workqueue_struct *deferred_wq;
> +static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
>
> /**
> * deferred_probe_work_func() - Retry probing devices in the active list.
> @@ -135,6 +136,17 @@ static bool driver_deferred_probe_enable = false;
> * This functions moves all devices from the pending list to the active
> * list and schedules the deferred probe workqueue to process them. It
> * should be called anytime a driver is successfully bound to a device.
> + *
> + * Note, there is a race condition in multi-threaded probe. In the case where
> + * more than one device is probing at the same time, it is possible for one
> + * probe to complete successfully while another is about to defer. If the second
> + * depends on the first, then it will get put on the pending list after the
> + * trigger event has already occured and will be stuck there.
> + *
> + * The atomic 'deferred_trigger_count' is used to determine if a successful
> + * trigger has occurred in the midst of probing a driver. If the trigger count
> + * changes in the midst of a probe, then deferred processing should be triggered
> + * again.
> */
> static void driver_deferred_probe_trigger(void)
> {
> @@ -147,6 +159,7 @@ static void driver_deferred_probe_trigger(void)
> * into the active list so they can be retried by the workqueue
> */
> mutex_lock(&deferred_probe_mutex);
> + atomic_inc(&deferred_trigger_count);
> list_splice_tail_init(&deferred_probe_pending_list,
> &deferred_probe_active_list);
> mutex_unlock(&deferred_probe_mutex);
> @@ -265,6 +278,7 @@ static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
> static int really_probe(struct device *dev, struct device_driver *drv)
> {
> int ret = 0;
> + int local_trigger_count = atomic_read(&deferred_trigger_count);
>
> atomic_inc(&probe_count);
> pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
> @@ -310,6 +324,8 @@ probe_failed:
> /* Driver requested deferred probing */
> dev_info(dev, "Driver %s requests probe deferral\n", drv->name);
> driver_deferred_probe_add(dev);
> + if (local_trigger_count != atomic_read(&deferred_trigger_count))
> + driver_deferred_probe_trigger();
> } else if (ret != -ENODEV && ret != -ENXIO) {
> /* driver matched but the probe failed */
> printk(KERN_WARNING
>


--
Péter