2011-03-22 22:02:05

by David Collins

Subject: Deadlock scenario in regulator core

Hi Liam and Mark,

I was analyzing the mutex lock usage in drivers/regulator/core.c and found
at least one way to reach deadlock: regulator_enable is called for a
regulator at the same time that regulator_disable is called for that
regulator's supply. Consider this simple example. There are two
regulators: S1 and L2, as well as two consumers: A and B. They are
connected as follows:

S1 --> L2 --> B
|
|--> A

Assume that A has already called regulator_enable for S1 some time in the
past.

Consumer A thread execution:
regulator_disable(S1)
mutex_lock(S1)
_regulator_disable(S1)
_notifier_call_chain(S1)
mutex_lock(L2)

Consumer B thread execution:
regulator_enable(L2)
mutex_lock(L2)
_regulator_enable(L2)
mutex_lock(S1)

The locks for S1 and L2 are taken in opposite orders in the two threads;
therefore, it is possible to achieve deadlock. I am not sure about the
best way to resolve this situation. Is there a correctness requirement
that regulator_enable holds the child regulator's lock when it attempts to
enable the parent regulator? Likewise, is the lock around
_notifier_call_chain required?
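
For reference, here is a condensed sketch of the two code paths as I
read them in drivers/regulator/core.c (paraphrased, error handling
trimmed), showing where the two mutexes are taken in opposite orders:

/* enable path: the child's (L2's) lock is already held by the caller */
static int _regulator_enable(struct regulator_dev *rdev)
{
        int ret;

        if (rdev->use_count == 0 && rdev->supply) {
                mutex_lock(&rdev->supply->mutex);       /* parent (S1) */
                ret = _regulator_enable(rdev->supply);
                mutex_unlock(&rdev->supply->mutex);
                if (ret < 0)
                        return ret;
        }
        /* ... rest of the enable sequence ... */
        return 0;
}

/* disable path: the parent's (S1's) lock is already held by the caller */
static void _notifier_call_chain(struct regulator_dev *rdev,
                                 unsigned long event, void *data)
{
        struct regulator_dev *_rdev;

        blocking_notifier_call_chain(&rdev->notifier, event, NULL);

        /* now notify regulator we supply */
        list_for_each_entry(_rdev, &rdev->supply_list, slist) {
                mutex_lock(&_rdev->mutex);              /* child (L2) */
                _notifier_call_chain(_rdev, event, data);
                mutex_unlock(&_rdev->mutex);
        }
}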

Thanks,
David Collins

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


2011-03-22 22:31:47

by Mark Brown

Subject: Re: Deadlock scenario in regulator core

On Tue, Mar 22, 2011 at 03:02:01PM -0700, David Collins wrote:

> The locks for S1 and L2 are taken in opposite orders in the two threads;
> therefore, it is possible to achieve deadlock. I am not sure about the
> best way to resolve this situation. Is there a correctness requirement
> that regulator_enable holds the child regulator's lock when it attempts to
> enable the parent regulator? Likewise, is the lock around
> _notifier_call_chain required?

No need to hold the child lock; when we take the reference on the supply
we own that reference. It's just that the systems which need to use
daisychained regulators (mostly a DCDC to power LDOs for better
efficiency) are moderately rare and tend not to bother representing the
supply relationship, as the parent regulator tends to be always on.

In fact it looks rather like the refcounting for supplies is wrong
anyway, regulator_disable() unconditionally drops references to supplies
but regulator_enable() only enables them if the refcount was previously
zero, and it appears we don't clean up supplies after failed enables.
The below patch (which I've not even compile tested) should resolve both
issues, could you give it a spin and let me know if it works for you
please?

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 3ffc697..0a7fbde 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1284,19 +1284,6 @@ static int _regulator_enable(struct regulator_dev *rdev)
 {
         int ret, delay;
 
-        if (rdev->use_count == 0) {
-                /* do we need to enable the supply regulator first */
-                if (rdev->supply) {
-                        mutex_lock(&rdev->supply->mutex);
-                        ret = _regulator_enable(rdev->supply);
-                        mutex_unlock(&rdev->supply->mutex);
-                        if (ret < 0) {
-                                rdev_err(rdev, "failed to enable: %d\n", ret);
-                                return ret;
-                        }
-                }
-        }
-
         /* check voltage and requested load before enabling */
         if (rdev->constraints &&
             (rdev->constraints->valid_ops_mask & REGULATOR_CHANGE_DRMS))
@@ -1370,10 +1357,27 @@ int regulator_enable(struct regulator *regulator)
 {
         struct regulator_dev *rdev = regulator->rdev;
         int ret = 0;
+        int disret;
+
+        if (rdev->supply) {
+                ret = regulator_enable(rdev->supply);
+                if (ret < 0) {
+                        rdev_err(rdev, "failed to enable supply: %d\n", ret);
+                        return ret;
+                }
+        }
 
         mutex_lock(&rdev->mutex);
         ret = _regulator_enable(rdev);
         mutex_unlock(&rdev->mutex);
+
+        if (ret != 0 && rdev->supply) {
+                disret = regulator_disable(rdev->supply);
+                if (disret < 0)
+                        rdev_err(rdev, "failed to disable supply: %d\n",
+                                 disret);
+        }
+
         return ret;
 }
 EXPORT_SYMBOL_GPL(regulator_enable);

2011-03-22 22:37:07

by Steven Rostedt

Subject: Re: Deadlock scenario in regulator core

On Tue, Mar 22, 2011 at 03:02:01PM -0700, David Collins wrote:
> Assume that A has already called regulator_enable for S1 some time in the
> past.
>
> Consumer A thread execution:
> regulator_disable(S1)
> mutex_lock(S1)
> _regulator_disable(S1)
> _notifier_call_chain(S1)
> mutex_lock(L2)
>
> Consumer B thread execution:
> regulator_enable(L2)
> mutex_lock(L2)
> _regulator_enable(L2)
> mutex_lock(S1)
>
> The locks for S1 and L2 are taken in opposite orders in the two threads;
> therefore, it is possible to achieve deadlock. I am not sure about the
> best way to resolve this situation. Is there a correctness requirement
> that regulator_enable holds the child regulator's lock when it attempts to
> enable the parent regulator? Likewise, is the lock around
> _notifier_call_chain required?

I'm curious, if you had enabled lockdep, do you get a warning? If not,
why not?

Thanks,

-- Steve

2011-03-22 22:43:07

by Mark Brown

Subject: Re: Deadlock scenario in regulator core

On Tue, Mar 22, 2011 at 03:02:01PM -0700, David Collins wrote:

> enable the parent regulator? Likewise, is the lock around
> _notifier_call_chain required?

For this: we don't want the chain changing under us and the chain is
protected by the mutex.

2011-03-22 23:09:03

by David Collins

Subject: Re: Deadlock scenario in regulator core

On 03/22/2011 03:37 PM, Steven Rostedt wrote:
> On Tue, Mar 22, 2011 at 03:02:01PM -0700, David Collins wrote:
>> Assume that A has already called regulator_enable for S1 some time in the
>> past.
>>
>> Consumer A thread execution:
>> regulator_disable(S1)
>> mutex_lock(S1)
>> _regulator_disable(S1)
>> _notifier_call_chain(S1)
>> mutex_lock(L2)
>>
>> Consumer B thread execution:
>> regulator_enable(L2)
>> mutex_lock(L2)
>> _regulator_enable(L2)
>> mutex_lock(S1)
>>
>> The locks for S1 and L2 are taken in opposite orders in the two threads;
>> therefore, it is possible to achieve deadlock. I am not sure about the
>> best way to resolve this situation. Is there a correctness requirement
>> that regulator_enable holds the child regulator's lock when it attempts to
>> enable the parent regulator? Likewise, is the lock around
>> _notifier_call_chain required?
>
> I'm curious, if you had enabled lockdep, do you get a warning? If not,
> why not?
>
> Thanks,
>
> -- Steve

I have tried running with lockdep enabled. It does not produce a warning
about possible deadlock from locks being taken in opposite orders in two
threads. I assume that this is because it can only keep track of locks
taken in the current stack backtrace.

It does produce a warning for regulator_disable by itself though on a
regulator with a non-empty supply_list:

=============================================
[ INFO: possible recursive locking detected ]
2.6.38-rc7+ #231
---------------------------------------------
sh/25 is trying to acquire lock:
(&rdev->mutex){+.+...}, at: [<c0137ae4>] _notifier_call_chain+0x28/0x6c

but task is already holding lock:
(&rdev->mutex){+.+...}, at: [<c0138410>] regulator_disable+0x24/0x74

The locks that it is noting are different; one is for the parent regulator
and the other is for the child regulator. Any thoughts?

Thanks,
David

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2011-03-22 23:20:04

by Steven Rostedt

Subject: Re: Deadlock scenario in regulator core

[ Added Peter and Ingo on Cc ]

On Tue, 2011-03-22 at 16:08 -0700, David Collins wrote:
> On 03/22/2011 03:37 PM, Steven Rostedt wrote:
> > On Tue, Mar 22, 2011 at 03:02:01PM -0700, David Collins wrote:
> >> Assume that A has already called regulator_enable for S1 some time in the
> >> past.
> >>
> >> Consumer A thread execution:
> >> regulator_disable(S1)
> >> mutex_lock(S1)
> >> _regulator_disable(S1)
> >> _notifier_call_chain(S1)
> >> mutex_lock(L2)
> >>
> >> Consumer B thread execution:
> >> regulator_enable(L2)
> >> mutex_lock(L2)
> >> _regulator_enable(L2)
> >> mutex_lock(S1)
> >>
> >> The locks for S1 and L2 are taken in opposite orders in the two threads;
> >> therefore, it is possible to achieve deadlock. I am not sure about the
> >> best way to resolve this situation. Is there a correctness requirement
> >> that regulator_enable holds the child regulator's lock when it attempts to
> >> enable the parent regulator? Likewise, is the lock around
> >> _notifier_call_chain required?
> >
> > I'm curious, if you had enabled lockdep, do you get a warning? If not,
> > why not?
> >
> > Thanks,
> >
> > -- Steve
>
> I have tried running with lockdep enabled. It does not produce a warning
> about possible deadlock from locks being taken in opposite orders in two
> threads. I assume that this is because it can only keep track of locks
> taken in the current stack backtrace.
>
> It does produce a warning for regulator_disable by itself though on a
> regulator with a non-empty supply_list:
>
> =============================================
> [ INFO: possible recursive locking detected ]
> 2.6.38-rc7+ #231
> ---------------------------------------------
> sh/25 is trying to acquire lock:
> (&rdev->mutex){+.+...}, at: [<c0137ae4>] _notifier_call_chain+0x28/0x6c
>
> but task is already holding lock:
> (&rdev->mutex){+.+...}, at: [<c0138410>] regulator_disable+0x24/0x74
>
> The locks that it is noting are different; one is for the parent regulator
> and the other is for the child regulator. Any thoughts?

Looks to me that the mutex_lock() in _notifier_call_chain needs to be a
mutex_lock_nested().

The "_nested()" versions are when you have the same type of mutex taken
but belonging to two different instances. Like you have here:

        blocking_notifier_call_chain(&rdev->notifier, event, NULL);

        /* now notify regulator we supply */
        list_for_each_entry(_rdev, &rdev->supply_list, slist) {
                mutex_lock(&_rdev->mutex);
                _notifier_call_chain(_rdev, event, data);
                mutex_unlock(&_rdev->mutex);
        }

The rdev->mutex is already held, so we don't need to take it to call the
blocking_notifier_call_chain() with the rdev->notifier. But then the
list of rdev's in the rdev->supply_list are different instances but we
are still taking the same type of lock. lockdep treats all instances of
the same lock the same, so to lockdep this looks like a deadlock. To
teach lockdep that this is a different instance, simply use
mutex_lock_nested() instead.
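
For example (untested, just to illustrate the annotation):

        /* now notify regulator we supply */
        list_for_each_entry(_rdev, &rdev->supply_list, slist) {
                /* child lock: distinct instance, nested under rdev->mutex */
                mutex_lock_nested(&_rdev->mutex, SINGLE_DEPTH_NESTING);
                _notifier_call_chain(_rdev, event, data);
                mutex_unlock(&_rdev->mutex);
        }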

-- Steve

2011-03-22 23:30:47

by David Collins

Subject: Re: Deadlock scenario in regulator core

On 03/22/2011 03:31 PM, Mark Brown wrote:
> No need to hold the child lock, when we take the reference on the supply
> we own the reference. It's just that the systems which need to use
> daisychained regulators (mostly a DCDC to power LDOs for better
> efficiency) are moderately rare and tend to not bother representing the
> supply relationship as the parent regulator tends to be always on.
>
> In fact it looks rather like the refcounting for supplies is wrong
> anyway, regulator_disable() unconditionally drops references to supplies
> but regulator_enable() only enables them if the refcount was previously
> zero, and it appears we don't clean up supplies after failed enables.
> The below patch (which I've not even compile tested) should resolve both
> issues, could you give it a spin and let me know if it works for you
> please?
>
> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index 3ffc697..0a7fbde 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -1284,19 +1284,6 @@ static int _regulator_enable(struct regulator_dev *rdev)
>  {
>          int ret, delay;
>  
> -        if (rdev->use_count == 0) {
> -                /* do we need to enable the supply regulator first */
> -                if (rdev->supply) {
> -                        mutex_lock(&rdev->supply->mutex);
> -                        ret = _regulator_enable(rdev->supply);
> -                        mutex_unlock(&rdev->supply->mutex);
> -                        if (ret < 0) {
> -                                rdev_err(rdev, "failed to enable: %d\n", ret);
> -                                return ret;
> -                        }
> -                }
> -        }
> -
>          /* check voltage and requested load before enabling */
>          if (rdev->constraints &&
>              (rdev->constraints->valid_ops_mask & REGULATOR_CHANGE_DRMS))
> @@ -1370,10 +1357,27 @@ int regulator_enable(struct regulator *regulator)
>  {
>          struct regulator_dev *rdev = regulator->rdev;
>          int ret = 0;
> +        int disret;
> +
> +        if (rdev->supply) {
> +                ret = regulator_enable(rdev->supply);

This should be _regulator_enable instead of regulator_enable. There will
also need to be a mutex lock and unlock around it for rdev->supply->mutex.
I think that it needs to iterate through all supplies in the chain
similar to how it is done in regulator_disable.

> +                if (ret < 0) {
> +                        rdev_err(rdev, "failed to enable supply: %d\n", ret);
> +                        return ret;
> +                }
> +        }
>  
>          mutex_lock(&rdev->mutex);
>          ret = _regulator_enable(rdev);
>          mutex_unlock(&rdev->mutex);
> +
> +        if (ret != 0 && rdev->supply) {
> +                disret = regulator_disable(rdev->supply);

This should be _regulator_disable instead of regulator_disable. There
will also need to be a mutex lock and unlock around it for
rdev->supply->mutex. Additionally, a while loop is needed to disable all
supplies in the chain (same as in regulator_disable).


> +                if (disret < 0)
> +                        rdev_err(rdev, "failed to disable supply: %d\n",
> +                                 disret);
> +        }
> +
>          return ret;
>  }
>  EXPORT_SYMBOL_GPL(regulator_enable);

This patch doesn't compile. A few changes are needed.

Thanks,
David

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2011-03-22 23:41:31

by David Collins

Subject: Re: Deadlock scenario in regulator core

On 03/22/2011 04:19 PM, Steven Rostedt wrote:
>
> Looks to me that the mutex_lock() in _notifier_call_chain needs to be a
> mutex_lock_nested().
>
> The "_nested()" versions are when you have the same type of mutex taken
> but belonging to two different instances. Like you have here:
>
>         blocking_notifier_call_chain(&rdev->notifier, event, NULL);
> 
>         /* now notify regulator we supply */
>         list_for_each_entry(_rdev, &rdev->supply_list, slist) {
>                 mutex_lock(&_rdev->mutex);
>                 _notifier_call_chain(_rdev, event, data);
>                 mutex_unlock(&_rdev->mutex);
>         }
>
> The rdev->mutex is already held, so we don't need to take it to call the
> blocking_notifier_call_chain() with the rdev->notifier. But then the
> list of rdev's in the rdev->supply_list are different instances but we
> are still taking the same type of lock. lockdep treats all instances of
> the same lock the same, so to lockdep this looks like a deadlock. To
> teach lockdep that this is a different instance, simply use
> mutex_lock_nested() instead.
>
> -- Steve
>
>

There seem to be very few uses of mutex_lock_nested() in the kernel. Most
of them use subclass = SINGLE_DEPTH_NESTING. Would this be sufficient for
usage in the regulator core in _notifier_call_chain (and perhaps other
places) or should some other subclass be used?

Thanks,
David

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2011-03-22 23:45:35

by Mark Brown

Subject: Re: Deadlock scenario in regulator core

On Tue, Mar 22, 2011 at 04:30:45PM -0700, David Collins wrote:
> On 03/22/2011 03:31 PM, Mark Brown wrote:

> > + int disret;
> > +
> > + if (rdev->supply) {
> > + ret = regulator_enable(rdev->supply);

> This should be _regulator_enable instead of regulator_enable. There will

Oh, feh. The supply stuff would generally be easier if supplies were
consumers as you'd expect; the special casing just makes things more
fragile. It's not clear to me that the best approach here isn't just to
make the supplies have regular consumer structs so we can do things like
this.
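
Roughly the shape I have in mind, as an untested sketch (the
supply_name field and the lookup here are invented for illustration):
at registration time the core would take a normal consumer handle on
the supply, so enabling and disabling the supply would go through the
ordinary consumer API with its usual locking:

        /* hypothetical: rdev->supply becomes a struct regulator * */
        if (rdev->supply_name) {
                rdev->supply = regulator_get(&rdev->dev, rdev->supply_name);
                if (IS_ERR(rdev->supply))
                        return PTR_ERR(rdev->supply);
        }

        /* regulator_enable(rdev->supply) then needs no special-cased
         * supply locking in the enable path */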

> also need to be a mutex lock and unlock around it for rdev->supply->mutex.

Unless we implement the above change - you're assuming that the change
to the unlocked enable is the best one.

> I think that it needs to iterate through all supplies in the chain
> similar to how it is done in regulator_disable.

The current code (if it had compiled) would deal with that through
recursion.

> This should be _regulator_disable instead of regulator_disable. There
> will also need to be a mutex lock and unlock around it for
> rdev->supply->mutex. Additionally, a while loop is needed to disable all
> supplies in the chain (same as in regulator_disable).

Again, no loop needed with the code as written as it's recursive.

2011-03-23 00:00:51

by Mark Brown

Subject: Re: Deadlock scenario in regulator core

On Tue, Mar 22, 2011 at 07:19:58PM -0400, Steven Rostedt wrote:

> Looks to me that the mutex_lock() in _notifier_call_chain needs to be a
> mutex_lock_nested().

> The "_nested()" versions are when you have the same type of mutex taken
> but belonging to two different instances. Like you have here:

What's a mutex type? I have to say this is the first time I've heard of
mutex types and the documentation in mutex.c and mutex-design.txt isn't
precisely verbose on what mutex_lock_nested() is for or how one would
pick subclass.

2011-03-23 00:07:41

by Steven Rostedt

Subject: Re: Deadlock scenario in regulator core

On Tue, 2011-03-22 at 16:41 -0700, David Collins wrote:

> There seem to be very few uses of mutex_lock_nested() in the kernel. Most
> of them use subclass = SINGLE_DEPTH_NESTING. Would this be sufficient for
> usage in the regulator core in _notifier_call_chain (and perhaps other
> places) or should some other subclass be used?

Note, I do not know this code well enough to say. I'm assuming that an
rdevA on a rdevB->supply_list never has rdevB on its own
rdevA->supply_list.

If this is the case, and that you only ever have a lock nesting of one,
then sure, use the SINGLE_DEPTH_NESTING.

Peter or Ingo could correct me if I'm wrong.

-- Steve


2011-03-23 00:11:18

by Mark Brown

Subject: Re: Deadlock scenario in regulator core

On Tue, Mar 22, 2011 at 08:07:36PM -0400, Steven Rostedt wrote:

> Note, I do not know this code well enough to say. I'm assuming that an
> rdevA on a rdevB->supply_list never has rdevB on its own
> rdevA->supply_list.

Correct.

> If this is the case, and that you only ever have a lock nesting of one,
> then sure, use the SINGLE_DEPTH_NESTING.

It'd be good if someone could update the documentation in the mutex code
so the usage is clear here - I don't want to see the locking become
any more complicated, especially not for relatively infrequently used
things like supplies.

Though we may be able to deal with this by simplifying the
implementation of supplies anyway.

2011-03-23 00:38:58

by Steven Rostedt

Subject: Re: Deadlock scenario in regulator core

On Wed, 2011-03-23 at 00:01 +0000, Mark Brown wrote:
> On Tue, Mar 22, 2011 at 07:19:58PM -0400, Steven Rostedt wrote:
>
> > Looks to me that the mutex_lock() in _notifier_call_chain needs to be a
> > mutex_lock_nested().
>
> > The "_nested()" versions are when you have the same type of mutex taken
> > but belonging to two different instances. Like you have here:
>
> What's a mutex type? I have to say this is the first time I've heard of
> mutex types and the documentation in mutex.c and mutex-design.txt isn't
> precisely verbose on what mutex_lock_nested() is for or how one would
> pick subclass.

Sorry, I said "mutex type" as a synonym for "lock class". A lock class is
pretty much how a lock is defined.

struct foo {
        struct mutex m;
};

struct foo *func(void)
{
        struct foo *bar;

        bar = kzalloc(sizeof(*bar), GFP_KERNEL);
        mutex_init(&bar->m);
        return bar;
}


bar is an instance of lock class struct foo.m. If you have:

a = func();
b = func();

Both a->m and b->m are an instance of struct foo.m lock class.

There's better documentation about this in the lockdep-design.txt.

-- Steve

2011-03-23 10:42:22

by Mark Brown

Subject: Re: Deadlock scenario in regulator core

On Tue, Mar 22, 2011 at 08:38:54PM -0400, Steven Rostedt wrote:
> On Wed, 2011-03-23 at 00:01 +0000, Mark Brown wrote:

> > What's a mutex type? I have to say this is the first time I've heard of
> > mutex types and the documentation in mutex.c and mutex-design.txt isn't
> > precisely verbose on what mutex_lock_nested() is for or how one would
> > pick subclass.

> Sorry, I said "mutex type" as a synonym to "lock class". A lock class is
> pretty much how a lock is defined.

OK, lock class can be grepped which was the main thing, thanks.

> There's better documentation about this in the lockdep-design.txt.

That's helpful, though it doesn't really say anything about how one
picks a subclass?

2011-03-25 10:53:57

by Peter Zijlstra

Subject: Re: Deadlock scenario in regulator core

On Tue, 2011-03-22 at 20:07 -0400, Steven Rostedt wrote:
> On Tue, 2011-03-22 at 16:41 -0700, David Collins wrote:
>
> > There seem to be very few uses of mutex_lock_nested() in the kernel. Most
> > of them use subclass = SINGLE_DEPTH_NESTING. Would this be sufficient for
> > usage in the regulator core in _notifier_call_chain (and perhaps other
> > places) or should some other subclass be used?
>
> Note, I do not know this code well enough to say. I'm assuming that an
> rdevA on a rdevB->supply_list never has rdevB on its own
> rdevA->supply_list.
>
> If this is the case, and that you only ever have a lock nesting of one,
> then sure, use the SINGLE_DEPTH_NESTING.
>
> Peter or Ingo could correct me if I'm wrong.

Right, so be aware that you can annotate an actual deadlock away with
mutex_lock_nested(), so use with care. The thing to avoid is something
like:

mutex_lock(instance1);
mutex_lock_nested(instance2, SINGLE_DEPTH_NESTING);

vs

mutex_lock(instance2);
mutex_lock_nested(instance1, SINGLE_DEPTH_NESTING);

Lockdep will not complain anymore, but it will still deadlock.

2011-03-25 10:57:13

by Peter Zijlstra

Subject: Re: Deadlock scenario in regulator core

On Wed, 2011-03-23 at 10:42 +0000, Mark Brown wrote:
>
> That's helpful, though it doesn't really say anything about how one
> picks a subclass?

With care ;-)

It's most useful for recursive use of the same lock class; typically we
only have two locks held and we can use the SINGLE_DEPTH_NESTING thing.
A few sites use subclasses for larger things; some are OK, some are
dubious (ext4 comes to mind).

And as I said in the other email, be careful with lockdep annotations:
you can actually annotate real deadlocks away.