2007-12-07 23:02:22

by Remy Bohmer

[permalink] [raw]
Subject: lockdep problem conversion semaphore->mutex (dev->sem)

Hello Peter,

> > What specifically is wrong with dev->sem ?
>
> Nothing really, other than that they use semaphores to avoid lockdep :-/
>
> I think I know how to annotate this, after Alan Stern explained all the
> use cases, but I haven't come around to implementing it. Hope to do that
> soonish.

I was looking for an easy semaphore I could convert to a mutex, and I
ran into one that was widely spread and interesting, and which seemed
quite doable at first sight.
So, I started working on it, but was forgotten this discussion, (until
Daniel made me remember it this afternoon). So, I (stupid me ;-) )
tried to convert dev->sem...

After doing the monkey part of the conversion I can boot the kernel
completely on X86 and ARM, and everything works fine, except after
enabling lockdep, lockdep starts complaining...

Is this the problem you were pointing at?
=======================================================
Setting up standard PCI resources
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S1 S3 S4 S5)
ACPI: Using IOAPIC for interrupt routing

=============================================
[ INFO: possible recursive locking detected ]
2.6.24-rc4 #5
---------------------------------------------
swapper/1 is trying to acquire lock:
(&dev->mut){--..}, at: [<c056760c>] __driver_attach+0x2c/0x5f

but task is already holding lock:
(&dev->mut){--..}, at: [<c05675fd>] __driver_attach+0x1d/0x5f

other info that might help us debug this:
1 lock held by swapper/1:
#0: (&dev->mut){--..}, at: [<c05675fd>] __driver_attach+0x1d/0x5f

stack backtrace:
Pid: 1, comm: swapper Not tainted 2.6.24-rc4 #5
[<c0448a25>] __lock_acquire+0x17b/0xb83
[<c0448491>] trace_hardirqs_on+0x11a/0x13d
[<c04497f9>] lock_acquire+0x5f/0x77
[<c056760c>] __driver_attach+0x2c/0x5f
[<c06210de>] mutex_lock_nested+0x105/0x26b
[<c056760c>] __driver_attach+0x2c/0x5f
[<c056760c>] __driver_attach+0x2c/0x5f
[<c05675e0>] __driver_attach+0x0/0x5f
[<c056760c>] __driver_attach+0x2c/0x5f
[<c0566ba9>] bus_for_each_dev+0x3a/0x5c
[<c05673ba>] driver_attach+0x16/0x18
[<c05675e0>] __driver_attach+0x0/0x5f
[<c0566ea0>] bus_add_driver+0x6d/0x19a
[<c0762613>] acpi_ec_init+0x33/0x51
[<c0749491>] kernel_init+0x148/0x2af
[<c0749349>] kernel_init+0x0/0x2af
[<c0749349>] kernel_init+0x0/0x2af
[<c0405bd7>] kernel_thread_helper+0x7/0x10
=======================
ACPI: PCI Root Bridge [PCI0] (0000:00)
Force enabled HPET at base address 0xfed00000
=======================================================

I tried debugging it, and I have not found a recursive mutex locking
so far, only locking of 2 different mutexes in a row prior to this
warning, which IMO should be valid.

What is your opinion?

BTW: I attached my patch for dev->sem as I have it now, that generates
this lockdep warning ( for if you want to look at it yourself also, so
you do not have to do the monkey part yourself anymore ;-)


Kind Regards,

Remy


Attachments:
(No filename) (2.87 kB)
patch_replace_sem_by_mutex_in_device_h (29.14 kB)
Download all attachments

2007-12-08 12:17:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: lockdep problem conversion semaphore->mutex (dev->sem)


On Sat, 2007-12-08 at 00:02 +0100, Remy Bohmer wrote:
> Hello Peter,
>
> > > What specifically is wrong with dev->sem ?
> >
> > Nothing really, other than that they use semaphores to avoid lockdep :-/
> >
> > I think I know how to annotate this, after Alan Stern explained all the
> > use cases, but I haven't come around to implementing it. Hope to do that
> > soonish.
>
> I was looking for an easy semaphore I could convert to a mutex, and I
> ran into one that was widely spread and interesting, and which seemed
> quite doable at first sight.
> So, I started working on it, but was forgotten this discussion, (until
> Daniel made me remember it this afternoon). So, I (stupid me ;-) )
> tried to convert dev->sem...
>
> After doing the monkey part of the conversion I can boot the kernel
> completely on X86 and ARM, and everything works fine, except after
> enabling lockdep, lockdep starts complaining...
>
> Is this the problem you were pointing at?

Yeah, one of the interesting nestings :-)

> I tried debugging it, and I have not found a recursive mutex locking
> so far, only locking of 2 different mutexes in a row prior to this
> warning, which IMO should be valid.
>
> What is your opinion?

Yeah, the locking is all valid afaics, its just that it needs some
interesting annotations to make lockdep see it that way.

> BTW: I attached my patch for dev->sem as I have it now, that generates
> this lockdep warning ( for if you want to look at it yourself also, so
> you do not have to do the monkey part yourself anymore ;-)

I have a similar patch floating around, but thanks anyway :-)

2007-12-08 17:07:10

by Daniel Walker

[permalink] [raw]
Subject: Re: lockdep problem conversion semaphore->mutex (dev->sem)

On Sat, 2007-12-08 at 13:16 +0100, Peter Zijlstra wrote:
> On Sat, 2007-12-08 at 00:02 +0100, Remy Bohmer wrote:
> > Hello Peter,
> >
> > > > What specifically is wrong with dev->sem ?
> > >
> > > Nothing really, other than that they use semaphores to avoid lockdep :-/
> > >
> > > I think I know how to annotate this, after Alan Stern explained all the
> > > use cases, but I haven't come around to implementing it. Hope to do that
> > > soonish.
> >
> > I was looking for an easy semaphore I could convert to a mutex, and I
> > ran into one that was widely spread and interesting, and which seemed
> > quite doable at first sight.
> > So, I started working on it, but was forgotten this discussion, (until
> > Daniel made me remember it this afternoon). So, I (stupid me ;-) )
> > tried to convert dev->sem...
> >
> > After doing the monkey part of the conversion I can boot the kernel
> > completely on X86 and ARM, and everything works fine, except after
> > enabling lockdep, lockdep starts complaining...
> >
> > Is this the problem you were pointing at?
>
> Yeah, one of the interesting nestings :-)

It must be the locking in __driver_attach(), taking dev->parent->sem
then taking dev->sem .. Assuming those are different structures, why
does lockdep trigger?

Daniel

2007-12-08 17:20:17

by Daniel Walker

[permalink] [raw]
Subject: Re: lockdep problem conversion semaphore->mutex (dev->sem)

On Sat, 2007-12-08 at 18:11 +0100, Peter Zijlstra wrote:

> > It must be the locking in __driver_attach(), taking dev->parent->sem
> > then taking dev->sem .. Assuming those are different structures, why
> > does lockdep trigger?
>
> They aren't different, parent is a struct device again.

It's different memory tho .. I wasn't sure how to term that .. The locks
are in two different memory location so it couldn't be recursive .. I'm
only asking for my own understanding .. I don't mind inspecting
potentially bad locking ..

Daniel

2007-12-08 17:37:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: lockdep problem conversion semaphore->mutex (dev->sem)


On Sat, 2007-12-08 at 09:06 -0800, Daniel Walker wrote:
> On Sat, 2007-12-08 at 18:11 +0100, Peter Zijlstra wrote:
>
> > > It must be the locking in __driver_attach(), taking dev->parent->sem
> > > then taking dev->sem .. Assuming those are different structures, why
> > > does lockdep trigger?
> >
> > They aren't different, parent is a struct device again.
>
> It's different memory tho .. I wasn't sure how to term that .. The locks
> are in two different memory location so it couldn't be recursive .. I'm
> only asking for my own understanding .. I don't mind inspecting
> potentially bad locking ..

Yeah, it are different lock instances, however by virtue of sharing the
same lock init site, they belong to the same lock class. Lockdep works
by tracking class dependancies, not instance dependancies.

By generalizing to classes it can detect locking errors before they
actually occur.

2007-12-08 17:44:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: lockdep problem conversion semaphore->mutex (dev->sem)


On Sat, 2007-12-08 at 08:53 -0800, Daniel Walker wrote:
> On Sat, 2007-12-08 at 13:16 +0100, Peter Zijlstra wrote:
> > On Sat, 2007-12-08 at 00:02 +0100, Remy Bohmer wrote:
> > > Hello Peter,
> > >
> > > > > What specifically is wrong with dev->sem ?
> > > >
> > > > Nothing really, other than that they use semaphores to avoid lockdep :-/
> > > >
> > > > I think I know how to annotate this, after Alan Stern explained all the
> > > > use cases, but I haven't come around to implementing it. Hope to do that
> > > > soonish.
> > >
> > > I was looking for an easy semaphore I could convert to a mutex, and I
> > > ran into one that was widely spread and interesting, and which seemed
> > > quite doable at first sight.
> > > So, I started working on it, but was forgotten this discussion, (until
> > > Daniel made me remember it this afternoon). So, I (stupid me ;-) )
> > > tried to convert dev->sem...
> > >
> > > After doing the monkey part of the conversion I can boot the kernel
> > > completely on X86 and ARM, and everything works fine, except after
> > > enabling lockdep, lockdep starts complaining...
> > >
> > > Is this the problem you were pointing at?
> >
> > Yeah, one of the interesting nestings :-)
>
> It must be the locking in __driver_attach(), taking dev->parent->sem
> then taking dev->sem .. Assuming those are different structures, why
> does lockdep trigger?

They aren't different, parent is a struct device again.

2007-12-08 19:52:18

by Remy Bohmer

[permalink] [raw]
Subject: Re: lockdep problem conversion semaphore->mutex (dev->sem)

Hello Peter and Daniel,

> Yeah, it are different lock instances, however by virtue of sharing the
> same lock init site, they belong to the same lock class. Lockdep works
> by tracking class dependancies, not instance dependancies.

The device and its parent both indeed have different
pointers/instances. I saw that during debugging yesterday, so I
already expected this was not really a bug, but a false positive of
lockdep.

> By generalizing to classes it can detect locking errors before they
> actually occur.

Basically that is a good thing...
But... now we do not transfer the dev->sem to a mutex, because lockdep
will start generating false positives, and thus we mask the lockdep
error, by not converting the dev->sem to what it really is...

This does give me a bad feeling about this... In short, we are
implementing workarounds in good code code to hide bugs in
debug-tooling... ?!

So, any suggestions on how to fix lockdep? Anyone some good hints
where I can start?
Is it that fundamental to lockdep that it cannot be fixed without
breaking it, or do we have to instrument the code that tells lockdep
to ignore this particular mutex?


Kind Regards,

Remy

2007-12-08 20:05:32

by Peter Zijlstra

[permalink] [raw]
Subject: Re: lockdep problem conversion semaphore->mutex (dev->sem)


On Sat, 2007-12-08 at 20:52 +0100, Remy Bohmer wrote:
> Hello Peter and Daniel,
>
> > Yeah, it are different lock instances, however by virtue of sharing the
> > same lock init site, they belong to the same lock class. Lockdep works
> > by tracking class dependancies, not instance dependancies.
>
> The device and its parent both indeed have different
> pointers/instances. I saw that during debugging yesterday, so I
> already expected this was not really a bug, but a false positive of
> lockdep.
>
> > By generalizing to classes it can detect locking errors before they
> > actually occur.
>
> Basically that is a good thing...
> But... now we do not transfer the dev->sem to a mutex, because lockdep
> will start generating false positives, and thus we mask the lockdep
> error, by not converting the dev->sem to what it really is...
>
> This does give me a bad feeling about this... In short, we are
> implementing workarounds in good code code to hide bugs in
> debug-tooling... ?!

Its not a bug, its a feature! :-)

> So, any suggestions on how to fix lockdep? Anyone some good hints
> where I can start?
> Is it that fundamental to lockdep that it cannot be fixed without
> breaking it, or do we have to instrument the code that tells lockdep
> to ignore this particular mutex?

I have a good idea on how to annotate this, but not yet the time to do
so. Aside from the nesting problems, the dev->sem has other problems as
well. Converting this is rather non-trivial.

I'd not put it as harshly as you put it though, lockdep makes some
assumptions which can lead to false positives - otoh these assumptions
often end up pointing out 'curious' locking coupled to 'curious' data
structures. And fixing up these things often leads to better and simpler
code.

The emphasis is on often, this is one of the cases where this is not so.
So while it does restain the creativity of locking it often ends up
being for the better.

And while you might not see it in-tree anymore, lockdep does help out
tremendously while developing new code. I'm sure that without it the
locking would be in a much worse state than it is today.

2007-12-08 20:08:45

by Ingo Molnar

[permalink] [raw]
Subject: Re: lockdep problem conversion semaphore->mutex (dev->sem)


* Remy Bohmer <[email protected]> wrote:

> But... now we do not transfer the dev->sem to a mutex, because lockdep
> will start generating false positives, and thus we mask the lockdep
> error, by not converting the dev->sem to what it really is...

no, you are wrong. If you want to do complex locking, you can still do
it: take a look at the dev->sem conversion patches from Peter which
correctly do this. Lockdep has all the facilities for that. (you just
dont know about them) Currently there are 4459 critical sections in the
kernel that use mutexes and which work fine with lockdep.

the general policy message here is: do not implement complex locking. It
hurts. It's hard to maintain. It's easy to mess up and leads to bugs.
Lockdep just makes that plain obvious.

Your mail and your frustration shows this general concept in happy
action: judging from your comments you have little clue about dev->sem
locking and its implications and you'd happily go along and pollute the
kernel with complex and hard to maintain nested locking constructs.

Lockdep prevents you from doing it mindlessly, it _forces_ you to first
understand the data structures, their locking and their relationship
with each other. Then you can implement complexity, if you still want
it.

That, Sir, is a Good Thing (tm).

Ingo

2007-12-08 20:33:27

by Remy Bohmer

[permalink] [raw]
Subject: Re: lockdep problem conversion semaphore->mutex (dev->sem)

Hello Peter,

> And while you might not see it in-tree anymore, lockdep does help out
> tremendously while developing new code. I'm sure that without it the
> locking would be in a much worse state than it is today.

I am not arguing that, I am also convinced it has done a good job.

> I have a good idea on how to annotate this, but not yet the time to do
> so.

Sounds good...

> Aside from the nesting problems, the dev->sem has other problems as
> well. Converting this is rather non-trivial.

Which problems? I did not see any special things, it looked rather
straight forward. What have I overlooked?

> I'd not put it as harshly as you put it though, lockdep makes some
> assumptions which can lead to false positives -

By putting it this black and white, it usually helps to get all the
opinions clear ;-)
(By staying in the middle, everybody usually tend to agree ;-)

> otoh these assumptions
> often end up pointing out 'curious' locking coupled to 'curious' data
> structures. And fixing up these things often leads to better and simpler
> code.
> The emphasis is on often, this is one of the cases where this is not so.
> So while it does restain the creativity of locking it often ends up
> being for the better.

Ack.

Kind Regards,

Remy

2007-12-08 20:48:25

by Remy Bohmer

[permalink] [raw]
Subject: Re: lockdep problem conversion semaphore->mutex (dev->sem)

Hello Ingo,

> no, you are wrong. If you want to do complex locking, you can still do
> it: take a look at the dev->sem conversion patches from Peter which
> correctly do this. Lockdep has all the facilities for that.
> (you just dont know about them)

Ok.

> the general policy message here is: do not implement complex locking. It
> hurts. It's hard to maintain. It's easy to mess up and leads to bugs.

Agree

> Lockdep just makes that plain obvious.
> Your mail and your frustration shows this general concept in happy

Please, do not see it as frustration, because it isn't...
I just want to understand it better.

> action: judging from your comments you have little clue about dev->sem
> locking and its implications and you'd happily go along and pollute the
> kernel with complex and hard to maintain nested locking constructs.

Now, you assume that i would _like_ complex locking. This is not true.

I just want to understand what was so wrong with dev->sem, I even read
in the discussions before about dev->sem, that it still was on the old
semaphore interface to get around lockdep issues, and _that_ is wrong.
That is bug-hiding from either the code or the tool. I just wanted to
understand if this was a lockdep bug, or a real code bug.

> Lockdep prevents you from doing it mindlessly, it _forces_ you to first
> understand the data structures, their locking and their relationship
> with each other. Then you can implement complexity, if you still want
> it.
>
> That, Sir, is a Good Thing (tm).

Completely agree.


Remy

2007-12-08 20:51:18

by Peter Zijlstra

[permalink] [raw]
Subject: Re: lockdep problem conversion semaphore->mutex (dev->sem)


On Sat, 2007-12-08 at 21:33 +0100, Remy Bohmer wrote:

> Which problems? I did not see any special things, it looked rather
> straight forward. What have I overlooked?

On suspend it locks the whole device tree, this means it has 'unbounded'
nesting and holds an 'unbounded' number of locks. Neither things are
easy to annotate (remember that mutex_lock_nested can handle up to 8
nestings and current->held_locks has a max of 30).

In fact, converting this will be the hardest part, it would require
reworking the locking and introduction of a hard limit on the device
tree depth - this might upset some people, but I suspect that 16 or 24
should be deep enough for pretty much anything. Of course, if people
prove me wrong, I'll have to reconsider. The up-side of the locking
scheme I'm thinking of will be that locking the whole tree will only
take 'depth' number of opterations vs the total number of tree elements.


2007-12-08 20:56:07

by Remy Bohmer

[permalink] [raw]
Subject: Re: lockdep problem conversion semaphore->mutex (dev->sem)

Peter,

Thanks for this clear answer.

Remy

2007/12/8, Peter Zijlstra <[email protected]>:
>
> On Sat, 2007-12-08 at 21:33 +0100, Remy Bohmer wrote:
>
> > Which problems? I did not see any special things, it looked rather
> > straight forward. What have I overlooked?
>
> On suspend it locks the whole device tree, this means it has 'unbounded'
> nesting and holds an 'unbounded' number of locks. Neither things are
> easy to annotate (remember that mutex_lock_nested can handle up to 8
> nestings and current->held_locks has a max of 30).
>
> In fact, converting this will be the hardest part, it would require
> reworking the locking and introduction of a hard limit on the device
> tree depth - this might upset some people, but I suspect that 16 or 24
> should be deep enough for pretty much anything. Of course, if people
> prove me wrong, I'll have to reconsider. The up-side of the locking
> scheme I'm thinking of will be that locking the whole tree will only
> take 'depth' number of opterations vs the total number of tree elements.
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2007-12-08 22:56:33

by Jarek Poplawski

[permalink] [raw]
Subject: Re: lockdep problem conversion semaphore->mutex (dev->sem)

Peter Zijlstra wrote, On 12/08/2007 09:50 PM:

> On Sat, 2007-12-08 at 21:33 +0100, Remy Bohmer wrote:
>
>> Which problems? I did not see any special things, it looked rather
>> straight forward. What have I overlooked?
>
> On suspend it locks the whole device tree, this means it has 'unbounded'
> nesting and holds an 'unbounded' number of locks. Neither things are
> easy to annotate (remember that mutex_lock_nested can handle up to 8
> nestings and current->held_locks has a max of 30).
>
> In fact, converting this will be the hardest part, it would require
> reworking the locking and introduction of a hard limit on the device
> tree depth - this might upset some people, but I suspect that 16 or 24
> should be deep enough for pretty much anything. Of course, if people
> prove me wrong, I'll have to reconsider. The up-side of the locking
> scheme I'm thinking of will be that locking the whole tree will only
> take 'depth' number of opterations vs the total number of tree elements.

Of course, I don't know the problem enough, and would be glad if somebody
give me a hint, but I wonder why so deep nesting with lockdep's modification
is necessary here? Does buses have parent buses and so on x8? Why isn't
here considered creating of different lockdep classes according to types
of buses and devices "the usual way"? This way seems to be quite easy in
later debugging.

Regards,
Jarek P.