2009-07-07 15:25:42

by Joao Correia

Subject: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

(Applies to current Linus tree, as of 2.6.31-rc2)

As it stands now, the limit is too low and is being hit by false
positives. Increasing its value will allow for more room to work with.

This was suggested by Ingo Molnar
(http://article.gmane.org/gmane.linux.kernel/852005) but never
submitted as a patch, to the best of my knowledge.


Signed-off-by: Joao Correia <[email protected]>

---
kernel/lockdep_internals.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/lockdep_internals.h b/kernel/lockdep_internals.h
index 699a2ac..93af1f1 100644
--- a/kernel/lockdep_internals.h
+++ b/kernel/lockdep_internals.h
@@ -65,7 +65,7 @@ enum {
  * Stack-trace: tightly packed array of stack backtrace
  * addresses. Protected by the hash_lock.
  */
-#define MAX_STACK_TRACE_ENTRIES 262144UL
+#define MAX_STACK_TRACE_ENTRIES 1048576UL

extern struct list_head all_lock_classes;
extern struct lock_chain lock_chains[];
---


2009-07-07 15:34:17

by Peter Zijlstra

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Tue, 2009-07-07 at 16:25 +0100, Joao Correia wrote:
> (Applies to current Linus tree, as of 2.6.31-rc2)
>
> As it stands now, the limit is too low and is being hit by false
> positives. Increasing its value will allow for more room to work with.
>
> This was suggested by Ingo Molnar
> (http://article.gmane.org/gmane.linux.kernel/852005) but never
> submitted as a patch, to the best of my knowledge.

Right, we found a bug in the dma-debug code that generated tons of
classes where only 1 was needed, which in turn generated tons of chains
and stack entries.

But that got merged, so are you still seeing more of this?

2009-07-07 15:50:38

by Joao Correia

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Tue, Jul 7, 2009 at 4:44 PM, Peter Zijlstra<[email protected]> wrote:
> On Tue, 2009-07-07 at 16:38 +0100, Joao Correia wrote:
>> On Tue, Jul 7, 2009 at 4:33 PM, Peter Zijlstra<[email protected]> wrote:
>> > On Tue, 2009-07-07 at 16:25 +0100, Joao Correia wrote:
>> >> (Applies to current Linus tree, as of 2.6.31-rc2)
>> >>
>> >> As it stands now, the limit is too low and is being hit by false
>> >> positives. Increasing its value will allow for more room to work with.
>> >>
>> >> This was suggested by Ingo Molnar
>> >> (http://article.gmane.org/gmane.linux.kernel/852005) but never
>> >> submitted as a patch, to the best of my knowledge.
>> >
>> > Right, we found a bug in the dma-debug code that generated tons of
>> > classes where only 1 was needed, which in turn generated tons of chains
>> > and stack entries.
>> >
>> > But that got merged, so are you still seeing more of this?
>
>> Yes. Anything 2.6.31 forward triggers this immediately during init
>> process, at random places.
>
> Not on my machines it doesn't... so I suspect it's something weird in
> your .config or maybe due to some hardware you have that I don't, which
> triggers different drivers or somesuch.
>
>
>

I am not the only one reporting this, and it happens, for example,
with a stock .config from a Fedora 11 install.

It may, of course, be a funny driver interaction, yes, but other than
stripping the box piece by piece, how would one go about debugging
this otherwise?

Joao Correia

2009-07-07 15:55:19

by Peter Zijlstra

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Tue, 2009-07-07 at 16:50 +0100, Joao Correia wrote:

> >> Yes. Anything 2.6.31 forward triggers this immediately during init
> >> process, at random places.
> >
> > Not on my machines it doesn't... so I suspect it's something weird in
> > your .config or maybe due to some hardware you have that I don't, which
> > triggers different drivers or somesuch.
>
> I am not the only one reporting this, and it happens, for example,
> with a stock .config from a Fedora 11 install.
>
> It may, of course, be a funny driver interaction, yes, but other than
> stripping the box piece by piece, how would one go about debugging
> this otherwise?

One thing to do is stare (or share) at the output
of /proc/lockdep_chains and see if there are some particularly large
chains in there, or many of the same name or something.

/proc/lockdep_stats might also be interesting, mine reads like:

[root@opteron ~]# cat /proc/lockdep_stats
lock-classes: 641 [max: 8191]
direct dependencies: 3794 [max: 16384]
indirect dependencies: 7557
all direct dependencies: 73254
dependency chains: 3716 [max: 32768]
dependency chain hlocks: 10167 [max: 163840]
in-hardirq chains: 21
in-softirq chains: 353
in-process chains: 3342
stack-trace entries: 91035 [max: 262144]
combined max dependencies: 26035284
hardirq-safe locks: 28
hardirq-unsafe locks: 460
softirq-safe locks: 114
softirq-unsafe locks: 373
irq-safe locks: 123
irq-unsafe locks: 460
hardirq-read-safe locks: 0
hardirq-read-unsafe locks: 45
softirq-read-safe locks: 8
softirq-read-unsafe locks: 39
irq-read-safe locks: 8
irq-read-unsafe locks: 45
uncategorized locks: 106
unused locks: 1
max locking depth: 14
max recursion depth: 10
debug_locks: 1

2009-07-07 15:59:49

by Joao Correia

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Tue, Jul 7, 2009 at 4:55 PM, Peter Zijlstra<[email protected]> wrote:
> On Tue, 2009-07-07 at 16:50 +0100, Joao Correia wrote:
>
>> >> Yes. Anything 2.6.31 forward triggers this immediately during init
>> >> process, at random places.
>> >
>> > Not on my machines it doesn't... so I suspect it's something weird in
>> > your .config or maybe due to some hardware you have that I don't, which
>> > triggers different drivers or somesuch.
>>
>> I am not the only one reporting this, and it happens, for example,
>> with a stock .config from a Fedora 11 install.
>>
>> It may, of course, be a funny driver interaction, yes, but other than
>> stripping the box piece by piece, how would one go about debugging
>> this otherwise?
>
> One thing to do is stare (or share) at the output
> of /proc/lockdep_chains and see if there are some particularly large
> chains in there, or many of the same name or something.
>
> /proc/lockdep_stats might also be interesting, mine reads like:
>
> [root@opteron ~]# cat /proc/lockdep_stats
> lock-classes: 641 [max: 8191]
> direct dependencies: 3794 [max: 16384]
> indirect dependencies: 7557
> all direct dependencies: 73254
> dependency chains: 3716 [max: 32768]
> dependency chain hlocks: 10167 [max: 163840]
> in-hardirq chains: 21
> in-softirq chains: 353
> in-process chains: 3342
> stack-trace entries: 91035 [max: 262144]
> combined max dependencies: 26035284
> hardirq-safe locks: 28
> hardirq-unsafe locks: 460
> softirq-safe locks: 114
> softirq-unsafe locks: 373
> irq-safe locks: 123
> irq-unsafe locks: 460
> hardirq-read-safe locks: 0
> hardirq-read-unsafe locks: 45
> softirq-read-safe locks: 8
> softirq-read-unsafe locks: 39
> irq-read-safe locks: 8
> irq-read-unsafe locks: 45
> uncategorized locks: 106
> unused locks: 1
> max locking depth: 14
> max recursion depth: 10
> debug_locks: 1
>
>
>


I definitely have much higher numbers than those:

[root@hightech git]# cat /proc/lockdep_stats
lock-classes: 1793 [max: 8191]
direct dependencies: 12032 [max: 16384]
indirect dependencies: 126642
all direct dependencies: 1310382
dependency chains: 28387 [max: 65536]
dependency chain hlocks: 100117 [max: 327680]
in-hardirq chains: 3029
in-softirq chains: 7585
in-process chains: 17773
stack-trace entries: 417613 [max: 1048576]
combined max dependencies: 523805800
hardirq-safe locks: 1074
hardirq-unsafe locks: 575
softirq-safe locks: 1171
softirq-unsafe locks: 445
irq-safe locks: 1184
irq-unsafe locks: 575
hardirq-read-safe locks: 0
hardirq-read-unsafe locks: 68
softirq-read-safe locks: 17
softirq-read-unsafe locks: 55
irq-read-safe locks: 17
irq-read-unsafe locks: 68
uncategorized locks: 103
unused locks: 0
max locking depth: 16
max recursion depth: 11
debug_locks: 1

2009-07-08 17:23:13

by Dave Jones

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Tue, Jul 07, 2009 at 05:55:01PM +0200, Peter Zijlstra wrote:
> On Tue, 2009-07-07 at 16:50 +0100, Joao Correia wrote:
>
> > >> Yes. Anything 2.6.31 forward triggers this immediately during init
> > >> process, at random places.
> > >
> > > Not on my machines it doesn't... so I suspect it's something weird in
> > > your .config or maybe due to some hardware you have that I don't, which
> > > triggers different drivers or somesuch.
> >
> > I am not the only one reporting this, and it happens, for example,
> > with a stock .config from a Fedora 11 install.
> >
> > It may, of course, be a funny driver interaction, yes, but other than
> > stripping the box piece by piece, how would one go about debugging
> > this otherwise?
>
> One thing to do is stare (or share) at the output
> of /proc/lockdep_chains and see if there are some particularly large
> chains in there, or many of the same name or something.

I don't see any long chains, just lots of them.
29065 lines on my box that's hitting MAX_STACK_TRACE_ENTRIES

> /proc/lockdep_stats might also be interesting, mine reads like:

lock-classes: 1518 [max: 8191]
direct dependencies: 7142 [max: 16384]
indirect dependencies: 97905
all direct dependencies: 753706
dependency chains: 6201 [max: 32768]
dependency chain hlocks: 16665 [max: 163840]
in-hardirq chains: 1380
in-softirq chains: 589
in-process chains: 4232
stack-trace entries: 262144 [max: 262144]
combined max dependencies: 3449006070
hardirq-safe locks: 1008
hardirq-unsafe locks: 364
softirq-safe locks: 838
softirq-unsafe locks: 322
irq-safe locks: 1043
irq-unsafe locks: 364
hardirq-read-safe locks: 0
hardirq-read-unsafe locks: 48
softirq-read-safe locks: 0
softirq-read-unsafe locks: 48
irq-read-safe locks: 0
irq-read-unsafe locks: 48
uncategorized locks: 104
unused locks: 0
max locking depth: 9
max recursion depth: 10
debug_locks: 0


Dave

2009-07-08 18:36:34

by Peter Zijlstra

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Wed, 2009-07-08 at 13:22 -0400, Dave Jones wrote:
> On Tue, Jul 07, 2009 at 05:55:01PM +0200, Peter Zijlstra wrote:
> > On Tue, 2009-07-07 at 16:50 +0100, Joao Correia wrote:
> >
> > > >> Yes. Anything 2.6.31 forward triggers this immediately during init
> > > >> process, at random places.
> > > >
> > > > Not on my machines it doesn't... so I suspect it's something weird in
> > > > your .config or maybe due to some hardware you have that I don't, which
> > > > triggers different drivers or somesuch.
> > >
> > > I am not the only one reporting this, and it happens, for example,
> > > with a stock .config from a Fedora 11 install.
> > >
> > > It may, of course, be a funny driver interaction, yes, but other than
> > > stripping the box piece by piece, how would one go about debugging
> > > this otherwise?
> >
> > One thing to do is stare (or share) at the output
> > of /proc/lockdep_chains and see if there are some particularly large
> > chains in there, or many of the same name or something.
>
> I don't see any long chains, just lots of them.
> 29065 lines on my box that's hitting MAX_STACK_TRACE_ENTRIES
>
> > /proc/lockdep_stats might also be interesting, mine reads like:
>
> lock-classes: 1518 [max: 8191]
> direct dependencies: 7142 [max: 16384]

Since we have 7 states per class, and can take one trace per state, and
also take one trace per dependency, this would yield a max of:

7*1518+7142 = 17768 stack traces

With the current limit of 262144 stack-trace entries, that would leave
us with an avg depth of:

262144/17768 = 14.75

Now since we would not use all states for each class, we'd likely have a
little more, but that would still suggest we have rather deep stack
traces on avg.

Looking at a lockdep dump hch gave me I can see that that is certainly
possible, I see tons of very deep callchains.

/me wonders if we're getting significantly deeper..

OK I guess we can raise this one, does doubling work? That would get us
around 29 entries per trace..

Also, Dave, do these distro init scripts still load every module on the
planet or are we more sensible these days?

module load/unload cycles are really bad for lockdep resources.
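
(Why unload in particular hurts: lockdep can unhash the classes that
were keyed inside the module's memory, but the static pools behind
dependencies and stack traces are never compacted. A rough sketch,
paraphrased from kernel/lockdep.c of this era rather than quoted
verbatim:)

/*
 * Sketch: called for each class keyed inside a module being unloaded.
 * The class and its dependency links are unhashed, but the
 * stack-trace entries it consumed stay allocated forever.
 */
static void zap_class(struct lock_class *class)
{
        int i;

        /* Unlink every dependency this class is involved in... */
        for (i = 0; i < nr_list_entries; i++) {
                if (list_entries[i].class == class)
                        list_del_rcu(&list_entries[i].entry);
        }
        /* ...and unhash the class itself... */
        list_del_rcu(&class->hash_entry);
        list_del_rcu(&class->lock_entry);
        class->key = NULL;
        /* ...but note: nr_stack_trace_entries is never decremented. */
}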

--

As a side note, I see that each and every trace ends with a -1 entry:

...
[ 1194.412158] [<c01f7990>] do_mount+0x3c0/0x7c0
[ 1194.412158] [<c01f7e14>] sys_mount+0x84/0xb0
[ 1194.412158] [<c01221b1>] syscall_call+0x7/0xb
[ 1194.412158] [<ffffffff>] 0xffffffff

Which seems to come from:

void save_stack_trace(struct stack_trace *trace)
{
        dump_trace(current, NULL, NULL, 0, &save_stack_ops, trace);
        if (trace->nr_entries < trace->max_entries)
                trace->entries[trace->nr_entries++] = ULONG_MAX;
}
EXPORT_SYMBOL_GPL(save_stack_trace);

commit 006e84ee3a54e393ec6bef2a9bc891dc5bde2843 seems involved,..

Anybody got clue?

2009-07-08 18:45:31

by Dave Jones

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Wed, Jul 08, 2009 at 08:36:04PM +0200, Peter Zijlstra wrote:
> Looking at a lockdep dump hch gave me I can see that that is certainly
> possible, I see tons of very deep callchains.
>
> /me wonders if we're getting significantly deeper..
>
> OK I guess we can raise this one, does doubling work? That would get us
> around 29 entries per trace..
>
> Also, Dave, do these distro init scripts still load every module on the
> planet or are we more sensible these days?
> module load/unload cycles are really bad for lockdep resources.

34 modules get loaded on the system I saw the trace on.
(That's from lsmod after fully booting up). I'm not aware of any
module unloading that happens during bootup.

Dave

2009-07-08 19:48:56

by Joao Correia

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Wed, Jul 8, 2009 at 7:36 PM, Peter Zijlstra<[email protected]> wrote:
> On Wed, 2009-07-08 at 13:22 -0400, Dave Jones wrote:
>> On Tue, Jul 07, 2009 at 05:55:01PM +0200, Peter Zijlstra wrote:
>> > On Tue, 2009-07-07 at 16:50 +0100, Joao Correia wrote:
>> >
>> > > >> Yes. Anything 2.6.31 forward triggers this immediately during init
>> > > >> process, at random places.
>> > > >
>> > > > Not on my machines it doesn't... so I suspect it's something weird in
>> > > > your .config or maybe due to some hardware you have that I don't, which
>> > > > triggers different drivers or somesuch.
>> > >
>> > > I am not the only one reporting this, and it happens, for example,
>> > > with a stock .config from a Fedora 11 install.
>> > >
>> > > It may, of course, be a funny driver interaction, yes, but other than
>> > > stripping the box piece by piece, how would one go about debugging
>> > > this otherwise?
>> >
>> > One thing to do is stare (or share) at the output
>> > of /proc/lockdep_chains and see if there are some particularly large
>> > chains in there, or many of the same name or something.
>>
>> I don't see any long chains, just lots of them.
>> 29065 lines on my box that's hitting MAX_STACK_TRACE_ENTRIES
>>
>> > /proc/lockdep_stats might also be interesting, mine reads like:
>>
>> lock-classes: 1518 [max: 8191]
>> direct dependencies: 7142 [max: 16384]
>
> Since we have 7 states per class, and can take one trace per state, and
> also take one trace per dependency, this would yield a max of:
>
> 7*1518+7142 = 17768 stack traces
>
> With the current limit of 262144 stack-trace entries, that would leave
> us with an avg depth of:
>
> 262144/17768 = 14.75
>
> Now since we would not use all states for each class, we'd likely have a
> little more, but that would still suggest we have rather deep stack
> traces on avg.
>
> Looking at a lockdep dump hch gave me I can see that that is certainly
> possible, I see tons of very deep callchains.
>
> /me wonders if we're getting significantly deeper..
>
> OK I guess we can raise this one, does doubling work? That would get us
> around 29 entries per trace..
>
> Also, Dave, do these distro init scripts still load every module on the
> planet or are we more sensible these days?
>
> module load/unload cycles are really bad for lockdep resources.
>
> --
>
> As a side note, I see that each and every trace ends with a -1 entry:
>
> ...
> [ 1194.412158] [<c01f7990>] do_mount+0x3c0/0x7c0
> [ 1194.412158] [<c01f7e14>] sys_mount+0x84/0xb0
> [ 1194.412158] [<c01221b1>] syscall_call+0x7/0xb
> [ 1194.412158] [<ffffffff>] 0xffffffff
>
> Which seems to come from:
>
> void save_stack_trace(struct stack_trace *trace)
> {
>         dump_trace(current, NULL, NULL, 0, &save_stack_ops, trace);
>         if (trace->nr_entries < trace->max_entries)
>                 trace->entries[trace->nr_entries++] = ULONG_MAX;
> }
> EXPORT_SYMBOL_GPL(save_stack_trace);
>
> commit 006e84ee3a54e393ec6bef2a9bc891dc5bde2843 seems involved,..
>
> Anybody got clue?
>
>

I'm in no way pretending to understand all the details of this system,
but I do know there has been, as of late, an effort to make the init
process do more things at once, i.e., load more modules in parallel to
speed up the process. Can't that be at least partially responsible for
this? Even if it's just making some other problem more obvious?

Joao Correia

2009-07-08 19:56:46

by Peter Zijlstra

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Wed, 2009-07-08 at 20:48 +0100, Joao Correia wrote:
>
>
> I'm in no way pretending to understand all the details of this system,
> but I do know there has been, as of late, an effort to make the init
> process do more things at once, i.e., load more modules in parallel to
> speed up the process. Can't that be at least partially responsible for
> this? Even if it's just making some other problem more obvious?

Nah, lockdep serializes all that under a single lock ;-)

module unloading in particular wastes resources from a lockdep pov, so
as long as that doesn't happen, we're good.
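
(The single lock in question, roughly as it appears in kernel/lockdep.c
around this time; a sketch, not the verbatim source:)

/*
 * All modifications to lockdep's dependency graph funnel through this
 * one raw spinlock, so even fully parallel module loads cannot race
 * lockdep's internal bookkeeping.
 */
static raw_spinlock_t lockdep_lock =
        (raw_spinlock_t)__RAW_SPIN_LOCK_UNLOCKED(lockdep_lock);

static int graph_lock(void)
{
        __raw_spin_lock(&lockdep_lock);
        /*
         * If another CPU already tripped a warning and is printing
         * with the graph lock dropped, don't touch the graph.
         */
        if (!debug_locks) {
                __raw_spin_unlock(&lockdep_lock);
                return 0;
        }
        return 1;
}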

2009-07-09 04:39:59

by Dave Jones

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Wed, Jul 08, 2009 at 08:36:04PM +0200, Peter Zijlstra wrote:
> Looking at a lockdep dump hch gave me I can see that that is certainly
> possible, I see tons of very deep callchains.
>
> /me wonders if we're getting significantly deeper..

Looking at /proc/lockdep, I'm curious..
Take a look at http://davej.fedorapeople.org/lockdep
scroll down to c12c0924

What's up with all those old_style_spin_init's ?

Dave

2009-07-09 08:03:02

by Peter Zijlstra

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Thu, 2009-07-09 at 00:39 -0400, Dave Jones wrote:
> On Wed, Jul 08, 2009 at 08:36:04PM +0200, Peter Zijlstra wrote:
> > Looking at a lockdep dump hch gave me I can see that that is certainly
> > possible, I see tons of very deep callchains.
> >
> > /me wonders if we're getting significantly deeper..
>
> Looking at /proc/lockdep, I'm curious..
> Take a look at http://davej.fedorapeople.org/lockdep
> scroll down to c12c0924
>
> What's up with all those old_style_spin_init's ?

What kernel are you running? Does your lib/dma-debug.c:dma_debug_init()
have spin_lock_init() in that HASH_SIZE loop?

If so, then someone else is doing something silly; if not, you seriously
need to upgrade your kernel because you're running something ancient,
like .30 :-)

2009-07-09 09:07:35

by Catalin Marinas

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Wed, 2009-07-08 at 20:36 +0200, Peter Zijlstra wrote:
> As a side note, I see that each and every trace ends with a -1 entry:
>
> ...
> [ 1194.412158] [<c01f7990>] do_mount+0x3c0/0x7c0
> [ 1194.412158] [<c01f7e14>] sys_mount+0x84/0xb0
> [ 1194.412158] [<c01221b1>] syscall_call+0x7/0xb
> [ 1194.412158] [<ffffffff>] 0xffffffff
>
> Which seems to come from:
>
> void save_stack_trace(struct stack_trace *trace)
> {
>         dump_trace(current, NULL, NULL, 0, &save_stack_ops, trace);
>         if (trace->nr_entries < trace->max_entries)
>                 trace->entries[trace->nr_entries++] = ULONG_MAX;
> }
> EXPORT_SYMBOL_GPL(save_stack_trace);
>
> commit 006e84ee3a54e393ec6bef2a9bc891dc5bde2843 seems involved,..

The reason for this is that if there are no more traces to show, it
inserts -1. In this case, it cannot trace beyond the system call. If the
stack trace is truncated because of the maximum number of trace entries
it can show, you won't get a -1.

Before the commit above, it was always inserting -1 even if the trace
was longer than the maximum number of entries.

--
Catalin

2009-07-09 09:10:21

by Peter Zijlstra

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Thu, 2009-07-09 at 10:06 +0100, Catalin Marinas wrote:
> On Wed, 2009-07-08 at 20:36 +0200, Peter Zijlstra wrote:
> > As a side note, I see that each and every trace ends with a -1 entry:
> >
> > ...
> > [ 1194.412158] [<c01f7990>] do_mount+0x3c0/0x7c0
> > [ 1194.412158] [<c01f7e14>] sys_mount+0x84/0xb0
> > [ 1194.412158] [<c01221b1>] syscall_call+0x7/0xb
> > [ 1194.412158] [<ffffffff>] 0xffffffff
> >
> > Which seems to come from:
> >
> > void save_stack_trace(struct stack_trace *trace)
> > {
> >         dump_trace(current, NULL, NULL, 0, &save_stack_ops, trace);
> >         if (trace->nr_entries < trace->max_entries)
> >                 trace->entries[trace->nr_entries++] = ULONG_MAX;
> > }
> > EXPORT_SYMBOL_GPL(save_stack_trace);
> >
> > commit 006e84ee3a54e393ec6bef2a9bc891dc5bde2843 seems involved,..
>
> The reason for this is that if there are no more traces to show, it
> inserts -1. In this case, it cannot trace beyond the system call. If the
> stack trace is truncated because of the maximum number of trace entries
> it can show, you won't get a -1.
>
> Before the commit above, it was always inserting -1 even if the trace
> was longer than the maximum number of entries.

Seems daft to me, I'll fix up lockdep to truncate that last entry,
having a gazillion copies of -1 in the trace entries doesn't make sense.

2009-07-09 16:11:47

by Dave Jones

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Thu, Jul 09, 2009 at 10:02:36AM +0200, Peter Zijlstra wrote:
> On Thu, 2009-07-09 at 00:39 -0400, Dave Jones wrote:
> > On Wed, Jul 08, 2009 at 08:36:04PM +0200, Peter Zijlstra wrote:
> > > Looking at a lockdep dump hch gave me I can see that that is certainly
> > > possible, I see tons of very deep callchains.
> > >
> > > /me wonders if we're getting significantly deeper..
> >
> > Looking at /proc/lockdep, I'm curious..
> > Take a look at http://davej.fedorapeople.org/lockdep
> > scroll down to c12c0924
> >
> > What's up with all those old_style_spin_init's ?
>
> What kernel are you running?

.31rc2

> Does your lib/dma-debug.c:dma_debug_init()
> have spin_lock_init() in that HASH_SIZE loop?

it's doing it by hand..

        for (i = 0; i < HASH_SIZE; ++i) {
                INIT_LIST_HEAD(&dma_entry_hash[i].list);
                dma_entry_hash[i].lock = SPIN_LOCK_UNLOCKED;
        }


Dave

2009-07-09 17:07:49

by Peter Zijlstra

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Thu, 2009-07-09 at 12:10 -0400, Dave Jones wrote:
> On Thu, Jul 09, 2009 at 10:02:36AM +0200, Peter Zijlstra wrote:
> > On Thu, 2009-07-09 at 00:39 -0400, Dave Jones wrote:
> > > On Wed, Jul 08, 2009 at 08:36:04PM +0200, Peter Zijlstra wrote:
> > > > Looking at a lockdep dump hch gave me I can see that that is certainly
> > > > possible, I see tons of very deep callchains.
> > > >
> > > > /me wonders if we're getting significantly deeper..
> > >
> > > Looking at /proc/lockdep, I'm curious..
> > > Take a look at http://davej.fedorapeople.org/lockdep
> > > scroll down to c12c0924
> > >
> > > What's up with all those old_style_spin_init's ?
> >
> > What kernel are you running?
>
> .31rc2
>
> > Does your lib/dma-debug.c:dma_debug_init()
> > have spin_lock_init() in that HASH_SIZE loop?
>
> it's doing it by hand..
>
>         for (i = 0; i < HASH_SIZE; ++i) {
>                 INIT_LIST_HEAD(&dma_entry_hash[i].list);
>                 dma_entry_hash[i].lock = SPIN_LOCK_UNLOCKED;
>         }

Hmm, that's the problem, it should read:

        for (i = 0; i < HASH_SIZE; ++i) {
                INIT_LIST_HEAD(&dma_entry_hash[i].list);
                spin_lock_init(&dma_entry_hash[i].lock);
        }

and does in -tip, so maybe Ingo took that patch, but I thought Joerg
would push that Linus-wards. Joerg?
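
(For context on why the two spellings behave so differently under
lockdep, here is a sketch of the macro, paraphrased from
include/linux/spinlock.h of this era rather than quoted verbatim:)

/*
 * Sketch: the static __key gives one lock class per spin_lock_init()
 * call site, so all HASH_SIZE bucket locks initialized in that loop
 * share a single lockdep class.
 */
#define spin_lock_init(lock)                            \
do {                                                    \
        static struct lock_class_key __key;             \
                                                        \
        __spin_lock_init((lock), #lock, &__key);        \
} while (0)

The SPIN_LOCK_UNLOCKED assignment, by contrast, leaves the lock's
dep_map without a key, so lockdep falls back to keying each static
lock by its own address: one class per hash bucket, each with its own
stack traces, which would match the old_style_spin_init spam in Dave's
dump.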

2009-07-10 15:51:34

by Joao Correia

Subject: Re: [PATCH 1/3] Increase lockdep limits: MAX_STACK_TRACE_ENTRIES

On Thu, Jul 9, 2009 at 6:07 PM, Peter Zijlstra<[email protected]> wrote:
> On Thu, 2009-07-09 at 12:10 -0400, Dave Jones wrote:
>> On Thu, Jul 09, 2009 at 10:02:36AM +0200, Peter Zijlstra wrote:
>> > On Thu, 2009-07-09 at 00:39 -0400, Dave Jones wrote:
>> > > On Wed, Jul 08, 2009 at 08:36:04PM +0200, Peter Zijlstra wrote:
>> > > > Looking at a lockdep dump hch gave me I can see that that is certainly
>> > > > possible, I see tons of very deep callchains.
>> > > >
>> > > > /me wonders if we're getting significantly deeper..
>> > >
>> > > Looking at /proc/lockdep, I'm curious..
>> > > Take a look at http://davej.fedorapeople.org/lockdep
>> > > scroll down to c12c0924
>> > >
>> > > What's up with all those old_style_spin_init's ?
>> >
>> > What kernel are you running?
>>
>> .31rc2
>>
>> > Does your lib/dma-debug.c:dma_debug_init()
>> > have spin_lock_init() in that HASH_SIZE loop?
>>
>> it's doing it by hand..
>>
>>         for (i = 0; i < HASH_SIZE; ++i) {
>>                 INIT_LIST_HEAD(&dma_entry_hash[i].list);
>>                 dma_entry_hash[i].lock = SPIN_LOCK_UNLOCKED;
>>         }
>
> Hmm, that's the problem, it should read:
>
>         for (i = 0; i < HASH_SIZE; ++i) {
>                 INIT_LIST_HEAD(&dma_entry_hash[i].list);
>                 spin_lock_init(&dma_entry_hash[i].lock);
>         }
>
> and does in -tip, so maybe Ingo took that patch, but I thought Joerg
> would push that Linus-wards. Joerg?
>
>

Indeed, changing to spin_lock_init() keeps the counters at much
healthier values, and the warnings do not trigger anymore. Thanks for
looking into this.

Disregard my patch submission please.

Joao Correia

2009-07-20 13:30:38

by Peter Zijlstra

Subject: [PATCH] lockdep: fixup stacktrace wastage

On Thu, 2009-07-09 at 11:10 +0200, Peter Zijlstra wrote:
> On Thu, 2009-07-09 at 10:06 +0100, Catalin Marinas wrote:
> > On Wed, 2009-07-08 at 20:36 +0200, Peter Zijlstra wrote:
> > > As a side note, I see that each and every trace ends with a -1 entry:
> > >
> > > ...
> > > [ 1194.412158] [<c01f7990>] do_mount+0x3c0/0x7c0
> > > [ 1194.412158] [<c01f7e14>] sys_mount+0x84/0xb0
> > > [ 1194.412158] [<c01221b1>] syscall_call+0x7/0xb
> > > [ 1194.412158] [<ffffffff>] 0xffffffff
> > >
> > > Which seems to come from:
> > >
> > > void save_stack_trace(struct stack_trace *trace)
> > > {
> > >         dump_trace(current, NULL, NULL, 0, &save_stack_ops, trace);
> > >         if (trace->nr_entries < trace->max_entries)
> > >                 trace->entries[trace->nr_entries++] = ULONG_MAX;
> > > }
> > > EXPORT_SYMBOL_GPL(save_stack_trace);
> > >
> > > commit 006e84ee3a54e393ec6bef2a9bc891dc5bde2843 seems involved,..
> >
> > The reason for this is that if there are no more traces to show, it
> > inserts -1. In this case, it cannot trace beyond the system call. If the
> > stack trace is truncated because of the maximum number of trace entries
> > it can show, you won't get a -1.
> >
> > Before the commit above, it was always inserting -1 even if the trace
> > was longer than the maximum number of entries.
>
> Seems daft to me, I'll fix up lockdep to truncate that last entry,
> having a gazillion copies of -1 in the trace entries doesn't make sense.

Bugger, not all arches have that. I'll queue up the below; if it breaks
anything, I'll go around and fix up all arches to never emit this -1
crap.

---
Subject: lockdep: fixup stacktrace wastage
From: Peter Zijlstra <[email protected]>
Date: Mon Jul 20 15:27:04 CEST 2009

Some silly architectures emit trailing -1 entries on stacktraces; trim
those to save space.

Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/lockdep.c | 6 ++++++
1 file changed, 6 insertions(+)

Index: linux-2.6/kernel/lockdep.c
===================================================================
--- linux-2.6.orig/kernel/lockdep.c
+++ linux-2.6/kernel/lockdep.c
@@ -367,6 +367,12 @@ static int save_trace(struct stack_trace

        save_stack_trace(trace);

+       /*
+        * Some daft arches put -1 at the end to indicate it's a full trace.
+        */
+       if (trace->entries[trace->nr_entries-1] == ULONG_MAX)
+               trace->nr_entries--;
+
        trace->max_entries = trace->nr_entries;

        nr_stack_trace_entries += trace->nr_entries;