2020-05-26 17:46:51

by Waiman Long

Subject: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half

It was found by Qian Cai that lockdep splat sometimes appears with the
"BUG: MAX_LOCKDEP_ENTRIES too low" message on linux-next. On a 32-vcpu VM
guest with a v5.7-rc7 based kernel, I looked at how many of the various
table entries were being used after bootup and after a parallel kernel
build (make -j32). The tables below show the usage statistics.

After bootup:

Table             Used     Max   %age
-----             ----     ---   ----
lock_classes[]    1834    8192   22.4
list_entries[]   15646   32768   47.7
lock_chains[]    20873   65536   31.8
chain_hlocks[]   83199  327680   25.4
stack_trace[]   146177  524288   27.9

After parallel kernel build:

Table             Used     Max   %age
-----             ----     ---   ----
lock_classes[]    1864    8192   22.8
list_entries[]   17134   32768   52.3
lock_chains[]    25196   65536   38.4
chain_hlocks[]  106321  327680   32.4
stack_trace[]   158700  524288   30.3

Percentage-wise, it can be seen that the list_entries[] for direct
dependencies are used much more than the other tables. So it is also
the table that is most likely to run out of space when running a
complex workload.

To reduce the likelihood of running out of table entries, we can increase
MAX_LOCKDEP_ENTRIES by 50% from 16384/32768 to 24576/49152. On a 64-bit
architecture, that represents an increase in memory consumption of
917504 bytes. With that change, the percentage usage of list_entries[]
will fall to 31.8% and 34.9% respectively to make them more in line
with the other tables.

Signed-off-by: Waiman Long <[email protected]>
---
kernel/locking/lockdep_internals.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h
index baca699b94e9..6108d2fbe775 100644
--- a/kernel/locking/lockdep_internals.h
+++ b/kernel/locking/lockdep_internals.h
@@ -89,12 +89,12 @@ static const unsigned long LOCKF_USED_IN_IRQ_READ =
* table (if it's not there yet), and we check it for lock order
* conflicts and deadlocks.
*/
-#define MAX_LOCKDEP_ENTRIES 16384UL
+#define MAX_LOCKDEP_ENTRIES 24576UL
#define MAX_LOCKDEP_CHAINS_BITS 15
#define MAX_STACK_TRACE_ENTRIES 262144UL
#define STACK_TRACE_HASH_SIZE 8192
#else
-#define MAX_LOCKDEP_ENTRIES 32768UL
+#define MAX_LOCKDEP_ENTRIES 49152UL

#define MAX_LOCKDEP_CHAINS_BITS 16

--
2.18.1


2020-05-26 19:04:12

by Qian Cai

Subject: Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half

On Tue, May 26, 2020 at 01:43:49PM -0400, Waiman Long wrote:
> It was found by Qian Cai that lockdep splat sometimes appears with the
> "BUG: MAX_LOCKDEP_ENTRIES too low" message on linux-next. On a 32-vcpu VM
> guest with a v5.7-rc7 based kernel, I looked at how many of the various
> table entries were being used after bootup and after a parallel kernel
> build (make -j32). The tables below show the usage statistics.

I think this approach could help the situation on this AMD server.
However, I don't understand why this particular system starts to exceed
MAX_LOCKDEP_ENTRIES. All other Intel, powerpc, arm64 and s390 systems
running the same workload have only < 20k entries at the end, but on
this AMD server:

After system boot: ~13000 (similar to other arches)
After LTP MM: +2000
After LTP Syscalls: +5000
After reading all sysfs files [1] (except debugfs): +10000
[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/fs/read_all/read_all.c
# read_all -d /sys -q -r 10

I still don't understand why reading all sysfs files on this system
could increase that much, but here is the lockdep file after
running the sysfs read to see if you could spot anything obvious,

https://cailca.github.io/files/lockdep.txt

lock-classes: 2114 [max: 8192]
direct dependencies: 28568 [max: 32768]
indirect dependencies: 126882
all direct dependencies: 304192
dependency chains: 38269 [max: 65536]
dependency chain hlocks used: 141480 [max: 327680]
dependency chain hlocks lost: 0
in-hardirq chains: 166
in-softirq chains: 818
in-process chains: 37285
stack-trace entries: 175558 [max: 524288]
number of stack traces: 8355
number of stack hash chains: 6545
combined max dependencies: 804750782
hardirq-safe locks: 80
hardirq-unsafe locks: 1048
softirq-safe locks: 169
softirq-unsafe locks: 957
irq-safe locks: 183
irq-unsafe locks: 1048
hardirq-read-safe locks: 4
hardirq-read-unsafe locks: 943
softirq-read-safe locks: 12
softirq-read-unsafe locks: 940
irq-read-safe locks: 12
irq-read-unsafe locks: 943
uncategorized locks: 255
unused locks: 0
max locking depth: 19
max bfs queue depth: 1136
chain lookup misses: 43195
chain lookup hits: 3151662027
cyclic checks: 44009
redundant checks: 0
redundant links: 0
find-mask forwards checks: 7726
find-mask backwards checks: 8095
hardirq on events: 3503795225
hardirq off events: 3503795224
redundant hardirq ons: 453772
redundant hardirq offs: 1856554342
softirq on events: 51104551
softirq off events: 51104551
redundant softirq ons: 0
redundant softirq offs: 0
debug_locks: 1

zapped classes: 1101
zapped lock chains: 1117
large chain blocks: 1

>
> After bootup:
>
> Table Used Max %age
> ----- ---- --- ----
> lock_classes[] 1834 8192 22.4
> list_entries[] 15646 32768 47.7
> lock_chains[] 20873 65536 31.8
> chain_hlocks[] 83199 327680 25.4
> stack_trace[] 146177 524288 27.9
>
> After parallel kernel build:
>
> Table Used Max %age
> ----- ---- --- ----
> lock_classes[] 1864 8192 22.8
> list_entries[] 17134 32768 52.3
> lock_chains[] 25196 65536 38.4
> chain_hlocks[] 106321 327680 32.4
> stack_trace[] 158700 524288 30.3
>
> Percentage-wise, it can be seen that the list_entries[] for direct
> dependencies are used much more than the other tables. So it is also
> the table that is most likely to run out of space when running a
> complex workload.
>
> To reduce the likelihood of running out of table entries, we can increase
> MAX_LOCKDEP_ENTRIES by 50% from 16384/32768 to 24576/49152. On a 64-bit
> architecture, that represents an increase in memory consumption of
> 917504 bytes. With that change, the percentage usage of list_entries[]
> will fall to 31.8% and 34.9% respectively to make them more in line
> with the other tables.
>
> Signed-off-by: Waiman Long <[email protected]>
> ---
> kernel/locking/lockdep_internals.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h
> index baca699b94e9..6108d2fbe775 100644
> --- a/kernel/locking/lockdep_internals.h
> +++ b/kernel/locking/lockdep_internals.h
> @@ -89,12 +89,12 @@ static const unsigned long LOCKF_USED_IN_IRQ_READ =
> * table (if it's not there yet), and we check it for lock order
> * conflicts and deadlocks.
> */
> -#define MAX_LOCKDEP_ENTRIES 16384UL
> +#define MAX_LOCKDEP_ENTRIES 24576UL
> #define MAX_LOCKDEP_CHAINS_BITS 15
> #define MAX_STACK_TRACE_ENTRIES 262144UL
> #define STACK_TRACE_HASH_SIZE 8192
> #else
> -#define MAX_LOCKDEP_ENTRIES 32768UL
> +#define MAX_LOCKDEP_ENTRIES 49152UL
>
> #define MAX_LOCKDEP_CHAINS_BITS 16
>
> --
> 2.18.1
>

2020-05-26 23:19:37

by Qian Cai

Subject: Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half

On Tue, May 26, 2020 at 02:58:53PM -0400, Qian Cai wrote:
> On Tue, May 26, 2020 at 01:43:49PM -0400, Waiman Long wrote:
> > It was found by Qian Cai that lockdep splat sometimes appears with the
> > "BUG: MAX_LOCKDEP_ENTRIES too low" message on linux-next. On a 32-vcpu VM
> > guest with a v5.7-rc7 based kernel, I looked at how many of the various
> > table entries were being used after bootup and after a parallel kernel
> > build (make -j32). The tables below show the usage statistics.
>
> I think this approach could help the situation on this AMD server.
> However, I don't understand why this particular system starts to exceed
> MAX_LOCKDEP_ENTRIES. All other Intel, powerpc, arm64 and s390 systems
> running the same workload have only < 20k entries at the end, but on
> this AMD server:

Correction -- Intel is also quite close to exceeding it.

direct dependencies: 29836 [max: 32768]

arm64:

direct dependencies: 25268 [max: 32768]

2020-05-26 23:44:55

by Peter Zijlstra

Subject: Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half

On Tue, May 26, 2020 at 02:58:50PM -0400, Qian Cai wrote:

> I still don't understand why reading all sysfs files on this system
> could increase that much, but here is the lockdep file after
> running the sysfs read to see if you could spot anything obvious,
>
> https://cailca.github.io/files/lockdep.txt

00000000f011a2a5 OPS: 20 FD: 45 BD: 1 .+.+: kn->active#834

is that somewhere near the number of CPUs you have?

Anyway, there's very long "kn->active#..." chains in there, which seems
to suggest some annotation is all sorts of buggered.

2020-05-27 00:29:10

by Qian Cai

Subject: Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half

On Tue, May 26, 2020 at 09:56:30PM +0200, Peter Zijlstra wrote:
> On Tue, May 26, 2020 at 02:58:50PM -0400, Qian Cai wrote:
>
> > I still don't understand why reading all sysfs files on this system
> > could increase that much, but here is the lockdep file after
> > running the sysfs read to see if you could spot anything obvious,
> >
> > https://cailca.github.io/files/lockdep.txt
>
> 00000000f011a2a5 OPS: 20 FD: 45 BD: 1 .+.+: kn->active#834
>
> is that somewhere near the number of CPUs you have?

834? No.

# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7601 32-Core Processor
Stepping: 2
CPU MHz: 2932.801
CPU max MHz: 2200.0000
CPU min MHz: 1200.0000
BogoMIPS: 4391.87
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7,64-71
NUMA node1 CPU(s): 8-15,72-79
NUMA node2 CPU(s): 16-23,80-87
NUMA node3 CPU(s): 24-31,88-95
NUMA node4 CPU(s): 32-39,96-103
NUMA node5 CPU(s): 40-47,104-111
NUMA node6 CPU(s): 48-55,112-119
NUMA node7 CPU(s): 56-63,120-127

>
> Anyway, there's very long "kn->active#..." chains in there, which seems
> to suggest some annotation is all sorts of buggered.

2020-05-27 00:33:03

by Waiman Long

Subject: Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half

On 5/26/20 3:56 PM, Peter Zijlstra wrote:
> On Tue, May 26, 2020 at 02:58:50PM -0400, Qian Cai wrote:
>
>> I still don't understand why reading all sysfs files on this system
>> could increase that much, but here is the lockdep file after
>> running the sysfs read to see if you could spot anything obvious,
>>
>> https://cailca.github.io/files/lockdep.txt
> 00000000f011a2a5 OPS: 20 FD: 45 BD: 1 .+.+: kn->active#834
>
> is that somewhere near the number of CPUs you have?
>
> Anyway, there's very long "kn->active#..." chains in there, which seems
> to suggest some annotation is all sorts of buggered.
>
It is actually one active lock class per instance of the kernfs_node
structure. That means more than 800 sysfs files were accessed in some
way. Since we can have far more than 800 sysfs files in the system, we
could easily overwhelm the lockdep tables if we really tried to access
all of them.

Cheers,
Longman

2020-05-27 00:57:38

by Peter Zijlstra

Subject: Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half

On Tue, May 26, 2020 at 04:30:58PM -0400, Waiman Long wrote:
> On 5/26/20 3:56 PM, Peter Zijlstra wrote:
> > On Tue, May 26, 2020 at 02:58:50PM -0400, Qian Cai wrote:
> >
> > > I still don't understand why reading all sysfs files on this system
> > > could increase that much, but here is the lockdep file after
> > > running the sysfs read to see if you could spot anything obvious,
> > >
> > > https://cailca.github.io/files/lockdep.txt
> > 00000000f011a2a5 OPS: 20 FD: 45 BD: 1 .+.+: kn->active#834
> >
> > is that somewhere near the number of CPUs you have?
> >
> > Anyway, there's very long "kn->active#..." chains in there, which seems
> > to suggest some annotation is all sorts of buggered.
> >
> It is actually one active lock class per instance of the kernfs_node structures.
> That means more than 800 sysfs files are accessed in some way. As we could
> have much more than 800 sysfs files in the system, we could easily overwhelm
> the lockdep tables if we really try to access all of them.

A lock per instance is crazy, that's not what lockdep is made for.
Fixing this seems like a far better idea than increasing the numbers.

2020-05-27 01:08:27

by Waiman Long

Subject: Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half

On 5/26/20 5:27 PM, Peter Zijlstra wrote:
> On Tue, May 26, 2020 at 04:30:58PM -0400, Waiman Long wrote:
>> On 5/26/20 3:56 PM, Peter Zijlstra wrote:
>>> On Tue, May 26, 2020 at 02:58:50PM -0400, Qian Cai wrote:
>>>
>>>> I still don't understand why reading all sysfs files on this system
>>>> could increase that much, but here is the lockdep file after
>>>> running the sysfs read to see if you could spot anything obvious,
>>>>
>>>> https://cailca.github.io/files/lockdep.txt
>>> 00000000f011a2a5 OPS: 20 FD: 45 BD: 1 .+.+: kn->active#834
>>>
>>> is that somewhere near the number of CPUs you have?
>>>
>>> Anyway, there's very long "kn->active#..." chains in there, which seems
>>> to suggest some annotation is all sorts of buggered.
>>>
>> It is actually one active lock class per instance of the kernfs_node structures.
>> That means more than 800 sysfs files are accessed in some way. As we could
>> have much more than 800 sysfs files in the system, we could easily overwhelm
>> the lockdep tables if we really try to access all of them.
> A lock per instance is crazy, that's not what lockdep is made for.
> Fixing this seems like a far better idea than increasing the numbers.

Thinking about it, one lock class per sysfs file does make sense, as
accesses to those sysfs files can invoke vastly different code paths,
and unifying them into one single lock class can easily lead to false
positive warnings.

Cheers,
Longman

2020-05-27 03:54:19

by Qian Cai

Subject: Re: [PATCH] locking/lockdep: Increase MAX_LOCKDEP_ENTRIES by half

On Tue, May 26, 2020 at 04:30:58PM -0400, Waiman Long wrote:
> On 5/26/20 3:56 PM, Peter Zijlstra wrote:
> > On Tue, May 26, 2020 at 02:58:50PM -0400, Qian Cai wrote:
> >
> > > I still don't understand why reading all sysfs files on this system
> > > could increase that much, but here is the lockdep file after
> > > running the sysfs read to see if you could spot anything obvious,
> > >
> > > https://cailca.github.io/files/lockdep.txt
> > 00000000f011a2a5 OPS: 20 FD: 45 BD: 1 .+.+: kn->active#834
> >
> > is that somewhere near the number of CPUs you have?
> >
> > Anyway, there's very long "kn->active#..." chains in there, which seems
> > to suggest some annotation is all sorts of buggered.
> >
> It is actually one active lock class per instance of the kernfs_node structures.
> That means more than 800 sysfs files are accessed in some way. As we could
> have much more than 800 sysfs files in the system, we could easily overwhelm
> the lockdep tables if we really try to access all of them.

Yes, there are a lot of those on large systems: NUMA, percpu, slab,
etc. Isn't it better to size MAX_LOCKDEP_ENTRIES dynamically? There is
plenty of memory available on those systems.