2021-06-30 01:31:09

by Guru Das Srinagesh

[permalink] [raw]
Subject: [RFC PATCH] spinlock_debug: Save stacktrace at lock acquisition

In case of spinlock recursion bugs, knowing which entity acquired the
lock initially is key to debugging. Therefore, save the stack frames
when the lock is acquired and print them when spinlock recursion is
detected.

Signed-off-by: Guru Das Srinagesh <[email protected]>
---
include/linux/spinlock_types.h | 3 +++
kernel/locking/spinlock_debug.c | 9 +++++++++
2 files changed, 12 insertions(+)

diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
index b981caa..8a68d55 100644
--- a/include/linux/spinlock_types.h
+++ b/include/linux/spinlock_types.h
@@ -22,6 +22,9 @@ typedef struct raw_spinlock {
#ifdef CONFIG_DEBUG_SPINLOCK
unsigned int magic, owner_cpu;
void *owner;
+#define MAX_STACK_LEN 16
+ int stack_len;
+ unsigned long stack_trace[MAX_STACK_LEN];
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index b9d9308..d34cccc 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -12,6 +12,7 @@
#include <linux/debug_locks.h>
#include <linux/delay.h>
#include <linux/export.h>
+#include <linux/stacktrace.h>

void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
struct lock_class_key *key, short inner)
@@ -27,6 +28,7 @@ void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
lock->magic = SPINLOCK_MAGIC;
lock->owner = SPINLOCK_OWNER_INIT;
lock->owner_cpu = -1;
+ lock->stack_len = 0;
}

EXPORT_SYMBOL(__raw_spin_lock_init);
@@ -65,6 +67,10 @@ static void spin_dump(raw_spinlock_t *lock, const char *msg)
owner ? task_pid_nr(owner) : -1,
READ_ONCE(lock->owner_cpu));
dump_stack();
+ if (!strcmp(msg, "recursion")) {
+ printk(KERN_EMERG "Stack at lock acquisition: \n");
+ stack_trace_print(lock->stack_trace, lock->stack_len, 0);
+ }
}

static void spin_bug(raw_spinlock_t *lock, const char *msg)
@@ -90,6 +96,8 @@ static inline void debug_spin_lock_after(raw_spinlock_t *lock)
{
WRITE_ONCE(lock->owner_cpu, raw_smp_processor_id());
WRITE_ONCE(lock->owner, current);
+ lock->stack_len = stack_trace_save(lock->stack_trace,
+ ARRAY_SIZE(lock->stack_trace), 0);
}

static inline void debug_spin_unlock(raw_spinlock_t *lock)
@@ -101,6 +109,7 @@ static inline void debug_spin_unlock(raw_spinlock_t *lock)
lock, "wrong CPU");
WRITE_ONCE(lock->owner, SPINLOCK_OWNER_INIT);
WRITE_ONCE(lock->owner_cpu, -1);
+ WRITE_ONCE(lock->stack_len, 0);
}

/*
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


2021-06-30 02:29:41

by Waiman Long

[permalink] [raw]
Subject: Re: [RFC PATCH] spinlock_debug: Save stacktrace at lock acquisition

On 6/29/21 9:28 PM, Guru Das Srinagesh wrote:
> In case of spinlock recursion bugs, knowing which entity acquired the
> lock initially is key to debugging. Therefore, save the stack frames
> when the lock is acquired and print them when spinlock recursion is
> detected.
>
> Signed-off-by: Guru Das Srinagesh <[email protected]>
> ---
> include/linux/spinlock_types.h | 3 +++
> kernel/locking/spinlock_debug.c | 9 +++++++++
> 2 files changed, 12 insertions(+)
>
> diff --git a/include/linux/spinlock_types.h b/include/linux/spinlock_types.h
> index b981caa..8a68d55 100644
> --- a/include/linux/spinlock_types.h
> +++ b/include/linux/spinlock_types.h
> @@ -22,6 +22,9 @@ typedef struct raw_spinlock {
> #ifdef CONFIG_DEBUG_SPINLOCK
> unsigned int magic, owner_cpu;
> void *owner;
> +#define MAX_STACK_LEN 16
> + int stack_len;
> + unsigned long stack_trace[MAX_STACK_LEN];
> #endif

Nak

Locking problem like this should be detected by the lockdep code. Adding
136 bytes (64-bit archs) to the size of a spinlock is just too big an
overhead.

Regards,
Longman

2021-07-07 09:23:59

by Chen, Rong A

[permalink] [raw]
Subject: [spinlock_debug] 0263428b26: kernel-selftests.dma.dma_map_benchmark.average_unmap_latency 87.5% regression

Greeting,

FYI, we noticed a 87.5% regression of kernel-selftests.dma.dma_map_benchmark.average_unmap_latency due to commit:


commit: 0263428b26d26ba8f2edc28f65181c4ab3aefc8b ("[RFC PATCH] spinlock_debug: Save stacktrace at lock acquisition")
url: https://github.com/0day-ci/linux/commits/Guru-Das-Srinagesh/spinlock_debug-Save-stacktrace-at-lock-acquisition/20210630-092833
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 0e8a89d49d45197770f2e57fb15f1bc9ded96eb0

in testcase: kernel-selftests
on test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz with 32G memory
with following parameters:

group: group-01
ucode: 0xe2

test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
bin/lkp run generated-yaml-file

=========================================================================================
compiler/group/kconfig/rootfs/tbox_group/testcase/ucode:
gcc-9/group-01/x86_64-rhel-8.3-kselftests/debian-10.4-x86_64-20200603.cgz/lkp-skl-d05/kernel-selftests/0xe2

commit:
0e8a89d49d ("locking/lockdep: Correct the description error for check_redundant()")
0263428b26 ("spinlock_debug: Save stacktrace at lock acquisition")

0e8a89d49d451977 0263428b26d26ba8f2edc28f651
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.70 +85.7% 1.30 kernel-selftests.dma.dma_map_benchmark.average_map_latency
1.02 ± 10% +57.4% 1.60 kernel-selftests.dma.dma_map_benchmark.average_map_latency_stddev
0.53 ± 8% +87.5% 1.00 kernel-selftests.dma.dma_map_benchmark.average_unmap_latency
0.93 ± 5% +54.6% 1.44 ± 3% kernel-selftests.dma.dma_map_benchmark.average_unmap_latency_stddev
629.81 +51.8% 956.18 kernel-selftests.time.elapsed_time
629.81 +51.8% 956.18 kernel-selftests.time.elapsed_time.max
19380 ± 11% +19.5% 23151 ± 10% kernel-selftests.time.involuntary_context_switches
10274 +16.0% 11916 ± 5% kernel-selftests.time.maximum_resident_set_size
2165658 -1.7% 2129319 kernel-selftests.time.minor_page_faults
41.83 +34.2% 56.14 kernel-selftests.time.percent_of_cpu_this_job_got
253.76 +108.0% 527.81 kernel-selftests.time.system_time
64175 +104.3% 131100 ± 5% kernel-selftests.time.voluntary_context_switches
1510 -15.6% 1275 ± 2% meminfo.Active(anon)
420666 ± 2% -20.7% 333726 ± 2% meminfo.DirectMap4k
136525 +50.3% 205189 meminfo.KReclaimable
136525 +50.3% 205189 meminfo.SReclaimable
42313 +14.2% 48332 meminfo.SUnreclaim
13834 -12.5% 12106 meminfo.Shmem
178839 +41.8% 253522 meminfo.Slab
290239 ± 4% -18.9% 235393 ± 6% meminfo.VmallocUsed



kernel-selftests.time.system_time

600 +---------------------------------------------------------------------+
| O O O O |
500 |-OO O O OOOOOOO OO O OOO OOO OO OOO |
| |
| |
400 |-+ |
| |
300 |-+ |
|++ + +++ ++ +++ + + ++++++++++++++++++++ ++++++ + ++ ++ ++++ ++ |
200 |:: : ::: :: ::: : : : : :: : : :: :: :::: :: |
|:: : ::: :: ::: : : : : :: : : :: :: :::: :: |
|::: :: :: ::::: ::::::: ::: ::: : ::::: : ::::|
100 |-:: : :: : : :: :: : : :: : : : : :: : : ::|
| :: : :: : : :: :: : : :: : : : : :: : : ::|
0 +---------------------------------------------------------------------+


kernel-selftests.time.percent_of_cpu_this_job_got

60 +----------------------------------------------------------------------+
| OOO O O OOOO OOO O O OOOOOO OO OOOOO OO |
50 |-+ |
| |
|++ + +++ + +++ ++ +++++++++++++++++++++++ ++++++ +++ ++ ++++ ++ |
40 |:: : ::: : ::: :: :: :: : :: : : :: :::: :: |
|:: : ::: : ::: :: :: :: : :: : : :: :::: :: |
30 |:: : ::: : ::: :: :: :: : :: : : :: :::: :: |
|:: : ::: : ::: :: :: :: : :: : : :: :::: :: |
20 |::: :: :: ::::: :::: :: :::: :: : ::::: : ::::|
| :: : :: : : :: :: ::: : : :: : : :: : : ::|
| :: : :: : : :: :: ::: : : :: : : :: : : ::|
10 |-:: : :: : : :: :: ::: : : :: : : :: : : ::|
| :: : :: : : :: :: ::: : : :: : : :: : : ::|
0 +----------------------------------------------------------------------+


kernel-selftests.time.elapsed_time

1000 +--------------------------------------------------------------------+
900 |-OOOO OO OOOOOO OOO OOO O O OOOOOO OO |
| |
800 |-+ |
700 |-+ |
|++ ++ +++ ++ + + + + +++++++++++++++++++ ++++++ + ++ + +++ + ++ |
600 |:: :: ::: :: : : : : : : :: : : :: : ::: : :: |
500 |:: :: ::: :: : : : : : : :: : : :: : ::: : :: |
400 |:: :: ::: :: : : : : : : :: : : :: : ::: : :: |
|:: :: ::: :: : : : : : : :: : : :: : ::: : :: |
300 |-:: : ::: :: : : ::: : : : : : : : :: : : ::|
200 |-:: : ::: :: : : ::: : : : : : : : :: : : ::|
| :: : ::: :: : : ::: : : : : : : : :: : : ::|
100 |-:: : ::: :: : : ::: : : : : : : : :: : : ::|
0 +--------------------------------------------------------------------+


kernel-selftests.time.elapsed_time.max

1000 +--------------------------------------------------------------------+
900 |-OOOO OO OOOOOO OOO OOO O O OOOOOO OO |
| |
800 |-+ |
700 |-+ |
|++ ++ +++ ++ + + + + +++++++++++++++++++ ++++++ + ++ + +++ + ++ |
600 |:: :: ::: :: : : : : : : :: : : :: : ::: : :: |
500 |:: :: ::: :: : : : : : : :: : : :: : ::: : :: |
400 |:: :: ::: :: : : : : : : :: : : :: : ::: : :: |
|:: :: ::: :: : : : : : : :: : : :: : ::: : :: |
300 |-:: : ::: :: : : ::: : : : : : : : :: : : ::|
200 |-:: : ::: :: : : ::: : : : : : : : :: : : ::|
| :: : ::: :: : : ::: : : : : : : : :: : : ::|
100 |-:: : ::: :: : : ::: : : : : : : : :: : : ::|
0 +--------------------------------------------------------------------+


kernel-selftests.time.voluntary_context_switches

160000 +------------------------------------------------------------------+
| O O |
140000 |-OOOO OO OO O O O O O OO O O |
120000 |-+ O O O O O O O O O |
| |
100000 |-+ |
| |
80000 |-+ |
|++ ++ + +++ +++ + + +++++++++++++++++++ +++++++ ++ + +++ + ++ |
60000 |:: :: : ::: ::: +: : : : :: :: : +: ::: : :: |
40000 |:: :: : ::: ::: :: : : : :: :: : :: ::: : :: |
|::: : ::::::::: ::::::: ::: :: : ::::: ::::::|
20000 |-:: : ::: : :: :: ::: :: :: : :: : : : ::|
| :: : ::: : :: :: ::: :: :: : :: : : : ::|
0 +------------------------------------------------------------------+


kernel-selftests.dma.dma_map_benchmark.average_map_latency

1.4 +---------------------------------------------------------------------+
| OOO O O OOOOOOOOOO O OOO OOO O OO OOO O |
1.2 |-+ |
| |
1 |-+ |
| |
0.8 |-+ |
|++ + +++ ++ +++ + + ++++++++++++++++++++ ++++++ + ++ ++ ++++ ++ |
0.6 |:: : ::: :: ::: : : : : :: : : :: :: :::: :: |
|:: : ::: :: ::: : : : : :: : : :: :: :::: :: |
0.4 |:: : ::: :: ::: : : : : :: : : :: :: :::: :: |
| :: : :: : : :: :: : : :: : : : : :: : : ::|
0.2 |-:: : :: : : :: :: : : :: : : : : :: : : ::|
| :: : :: : : :: :: : : :: : : : : :: : : ::|
0 +---------------------------------------------------------------------+


kernel-selftests.dma.dma_map_benchmark.average_map_latency_stddev

1.6 +---------------------------------------------------------------------+
| O OO O O |
1.4 |-+ |
1.2 |-+ + + + ++ +++ ++ +++ + +++ + + |
| : + + : :+ +++ + :: : +: : : + : : + + + +: : + |
1 |+: : :+: :: ::: : :+++ + + ++++ +: +: + : ++ :: +:+: + |
|:: : ::: :: ::: + : : + + : :: : : :: :: :::: :+ |
0.8 |:: : ::: :: ::: : : : : :: : : :: :: :::: :: |
|:: : ::: :: ::: : : : : :: : : :: :: :::: :: |
0.6 |::: : : :: : ::: ::: :: :: : :: :: :::: : ::: |
0.4 |-:: : :: : : :: :::: : :: : : : : :: : : ::|
| :: : :: : : :: :: : : :: : : : : :: : : ::|
0.2 |-:: : :: : : :: :: : : :: : : : : :: : : ::|
| :: : :: : : :: :: : : :: : : : : :: : : ::|
0 +---------------------------------------------------------------------+


kernel-selftests.dma.dma_map_benchmark.average_unmap_latency

1 +---------------------------------------------------------------------+
| |
| |
0.8 |-+ |
| |
| |
0.6 |-+ + + + ++ + ++++ + + + +++ ++ + + |
|+: : ++: :+ +++ + + :++++++ + ++++ + +: +: ++ + ++ :: +:+: ++ |
0.4 |:: : ::: :: ::: : : : : :: : : :: :: :::: :: |
|:: : ::: :: ::: : : : : :: : : :: :: :::: :: |
|::: :: :: ::: ::: : : : ::: : : :: :::: : ::: |
0.2 |-:: : :: : : :: :: : : :: : : : : :: : : ::|
| :: : :: : : :: :: : : :: : : : : :: : : ::|
| :: : :: : : :: :: : : :: : : : : :: : : ::|
0 +---------------------------------------------------------------------+


kernel-selftests.dma.dma_map_benchmark.average_unmap_latency_stddev

1.6 +---------------------------------------------------------------------+
| O O O OO O OO |
1.4 |-OOO O OOOOOOOOO O O O O OO O O |
1.2 |-+ O |
| + ++ + + + + + + |
1 |-+ + : + :: +++ : :+ ++ +:+ +++ + + ++ + + ++ + + |
|+: : :+: :: ::: + : :+++++ + ++++ +: +: + : ++ :: +:+: ++ |
0.8 |:: : ::: :: ::: : : : : :: : : :: :: :::: :: |
|:: : ::: :: ::: : : : : :: : : :: :: :::: :: |
0.6 |:: : ::: :::::: : ::: : :: : : :: :: :::: :: |
0.4 |::: : :: : : :: :::: : :: : : : : ::: : ::::|
| :: : :: : : :: :: : : :: : : : : :: : : ::|
0.2 |-:: : :: : : :: :: : : :: : : : : :: : : ::|
| :: : :: : : :: :: : : :: : : : : :: : : ::|
0 +---------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Rong Chen


Attachments:
(No filename) (18.60 kB)
config-5.13.0-rc1-00129-g0263428b26d2 (178.12 kB)
job-script (7.28 kB)
job.yaml (6.34 kB)
reproduce (858.00 B)
Download all attachments