2018-07-13 13:44:39

by Waiman Long

[permalink] [raw]
Subject: [PATCH] debugobjects: Disable lockdep tracking of debugobjects internal locks

It was discovered that running the ltp openposix_testsuite sigqueue-09-1
test on a certain 8-sock IvyBridge system caused it to have a hard lockup
with a full debug kernel.

[89981.861500] NMI watchdog: Watchdog detected hard LOCKUP on cpu 17
:
[89981.939812] irq event stamp: 1166122
[89981.943799] hardirqs last enabled at (1166121): [<ffffffffb1048342>] kprobe_ftrace_handler+0x52/0x170
[89981.954215] hardirqs last disabled at (1166122): [<ffffffffb08a3725>] tasklist_write_lock_irq+0x15/0x50
:
[89982.103134] [<ffffffffb093a399>] lock_acquire+0x99/0x1e0
[89982.109163] [<ffffffffb0c0d19b>] ? debug_check_no_obj_freed+0xfb/0x270
[89982.116562] [<ffffffffb1042e1e>] _raw_spin_lock_irqsave+0x5e/0xa0
[89982.123470] [<ffffffffb0c0d19b>] ? debug_check_no_obj_freed+0xfb/0x270
[89982.130851] [<ffffffffb0c0d19b>] debug_check_no_obj_freed+0xfb/0x270
[89982.138045] [<ffffffffb08bf9f3>] ? __sigqueue_free.part.11+0x33/0x40
[89982.145239] [<ffffffffb0a6868a>] kmem_cache_free+0xca/0x390
[89982.151553] [<ffffffffb08bf9f3>] __sigqueue_free.part.11+0x33/0x40
[89982.158552] [<ffffffffb08c07b0>] flush_sigqueue+0x50/0x60
[89982.164673] [<ffffffffb08ad102>] release_task+0x3e2/0x6d0
:

IRQ was disabled by the tasklist_write_lock_irq() call in
release_task(). The lockup is probably caused by the debug code running
for too long. We certainly want the debug code to verify the correctness
of the production code. However, there may not have too much value for
one piece of the debug code to verify the correctness of another piece
of debug code. In this particular case, the lockdep code is verifying
the correctness of the raw debug bucket lock within the debugobjects
code.

The use of spin locks in the debugobjects code for synchronization is
pretty standard and looks to be correct to me. So it is probably not
worth the effort to verify lock usage within the debugobjects code.

This patch disables the checking of the debugobjects internal locks by
lockdep. In fact, with this change, the hard lockup was not observed
anymore.

Signed-off-by: Waiman Long <[email protected]>
---
lib/debugobjects.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 994be48..592d2ba 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -1103,8 +1103,15 @@ void __init debug_objects_early_init(void)
{
int i;

- for (i = 0; i < ODEBUG_HASH_SIZE; i++)
+ /*
+ * We don't need lockdep to verify correctness of debugobjects
+ * internal locks.
+ */
+ lockdep_set_novalidate_class(&pool_lock);
+ for (i = 0; i < ODEBUG_HASH_SIZE; i++) {
raw_spin_lock_init(&obj_hash[i].lock);
+ lockdep_set_novalidate_class(&obj_hash[i].lock);
+ }

for (i = 0; i < ODEBUG_POOL_SIZE; i++)
hlist_add_head(&obj_static_pool[i].node, &obj_pool);
--
1.8.3.1



2018-07-16 02:29:11

by kernel test robot

[permalink] [raw]
Subject: [lkp-robot] [debugobjects] 3f96d20faf: WARNING:at_kernel/locking/lockdep.c:#register_lock_class


FYI, we noticed the following commit (built with gcc-4.9):

commit: 3f96d20fafb19e6dd869362ace53662b06e6f6c1 ("[PATCH] debugobjects: Disable lockdep tracking of debugobjects internal locks")
url: https://github.com/0day-ci/linux/commits/Waiman-Long/debugobjects-Disable-lockdep-tracking-of-debugobjects-internal-locks/20180714-070310


in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-i386 -enable-kvm -m 256M

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+----------------------------------------------------------+------------+------------+
| | 3951dbf232 | 3f96d20faf |
+----------------------------------------------------------+------------+------------+
| boot_successes | 0 | 0 |
| boot_failures | 15 | 8 |
| WARNING:at_mm/page_alloc.c:#__alloc_pages_nodemask | 15 | 8 |
| EIP:__alloc_pages_nodemask | 15 | 8 |
| Mem-Info | 15 | 8 |
| WARNING:at_kernel/locking/lockdep.c:#register_lock_class | 0 | 8 |
| EIP:register_lock_class | 0 | 8 |
+----------------------------------------------------------+------------+------------+



[ 0.003333] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:704 register_lock_class+0x3f0/0x4e0
[ 0.003333] Modules linked in:
[ 0.003333] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18.0-rc4-00144-g3f96d20 #1
[ 0.003333] EIP: register_lock_class+0x3f0/0x4e0
[ 0.003333] Code: 25 ff ff ff 68 a3 14 a7 81 68 6b fb a4 81 e8 47 48 fc ff 0f 0b 8b 15 4c a3 45 82 58 59 e9 0d ff ff ff 90 8d b4 26 00 00 00 00 <0f> 0b 8b 15 4c a3 45 82 e9 f8 fe ff ff 8d 76 00 8d 54 24 18 89 f8
[ 0.003333] EAX: 81a8242d EBX: 8219b130 ECX: 81d920c0 EDX: 00000000
[ 0.003333] ESI: 00000000 EDI: 81bb4530 EBP: 00000000 ESP: 81b61e38
[ 0.003333] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00210012
[ 0.003333] CR0: 80050033 CR2: ffbff000 CR3: 01d86000 CR4: 000006b0
[ 0.003333] Call Trace:
[ 0.003333] ? __lock_acquire+0x65/0x1a90
[ 0.003333] ? pcpu_alloc+0x495/0x810
[ 0.003333] ? lock_release+0x126/0x290
[ 0.003333] ? lock_acquire+0xa2/0x190
[ 0.003333] ? __debug_object_init+0x255/0x450
[ 0.003333] ? _raw_spin_lock+0x1f/0x30
[ 0.003333] ? __debug_object_init+0x255/0x450
[ 0.003333] ? __debug_object_init+0x255/0x450
[ 0.003333] ? debug_mutex_init+0x21/0x30
[ 0.003333] ? alloc_desc+0x83/0x120
[ 0.003333] ? early_irq_init+0x14d/0x18b
[ 0.003333] ? start_kernel+0x3fa/0x60c
[ 0.003333] ? startup_32_smp+0x164/0x170
[ 0.003333] irq event stamp: 0
[ 0.003333] hardirqs last enabled at (0): [<00000000>] (null)
[ 0.003333] hardirqs last disabled at (0): [<00000000>] (null)
[ 0.003333] softirqs last enabled at (0): [<00000000>] (null)
[ 0.003333] softirqs last disabled at (0): [<00000000>] (null)
[ 0.003333] random: get_random_bytes called from init_oops_id+0x37/0x40 with crng_init=0
[ 0.003333] ---[ end trace 2ecea7c3f5e2e513 ]---


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Xiaolong


Attachments:
(No filename) (3.65 kB)
config-4.18.0-rc4-00144-g3f96d20 (110.19 kB)
job-script (4.03 kB)
dmesg.xz (20.29 kB)
Download all attachments