2024-02-29 13:50:54

by Oliver Sang

[permalink] [raw]
Subject: [bristot:dl_server_v6] [sched/fair] 0e52e2609c: WARNING:at_kernel/sched/deadline.c:#task_contending



Hello,

kernel test robot noticed "WARNING:at_kernel/sched/deadline.c:#task_contending" on:

commit: 0e52e2609c893e6e423dfcdbeabe05e5ecdbeb8d ("sched/fair: Fair server interface")
git://git.kernel.org/cgit/linux/kernel/git/bristot/linux dl_server_v6

in testcase: kernel-selftests
version: kernel-selftests-x86_64-d2f1c3c9-1_20240223
with following parameters:

group: cgroup



compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)


the issue doesn't always happen (6 times out of 30 runs as below), we also
noticed there is another
"dmesg.WARNING:at_kernel/sched/deadline.c:#task_non_contending"
happen on both this commit and its parent. just FYI.


8384c98fed8762cf 0e52e2609c893e6e423dfcdbeab
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:30 20% 6:30 dmesg.WARNING:at_kernel/sched/deadline.c:#task_contending
2:30 13% 6:30 dmesg.WARNING:at_kernel/sched/deadline.c:#task_non_contending



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


[ 394.923885][ T0] ------------[ cut here ]------------
[ 394.923908][ T0]
[ 394.923910][ T0] ======================================================
[ 394.923911][ T0] WARNING: possible circular locking dependency detected
[ 394.923913][ T0] 6.8.0-rc4-00017-g0e52e2609c89 #1 Not tainted
[ 394.923916][ T0] ------------------------------------------------------
[ 394.923917][ T0] swapper/2/0 is trying to acquire lock:
[ 394.923919][ T0] ffffffff8514af18 ((console_sem).lock){-...}-{2:2}, at: down_trylock (kernel/locking/semaphore.c:140)
[ 394.923949][ T0]
[ 394.923949][ T0] but task is already holding lock:
[ 394.923950][ T0] ffff8888073450d8 (&rq->__lock){-.-.}-{2:2}, at: sched_ttwu_pending (kernel/sched/sched.h:1623 kernel/sched/sched.h:1690 kernel/sched/core.c:3887)
[ 394.923957][ T0]
[ 394.923957][ T0] which lock already depends on the new lock.
[ 394.923957][ T0]
[ 394.923958][ T0]
[ 394.923958][ T0] the existing dependency chain (in reverse order) is:
[ 394.923958][ T0]
[ 394.923958][ T0] -> #2 (&rq->__lock){-.-.}-{2:2}:
[ 394.923961][ T0] __lock_acquire (kernel/locking/lockdep.c:5137)
[ 394.923966][ T0] lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
[ 394.923969][ T0] _raw_spin_lock_nested (kernel/locking/spinlock.c:379)
[ 394.923971][ T0] raw_spin_rq_lock_nested (arch/x86/include/asm/preempt.h:84 kernel/sched/core.c:561)
[ 394.923974][ T0] task_fork_fair (kernel/sched/sched.h:1623 kernel/sched/sched.h:1706 kernel/sched/fair.c:12681)
[ 394.923977][ T0] sched_cgroup_fork (kernel/sched/core.c:4837)
[ 394.923979][ T0] copy_process (include/linux/timekeeping.h:155 kernel/fork.c:2614)
[ 394.923982][ T0] kernel_clone (include/linux/random.h:26 kernel/fork.c:2903)
[ 394.923984][ T0] user_mode_thread (kernel/fork.c:2971)
[ 394.923985][ T0] rest_init (init/main.c:695)
[ 394.923990][ T0] arch_call_rest_init+0x13/0x30
[ 394.923993][ T0] start_kernel (init/main.c:996 (discriminator 1))
[ 394.923995][ T0] x86_64_start_reservations (arch/x86/kernel/head64.c:543)
[ 394.923998][ T0] x86_64_start_kernel (arch/x86/kernel/head64.c:485 (discriminator 17))
[ 394.924001][ T0] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:461)
[ 394.924005][ T0]
[ 394.924005][ T0] -> #1 (&p->pi_lock){-.-.}-{2:2}:
[ 394.924007][ T0] __lock_acquire (kernel/locking/lockdep.c:5137)
[ 394.924010][ T0] lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
[ 394.924012][ T0] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 394.924014][ T0] try_to_wake_up (kernel/sched/core.c:4253)
[ 394.924017][ T0] up (kernel/locking/semaphore.c:191)
[ 394.924019][ T0] console_unlock (kernel/printk/printk.c:341 kernel/printk/printk.c:2706 kernel/printk/printk.c:3038)
[ 394.924021][ T0] vprintk_emit (kernel/printk/printk.c:2303)
[ 394.924023][ T0] _printk (kernel/printk/printk.c:2323)
[ 394.924026][ T0] 0xffffffffc028e0e1
[ 394.924031][ T0] do_one_initcall (init/main.c:1236)
[ 394.924034][ T0] do_init_module (kernel/module/main.c:2533)
[ 394.924038][ T0] load_module (kernel/module/main.c:2984)
[ 394.924040][ T0] init_module_from_file (kernel/module/main.c:3151)
[ 394.924042][ T0] idempotent_init_module (kernel/module/main.c:3168)
[ 394.924044][ T0] __x64_sys_finit_module (include/linux/file.h:45 kernel/module/main.c:3190 kernel/module/main.c:3172 kernel/module/main.c:3172)
[ 394.924046][ T0] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
[ 394.924048][ T0] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
[ 394.924051][ T0]
[ 394.924051][ T0] -> #0 ((console_sem).lock){-...}-{2:2}:
[ 394.924054][ T0] check_prev_add (kernel/locking/lockdep.c:3135)
[ 394.924056][ T0] validate_chain (kernel/locking/lockdep.c:3254 kernel/locking/lockdep.c:3869)
[ 394.924059][ T0] __lock_acquire (kernel/locking/lockdep.c:5137)
[ 394.924061][ T0] lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
[ 394.924063][ T0] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 394.924065][ T0] down_trylock (kernel/locking/semaphore.c:140)
[ 394.924067][ T0] __down_trylock_console_sem (kernel/printk/printk.c:323)
[ 394.924070][ T0] console_trylock (kernel/printk/printk.c:2659)
[ 394.924073][ T0] console_trylock_spinning (kernel/printk/printk.c:1923)
[ 394.924076][ T0] vprintk_emit (kernel/printk/printk.c:2302)
[ 394.924077][ T0] _printk (kernel/printk/printk.c:2323)
[ 394.924079][ T0] __warn_printk (kernel/panic.c:723)
[ 394.924081][ T0] task_contending (kernel/sched/deadline.c:254 kernel/sched/deadline.c:314 kernel/sched/deadline.c:511)
[ 394.924085][ T0] enqueue_dl_entity (kernel/sched/deadline.c:75 kernel/sched/deadline.c:1061 kernel/sched/deadline.c:1921)
[ 394.924087][ T0] enqueue_task_fair (kernel/sched/fair.c:6730)
[ 394.924089][ T0] activate_task (kernel/sched/core.c:2150)
[ 394.924091][ T0] ttwu_do_activate (kernel/sched/core.c:3796)
[ 394.924094][ T0] sched_ttwu_pending (kernel/sched/core.c:3890 (discriminator 2))
[ 394.924096][ T0] __flush_smp_call_function_queue (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/csd.h:64 kernel/smp.c:134 kernel/smp.c:558)
[ 394.924101][ T0] flush_smp_call_function_queue (kernel/smp.c:587 (discriminator 1))
[ 394.924104][ T0] do_idle (kernel/sched/idle.c:340)
[ 394.924106][ T0] cpu_startup_entry (kernel/sched/idle.c:409 (discriminator 1))
[ 394.924109][ T0] start_secondary (arch/x86/kernel/smpboot.c:224 arch/x86/kernel/smpboot.c:304)
[ 394.924111][ T0] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:461)
[ 394.924114][ T0]
[ 394.924114][ T0] other info that might help us debug this:
[ 394.924114][ T0]
[ 394.924114][ T0] Chain exists of:
[ 394.924114][ T0] (console_sem).lock --> &p->pi_lock --> &rq->__lock
[ 394.924114][ T0]
[ 394.924118][ T0] Possible unsafe locking scenario:
[ 394.924118][ T0]
[ 394.924118][ T0] CPU0 CPU1
[ 394.924119][ T0] ---- ----
[ 394.924119][ T0] lock(&rq->__lock);
[ 394.924120][ T0] lock(&p->pi_lock);
[ 394.924122][ T0] lock(&rq->__lock);
[ 394.924123][ T0] lock((console_sem).lock);
[ 394.924124][ T0]
[ 394.924124][ T0] *** DEADLOCK ***
[ 394.924124][ T0]
[ 394.924125][ T0] 1 lock held by swapper/2/0:
[ 394.924126][ T0] #0: ffff8888073450d8 (&rq->__lock){-.-.}-{2:2}, at: sched_ttwu_pending (kernel/sched/sched.h:1623 kernel/sched/sched.h:1690 kernel/sched/core.c:3887)
[ 394.924131][ T0]
[ 394.924131][ T0] stack backtrace:
[ 394.924133][ T0] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.8.0-rc4-00017-g0e52e2609c89 #1
[ 394.924135][ T0] Hardware name: Gigabyte Technology Co., Ltd. X299 UD4 Pro/X299 UD4 Pro-CF, BIOS F8a 04/27/2021
[ 394.924137][ T0] Call Trace:
[ 394.924138][ T0] <TASK>
[ 394.924140][ T0] dump_stack_lvl (lib/dump_stack.c:108)
[ 394.924146][ T0] check_noncircular (kernel/locking/lockdep.c:2187)
[ 394.924150][ T0] ? print_circular_bug (kernel/locking/lockdep.c:2163)
[ 394.924153][ T0] ? data_push_tail (arch/x86/include/asm/atomic64_64.h:15 include/linux/atomic/atomic-arch-fallback.h:2569 include/linux/atomic/atomic-long.h:38 include/linux/atomic/atomic-instrumented.h:3161 kernel/printk/printk_ringbuffer.c:661)
[ 394.924160][ T0] check_prev_add (kernel/locking/lockdep.c:3135)
[ 394.924163][ T0] ? print_bfs_bug (kernel/locking/lockdep.c:140)
[ 394.924167][ T0] validate_chain (kernel/locking/lockdep.c:3254 kernel/locking/lockdep.c:3869)
[ 394.924172][ T0] ? check_prev_add (kernel/locking/lockdep.c:3825)
[ 394.924175][ T0] ? mark_lock (arch/x86/include/asm/bitops.h:227 (discriminator 3) arch/x86/include/asm/bitops.h:239 (discriminator 3) include/asm-generic/bitops/instrumented-non-atomic.h:142 (discriminator 3) kernel/locking/lockdep.c:228 (discriminator 3) kernel/locking/lockdep.c:4656 (discriminator 3))
[ 394.924179][ T0] __lock_acquire (kernel/locking/lockdep.c:5137)
[ 394.924183][ T0] ? prb_final_commit (arch/x86/include/asm/atomic64_64.h:20 include/linux/atomic/atomic-arch-fallback.h:2615 include/linux/atomic/atomic-long.h:79 include/linux/atomic/atomic-instrumented.h:3196 kernel/printk/printk_ringbuffer.c:1671)
[ 394.924187][ T0] lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
[ 394.924189][ T0] ? down_trylock (kernel/locking/semaphore.c:140)
[ 394.924193][ T0] ? lock_sync (kernel/locking/lockdep.c:5722)
[ 394.924196][ T0] ? secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:461)
[ 394.924200][ T0] ? validate_chain (kernel/locking/lockdep.c:3836)
[ 394.924204][ T0] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 394.924206][ T0] ? down_trylock (kernel/locking/semaphore.c:140)
[ 394.924209][ T0] down_trylock (kernel/locking/semaphore.c:140)
[ 394.924212][ T0] ? console_trylock_spinning (kernel/printk/printk.c:1923)
[ 394.924215][ T0] __down_trylock_console_sem (kernel/printk/printk.c:323)
[ 394.924218][ T0] console_trylock (kernel/printk/printk.c:2659)
[ 394.924221][ T0] console_trylock_spinning (kernel/printk/printk.c:1923)
[ 394.924225][ T0] vprintk_emit (kernel/printk/printk.c:2302)
[ 394.924227][ T0] ? wake_up_klogd_work_func (kernel/printk/printk.c:2265)
[ 394.924230][ T0] ? validate_chain (arch/x86/include/asm/bitops.h:227 arch/x86/include/asm/bitops.h:239 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/locking/lockdep.c:228 kernel/locking/lockdep.c:3781 kernel/locking/lockdep.c:3837)
[ 394.924233][ T0] _printk (kernel/printk/printk.c:2323)
[ 394.924236][ T0] ? printk_timed_ratelimit (kernel/printk/printk.c:2323)
[ 394.924240][ T0] ? validate_chain (arch/x86/include/asm/bitops.h:227 arch/x86/include/asm/bitops.h:239 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/locking/lockdep.c:228 kernel/locking/lockdep.c:3781 kernel/locking/lockdep.c:3837)
[ 394.924245][ T0] __warn_printk (kernel/panic.c:723)
[ 394.924247][ T0] ? warn_count_show (kernel/panic.c:717)
[ 394.924253][ T0] task_contending (kernel/sched/deadline.c:254 kernel/sched/deadline.c:314 kernel/sched/deadline.c:511)
[ 394.924257][ T0] enqueue_dl_entity (kernel/sched/deadline.c:75 kernel/sched/deadline.c:1061 kernel/sched/deadline.c:1921)
[ 394.924259][ T0] ? lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5756 kernel/locking/lockdep.c:5719)
[ 394.924263][ T0] enqueue_task_fair (kernel/sched/fair.c:6730)
[ 394.924266][ T0] activate_task (kernel/sched/core.c:2150)
[ 394.924271][ T0] ttwu_do_activate (kernel/sched/core.c:3796)
[ 394.924274][ T0] ? update_rq_clock_task (kernel/sched/sched.h:1382 kernel/sched/pelt.h:66 kernel/sched/pelt.h:80 kernel/sched/pelt.h:98 kernel/sched/core.c:748)
[ 394.924277][ T0] sched_ttwu_pending (kernel/sched/core.c:3890 (discriminator 2))
[ 394.924281][ T0] ? ttwu_do_activate (kernel/sched/core.c:3878)
[ 394.924284][ T0] ? tick_nohz_restart_sched_tick (include/linux/hrtimer.h:395 kernel/time/tick-sched.c:784 kernel/time/tick-sched.c:995)
[ 394.924287][ T0] ? do_raw_spin_unlock (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/atomic/atomic-instrumented.h:33 include/asm-generic/qspinlock.h:57 kernel/locking/spinlock_debug.c:101 kernel/locking/spinlock_debug.c:141)
[ 394.924290][ T0] __flush_smp_call_function_queue (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/csd.h:64 kernel/smp.c:134 kernel/smp.c:558)
[ 394.924295][ T0] flush_smp_call_function_queue (kernel/smp.c:587 (discriminator 1))
[ 394.924298][ T0] do_idle (kernel/sched/idle.c:340)
[ 394.924302][ T0] cpu_startup_entry (kernel/sched/idle.c:409 (discriminator 1))
[ 394.924305][ T0] start_secondary (arch/x86/kernel/smpboot.c:224 arch/x86/kernel/smpboot.c:304)
[ 394.924307][ T0] ? set_cpu_sibling_map (arch/x86/kernel/smpboot.c:254)
[ 394.924310][ T0] ? soft_restart_cpu (arch/x86/kernel/head_64.S:498)
[ 394.924312][ T0] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:461)
[ 394.924318][ T0] </TASK>
[ 394.998201][ T0] dl_rq->running_bw > dl_rq->this_bw
[ 394.998207][ T0] WARNING: CPU: 2 PID: 0 at kernel/sched/deadline.c:254 task_contending+0x6bf/0x820
[ 394.999605][ T0] Modules linked in: netconsole openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 intel_rapl_msr intel_rapl_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp btrfs blake2b_generic xor coretemp kvm_intel raid6_pq kvm zstd_compress libcrc32c irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 rapl intel_cstate nvme nvme_core t10_pi crc64_rocksoft_generic ahci mei_me libahci ipmi_devintf crc64_rocksoft ipmi_msghandler wmi_bmof intel_wmi_thunderbolt mxm_wmi intel_uncore wdat_wdt crc64 libata i2c_i801 mei i2c_smbus ioatdma dca wmi binfmt_misc loop fuse drm dm_mod ip_tables
[ 395.004574][ T0] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.8.0-rc4-00017-g0e52e2609c89 #1
[ 395.005380][ T0] Hardware name: Gigabyte Technology Co., Ltd. X299 UD4 Pro/X299 UD4 Pro-CF, BIOS F8a 04/27/2021
[ 395.006325][ T0] RIP: 0010:task_contending+0x6bf/0x820
[ 395.006858][ T0] Code: fc ff ff 0f 0b e9 c7 fc ff ff 80 3d 91 c3 84 04 00 0f 85 c2 fb ff ff 48 c7 c7 a0 c2 09 84 c6 05 7d c3 84 04 01 e8 71 7a ec ff <0f> 0b e9 a8 fb ff ff 80 3d 6b c3 84 04 00 0f 85 6c fb ff ff 48 c7
[ 395.008559][ T0] RSP: 0000:ffffc900001f7bf8 EFLAGS: 00010086
[ 395.009136][ T0] RAX: 0000000000000000 RBX: ffff888807345b10 RCX: 0000000000000027
[ 395.009876][ T0] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffff888807330c08
[ 395.010618][ T0] RBP: ffff8888073450c0 R08: ffffffff816cfe56 R09: ffffed1100e66181
[ 395.011357][ T0] R10: ffff888807330c0b R11: 0000000000000001 R12: ffffffff85bf7158
[ 395.012097][ T0] R13: 0000000000019998 R14: ffff888807345ae8 R15: 000000000000cccc
[ 395.012839][ T0] FS: 0000000000000000(0000) GS:ffff888807300000(0000) knlGS:0000000000000000
[ 395.013656][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 395.014270][ T0] CR2: 00007fa2265fff78 CR3: 000000089be7a003 CR4: 00000000003706f0
[ 395.015008][ T0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 395.015748][ T0] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 395.017413][ T0] Call Trace:
[ 395.017764][ T0] <TASK>
[ 395.018083][ T0] ? __warn+0xcd/0x2b0
[ 395.018500][ T0] ? task_contending+0x6bf/0x820
[ 395.018983][ T0] ? report_bug+0x267/0x2d0
[ 395.019431][ T0] ? handle_bug+0x3c/0x70
[ 395.019868][ T0] ? exc_invalid_op+0x17/0x40
[ 395.020331][ T0] ? asm_exc_invalid_op+0x1a/0x20
[ 395.020820][ T0] ? irq_work_claim+0x16/0x60
[ 395.021289][ T0] ? task_contending+0x6bf/0x820
[ 395.021789][ T0] ? task_contending+0x6bf/0x820
[ 395.022272][ T0] enqueue_dl_entity+0xa57/0x2430
[ 395.022764][ T0] ? lock_acquire+0x193/0x4b0
[ 395.023238][ T0] enqueue_task_fair+0x919/0xe70
[ 395.023719][ T0] activate_task+0xf7/0x2f0
[ 395.024182][ T0] ttwu_do_activate+0x11b/0x8c0
[ 395.024657][ T0] ? update_rq_clock_task+0x699/0x850
[ 395.025173][ T0] sched_ttwu_pending+0x27e/0x500
[ 395.025662][ T0] ? ttwu_do_activate+0x8c0/0x8c0
[ 395.026168][ T0] ? tick_nohz_restart_sched_tick+0x1b1/0x230
[ 395.026754][ T0] ? do_raw_spin_unlock+0x58/0x1f0
[ 395.027256][ T0] __flush_smp_call_function_queue+0x44b/0xb10
[ 395.027842][ T0] flush_smp_call_function_queue+0x68/0xa0
[ 395.028397][ T0] do_idle+0x118/0x1a0
[ 395.028806][ T0] cpu_startup_entry+0x55/0x60
[ 395.029271][ T0] start_secondary+0x209/0x290
[ 395.029738][ T0] ? set_cpu_sibling_map+0x2320/0x2320
[ 395.030279][ T0] ? soft_restart_cpu+0x15/0x15
[ 395.030754][ T0] secondary_startup_64_no_verify+0x184/0x18b
[ 395.031339][ T0] </TASK>
[ 395.031665][ T0] irq event stamp: 1494506
[ 395.032104][ T0] hardirqs last enabled at (1494505): [<ffffffff814cd657>] tick_nohz_idle_exit+0xc7/0x190
[ 395.033005][ T0] hardirqs last disabled at (1494506): [<ffffffff814e1c6e>] flush_smp_call_function_queue+0x5e/0xa0
[ 395.033988][ T0] softirqs last enabled at (1494464): [<ffffffff83c6c07a>] __do_softirq+0x53a/0x80e
[ 395.034851][ T0] softirqs last disabled at (1494447): [<ffffffff8122b4c2>] __irq_exit_rcu+0x132/0x180
[ 395.035735][ T0] ---[ end trace 0000000000000000 ]---


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240229/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki