Greeting,
FYI, we noticed the following commit (built with gcc-9):
commit: d3750a09232a9af1e8c6bb3b06a6609d921eb506 ("[RFC PATCH 13/15] KVM: x86/mmu: Split large pages during CLEAR_DIRTY_LOG")
url: https://github.com/0day-ci/linux/commits/David-Matlack/KVM-x86-mmu-Eager-Page-Splitting-for-the-TDP-MMU/20211120-080051
base: https://git.kernel.org/cgit/virt/kvm/kvm.git queue
patch link: https://lore.kernel.org/kvm/[email protected]
in testcase: kernel-selftests
version: kernel-selftests-x86_64-a21458fc-1_20211128
with the following parameters:
group: kvm
ucode: 0xe2
test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
on test machine: 8 threads Intel(R) Core(TM) i7-6770HQ CPU @ 2.60GHz with 32G memory
caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>
[ 280.691224][T10825] WARNING: possible circular locking dependency detected
[ 280.698458][T10825] 5.15.0-12443-gd3750a09232a #1 Tainted: G I
[ 280.705780][T10825] ------------------------------------------------------
[ 280.712843][T10825] dirty_log_test/10825 is trying to acquire lock:
[280.719317][T10825] ffffffff859d97c0 (fs_reclaim){+.+.}-{0:0}, at: kmem_cache_alloc (include/linux/sched/mm.h:228 mm/slab.h:492 mm/slub.c:3148 mm/slub.c:3242 mm/slub.c:3247)
[ 280.728159][T10825]
[ 280.728159][T10825] but task is already holding lock:
[280.735565][T10825] ffffc90009b61018 (&(kvm)->mmu_lock){++++}-{2:2}, at: kvm_clear_dirty_log_protect (arch/x86/kvm/../../../virt/kvm/kvm_main.c:2176)
[ 280.745919][T10825]
[ 280.745919][T10825] which lock already depends on the new lock.
[ 280.745919][T10825]
[ 280.756398][T10825]
[ 280.756398][T10825] the existing dependency chain (in reverse order) is:
[ 280.765486][T10825]
[ 280.765486][T10825] -> #2 (&(kvm)->mmu_lock){++++}-{2:2}:
[280.773296][T10825] lock_acquire (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5639 kernel/locking/lockdep.c:5602)
[280.778318][T10825] _raw_write_lock (include/linux/rwlock_api_smp.h:209 kernel/locking/spinlock.c:300)
[280.783422][T10825] kvm_mmu_notifier_invalidate_range_start (arch/x86/kvm/../../../virt/kvm/kvm_main.c:576 arch/x86/kvm/../../../virt/kvm/kvm_main.c:714)
[280.790879][T10825] __mmu_notifier_invalidate_range_start (mm/mmu_notifier.c:494 mm/mmu_notifier.c:548)
[280.798089][T10825] wp_page_copy (include/linux/mmu_notifier.h:459 mm/memory.c:3017)
[280.803304][T10825] __handle_mm_fault (mm/memory.c:4569 mm/memory.c:4686)
[280.808964][T10825] handle_mm_fault (mm/memory.c:4784)
[280.814276][T10825] do_user_addr_fault (arch/x86/mm/fault.c:1397)
[280.819805][T10825] exc_page_fault (arch/x86/include/asm/irqflags.h:29 arch/x86/include/asm/irqflags.h:70 arch/x86/include/asm/irqflags.h:132 arch/x86/mm/fault.c:1493 arch/x86/mm/fault.c:1541)
[280.824952][T10825] asm_exc_page_fault (arch/x86/include/asm/idtentry.h:568)
[ 280.830336][T10825]
[ 280.830336][T10825] -> #1 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
[280.839807][T10825] lock_acquire (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5639 kernel/locking/lockdep.c:5602)
[280.844868][T10825] fs_reclaim_acquire (mm/page_alloc.c:4552)
[280.850354][T10825] __kmalloc_node (include/linux/sched/mm.h:228 mm/slab.h:492 mm/slub.c:3148 mm/slub.c:4467)
[280.855538][T10825] alloc_cpumask_var_node (lib/cpumask.c:115)
[280.861284][T10825] native_smp_prepare_cpus (arch/x86/kernel/smpboot.c:1373)
[280.867316][T10825] kernel_init_freeable (include/linux/compiler.h:252 include/linux/init.h:124 init/main.c:1414 init/main.c:1599)
[280.873070][T10825] kernel_init (init/main.c:1501)
[280.877992][T10825] ret_from_fork (arch/x86/entry/entry_64.S:301)
[ 280.882914][T10825]
[ 280.882914][T10825] -> #0 (fs_reclaim){+.+.}-{0:0}:
[280.890200][T10825] check_prev_add (kernel/locking/lockdep.c:3064)
[280.895486][T10825] __lock_acquire (kernel/locking/lockdep.c:3187 kernel/locking/lockdep.c:3801 kernel/locking/lockdep.c:5027)
[280.900838][T10825] lock_acquire (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5639 kernel/locking/lockdep.c:5602)
[280.905873][T10825] fs_reclaim_acquire (mm/page_alloc.c:4536 mm/page_alloc.c:4549)
[280.911412][T10825] kmem_cache_alloc (include/linux/sched/mm.h:228 mm/slab.h:492 mm/slub.c:3148 mm/slub.c:3242 mm/slub.c:3247)
[280.916733][T10825] kvm_mmu_topup_memory_cache (arch/x86/kvm/../../../virt/kvm/kvm_main.c:383)
[280.922982][T10825] mmu_topup_split_caches (arch/x86/kvm/mmu/mmu.c:765)
[280.928842][T10825] kvm_mmu_try_split_large_pages (arch/x86/kvm/mmu/mmu.c:5897)
[280.935333][T10825] kvm_arch_mmu_enable_log_dirty_pt_masked (arch/x86/kvm/mmu/mmu.c:1457)
[280.942762][T10825] kvm_clear_dirty_log_protect (arch/x86/kvm/../../../virt/kvm/kvm_main.c:2193)
[280.949182][T10825] kvm_vm_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:2215 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4511)
[280.954337][T10825] __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:874 fs/ioctl.c:860 fs/ioctl.c:860)
[280.959633][T10825] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[280.964580][T10825] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
[ 280.971044][T10825]
[ 280.971044][T10825] other info that might help us debug this:
[ 280.971044][T10825]
[ 280.981447][T10825] Chain exists of:
[ 280.981447][T10825] fs_reclaim --> mmu_notifier_invalidate_range_start --> &(kvm)->mmu_lock
[ 280.981447][T10825]
[ 280.996135][T10825] Possible unsafe locking scenario:
[ 280.996135][T10825]
[ 281.003699][T10825] CPU0 CPU1
[ 281.009067][T10825] ---- ----
[ 281.014443][T10825] lock(&(kvm)->mmu_lock);
[ 281.018989][T10825] lock(mmu_notifier_invalidate_range_start);
[ 281.027803][T10825] lock(&(kvm)->mmu_lock);
[ 281.034897][T10825] lock(fs_reclaim);
[ 281.038853][T10825]
[ 281.038853][T10825] *** DEADLOCK ***
[ 281.038853][T10825]
[ 281.047128][T10825] 2 locks held by dirty_log_test/10825:
[281.052687][T10825] #0: ffffc90009b610a8 (&kvm->slots_lock){+.+.}-{3:3}, at: kvm_vm_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:2213 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4511)
[281.062371][T10825] #1: ffffc90009b61018 (&(kvm)->mmu_lock){++++}-{2:2}, at: kvm_clear_dirty_log_protect (arch/x86/kvm/../../../virt/kvm/kvm_main.c:2176)
[ 281.073672][T10825]
[ 281.073672][T10825] stack backtrace:
[ 281.079747][T10825] CPU: 5 PID: 10825 Comm: dirty_log_test Tainted: G I 5.15.0-12443-gd3750a09232a #1
[ 281.090909][T10825] Hardware name: /NUC6i7KYB, BIOS KYSKLi70.86A.0041.2016.0817.1130 08/17/2016
[ 281.099876][T10825] Call Trace:
[ 281.103142][T10825] <TASK>
[281.106036][T10825] dump_stack_lvl (lib/dump_stack.c:107)
[281.110529][T10825] check_noncircular (kernel/locking/lockdep.c:2143)
[281.115472][T10825] ? print_circular_bug+0x480/0x480
[281.121341][T10825] ? mark_lock_irq (kernel/locking/lockdep.c:4564)
[281.126269][T10825] ? is_bpf_text_address (kernel/bpf/core.c:713)
[281.131478][T10825] ? mark_lock+0xca/0x1400
[281.136540][T10825] ? mark_lock+0xca/0x1400
[281.141578][T10825] check_prev_add (kernel/locking/lockdep.c:3064)
[281.146353][T10825] __lock_acquire (kernel/locking/lockdep.c:3187 kernel/locking/lockdep.c:3801 kernel/locking/lockdep.c:5027)
[281.151214][T10825] ? lock_is_held_type (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5681)
[281.156272][T10825] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4885)
[281.162278][T10825] ? lock_is_held_type (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5681)
[281.167324][T10825] ? rcu_read_lock_sched_held (include/linux/lockdep.h:283 kernel/rcu/update.c:125)
[281.173091][T10825] lock_acquire (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5639 kernel/locking/lockdep.c:5602)
[281.177732][T10825] ? kmem_cache_alloc (include/linux/sched/mm.h:228 mm/slab.h:492 mm/slub.c:3148 mm/slub.c:3242 mm/slub.c:3247)
[281.182856][T10825] ? rcu_read_unlock (include/linux/rcupdate.h:717 (discriminator 5))
[281.187846][T10825] ? lock_is_held_type (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5681)
[281.193099][T10825] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4885)
[281.199356][T10825] ? lock_is_held_type (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5681)
[281.204642][T10825] fs_reclaim_acquire (mm/page_alloc.c:4536 mm/page_alloc.c:4549)
[281.209675][T10825] ? kmem_cache_alloc (include/linux/sched/mm.h:228 mm/slab.h:492 mm/slub.c:3148 mm/slub.c:3242 mm/slub.c:3247)
[281.214625][T10825] ? kvm_mmu_topup_memory_cache (arch/x86/kvm/../../../virt/kvm/kvm_main.c:383)
[281.220440][T10825] kmem_cache_alloc (include/linux/sched/mm.h:228 mm/slab.h:492 mm/slub.c:3148 mm/slub.c:3242 mm/slub.c:3247)
[281.225225][T10825] ? lock_acquire (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5639 kernel/locking/lockdep.c:5602)
[281.230012][T10825] kvm_mmu_topup_memory_cache (arch/x86/kvm/../../../virt/kvm/kvm_main.c:383)
[281.235876][T10825] ? lock_is_held_type (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5681)
[281.241148][T10825] mmu_topup_split_caches (arch/x86/kvm/mmu/mmu.c:765)
[281.246492][T10825] kvm_mmu_try_split_large_pages (arch/x86/kvm/mmu/mmu.c:5897)
[281.252407][T10825] kvm_arch_mmu_enable_log_dirty_pt_masked (arch/x86/kvm/mmu/mmu.c:1457)
[281.259324][T10825] kvm_clear_dirty_log_protect (arch/x86/kvm/../../../virt/kvm/kvm_main.c:2193)
[281.265165][T10825] kvm_vm_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:2215 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4511)
[281.269769][T10825] ? __lock_acquire (arch/x86/include/asm/bitops.h:214 (discriminator 9) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 9) kernel/locking/lockdep.c:199 (discriminator 9) kernel/locking/lockdep.c:5024 (discriminator 9))
[281.274716][T10825] ? kvm_unregister_device_ops (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4464)
[281.280453][T10825] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4885)
[281.286752][T10825] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4885)
[281.292869][T10825] ? lock_is_held_type (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5681)
[281.297878][T10825] ? fiemap_prep (fs/ioctl.c:778)
[281.302467][T10825] ? rcu_read_lock_sched_held (include/linux/lockdep.h:283 kernel/rcu/update.c:125)
[281.308131][T10825] ? rcu_read_lock_bh_held (kernel/rcu/update.c:120)
[281.313458][T10825] ? lock_acquire (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5639 kernel/locking/lockdep.c:5602)
[281.318131][T10825] ? find_held_lock (kernel/locking/lockdep.c:5130)
[281.322907][T10825] ? lock_release (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5659)
[281.327606][T10825] ? lock_downgrade (kernel/locking/lockdep.c:5645)
[281.332467][T10825] ? rcu_read_lock_sched_held (kernel/rcu/update.c:306)
[281.338192][T10825] ? __fget_files (fs/file.c:865)
[281.342877][T10825] __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:874 fs/ioctl.c:860 fs/ioctl.c:860)
[281.347634][T10825] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[281.352016][T10825] ? irqentry_exit_to_user_mode (kernel/entry/common.c:127 kernel/entry/common.c:315)
[281.357667][T10825] ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:568)
[281.362599][T10825] ? rcu_read_lock_sched_held (include/linux/lockdep.h:283 kernel/rcu/update.c:125)
[281.368259][T10825] ? rcu_read_lock_bh_held (kernel/rcu/update.c:120)
[281.373557][T10825] ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:568)
[281.378614][T10825] ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:568)
[281.383562][T10825] ? lockdep_hardirqs_on (kernel/locking/lockdep.c:4356)
[281.388772][T10825] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
[ 281.394893][T10825] RIP: 0033:0x7f9d40646427
[ 281.399484][T10825] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
All code
========
0: 00 00 add %al,(%rax)
2: 90 nop
3: 48 8b 05 69 aa 0c 00 mov 0xcaa69(%rip),%rax # 0xcaa73
a: 64 c7 00 26 00 00 00 movl $0x26,%fs:(%rax)
11: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
18: c3 retq
19: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
20: 00 00 00
23: b8 10 00 00 00 mov $0x10,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 retq
33: 48 8b 0d 39 aa 0c 00 mov 0xcaa39(%rip),%rcx # 0xcaa73
3a: f7 d8 neg %eax
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp dirs to run from a clean state.
---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected]       Intel Corporation
Thanks,
Oliver Sang
On 12/5/21 14:30, kernel test robot wrote:
>
> Chain exists of:
> fs_reclaim --> mmu_notifier_invalidate_range_start --> &(kvm)->mmu_lock
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&(kvm)->mmu_lock);
> lock(mmu_notifier_invalidate_range_start);
> lock(&(kvm)->mmu_lock);
> lock(fs_reclaim);
>
David, this is yours; basically, kvm_mmu_topup_memory_cache must be
called outside the mmu_lock.
Paolo
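
For readers following along, the "top up outside the lock" rule can be sketched in userspace C. This is only an illustration of the pattern, not the KVM code: struct obj_cache, cache_topup(), cache_pop(), cache_free(), do_work() and demo_lock are hypothetical names, a pthread mutex stands in for mmu_lock, and malloc() stands in for an allocation that may enter reclaim.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define CACHE_CAPACITY 8

/* Hypothetical stand-in for a pre-allocated object cache (not the KVM struct). */
struct obj_cache {
	int nobjs;
	void *objs[CACHE_CAPACITY];
};

/* Hypothetical stand-in for mmu_lock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Fill the cache until it holds at least 'min' objects. malloc() plays the
 * role of an allocation that may block, so this must be called with
 * demo_lock NOT held. Returns 0 on success, -1 on failure.
 */
static int cache_topup(struct obj_cache *c, int min)
{
	if (min > CACHE_CAPACITY)
		return -1;
	while (c->nobjs < min) {
		void *p = malloc(64);

		if (!p)
			return -1;
		c->objs[c->nobjs++] = p;
	}
	return 0;
}

/* Take one pre-allocated object. Never allocates, so it is safe under the lock. */
static void *cache_pop(struct obj_cache *c)
{
	assert(c->nobjs > 0);
	return c->objs[--c->nobjs];
}

/* Release any objects still sitting in the cache. */
static void cache_free(struct obj_cache *c)
{
	while (c->nobjs > 0)
		free(c->objs[--c->nobjs]);
}

/*
 * Consume 'needed' objects: top up first, then take the lock and only pop.
 * Returns the number of objects consumed, or -1 if the top-up failed.
 */
static int do_work(struct obj_cache *c, int needed)
{
	int used = 0;

	if (cache_topup(c, needed))	/* may block: lock not held yet */
		return -1;

	pthread_mutex_lock(&demo_lock);
	for (int i = 0; i < needed; i++) {
		void *p = cache_pop(c);	/* no allocation under the lock */

		free(p);		/* a real user would install the object */
		used++;
	}
	pthread_mutex_unlock(&demo_lock);
	return used;
}
```

The point of the split is that the only call that can block, cache_topup(), runs before the lock is taken, so the lock is never held across an allocation.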
On Sun, Dec 5, 2021 at 10:55 PM Paolo Bonzini <[email protected]> wrote:
>
> On 12/5/21 14:30, kernel test robot wrote:
> >
> > Chain exists of:
> > fs_reclaim --> mmu_notifier_invalidate_range_start --> &(kvm)->mmu_lock
> >
> > Possible unsafe locking scenario:
> >
> > CPU0 CPU1
> > ---- ----
> > lock(&(kvm)->mmu_lock);
> > lock(mmu_notifier_invalidate_range_start);
> > lock(&(kvm)->mmu_lock);
> > lock(fs_reclaim);
> >
>
> David, this is yours; basically, kvm_mmu_topup_memory_cache must be
> called outside the mmu_lock.
Ah, I see. kvm_arch_mmu_enable_log_dirty_pt_masked is called with
mmu_lock already held. I'll make sure to address this in v1. In theory
this should just go away when I switch away from using split_caches to
Sean's suggestion of allocating under the mmu_lock with reclaim
disabled.
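
As a rough illustration of the alternative mentioned above (allocating under the lock without reclaim), here is a userspace sketch. It rests on several assumptions: alloc_nowait(), alloc_blocking(), nowait_budget, process_items() and split_lock are hypothetical names, a pthread mutex stands in for mmu_lock, and the fallback of dropping the lock when the non-blocking attempt fails is not something this thread specifies.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for mmu_lock. */
static pthread_mutex_t split_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical knob: how many non-blocking allocations succeed (simulates memory pressure). */
static int nowait_budget;

/* Stand-in for an allocation that must not sleep or reclaim; it may fail. */
static void *alloc_nowait(size_t size)
{
	if (nowait_budget <= 0)
		return NULL;
	nowait_budget--;
	return malloc(size);
}

/* Stand-in for an allocation that may block; only call it without the lock. */
static void *alloc_blocking(size_t size)
{
	return malloc(size);
}

/*
 * Process 'n' items under split_lock, each needing one object.
 * Returns how many times the lock had to be dropped, or -1 on failure.
 */
static int process_items(int n)
{
	int drops = 0;

	pthread_mutex_lock(&split_lock);
	for (int i = 0; i < n; i++) {
		void *obj = alloc_nowait(64);	/* cheap attempt, lock held */

		if (!obj) {
			/* Assumed fallback: drop the lock before blocking. */
			pthread_mutex_unlock(&split_lock);
			obj = alloc_blocking(64);
			pthread_mutex_lock(&split_lock);
			drops++;
			if (!obj) {
				pthread_mutex_unlock(&split_lock);
				return -1;
			}
			/*
			 * State guarded by the lock may have changed while it
			 * was dropped; a real caller would revalidate here.
			 */
		}
		free(obj);	/* a real user would install the object */
	}
	pthread_mutex_unlock(&split_lock);
	return drops;
}
```

With nowait_budget set to 2, process_items(5) drops the lock three times; with a budget that covers every item it never drops the lock at all.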