Hi,
I am getting the following deadlock on reservation_ww_class_mutex
while trying to boot next-20221111 kernel:
============================================
WARNING: possible recursive locking detected
6.1.0-rc4-next-20221111 #193 Not tainted
--------------------------------------------
kworker/4:1/81 is trying to acquire lock:
ffff88812ebe89a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at:
dma_resv_lock_interruptible include/linux/dma-resv.h:372 [inline]
ffff88812ebe89a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at:
ttm_bo_reserve include/drm/ttm/ttm_bo_driver.h:121 [inline]
ffff88812ebe89a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at:
drm_gem_vram_vmap+0xa4/0x590 drivers/gpu/drm/drm_gem_vram_helper.c:436

but task is already holding lock:
ffff88812ebe89a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at:
dma_resv_lock include/linux/dma-resv.h:345 [inline]
ffff88812ebe89a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at:
drm_gem_vmap_unlocked+0x3f/0xa0 drivers/gpu/drm/drm_gem.c:1195

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(reservation_ww_class_mutex);
  lock(reservation_ww_class_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

4 locks held by kworker/4:1/81:
#0: ffff888100078d38 ((wq_completion)events){+.+.}-{0:0}, at:
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888100078d38 ((wq_completion)events){+.+.}-{0:0}, at:
arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff888100078d38 ((wq_completion)events){+.+.}-{0:0}, at:
atomic_long_set include/linux/atomic/atomic-instrumented.h:1280
[inline]
#0: ffff888100078d38 ((wq_completion)events){+.+.}-{0:0}, at:
set_work_data kernel/workqueue.c:636 [inline]
#0: ffff888100078d38 ((wq_completion)events){+.+.}-{0:0}, at:
set_work_pool_and_clear_pending kernel/workqueue.c:663 [inline]
#0: ffff888100078d38 ((wq_completion)events){+.+.}-{0:0}, at:
process_one_work+0x8e4/0x1720 kernel/workqueue.c:2260
#1: ffffc9000694fda0
((work_completion)(&helper->damage_work)){+.+.}-{0:0}, at:
process_one_work+0x918/0x1720 kernel/workqueue.c:2264
#2: ffff88812ebe8278 (&helper->lock){+.+.}-{3:3}, at:
drm_fbdev_damage_blit drivers/gpu/drm/drm_fbdev_generic.c:312 [inline]
#2: ffff88812ebe8278 (&helper->lock){+.+.}-{3:3}, at:
drm_fbdev_fb_dirty+0x30e/0xcd0 drivers/gpu/drm/drm_fbdev_generic.c:342
#3: ffff88812ebe89a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at:
dma_resv_lock include/linux/dma-resv.h:345 [inline]
#3: ffff88812ebe89a8 (reservation_ww_class_mutex){+.+.}-{3:3}, at:
drm_gem_vmap_unlocked+0x3f/0xa0 drivers/gpu/drm/drm_gem.c:1195

stack backtrace:
CPU: 4 PID: 81 Comm: kworker/4:1 Not tainted 6.1.0-rc4-next-20221111 #193
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Workqueue: events drm_fb_helper_damage_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x100/0x178 lib/dump_stack.c:106
print_deadlock_bug kernel/locking/lockdep.c:2990 [inline]
check_deadlock kernel/locking/lockdep.c:3033 [inline]
validate_chain kernel/locking/lockdep.c:3818 [inline]
__lock_acquire.cold+0x119/0x3b9 kernel/locking/lockdep.c:5055
lock_acquire kernel/locking/lockdep.c:5668 [inline]
lock_acquire+0x1e0/0x610 kernel/locking/lockdep.c:5633
__mutex_lock_common kernel/locking/mutex.c:603 [inline]
__ww_mutex_lock.constprop.0+0x1ba/0x2ee0 kernel/locking/mutex.c:754
ww_mutex_lock_interruptible+0x37/0x140 kernel/locking/mutex.c:886
dma_resv_lock_interruptible include/linux/dma-resv.h:372 [inline]
ttm_bo_reserve include/drm/ttm/ttm_bo_driver.h:121 [inline]
drm_gem_vram_vmap+0xa4/0x590 drivers/gpu/drm/drm_gem_vram_helper.c:436
drm_gem_vmap+0xc5/0x1b0 drivers/gpu/drm/drm_gem.c:1166
drm_gem_vmap_unlocked+0x4a/0xa0 drivers/gpu/drm/drm_gem.c:1196
drm_client_buffer_vmap+0x45/0xd0 drivers/gpu/drm/drm_client.c:326
drm_fbdev_damage_blit drivers/gpu/drm/drm_fbdev_generic.c:314 [inline]
drm_fbdev_fb_dirty+0x31e/0xcd0 drivers/gpu/drm/drm_fbdev_generic.c:342
drm_fb_helper_damage_work+0x27a/0x5d0 drivers/gpu/drm/drm_fb_helper.c:388
process_one_work+0xa33/0x1720 kernel/workqueue.c:2289
worker_thread+0x67d/0x10e0 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
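
For readability, the failing path distills to this nesting (functions and
lock points taken from the locks-held list and backtrace above; layout is
mine):

  drm_fb_helper_damage_work()
    drm_fbdev_fb_dirty()
      drm_fbdev_damage_blit()          /* takes &helper->lock (#2) */
        drm_client_buffer_vmap()
          drm_gem_vmap_unlocked()      /* dma_resv_lock() on the object (#3) */
            drm_gem_vmap()
              drm_gem_vram_vmap()
                ttm_bo_reserve()       /* dma_resv_lock_interruptible() on the
                                          same reservation lock -> recursion */

So the second lock attempt is on exactly the same reservation object
(ffff88812ebe89a8) that drm_gem_vmap_unlocked() already holds.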
The config is:
https://gist.githubusercontent.com/dvyukov/2897b21db809075a22db0370c495ed2d/raw/9b2535b2ba77bb57e4f1ba2b909ad4075b6e2c6a/gistfile1.txt

Qemu command line:
qemu-system-x86_64 -enable-kvm -machine q35,nvdimm -cpu
max,migratable=off -smp 18 \
-m 72G -hda buildroot-amd64-2021.08 -kernel arch/x86/boot/bzImage -nographic \
-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic,model=virtio-net-pci \
-append "console=ttyS0 root=/dev/sda1 earlyprintk=serial rodata=n \
oops=panic panic_on_warn=1 panic=86400 coredump_filter=0xffff"
On Sun, 13 Nov 2022 at 21:42, Dmitry Vyukov <[email protected]> wrote:
>
> Hi,
>
> I am getting the following deadlock on reservation_ww_class_mutex
> while trying to boot next-20221111 kernel:
The code was recently added by this commit:

commit 79e2cf2e7a193473dfb0da3b9b869682b43dc60f
Author: Dmitry Osipenko <[email protected]>
Date:   Mon Oct 17 20:22:11 2022 +0300

    drm/gem: Take reservation lock for vmap/vunmap operations
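
To illustrate, here is a minimal sketch of the conflict as I read it
(simplified from drivers/gpu/drm/drm_gem.c and
drivers/gpu/drm/drm_gem_vram_helper.c at that commit; error handling and
the actual mapping logic are omitted):

/* drm_gem.c: the wrapper added by the commit takes the reservation lock */
int drm_gem_vmap_unlocked(struct drm_gem_object *obj, struct iosys_map *map)
{
	int ret;

	dma_resv_lock(obj->resv, NULL);		/* first acquisition */
	ret = drm_gem_vmap(obj, map);		/* calls obj->funcs->vmap() */
	dma_resv_unlock(obj->resv);

	return ret;
}

/* drm_gem_vram_helper.c: the VRAM helper's vmap still locks on its own */
int drm_gem_vram_vmap(struct drm_gem_vram_object *gbo, struct iosys_map *map)
{
	int ret;

	/* ttm_bo_reserve() is dma_resv_lock_interruptible() on the same
	 * reservation object -> lockdep reports recursive locking */
	ret = ttm_bo_reserve(&gbo->bo, true, false, NULL);
	if (ret)
		return ret;

	/* ... pin and kmap the buffer, then ttm_bo_unreserve() ... */
	return 0;
}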
On 11/13/22 23:48, Dmitry Vyukov wrote:
>> Hi,
>>
>> I am getting the following deadlock on reservation_ww_class_mutex
>> while trying to boot next-20221111 kernel:
> The code was recently added by this commit:
>
> commit 79e2cf2e7a193473dfb0da3b9b869682b43dc60f
> Author: Dmitry Osipenko <[email protected]>
> Date:   Mon Oct 17 20:22:11 2022 +0300
>
>     drm/gem: Take reservation lock for vmap/vunmap operations
Thanks for the report. I reproduced this problem using the bochs driver
and will send the fix ASAP.
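
The direction I have in mind is roughly this (a sketch only, not the final
patch; internal helper names as in today's drm_gem_vram_helper.c): drop the
ttm_bo_reserve()/ttm_bo_unreserve() pair from drm_gem_vram_vmap()/vunmap()
and rely on the lock that drm_gem_vmap_unlocked() now takes for all callers:

int drm_gem_vram_vmap(struct drm_gem_vram_object *gbo, struct iosys_map *map)
{
	int ret;

	/* the caller now holds the lock via drm_gem_vmap_unlocked() */
	dma_resv_assert_held(gbo->bo.base.resv);

	ret = drm_gem_vram_pin_locked(gbo, 0);
	if (ret)
		return ret;

	/* ... kmap the buffer; no ttm_bo_reserve()/unreserve() here ... */
	return 0;
}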
--
Best regards,
Dmitry