Hello everyone,
While backporting i915 fixes to the RHEL9 kernel for a similar looking
issue; I noticed the commits that worked for RHEL8 did not work for RHEL9.
Testing the (almost) latest release: 6.5.2-rt8; showed a lot of call traces
on RHEL9. [1] being the most common one and it repeats itself on suspend.
[2] was the second one to show and seems to be the second most common
call trace. This was tested on a Framework Alder Lake laptop with i915
graphics. There was a total of 36 call traces before suspend and
additional 12 after suspend (once again, [1]).
When I tested on 6.6.0-rc1-rt1 the kernel crashed on boot. I did not
have a way to pull the information and was transcribed manually. [3]
[1]
BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6590, name: gnome-shell
preempt_count: 0, expected: 0
RCU nest depth: 6, expected: 0
12 locks held by gnome-shell/6590:
#0: ffffc900083dfb70 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_mode_atomic_ioctl (drivers/gpu/drm/drm_atomic_uapi.c:1347) drm
#1: ffff88812d8e6880 (crtc_ww_class_mutex){+.+.}-{3:3}, at: modeset_lock (drivers/gpu/drm/drm_modeset_lock.c:309) drm
#2: ffff88884e42f9e0 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip (./include/linux/rcupdate.h:747 kernel/softirq.c:155)
#3: ffffffff9794b9c0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (kernel/locking/rtmutex.c:1862 kernel/locking/spinlock_rt.c:43 kernel/locking/spinlock_rt.c:49 kernel/locking/spinlock_rt.c:57)
#4: ffffffff9794b9c0 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip (kernel/softirq.c:153)
#5: ffffffff9794b9c0 (rcu_read_lock){....}-{1:2}, at: fence_set_priority (drivers/gpu/drm/i915/gem/i915_gem_wait.c:102 drivers/gpu/drm/i915/gem/i915_gem_wait.c:92) i915
#6: ffffffffc0e91060 (schedule_lock){+.+.}-{2:2}, at: i915_schedule (drivers/gpu/drm/i915/i915_scheduler.c:292) i915
#7: ffffffff9794b9c0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (kernel/locking/rtmutex.c:1862 kernel/locking/spinlock_rt.c:43 kernel/locking/spinlock_rt.c:49 kernel/locking/spinlock_rt.c:57)
#8: ffff88818110d468 (&sched_engine->lock/2){+.+.}-{2:2}, at: __i915_schedule (drivers/gpu/drm/i915/i915_scheduler.c:144 drivers/gpu/drm/i915/i915_scheduler.c:238) i915
#9: ffffffff9794b9c0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (kernel/locking/rtmutex.c:1862 kernel/locking/spinlock_rt.c:43 kernel/locking/spinlock_rt.c:49 kernel/locking/spinlock_rt.c:57)
#10: ffff8881419be9b0 (&ce->guc_state.lock){+.+.}-{2:2}, at: guc_bump_inflight_request_prio (drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:4050) i915
#11: ffffffff9794b9c0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (kernel/locking/rtmutex.c:1862 kernel/locking/spinlock_rt.c:43 kernel/locking/spinlock_rt.c:49 kernel/locking/spinlock_rt.c:57)
Hardware name: Framework Laptop (12th Gen Intel Core)/FRANGACP04, BIOS 03.04 07/15/2022
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:107)
__might_resched (kernel/sched/core.c:10320)
guc_context_set_prio (./drivers/gpu/drm/i915/gt/uc/intel_guc.h:330 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:625 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:2478 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3333 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3360) i915
? __pfx_guc_context_set_prio (drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3348) i915
? mark_held_locks (kernel/locking/lockdep.c:4273)
guc_bump_inflight_request_prio (drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3414 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:4055) i915
__i915_schedule (drivers/gpu/drm/i915/i915_scheduler.c:258) i915
? __pfx___i915_schedule (drivers/gpu/drm/i915/i915_scheduler.c:157) i915
? __pfx___lock_release (kernel/locking/lockdep.c:5405)
i915_schedule (./include/linux/spinlock_rt.h:117 drivers/gpu/drm/i915/i915_scheduler.c:293) i915
fence_set_priority (drivers/gpu/drm/i915/gem/i915_gem_wait.c:106 drivers/gpu/drm/i915/gem/i915_gem_wait.c:92) i915
i915_gem_fence_wait_priority.part.0 (drivers/gpu/drm/i915/gem/i915_gem_wait.c:145) i915
i915_gem_object_wait_priority (drivers/gpu/drm/i915/gem/i915_gem_wait.c:157) i915
? __pfx_i915_gem_object_wait_priority (drivers/gpu/drm/i915/gem/i915_gem_wait.c:151) i915
? __pfx_mark_lock.part.0 (kernel/locking/lockdep.c:4636)
intel_prepare_plane_fb (drivers/gpu/drm/i915/display/intel_atomic_plane.c:1078) i915
? __pfx_intel_prepare_plane_fb (drivers/gpu/drm/i915/display/intel_atomic_plane.c:1017) i915
? __module_address.part.0 (kernel/module/main.c:3287)
? is_module_address (./arch/x86/include/asm/bitops.h:207 ./arch/x86/include/asm/bitops.h:239 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/thread_info.h:118 ./arch/x86/include/asm/preempt.h:132 kernel/module/main.c:3258)
drm_atomic_helper_prepare_planes.part.0 (drivers/gpu/drm/drm_atomic_helper.c:2589) drm_kms_helper
? __init_waitqueue_head (./include/linux/list.h:37 kernel/sched/wait.c:12)
intel_atomic_commit (drivers/gpu/drm/i915/display/intel_display.c:6414 drivers/gpu/drm/i915/display/intel_display.c:7249) i915
drm_mode_atomic_ioctl (drivers/gpu/drm/drm_atomic_uapi.c:1438) drm
? __pfx_drm_mode_atomic_ioctl (drivers/gpu/drm/drm_atomic_uapi.c:1291) drm
? __pfx_do_raw_spin_trylock (kernel/locking/spinlock_debug.c:121)
? _raw_spin_unlock_irqrestore (./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
? rt_spin_unlock (./include/linux/rcupdate.h:781 kernel/locking/spinlock_rt.c:82)
drm_ioctl_kernel (drivers/gpu/drm/drm_ioctl.c:788) drm
? __pfx_drm_mode_atomic_ioctl (drivers/gpu/drm/drm_atomic_uapi.c:1291) drm
? __pfx_drm_ioctl_kernel (drivers/gpu/drm/drm_ioctl.c:773) drm
? __might_fault (mm/memory.c:5856 mm/memory.c:5849)
? __might_fault (mm/memory.c:5856 mm/memory.c:5849)
drm_ioctl (drivers/gpu/drm/drm_ioctl.c:892) drm
? __pfx_drm_mode_atomic_ioctl (drivers/gpu/drm/drm_atomic_uapi.c:1291) drm
? __pfx_drm_ioctl (drivers/gpu/drm/drm_ioctl.c:813) drm
? register_lock_class (./include/linux/rculist.h:589 kernel/locking/lockdep.c:1340)
? __fget_files (./include/linux/rcupdate.h:781 fs/file.c:915)
? __fget_files (fs/file.c:918)
? security_file_ioctl (security/security.c:2608 (discriminator 13))
__x64_sys_ioctl (fs/ioctl.c:51 fs/ioctl.c:870 fs/ioctl.c:856 fs/ioctl.c:856)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
? rcu_is_watching (./arch/x86/include/asm/bitops.h:207 ./arch/x86/include/asm/bitops.h:239 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/thread_info.h:118 ./arch/x86/include/asm/preempt.h:108 kernel/rcu/tree.c:696)
? do_syscall_64 (arch/x86/entry/common.c:87)
? lockdep_hardirqs_on (kernel/locking/lockdep.c:4422)
? do_syscall_64 (arch/x86/entry/common.c:87)
? asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:645)
? lockdep_hardirqs_on (kernel/locking/lockdep.c:4422)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f54bc427c6b
Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
All code
========
0: 73 01 jae 0x3
2: c3 retq
3: 48 8b 0d b5 b1 1b 00 mov 0x1bb1b5(%rip),%rcx # 0x1bb1bf
a: f7 d8 neg %eax
c: 64 89 01 mov %eax,%fs:(%rcx)
f: 48 83 c8 ff or $0xffffffffffffffff,%rax
13: c3 retq
14: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1b: 00 00 00
1e: 90 nop
1f: f3 0f 1e fa endbr64
23: b8 10 00 00 00 mov $0x10,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 retq
33: 48 8b 0d 85 b1 1b 00 mov 0x1bb185(%rip),%rcx # 0x1bb1bf
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 retq
9: 48 8b 0d 85 b1 1b 00 mov 0x1bb185(%rip),%rcx # 0x1bb195
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
RSP: 002b:00007fff0545df38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fff0545df80 RCX: 00007f54bc427c6b
RDX: 00007fff0545df80 RSI: 00000000c03864bc RDI: 000000000000000d
RBP: 00000000c03864bc R08: 0000000000000000 R09: 0000000000000000
R10: 00007f54bc5e3c80 R11: 0000000000000246 R12: 000055c504faa9a0
R13: 000000000000000d R14: 000055c505118380 R15: 000055c50554daa0
</TASK>
[2]
BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6590, name: gnome-shell
preempt_count: 0, expected: 0
RCU nest depth: 5, expected: 0
11 locks held by gnome-shell/6590:
#0: ffffc900083df8f0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_gem_do_execbuffer (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1927 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3454) i915
#1: ffff8881829301f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: __intel_context_do_pin_ww (./include/linux/dma-resv.h:372 ./drivers/gpu/drm/i915/gem/i915_gem_object.h:171 ./drivers/gpu/drm/i915/gem/i915_gem_object.h:193 drivers/gpu/drm/i915/gt/intel_context.c:222) i915
#2: ffff8881a346f870 (&timeline->mutex){+.+.}-{3:3}, at: i915_request_create (./drivers/gpu/drm/i915/gt/intel_context.h:262 drivers/gpu/drm/i915/i915_request.c:1035) i915
#3: ffff88884dc2f9e0 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip (./include/linux/rcupdate.h:747 kernel/softirq.c:155)
#4: ffffffff9794b9c0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (kernel/locking/rtmutex.c:1862 kernel/locking/spinlock_rt.c:43 kernel/locking/spinlock_rt.c:49 kernel/locking/spinlock_rt.c:57)
#5: ffffffff9794b9c0 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip (kernel/softirq.c:153)
#6: ffffffff9794b9c0 (rcu_read_lock){....}-{1:2}, at: submit_notify (./include/linux/rcupdate.h:747 drivers/gpu/drm/i915/i915_request.c:796) i915
#7: ffff88818110d468 (&sched_engine->lock/2){+.+.}-{2:2}, at: guc_submit_request (drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:2026) i915
#8: ffffffff9794b9c0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (kernel/locking/rtmutex.c:1862 kernel/locking/spinlock_rt.c:43 kernel/locking/spinlock_rt.c:49 kernel/locking/spinlock_rt.c:57)
#9: ffff8881419be9b0 (&ce->guc_state.lock){+.+.}-{2:2}, at: add_to_context (./include/linux/list.h:134 ./include/linux/list.h:229 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3434) i915
#10: ffffffff9794b9c0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (kernel/locking/rtmutex.c:1862 kernel/locking/spinlock_rt.c:43 kernel/locking/spinlock_rt.c:49 kernel/locking/spinlock_rt.c:57)
Hardware name: Framework Laptop (12th Gen Intel Core)/FRANGACP04, BIOS 03.04 07/15/2022
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:107)
__might_resched (kernel/sched/core.c:10320)
guc_context_set_prio (./drivers/gpu/drm/i915/gt/uc/intel_guc.h:330 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:625 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:2478 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3333 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3360) i915
? __pfx_guc_context_set_prio (drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3348) i915
? mark_held_locks (kernel/locking/lockdep.c:4273)
add_to_context (drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3414 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3444) i915
? gen12_emit_fini_breadcrumb_rcs (drivers/gpu/drm/i915/gt/gen8_engine_cs.c:831) i915
__i915_request_submit (drivers/gpu/drm/i915/i915_request.c:676) i915
? rcu_is_watching (./include/linux/context_tracking.h:122 kernel/rcu/tree.c:695)
guc_submit_request (drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:790 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1990 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:2028) i915
submit_notify (drivers/gpu/drm/i915/i915_request.c:797) i915
__i915_sw_fence_complete (drivers/gpu/drm/i915/i915_sw_fence.c:131 drivers/gpu/drm/i915/i915_sw_fence.c:201 drivers/gpu/drm/i915/i915_sw_fence.c:191) i915
__i915_request_queue (./include/linux/bottom_half.h:33 drivers/gpu/drm/i915/i915_request.c:1843) i915
eb_request_add (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3110) i915
? __pfx_eb_request_add (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3074) i915
? eb_request_submit (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:2418) i915
i915_gem_do_execbuffer (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3131 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3486) i915
? __pfx_i915_gem_do_execbuffer (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3347) i915
? validate_chain (./arch/x86/include/asm/bitops.h:228 ./arch/x86/include/asm/bitops.h:240 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/locking/lockdep.c:228 kernel/locking/lockdep.c:3780 kernel/locking/lockdep.c:3836)
? i915_gem_execbuffer2_ioctl (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3579) i915
? __kmalloc_node_track_caller (mm/slab_common.c:973 mm/slab_common.c:1005)
? __lock_acquire (kernel/locking/lockdep.c:5136)
? __might_fault (mm/memory.c:5856 mm/memory.c:5849)
? __might_fault (mm/memory.c:5856 mm/memory.c:5849)
i915_gem_execbuffer2_ioctl (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3600) i915
drm_ioctl_kernel (drivers/gpu/drm/drm_ioctl.c:788) drm
? __pfx_i915_gem_execbuffer2_ioctl (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3560) i915
? __pfx_drm_ioctl_kernel (drivers/gpu/drm/drm_ioctl.c:773) drm
? __might_fault (mm/memory.c:5856 mm/memory.c:5849)
? __might_fault (mm/memory.c:5856 mm/memory.c:5849)
drm_ioctl (drivers/gpu/drm/drm_ioctl.c:892) drm
? __pfx_i915_gem_execbuffer2_ioctl (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3560) i915
? __pfx_drm_ioctl (drivers/gpu/drm/drm_ioctl.c:813) drm
? register_lock_class (kernel/locking/lockdep.c:1335)
? __fget_files (fs/file.c:918)
? security_file_ioctl (security/security.c:2608 (discriminator 13))
__x64_sys_ioctl (fs/ioctl.c:51 fs/ioctl.c:870 fs/ioctl.c:856 fs/ioctl.c:856)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
? ktime_get_coarse_real_ts64 (./include/linux/seqlock.h:104 kernel/time/timekeeping.c:2261)
? __task_pid_nr_ns (./include/linux/rcupdate.h:781 kernel/pid.c:501)
? rcu_is_watching (./arch/x86/include/asm/bitops.h:207 ./arch/x86/include/asm/bitops.h:239 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/thread_info.h:118 ./arch/x86/include/asm/preempt.h:108 kernel/rcu/tree.c:696)
? do_syscall_64 (arch/x86/entry/common.c:87)
? lockdep_hardirqs_on (kernel/locking/lockdep.c:4422)
? do_syscall_64 (arch/x86/entry/common.c:87)
? do_syscall_64 (arch/x86/entry/common.c:87)
? lockdep_hardirqs_on (kernel/locking/lockdep.c:4422)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f54bc427c6b
Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
All code
========
0: 73 01 jae 0x3
2: c3 retq
3: 48 8b 0d b5 b1 1b 00 mov 0x1bb1b5(%rip),%rcx # 0x1bb1bf
a: f7 d8 neg %eax
c: 64 89 01 mov %eax,%fs:(%rcx)
f: 48 83 c8 ff or $0xffffffffffffffff,%rax
13: c3 retq
14: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1b: 00 00 00
1e: 90 nop
1f: f3 0f 1e fa endbr64
23: b8 10 00 00 00 mov $0x10,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 retq
33: 48 8b 0d 85 b1 1b 00 mov 0x1bb185(%rip),%rcx # 0x1bb1bf
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 retq
9: 48 8b 0d 85 b1 1b 00 mov 0x1bb185(%rip),%rcx # 0x1bb195
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
RSP: 002b:00007fff0545ddf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007f54bc427c6b
RDX: 00007fff0545de00 RSI: 0000000040406469 RDI: 0000000000000010
RBP: 00007fff0545de00 R08: 000055c506bd3520 R09: 000055c504560e60
R10: 0000000000100080 R11: 0000000000000246 R12: 000055c5046ae6f0
R13: 000055c504f1c280 R14: 000055c5046387dc R15: 000055c5046387c0
</TASK>
[3]
general protection fault, probably for non-canonical address 0xdffffc0004: 0000(#1) PREEMPT_RT SMP KASAN NOPRI
KASAM: null-ptr-deref in range [0x000...20-0x000...27]
RIP: 0010:ucsi_debugfs_unregister (drivers/usb/typec/ucsi/debugfs.c:87)
[snipped]
PKRU: 5555554
Call Trace:
<TASK>
? die_addr (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:460)
? exe_general_proection+0x150/0x230
? asm_exc_general_protection (./arch/x86/include/asm/idtentry.h:564)
? ucsi_debugfs_unregister (drivers/usb/typec/ucsi/debugfs.c:87)
usci_destroy+0xe/0x20
ucsi_acpi_probe (drivers/usb/typec/ucsi/ucsi_acpi.c:207)
platform_probe (drivers/base/platform.c:1404)
really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658)
__driver_probe_device (drivers/base/dd.c:800)
? __driver_attach (drivers/base/dd.c:1216)
driver_probe_device (drivers/base/dd.c:830)
__driver_attach (drivers/base/dd.c:1217)
? __pfx___driver_attach (drivers/base/dd.c:1157)
bus_for_each_dev (drivers/base/bus.c:368)
? __pfx_bus_for_each_dev (drivers/base/bus.c:356)
? rt_spin_unlock (./include/linux/rcupdate.h:781 kernel/locking/spinlock_rt.c:82)
bus_add_driver (drivers/base/bus.c:674)
driver_register (drivers/base/driver.c:246)
? rcu_is_watching (./arch/x86/include/asm/bitops.h:207 ./arch/x86/include/asm/bitops.h:239 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/thread_info.h:118 ./arch/x86/include/asm/preempt.h:108 kernel/rcu/tree.c:700)
? __pfx_usci_acpi_platform_driver_init+0x10/0x10
do_one_initcall (init/main.c:1232)
? __pfx_do_one_initcall (init/main.c:1223)
? parse_one (kernel/params.c:138)
? __kem_cache_alloc_node+0x191/0x270
? rcu_is_watching (./arch/x86/include/asm/bitops.h:207 ./arch/x86/include/asm/bitops.h:239 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/thread_info.h:118 ./arch/x86/include/asm/preempt.h:108 kernel/rcu/tree.c:700)
do_initcalls (init/main.c:1293 init/main.c:1310)
kernel_init_freeable (init/main.c:1551)
? __pfx_kernel_init (init/main.c:1429)
kernel_init (init/main.c:1439)
ret_from_fork (arch/x86/kernel/process.c:147)
? __pfx_kernel_init (init/main.c:1429)
ret_from_fork_asm (arch/x86/entry/entry_64.S:312)
</TASK>
Modules
On 2023-09-20 18:07:35 [-0400], John B. Wyatt IV wrote:
> Hello everyone,
Hi,
> While backporting i915 fixes to the RHEL9 kernel for a similar looking
> issue; I noticed the commits that worked for RHEL8 did not work for RHEL9.
>
> Testing the (almost) latest release: 6.5.2-rt8; showed a lot of call traces
> on RHEL9. [1] being the most common one and it repeats itself on suspend.
A warn-once might help to reduce them so they can be worked on one by
one.
> [2] was the second one to show and seems to be the second most common
> call trace. This was tested on a Framework Alder Lake laptop with i915
> graphics. There was a total of 36 call traces before suspend and
> additional 12 after suspend (once again, [1]).
>
> When I tested on 6.6.0-rc1-rt1 the kernel crashed on boot. I did not
> have a way to pull the information and was transcribed manually. [3]
>
> [1]
Both:
> BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6590, name: gnome-shell
> preempt_count: 0, expected: 0
> RCU nest depth: 6, expected: 0
> 12 locks held by gnome-shell/6590:
…
> BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6590, name: gnome-shell
> preempt_count: 0, expected: 0
> RCU nest depth: 5, expected: 0
are might-sleep splats. I don't see these on my notebook/desktop on
6.6-rc. I don't remember doing suspend on 6.5 notebook but I did that on
my desktop for testing.
It looks like due to "locks" the RCU is > 0 and then the splat triggers
because it assumes that it will schedule-out which is okay on RT. But
then it is not okay for the ww-mutex to do so I am a little confused if
this is RT only problem or also not RT. But maybe it is just a try-lock
and the warning is just wrongly triggered…
> [3]
>
> general protection fault, probably for non-canonical address 0xdffffc0004: 0000(#1) PREEMPT_RT SMP KASAN NOPRI
> KASAM: null-ptr-deref in range [0x000...20-0x000...27]
> RIP: 0010:ucsi_debugfs_unregister (drivers/usb/typec/ucsi/debugfs.c:87)
> [snipped]
> PKRU: 5555554
> Call Trace:
> <TASK>
> usci_destroy+0xe/0x20
> ucsi_acpi_probe (drivers/usb/typec/ucsi/ucsi_acpi.c:207)
This is odd. That means that ucsi_register() failed and debugfs was
setup and is NULL. And check in line 87 checks ucsi which is non-NULL
and the ucsi->debugfs is NULL. So it should return but somehow it does
this. Does this also trigger without KASAN?
In the meantime let me try to enable KASAN…
Sebastian
On Fri, Sep 22, 2023 at 01:07:20PM +0200, Sebastian Andrzej Siewior wrote:
> On 2023-09-20 18:07:35 [-0400], John B. Wyatt IV wrote:
>
> > [2] was the second one to show and seems to be the second most common
> > call trace. This was tested on a Framework Alder Lake laptop with i915
> > graphics. There was a total of 36 call traces before suspend and
> > additional 12 after suspend (once again, [1]).
> >
> > When I tested on 6.6.0-rc1-rt1 the kernel crashed on boot. I did not
> > have a way to pull the information and was transcribed manually. [3]
> >
> > [1]
>
> Both:
> > BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
> > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6590, name: gnome-shell
> > preempt_count: 0, expected: 0
> > RCU nest depth: 6, expected: 0
> > 12 locks held by gnome-shell/6590:
> …
> > BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
> > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 6590, name: gnome-shell
> > preempt_count: 0, expected: 0
> > RCU nest depth: 5, expected: 0
>
> are might-sleep splats. I don't see these on my notebook/desktop on
> 6.6-rc. I don't remember doing suspend on 6.5 notebook but I did that on
> my desktop for testing.
> It looks like due to "locks" the RCU is > 0 and then the splat triggers
> because it assumes that it will schedule-out which is okay on RT. But
> then it is not okay for the ww-mutex to do so I am a little confused if
> this is RT only problem or also not RT. But maybe it is just a try-lock
> and the warning is just wrongly triggered…
>
For stock (non-rt) I do not see it with 6.6-rc2. This was compiled
with the Stream 9 debug config.
I was able to reproduce similar call traces once I tested again
with 6.6-rc3-rt5 at [4] and [5].
What would be the best way to determine if the warning is wrongly
triggered?
> > [3]
> >
> > general protection fault, probably for non-canonical address 0xdffffc0004: 0000(#1) PREEMPT_RT SMP KASAN NOPRI
> > KASAM: null-ptr-deref in range [0x000...20-0x000...27]
> > RIP: 0010:ucsi_debugfs_unregister (drivers/usb/typec/ucsi/debugfs.c:87)
> > [snipped]
> > PKRU: 5555554
> > Call Trace:
> > <TASK>
> > usci_destroy+0xe/0x20
> > ucsi_acpi_probe (drivers/usb/typec/ucsi/ucsi_acpi.c:207)
>
> This is odd. That means that ucsi_register() failed and debugfs was
> setup and is NULL. And check in line 87 checks ucsi which is non-NULL
> and the ucsi->debugfs is NULL. So it should return but somehow it does
> this. Does this also trigger without KASAN?
>
> In the meantime let me try to enable KASAN…
For [3] I found the stock kernel rc1 (from Torvalds tree) crashed my laptop
as well. 6.6.0-rc3-rt5 boots fine. I did not think to check it until I saw
your earlier not RT comment above.
[4]
BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 2464, name: gnome-shell
preempt_count: 0, expected: 0
RCU nest depth: 6, expected: 0
12 locks held by gnome-shell/2464:
#0: ffffc900047abcd0 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_mode_atomic_ioctl (drivers/gpu/drm/drm_atomic_uapi.c:1384) drm
#1: ffff88810144bc80 (crtc_ww_class_mutex){+.+.}-{3:3}, at: modeset_lock (drivers/gpu/drm/drm_modeset_lock.c:316) drm
#2: ffff88888f01fa20 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip (./include/linux/rcupdate.h:747 kernel/softirq.c:155)
#3: ffffffff90827b00 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (./include/linux/rcupdate.h:303 ./include/linux/rcupdate.h:749 kernel/locking/spinlock_rt.c:50 kernel/locking/spinlock_rt.c:57)
#4: ffffffff90827b00 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip (./include/linux/rcupdate.h:303 ./include/linux/rcupdate.h:749 kernel/softirq.c:155)
#5: ffffffff90827b00 (rcu_read_lock){....}-{1:2}, at: fence_set_priority (./include/linux/rcupdate.h:303 ./include/linux/rcupdate.h:749 drivers/gpu/drm/i915/gem/i915_gem_wait.c:104 drivers/gpu/drm/i915/gem/i915_gem_wait.c:92) i915
#6: ffffffffc077c100 (schedule_lock){+.+.}-{2:2}, at: i915_schedule (drivers/gpu/drm/i915/i915_scheduler.c:292) i915
#7: ffffffff90827b00 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (./include/linux/rcupdate.h:303 ./include/linux/rcupdate.h:749 kernel/locking/spinlock_rt.c:50 kernel/locking/spinlock_rt.c:57)
#8: ffff888106138668 (&sched_engine->lock/2){+.+.}-{2:2}, at: __i915_schedule (drivers/gpu/drm/i915/i915_scheduler.c:144 drivers/gpu/drm/i915/i915_scheduler.c:238) i915
#9: ffffffff90827b00 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (./include/linux/rcupdate.h:303 ./include/linux/rcupdate.h:749 kernel/locking/spinlock_rt.c:50 kernel/locking/spinlock_rt.c:57)
#10: ffff88824084b370 (&ce->guc_state.lock){+.+.}-{2:2}, at: guc_bump_inflight_request_prio (drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:4050) i915
#11: ffffffff90827b00 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (./include/linux/rcupdate.h:303 ./include/linux/rcupdate.h:749 kernel/locking/spinlock_rt.c:50 kernel/locking/spinlock_rt.c:57)
Hardware name: Framework Laptop (12th Gen Intel Core)/FRANGACP04, BIOS 03.04 07/15/2022
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:107)
__might_resched (kernel/sched/core.c:10318)
guc_context_set_prio (./drivers/gpu/drm/i915/gt/uc/intel_guc.h:330 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:625 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:2478 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3333 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3360) i915
guc_bump_inflight_request_prio (drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3414 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:4055) i915
__i915_schedule (drivers/gpu/drm/i915/i915_scheduler.c:258) i915
? __i915_schedule (drivers/gpu/drm/i915/i915_scheduler.c:158) i915
? mark_held_locks (kernel/locking/lockdep.c:4273)
i915_schedule (./include/linux/spinlock_rt.h:117 drivers/gpu/drm/i915/i915_scheduler.c:293) i915
fence_set_priority (drivers/gpu/drm/i915/gem/i915_gem_wait.c:106 drivers/gpu/drm/i915/gem/i915_gem_wait.c:92) i915
i915_gem_fence_wait_priority (drivers/gpu/drm/i915/gem/i915_gem_wait.c:145) i915
i915_gem_object_wait_priority (drivers/gpu/drm/i915/gem/i915_gem_wait.c:157 (discriminator 3)) i915
intel_prepare_plane_fb (drivers/gpu/drm/i915/display/intel_atomic_plane.c:1083) i915
? is_module_address (./arch/x86/include/asm/preempt.h:121 kernel/module/main.c:3258)
? static_obj (kernel/locking/lockdep.c:853)
drm_atomic_helper_prepare_planes.part.0 (drivers/gpu/drm/drm_atomic_helper.c:2589) drm_kms_helper
intel_atomic_commit (drivers/gpu/drm/i915/display/intel_display.c:6418 drivers/gpu/drm/i915/display/intel_display.c:7257) i915
drm_mode_atomic_ioctl (drivers/gpu/drm/drm_atomic_uapi.c:1480) drm
? __pfx_drm_mode_atomic_ioctl (drivers/gpu/drm/drm_atomic_uapi.c:1328) drm
drm_ioctl_kernel (drivers/gpu/drm/drm_ioctl.c:789) drm
drm_ioctl (drivers/gpu/drm/drm_ioctl.c:893) drm
? __pfx_drm_mode_atomic_ioctl (drivers/gpu/drm/drm_atomic_uapi.c:1328) drm
__x64_sys_ioctl (fs/ioctl.c:51 fs/ioctl.c:871 fs/ioctl.c:857 fs/ioctl.c:857)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
? lockdep_hardirqs_on (kernel/locking/lockdep.c:4422)
? do_syscall_64 (arch/x86/entry/common.c:87)
? asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:645)
? lockdep_hardirqs_on (kernel/locking/lockdep.c:4422)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f935edd3c6b
Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
All code
========
0: 73 01 jae 0x3
2: c3 retq
3: 48 8b 0d b5 b1 1b 00 mov 0x1bb1b5(%rip),%rcx # 0x1bb1bf
a: f7 d8 neg %eax
c: 64 89 01 mov %eax,%fs:(%rcx)
f: 48 83 c8 ff or $0xffffffffffffffff,%rax
13: c3 retq
14: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1b: 00 00 00
1e: 90 nop
1f: f3 0f 1e fa endbr64
23: b8 10 00 00 00 mov $0x10,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 retq
33: 48 8b 0d 85 b1 1b 00 mov 0x1bb185(%rip),%rcx # 0x1bb1bf
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 retq
9: 48 8b 0d 85 b1 1b 00 mov 0x1bb185(%rip),%rcx # 0x1bb195
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
RSP: 002b:00007ffca0a3baa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffca0a3baf0 RCX: 00007f935edd3c6b
RDX: 00007ffca0a3baf0 RSI: 00000000c03864bc RDI: 000000000000000d
RBP: 00000000c03864bc R08: 0000000000000000 R09: 0000000000000000
R10: 00007f935ef8fc80 R11: 0000000000000246 R12: 0000557be081ebc0
R13: 000000000000000d R14: 0000557be2e09db0 R15: 0000557be3910e90
</TASK>
[5]
BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/gt/uc/intel_guc.h:330
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 2464, name: gnome-shell
preempt_count: 0, expected: 0
RCU nest depth: 5, expected: 0
11 locks held by gnome-shell/2464:
#0: ffffc900047abb18 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_gem_do_execbuffer (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1906 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3471) i915
#1: ffff8881e02cb678 (reservation_ww_class_mutex){+.+.}-{3:3}, at: __intel_context_do_pin_ww (./drivers/gpu/drm/i915/gem/i915_gem_object.h:175 ./drivers/gpu/drm/i915/gem/i915_gem_object.h:193 drivers/gpu/drm/i915/gt/intel_context.c:222) i915
#2: ffff888105878470 (&timeline->mutex){+.+.}-{3:3}, at: i915_request_create (./drivers/gpu/drm/i915/gt/intel_context.h:262 drivers/gpu/drm/i915/i915_request.c:1032) i915
#3: ffff88888e01fa20 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip (./include/linux/rcupdate.h:747 kernel/softirq.c:155)
#4: ffffffff90827b00 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (./include/linux/rcupdate.h:303 ./include/linux/rcupdate.h:749 kernel/locking/spinlock_rt.c:50 kernel/locking/spinlock_rt.c:57)
#5: ffffffff90827b00 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip (./include/linux/rcupdate.h:303 ./include/linux/rcupdate.h:749 kernel/softirq.c:155)
#6: ffffffff90827b00 (rcu_read_lock){....}-{1:2}, at: submit_notify (./include/linux/rcupdate.h:303 ./include/linux/rcupdate.h:749 drivers/gpu/drm/i915/i915_request.c:793) i915
#7: ffff888106138668 (&sched_engine->lock/2){+.+.}-{2:2}, at: guc_submit_request (drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:2026) i915
#8: ffffffff90827b00 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (./include/linux/rcupdate.h:303 ./include/linux/rcupdate.h:749 kernel/locking/spinlock_rt.c:50 kernel/locking/spinlock_rt.c:57)
#9: ffff88824084b370 (&ce->guc_state.lock){+.+.}-{2:2}, at: add_to_context (./include/linux/list.h:124 ./include/linux/list.h:215 ./include/linux/list.h:310 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3434) i915
#10: ffffffff90827b00 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (./include/linux/rcupdate.h:303 ./include/linux/rcupdate.h:749 kernel/locking/spinlock_rt.c:50 kernel/locking/spinlock_rt.c:57)
Hardware name: Framework Laptop (12th Gen Intel Core)/FRANGACP04, BIOS 03.04 07/15/2022
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:107)
__might_resched (kernel/sched/core.c:10318)
guc_context_set_prio (./drivers/gpu/drm/i915/gt/uc/intel_guc.h:330 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:625 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:2478 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3333 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3360) i915
add_to_context (drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3414 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3444) i915
? hwsp_offset (drivers/gpu/drm/i915/gt/gen8_engine_cs.c:420) i915
__i915_request_submit (drivers/gpu/drm/i915/i915_request.c:673) i915
guc_submit_request (drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:790 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1990 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:2028) i915
submit_notify (drivers/gpu/drm/i915/i915_request.c:794) i915
__i915_sw_fence_complete (drivers/gpu/drm/i915/i915_sw_fence.c:131 drivers/gpu/drm/i915/i915_sw_fence.c:201 drivers/gpu/drm/i915/i915_sw_fence.c:191) i915
? __i915_request_queue (./include/linux/bottom_half.h:20 drivers/gpu/drm/i915/i915_request.c:1838) i915
__i915_request_queue (./include/linux/bottom_half.h:33 drivers/gpu/drm/i915/i915_request.c:1840) i915
eb_request_add (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3127) i915
i915_gem_do_execbuffer (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3148 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3503) i915
? lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5755 kernel/locking/lockdep.c:5718)
? rt_mutex_slowunlock (kernel/locking/rtmutex.c:567 kernel/locking/rtmutex.c:1464)
? __lock_acquire (kernel/locking/lockdep.c:5136)
? find_held_lock (kernel/locking/lockdep.c:5243)
? local_clock_noinstr (kernel/sched/clock.c:301)
? __lock_release (kernel/locking/lockdep.c:339 kernel/locking/lockdep.c:352 kernel/locking/lockdep.c:5435)
i915_gem_execbuffer2_ioctl (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3609) i915
? __pfx_i915_gem_execbuffer2_ioctl (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3577) i915
drm_ioctl_kernel (drivers/gpu/drm/drm_ioctl.c:789) drm
drm_ioctl (drivers/gpu/drm/drm_ioctl.c:893) drm
? __pfx_i915_gem_execbuffer2_ioctl (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3577) i915
__x64_sys_ioctl (fs/ioctl.c:51 fs/ioctl.c:871 fs/ioctl.c:857 fs/ioctl.c:857)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
? do_syscall_64 (arch/x86/entry/common.c:87)
? do_syscall_64 (arch/x86/entry/common.c:87)
? do_syscall_64 (arch/x86/entry/common.c:87)
? lockdep_hardirqs_on (kernel/locking/lockdep.c:4422)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f935edd3c6b
Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
All code
========
0: 73 01 jae 0x3
2: c3 retq
3: 48 8b 0d b5 b1 1b 00 mov 0x1bb1b5(%rip),%rcx # 0x1bb1bf
a: f7 d8 neg %eax
c: 64 89 01 mov %eax,%fs:(%rcx)
f: 48 83 c8 ff or $0xffffffffffffffff,%rax
13: c3 retq
14: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1b: 00 00 00
1e: 90 nop
1f: f3 0f 1e fa endbr64
23: b8 10 00 00 00 mov $0x10,%eax
28: 0f 05 syscall
2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction
30: 73 01 jae 0x33
32: c3 retq
33: 48 8b 0d 85 b1 1b 00 mov 0x1bb185(%rip),%rcx # 0x1bb1bf
3a: f7 d8 neg %eax
3c: 64 89 01 mov %eax,%fs:(%rcx)
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
6: 73 01 jae 0x9
8: c3 retq
9: 48 8b 0d 85 b1 1b 00 mov 0x1bb185(%rip),%rcx # 0x1bb195
10: f7 d8 neg %eax
12: 64 89 01 mov %eax,%fs:(%rcx)
15: 48 rex.W
RSP: 002b:00007ffca0a3b948 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007f935edd3c6b
RDX: 00007ffca0a3b950 RSI: 0000000040406469 RDI: 0000000000000010
RBP: 00007ffca0a3b950 R08: 0000557be3c2fa00 R09: 0000557be05aea00
R10: 0000000000000002 R11: 0000000000000246 R12: 0000557be06fd4a0
R13: 0000557be3c315e0 R14: 0000557be0683bcc R15: 0000557be0683bb0
</TASK>
On 2023-09-29 04:43:32 [-0400], John B. Wyatt IV wrote:
> For stock (non-rt) I do not see it with 6.6-rc2. This was compiled
> with the Stream 9 debug config.
>
> I was able to reproduce similar call traces once I tested again
> with 6.6-rc3-rt5 at [4] and [5].
>
> What would be the best way to determine if the warning is wrongly
> triggered?
I looked at the traces in this email and they originate from a
might_sleep() in guc_context_set_prio(). The reason is that they check
at the atomic/interrupt state to figure out if they can sleep or not.
Both checks don't work on RT as intended and the former has a not to not
be used in drivers…
The snippet below should cure this. Could you test, please.
Sebastian
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 8dc291ff00935..5b8d084c9c58c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -317,7 +317,7 @@ static inline int intel_guc_send_busy_loop(struct intel_guc *guc,
{
int err;
unsigned int sleep_period_ms = 1;
- bool not_atomic = !in_atomic() && !irqs_disabled();
+ bool not_atomic = !in_atomic() && !irqs_disabled() && !rcu_preempt_depth();
/*
* FIXME: Have caller pass in if we are in an atomic context to avoid
On Mon, Oct 02, 2023 at 11:45:45AM +0200, Sebastian Andrzej Siewior wrote:
> I looked at the traces in this email and they originate from a
> might_sleep() in guc_context_set_prio(). The reason is that they check
> at the atomic/interrupt state to figure out if they can sleep or not.
> Both checks don't work on RT as intended and the former has a not to not
> be used in drivers…
>
> The snippet below should cure this. Could you test, please.
>
> Sebastian
>
I tested this at both boot and suspend/resume. No call traces reported.
Thank you Sebastian.