2023-01-23 16:05:30

by Juri Lelli

[permalink] [raw]
Subject: [RT] i915 sleeping function from atomic in gen6_reset_engines()

Hi,

I've just noticed the following while testing v6.2-rc3-rt1.

# lspci -k | grep -EA3 'VGA|3D|Display'
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3 Processor Integrated Graphics Controller (rev 06)
DeviceName: Onboard IGD
Subsystem: Dell Device 0620
Kernel driver in use: i915

---

[ 11.960670] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
[ 11.960672] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 429, name: systemd-udevd
[ 11.960673] preempt_count: 1, expected: 0
[ 11.960674] RCU nest depth: 0, expected: 0
[ 11.960674] 2 locks held by systemd-udevd/429:
[ 11.960675] #0: ffff8e1f022b71b0 (&dev->mutex){....}-{3:3}, at: __driver_attach+0xcb/0x1d0
[ 11.960684] #1: ffff8e1e27d32ed8 (&uncore->lock){+.+.}-{2:2}, at: gen6_reset_engines+0x1b/0x50 [i915]
[ 11.960897] Preemption disabled at:
[ 11.960898] [<ffffffffc07d5d62>] __intel_gt_reset+0x92/0x100 [i915]
[ 11.961024] CPU: 3 PID: 429 Comm: systemd-udevd Not tainted 6.2.0-rc3-rt1 #1
[ 11.961025] Hardware name: Dell Inc. PowerEdge T20/0VD5HY, BIOS A03 11/25/2013
[ 11.961026] Call Trace:
[ 11.961027] <TASK>
[ 11.961029] ? __intel_gt_reset+0x92/0x100 [i915]
[ 11.961155] dump_stack_lvl+0x57/0x81
[ 11.961160] __might_resched.cold+0xf4/0x12f
[ 11.961165] rt_spin_lock+0x4a/0xe0
[ 11.961169] ? gen6_reset_engines+0x1b/0x50 [i915]
[ 11.961294] ? __pfx_gen6_reset_engines+0x10/0x10 [i915]
[ 11.961453] gen6_reset_engines+0x1b/0x50 [i915]
[ 11.961579] __intel_gt_reset+0xa0/0x100 [i915]
[ 11.961708] i915_driver_mmio_probe+0x126/0x220 [i915]
[ 11.961815] ? intel_gt_probe_all+0x70/0x280 [i915]
[ 11.961939] i915_driver_probe+0xd4/0x3c0 [i915]
[ 11.962047] ? i915_pci_probe+0x87/0x1b0 [i915]
[ 11.962160] local_pci_probe+0x45/0x80
[ 11.962167] pci_call_probe+0x56/0x170
[ 11.962173] pci_device_probe+0x79/0xf0
[ 11.962175] ? driver_sysfs_add+0x73/0xd0
[ 11.962179] really_probe+0xe1/0x390
[ 11.962181] ? pm_runtime_barrier+0x50/0x90
[ 11.962187] __driver_probe_device+0x78/0x180
[ 11.962191] driver_probe_device+0x1e/0x90
[ 11.962195] __driver_attach+0xd6/0x1d0
[ 11.962197] ? __pfx___driver_attach+0x10/0x10
[ 11.962201] bus_for_each_dev+0x7b/0xc0
[ 11.962206] bus_add_driver+0x1b2/0x200
[ 11.962212] driver_register+0x8f/0xf0
[ 11.962215] i915_init+0x20/0x80 [i915]
[ 11.962337] ? __pfx_init_module+0x10/0x10 [i915]
[ 11.962461] do_one_initcall+0x47/0x190
[ 11.962467] ? rcu_read_lock_sched_held+0x43/0x80
[ 11.962471] ? trace_kmalloc+0x3c/0x160
[ 11.962473] ? kmalloc_trace+0x44/0x60
[ 11.962476] do_init_module+0x4c/0x200
[ 11.962480] __do_sys_finit_module+0xb4/0x130
[ 11.962497] do_syscall_64+0x5c/0x90
[ 11.962505] ? do_syscall_64+0x69/0x90
[ 11.962507] ? lockdep_hardirqs_on+0x79/0x100
[ 11.962511] ? do_syscall_64+0x69/0x90
[ 11.962513] ? do_syscall_64+0x69/0x90
[ 11.962515] ? lockdep_hardirqs_on+0x79/0x100
[ 11.962518] ? do_syscall_64+0x69/0x90
[ 11.962520] ? lockdep_hardirqs_on+0x79/0x100
[ 11.962523] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 11.962526] RIP: 0033:0x7fa35d020e5d
[ 11.962529] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89
d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73
01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
[ 11.962530] RSP: 002b:00007ffd62f23a78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 11.962532] RAX: ffffffffffffffda RBX: 000056058a1429e0 RCX: 00007fa35d020e5d
[ 11.962533] RDX: 0000000000000000 RSI: 00007fa35d25232c RDI: 0000000000000017
[ 11.962534] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
[ 11.962534] R10: 0000000000000017 R11: 0000000000000246 R12: 00007fa35d25232c
[ 11.962535] R13: 000056058a147360 R14: 0000000000000007 R15: 000056058a12b790
[ 11.962548] </TASK>
[ 11.964024] i915 0000:00:02.0: vgaarb: deactivate vga console
[ 11.974095] Console: switching to colour dummy device 80x25
[ 11.981308] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 12.041808] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0

---

Looks like this discussion might be relevant [1], but I'm not sure if
progress has been made after that.

Thanks!
Juri

1 - https://lore.kernel.org/all/[email protected]/



2023-07-10 07:44:24

by Juri Lelli

[permalink] [raw]
Subject: Re: [RT] i915 sleeping function from atomic in gen6_reset_engines()

Hi!

On 23/01/23 17:04, Juri Lelli wrote:
> Hi,
>
> I've just noticed the following while testing v6.2-rc3-rt1.

I'm still seeing the following on v6.4-rt6.

I believe [email protected] should cure it,
but I don't think it did go anywhere?

> # lspci -k | grep -EA3 'VGA|3D|Display'
> 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3 Processor Integrated Graphics Controller (rev 06)
> DeviceName: Onboard IGD
> Subsystem: Dell Device 0620
> Kernel driver in use: i915
>
> ---
>
> [ 11.960670] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
> [ 11.960672] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 429, name: systemd-udevd
> [ 11.960673] preempt_count: 1, expected: 0
> [ 11.960674] RCU nest depth: 0, expected: 0
> [ 11.960674] 2 locks held by systemd-udevd/429:
> [ 11.960675] #0: ffff8e1f022b71b0 (&dev->mutex){....}-{3:3}, at: __driver_attach+0xcb/0x1d0
> [ 11.960684] #1: ffff8e1e27d32ed8 (&uncore->lock){+.+.}-{2:2}, at: gen6_reset_engines+0x1b/0x50 [i915]
> [ 11.960897] Preemption disabled at:
> [ 11.960898] [<ffffffffc07d5d62>] __intel_gt_reset+0x92/0x100 [i915]
> [ 11.961024] CPU: 3 PID: 429 Comm: systemd-udevd Not tainted 6.2.0-rc3-rt1 #1
> [ 11.961025] Hardware name: Dell Inc. PowerEdge T20/0VD5HY, BIOS A03 11/25/2013
> [ 11.961026] Call Trace:
> [ 11.961027] <TASK>
> [ 11.961029] ? __intel_gt_reset+0x92/0x100 [i915]
> [ 11.961155] dump_stack_lvl+0x57/0x81
> [ 11.961160] __might_resched.cold+0xf4/0x12f
> [ 11.961165] rt_spin_lock+0x4a/0xe0
> [ 11.961169] ? gen6_reset_engines+0x1b/0x50 [i915]
> [ 11.961294] ? __pfx_gen6_reset_engines+0x10/0x10 [i915]
> [ 11.961453] gen6_reset_engines+0x1b/0x50 [i915]
> [ 11.961579] __intel_gt_reset+0xa0/0x100 [i915]
> [ 11.961708] i915_driver_mmio_probe+0x126/0x220 [i915]
> [ 11.961815] ? intel_gt_probe_all+0x70/0x280 [i915]
> [ 11.961939] i915_driver_probe+0xd4/0x3c0 [i915]
> [ 11.962047] ? i915_pci_probe+0x87/0x1b0 [i915]
> [ 11.962160] local_pci_probe+0x45/0x80
> [ 11.962167] pci_call_probe+0x56/0x170
> [ 11.962173] pci_device_probe+0x79/0xf0
> [ 11.962175] ? driver_sysfs_add+0x73/0xd0
> [ 11.962179] really_probe+0xe1/0x390
> [ 11.962181] ? pm_runtime_barrier+0x50/0x90
> [ 11.962187] __driver_probe_device+0x78/0x180
> [ 11.962191] driver_probe_device+0x1e/0x90
> [ 11.962195] __driver_attach+0xd6/0x1d0
> [ 11.962197] ? __pfx___driver_attach+0x10/0x10
> [ 11.962201] bus_for_each_dev+0x7b/0xc0
> [ 11.962206] bus_add_driver+0x1b2/0x200
> [ 11.962212] driver_register+0x8f/0xf0
> [ 11.962215] i915_init+0x20/0x80 [i915]
> [ 11.962337] ? __pfx_init_module+0x10/0x10 [i915]
> [ 11.962461] do_one_initcall+0x47/0x190
> [ 11.962467] ? rcu_read_lock_sched_held+0x43/0x80
> [ 11.962471] ? trace_kmalloc+0x3c/0x160
> [ 11.962473] ? kmalloc_trace+0x44/0x60
> [ 11.962476] do_init_module+0x4c/0x200
> [ 11.962480] __do_sys_finit_module+0xb4/0x130
> [ 11.962497] do_syscall_64+0x5c/0x90
> [ 11.962505] ? do_syscall_64+0x69/0x90
> [ 11.962507] ? lockdep_hardirqs_on+0x79/0x100
> [ 11.962511] ? do_syscall_64+0x69/0x90
> [ 11.962513] ? do_syscall_64+0x69/0x90
> [ 11.962515] ? lockdep_hardirqs_on+0x79/0x100
> [ 11.962518] ? do_syscall_64+0x69/0x90
> [ 11.962520] ? lockdep_hardirqs_on+0x79/0x100
> [ 11.962523] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 11.962526] RIP: 0033:0x7fa35d020e5d
> [ 11.962529] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89
> d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73
> 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
> [ 11.962530] RSP: 002b:00007ffd62f23a78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [ 11.962532] RAX: ffffffffffffffda RBX: 000056058a1429e0 RCX: 00007fa35d020e5d
> [ 11.962533] RDX: 0000000000000000 RSI: 00007fa35d25232c RDI: 0000000000000017
> [ 11.962534] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
> [ 11.962534] R10: 0000000000000017 R11: 0000000000000246 R12: 00007fa35d25232c
> [ 11.962535] R13: 000056058a147360 R14: 0000000000000007 R15: 000056058a12b790
> [ 11.962548] </TASK>
> [ 11.964024] i915 0000:00:02.0: vgaarb: deactivate vga console
> [ 11.974095] Console: switching to colour dummy device 80x25
> [ 11.981308] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
> [ 12.041808] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
>
> ---

Thanks,
Juri


2023-07-10 10:18:51

by Mike Galbraith

[permalink] [raw]
Subject: Re: [RT] i915 sleeping function from atomic in gen6_reset_engines()

On Mon, 2023-07-10 at 09:21 +0200, Juri Lelli wrote:
> Hi!
>
> On 23/01/23 17:04, Juri Lelli wrote:
> > Hi,
> >
> > I've just noticed the following while testing v6.2-rc3-rt1.
>
> I'm still seeing the following on v6.4-rt6.
>
> I believe [email protected] should cure it,
> but I don't think it did go anywhere?

The raw lock cyclictest deltas don't look great. I shut it up with the
below. For my i915 equipped lappy, a raw lock fix would be just fine,
its cyclictest numbers already being well south of 'oh dear'.

Subject: drm,i915: Don't disable preemption in __intel_gt_reset() for PREEMPT_RT
From: Mike Galbraith <[email protected]>
Date: Wed Dec 7 10:13:21 CET 2022

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 330, name: systemd-udevd
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
Preemption disabled at:
[<ffffffffa090e08e>] __intel_gt_reset+0x6e/0xe0 [i915]
CPU: 0 PID: 330 Comm: systemd-udevd Tainted: G I E 6.1.0.g8ed710d-master-rt #27
Hardware name: HP HP Spectre x360 Convertible/804F, BIOS F.47 11/22/2017
Call Trace:
<TASK>
dump_stack_lvl+0x33/0x46
? __intel_gt_reset+0x6e/0xe0 [i915]
__might_resched+0x162/0x1b0
? __gen11_reset_engines.isra.21+0x2d0/0x2d0 [i915]
rt_spin_lock+0x2d/0x70
gen8_reset_engines+0x33/0x220 [i915]
? __gen11_reset_engines.isra.21+0x2d0/0x2d0 [i915]
__intel_gt_reset+0x79/0xe0 [i915]
sanitize_gpu.part.17+0x2d/0x40 [i915]
i915_driver_probe+0x7b8/0xf20 [i915]
...

Replace the preempt_disable()/enable() with a local_lock()/unlock().

Signed-off-by: Mike Galbraith <[email protected]>
---
drivers/gpu/drm/i915/gt/intel_reset.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -6,6 +6,7 @@
#include <linux/sched/mm.h>
#include <linux/stop_machine.h>
#include <linux/string_helpers.h>
+#include <linux/local_lock.h>

#include "display/intel_display_reset.h"
#include "display/intel_overlay.h"
@@ -766,6 +767,14 @@ wa_14015076503_end(struct intel_gt *gt,
HECI_H_GS1_ER_PREP, 0);
}

+struct reset_lock {
+ local_lock_t lock;
+};
+
+static DEFINE_PER_CPU(struct reset_lock, reset_lock) = {
+ .lock = INIT_LOCAL_LOCK(lock),
+};
+
int __intel_gt_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask)
{
const int retries = engine_mask == ALL_ENGINES ? RESET_MAX_RETRIES : 1;
@@ -788,9 +797,9 @@ int __intel_gt_reset(struct intel_gt *gt
reset_mask = wa_14015076503_start(gt, engine_mask, !retry);

GT_TRACE(gt, "engine_mask=%x\n", reset_mask);
- preempt_disable();
+ local_lock(&reset_lock.lock);
ret = reset(gt, reset_mask, retry);
- preempt_enable();
+ local_unlock(&reset_lock.lock);

wa_14015076503_end(gt, reset_mask);
}


2023-07-10 12:06:49

by Juri Lelli

[permalink] [raw]
Subject: Re: [RT] i915 sleeping function from atomic in gen6_reset_engines()

On 10/07/23 12:09, Mike Galbraith wrote:
> On Mon, 2023-07-10 at 09:21 +0200, Juri Lelli wrote:
> > Hi!
> >
> > On 23/01/23 17:04, Juri Lelli wrote:
> > > Hi,
> > >
> > > I've just noticed the following while testing v6.2-rc3-rt1.
> >
> > I'm still seeing the following on v6.4-rt6.
> >
> > I believe [email protected]?should cure it,
> > but I don't think it did go anywhere?
>
> The raw lock cyclictest deltas don't look great. I shut it up with the
> below. For my i915 equipped lappy, a raw lock fix would be just fine,
> its cyclictest numbers already being well south of 'oh dear'.

Right, thanks for sharing!

Looks like Sebastian is on holiday, so I guess we'll need to wait till
he's back to see which solution he prefer.

Best,
Juri