2023-08-07 00:33:54

by Borislav Petkov

Subject: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

Hi folks,

the patch in $Subject breaks booting here on one of my test boxes, see
below.

Reverting it on top of -rc4 fixes the issue.

Thx.

[ 3.580535] ACPI: \_PR_.CP04: Found 4 idle states
[ 3.585694] ACPI: \_PR_.CP05: Found 4 idle states
[ 3.590852] ACPI: \_PR_.CP06: Found 4 idle states
[ 3.596037] ACPI: \_PR_.CP07: Found 4 idle states
[ 3.644065] Freeing initrd memory: 6740K
[ 3.742932] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 3.750409] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ 3.762111] serial 0000:00:16.3: enabling device (0000 -> 0003)
[ 3.771589] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
[ 3.782503] Linux agpgart interface v0.103
[ 3.787805] ACPI: bus type drm_connector registered

<--- boot stops here.

It should continue with this:

[ 3.795491] Console: switching to colour dummy device 80x25
[ 3.801933] nouveau 0000:03:00.0: vgaarb: deactivate vga console
[ 3.808303] nouveau 0000:03:00.0: NVIDIA GT218 (0a8c00b1)
[ 3.931002] nouveau 0000:03:00.0: bios: version 70.18.83.00.08
[ 3.941731] nouveau 0000:03:00.0: fb: 512 MiB DDR3
[ 4.110348] tsc: Refined TSC clocksource calibration: 3591.349 MHz
[ 4.116627] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33c466a1ab5, max_idle_ns: 440795209767 ns
[ 4.126871] clocksource: Switched to clocksource tsc
[ 4.252013] nouveau 0000:03:00.0: DRM: VRAM: 512 MiB
[ 4.257088] nouveau 0000:03:00.0: DRM: GART: 1048576 MiB
[ 4.262501] nouveau 0000:03:00.0: DRM: TMDS table version 2.0
[ 4.268333] nouveau 0000:03:00.0: DRM: DCB version 4.0
[ 4.273561] nouveau 0000:03:00.0: DRM: DCB outp 00: 02000360 00000000
[ 4.280104] nouveau 0000:03:00.0: DRM: DCB outp 01: 02000362 00020010
[ 4.286630] nouveau 0000:03:00.0: DRM: DCB outp 02: 028003a6 0f220010
[ 4.293176] nouveau 0000:03:00.0: DRM: DCB outp 03: 01011380 00000000
[ 4.299711] nouveau 0000:03:00.0: DRM: DCB outp 04: 08011382 00020010
[ 4.306243] nouveau 0000:03:00.0: DRM: DCB outp 05: 088113c6 0f220010
[ 4.312772] nouveau 0000:03:00.0: DRM: DCB conn 00: 00101064
[ 4.318520] nouveau 0000:03:00.0: DRM: DCB conn 01: 00202165
[ 4.329488] nouveau 0000:03:00.0: DRM: MM: using COPY for buffer copies
[ 4.336261] stackdepot: allocating hash table of 1048576 entries via kvcalloc
...


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 06.08.23 23:31, Borislav Petkov wrote:
>
> the patch in $Subject

Side note, in case anyone cares: it was also included in 6.4.7.

> breaks booting here on one of my test boxes, see
> below.
>
> Reverting it on top of -rc4 fixes the issue.
>
> Thx.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 2b5d1c29f6c4
#regzbot title drm/nouveau: stopped booting
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it is already being
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


2023-08-07 12:52:14

by Karol Herbst

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Sun, Aug 6, 2023 at 11:40 PM Borislav Petkov wrote:
>
> Hi folks,
>
> the patch in $Subject breaks booting here on one of my test boxes, see
> below.
>
> Reverting it on top of -rc4 fixes the issue.
>
> Thx.
>
> [ 3.580535] ACPI: \_PR_.CP04: Found 4 idle states
> [ 3.585694] ACPI: \_PR_.CP05: Found 4 idle states
> [ 3.590852] ACPI: \_PR_.CP06: Found 4 idle states
> [ 3.596037] ACPI: \_PR_.CP07: Found 4 idle states
> [ 3.644065] Freeing initrd memory: 6740K
> [ 3.742932] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [ 3.750409] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> [ 3.762111] serial 0000:00:16.3: enabling device (0000 -> 0003)
> [ 3.771589] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
> [ 3.782503] Linux agpgart interface v0.103
> [ 3.787805] ACPI: bus type drm_connector registered
>
> <--- boot stops here.
>

In what way does it stop? Just not progressing? That would be kinda
concerning. Mind tracing which arguments `nvkm_uevent_add` is called
with, both with and without that patch?

Also, a boot log with `nouveau.debug=trace` might be helpful here.



2023-08-07 15:41:24

by Borislav Petkov

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> in what way does it stop? Just not progressing? That would be kinda
> concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> called with and without that patch?

Well, dumping those args apparently kept the box from freezing silently,
so this time I caught a #PF over serial. Does that help?

....
[ 3.410135] Unpacking initramfs...
[ 3.416319] software IO TLB: mapped [mem 0x00000000a877d000-0x00000000ac77d000] (64MB)
[ 3.418227] Initialise system trusted keyrings
[ 3.432273] workingset: timestamp_bits=56 max_order=22 bucket_order=0
[ 3.439006] ntfs: driver 2.1.32 [Flags: R/W].
[ 3.443368] fuse: init (API version 7.38)
[ 3.447601] 9p: Installing v9fs 9p2000 file system support
[ 3.453223] Key type asymmetric registered
[ 3.457332] Asymmetric key parser 'x509' registered
[ 3.462236] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
[ 3.475865] efifb: probing for efifb
[ 3.479458] efifb: framebuffer at 0xf9000000, using 1920k, total 1920k
[ 3.485969] efifb: mode is 800x600x32, linelength=3200, pages=1
[ 3.491872] efifb: scrolling: redraw
[ 3.495438] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
[ 3.502349] Console: switching to colour frame buffer device 100x37
[ 3.509564] fb0: EFI VGA frame buffer device
[ 3.514013] ACPI: \_PR_.CP00: Found 4 idle states
[ 3.518850] ACPI: \_PR_.CP01: Found 4 idle states
[ 3.523687] ACPI: \_PR_.CP02: Found 4 idle states
[ 3.528515] ACPI: \_PR_.CP03: Found 4 idle states
[ 3.533346] ACPI: \_PR_.CP04: Found 4 idle states
[ 3.538173] ACPI: \_PR_.CP05: Found 4 idle states
[ 3.543003] ACPI: \_PR_.CP06: Found 4 idle states
[ 3.544219] Freeing initrd memory: 8196K
[ 3.547844] ACPI: \_PR_.CP07: Found 4 idle states
[ 3.609542] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 3.616224] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ 3.625552] serial 0000:00:16.3: enabling device (0000 -> 0003)
[ 3.633034] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
[ 3.642451] Linux agpgart interface v0.103
[ 3.647141] ACPI: bus type drm_connector registered
[ 3.653261] Console: switching to colour dummy device 80x25
[ 3.659092] nouveau 0000:03:00.0: vgaarb: deactivate vga console
[ 3.665174] nouveau 0000:03:00.0: NVIDIA GT218 (0a8c00b1)
[ 3.784585] nouveau 0000:03:00.0: bios: version 70.18.83.00.08
[ 3.792244] nouveau 0000:03:00.0: fb: 512 MiB DDR3
[ 3.948786] nouveau 0000:03:00.0: DRM: VRAM: 512 MiB
[ 3.953755] nouveau 0000:03:00.0: DRM: GART: 1048576 MiB
[ 3.959073] nouveau 0000:03:00.0: DRM: TMDS table version 2.0
[ 3.964808] nouveau 0000:03:00.0: DRM: DCB version 4.0
[ 3.969938] nouveau 0000:03:00.0: DRM: DCB outp 00: 02000360 00000000
[ 3.976367] nouveau 0000:03:00.0: DRM: DCB outp 01: 02000362 00020010
[ 3.982792] nouveau 0000:03:00.0: DRM: DCB outp 02: 028003a6 0f220010
[ 3.989223] nouveau 0000:03:00.0: DRM: DCB outp 03: 01011380 00000000
[ 3.995647] nouveau 0000:03:00.0: DRM: DCB outp 04: 08011382 00020010
[ 4.002076] nouveau 0000:03:00.0: DRM: DCB outp 05: 088113c6 0f220010
[ 4.008511] nouveau 0000:03:00.0: DRM: DCB conn 00: 00101064
[ 4.014151] nouveau 0000:03:00.0: DRM: DCB conn 01: 00202165
[ 4.021710] nvkm_uevent_add: uevent: 0xffff888100242100, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
[ 4.033680] nvkm_uevent_add: uevent: 0xffff888100242300, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
[ 4.045429] nouveau 0000:03:00.0: DRM: MM: using COPY for buffer copies
[ 4.052059] stackdepot: allocating hash table of 1048576 entries via kvcalloc
[ 4.067191] nvkm_uevent_add: uevent: 0xffff888100242800, event: 0xffff888104b3e260, id: 0x0, bits: 0x1, func: 0x0000000000000000
[ 4.078936] nvkm_uevent_add: uevent: 0xffff888100242900, event: 0xffff888104b3e260, id: 0x1, bits: 0x1, func: 0x0000000000000000
[ 4.090514] nvkm_uevent_add: uevent: 0xffff888100242a00, event: 0xffff888102091f28, id: 0x1, bits: 0x3, func: 0xffffffff8177b700
[ 4.102118] tsc: Refined TSC clocksource calibration: 3591.345 MHz
[ 4.108342] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33c4635c383, max_idle_ns: 440795314831 ns
[ 4.108401] nvkm_uevent_add: uevent: 0xffff8881020b6000, event: 0xffff888102091f28, id: 0xf, bits: 0x3, func: 0xffffffff8177b700
[ 4.129864] clocksource: Switched to clocksource tsc
[ 4.131478] [drm] Initialized nouveau 1.3.1 20120801 for 0000:03:00.0 on minor 0
[ 4.143806] BUG: kernel NULL pointer dereference, address: 0000000000000020
[ 4.144676] #PF: supervisor read access in kernel mode
[ 4.144676] #PF: error_code(0x0000) - not-present page
[ 4.144676] PGD 0 P4D 0
[ 4.144676] Oops: 0000 [#1] PREEMPT SMP PTI
[ 4.144676] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc5-dirty #1
[ 4.144676] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
[ 4.144676] RIP: 0010:nvif_object_mthd+0x136/0x1e0
[ 4.144676] Code: f2 4c 89 ee 48 8d 7c 24 20 66 89 04 24 c6 44 24 18 00 e8 8d 04 4e 00 41 8d 56 20 49 8b 44 24 08 83 fa 17 76 7d 49 39 c4 74 45 <48> 8b 78 20 4c 89 64 24 10 48 8b 40 38 c6 44 24 06 ff 31 c9 48 89
[ 4.144676] RSP: 0000:ffffc90000023888 EFLAGS: 00010282
[ 4.144676] RAX: 0000000000000000 RBX: ffff8881003bc000 RCX: 0000000000000008
[ 4.144676] RDX: 0000000000000028 RSI: ffffc90000023948 RDI: ffffc900000238a8
[ 4.144676] RBP: ffff8881003bc620 R08: ffff888102170000 R09: ffff888102170000
[ 4.144676] R10: 0000000000000002 R11: 0000000000000001 R12: ffff8881003bc620
[ 4.144676] R13: ffffc90000023948 R14: 0000000000000008 R15: 0000000000000000
[ 4.144676] FS: 0000000000000000(0000) GS:ffff88843a700000(0000) knlGS:0000000000000000
[ 4.144676] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.144676] CR2: 0000000000000020 CR3: 000000000641e001 CR4: 00000000000606e0
[ 4.144676] Call Trace:
[ 4.144676] <TASK>
[ 4.144676] ? __die+0x20/0x70
[ 4.144676] ? page_fault_oops+0x14c/0x430
[ 4.144676] ? fixup_exception+0x22/0x340
[ 4.144676] ? kernelmode_fixup_or_oops+0x84/0x110
[ 4.144676] ? exc_page_fault+0x66/0x1b0
[ 4.144676] ? asm_exc_page_fault+0x22/0x30
[ 4.144676] ? nvif_object_mthd+0x136/0x1e0
[ 4.144676] ? nvif_object_mthd+0x123/0x1e0
[ 4.144676] ? rcu_is_watching+0xd/0x40
[ 4.144676] ? __mutex_lock+0xc9/0x790
[ 4.144676] ? nouveau_dp_detect+0x67/0x4e0
[ 4.144676] nvif_conn_hpd_status+0x22/0xd0
[ 4.144676] nouveau_dp_detect+0x33b/0x4e0
[ 4.144676] ? rt_mutex_unlock+0xf5/0x110
[ 4.144676] nouveau_connector_detect+0x10f/0x470
[ 4.144676] drm_helper_probe_detect+0x81/0xa0
[ 4.144676] drm_helper_probe_single_connector_modes+0x441/0x510
[ 4.144676] drm_client_modeset_probe+0x1f8/0xca0
[ 4.144676] __drm_fb_helper_initial_config_and_unlock+0x34/0x560
[ 4.144676] ? __mutex_lock+0xc9/0x790
[ 4.144676] ? drm_client_register+0x22/0xa0
[ 4.144676] drm_fbdev_generic_client_hotplug+0x66/0xc0
[ 4.144676] drm_client_register+0x64/0xa0
[ 4.144676] nouveau_drm_probe+0x20d/0x230
[ 4.144676] local_pci_probe+0x46/0xa0
[ 4.144676] pci_device_probe+0xaf/0x200
[ 4.144676] really_probe+0xc2/0x2d0
[ 4.144676] __driver_probe_device+0x73/0x120
[ 4.144676] driver_probe_device+0x1e/0xe0
[ 4.144676] __driver_attach+0x8a/0x190
[ 4.144676] ? __pfx___driver_attach+0x10/0x10
[ 4.144676] bus_for_each_dev+0x6a/0xb0
[ 4.144676] bus_add_driver+0xeb/0x1f0
[ 4.144676] driver_register+0x5c/0x120
[ 4.144676] ? __pfx_nouveau_drm_init+0x10/0x10
[ 4.144676] do_one_initcall+0x5b/0x280
[ 4.144676] kernel_init_freeable+0x186/0x2f0
[ 4.144676] ? __pfx_kernel_init+0x10/0x10
[ 4.144676] kernel_init+0x16/0x1b0
[ 4.144676] ret_from_fork+0x30/0x50
[ 4.144676] ? __pfx_kernel_init+0x10/0x10
[ 4.144676] ret_from_fork_asm+0x1b/0x30
[ 4.144676] </TASK>
[ 4.144676] Modules linked in:
[ 4.144676] CR2: 0000000000000020
[ 4.144676] ---[ end trace 0000000000000000 ]---
[ 4.144676] RIP: 0010:nvif_object_mthd+0x136/0x1e0
[ 4.144676] Code: f2 4c 89 ee 48 8d 7c 24 20 66 89 04 24 c6 44 24 18 00 e8 8d 04 4e 00 41 8d 56 20 49 8b 44 24 08 83 fa 17 76 7d 49 39 c4 74 45 <48> 8b 78 20 4c 89 64 24 10 48 8b 40 38 c6 44 24 06 ff 31 c9 48 89
[ 4.144676] RSP: 0000:ffffc90000023888 EFLAGS: 00010282
[ 4.144676] RAX: 0000000000000000 RBX: ffff8881003bc000 RCX: 0000000000000008
[ 4.144676] RDX: 0000000000000028 RSI: ffffc90000023948 RDI: ffffc900000238a8
[ 4.144676] RBP: ffff8881003bc620 R08: ffff888102170000 R09: ffff888102170000
[ 4.144676] R10: 0000000000000002 R11: 0000000000000001 R12: ffff8881003bc620
[ 4.144676] R13: ffffc90000023948 R14: 0000000000000008 R15: 0000000000000000
[ 4.144676] FS: 0000000000000000(0000) GS:ffff88843a700000(0000) knlGS:0000000000000000
[ 4.144676] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.144676] CR2: 0000000000000020 CR3: 000000000641e001 CR4: 00000000000606e0
[ 4.144676] note: swapper/0[1] exited with irqs disabled
[ 4.549714] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 4.550687] Kernel Offset: disabled
[ 4.550687] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
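For readers decoding the two "#PF:" lines above: they come from the x86 page-fault error code, and error_code(0x0000) means bit 0 clear (page not present), bit 1 clear (a read) and bit 2 clear (a supervisor access). A minimal user-space sketch of that decoding, covering only the low bits; the PF_* names and helper functions are made up for illustration and are not kernel API, and the sketch does not reproduce the "in kernel mode" suffix of the oops line.

```c
#include <assert.h>
#include <string.h>

/* Low bits of the x86 page-fault error code (names local to this sketch). */
#define PF_PROT  (1UL << 0) /* clear: page not present, set: protection fault */
#define PF_WRITE (1UL << 1) /* clear: read access, set: write access */
#define PF_USER  (1UL << 2) /* clear: supervisor access, set: user access */
#define PF_RSVD  (1UL << 3) /* reserved bit set in a paging-structure entry */
#define PF_INSTR (1UL << 4) /* the access was an instruction fetch */

/* Who performed the access and what kind of access it was. */
const char *pf_access(unsigned long code)
{
	int user = (code & PF_USER) != 0;

	if (code & PF_INSTR)
		return user ? "user instruction fetch" : "supervisor instruction fetch";
	if (code & PF_WRITE)
		return user ? "user write access" : "supervisor write access";
	return user ? "user read access" : "supervisor read access";
}

/* Why the access faulted (protection-key and other high bits not handled). */
const char *pf_reason(unsigned long code)
{
	if (!(code & PF_PROT))
		return "not-present page";
	if (code & PF_RSVD)
		return "reserved bit violation";
	return "permissions violation";
}
```

With the code from this oops, pf_access(0x0000) yields "supervisor read access" and pf_reason(0x0000) yields "not-present page", i.e. the kernel read through a pointer into an unmapped page, consistent with a NULL-based address like 0x20.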

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-08-08 18:03:32

by Karol Herbst

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov wrote:
>
> On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > in what way does it stop? Just not progressing? That would be kinda
> > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > called with and without that patch?
>
> Well, me dumping those args I guess made the box not freeze before
> catching a #PF over serial. Does that help?
>
> ....
> [ 4.143806] BUG: kernel NULL pointer dereference, address: 0000000000000020

Ahh, that would have been good to know :) Mind figuring out what
exactly is NULL inside nvif_object_mthd? Or rather, which line
`nvif_object_mthd+0x136` belongs to; then it should be easy to figure
out what's wrong here.
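The question boils down to symbol arithmetic: an oops location such as `nvif_object_mthd+0x136/0x1e0` is the function's start address plus 0x136, inside a function 0x1e0 bytes long. The kernel tree's `scripts/faddr2line` takes exactly this string together with the vmlinux image and maps it to a source line. The sketch below only shows the arithmetic; the helper names are hypothetical, and the start address 0xffffffff816dded0 is derived from the objdump listing Boris posts in his reply (jump target `ffffffff816de07e <nvif_object_mthd+0x1ae>` minus 0x1ae), not printed anywhere directly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One "symbol+0xoffset/0xsize" location as printed in an oops. */
struct oops_loc {
	char sym[64];
	unsigned long off;  /* offset of the address into the function */
	unsigned long size; /* total length of the function */
};

/* Parses e.g. "nvif_object_mthd+0x136/0x1e0". Returns 0 on success. */
int parse_oops_loc(const char *s, struct oops_loc *loc)
{
	const char *plus = strrchr(s, '+');
	const char *num;
	char *end;
	size_t n;

	if (!plus)
		return -1;
	n = (size_t)(plus - s);
	if (n == 0 || n >= sizeof(loc->sym))
		return -1;
	memcpy(loc->sym, s, n);
	loc->sym[n] = '\0';

	num = plus + 1;
	loc->off = strtoul(num, &end, 16); /* base 16 accepts the 0x prefix */
	if (end == num || *end != '/')
		return -1;

	num = end + 1;
	loc->size = strtoul(num, &end, 16);
	if (end == num || *end != '\0')
		return -1;

	/* An offset at or past the end of the function would be bogus. */
	return loc->off < loc->size ? 0 : -1;
}

/* Start of the function, given any address known to be at sym+off. */
uint64_t sym_base_from(uint64_t addr, unsigned long off)
{
	return addr - off;
}

/* Absolute address of the oops location, given the function's start. */
uint64_t loc_addr(uint64_t sym_base, const struct oops_loc *loc)
{
	return sym_base + loc->off;
}
```

Feeding it `nvif_object_mthd+0x136/0x1e0` and the derived start address gives 0xffffffff816de006, which is the address of the `mov 0x20(%rax),%rdi` in Boris's listing.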



2023-08-08 20:02:26

by Borislav Petkov

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Tue, Aug 08, 2023 at 12:39:32PM +0200, Karol Herbst wrote:
> ahh, that would have been good to know :)

Yeah, I didn't see it before; the box would simply freeze. The oops
only showed up after I added the printk you requested.

> Mind figuring out what's exactly NULL inside nvif_object_mthd? Or
> rather what line `nvif_object_mthd+0x136` belongs to, then it should
> be easy to figure out what's wrong here.

That looks like this:

ffffffff816ddfee: e8 8d 04 4e 00 callq ffffffff81bbe480 <__memcpy>
ffffffff816ddff3: 41 8d 56 20 lea 0x20(%r14),%edx
ffffffff816ddff7: 49 8b 44 24 08 mov 0x8(%r12),%rax
ffffffff816ddffc: 83 fa 17 cmp $0x17,%edx
ffffffff816ddfff: 76 7d jbe ffffffff816de07e <nvif_object_mthd+0x1ae>
ffffffff816de001: 49 39 c4 cmp %rax,%r12
ffffffff816de004: 74 45 je ffffffff816de04b <nvif_object_mthd+0x17b>

<--- RIP points here.

The 0x20 also fits the deref address: 0000000000000020.

Which means %rax is 0. Yap.

ffffffff816de006: 48 8b 78 20 mov 0x20(%rax),%rdi
ffffffff816de00a: 4c 89 64 24 10 mov %r12,0x10(%rsp)
ffffffff816de00f: 48 8b 40 38 mov 0x38(%rax),%rax
ffffffff816de013: c6 44 24 06 ff movb $0xff,0x6(%rsp)
ffffffff816de018: 31 c9 xor %ecx,%ecx
ffffffff816de01a: 48 89 e6 mov %rsp,%rsi
ffffffff816de01d: 48 8b 40 28 mov 0x28(%rax),%rax
ffffffff816de021: e8 3a 0c 4f 00 callq ffffffff81bcec60 <__x86_indirect_thunk_array>
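The "0x20 fits" reasoning above is a simple consistency check: a load of the form `mov disp(%reg),%dst` touches reg + disp, so if the trapping instruction is `mov 0x20(%rax),%rdi` and the oops shows RAX = 0, then CR2 must be 0x20. A small sketch that pulls RAX and CR2 out of the oops register lines and performs that check; the function names are hypothetical, and the parser assumes the "NAME: hexvalue" layout seen in this oops.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns the hex value printed after "NAME: " in an oops register line,
 * e.g. reg_value("RAX: 0000000000000000 RBX: ...", "RAX", &ok).
 * *ok is set to 1 only if the register was found and parsed. */
uint64_t reg_value(const char *line, const char *name, int *ok)
{
	char key[16];
	const char *p;
	char *end;
	uint64_t v;

	*ok = 0;
	if (strlen(name) + 3 > sizeof(key)) /* name + ": " + NUL */
		return 0;
	strcpy(key, name);
	strcat(key, ": ");

	p = strstr(line, key);
	if (!p)
		return 0;
	p += strlen(key);

	v = strtoull(p, &end, 16);
	if (end == p)
		return 0;
	*ok = 1;
	return v;
}

/* A load "mov disp(%base),%dst" reads base + disp; if this is the faulting
 * instruction, that sum must equal the fault address reported in CR2. */
int fault_explained(uint64_t base, int32_t disp, uint64_t cr2)
{
	return base + (uint64_t)(int64_t)disp == cr2;
}
```

Checking the 0x38 displacement of the later `mov 0x38(%rax),%rax` against the same CR2 fails, which is another way of seeing that the fault happened at the first load through %rax, not a later one.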


Now, the source-annotated asm output for nvif/object.c says around here:


call memcpy #
# drivers/gpu/drm/nouveau/nvif/object.c:160: ret = nvif_object_ioctl(object, args, sizeof(*args) + size, NULL);
leal 32(%r14), %edx #, _108
# drivers/gpu/drm/nouveau/nvif/object.c:33: struct nvif_client *client = object->client;
movq 8(%r12), %rax # object_19(D)->client, client
# drivers/gpu/drm/nouveau/nvif/object.c:38: if (size >= sizeof(*args) && args->v0.version == 0) {
cmpl $23, %edx #, _108
jbe .L69 #,
# drivers/gpu/drm/nouveau/nvif/object.c:39: if (object != &client->object)
cmpq %rax, %r12 # client, object
je .L70 #,
# drivers/gpu/drm/nouveau/nvif/object.c:47: return client->driver->ioctl(client->object.priv, data, size, hack);
movq 32(%rax), %rdi # client_109->object.priv, client_109->object.priv


So I'd say that client is NULL. IINM.


movq %r12, 16(%rsp) # object, MEM[(union *)&stack].v0.object
# drivers/gpu/drm/nouveau/nvif/object.c:47: return client->driver->ioctl(client->object.priv, data, size, hack);
movq 56(%rax), %rax # client_109->driver, client_109->driver
# drivers/gpu/drm/nouveau/nvif/object.c:43: args->v0.owner = NVIF_IOCTL_V0_OWNER_ANY;
movb $-1, 6(%rsp) #, MEM[(union *)&stack].v0.owner
.L64:
# drivers/gpu/drm/nouveau/nvif/object.c:47: return client->driver->ioctl(client->object.priv, data, size, hack);
xorl %ecx, %ecx #
movq %rsp, %rsi #,
movq 40(%rax), %rax #, _77->ioctl
call __x86_indirect_thunk_rax
# drivers/gpu/drm/nouveau/nvif/object.c:161: memcpy(data, args->mthd.data, size);
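To spell out why a NULL client faults exactly at 0x20: the annotations show object->client loaded from offset 8, client->object.priv from offset 32 (0x20), client->driver from offset 56 (0x38) and the driver's ioctl pointer from offset 40 (0x28). The user-space sketch below uses hypothetical mock_* structs padded to those offsets, assuming an LP64 target such as x86-64; they are not the real nouveau definitions. Instead of dereferencing, it reports which address the first load through a NULL client would touch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors, padded so member offsets match the displacements
 * in the annotated asm on LP64. NOT the real nouveau definitions. */
struct mock_client;

struct mock_object {
	void *parent;                /* 0x00 */
	struct mock_client *client;  /* 0x08: movq 8(%r12), %rax */
	char pad0[0x10];             /* 0x10: members not involved here */
	void *priv;                  /* 0x20: movq 32(%rax), %rdi (the trap) */
	char pad1[0x10];             /* 0x28: members not involved here */
};

struct mock_driver {
	char pad[0x28];              /* 0x00: members not involved here */
	int (*ioctl)(void *priv, void *data, uint32_t size, void **hack); /* 0x28 */
};

struct mock_client {
	struct mock_object object;        /* 0x00, so &client->object == client */
	const struct mock_driver *driver; /* 0x38: movq 56(%rax), %rax */
};

_Static_assert(offsetof(struct mock_object, client) == 0x08, "client");
_Static_assert(offsetof(struct mock_client, object.priv) == 0x20, "priv");
_Static_assert(offsetof(struct mock_client, driver) == 0x38, "driver");
_Static_assert(offsetof(struct mock_driver, ioctl) == 0x28, "ioctl");

/* Follows the path annotated as object.c:33-47, but returns the address
 * the first load through the client pointer would touch instead of
 * performing it. Returns 0 if object->client is valid. */
uintptr_t first_null_load(const struct mock_object *object)
{
	const struct mock_client *client = object->client; /* line 33 */

	if (client != NULL)
		return 0;
	/* Line 39's "object != &client->object" compiles to the plain compare
	 * cmp %rax,%r12 because object sits at offset 0 of the client. With a
	 * NULL client and a real object the two differ, and the fall-through
	 * path immediately loads client->object.priv for line 47's call:
	 * NULL + 0x20, the CR2 value in the oops. */
	return (uintptr_t)offsetof(struct mock_client, object.priv);
}
```

The same layout also explains why no separate NULL check shows up in the listing: the compare at line 39 only distinguishes "object is the client itself" from "object is something else", and a NULL client falls into the second case.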

> > [ 4.144676] #PF: supervisor read access in kernel mode
> > [ 4.144676] #PF: error_code(0x0000) - not-present page
> > [ 4.144676] PGD 0 P4D 0
> > [ 4.144676] Oops: 0000 [#1] PREEMPT SMP PTI
> > [ 4.144676] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc5-dirty #1
> > [ 4.144676] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
> > [ 4.144676] RIP: 0010:nvif_object_mthd+0x136/0x1e0
> > [ 4.144676] Code: f2 4c 89 ee 48 8d 7c 24 20 66 89 04 24 c6 44 24 18 00 e8 8d 04 4e 00 41 8d 56 20 49 8b 44 24 08 83 fa 17 76 7d 49 39 c4 74 45 <48> 8b 78 20 4c 89 64 24 10 48 8b 40 38 c6 44 24 06 ff 31 c9 48 89

Opcode bytes around RIP look correct too:

./scripts/decodecode < /tmp/oops
[ 4.144676] Code: f2 4c 89 ee 48 8d 7c 24 20 66 89 04 24 c6 44 24 18 00 e8 8d 04 4e 00 41 8d 56 20 49 8b 44 24 08 83 fa 17 76 7d 49 39 c4 74 45 <48> 8b 78 20 4c 89 64 24 10 48 8b 40 38 c6 44 24 06 ff 31 c9 48 89
All code
========
0: f2 4c 89 ee repnz mov %r13,%rsi
4: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi
9: 66 89 04 24 mov %ax,(%rsp)
d: c6 44 24 18 00 movb $0x0,0x18(%rsp)
12: e8 8d 04 4e 00 callq 0x4e04a4
17: 41 8d 56 20 lea 0x20(%r14),%edx
1b: 49 8b 44 24 08 mov 0x8(%r12),%rax
20: 83 fa 17 cmp $0x17,%edx
23: 76 7d jbe 0xa2
25: 49 39 c4 cmp %rax,%r12
28: 74 45 je 0x6f
2a:* 48 8b 78 20 mov 0x20(%rax),%rdi <-- trapping instruction
2e: 4c 89 64 24 10 mov %r12,0x10(%rsp)
33: 48 8b 40 38 mov 0x38(%rax),%rax
37: c6 44 24 06 ff movb $0xff,0x6(%rsp)
3c: 31 c9 xor %ecx,%ecx
3e: 48 rex.W
3f: 89 .byte 0x89


HTH.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-08-09 12:05:09

by Takashi Iwai

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Tue, 08 Aug 2023 12:39:32 +0200,
Karol Herbst wrote:
>
> On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <[email protected]> wrote:
> >
> > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > in what way does it stop? Just not progressing? That would be kinda
> > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > called with and without that patch?
> >
> > Well, me dumping those args I guess made the box not freeze before
> > catching a #PF over serial. Does that help?
> >
> > ....
> > [ 3.410135] Unpacking initramfs...
> > [ 3.416319] software IO TLB: mapped [mem 0x00000000a877d000-0x00000000ac77d000] (64MB)
> > [ 3.418227] Initialise system trusted keyrings
> > [ 3.432273] workingset: timestamp_bits=56 max_order=22 bucket_order=0
> > [ 3.439006] ntfs: driver 2.1.32 [Flags: R/W].
> > [ 3.443368] fuse: init (API version 7.38)
> > [ 3.447601] 9p: Installing v9fs 9p2000 file system support
> > [ 3.453223] Key type asymmetric registered
> > [ 3.457332] Asymmetric key parser 'x509' registered
> > [ 3.462236] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
> > [ 3.475865] efifb: probing for efifb
> > [ 3.479458] efifb: framebuffer at 0xf9000000, using 1920k, total 1920k
> > [ 3.485969] efifb: mode is 800x600x32, linelength=3200, pages=1
> > [ 3.491872] efifb: scrolling: redraw
> > [ 3.495438] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> > [ 3.502349] Console: switching to colour frame buffer device 100x37
> > [ 3.509564] fb0: EFI VGA frame buffer device
> > [ 3.514013] ACPI: \_PR_.CP00: Found 4 idle states
> > [ 3.518850] ACPI: \_PR_.CP01: Found 4 idle states
> > [ 3.523687] ACPI: \_PR_.CP02: Found 4 idle states
> > [ 3.528515] ACPI: \_PR_.CP03: Found 4 idle states
> > [ 3.533346] ACPI: \_PR_.CP04: Found 4 idle states
> > [ 3.538173] ACPI: \_PR_.CP05: Found 4 idle states
> > [ 3.543003] ACPI: \_PR_.CP06: Found 4 idle states
> > [ 3.544219] Freeing initrd memory: 8196K
> > [ 3.547844] ACPI: \_PR_.CP07: Found 4 idle states
> > [ 3.609542] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> > [ 3.616224] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> > [ 3.625552] serial 0000:00:16.3: enabling device (0000 -> 0003)
> > [ 3.633034] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
> > [ 3.642451] Linux agpgart interface v0.103
> > [ 3.647141] ACPI: bus type drm_connector registered
> > [ 3.653261] Console: switching to colour dummy device 80x25
> > [ 3.659092] nouveau 0000:03:00.0: vgaarb: deactivate vga console
> > [ 3.665174] nouveau 0000:03:00.0: NVIDIA GT218 (0a8c00b1)
> > [ 3.784585] nouveau 0000:03:00.0: bios: version 70.18.83.00.08
> > [ 3.792244] nouveau 0000:03:00.0: fb: 512 MiB DDR3
> > [ 3.948786] nouveau 0000:03:00.0: DRM: VRAM: 512 MiB
> > [ 3.953755] nouveau 0000:03:00.0: DRM: GART: 1048576 MiB
> > [ 3.959073] nouveau 0000:03:00.0: DRM: TMDS table version 2.0
> > [ 3.964808] nouveau 0000:03:00.0: DRM: DCB version 4.0
> > [ 3.969938] nouveau 0000:03:00.0: DRM: DCB outp 00: 02000360 00000000
> > [ 3.976367] nouveau 0000:03:00.0: DRM: DCB outp 01: 02000362 00020010
> > [ 3.982792] nouveau 0000:03:00.0: DRM: DCB outp 02: 028003a6 0f220010
> > [ 3.989223] nouveau 0000:03:00.0: DRM: DCB outp 03: 01011380 00000000
> > [ 3.995647] nouveau 0000:03:00.0: DRM: DCB outp 04: 08011382 00020010
> > [ 4.002076] nouveau 0000:03:00.0: DRM: DCB outp 05: 088113c6 0f220010
> > [ 4.008511] nouveau 0000:03:00.0: DRM: DCB conn 00: 00101064
> > [ 4.014151] nouveau 0000:03:00.0: DRM: DCB conn 01: 00202165
> > [ 4.021710] nvkm_uevent_add: uevent: 0xffff888100242100, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > [ 4.033680] nvkm_uevent_add: uevent: 0xffff888100242300, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > [ 4.045429] nouveau 0000:03:00.0: DRM: MM: using COPY for buffer copies
> > [ 4.052059] stackdepot: allocating hash table of 1048576 entries via kvcalloc
> > [ 4.067191] nvkm_uevent_add: uevent: 0xffff888100242800, event: 0xffff888104b3e260, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > [ 4.078936] nvkm_uevent_add: uevent: 0xffff888100242900, event: 0xffff888104b3e260, id: 0x1, bits: 0x1, func: 0x0000000000000000
> > [ 4.090514] nvkm_uevent_add: uevent: 0xffff888100242a00, event: 0xffff888102091f28, id: 0x1, bits: 0x3, func: 0xffffffff8177b700
> > [ 4.102118] tsc: Refined TSC clocksource calibration: 3591.345 MHz
> > [ 4.108342] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33c4635c383, max_idle_ns: 440795314831 ns
> > [ 4.108401] nvkm_uevent_add: uevent: 0xffff8881020b6000, event: 0xffff888102091f28, id: 0xf, bits: 0x3, func: 0xffffffff8177b700
> > [ 4.129864] clocksource: Switched to clocksource tsc
> > [ 4.131478] [drm] Initialized nouveau 1.3.1 20120801 for 0000:03:00.0 on minor 0
> > [ 4.143806] BUG: kernel NULL pointer dereference, address: 0000000000000020
>
> ahh, that would have been good to know :) Mind figuring out what's
> exactly NULL inside nvif_object_mthd? Or rather what line
> `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> out what's wrong here.

FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
https://bugzilla.suse.com/show_bug.cgi?id=1214073
Confirmed that reverting the patch cured the issue.

FWIW, loading nouveau showed a refcount_t warning just before the NULL
dereference:

[ 163.237655] ACPI Warning: \_SB.PCI0.IXVE.IGPU._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20230331/nsarguments-61)
[ 163.237700] ACPI: \_SB_.PCI0.IXVE.IGPU: failed to evaluate _DSM
[ 163.237755] nouveau 0000:02:00.0: enabling device (0002 -> 0003)
[ 163.238089] ACPI: \_SB_.PCI0.LGPU: Enabled at IRQ 20
[ 163.249419] Console: switching to colour dummy device 80x25
[ 163.266174] nouveau 0000:02:00.0: vgaarb: deactivate vga console
[ 163.266307] nouveau 0000:02:00.0: NVIDIA MCP79/MCP7A (0ac180b1)
[ 163.287303] nouveau 0000:02:00.0: bios: version 62.79.40.00.01
[ 163.309529] nouveau 0000:02:00.0: fb: 256 MiB stolen system memory
[ 163.383121] nouveau 0000:02:00.0: DRM: VRAM: 256 MiB
[ 163.383132] nouveau 0000:02:00.0: DRM: GART: 1048576 MiB
[ 163.383138] nouveau 0000:02:00.0: DRM: TMDS table version 2.0
[ 163.383142] nouveau 0000:02:00.0: DRM: DCB version 4.0
[ 163.383145] nouveau 0000:02:00.0: DRM: DCB outp 00: 01000123 00010014
[ 163.383150] nouveau 0000:02:00.0: DRM: DCB outp 01: 02021232 00000010
[ 163.383154] nouveau 0000:02:00.0: DRM: DCB outp 02: 02021286 0f220010
[ 163.383158] nouveau 0000:02:00.0: DRM: DCB conn 00: 00000040
[ 163.383162] nouveau 0000:02:00.0: DRM: DCB conn 01: 0000a146
[ 163.385635] nouveau 0000:02:00.0: DRM: MM: using M2MF for buffer copies
[ 163.417977] ------------[ cut here ]------------
[ 163.417988] refcount_t: saturated; leaking memory.
[ 163.418012] WARNING: CPU: 1 PID: 2873 at lib/refcount.c:19 refcount_warn_saturate+0x9b/0x110
[ 163.418022] Modules linked in: nouveau(+) button mxm_wmi i2c_algo_bit drm_display_helper drm_ttm_helper xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter bpfilter br_netfilter bridge stp llc overlay ccm af_packet bnep btusb btrtl btbcm btintel btmtk bluetooth ecdh_generic uvcvideo rtl8xxxu videobuf2_vmalloc mac80211 uvc videobuf2_memops videobuf2_v4l2 videodev libarc4 videobuf2_common mc cfg80211 hid_appleir hid_apple bcm5974 apple_mfi_fastcharge iscsi_ibft iscsi_boot_sysfs joydev rfkill qrtr z3fold snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec kvm_intel snd_hda_core applesmc snd_hwdep kvm snd_pcm irqbypass pcspkr acpi_cpufreq snd_timer binfmt_misc snd forcedeth soundcore squashfs nls_iso8859_1 loop nls_cp437 vfat fat i2c_nforce2 acpi_als industrialio_triggered_buffer kfifo_buf industrialio sbs sbshc apple_bl ac
[ 163.418129] tiny_power_button fuse efi_pstore configfs dmi_sysfs ip_tables x_tables hid_generic usbhid ttm video wmi cec ohci_pci ohci_hcd ehci_pci rc_core sr_mod ehci_hcd sha512_ssse3 cdrom usbcore nv_tco btrfs blake2b_generic libcrc32c xor raid6_pq sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs [last unloaded: button]
[ 163.418177] CPU: 1 PID: 2873 Comm: modprobe Not tainted 6.4.8-1-default #1 openSUSE Tumbleweed 5f0d78911475bf45bbeef64510275b9fba2542b1
[ 163.418183] Hardware name: Apple Inc. MacBook5,1/Mac-F42D89C8, BIOS MB51.88Z.007D.B03.0904271443 04/27/09
[ 163.418187] RIP: 0010:refcount_warn_saturate+0x9b/0x110
[ 163.418192] Code: 01 01 e8 68 7b aa ff 0f 0b c3 cc cc cc cc 80 3d e6 e1 a8 01 00 75 a8 48 c7 c7 d0 ed 05 a4 c6 05 d6 e1 a8 01 01 e8 45 7b aa ff <0f> 0b c3 cc cc cc cc 80 3d c0 e1 a8 01 00 75 85 48 c7 c7 28 ee 05
[ 163.418196] RSP: 0018:ffffbae941613aa0 EFLAGS: 00010086
[ 163.418200] RAX: 0000000000000000 RBX: ffff951bc88c2000 RCX: 0000000000000027
[ 163.418204] RDX: ffff951cf81274c8 RSI: 0000000000000001 RDI: ffff951cf81274c0
[ 163.418207] RBP: 0000000000000246 R08: 0000000000000000 R09: ffffbae941613948
[ 163.418210] R10: 0000000000000003 R11: ffffffffa4958d48 R12: ffff951be0df3a58
[ 163.418213] R13: ffffbae941613ad8 R14: ffff951be0df3800 R15: 0000000000000000
[ 163.418216] FS: 00007f81c0247740(0000) GS:ffff951cf8100000(0000) knlGS:0000000000000000
[ 163.418220] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 163.418223] CR2: 00005653d818cd64 CR3: 000000012c72a000 CR4: 00000000000006e0
[ 163.418226] Call Trace:
[ 163.418231] <TASK>
[ 163.418234] ? refcount_warn_saturate+0x9b/0x110
[ 163.418238] ? __warn+0x81/0x130
[ 163.418248] ? refcount_warn_saturate+0x9b/0x110
[ 163.418252] ? report_bug+0x171/0x1a0
[ 163.418259] ? handle_bug+0x3c/0x80
[ 163.418264] ? exc_invalid_op+0x17/0x70
[ 163.418268] ? asm_exc_invalid_op+0x1a/0x20
[ 163.418275] ? refcount_warn_saturate+0x9b/0x110
[ 163.418279] drm_connector_list_iter_next+0x97/0xc0
[ 163.418289] drm_connector_register_all+0x3d/0xf0
[ 163.418296] drm_modeset_register_all+0x5f/0x80
[ 163.418302] drm_dev_register+0x114/0x240
[ 163.418307] nouveau_drm_probe+0x16a/0x280 [nouveau 7f21e95875a4a0137564007ae3277f6b641e9279]
[ 163.418713] local_pci_probe+0x45/0xa0
[ 163.418719] pci_device_probe+0xc7/0x230
[ 163.418726] really_probe+0x19e/0x3e0
[ 163.418730] ? __pfx___driver_attach+0x10/0x10
[ 163.418734] __driver_probe_device+0x78/0x160
[ 163.418737] driver_probe_device+0x1f/0x90
[ 163.418741] __driver_attach+0xd2/0x1c0
[ 163.418745] bus_for_each_dev+0x77/0xc0
[ 163.418751] bus_add_driver+0x116/0x220
[ 163.418757] driver_register+0x59/0x100
[ 163.418762] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau 7f21e95875a4a0137564007ae3277f6b641e9279]
[ 163.418999] do_one_initcall+0x4a/0x220
[ 163.418999] ? kmalloc_trace+0x2a/0xa0
[ 163.418999] do_init_module+0x60/0x240
[ 163.418999] __do_sys_init_module+0x17f/0x1b0
[ 163.418999] do_syscall_64+0x60/0x90
[ 163.418999] ? syscall_exit_to_user_mode+0x1b/0x40
[ 163.418999] ? do_syscall_64+0x6c/0x90
[ 163.418999] ? count_memcg_events.constprop.0+0x1a/0x30
[ 163.418999] ? handle_mm_fault+0x9e/0x350
[ 163.418999] ? do_user_addr_fault+0x179/0x640
[ 163.418999] ? exc_page_fault+0x71/0x160
[ 163.418999] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 163.418999] RIP: 0033:0x7f81bfb19a5e
[ 163.418999] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7a 03 0d 00 f7 d8 64 89 01 48
[ 163.418999] RSP: 002b:00007ffddb1760c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 163.418999] RAX: ffffffffffffffda RBX: 0000560bc842df00 RCX: 00007f81bfb19a5e
[ 163.418999] RDX: 0000560bc8432900 RSI: 00000000006aef9b RDI: 00007f81bea94010
[ 163.418999] RBP: 0000560bc8432900 R08: 0000560bc8432c20 R09: 0000000000000000
[ 163.418999] R10: 0000000000012b71 R11: 0000000000000246 R12: 0000000000040000
[ 163.418999] R13: 0000000000000000 R14: 0000000000000009 R15: 0000560bc842d7b0
[ 163.418999] </TASK>
[ 163.418999] ---[ end trace 0000000000000000 ]---

The full dmesg is found in
https://bugzilla.suse.com/attachment.cgi?id=868688


thanks,

Takashi

2023-08-09 12:10:31

by Karol Herbst

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Wed, Aug 9, 2023 at 11:22 AM Takashi Iwai <[email protected]> wrote:
>
> On Tue, 08 Aug 2023 12:39:32 +0200,
> Karol Herbst wrote:
> >
> > On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <[email protected]> wrote:
> > >
> > > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > > in what way does it stop? Just not progressing? That would be kinda
> > > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > > called with and without that patch?
> > >
> > > Well, me dumping those args I guess made the box not freeze before
> > > catching a #PF over serial. Does that help?
> > >
> > > ....
> > > [ 4.143806] BUG: kernel NULL pointer dereference, address: 0000000000000020
> >
> > ahh, that would have been good to know :) Mind figuring out what's
> > exactly NULL inside nvif_object_mthd? Or rather what line
> > `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> > out what's wrong here.
>
> FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
> https://bugzilla.suse.com/show_bug.cgi?id=1214073
> Confirmed that reverting the patch cured the issue.
>
> FWIW, loading nouveau showed a refcount_t warning just before the NULL
> dereference:
>

mh, I wonder if one of those `return -EINVAL;` branches is hit where
it wasn't before. Could one of you check whether `nvkm_uconn_uevent`
returns -EINVAL with that patch where it didn't before? I wonder if
it's the `if (&outp->head == &conn->disp->outps) return -EINVAL;` branch,
and whether removing it fixes the crash?

> [...]


2023-08-09 12:20:13

by Takashi Iwai

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Wed, 09 Aug 2023 13:42:09 +0200,
Karol Herbst wrote:
>
> On Wed, Aug 9, 2023 at 11:22 AM Takashi Iwai <[email protected]> wrote:
> >
> > On Tue, 08 Aug 2023 12:39:32 +0200,
> > Karol Herbst wrote:
> > >
> > > On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <[email protected]> wrote:
> > > >
> > > > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > > > in what way does it stop? Just not progressing? That would be kinda
> > > > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > > > called with and without that patch?
> > > >
> > > > Well, me dumping those args I guess made the box not freeze before
> > > > catching a #PF over serial. Does that help?
> > > >
> > > > ....
> > > > [ 4.143806] BUG: kernel NULL pointer dereference, address: 0000000000000020
> > >
> > > ahh, that would have been good to know :) Mind figuring out what's
> > > exactly NULL inside nvif_object_mthd? Or rather what line
> > > `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> > > out what's wrong here.
> >
> > FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
> > https://bugzilla.suse.com/show_bug.cgi?id=1214073
> > Confirmed that reverting the patch cured the issue.
> >
> > FWIW, loading nouveau showed a refcount_t warning just before the NULL
> > dereference:
> >
>
> mh, I wonder if one of those `return -EINVAL;` branches is hit where
> it wasn't before. Could some of you check if `nvkm_uconn_uevent`
> returns -EINVAL with that patch where it didn't before? I wonder if
> it's the `if (&outp->head == &conn->disp->outps) return -EINVAL;` and
> if removing that fixes the crash?

Please give a patch, then I can build a kernel and let the reporter
test it :)


thanks,

Takashi
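Karol's request to map `nvif_object_mthd+0x136` to a source line is the key step, and the fault address in the log above already narrows things down: a NULL pointer dereference at address `0000000000000020` most often means the code read a member at offset 0x20 through a struct pointer that was NULL. The sketch below shows that arithmetic with a made-up struct (`hypothetical_obj` and `member_at()` are illustrative only and are not the real `nvif_object` layout), assuming an LP64 ABI such as x86-64, where pointers are 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, chosen only so that one member starts at 0x20 on
 * an LP64 ABI. It is NOT the real nvif_object. */
struct hypothetical_obj {
	void *a;              /* offset 0x00 */
	void *b;              /* offset 0x08 */
	const char *name;     /* offset 0x10 */
	uint32_t handle;      /* offset 0x18 */
	int32_t oclass;       /* offset 0x1c */
	void *ptr_at_0x20;    /* offset 0x20 */
};

/* offsetof() is an integer constant expression, so it can initialise a
 * static table of member offsets. */
static const struct {
	size_t offset;
	const char *name;
} hypothetical_obj_members[] = {
	{ offsetof(struct hypothetical_obj, a), "a" },
	{ offsetof(struct hypothetical_obj, b), "b" },
	{ offsetof(struct hypothetical_obj, name), "name" },
	{ offsetof(struct hypothetical_obj, handle), "handle" },
	{ offsetof(struct hypothetical_obj, oclass), "oclass" },
	{ offsetof(struct hypothetical_obj, ptr_at_0x20), "ptr_at_0x20" },
};

/* Reading ptr->member with ptr == NULL touches address 0 + offset.
 * Return the member that starts at 'fault_addr', or NULL if none does. */
static const char *member_at(uintptr_t fault_addr)
{
	size_t n = sizeof(hypothetical_obj_members) /
		   sizeof(hypothetical_obj_members[0]);

	/* The offsets in the comments above assume 8-byte pointers. */
	assert(sizeof(void *) == 8);

	for (size_t i = 0; i < n; i++)
		if (hypothetical_obj_members[i].offset == fault_addr)
			return hypothetical_obj_members[i].name;
	return NULL;
}
```

The same lookup works against the real struct definition once the faulting instruction in `nvif_object_mthd` has been identified; `pahole` from the dwarves package prints member offsets directly.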

2023-08-09 12:47:28

by Karol Herbst

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Wed, Aug 9, 2023 at 1:46 PM Takashi Iwai <[email protected]> wrote:
>
> On Wed, 09 Aug 2023 13:42:09 +0200,
> Karol Herbst wrote:
> >
> > On Wed, Aug 9, 2023 at 11:22 AM Takashi Iwai <[email protected]> wrote:
> > >
> > > On Tue, 08 Aug 2023 12:39:32 +0200,
> > > Karol Herbst wrote:
> > > >
> > > > On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <[email protected]> wrote:
> > > > >
> > > > > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > > > > in what way does it stop? Just not progressing? That would be kinda
> > > > > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > > > > called with and without that patch?
> > > > >
> > > > > Well, me dumping those args I guess made the box not freeze before
> > > > > catching a #PF over serial. Does that help?
> > > > >
> > > > > ....
> > > > > [ 3.410135] Unpacking initramfs...
> > > > > [ 3.416319] software IO TLB: mapped [mem 0x00000000a877d000-0x00000000ac77d000] (64MB)
> > > > > [ 3.418227] Initialise system trusted keyrings
> > > > > [ 3.432273] workingset: timestamp_bits=56 max_order=22 bucket_order=0
> > > > > [ 3.439006] ntfs: driver 2.1.32 [Flags: R/W].
> > > > > [ 3.443368] fuse: init (API version 7.38)
> > > > > [ 3.447601] 9p: Installing v9fs 9p2000 file system support
> > > > > [ 3.453223] Key type asymmetric registered
> > > > > [ 3.457332] Asymmetric key parser 'x509' registered
> > > > > [ 3.462236] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
> > > > > [ 3.475865] efifb: probing for efifb
> > > > > [ 3.479458] efifb: framebuffer at 0xf9000000, using 1920k, total 1920k
> > > > > [ 3.485969] efifb: mode is 800x600x32, linelength=3200, pages=1
> > > > > [ 3.491872] efifb: scrolling: redraw
> > > > > [ 3.495438] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> > > > > [ 3.502349] Console: switching to colour frame buffer device 100x37
> > > > > [ 3.509564] fb0: EFI VGA frame buffer device
> > > > > [ 3.514013] ACPI: \_PR_.CP00: Found 4 idle states
> > > > > [ 3.518850] ACPI: \_PR_.CP01: Found 4 idle states
> > > > > [ 3.523687] ACPI: \_PR_.CP02: Found 4 idle states
> > > > > [ 3.528515] ACPI: \_PR_.CP03: Found 4 idle states
> > > > > [ 3.533346] ACPI: \_PR_.CP04: Found 4 idle states
> > > > > [ 3.538173] ACPI: \_PR_.CP05: Found 4 idle states
> > > > > [ 3.543003] ACPI: \_PR_.CP06: Found 4 idle states
> > > > > [ 3.544219] Freeing initrd memory: 8196K
> > > > > [ 3.547844] ACPI: \_PR_.CP07: Found 4 idle states
> > > > > [ 3.609542] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> > > > > [ 3.616224] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> > > > > [ 3.625552] serial 0000:00:16.3: enabling device (0000 -> 0003)
> > > > > [ 3.633034] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
> > > > > [ 3.642451] Linux agpgart interface v0.103
> > > > > [ 3.647141] ACPI: bus type drm_connector registered
> > > > > [ 3.653261] Console: switching to colour dummy device 80x25
> > > > > [ 3.659092] nouveau 0000:03:00.0: vgaarb: deactivate vga console
> > > > > [ 3.665174] nouveau 0000:03:00.0: NVIDIA GT218 (0a8c00b1)
> > > > > [ 3.784585] nouveau 0000:03:00.0: bios: version 70.18.83.00.08
> > > > > [ 3.792244] nouveau 0000:03:00.0: fb: 512 MiB DDR3
> > > > > [ 3.948786] nouveau 0000:03:00.0: DRM: VRAM: 512 MiB
> > > > > [ 3.953755] nouveau 0000:03:00.0: DRM: GART: 1048576 MiB
> > > > > [ 3.959073] nouveau 0000:03:00.0: DRM: TMDS table version 2.0
> > > > > [ 3.964808] nouveau 0000:03:00.0: DRM: DCB version 4.0
> > > > > [ 3.969938] nouveau 0000:03:00.0: DRM: DCB outp 00: 02000360 00000000
> > > > > [ 3.976367] nouveau 0000:03:00.0: DRM: DCB outp 01: 02000362 00020010
> > > > > [ 3.982792] nouveau 0000:03:00.0: DRM: DCB outp 02: 028003a6 0f220010
> > > > > [ 3.989223] nouveau 0000:03:00.0: DRM: DCB outp 03: 01011380 00000000
> > > > > [ 3.995647] nouveau 0000:03:00.0: DRM: DCB outp 04: 08011382 00020010
> > > > > [ 4.002076] nouveau 0000:03:00.0: DRM: DCB outp 05: 088113c6 0f220010
> > > > > [ 4.008511] nouveau 0000:03:00.0: DRM: DCB conn 00: 00101064
> > > > > [ 4.014151] nouveau 0000:03:00.0: DRM: DCB conn 01: 00202165
> > > > > [ 4.021710] nvkm_uevent_add: uevent: 0xffff888100242100, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > [ 4.033680] nvkm_uevent_add: uevent: 0xffff888100242300, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > [ 4.045429] nouveau 0000:03:00.0: DRM: MM: using COPY for buffer copies
> > > > > [ 4.052059] stackdepot: allocating hash table of 1048576 entries via kvcalloc
> > > > > [ 4.067191] nvkm_uevent_add: uevent: 0xffff888100242800, event: 0xffff888104b3e260, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > [ 4.078936] nvkm_uevent_add: uevent: 0xffff888100242900, event: 0xffff888104b3e260, id: 0x1, bits: 0x1, func: 0x0000000000000000
> > > > > [ 4.090514] nvkm_uevent_add: uevent: 0xffff888100242a00, event: 0xffff888102091f28, id: 0x1, bits: 0x3, func: 0xffffffff8177b700
> > > > > [ 4.102118] tsc: Refined TSC clocksource calibration: 3591.345 MHz
> > > > > [ 4.108342] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33c4635c383, max_idle_ns: 440795314831 ns
> > > > > [ 4.108401] nvkm_uevent_add: uevent: 0xffff8881020b6000, event: 0xffff888102091f28, id: 0xf, bits: 0x3, func: 0xffffffff8177b700
> > > > > [ 4.129864] clocksource: Switched to clocksource tsc
> > > > > [ 4.131478] [drm] Initialized nouveau 1.3.1 20120801 for 0000:03:00.0 on minor 0
> > > > > [ 4.143806] BUG: kernel NULL pointer dereference, address: 0000000000000020
> > > >
> > > > ahh, that would have been good to know :) Mind figuring out what's
> > > > exactly NULL inside nvif_object_mthd? Or rather what line
> > > > `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> > > > out what's wrong here.
> > >
> > > FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
> > > https://bugzilla.suse.com/show_bug.cgi?id=1214073
> > > Confirmed that reverting the patch cured the issue.
> > >
> > > FWIW, loading nouveau showed a refcount_t warning just before the NULL
> > > dereference:
> > >
> >
> > mh, I wonder if one of those `return -EINVAL;` branches is hit where
> > it wasn't before. Could some of you check if `nvkm_uconn_uevent`
> > returns -EINVAL with that patch where it didn't before? I wonder if
> > it's the `if (&outp->head == &conn->disp->outps) return -EINVAL;` and
> > if removing that fixes the crash?
>
> Please give a patch, then I can build a kernel and let the reporter
> test it :)
>

attached a patch.

Anyway, I'll be on PTO for the rest of the week and I kinda wished
somebody else would have time to figure out what's going wrong there,
or at least simply figure out what the difference is. Not having
direct access to such a GPU also makes it a bit harder. Once I'm back
I'll check with all my GPUs if there is one hitting a difference here,
but the ones I've tested it with so far were all fine sadly.

>
> thanks,
>
> Takashi
>


Attachments:
tmp.patch (627.00 B)
Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On 09.08.23 15:13, Takashi Iwai wrote:
>
> If this can't be fixed quickly, I suppose it's safer to revert it from
> 6.4.y for now. 6.5 is still being cooked, but 6.4.x is already in
> wide deployment, hence the regression has to be addressed quickly.

Good luck with that. To quote
https://docs.kernel.org/process/handling-regressions.html :

```
Regarding stable and longterm kernels:

[...]

* Whenever you want to swiftly resolve a regression that recently also
made it into a proper mainline, stable, or longterm release, fix it
quickly in mainline; when appropriate thus involve Linus to fast-track
the fix (see above). That's because the stable team normally does
neither revert nor fix any changes that cause the same problems in mainline.
```

Note the "normally" in there, so there is a chance.

Ciao, Thorsten

2023-08-09 15:05:09

by Takashi Iwai

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Wed, 09 Aug 2023 15:13:23 +0200,
Takashi Iwai wrote:
>
> On Wed, 09 Aug 2023 14:19:23 +0200,
> Karol Herbst wrote:
> >
> > On Wed, Aug 9, 2023 at 1:46 PM Takashi Iwai <[email protected]> wrote:
> > >
> > > On Wed, 09 Aug 2023 13:42:09 +0200,
> > > Karol Herbst wrote:
> > > >
> > > > On Wed, Aug 9, 2023 at 11:22 AM Takashi Iwai <[email protected]> wrote:
> > > > >
> > > > > On Tue, 08 Aug 2023 12:39:32 +0200,
> > > > > Karol Herbst wrote:
> > > > > >
> > > > > > On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <[email protected]> wrote:
> > > > > > >
> > > > > > > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > > > > > > in what way does it stop? Just not progressing? That would be kinda
> > > > > > > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > > > > > > called with and without that patch?
> > > > > > >
> > > > > > > Well, me dumping those args I guess made the box not freeze before
> > > > > > > catching a #PF over serial. Does that help?
> > > > > > >
> > > > > > > ....
> > > > > >
> > > > > > ahh, that would have been good to know :) Mind figuring out what's
> > > > > > exactly NULL inside nvif_object_mthd? Or rather what line
> > > > > > `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> > > > > > out what's wrong here.
> > > > >
> > > > > FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
> > > > > https://bugzilla.suse.com/show_bug.cgi?id=1214073
> > > > > Confirmed that reverting the patch cured the issue.
> > > > >
> > > > > FWIW, loading nouveau showed a refcount_t warning just before the NULL
> > > > > dereference:
> > > > >
> > > >
> > > > mh, I wonder if one of those `return -EINVAL;` branches is hit where
> > > > it wasn't before. Could some of you check if `nvkm_uconn_uevent`
> > > > returns -EINVAL with that patch where it didn't before? I wonder if
> > > > it's the `if (&outp->head == &conn->disp->outps) return -EINVAL;` and
> > > > if removing that fixes the crash?
> > >
> > > Please give a patch, then I can build a kernel and let the reporter
> > > test it :)
> > >
> >
> > attached a patch.
>
> Thanks. Now I'm building a test kernel and have asked the reporter to
> test it.

And the result was negative: the boot still hung.


Takashi

2023-08-09 15:11:33

by Takashi Iwai

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Wed, 09 Aug 2023 14:19:23 +0200,
Karol Herbst wrote:
>
> On Wed, Aug 9, 2023 at 1:46 PM Takashi Iwai <[email protected]> wrote:
> >
> > On Wed, 09 Aug 2023 13:42:09 +0200,
> > Karol Herbst wrote:
> > >
> > > On Wed, Aug 9, 2023 at 11:22 AM Takashi Iwai <[email protected]> wrote:
> > > >
> > > > On Tue, 08 Aug 2023 12:39:32 +0200,
> > > > Karol Herbst wrote:
> > > > >
> > > > > On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > > > > > in what way does it stop? Just not progressing? That would be kinda
> > > > > > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > > > > > called with and without that patch?
> > > > > >
> > > > > > Well, me dumping those args I guess made the box not freeze before
> > > > > > catching a #PF over serial. Does that help?
> > > > > >
> > > > > > ....
> > > > >
> > > > > ahh, that would have been good to know :) Mind figuring out what's
> > > > > exactly NULL inside nvif_object_mthd? Or rather what line
> > > > > `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> > > > > out what's wrong here.
> > > >
> > > > FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
> > > > https://bugzilla.suse.com/show_bug.cgi?id=1214073
> > > > Confirmed that reverting the patch cured the issue.
> > > >
> > > > FWIW, loading nouveau showed a refcount_t warning just before the NULL
> > > > dereference:
> > > >
> > >
> > > mh, I wonder if one of those `return -EINVAL;` branches is hit where
> > > it wasn't before. Could some of you check if `nvkm_uconn_uevent`
> > > returns -EINVAL with that patch where it didn't before? I wonder if
> > > it's the `if (&outp->head == &conn->disp->outps) return -EINVAL;` and
> > > if removing that fixes the crash?
> >
> > Please give a patch, then I can build a kernel and let the reporter
> > test it :)
> >
>
> attached a patch.

Thanks. Now I'm building a test kernel and have asked the reporter to
test it.

> Anyway, I'll be on PTO for the rest of the week and I kinda wished
> somebody else would have time to figure out what's going wrong there,
> or at least simply figure out what the difference is. Not having
> direct access to such a GPU also makes it a bit harder. Once I'm back
> I'll check with all my GPUs if there is one hitting a difference here,
> but the ones I've tested it with so far were all fine sadly.

If this can't be fixed quickly, I suppose it's safer to revert it from
6.4.y for now. 6.5 is still being cooked, but 6.4.x is already in
wide deployment, hence the regression has to be addressed quickly.


Takashi

>
> >
> > thanks,
> >
> > Takashi
> >
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/uconn.c b/drivers/gpu/drm/nouveau/nvkm/engine/disp/uconn.c
> index 46b057fe1412e..3666dfb7ecbf4 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/uconn.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/uconn.c
> @@ -85,8 +85,8 @@ nvkm_uconn_uevent(struct nvkm_object *object, void *argv, u32 argc, struct nvkm_
>  			break;
>  	}
>
> -	if (&outp->head == &conn->disp->outps)
> -		return -EINVAL;
> +//	if (&outp->head == &conn->disp->outps)
> +//		return -EINVAL;
>
>  	if (outp->dp.aux && !outp->info.location) {
>  		if (args->v0.types & NVIF_CONN_EVENT_V0_PLUG ) bits |= NVKM_I2C_PLUG;

2023-08-09 16:59:17

by Takashi Iwai

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Wed, 09 Aug 2023 16:46:38 +0200,
Takashi Iwai wrote:
>
> On Wed, 09 Aug 2023 15:13:23 +0200,
> Takashi Iwai wrote:
> >
> > On Wed, 09 Aug 2023 14:19:23 +0200,
> > Karol Herbst wrote:
> > >
> > > On Wed, Aug 9, 2023 at 1:46 PM Takashi Iwai <[email protected]> wrote:
> > > >
> > > > On Wed, 09 Aug 2023 13:42:09 +0200,
> > > > Karol Herbst wrote:
> > > > >
> > > > > On Wed, Aug 9, 2023 at 11:22 AM Takashi Iwai <[email protected]> wrote:
> > > > > >
> > > > > > On Tue, 08 Aug 2023 12:39:32 +0200,
> > > > > > Karol Herbst wrote:
> > > > > > >
> > > > > > > On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > > > > > > > in what way does it stop? Just not progressing? That would be kinda
> > > > > > > > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > > > > > > > called with and without that patch?
> > > > > > > >
> > > > > > > > Well, me dumping those args I guess made the box not freeze before
> > > > > > > > catching a #PF over serial. Does that help?
> > > > > > > >
> > > > > > > > ....
> > > > > > >
> > > > > > > ahh, that would have been good to know :) Mind figuring out what's
> > > > > > > exactly NULL inside nvif_object_mthd? Or rather what line
> > > > > > > `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> > > > > > > out what's wrong here.
> > > > > >
> > > > > > FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
> > > > > > https://bugzilla.suse.com/show_bug.cgi?id=1214073
> > > > > > Confirmed that reverting the patch cured the issue.
> > > > > >
> > > > > > FWIW, loading nouveau showed a refcount_t warning just before the NULL
> > > > > > dereference:
> > > > > >
> > > > >
> > > > > mh, I wonder if one of those `return -EINVAL;` branches is hit where
> > > > > it wasn't before. Could some of you check if `nvkm_uconn_uevent`
> > > > > returns -EINVAL with that patch where it didn't before? I wonder if
> > > > > it's the `if (&outp->head == &conn->disp->outps) return -EINVAL;` and
> > > > > if removing that fixes the crash?
> > > >
> > > > Please give a patch, then I can build a kernel and let the reporter
> > > > test it :)
> > > >
> > >
> > > attached a patch.
> >
> > Thanks. Now I'm building a test kernel and have asked the reporter to
> > test it.
>
> And the result was negative; the boot still hung.

And below is another log from the 6.4.8 kernel with KASAN enabled.
Some memory corruption seems to be happening.

[ 228.422919] nouveau 0000:02:00.0: DRM: DCB conn 01: 0000a146
[ 228.428674] nouveau 0000:02:00.0: DRM: MM: using M2MF for buffer copies
[ 228.436682] ==================================================================
[ 228.436698] BUG: KASAN: slab-use-after-free in drm_connector_list_iter_next+0x176/0x320
[ 228.436715] Read of size 4 at addr ffff8881731ce050 by task modprobe/6174

[ 228.436728] CPU: 0 PID: 6174 Comm: modprobe Not tainted 6.4.9-4.g5b9ad20-default #1 openSUSE Tumbleweed (unreleased) d0a6841e538b38d17513f6942fb58770372b54fd
[ 228.436740] Hardware name: Apple Inc. MacBook5,1/Mac-F42D89C8, BIOS MB51.88Z.007D.B03.0904271443 04/27/09
[ 228.436747] Call Trace:
[ 228.436753] <TASK>
[ 228.436759] dump_stack_lvl+0x47/0x60
[ 228.436773] print_report+0xcf/0x640
[ 228.436784] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 228.436797] ? drm_connector_list_iter_next+0x176/0x320
[ 228.436807] kasan_report+0xb1/0xe0
[ 228.436817] ? drm_connector_list_iter_next+0x176/0x320
[ 228.436828] kasan_check_range+0x105/0x1b0
[ 228.436837] drm_connector_list_iter_next+0x176/0x320
[ 228.436848] ? __pfx_drm_connector_list_iter_next+0x10/0x10
[ 228.436859] ? __kmem_cache_free+0x18a/0x2c0
[ 228.436868] nouveau_connector_create+0x170/0x1cd0 [nouveau d0287dfba9984367c331e8149297392f67038244]
[ 228.437540] ? drm_encoder_init+0xbe/0x140
[ 228.437554] ? __pfx_nouveau_connector_create+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
[ 228.438137] ? nvif_outp_ctor+0x2d9/0x430 [nouveau d0287dfba9984367c331e8149297392f67038244]
[ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau d0287dfba9984367c331e8149297392f67038244]
[ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau d0287dfba9984367c331e8149297392f67038244]
[ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau d0287dfba9984367c331e8149297392f67038244]
[ 228.438236] ? __pfx_nouveau_drm_device_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
[ 228.438236] ? __pfx_pci_update_current_state+0x10/0x10
[ 228.438236] ? __kasan_check_byte+0x13/0x50
[ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau d0287dfba9984367c331e8149297392f67038244]
[ 228.438236] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 228.438236] ? __pfx_nouveau_drm_probe+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
[ 228.438236] ? __pfx_nouveau_drm_probe+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
[ 228.438236] local_pci_probe+0xdd/0x190
[ 228.438236] pci_device_probe+0x23a/0x770
[ 228.438236] ? kernfs_add_one+0x2d8/0x450
[ 228.438236] ? kernfs_get.part.0+0x4c/0x70
[ 228.438236] ? __pfx_pci_device_probe+0x10/0x10
[ 228.438236] ? kernfs_create_link+0x15f/0x230
[ 228.438236] ? kernfs_put+0x1c/0x40
[ 228.438236] ? sysfs_do_create_link_sd+0x8e/0x100
[ 228.438236] really_probe+0x3e2/0xb80
[ 228.438236] __driver_probe_device+0x18c/0x450
[ 228.438236] ? __pfx_klist_iter_init_node+0x10/0x10
[ 228.438236] driver_probe_device+0x4a/0x120
[ 228.438236] __driver_attach+0x1e1/0x4a0
[ 228.438236] ? __pfx___driver_attach+0x10/0x10
[ 228.438236] bus_for_each_dev+0xf4/0x170
[ 228.438236] ? __pfx__raw_spin_lock+0x10/0x10
[ 228.438236] ? __pfx_bus_for_each_dev+0x10/0x10
[ 228.438236] bus_add_driver+0x29e/0x570
[ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
[ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
[ 228.438236] driver_register+0x134/0x460
[ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
[ 228.438236] do_one_initcall+0x8e/0x310
[ 228.438236] ? __pfx_do_one_initcall+0x10/0x10
[ 228.438236] ? __kmem_cache_alloc_node+0x1b9/0x3b0
[ 228.438236] ? do_init_module+0x4b/0x730
[ 228.438236] ? kasan_unpoison+0x44/0x70
[ 228.438236] do_init_module+0x238/0x730
[ 228.438236] load_module+0x5b41/0x6dd0
[ 228.438236] ? __pfx_load_module+0x10/0x10
[ 228.438236] ? _raw_spin_lock+0x85/0xe0
[ 228.438236] ? __pfx__raw_spin_lock+0x10/0x10
[ 228.438236] ? find_vmap_area+0xab/0xe0
[ 228.438236] ? __do_sys_init_module+0x1df/0x210
[ 228.438236] __do_sys_init_module+0x1df/0x210
[ 228.438236] ? __pfx___do_sys_init_module+0x10/0x10
[ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
[ 228.438236] ? do_syscall_64+0x6c/0x90
[ 228.438236] ? __pfx_ksys_read+0x10/0x10
[ 228.438236] do_syscall_64+0x60/0x90
[ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
[ 228.438236] ? do_syscall_64+0x6c/0x90
[ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
[ 228.438236] ? do_syscall_64+0x6c/0x90
[ 228.438236] ? exc_page_fault+0x62/0xd0
[ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
[ 228.438236] RIP: 0033:0x7f91ce119a5e
[ 228.438236] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7a 03 0d 00 f7 d8 64 89 01 48
[ 228.438236] RSP: 002b:00007ffce2813538 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 228.438236] RAX: ffffffffffffffda RBX: 00005588462def10 RCX: 00007f91ce119a5e
[ 228.438236] RDX: 00005588462e39c0 RSI: 0000000000fda8b2 RDI: 00007f91cc371010
[ 228.438236] RBP: 00005588462e39c0 R08: 00005588462e3ce0 R09: 0000000000000000
[ 228.438236] R10: 000000000005af11 R11: 0000000000000246 R12: 0000000000040000
[ 228.438236] R13: 0000000000000000 R14: 0000000000000009 R15: 00005588462de7c0
[ 228.438236] </TASK>

[ 228.438236] Allocated by task 6174:
[ 228.438236] kasan_save_stack+0x20/0x40
[ 228.438236] kasan_set_track+0x25/0x30
[ 228.438236] __kasan_kmalloc+0xaa/0xb0
[ 228.438236] nouveau_connector_create+0x386/0x1cd0 [nouveau]
[ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau]
[ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau]
[ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau]
[ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau]
[ 228.438236] local_pci_probe+0xdd/0x190
[ 228.438236] pci_device_probe+0x23a/0x770
[ 228.438236] really_probe+0x3e2/0xb80
[ 228.438236] __driver_probe_device+0x18c/0x450
[ 228.438236] driver_probe_device+0x4a/0x120
[ 228.438236] __driver_attach+0x1e1/0x4a0
[ 228.438236] bus_for_each_dev+0xf4/0x170
[ 228.438236] bus_add_driver+0x29e/0x570
[ 228.438236] driver_register+0x134/0x460
[ 228.438236] do_one_initcall+0x8e/0x310
[ 228.438236] do_init_module+0x238/0x730
[ 228.438236] load_module+0x5b41/0x6dd0
[ 228.438236] __do_sys_init_module+0x1df/0x210
[ 228.438236] do_syscall_64+0x60/0x90
[ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1

[ 228.438236] Freed by task 6174:
[ 228.438236] kasan_save_stack+0x20/0x40
[ 228.438236] kasan_set_track+0x25/0x30
[ 228.438236] kasan_save_free_info+0x2e/0x50
[ 228.438236] ____kasan_slab_free+0x169/0x1c0
[ 228.438236] slab_free_freelist_hook+0xcd/0x190
[ 228.438236] __kmem_cache_free+0x18a/0x2c0
[ 228.438236] nouveau_connector_create+0x1423/0x1cd0 [nouveau]
[ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau]
[ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau]
[ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau]
[ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau]
[ 228.438236] local_pci_probe+0xdd/0x190
[ 228.438236] pci_device_probe+0x23a/0x770
[ 228.438236] really_probe+0x3e2/0xb80
[ 228.438236] __driver_probe_device+0x18c/0x450
[ 228.438236] driver_probe_device+0x4a/0x120
[ 228.438236] __driver_attach+0x1e1/0x4a0
[ 228.438236] bus_for_each_dev+0xf4/0x170
[ 228.438236] bus_add_driver+0x29e/0x570
[ 228.438236] driver_register+0x134/0x460
[ 228.438236] do_one_initcall+0x8e/0x310
[ 228.438236] do_init_module+0x238/0x730
[ 228.438236] load_module+0x5b41/0x6dd0
[ 228.438236] __do_sys_init_module+0x1df/0x210
[ 228.438236] do_syscall_64+0x60/0x90
[ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1

[ 228.438236] The buggy address belongs to the object at ffff8881731ce000
which belongs to the cache kmalloc-4k of size 4096
[ 228.438236] The buggy address is located 80 bytes inside of
freed 4096-byte region [ffff8881731ce000, ffff8881731cf000)

[ 228.438236] The buggy address belongs to the physical page:
[ 228.438236] page:00000000d1c274b4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1731c8
[ 228.438236] head:00000000d1c274b4 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 228.438236] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
[ 228.438236] page_type: 0xffffffff()
[ 228.438236] raw: 0017ffffc0010200 ffff888100042140 dead000000000122 0000000000000000
[ 228.438236] raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
[ 228.438236] page dumped because: kasan: bad access detected

[ 228.438236] Memory state around the buggy address:
[ 228.438236] ffff8881731cdf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 228.438236] ffff8881731cdf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 228.438236] >ffff8881731ce000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 228.438236]                                                  ^
[ 228.438236] ffff8881731ce080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 228.438236] ffff8881731ce100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 228.438236] ==================================================================


Takashi

2023-08-09 19:14:22

by Karol Herbst

[permalink] [raw]
Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Wed, Aug 9, 2023 at 8:28 PM Karol Herbst <[email protected]> wrote:
>
> On Wed, Aug 9, 2023 at 4:04 PM Thorsten Leemhuis
> <[email protected]> wrote:
> >
> > On 09.08.23 15:13, Takashi Iwai wrote:
> > >
> > > If this can't be fixed quickly, I suppose it's safer to revert it from
> > > 6.4.y for now. 6.5 is still being cooked, but 6.4.x is already in
> > > wide deployment, hence the regression has to be addressed quickly.
> >
>
> feel free to send reverts to mainline and add my r-by tag to it and I
> can push those changes up. Sadly those patches fixed another
> use-after-free, but it seems like we have to take another shot unless
> somebody does have time to look into it promptly.
>

uhm, and the two patches around that one,
752a281032b2d6f4564be827e082bde6f7d2fd4f and
ea293f823a8805735d9e00124df81a8f448ed1ae

> > Good luck with that. To quote
> > https://docs.kernel.org/process/handling-regressions.html :
> >
> > ```
> > Regarding stable and longterm kernels:
> >
> > [...]
> >
> > * Whenever you want to swiftly resolve a regression that recently also
> > made it into a proper mainline, stable, or longterm release, fix it
> > quickly in mainline; when appropriate thus involve Linus to fast-track
> > the fix (see above). That's because the stable team normally does
> > neither revert nor fix any changes that cause the same problems in mainline.
> > ```
> >
> > Note the "normally" in there, so there is a chance.
> >
> > Ciao, Thorsten
> >


2023-08-09 19:46:57

by Karol Herbst

[permalink] [raw]
Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Wed, Aug 9, 2023 at 4:04 PM Thorsten Leemhuis
<[email protected]> wrote:
>
> On 09.08.23 15:13, Takashi Iwai wrote:
> >
> > If this can't be fixed quickly, I suppose it's safer to revert it from
> > 6.4.y for now. 6.5 is still being cooked, but 6.4.x is already in
> > wide deployment, hence the regression has to be addressed quickly.
>

feel free to send reverts to mainline and add my r-by tag to it and I
can push those changes up. Sadly those patches fixed another
use-after-free, but it seems like we have to take another shot unless
somebody does have time to look into it promptly.

> Good luck with that. To quote
> https://docs.kernel.org/process/handling-regressions.html :
>
> ```
> Regarding stable and longterm kernels:
>
> [...]
>
> * Whenever you want to swiftly resolve a regression that recently also
> made it into a proper mainline, stable, or longterm release, fix it
> quickly in mainline; when appropriate thus involve Linus to fast-track
> the fix (see above). That's because the stable team normally does
> neither revert nor fix any changes that cause the same problems in mainline.
> ```
>
> Note the "normally" in there, so there is a chance.
>
> Ciao, Thorsten
>


2023-08-14 13:02:06

by Karol Herbst

[permalink] [raw]
Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Wed, Aug 9, 2023 at 6:16 PM Takashi Iwai <[email protected]> wrote:
>
> [...]
>
> And below is another log from the 6.4.8 kernel with KASAN enabled.
> Some memory corruption seems to be happening.
>
> [ 228.436698] BUG: KASAN: slab-use-after-free in drm_connector_list_iter_next+0x176/0x320
> [...]
>

mind resolving those to file lines via decode_stacktrace.sh or
something? Looking at it as is, it really makes no sense.

>
> Takashi
>


2023-08-14 13:34:06

by Takashi Iwai

[permalink] [raw]
Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Mon, 14 Aug 2023 14:38:18 +0200,
Karol Herbst wrote:
>
> On Wed, Aug 9, 2023 at 6:16 PM Takashi Iwai <[email protected]> wrote:
> >
> > On Wed, 09 Aug 2023 16:46:38 +0200,
> > Takashi Iwai wrote:
> > >
> > > On Wed, 09 Aug 2023 15:13:23 +0200,
> > > Takashi Iwai wrote:
> > > >
> > > > On Wed, 09 Aug 2023 14:19:23 +0200,
> > > > Karol Herbst wrote:
> > > > >
> > > > > On Wed, Aug 9, 2023 at 1:46 PM Takashi Iwai <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, 09 Aug 2023 13:42:09 +0200,
> > > > > > Karol Herbst wrote:
> > > > > > >
> > > > > > > On Wed, Aug 9, 2023 at 11:22 AM Takashi Iwai <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Tue, 08 Aug 2023 12:39:32 +0200,
> > > > > > > > Karol Herbst wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > > > > > > > > > in what way does it stop? Just not progressing? That would be kinda
> > > > > > > > > > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > > > > > > > > > called with and without that patch?
> > > > > > > > > >
> > > > > > > > > > Well, me dumping those args I guess made the box not freeze before
> > > > > > > > > > catching a #PF over serial. Does that help?
> > > > > > > > > >
> > > > > > > > > > ....
> > > > > > > > > > [ 3.410135] Unpacking initramfs...
> > > > > > > > > > [ 3.416319] software IO TLB: mapped [mem 0x00000000a877d000-0x00000000ac77d000] (64MB)
> > > > > > > > > > [ 3.418227] Initialise system trusted keyrings
> > > > > > > > > > [ 3.432273] workingset: timestamp_bits=56 max_order=22 bucket_order=0
> > > > > > > > > > [ 3.439006] ntfs: driver 2.1.32 [Flags: R/W].
> > > > > > > > > > [ 3.443368] fuse: init (API version 7.38)
> > > > > > > > > > [ 3.447601] 9p: Installing v9fs 9p2000 file system support
> > > > > > > > > > [ 3.453223] Key type asymmetric registered
> > > > > > > > > > [ 3.457332] Asymmetric key parser 'x509' registered
> > > > > > > > > > [ 3.462236] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
> > > > > > > > > > [ 3.475865] efifb: probing for efifb
> > > > > > > > > > [ 3.479458] efifb: framebuffer at 0xf9000000, using 1920k, total 1920k
> > > > > > > > > > [ 3.485969] efifb: mode is 800x600x32, linelength=3200, pages=1
> > > > > > > > > > [ 3.491872] efifb: scrolling: redraw
> > > > > > > > > > [ 3.495438] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> > > > > > > > > > [ 3.502349] Console: switching to colour frame buffer device 100x37
> > > > > > > > > > [ 3.509564] fb0: EFI VGA frame buffer device
> > > > > > > > > > [ 3.514013] ACPI: \_PR_.CP00: Found 4 idle states
> > > > > > > > > > [ 3.518850] ACPI: \_PR_.CP01: Found 4 idle states
> > > > > > > > > > [ 3.523687] ACPI: \_PR_.CP02: Found 4 idle states
> > > > > > > > > > [ 3.528515] ACPI: \_PR_.CP03: Found 4 idle states
> > > > > > > > > > [ 3.533346] ACPI: \_PR_.CP04: Found 4 idle states
> > > > > > > > > > [ 3.538173] ACPI: \_PR_.CP05: Found 4 idle states
> > > > > > > > > > [ 3.543003] ACPI: \_PR_.CP06: Found 4 idle states
> > > > > > > > > > [ 3.544219] Freeing initrd memory: 8196K
> > > > > > > > > > [ 3.547844] ACPI: \_PR_.CP07: Found 4 idle states
> > > > > > > > > > [ 3.609542] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> > > > > > > > > > [ 3.616224] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> > > > > > > > > > [ 3.625552] serial 0000:00:16.3: enabling device (0000 -> 0003)
> > > > > > > > > > [ 3.633034] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
> > > > > > > > > > [ 3.642451] Linux agpgart interface v0.103
> > > > > > > > > > [ 3.647141] ACPI: bus type drm_connector registered
> > > > > > > > > > [ 3.653261] Console: switching to colour dummy device 80x25
> > > > > > > > > > [ 3.659092] nouveau 0000:03:00.0: vgaarb: deactivate vga console
> > > > > > > > > > [ 3.665174] nouveau 0000:03:00.0: NVIDIA GT218 (0a8c00b1)
> > > > > > > > > > [ 3.784585] nouveau 0000:03:00.0: bios: version 70.18.83.00.08
> > > > > > > > > > [ 3.792244] nouveau 0000:03:00.0: fb: 512 MiB DDR3
> > > > > > > > > > [ 3.948786] nouveau 0000:03:00.0: DRM: VRAM: 512 MiB
> > > > > > > > > > [ 3.953755] nouveau 0000:03:00.0: DRM: GART: 1048576 MiB
> > > > > > > > > > [ 3.959073] nouveau 0000:03:00.0: DRM: TMDS table version 2.0
> > > > > > > > > > [ 3.964808] nouveau 0000:03:00.0: DRM: DCB version 4.0
> > > > > > > > > > [ 3.969938] nouveau 0000:03:00.0: DRM: DCB outp 00: 02000360 00000000
> > > > > > > > > > [ 3.976367] nouveau 0000:03:00.0: DRM: DCB outp 01: 02000362 00020010
> > > > > > > > > > [ 3.982792] nouveau 0000:03:00.0: DRM: DCB outp 02: 028003a6 0f220010
> > > > > > > > > > [ 3.989223] nouveau 0000:03:00.0: DRM: DCB outp 03: 01011380 00000000
> > > > > > > > > > [ 3.995647] nouveau 0000:03:00.0: DRM: DCB outp 04: 08011382 00020010
> > > > > > > > > > [ 4.002076] nouveau 0000:03:00.0: DRM: DCB outp 05: 088113c6 0f220010
> > > > > > > > > > [ 4.008511] nouveau 0000:03:00.0: DRM: DCB conn 00: 00101064
> > > > > > > > > > [ 4.014151] nouveau 0000:03:00.0: DRM: DCB conn 01: 00202165
> > > > > > > > > > [ 4.021710] nvkm_uevent_add: uevent: 0xffff888100242100, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > [ 4.033680] nvkm_uevent_add: uevent: 0xffff888100242300, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > [ 4.045429] nouveau 0000:03:00.0: DRM: MM: using COPY for buffer copies
> > > > > > > > > > [ 4.052059] stackdepot: allocating hash table of 1048576 entries via kvcalloc
> > > > > > > > > > [ 4.067191] nvkm_uevent_add: uevent: 0xffff888100242800, event: 0xffff888104b3e260, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > [ 4.078936] nvkm_uevent_add: uevent: 0xffff888100242900, event: 0xffff888104b3e260, id: 0x1, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > [ 4.090514] nvkm_uevent_add: uevent: 0xffff888100242a00, event: 0xffff888102091f28, id: 0x1, bits: 0x3, func: 0xffffffff8177b700
> > > > > > > > > > [ 4.102118] tsc: Refined TSC clocksource calibration: 3591.345 MHz
> > > > > > > > > > [ 4.108342] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33c4635c383, max_idle_ns: 440795314831 ns
> > > > > > > > > > [ 4.108401] nvkm_uevent_add: uevent: 0xffff8881020b6000, event: 0xffff888102091f28, id: 0xf, bits: 0x3, func: 0xffffffff8177b700
> > > > > > > > > > [ 4.129864] clocksource: Switched to clocksource tsc
> > > > > > > > > > [ 4.131478] [drm] Initialized nouveau 1.3.1 20120801 for 0000:03:00.0 on minor 0
> > > > > > > > > > [ 4.143806] BUG: kernel NULL pointer dereference, address: 0000000000000020
> > > > > > > > >
> > > > > > > > > ahh, that would have been good to know :) Mind figuring out what's
> > > > > > > > > exactly NULL inside nvif_object_mthd? Or rather what line
> > > > > > > > > `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> > > > > > > > > out what's wrong here.
> > > > > > > >
> > > > > > > > FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
> > > > > > > > https://bugzilla.suse.com/show_bug.cgi?id=1214073
> > > > > > > > Confirmed that reverting the patch cured the issue.
> > > > > > > >
> > > > > > > > FWIW, loading nouveau showed a refcount_t warning just before the NULL
> > > > > > > > dereference:
> > > > > > > >
> > > > > > >
> > > > > > > mh, I wonder if one of those `return -EINVAL;` branches is hit where
> > > > > > > it wasn't before. Could some of you check if `nvkm_uconn_uevent`
> > > > > > > returns -EINVAL with that patch where it didn't before? I wonder if
> > > > > > > it's the `if (&outp->head == &conn->disp->outps) return -EINVAL;` and
> > > > > > > > if removing that fixes the crash?
> > > > > >
> > > > > > Please give a patch, then I can build a kernel and let the reporter
> > > > > > test it :)
> > > > > >
> > > > >
> > > > > attached a patch.
> > > >
> > > > Thanks. Now I'm building a test kernel and asked the reporter to
> > > > test it.
> > >
> > > And the result was negative, the boot still hung.
> >
> > And below is another log from the 6.4.8 kernel with KASAN enabled.
> > Some memory corruption seems to be happening.
> >
> > [ 228.422919] nouveau 0000:02:00.0: DRM: DCB conn 01: 0000a146
> > [ 228.428674] nouveau 0000:02:00.0: DRM: MM: using M2MF for buffer copies
> > [ 228.436682] ==================================================================
> > [ 228.436698] BUG: KASAN: slab-use-after-free in drm_connector_list_iter_next+0x176/0x320
> > [ 228.436715] Read of size 4 at addr ffff8881731ce050 by task modprobe/6174
> >
> > [ 228.436728] CPU: 0 PID: 6174 Comm: modprobe Not tainted 6.4.9-4.g5b9ad20-default #1 openSUSE Tumbleweed (unreleased) d0a6841e538b38d17513f6942fb58770372b54fd
> > [ 228.436740] Hardware name: Apple Inc. MacBook5,1/Mac-F42D89C8, BIOS MB51.88Z.007D.B03.0904271443 04/27/09
> > [ 228.436747] Call Trace:
> > [ 228.436753] <TASK>
> > [ 228.436759] dump_stack_lvl+0x47/0x60
> > [ 228.436773] print_report+0xcf/0x640
> > [ 228.436784] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> > [ 228.436797] ? drm_connector_list_iter_next+0x176/0x320
> > [ 228.436807] kasan_report+0xb1/0xe0
> > [ 228.436817] ? drm_connector_list_iter_next+0x176/0x320
> > [ 228.436828] kasan_check_range+0x105/0x1b0
> > [ 228.436837] drm_connector_list_iter_next+0x176/0x320
> > [ 228.436848] ? __pfx_drm_connector_list_iter_next+0x10/0x10
> > [ 228.436859] ? __kmem_cache_free+0x18a/0x2c0
> > [ 228.436868] nouveau_connector_create+0x170/0x1cd0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > [ 228.437540] ? drm_encoder_init+0xbe/0x140
> > [ 228.437554] ? __pfx_nouveau_connector_create+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > [ 228.438137] ? nvif_outp_ctor+0x2d9/0x430 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > [ 228.438236] ? __pfx_nouveau_drm_device_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > [ 228.438236] ? __pfx_pci_update_current_state+0x10/0x10
> > [ 228.438236] ? __kasan_check_byte+0x13/0x50
> > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > [ 228.438236] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> > [ 228.438236] ? __pfx_nouveau_drm_probe+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > [ 228.438236] ? __pfx_nouveau_drm_probe+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > [ 228.438236] local_pci_probe+0xdd/0x190
> > [ 228.438236] pci_device_probe+0x23a/0x770
> > [ 228.438236] ? kernfs_add_one+0x2d8/0x450
> > [ 228.438236] ? kernfs_get.part.0+0x4c/0x70
> > [ 228.438236] ? __pfx_pci_device_probe+0x10/0x10
> > [ 228.438236] ? kernfs_create_link+0x15f/0x230
> > [ 228.438236] ? kernfs_put+0x1c/0x40
> > [ 228.438236] ? sysfs_do_create_link_sd+0x8e/0x100
> > [ 228.438236] really_probe+0x3e2/0xb80
> > [ 228.438236] __driver_probe_device+0x18c/0x450
> > [ 228.438236] ? __pfx_klist_iter_init_node+0x10/0x10
> > [ 228.438236] driver_probe_device+0x4a/0x120
> > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > [ 228.438236] ? __pfx___driver_attach+0x10/0x10
> > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > [ 228.438236] ? __pfx__raw_spin_lock+0x10/0x10
> > [ 228.438236] ? __pfx_bus_for_each_dev+0x10/0x10
> > [ 228.438236] bus_add_driver+0x29e/0x570
> > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > [ 228.438236] driver_register+0x134/0x460
> > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > [ 228.438236] do_one_initcall+0x8e/0x310
> > [ 228.438236] ? __pfx_do_one_initcall+0x10/0x10
> > [ 228.438236] ? __kmem_cache_alloc_node+0x1b9/0x3b0
> > [ 228.438236] ? do_init_module+0x4b/0x730
> > [ 228.438236] ? kasan_unpoison+0x44/0x70
> > [ 228.438236] do_init_module+0x238/0x730
> > [ 228.438236] load_module+0x5b41/0x6dd0
> > [ 228.438236] ? __pfx_load_module+0x10/0x10
> > [ 228.438236] ? _raw_spin_lock+0x85/0xe0
> > [ 228.438236] ? __pfx__raw_spin_lock+0x10/0x10
> > [ 228.438236] ? find_vmap_area+0xab/0xe0
> > [ 228.438236] ? __do_sys_init_module+0x1df/0x210
> > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > [ 228.438236] ? __pfx___do_sys_init_module+0x10/0x10
> > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > [ 228.438236] ? __pfx_ksys_read+0x10/0x10
> > [ 228.438236] do_syscall_64+0x60/0x90
> > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > [ 228.438236] ? exc_page_fault+0x62/0xd0
> > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> > [ 228.438236] RIP: 0033:0x7f91ce119a5e
> > [ 228.438236] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7a 03 0d 00 f7 d8 64 89 01 48
> > [ 228.438236] RSP: 002b:00007ffce2813538 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> > [ 228.438236] RAX: ffffffffffffffda RBX: 00005588462def10 RCX: 00007f91ce119a5e
> > [ 228.438236] RDX: 00005588462e39c0 RSI: 0000000000fda8b2 RDI: 00007f91cc371010
> > [ 228.438236] RBP: 00005588462e39c0 R08: 00005588462e3ce0 R09: 0000000000000000
> > [ 228.438236] R10: 000000000005af11 R11: 0000000000000246 R12: 0000000000040000
> > [ 228.438236] R13: 0000000000000000 R14: 0000000000000009 R15: 00005588462de7c0
> > [ 228.438236] </TASK>
> >
> > [ 228.438236] Allocated by task 6174:
> > [ 228.438236] kasan_save_stack+0x20/0x40
> > [ 228.438236] kasan_set_track+0x25/0x30
> > [ 228.438236] __kasan_kmalloc+0xaa/0xb0
> > [ 228.438236] nouveau_connector_create+0x386/0x1cd0 [nouveau]
> > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau]
> > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau]
> > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau]
> > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau]
> > [ 228.438236] local_pci_probe+0xdd/0x190
> > [ 228.438236] pci_device_probe+0x23a/0x770
> > [ 228.438236] really_probe+0x3e2/0xb80
> > [ 228.438236] __driver_probe_device+0x18c/0x450
> > [ 228.438236] driver_probe_device+0x4a/0x120
> > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > [ 228.438236] bus_add_driver+0x29e/0x570
> > [ 228.438236] driver_register+0x134/0x460
> > [ 228.438236] do_one_initcall+0x8e/0x310
> > [ 228.438236] do_init_module+0x238/0x730
> > [ 228.438236] load_module+0x5b41/0x6dd0
> > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > [ 228.438236] do_syscall_64+0x60/0x90
> > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> >
> > [ 228.438236] Freed by task 6174:
> > [ 228.438236] kasan_save_stack+0x20/0x40
> > [ 228.438236] kasan_set_track+0x25/0x30
> > [ 228.438236] kasan_save_free_info+0x2e/0x50
> > [ 228.438236] ____kasan_slab_free+0x169/0x1c0
> > [ 228.438236] slab_free_freelist_hook+0xcd/0x190
> > [ 228.438236] __kmem_cache_free+0x18a/0x2c0
> > [ 228.438236] nouveau_connector_create+0x1423/0x1cd0 [nouveau]
> > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau]
> > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau]
> > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau]
> > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau]
> > [ 228.438236] local_pci_probe+0xdd/0x190
> > [ 228.438236] pci_device_probe+0x23a/0x770
> > [ 228.438236] really_probe+0x3e2/0xb80
> > [ 228.438236] __driver_probe_device+0x18c/0x450
> > [ 228.438236] driver_probe_device+0x4a/0x120
> > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > [ 228.438236] bus_add_driver+0x29e/0x570
> > [ 228.438236] driver_register+0x134/0x460
> > [ 228.438236] do_one_initcall+0x8e/0x310
> > [ 228.438236] do_init_module+0x238/0x730
> > [ 228.438236] load_module+0x5b41/0x6dd0
> > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > [ 228.438236] do_syscall_64+0x60/0x90
> > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> >
> > [ 228.438236] The buggy address belongs to the object at ffff8881731ce000
> > which belongs to the cache kmalloc-4k of size 4096
> > [ 228.438236] The buggy address is located 80 bytes inside of
> > freed 4096-byte region [ffff8881731ce000, ffff8881731cf000)
> >
> > [ 228.438236] The buggy address belongs to the physical page:
> > [ 228.438236] page:00000000d1c274b4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1731c8
> > [ 228.438236] head:00000000d1c274b4 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> > [ 228.438236] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> > [ 228.438236] page_type: 0xffffffff()
> > [ 228.438236] raw: 0017ffffc0010200 ffff888100042140 dead000000000122 0000000000000000
> > [ 228.438236] raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
> > [ 228.438236] page dumped because: kasan: bad access detected
> >
> > [ 228.438236] Memory state around the buggy address:
> > [ 228.438236] ffff8881731cdf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > [ 228.438236] ffff8881731cdf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > [ 228.438236] >ffff8881731ce000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [ 228.438236] ^
> > [ 228.438236] ffff8881731ce080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [ 228.438236] ffff8881731ce100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [ 228.438236] ==================================================================
> >
>
> mind resolving those to source lines via decode_stacktrace.sh or
> something? As it stands, it really makes no sense to me.

I don't own the machine, so it's a bit difficult from my side,
unfortunately.

But you can easily see from the log that the object is *freed* in
nouveau_connector_create() called from nv50_display_create(). This
implies that even after a connector has been freed on an error path,
the object is still referenced later in the nouveau_display_create()
call chain (apparently by drm_connector_list_iter_next() in a
subsequent nouveau_connector_create() call). That would explain why
the error started appearing after you added an extra check returning
-EINVAL.

That said, my bet is on incorrect error handling and resource release
during connector creation.


thanks,

Takashi
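Takashi's hypothesis can be illustrated with a small userspace model. This is a sketch under stated assumptions, not nouveau or DRM code: `struct conn`, `conn_find()` and `conn_create()` are hypothetical, and the list helpers only loosely mimic <linux/list.h>. It models a create routine that first scans the already-registered objects (the KASAN trace shows nouveau_connector_create() walking the connector list via drm_connector_list_iter_next()), then allocates and publishes a new object, and then hits a late error. Freeing on that error path without unlinking first would leave a dangling node for the next scan to read, which is the shape of the reported slab-use-after-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, loosely modelled on
 * <linux/list.h>. Userspace stand-in only. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_node(struct list_node *n)
{
	assert(n->next && n->prev);           /* catch a double unlink */
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
}

/* Hypothetical stand-in for a connector. `link` is the first member,
 * so a list_node pointer can be cast back to the containing conn. */
struct conn {
	struct list_node link;
	int index;
};

/* Scan registered objects; wrapping back to `head` means "not found". */
static struct conn *conn_find(struct list_node *head, int index)
{
	for (struct list_node *n = head->next; n != head; n = n->next) {
		struct conn *c = (struct conn *)n;  /* reads every node's memory */
		if (c->index == index)
			return c;
	}
	return NULL;
}

/* `fail_late` simulates an error after the object is already listed. */
static struct conn *conn_create(struct list_node *head, int index, bool fail_late)
{
	struct conn *c = conn_find(head, index);
	if (c)
		return c;

	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->index = index;
	list_add_tail_node(&c->link, head);   /* now visible to conn_find() */

	if (fail_late) {
		/* A bare free(c) here would leave a dangling node, and the
		 * next conn_find() would read freed memory. Unlink first. */
		list_del_node(&c->link);
		free(c);
		return NULL;
	}
	return c;
}
```

In DRM terms, the analogue of list_del_node() is presumably letting the DRM core tear the connector down (drm_connector_cleanup() removes it from the device's connector list) before freeing it, rather than calling kfree() directly; that mapping is an assumption, not something established in this thread.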

2023-08-14 13:54:45

by Karol Herbst

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Tue, Aug 8, 2023 at 3:47 PM Borislav Petkov <[email protected]> wrote:
>
> On Tue, Aug 08, 2023 at 12:39:32PM +0200, Karol Herbst wrote:
> > ahh, that would have been good to know :)
>
> Yeah, I didn't see it before - it would only freeze. Only after I added
> the printk you requested.
>
> > Mind figuring out what's exactly NULL inside nvif_object_mthd? Or
> > rather what line `nvif_object_mthd+0x136` belongs to, then it should
> > be easy to figure out what's wrong here.
>
> That looks like this:
>
> ffffffff816ddfee: e8 8d 04 4e 00 callq ffffffff81bbe480 <__memcpy>
> ffffffff816ddff3: 41 8d 56 20 lea 0x20(%r14),%edx
> ffffffff816ddff7: 49 8b 44 24 08 mov 0x8(%r12),%rax
> ffffffff816ddffc: 83 fa 17 cmp $0x17,%edx
> ffffffff816ddfff: 76 7d jbe ffffffff816de07e <nvif_object_mthd+0x1ae>
> ffffffff816de001: 49 39 c4 cmp %rax,%r12
> ffffffff816de004: 74 45 je ffffffff816de04b <nvif_object_mthd+0x17b>
>
> <--- RIP points here.
>
> The 0x20 also fits the deref address: 0000000000000020.
>
> Which means %rax is 0. Yap.
>
> ffffffff816de006: 48 8b 78 20 mov 0x20(%rax),%rdi
> ffffffff816de00a: 4c 89 64 24 10 mov %r12,0x10(%rsp)
> ffffffff816de00f: 48 8b 40 38 mov 0x38(%rax),%rax
> ffffffff816de013: c6 44 24 06 ff movb $0xff,0x6(%rsp)
> ffffffff816de018: 31 c9 xor %ecx,%ecx
> ffffffff816de01a: 48 89 e6 mov %rsp,%rsi
> ffffffff816de01d: 48 8b 40 28 mov 0x28(%rax),%rax
> ffffffff816de021: e8 3a 0c 4f 00 callq ffffffff81bcec60 <__x86_indirect_thunk_array>
>
>
> Now, the preprocessed asm version of nvif/object.c says around here:
>
>
> call memcpy #
> # drivers/gpu/drm/nouveau/nvif/object.c:160: ret = nvif_object_ioctl(object, args, sizeof(*args) + size, NULL);
> leal 32(%r14), %edx #, _108
> # drivers/gpu/drm/nouveau/nvif/object.c:33: struct nvif_client *client = object->client;
> movq 8(%r12), %rax # object_19(D)->client, client
> # drivers/gpu/drm/nouveau/nvif/object.c:38: if (size >= sizeof(*args) && args->v0.version == 0) {
> cmpl $23, %edx #, _108
> jbe .L69 #,
> # drivers/gpu/drm/nouveau/nvif/object.c:39: if (object != &client->object)
> cmpq %rax, %r12 # client, object
> je .L70 #,
> # drivers/gpu/drm/nouveau/nvif/object.c:47: return client->driver->ioctl(client->object.priv, data, size, hack);
> movq 32(%rax), %rdi # client_109->object.priv, client_109->object.priv
>
>
> So I'd say that client is NULL. IINM.
>
>
> movq %r12, 16(%rsp) # object, MEM[(union *)&stack].v0.object
> # drivers/gpu/drm/nouveau/nvif/object.c:47: return client->driver->ioctl(client->object.priv, data, size, hack);
> movq 56(%rax), %rax # client_109->driver, client_109->driver
> # drivers/gpu/drm/nouveau/nvif/object.c:43: args->v0.owner = NVIF_IOCTL_V0_OWNER_ANY;
> movb $-1, 6(%rsp) #, MEM[(union *)&stack].v0.owner
> .L64:
> # drivers/gpu/drm/nouveau/nvif/object.c:47: return client->driver->ioctl(client->object.priv, data, size, hack);
> xorl %ecx, %ecx #
> movq %rsp, %rsi #,
> movq 40(%rax), %rax #, _77->ioctl
> call __x86_indirect_thunk_rax
> # drivers/gpu/drm/nouveau/nvif/object.c:161: memcpy(data, args->mthd.data, size);
>
> > > [ 4.144676] #PF: supervisor read access in kernel mode
> > > [ 4.144676] #PF: error_code(0x0000) - not-present page
> > > [ 4.144676] PGD 0 P4D 0
> > > [ 4.144676] Oops: 0000 [#1] PREEMPT SMP PTI
> > > [ 4.144676] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc5-dirty #1
> > > [ 4.144676] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
> > > [ 4.144676] RIP: 0010:nvif_object_mthd+0x136/0x1e0
> > > [ 4.144676] Code: f2 4c 89 ee 48 8d 7c 24 20 66 89 04 24 c6 44 24 18 00 e8 8d 04 4e 00 41 8d 56 20 49 8b 44 24 08 83 fa 17 76 7d 49 39 c4 74 45 <48> 8b 78 20 4c 89 64 24 10 48 8b 40 38 c6 44 24 06 ff 31 c9 48 89
>
> Opcode bytes around RIP look correct too:
>
> ./scripts/decodecode < /tmp/oops
> [ 4.144676] Code: f2 4c 89 ee 48 8d 7c 24 20 66 89 04 24 c6 44 24 18 00 e8 8d 04 4e 00 41 8d 56 20 49 8b 44 24 08 83 fa 17 76 7d 49 39 c4 74 45 <48> 8b 78 20 4c 89 64 24 10 48 8b 40 38 c6 44 24 06 ff 31 c9 48 89
> All code
> ========
> 0: f2 4c 89 ee repnz mov %r13,%rsi
> 4: 48 8d 7c 24 20 lea 0x20(%rsp),%rdi
> 9: 66 89 04 24 mov %ax,(%rsp)
> d: c6 44 24 18 00 movb $0x0,0x18(%rsp)
> 12: e8 8d 04 4e 00 callq 0x4e04a4
> 17: 41 8d 56 20 lea 0x20(%r14),%edx
> 1b: 49 8b 44 24 08 mov 0x8(%r12),%rax
> 20: 83 fa 17 cmp $0x17,%edx
> 23: 76 7d jbe 0xa2
> 25: 49 39 c4 cmp %rax,%r12
> 28: 74 45 je 0x6f
> 2a:* 48 8b 78 20 mov 0x20(%rax),%rdi <-- trapping instruction
> 2e: 4c 89 64 24 10 mov %r12,0x10(%rsp)
> 33: 48 8b 40 38 mov 0x38(%rax),%rax
> 37: c6 44 24 06 ff movb $0xff,0x6(%rsp)
> 3c: 31 c9 xor %ecx,%ecx
> 3e: 48 rex.W
> 3f: 89 .byte 0x89
>

mind compiling your kernel with KASAN and seeing whether you hit the same
error as reported in this thread?

>
> HTH.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>
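Boris's conclusion that a fault at 0000000000000020 means `client` itself was NULL is plain offset arithmetic: a load through a NULL struct pointer faults at the member's offset. Going by the compiler annotations quoted above, `client->object.priv` sits at 32 (0x20) and `client->driver` at 56 (0x38), and the `cmp %rax,%r12` emitted for `object != &client->object` implies `object` is the first member, so in the generated code that test compares `object` against the NULL `client` value and falls through to the faulting load. The sketch below encodes only those three facts in a hypothetical mirror struct (`hyp_client` is not the real struct nvif_client; padding arrays stand in for fields it does not reproduce) and assumes x86-64 with 8-byte pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout mirroring only the offsets visible in the
 * annotated asm; NOT the real struct nvif_client. Assumes x86-64. */
struct hyp_object {
	unsigned char unmodelled_head[0x20]; /* fields not reproduced */
	void *priv;                          /* offset 0x20 */
	unsigned char unmodelled_tail[0x10]; /* fields not reproduced */
};

struct hyp_client {
	struct hyp_object object;            /* offset 0x00: &client->object == client */
	const void *driver;                  /* offset 0x38 */
};

static_assert(offsetof(struct hyp_client, object) == 0x00, "object first");
static_assert(offsetof(struct hyp_client, object.priv) == 0x20, "priv at 0x20");
static_assert(offsetof(struct hyp_client, driver) == 0x38, "driver at 0x38");

enum hyp_member { HYP_UNKNOWN, HYP_OBJECT_PRIV, HYP_DRIVER };

/* With a NULL base pointer the faulting address equals the member
 * offset, so map an oops "address:" value back to the member read. */
static enum hyp_member member_for_null_fault(uintptr_t fault_addr)
{
	if (fault_addr == offsetof(struct hyp_client, object.priv))
		return HYP_OBJECT_PRIV;
	if (fault_addr == offsetof(struct hyp_client, driver))
		return HYP_DRIVER;
	return HYP_UNKNOWN;
}
```

For the reported address 0x20 this points at `object.priv`, the first load after the `je`, consistent with the trapping `mov 0x20(%rax),%rdi`.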


2023-08-14 13:57:46

by Karol Herbst

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Mon, Aug 14, 2023 at 2:48 PM Takashi Iwai <[email protected]> wrote:
>
> On Mon, 14 Aug 2023 14:38:18 +0200,
> Karol Herbst wrote:
> >
> > On Wed, Aug 9, 2023 at 6:16 PM Takashi Iwai <[email protected]> wrote:
> > >
> > > On Wed, 09 Aug 2023 16:46:38 +0200,
> > > Takashi Iwai wrote:
> > > >
> > > > On Wed, 09 Aug 2023 15:13:23 +0200,
> > > > Takashi Iwai wrote:
> > > > >
> > > > > On Wed, 09 Aug 2023 14:19:23 +0200,
> > > > > Karol Herbst wrote:
> > > > > >
> > > > > > On Wed, Aug 9, 2023 at 1:46 PM Takashi Iwai <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, 09 Aug 2023 13:42:09 +0200,
> > > > > > > Karol Herbst wrote:
> > > > > > > >
> > > > > > > > On Wed, Aug 9, 2023 at 11:22 AM Takashi Iwai <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 08 Aug 2023 12:39:32 +0200,
> > > > > > > > > Karol Herbst wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > > > > > > > > > > in what way does it stop? Just not progressing? That would be kinda
> > > > > > > > > > > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > > > > > > > > > > called with and without that patch?
> > > > > > > > > > >
> > > > > > > > > > > Well, me dumping those args I guess made the box not freeze before
> > > > > > > > > > > catching a #PF over serial. Does that help?
> > > > > > > > > > >
> > > > > > > > > > > ....
> > > > > > > > > > > [ 3.410135] Unpacking initramfs...
> > > > > > > > > > > [ 3.416319] software IO TLB: mapped [mem 0x00000000a877d000-0x00000000ac77d000] (64MB)
> > > > > > > > > > > [ 3.418227] Initialise system trusted keyrings
> > > > > > > > > > > [ 3.432273] workingset: timestamp_bits=56 max_order=22 bucket_order=0
> > > > > > > > > > > [ 3.439006] ntfs: driver 2.1.32 [Flags: R/W].
> > > > > > > > > > > [ 3.443368] fuse: init (API version 7.38)
> > > > > > > > > > > [ 3.447601] 9p: Installing v9fs 9p2000 file system support
> > > > > > > > > > > [ 3.453223] Key type asymmetric registered
> > > > > > > > > > > [ 3.457332] Asymmetric key parser 'x509' registered
> > > > > > > > > > > [ 3.462236] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
> > > > > > > > > > > [ 3.475865] efifb: probing for efifb
> > > > > > > > > > > [ 3.479458] efifb: framebuffer at 0xf9000000, using 1920k, total 1920k
> > > > > > > > > > > [ 3.485969] efifb: mode is 800x600x32, linelength=3200, pages=1
> > > > > > > > > > > [ 3.491872] efifb: scrolling: redraw
> > > > > > > > > > > [ 3.495438] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> > > > > > > > > > > [ 3.502349] Console: switching to colour frame buffer device 100x37
> > > > > > > > > > > [ 3.509564] fb0: EFI VGA frame buffer device
> > > > > > > > > > > [ 3.514013] ACPI: \_PR_.CP00: Found 4 idle states
> > > > > > > > > > > [ 3.518850] ACPI: \_PR_.CP01: Found 4 idle states
> > > > > > > > > > > [ 3.523687] ACPI: \_PR_.CP02: Found 4 idle states
> > > > > > > > > > > [ 3.528515] ACPI: \_PR_.CP03: Found 4 idle states
> > > > > > > > > > > [ 3.533346] ACPI: \_PR_.CP04: Found 4 idle states
> > > > > > > > > > > [ 3.538173] ACPI: \_PR_.CP05: Found 4 idle states
> > > > > > > > > > > [ 3.543003] ACPI: \_PR_.CP06: Found 4 idle states
> > > > > > > > > > > [ 3.544219] Freeing initrd memory: 8196K
> > > > > > > > > > > [ 3.547844] ACPI: \_PR_.CP07: Found 4 idle states
> > > > > > > > > > > [ 3.609542] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> > > > > > > > > > > [ 3.616224] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> > > > > > > > > > > [ 3.625552] serial 0000:00:16.3: enabling device (0000 -> 0003)
> > > > > > > > > > > [ 3.633034] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
> > > > > > > > > > > [ 3.642451] Linux agpgart interface v0.103
> > > > > > > > > > > [ 3.647141] ACPI: bus type drm_connector registered
> > > > > > > > > > > [ 3.653261] Console: switching to colour dummy device 80x25
> > > > > > > > > > > [ 3.659092] nouveau 0000:03:00.0: vgaarb: deactivate vga console
> > > > > > > > > > > [ 3.665174] nouveau 0000:03:00.0: NVIDIA GT218 (0a8c00b1)
> > > > > > > > > > > [ 3.784585] nouveau 0000:03:00.0: bios: version 70.18.83.00.08
> > > > > > > > > > > [ 3.792244] nouveau 0000:03:00.0: fb: 512 MiB DDR3
> > > > > > > > > > > [ 3.948786] nouveau 0000:03:00.0: DRM: VRAM: 512 MiB
> > > > > > > > > > > [ 3.953755] nouveau 0000:03:00.0: DRM: GART: 1048576 MiB
> > > > > > > > > > > [ 3.959073] nouveau 0000:03:00.0: DRM: TMDS table version 2.0
> > > > > > > > > > > [ 3.964808] nouveau 0000:03:00.0: DRM: DCB version 4.0
> > > > > > > > > > > [ 3.969938] nouveau 0000:03:00.0: DRM: DCB outp 00: 02000360 00000000
> > > > > > > > > > > [ 3.976367] nouveau 0000:03:00.0: DRM: DCB outp 01: 02000362 00020010
> > > > > > > > > > > [ 3.982792] nouveau 0000:03:00.0: DRM: DCB outp 02: 028003a6 0f220010
> > > > > > > > > > > [ 3.989223] nouveau 0000:03:00.0: DRM: DCB outp 03: 01011380 00000000
> > > > > > > > > > > [ 3.995647] nouveau 0000:03:00.0: DRM: DCB outp 04: 08011382 00020010
> > > > > > > > > > > [ 4.002076] nouveau 0000:03:00.0: DRM: DCB outp 05: 088113c6 0f220010
> > > > > > > > > > > [ 4.008511] nouveau 0000:03:00.0: DRM: DCB conn 00: 00101064
> > > > > > > > > > > [ 4.014151] nouveau 0000:03:00.0: DRM: DCB conn 01: 00202165
> > > > > > > > > > > [ 4.021710] nvkm_uevent_add: uevent: 0xffff888100242100, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > [ 4.033680] nvkm_uevent_add: uevent: 0xffff888100242300, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > [ 4.045429] nouveau 0000:03:00.0: DRM: MM: using COPY for buffer copies
> > > > > > > > > > > [ 4.052059] stackdepot: allocating hash table of 1048576 entries via kvcalloc
> > > > > > > > > > > [ 4.067191] nvkm_uevent_add: uevent: 0xffff888100242800, event: 0xffff888104b3e260, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > [ 4.078936] nvkm_uevent_add: uevent: 0xffff888100242900, event: 0xffff888104b3e260, id: 0x1, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > [ 4.090514] nvkm_uevent_add: uevent: 0xffff888100242a00, event: 0xffff888102091f28, id: 0x1, bits: 0x3, func: 0xffffffff8177b700
> > > > > > > > > > > [ 4.102118] tsc: Refined TSC clocksource calibration: 3591.345 MHz
> > > > > > > > > > > [ 4.108342] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33c4635c383, max_idle_ns: 440795314831 ns
> > > > > > > > > > > [ 4.108401] nvkm_uevent_add: uevent: 0xffff8881020b6000, event: 0xffff888102091f28, id: 0xf, bits: 0x3, func: 0xffffffff8177b700
> > > > > > > > > > > [ 4.129864] clocksource: Switched to clocksource tsc
> > > > > > > > > > > [ 4.131478] [drm] Initialized nouveau 1.3.1 20120801 for 0000:03:00.0 on minor 0
> > > > > > > > > > > [ 4.143806] BUG: kernel NULL pointer dereference, address: 0000000000000020
> > > > > > > > > >
> > > > > > > > > > ahh, that would have been good to know :) Mind figuring out what's
> > > > > > > > > > exactly NULL inside nvif_object_mthd? Or rather what line
> > > > > > > > > > `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> > > > > > > > > > out what's wrong here.
> > > > > > > > >
> > > > > > > > > FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
> > > > > > > > > https://bugzilla.suse.com/show_bug.cgi?id=1214073
> > > > > > > > > Confirmed that reverting the patch cured the issue.
> > > > > > > > >
> > > > > > > > > FWIW, loading nouveau showed a refcount_t warning just before the NULL
> > > > > > > > > dereference:
> > > > > > > > >
> > > > > > > >
> > > > > > > > mh, I wonder if one of those `return -EINVAL;` branches is hit where
> > > > > > > > it wasn't before. Could some of you check if `nvkm_uconn_uevent`
> > > > > > > > returns -EINVAL with that patch where it didn't before? I wonder if
> > > > > > > > it's the `if (&outp->head == &conn->disp->outps) return -EINVAL;` and
> > > > > > > > > if removing that fixes the crash?
> > > > > > >
> > > > > > > Please give a patch, then I can build a kernel and let the reporter
> > > > > > > test it :)
> > > > > > >
> > > > > >
> > > > > > attached a patch.
> > > > >
> > > > > Thanks. Now I'm building a test kernel and asked the reporter to
> > > > > test it.
> > > >
> > > > And the result was negative, the boot still hung.
> > >
> > > And below is another log from the 6.4.8 kernel with KASAN enabled.
> > > Some memory corruption seems to be happening.
> > >
> > > [ 228.422919] nouveau 0000:02:00.0: DRM: DCB conn 01: 0000a146
> > > [ 228.428674] nouveau 0000:02:00.0: DRM: MM: using M2MF for buffer copies
> > > [ 228.436682] ==================================================================
> > > [ 228.436698] BUG: KASAN: slab-use-after-free in drm_connector_list_iter_next+0x176/0x320
> > > [ 228.436715] Read of size 4 at addr ffff8881731ce050 by task modprobe/6174
> > >
> > > [ 228.436728] CPU: 0 PID: 6174 Comm: modprobe Not tainted 6.4.9-4.g5b9ad20-default #1 openSUSE Tumbleweed (unreleased) d0a6841e538b38d17513f6942fb58770372b54fd
> > > [ 228.436740] Hardware name: Apple Inc. MacBook5,1/Mac-F42D89C8, BIOS MB51.88Z.007D.B03.0904271443 04/27/09
> > > [ 228.436747] Call Trace:
> > > [ 228.436753] <TASK>
> > > [ 228.436759] dump_stack_lvl+0x47/0x60
> > > [ 228.436773] print_report+0xcf/0x640
> > > [ 228.436784] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> > > [ 228.436797] ? drm_connector_list_iter_next+0x176/0x320
> > > [ 228.436807] kasan_report+0xb1/0xe0
> > > [ 228.436817] ? drm_connector_list_iter_next+0x176/0x320
> > > [ 228.436828] kasan_check_range+0x105/0x1b0
> > > [ 228.436837] drm_connector_list_iter_next+0x176/0x320
> > > [ 228.436848] ? __pfx_drm_connector_list_iter_next+0x10/0x10
> > > [ 228.436859] ? __kmem_cache_free+0x18a/0x2c0
> > > [ 228.436868] nouveau_connector_create+0x170/0x1cd0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > [ 228.437540] ? drm_encoder_init+0xbe/0x140
> > > [ 228.437554] ? __pfx_nouveau_connector_create+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > [ 228.438137] ? nvif_outp_ctor+0x2d9/0x430 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > [ 228.438236] ? __pfx_nouveau_drm_device_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > [ 228.438236] ? __pfx_pci_update_current_state+0x10/0x10
> > > [ 228.438236] ? __kasan_check_byte+0x13/0x50
> > > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > [ 228.438236] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> > > [ 228.438236] ? __pfx_nouveau_drm_probe+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > [ 228.438236] ? __pfx_nouveau_drm_probe+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > [ 228.438236] local_pci_probe+0xdd/0x190
> > > [ 228.438236] pci_device_probe+0x23a/0x770
> > > [ 228.438236] ? kernfs_add_one+0x2d8/0x450
> > > [ 228.438236] ? kernfs_get.part.0+0x4c/0x70
> > > [ 228.438236] ? __pfx_pci_device_probe+0x10/0x10
> > > [ 228.438236] ? kernfs_create_link+0x15f/0x230
> > > [ 228.438236] ? kernfs_put+0x1c/0x40
> > > [ 228.438236] ? sysfs_do_create_link_sd+0x8e/0x100
> > > [ 228.438236] really_probe+0x3e2/0xb80
> > > [ 228.438236] __driver_probe_device+0x18c/0x450
> > > [ 228.438236] ? __pfx_klist_iter_init_node+0x10/0x10
> > > [ 228.438236] driver_probe_device+0x4a/0x120
> > > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > > [ 228.438236] ? __pfx___driver_attach+0x10/0x10
> > > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > > [ 228.438236] ? __pfx__raw_spin_lock+0x10/0x10
> > > [ 228.438236] ? __pfx_bus_for_each_dev+0x10/0x10
> > > [ 228.438236] bus_add_driver+0x29e/0x570
> > > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > [ 228.438236] driver_register+0x134/0x460
> > > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > [ 228.438236] do_one_initcall+0x8e/0x310
> > > [ 228.438236] ? __pfx_do_one_initcall+0x10/0x10
> > > [ 228.438236] ? __kmem_cache_alloc_node+0x1b9/0x3b0
> > > [ 228.438236] ? do_init_module+0x4b/0x730
> > > [ 228.438236] ? kasan_unpoison+0x44/0x70
> > > [ 228.438236] do_init_module+0x238/0x730
> > > [ 228.438236] load_module+0x5b41/0x6dd0
> > > [ 228.438236] ? __pfx_load_module+0x10/0x10
> > > [ 228.438236] ? _raw_spin_lock+0x85/0xe0
> > > [ 228.438236] ? __pfx__raw_spin_lock+0x10/0x10
> > > [ 228.438236] ? find_vmap_area+0xab/0xe0
> > > [ 228.438236] ? __do_sys_init_module+0x1df/0x210
> > > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > > [ 228.438236] ? __pfx___do_sys_init_module+0x10/0x10
> > > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > > [ 228.438236] ? __pfx_ksys_read+0x10/0x10
> > > [ 228.438236] do_syscall_64+0x60/0x90
> > > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > > [ 228.438236] ? exc_page_fault+0x62/0xd0
> > > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> > > [ 228.438236] RIP: 0033:0x7f91ce119a5e
> > > [ 228.438236] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7a 03 0d 00 f7 d8 64 89 01 48
> > > [ 228.438236] RSP: 002b:00007ffce2813538 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> > > [ 228.438236] RAX: ffffffffffffffda RBX: 00005588462def10 RCX: 00007f91ce119a5e
> > > [ 228.438236] RDX: 00005588462e39c0 RSI: 0000000000fda8b2 RDI: 00007f91cc371010
> > > [ 228.438236] RBP: 00005588462e39c0 R08: 00005588462e3ce0 R09: 0000000000000000
> > > [ 228.438236] R10: 000000000005af11 R11: 0000000000000246 R12: 0000000000040000
> > > [ 228.438236] R13: 0000000000000000 R14: 0000000000000009 R15: 00005588462de7c0
> > > [ 228.438236] </TASK>
> > >
> > > [ 228.438236] Allocated by task 6174:
> > > [ 228.438236] kasan_save_stack+0x20/0x40
> > > [ 228.438236] kasan_set_track+0x25/0x30
> > > [ 228.438236] __kasan_kmalloc+0xaa/0xb0
> > > [ 228.438236] nouveau_connector_create+0x386/0x1cd0 [nouveau]
> > > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau]
> > > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau]
> > > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau]
> > > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau]
> > > [ 228.438236] local_pci_probe+0xdd/0x190
> > > [ 228.438236] pci_device_probe+0x23a/0x770
> > > [ 228.438236] really_probe+0x3e2/0xb80
> > > [ 228.438236] __driver_probe_device+0x18c/0x450
> > > [ 228.438236] driver_probe_device+0x4a/0x120
> > > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > > [ 228.438236] bus_add_driver+0x29e/0x570
> > > [ 228.438236] driver_register+0x134/0x460
> > > [ 228.438236] do_one_initcall+0x8e/0x310
> > > [ 228.438236] do_init_module+0x238/0x730
> > > [ 228.438236] load_module+0x5b41/0x6dd0
> > > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > > [ 228.438236] do_syscall_64+0x60/0x90
> > > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> > >
> > > [ 228.438236] Freed by task 6174:
> > > [ 228.438236] kasan_save_stack+0x20/0x40
> > > [ 228.438236] kasan_set_track+0x25/0x30
> > > [ 228.438236] kasan_save_free_info+0x2e/0x50
> > > [ 228.438236] ____kasan_slab_free+0x169/0x1c0
> > > [ 228.438236] slab_free_freelist_hook+0xcd/0x190
> > > [ 228.438236] __kmem_cache_free+0x18a/0x2c0
> > > [ 228.438236] nouveau_connector_create+0x1423/0x1cd0 [nouveau]
> > > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau]
> > > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau]
> > > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau]
> > > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau]
> > > [ 228.438236] local_pci_probe+0xdd/0x190
> > > [ 228.438236] pci_device_probe+0x23a/0x770
> > > [ 228.438236] really_probe+0x3e2/0xb80
> > > [ 228.438236] __driver_probe_device+0x18c/0x450
> > > [ 228.438236] driver_probe_device+0x4a/0x120
> > > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > > [ 228.438236] bus_add_driver+0x29e/0x570
> > > [ 228.438236] driver_register+0x134/0x460
> > > [ 228.438236] do_one_initcall+0x8e/0x310
> > > [ 228.438236] do_init_module+0x238/0x730
> > > [ 228.438236] load_module+0x5b41/0x6dd0
> > > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > > [ 228.438236] do_syscall_64+0x60/0x90
> > > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> > >
> > > [ 228.438236] The buggy address belongs to the object at ffff8881731ce000
> > > which belongs to the cache kmalloc-4k of size 4096
> > > [ 228.438236] The buggy address is located 80 bytes inside of
> > > freed 4096-byte region [ffff8881731ce000, ffff8881731cf000)
> > >
> > > [ 228.438236] The buggy address belongs to the physical page:
> > > [ 228.438236] page:00000000d1c274b4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1731c8
> > > [ 228.438236] head:00000000d1c274b4 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> > > [ 228.438236] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> > > [ 228.438236] page_type: 0xffffffff()
> > > [ 228.438236] raw: 0017ffffc0010200 ffff888100042140 dead000000000122 0000000000000000
> > > [ 228.438236] raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
> > > [ 228.438236] page dumped because: kasan: bad access detected
> > >
> > > [ 228.438236] Memory state around the buggy address:
> > > [ 228.438236] ffff8881731cdf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > > [ 228.438236] ffff8881731cdf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > > [ 228.438236] >ffff8881731ce000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > [ 228.438236] ^
> > > [ 228.438236] ffff8881731ce080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > [ 228.438236] ffff8881731ce100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > [ 228.438236] ==================================================================
> > >
> >
> > Mind resolving those to file lines via decode_stacktrace.sh or
> > something? Looking at it, it really makes no sense.
>
> I don't own the machine, so it's a bit difficult from my side,
> unfortunately.
>
> But you can easily see from the log that the object is *freed*
> in nouveau_connector_create(), called from nv50_display_create(). That
> implies that even after a connector is freed on an error path, the
> object is still referenced from nouveau_display_create(). This explains
> why the error starts appearing after you added an extra check that returns
> -EINVAL.
>

yeah, but looking at the code it makes no sense. That's why I want to
be sure it's the allocation I think it is, because it might just as well
be a different one I don't really see atm.

> That said, my bet is on incorrect error handling and resource
> release during connector creation.
>
>
> thanks,
>
> Takashi
>


2023-08-14 14:00:06

by Takashi Iwai

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Mon, 14 Aug 2023 15:19:11 +0200,
Karol Herbst wrote:
>
> On Mon, Aug 14, 2023 at 2:56 PM Karol Herbst <[email protected]> wrote:
> >
> > On Mon, Aug 14, 2023 at 2:48 PM Takashi Iwai <[email protected]> wrote:
> > >
> > > On Mon, 14 Aug 2023 14:38:18 +0200,
> > > Karol Herbst wrote:
> > > >
> > > > On Wed, Aug 9, 2023 at 6:16 PM Takashi Iwai <[email protected]> wrote:
> > > > >
> > > > > On Wed, 09 Aug 2023 16:46:38 +0200,
> > > > > Takashi Iwai wrote:
> > > > > >
> > > > > > On Wed, 09 Aug 2023 15:13:23 +0200,
> > > > > > Takashi Iwai wrote:
> > > > > > >
> > > > > > > On Wed, 09 Aug 2023 14:19:23 +0200,
> > > > > > > Karol Herbst wrote:
> > > > > > > >
> > > > > > > > On Wed, Aug 9, 2023 at 1:46 PM Takashi Iwai <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, 09 Aug 2023 13:42:09 +0200,
> > > > > > > > > Karol Herbst wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Aug 9, 2023 at 11:22 AM Takashi Iwai <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 08 Aug 2023 12:39:32 +0200,
> > > > > > > > > > > Karol Herbst wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > > > > > > > > > > > > in what way does it stop? Just not progressing? That would be kinda
> > > > > > > > > > > > > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > > > > > > > > > > > > called with and without that patch?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Well, I guess dumping those args kept the box from freezing
> > > > > > > > > > > > > before it caught a #PF over serial. Does that help?
> > > > > > > > > > > > >
> > > > > > > > > > > > > ....
> > > > > > > > > > > > > [ 3.410135] Unpacking initramfs...
> > > > > > > > > > > > > [ 3.416319] software IO TLB: mapped [mem 0x00000000a877d000-0x00000000ac77d000] (64MB)
> > > > > > > > > > > > > [ 3.418227] Initialise system trusted keyrings
> > > > > > > > > > > > > [ 3.432273] workingset: timestamp_bits=56 max_order=22 bucket_order=0
> > > > > > > > > > > > > [ 3.439006] ntfs: driver 2.1.32 [Flags: R/W].
> > > > > > > > > > > > > [ 3.443368] fuse: init (API version 7.38)
> > > > > > > > > > > > > [ 3.447601] 9p: Installing v9fs 9p2000 file system support
> > > > > > > > > > > > > [ 3.453223] Key type asymmetric registered
> > > > > > > > > > > > > [ 3.457332] Asymmetric key parser 'x509' registered
> > > > > > > > > > > > > [ 3.462236] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
> > > > > > > > > > > > > [ 3.475865] efifb: probing for efifb
> > > > > > > > > > > > > [ 3.479458] efifb: framebuffer at 0xf9000000, using 1920k, total 1920k
> > > > > > > > > > > > > [ 3.485969] efifb: mode is 800x600x32, linelength=3200, pages=1
> > > > > > > > > > > > > [ 3.491872] efifb: scrolling: redraw
> > > > > > > > > > > > > [ 3.495438] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> > > > > > > > > > > > > [ 3.502349] Console: switching to colour frame buffer device 100x37
> > > > > > > > > > > > > [ 3.509564] fb0: EFI VGA frame buffer device
> > > > > > > > > > > > > [ 3.514013] ACPI: \_PR_.CP00: Found 4 idle states
> > > > > > > > > > > > > [ 3.518850] ACPI: \_PR_.CP01: Found 4 idle states
> > > > > > > > > > > > > [ 3.523687] ACPI: \_PR_.CP02: Found 4 idle states
> > > > > > > > > > > > > [ 3.528515] ACPI: \_PR_.CP03: Found 4 idle states
> > > > > > > > > > > > > [ 3.533346] ACPI: \_PR_.CP04: Found 4 idle states
> > > > > > > > > > > > > [ 3.538173] ACPI: \_PR_.CP05: Found 4 idle states
> > > > > > > > > > > > > [ 3.543003] ACPI: \_PR_.CP06: Found 4 idle states
> > > > > > > > > > > > > [ 3.544219] Freeing initrd memory: 8196K
> > > > > > > > > > > > > [ 3.547844] ACPI: \_PR_.CP07: Found 4 idle states
> > > > > > > > > > > > > [ 3.609542] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> > > > > > > > > > > > > [ 3.616224] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> > > > > > > > > > > > > [ 3.625552] serial 0000:00:16.3: enabling device (0000 -> 0003)
> > > > > > > > > > > > > [ 3.633034] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
> > > > > > > > > > > > > [ 3.642451] Linux agpgart interface v0.103
> > > > > > > > > > > > > [ 3.647141] ACPI: bus type drm_connector registered
> > > > > > > > > > > > > [ 3.653261] Console: switching to colour dummy device 80x25
> > > > > > > > > > > > > [ 3.659092] nouveau 0000:03:00.0: vgaarb: deactivate vga console
> > > > > > > > > > > > > [ 3.665174] nouveau 0000:03:00.0: NVIDIA GT218 (0a8c00b1)
> > > > > > > > > > > > > [ 3.784585] nouveau 0000:03:00.0: bios: version 70.18.83.00.08
> > > > > > > > > > > > > [ 3.792244] nouveau 0000:03:00.0: fb: 512 MiB DDR3
> > > > > > > > > > > > > [ 3.948786] nouveau 0000:03:00.0: DRM: VRAM: 512 MiB
> > > > > > > > > > > > > [ 3.953755] nouveau 0000:03:00.0: DRM: GART: 1048576 MiB
> > > > > > > > > > > > > [ 3.959073] nouveau 0000:03:00.0: DRM: TMDS table version 2.0
> > > > > > > > > > > > > [ 3.964808] nouveau 0000:03:00.0: DRM: DCB version 4.0
> > > > > > > > > > > > > [ 3.969938] nouveau 0000:03:00.0: DRM: DCB outp 00: 02000360 00000000
> > > > > > > > > > > > > [ 3.976367] nouveau 0000:03:00.0: DRM: DCB outp 01: 02000362 00020010
> > > > > > > > > > > > > [ 3.982792] nouveau 0000:03:00.0: DRM: DCB outp 02: 028003a6 0f220010
> > > > > > > > > > > > > [ 3.989223] nouveau 0000:03:00.0: DRM: DCB outp 03: 01011380 00000000
> > > > > > > > > > > > > [ 3.995647] nouveau 0000:03:00.0: DRM: DCB outp 04: 08011382 00020010
> > > > > > > > > > > > > [ 4.002076] nouveau 0000:03:00.0: DRM: DCB outp 05: 088113c6 0f220010
> > > > > > > > > > > > > [ 4.008511] nouveau 0000:03:00.0: DRM: DCB conn 00: 00101064
> > > > > > > > > > > > > [ 4.014151] nouveau 0000:03:00.0: DRM: DCB conn 01: 00202165
> > > > > > > > > > > > > [ 4.021710] nvkm_uevent_add: uevent: 0xffff888100242100, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > > [ 4.033680] nvkm_uevent_add: uevent: 0xffff888100242300, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > > [ 4.045429] nouveau 0000:03:00.0: DRM: MM: using COPY for buffer copies
> > > > > > > > > > > > > [ 4.052059] stackdepot: allocating hash table of 1048576 entries via kvcalloc
> > > > > > > > > > > > > [ 4.067191] nvkm_uevent_add: uevent: 0xffff888100242800, event: 0xffff888104b3e260, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > > [ 4.078936] nvkm_uevent_add: uevent: 0xffff888100242900, event: 0xffff888104b3e260, id: 0x1, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > > [ 4.090514] nvkm_uevent_add: uevent: 0xffff888100242a00, event: 0xffff888102091f28, id: 0x1, bits: 0x3, func: 0xffffffff8177b700
> > > > > > > > > > > > > [ 4.102118] tsc: Refined TSC clocksource calibration: 3591.345 MHz
> > > > > > > > > > > > > [ 4.108342] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33c4635c383, max_idle_ns: 440795314831 ns
> > > > > > > > > > > > > [ 4.108401] nvkm_uevent_add: uevent: 0xffff8881020b6000, event: 0xffff888102091f28, id: 0xf, bits: 0x3, func: 0xffffffff8177b700
> > > > > > > > > > > > > [ 4.129864] clocksource: Switched to clocksource tsc
> > > > > > > > > > > > > [ 4.131478] [drm] Initialized nouveau 1.3.1 20120801 for 0000:03:00.0 on minor 0
> > > > > > > > > > > > > [ 4.143806] BUG: kernel NULL pointer dereference, address: 0000000000000020
> > > > > > > > > > > >
> > > > > > > > > > > > ahh, that would have been good to know :) Mind figuring out what
> > > > > > > > > > > > exactly is NULL inside nvif_object_mthd? Or rather what line
> > > > > > > > > > > > `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> > > > > > > > > > > > out what's wrong here.
> > > > > > > > > > >
> > > > > > > > > > > FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
> > > > > > > > > > > https://bugzilla.suse.com/show_bug.cgi?id=1214073
> > > > > > > > > > > Confirmed that reverting the patch cured the issue.
> > > > > > > > > > >
> > > > > > > > > > > FWIW, loading nouveau showed a refcount_t warning just before the NULL
> > > > > > > > > > > dereference:
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > mh, I wonder if one of those `return -EINVAL;` branches is hit where
> > > > > > > > > > it wasn't before. Could one of you check whether `nvkm_uconn_uevent`
> > > > > > > > > > returns -EINVAL with that patch where it didn't before? I wonder if
> > > > > > > > > > it's the `if (&outp->head == &conn->disp->outps) return -EINVAL;` branch,
> > > > > > > > > > and whether removing it fixes the crash?
> > > > > > > > >
> > > > > > > > > Please give a patch, then I can build a kernel and let the reporter
> > > > > > > > > test it :)
> > > > > > > > >
> > > > > > > >
> > > > > > > > attached a patch.
> > > > > > >
> > > > > > > Thanks. Now I'm building a test kernel and have asked the reporter to
> > > > > > > test it.
> > > > > >
> > > > > > And the result was negative; the boot still hung.
> > > > >
> > > > > And below is another log from the 6.4.8 kernel with KASAN enabled.
> > > > > Some memory corruption seems to be happening.
> > > > >
> > > >
> > > > Mind resolving those to file lines via decode_stacktrace.sh or
> > > > something? Looking at it, it really makes no sense.
> > >
> > > I don't own the machine, so it's a bit difficult from my side,
> > > unfortunately.
> > >
>
> Also, you don't need to run it on the same machine if it's all
> distribution-packaged. As long as you have the exact same binaries
> available, you can resolve the lines. Or just use gdb:
> https://docs.kernel.org/admin-guide/bug-hunting.html#gdb

Unfortunately it's not possible, as it's a moving target (following
the upstream development), and the rpm packages used for the report
are already gone. What I can get now won't match any longer.

But now I wonder whether this can be reproduced by injecting an
error artificially.


Takashi

2023-08-14 14:32:15

by Karol Herbst

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Mon, Aug 14, 2023 at 2:56 PM Karol Herbst <[email protected]> wrote:
>
> On Mon, Aug 14, 2023 at 2:48 PM Takashi Iwai <[email protected]> wrote:
> >
> > On Mon, 14 Aug 2023 14:38:18 +0200,
> > Karol Herbst wrote:
> > >
> > > On Wed, Aug 9, 2023 at 6:16 PM Takashi Iwai <[email protected]> wrote:
> > > >
> > > > On Wed, 09 Aug 2023 16:46:38 +0200,
> > > > Takashi Iwai wrote:
> > > > >
> > > > > On Wed, 09 Aug 2023 15:13:23 +0200,
> > > > > Takashi Iwai wrote:
> > > > > >
> > > > > > On Wed, 09 Aug 2023 14:19:23 +0200,
> > > > > > Karol Herbst wrote:
> > > > > > >
> > > > > > > On Wed, Aug 9, 2023 at 1:46 PM Takashi Iwai <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Wed, 09 Aug 2023 13:42:09 +0200,
> > > > > > > > Karol Herbst wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Aug 9, 2023 at 11:22 AM Takashi Iwai <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, 08 Aug 2023 12:39:32 +0200,
> > > > > > > > > > Karol Herbst wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > > > > > > > > > > > in what way does it stop? Just not progressing? That would be kinda
> > > > > > > > > > > > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > > > > > > > > > > > called with and without that patch?
> > > > > > > > > > > >
> > > > > > > > > > > > Well, me dumping those args I guess made the box not freeze before
> > > > > > > > > > > > catching a #PF over serial. Does that help?
> > > > > > > > > > > >
> > > > > > > > > > > > ....
> > > > > > > > > > > > [ 3.410135] Unpacking initramfs...
> > > > > > > > > > > > [ 3.416319] software IO TLB: mapped [mem 0x00000000a877d000-0x00000000ac77d000] (64MB)
> > > > > > > > > > > > [ 3.418227] Initialise system trusted keyrings
> > > > > > > > > > > > [ 3.432273] workingset: timestamp_bits=56 max_order=22 bucket_order=0
> > > > > > > > > > > > [ 3.439006] ntfs: driver 2.1.32 [Flags: R/W].
> > > > > > > > > > > > [ 3.443368] fuse: init (API version 7.38)
> > > > > > > > > > > > [ 3.447601] 9p: Installing v9fs 9p2000 file system support
> > > > > > > > > > > > [ 3.453223] Key type asymmetric registered
> > > > > > > > > > > > [ 3.457332] Asymmetric key parser 'x509' registered
> > > > > > > > > > > > [ 3.462236] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
> > > > > > > > > > > > [ 3.475865] efifb: probing for efifb
> > > > > > > > > > > > [ 3.479458] efifb: framebuffer at 0xf9000000, using 1920k, total 1920k
> > > > > > > > > > > > [ 3.485969] efifb: mode is 800x600x32, linelength=3200, pages=1
> > > > > > > > > > > > [ 3.491872] efifb: scrolling: redraw
> > > > > > > > > > > > [ 3.495438] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> > > > > > > > > > > > [ 3.502349] Console: switching to colour frame buffer device 100x37
> > > > > > > > > > > > [ 3.509564] fb0: EFI VGA frame buffer device
> > > > > > > > > > > > [ 3.514013] ACPI: \_PR_.CP00: Found 4 idle states
> > > > > > > > > > > > [ 3.518850] ACPI: \_PR_.CP01: Found 4 idle states
> > > > > > > > > > > > [ 3.523687] ACPI: \_PR_.CP02: Found 4 idle states
> > > > > > > > > > > > [ 3.528515] ACPI: \_PR_.CP03: Found 4 idle states
> > > > > > > > > > > > [ 3.533346] ACPI: \_PR_.CP04: Found 4 idle states
> > > > > > > > > > > > [ 3.538173] ACPI: \_PR_.CP05: Found 4 idle states
> > > > > > > > > > > > [ 3.543003] ACPI: \_PR_.CP06: Found 4 idle states
> > > > > > > > > > > > [ 3.544219] Freeing initrd memory: 8196K
> > > > > > > > > > > > [ 3.547844] ACPI: \_PR_.CP07: Found 4 idle states
> > > > > > > > > > > > [ 3.609542] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> > > > > > > > > > > > [ 3.616224] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> > > > > > > > > > > > [ 3.625552] serial 0000:00:16.3: enabling device (0000 -> 0003)
> > > > > > > > > > > > [ 3.633034] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
> > > > > > > > > > > > [ 3.642451] Linux agpgart interface v0.103
> > > > > > > > > > > > [ 3.647141] ACPI: bus type drm_connector registered
> > > > > > > > > > > > [ 3.653261] Console: switching to colour dummy device 80x25
> > > > > > > > > > > > [ 3.659092] nouveau 0000:03:00.0: vgaarb: deactivate vga console
> > > > > > > > > > > > [ 3.665174] nouveau 0000:03:00.0: NVIDIA GT218 (0a8c00b1)
> > > > > > > > > > > > [ 3.784585] nouveau 0000:03:00.0: bios: version 70.18.83.00.08
> > > > > > > > > > > > [ 3.792244] nouveau 0000:03:00.0: fb: 512 MiB DDR3
> > > > > > > > > > > > [ 3.948786] nouveau 0000:03:00.0: DRM: VRAM: 512 MiB
> > > > > > > > > > > > [ 3.953755] nouveau 0000:03:00.0: DRM: GART: 1048576 MiB
> > > > > > > > > > > > [ 3.959073] nouveau 0000:03:00.0: DRM: TMDS table version 2.0
> > > > > > > > > > > > [ 3.964808] nouveau 0000:03:00.0: DRM: DCB version 4.0
> > > > > > > > > > > > [ 3.969938] nouveau 0000:03:00.0: DRM: DCB outp 00: 02000360 00000000
> > > > > > > > > > > > [ 3.976367] nouveau 0000:03:00.0: DRM: DCB outp 01: 02000362 00020010
> > > > > > > > > > > > [ 3.982792] nouveau 0000:03:00.0: DRM: DCB outp 02: 028003a6 0f220010
> > > > > > > > > > > > [ 3.989223] nouveau 0000:03:00.0: DRM: DCB outp 03: 01011380 00000000
> > > > > > > > > > > > [ 3.995647] nouveau 0000:03:00.0: DRM: DCB outp 04: 08011382 00020010
> > > > > > > > > > > > [ 4.002076] nouveau 0000:03:00.0: DRM: DCB outp 05: 088113c6 0f220010
> > > > > > > > > > > > [ 4.008511] nouveau 0000:03:00.0: DRM: DCB conn 00: 00101064
> > > > > > > > > > > > [ 4.014151] nouveau 0000:03:00.0: DRM: DCB conn 01: 00202165
> > > > > > > > > > > > [ 4.021710] nvkm_uevent_add: uevent: 0xffff888100242100, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > [ 4.033680] nvkm_uevent_add: uevent: 0xffff888100242300, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > [ 4.045429] nouveau 0000:03:00.0: DRM: MM: using COPY for buffer copies
> > > > > > > > > > > > [ 4.052059] stackdepot: allocating hash table of 1048576 entries via kvcalloc
> > > > > > > > > > > > [ 4.067191] nvkm_uevent_add: uevent: 0xffff888100242800, event: 0xffff888104b3e260, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > [ 4.078936] nvkm_uevent_add: uevent: 0xffff888100242900, event: 0xffff888104b3e260, id: 0x1, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > [ 4.090514] nvkm_uevent_add: uevent: 0xffff888100242a00, event: 0xffff888102091f28, id: 0x1, bits: 0x3, func: 0xffffffff8177b700
> > > > > > > > > > > > [ 4.102118] tsc: Refined TSC clocksource calibration: 3591.345 MHz
> > > > > > > > > > > > [ 4.108342] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33c4635c383, max_idle_ns: 440795314831 ns
> > > > > > > > > > > > [ 4.108401] nvkm_uevent_add: uevent: 0xffff8881020b6000, event: 0xffff888102091f28, id: 0xf, bits: 0x3, func: 0xffffffff8177b700
> > > > > > > > > > > > [ 4.129864] clocksource: Switched to clocksource tsc
> > > > > > > > > > > > [ 4.131478] [drm] Initialized nouveau 1.3.1 20120801 for 0000:03:00.0 on minor 0
> > > > > > > > > > > > [ 4.143806] BUG: kernel NULL pointer dereference, address: 0000000000000020
> > > > > > > > > > >
> > > > > > > > > > > ahh, that would have been good to know :) Mind figuring out what's
> > > > > > > > > > > exactly NULL inside nvif_object_mthd? Or rather what line
> > > > > > > > > > > `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> > > > > > > > > > > out what's wrong here.
> > > > > > > > > >
> > > > > > > > > > FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
> > > > > > > > > > https://bugzilla.suse.com/show_bug.cgi?id=1214073
> > > > > > > > > > Confirmed that reverting the patch cured the issue.
> > > > > > > > > >
> > > > > > > > > > FWIW, loading nouveau showed a refcount_t warning just before the NULL
> > > > > > > > > > dereference:
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > mh, I wonder if one of those `return -EINVAL;` branches is hit where
> > > > > > > > > it wasn't before. Could some of you check if `nvkm_uconn_uevent`
> > > > > > > > > returns -EINVAL with that patch where it didn't before? I wonder if
> > > > > > > > > it's the `if (&outp->head == &conn->disp->outps) return -EINVAL;` check,
> > > > > > > > > and whether removing it fixes the crash?
> > > > > > > >
> > > > > > > > Please give a patch, then I can build a kernel and let the reporter
> > > > > > > > test it :)
> > > > > > > >
> > > > > > >
> > > > > > > attached a patch.
> > > > > >
> > > > > > Thanks. Now I'm building a test kernel and asked the reporter to
> > > > > > test it.
> > > > >
> > > > > And the result was negative, the boot still hung.
> > > >
> > > > And below is another log from the 6.4.8 kernel with KASAN enabled.
> > > > Some memory corruption seems to be happening.
> > > >
> > > > [ 228.422919] nouveau 0000:02:00.0: DRM: DCB conn 01: 0000a146
> > > > [ 228.428674] nouveau 0000:02:00.0: DRM: MM: using M2MF for buffer copies
> > > > [ 228.436682] ==================================================================
> > > > [ 228.436698] BUG: KASAN: slab-use-after-free in drm_connector_list_iter_next+0x176/0x320
> > > > [ 228.436715] Read of size 4 at addr ffff8881731ce050 by task modprobe/6174
> > > >
> > > > [ 228.436728] CPU: 0 PID: 6174 Comm: modprobe Not tainted 6.4.9-4.g5b9ad20-default #1 openSUSE Tumbleweed (unreleased) d0a6841e538b38d17513f6942fb58770372b54fd
> > > > [ 228.436740] Hardware name: Apple Inc. MacBook5,1/Mac-F42D89C8, BIOS MB51.88Z.007D.B03.0904271443 04/27/09
> > > > [ 228.436747] Call Trace:
> > > > [ 228.436753] <TASK>
> > > > [ 228.436759] dump_stack_lvl+0x47/0x60
> > > > [ 228.436773] print_report+0xcf/0x640
> > > > [ 228.436784] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> > > > [ 228.436797] ? drm_connector_list_iter_next+0x176/0x320
> > > > [ 228.436807] kasan_report+0xb1/0xe0
> > > > [ 228.436817] ? drm_connector_list_iter_next+0x176/0x320
> > > > [ 228.436828] kasan_check_range+0x105/0x1b0
> > > > [ 228.436837] drm_connector_list_iter_next+0x176/0x320
> > > > [ 228.436848] ? __pfx_drm_connector_list_iter_next+0x10/0x10
> > > > [ 228.436859] ? __kmem_cache_free+0x18a/0x2c0
> > > > [ 228.436868] nouveau_connector_create+0x170/0x1cd0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > [ 228.437540] ? drm_encoder_init+0xbe/0x140
> > > > [ 228.437554] ? __pfx_nouveau_connector_create+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > [ 228.438137] ? nvif_outp_ctor+0x2d9/0x430 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > [ 228.438236] ? __pfx_nouveau_drm_device_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > [ 228.438236] ? __pfx_pci_update_current_state+0x10/0x10
> > > > [ 228.438236] ? __kasan_check_byte+0x13/0x50
> > > > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > [ 228.438236] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> > > > [ 228.438236] ? __pfx_nouveau_drm_probe+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > [ 228.438236] ? __pfx_nouveau_drm_probe+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > [ 228.438236] local_pci_probe+0xdd/0x190
> > > > [ 228.438236] pci_device_probe+0x23a/0x770
> > > > [ 228.438236] ? kernfs_add_one+0x2d8/0x450
> > > > [ 228.438236] ? kernfs_get.part.0+0x4c/0x70
> > > > [ 228.438236] ? __pfx_pci_device_probe+0x10/0x10
> > > > [ 228.438236] ? kernfs_create_link+0x15f/0x230
> > > > [ 228.438236] ? kernfs_put+0x1c/0x40
> > > > [ 228.438236] ? sysfs_do_create_link_sd+0x8e/0x100
> > > > [ 228.438236] really_probe+0x3e2/0xb80
> > > > [ 228.438236] __driver_probe_device+0x18c/0x450
> > > > [ 228.438236] ? __pfx_klist_iter_init_node+0x10/0x10
> > > > [ 228.438236] driver_probe_device+0x4a/0x120
> > > > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > > > [ 228.438236] ? __pfx___driver_attach+0x10/0x10
> > > > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > > > [ 228.438236] ? __pfx__raw_spin_lock+0x10/0x10
> > > > [ 228.438236] ? __pfx_bus_for_each_dev+0x10/0x10
> > > > [ 228.438236] bus_add_driver+0x29e/0x570
> > > > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > [ 228.438236] driver_register+0x134/0x460
> > > > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > [ 228.438236] do_one_initcall+0x8e/0x310
> > > > [ 228.438236] ? __pfx_do_one_initcall+0x10/0x10
> > > > [ 228.438236] ? __kmem_cache_alloc_node+0x1b9/0x3b0
> > > > [ 228.438236] ? do_init_module+0x4b/0x730
> > > > [ 228.438236] ? kasan_unpoison+0x44/0x70
> > > > [ 228.438236] do_init_module+0x238/0x730
> > > > [ 228.438236] load_module+0x5b41/0x6dd0
> > > > [ 228.438236] ? __pfx_load_module+0x10/0x10
> > > > [ 228.438236] ? _raw_spin_lock+0x85/0xe0
> > > > [ 228.438236] ? __pfx__raw_spin_lock+0x10/0x10
> > > > [ 228.438236] ? find_vmap_area+0xab/0xe0
> > > > [ 228.438236] ? __do_sys_init_module+0x1df/0x210
> > > > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > > > [ 228.438236] ? __pfx___do_sys_init_module+0x10/0x10
> > > > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > > > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > > > [ 228.438236] ? __pfx_ksys_read+0x10/0x10
> > > > [ 228.438236] do_syscall_64+0x60/0x90
> > > > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > > > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > > > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > > > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > > > [ 228.438236] ? exc_page_fault+0x62/0xd0
> > > > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> > > > [ 228.438236] RIP: 0033:0x7f91ce119a5e
> > > > [ 228.438236] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7a 03 0d 00 f7 d8 64 89 01 48
> > > > [ 228.438236] RSP: 002b:00007ffce2813538 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> > > > [ 228.438236] RAX: ffffffffffffffda RBX: 00005588462def10 RCX: 00007f91ce119a5e
> > > > [ 228.438236] RDX: 00005588462e39c0 RSI: 0000000000fda8b2 RDI: 00007f91cc371010
> > > > [ 228.438236] RBP: 00005588462e39c0 R08: 00005588462e3ce0 R09: 0000000000000000
> > > > [ 228.438236] R10: 000000000005af11 R11: 0000000000000246 R12: 0000000000040000
> > > > [ 228.438236] R13: 0000000000000000 R14: 0000000000000009 R15: 00005588462de7c0
> > > > [ 228.438236] </TASK>
> > > >
> > > > [ 228.438236] Allocated by task 6174:
> > > > [ 228.438236] kasan_save_stack+0x20/0x40
> > > > [ 228.438236] kasan_set_track+0x25/0x30
> > > > [ 228.438236] __kasan_kmalloc+0xaa/0xb0
> > > > [ 228.438236] nouveau_connector_create+0x386/0x1cd0 [nouveau]
> > > > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau]
> > > > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau]
> > > > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau]
> > > > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau]
> > > > [ 228.438236] local_pci_probe+0xdd/0x190
> > > > [ 228.438236] pci_device_probe+0x23a/0x770
> > > > [ 228.438236] really_probe+0x3e2/0xb80
> > > > [ 228.438236] __driver_probe_device+0x18c/0x450
> > > > [ 228.438236] driver_probe_device+0x4a/0x120
> > > > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > > > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > > > [ 228.438236] bus_add_driver+0x29e/0x570
> > > > [ 228.438236] driver_register+0x134/0x460
> > > > [ 228.438236] do_one_initcall+0x8e/0x310
> > > > [ 228.438236] do_init_module+0x238/0x730
> > > > [ 228.438236] load_module+0x5b41/0x6dd0
> > > > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > > > [ 228.438236] do_syscall_64+0x60/0x90
> > > > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> > > >
> > > > [ 228.438236] Freed by task 6174:
> > > > [ 228.438236] kasan_save_stack+0x20/0x40
> > > > [ 228.438236] kasan_set_track+0x25/0x30
> > > > [ 228.438236] kasan_save_free_info+0x2e/0x50
> > > > [ 228.438236] ____kasan_slab_free+0x169/0x1c0
> > > > [ 228.438236] slab_free_freelist_hook+0xcd/0x190
> > > > [ 228.438236] __kmem_cache_free+0x18a/0x2c0
> > > > [ 228.438236] nouveau_connector_create+0x1423/0x1cd0 [nouveau]
> > > > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau]
> > > > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau]
> > > > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau]
> > > > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau]
> > > > [ 228.438236] local_pci_probe+0xdd/0x190
> > > > [ 228.438236] pci_device_probe+0x23a/0x770
> > > > [ 228.438236] really_probe+0x3e2/0xb80
> > > > [ 228.438236] __driver_probe_device+0x18c/0x450
> > > > [ 228.438236] driver_probe_device+0x4a/0x120
> > > > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > > > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > > > [ 228.438236] bus_add_driver+0x29e/0x570
> > > > [ 228.438236] driver_register+0x134/0x460
> > > > [ 228.438236] do_one_initcall+0x8e/0x310
> > > > [ 228.438236] do_init_module+0x238/0x730
> > > > [ 228.438236] load_module+0x5b41/0x6dd0
> > > > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > > > [ 228.438236] do_syscall_64+0x60/0x90
> > > > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> > > >
> > > > [ 228.438236] The buggy address belongs to the object at ffff8881731ce000
> > > > which belongs to the cache kmalloc-4k of size 4096
> > > > [ 228.438236] The buggy address is located 80 bytes inside of
> > > > freed 4096-byte region [ffff8881731ce000, ffff8881731cf000)
> > > >
> > > > [ 228.438236] The buggy address belongs to the physical page:
> > > > [ 228.438236] page:00000000d1c274b4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1731c8
> > > > [ 228.438236] head:00000000d1c274b4 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> > > > [ 228.438236] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> > > > [ 228.438236] page_type: 0xffffffff()
> > > > [ 228.438236] raw: 0017ffffc0010200 ffff888100042140 dead000000000122 0000000000000000
> > > > [ 228.438236] raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
> > > > [ 228.438236] page dumped because: kasan: bad access detected
> > > >
> > > > [ 228.438236] Memory state around the buggy address:
> > > > [ 228.438236] ffff8881731cdf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > > > [ 228.438236] ffff8881731cdf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > > > [ 228.438236] >ffff8881731ce000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > [ 228.438236] ^
> > > > [ 228.438236] ffff8881731ce080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > [ 228.438236] ffff8881731ce100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > [ 228.438236] ==================================================================
> > > >
> > >
> > > mind resolving those to file lines via decode_stacktrace.sh or
> > > something, because looking at it, it makes no sense really.
> >
> > I don't own the machine, so it's a bit difficult from my side,
> > unfortunately.
> >

also, you don't need to run it on the same machine if it's all
distribution packaged. As long as you have the exact same binary
available you can resolve the lines. Or just use gdb:
https://docs.kernel.org/admin-guide/bug-hunting.html#gdb

> > But you can read the log and easily find that the object is *freed*
> > in nouveau_connector_create() called from nv50_display_create(). It
> > implies that even after a connector is freed on an error, the object
> > is still referenced from nouveau_display_create(). This explains why
> > the error starts appearing after you put an extra check to return
> > -EINVAL.
> >
>
> yeah, but looking at the code it makes no sense. That's why I want to
> be sure it's the allocation I think it is, because it might just as well
> be a different one I don't really see atm.
>
> > That said, my bet is on some incorrect error handling and resource
> > release during the connector creation.
> >
> >
> > thanks,
> >
> > Takashi
> >


2023-08-14 15:09:58

by Karol Herbst

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Mon, Aug 14, 2023 at 4:19 PM Karol Herbst <[email protected]> wrote:
>
> On Mon, Aug 14, 2023 at 3:35 PM Takashi Iwai <[email protected]> wrote:
> >
> > On Mon, 14 Aug 2023 15:19:11 +0200,
> > Karol Herbst wrote:
> > >
> > > On Mon, Aug 14, 2023 at 2:56 PM Karol Herbst <[email protected]> wrote:
> > > >
> > > > On Mon, Aug 14, 2023 at 2:48 PM Takashi Iwai <[email protected]> wrote:
> > > > >
> > > > > On Mon, 14 Aug 2023 14:38:18 +0200,
> > > > > Karol Herbst wrote:
> > > > > >
> > > > > > On Wed, Aug 9, 2023 at 6:16 PM Takashi Iwai <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, 09 Aug 2023 16:46:38 +0200,
> > > > > > > Takashi Iwai wrote:
> > > > > > > >
> > > > > > > > On Wed, 09 Aug 2023 15:13:23 +0200,
> > > > > > > > Takashi Iwai wrote:
> > > > > > > > >
> > > > > > > > > On Wed, 09 Aug 2023 14:19:23 +0200,
> > > > > > > > > Karol Herbst wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Aug 9, 2023 at 1:46 PM Takashi Iwai <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Wed, 09 Aug 2023 13:42:09 +0200,
> > > > > > > > > > > Karol Herbst wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Aug 9, 2023 at 11:22 AM Takashi Iwai <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, 08 Aug 2023 12:39:32 +0200,
> > > > > > > > > > > > > Karol Herbst wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <[email protected]> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > > > > > > > > > > > > > > in what way does it stop? Just not progressing? That would be kinda
> > > > > > > > > > > > > > > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > > > > > > > > > > > > > > called with and without that patch?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Well, me dumping those args I guess made the box not freeze before
> > > > > > > > > > > > > > > catching a #PF over serial. Does that help?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ....
> > > > > > > > > > > > > > > [ 3.410135] Unpacking initramfs...
> > > > > > > > > > > > > > > [ 3.416319] software IO TLB: mapped [mem 0x00000000a877d000-0x00000000ac77d000] (64MB)
> > > > > > > > > > > > > > > [ 3.418227] Initialise system trusted keyrings
> > > > > > > > > > > > > > > [ 3.432273] workingset: timestamp_bits=56 max_order=22 bucket_order=0
> > > > > > > > > > > > > > > [ 3.439006] ntfs: driver 2.1.32 [Flags: R/W].
> > > > > > > > > > > > > > > [ 3.443368] fuse: init (API version 7.38)
> > > > > > > > > > > > > > > [ 3.447601] 9p: Installing v9fs 9p2000 file system support
> > > > > > > > > > > > > > > [ 3.453223] Key type asymmetric registered
> > > > > > > > > > > > > > > [ 3.457332] Asymmetric key parser 'x509' registered
> > > > > > > > > > > > > > > [ 3.462236] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
> > > > > > > > > > > > > > > [ 3.475865] efifb: probing for efifb
> > > > > > > > > > > > > > > [ 3.479458] efifb: framebuffer at 0xf9000000, using 1920k, total 1920k
> > > > > > > > > > > > > > > [ 3.485969] efifb: mode is 800x600x32, linelength=3200, pages=1
> > > > > > > > > > > > > > > [ 3.491872] efifb: scrolling: redraw
> > > > > > > > > > > > > > > [ 3.495438] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> > > > > > > > > > > > > > > [ 3.502349] Console: switching to colour frame buffer device 100x37
> > > > > > > > > > > > > > > [ 3.509564] fb0: EFI VGA frame buffer device
> > > > > > > > > > > > > > > [ 3.514013] ACPI: \_PR_.CP00: Found 4 idle states
> > > > > > > > > > > > > > > [ 3.518850] ACPI: \_PR_.CP01: Found 4 idle states
> > > > > > > > > > > > > > > [ 3.523687] ACPI: \_PR_.CP02: Found 4 idle states
> > > > > > > > > > > > > > > [ 3.528515] ACPI: \_PR_.CP03: Found 4 idle states
> > > > > > > > > > > > > > > [ 3.533346] ACPI: \_PR_.CP04: Found 4 idle states
> > > > > > > > > > > > > > > [ 3.538173] ACPI: \_PR_.CP05: Found 4 idle states
> > > > > > > > > > > > > > > [ 3.543003] ACPI: \_PR_.CP06: Found 4 idle states
> > > > > > > > > > > > > > > [ 3.544219] Freeing initrd memory: 8196K
> > > > > > > > > > > > > > > [ 3.547844] ACPI: \_PR_.CP07: Found 4 idle states
> > > > > > > > > > > > > > > [ 3.609542] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> > > > > > > > > > > > > > > [ 3.616224] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> > > > > > > > > > > > > > > [ 3.625552] serial 0000:00:16.3: enabling device (0000 -> 0003)
> > > > > > > > > > > > > > > [ 3.633034] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
> > > > > > > > > > > > > > > [ 3.642451] Linux agpgart interface v0.103
> > > > > > > > > > > > > > > [ 3.647141] ACPI: bus type drm_connector registered
> > > > > > > > > > > > > > > [ 3.653261] Console: switching to colour dummy device 80x25
> > > > > > > > > > > > > > > [ 3.659092] nouveau 0000:03:00.0: vgaarb: deactivate vga console
> > > > > > > > > > > > > > > [ 3.665174] nouveau 0000:03:00.0: NVIDIA GT218 (0a8c00b1)
> > > > > > > > > > > > > > > [ 3.784585] nouveau 0000:03:00.0: bios: version 70.18.83.00.08
> > > > > > > > > > > > > > > [ 3.792244] nouveau 0000:03:00.0: fb: 512 MiB DDR3
> > > > > > > > > > > > > > > [ 3.948786] nouveau 0000:03:00.0: DRM: VRAM: 512 MiB
> > > > > > > > > > > > > > > [ 3.953755] nouveau 0000:03:00.0: DRM: GART: 1048576 MiB
> > > > > > > > > > > > > > > [ 3.959073] nouveau 0000:03:00.0: DRM: TMDS table version 2.0
> > > > > > > > > > > > > > > [ 3.964808] nouveau 0000:03:00.0: DRM: DCB version 4.0
> > > > > > > > > > > > > > > [ 3.969938] nouveau 0000:03:00.0: DRM: DCB outp 00: 02000360 00000000
> > > > > > > > > > > > > > > [ 3.976367] nouveau 0000:03:00.0: DRM: DCB outp 01: 02000362 00020010
> > > > > > > > > > > > > > > [ 3.982792] nouveau 0000:03:00.0: DRM: DCB outp 02: 028003a6 0f220010
> > > > > > > > > > > > > > > [ 3.989223] nouveau 0000:03:00.0: DRM: DCB outp 03: 01011380 00000000
> > > > > > > > > > > > > > > [ 3.995647] nouveau 0000:03:00.0: DRM: DCB outp 04: 08011382 00020010
> > > > > > > > > > > > > > > [ 4.002076] nouveau 0000:03:00.0: DRM: DCB outp 05: 088113c6 0f220010
> > > > > > > > > > > > > > > [ 4.008511] nouveau 0000:03:00.0: DRM: DCB conn 00: 00101064
> > > > > > > > > > > > > > > [ 4.014151] nouveau 0000:03:00.0: DRM: DCB conn 01: 00202165
> > > > > > > > > > > > > > > [ 4.021710] nvkm_uevent_add: uevent: 0xffff888100242100, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > > > > [ 4.033680] nvkm_uevent_add: uevent: 0xffff888100242300, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > > > > [ 4.045429] nouveau 0000:03:00.0: DRM: MM: using COPY for buffer copies
> > > > > > > > > > > > > > > [ 4.052059] stackdepot: allocating hash table of 1048576 entries via kvcalloc
> > > > > > > > > > > > > > > [ 4.067191] nvkm_uevent_add: uevent: 0xffff888100242800, event: 0xffff888104b3e260, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > > > > [ 4.078936] nvkm_uevent_add: uevent: 0xffff888100242900, event: 0xffff888104b3e260, id: 0x1, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > > > > [ 4.090514] nvkm_uevent_add: uevent: 0xffff888100242a00, event: 0xffff888102091f28, id: 0x1, bits: 0x3, func: 0xffffffff8177b700
> > > > > > > > > > > > > > > [ 4.102118] tsc: Refined TSC clocksource calibration: 3591.345 MHz
> > > > > > > > > > > > > > > [ 4.108342] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33c4635c383, max_idle_ns: 440795314831 ns
> > > > > > > > > > > > > > > [ 4.108401] nvkm_uevent_add: uevent: 0xffff8881020b6000, event: 0xffff888102091f28, id: 0xf, bits: 0x3, func: 0xffffffff8177b700
> > > > > > > > > > > > > > > [ 4.129864] clocksource: Switched to clocksource tsc
> > > > > > > > > > > > > > > [ 4.131478] [drm] Initialized nouveau 1.3.1 20120801 for 0000:03:00.0 on minor 0
> > > > > > > > > > > > > > > [ 4.143806] BUG: kernel NULL pointer dereference, address: 0000000000000020
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ahh, that would have been good to know :) Mind figuring out what's
> > > > > > > > > > > > > > exactly NULL inside nvif_object_mthd? Or rather what line
> > > > > > > > > > > > > > `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> > > > > > > > > > > > > > out what's wrong here.
> > > > > > > > > > > > >
> > > > > > > > > > > > > FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
> > > > > > > > > > > > > https://bugzilla.suse.com/show_bug.cgi?id=1214073
> > > > > > > > > > > > > Confirmed that reverting the patch cured the issue.
> > > > > > > > > > > > >
> > > > > > > > > > > > > FWIW, loading nouveau showed a refcount_t warning just before the NULL
> > > > > > > > > > > > > dereference:
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > mh, I wonder if one of those `return -EINVAL;` branches is hit where
> > > > > > > > > > > > it wasn't before. Could some of you check if `nvkm_uconn_uevent`
> > > > > > > > > > > > returns -EINVAL with that patch where it didn't before? I wonder if
> > > > > > > > > > > > it's the `if (&outp->head == &conn->disp->outps) return -EINVAL;` check,
> > > > > > > > > > > > and whether removing it fixes the crash?
> > > > > > > > > > >
> > > > > > > > > > > Please give a patch, then I can build a kernel and let the reporter
> > > > > > > > > > > test it :)
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > attached a patch.
> > > > > > > > >
> > > > > > > > > Thanks. Now I'm building a test kernel and asked the reporter to
> > > > > > > > > test it.
> > > > > > > >
> > > > > > > > And the result was negative, the boot still hung.
> > > > > > >
> > > > > > > And below is another log from the 6.4.8 kernel with KASAN enabled.
> > > > > > > Some memory corruption seems to be happening.
> > > > > > >
> > > > > > > [ 228.422919] nouveau 0000:02:00.0: DRM: DCB conn 01: 0000a146
> > > > > > > [ 228.428674] nouveau 0000:02:00.0: DRM: MM: using M2MF for buffer copies
> > > > > > > [ 228.436682] ==================================================================
> > > > > > > [ 228.436698] BUG: KASAN: slab-use-after-free in drm_connector_list_iter_next+0x176/0x320
> > > > > > > [ 228.436715] Read of size 4 at addr ffff8881731ce050 by task modprobe/6174
> > > > > > >
> > > > > > > [ 228.436728] CPU: 0 PID: 6174 Comm: modprobe Not tainted 6.4.9-4.g5b9ad20-default #1 openSUSE Tumbleweed (unreleased) d0a6841e538b38d17513f6942fb58770372b54fd
> > > > > > > [ 228.436740] Hardware name: Apple Inc. MacBook5,1/Mac-F42D89C8, BIOS MB51.88Z.007D.B03.0904271443 04/27/09
> > > > > > > [ 228.436747] Call Trace:
> > > > > > > [ 228.436753] <TASK>
> > > > > > > [ 228.436759] dump_stack_lvl+0x47/0x60
> > > > > > > [ 228.436773] print_report+0xcf/0x640
> > > > > > > [ 228.436784] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> > > > > > > [ 228.436797] ? drm_connector_list_iter_next+0x176/0x320
> > > > > > > [ 228.436807] kasan_report+0xb1/0xe0
> > > > > > > [ 228.436817] ? drm_connector_list_iter_next+0x176/0x320
> > > > > > > [ 228.436828] kasan_check_range+0x105/0x1b0
> > > > > > > [ 228.436837] drm_connector_list_iter_next+0x176/0x320
> > > > > > > [ 228.436848] ? __pfx_drm_connector_list_iter_next+0x10/0x10
> > > > > > > [ 228.436859] ? __kmem_cache_free+0x18a/0x2c0
> > > > > > > [ 228.436868] nouveau_connector_create+0x170/0x1cd0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > > [ 228.437540] ? drm_encoder_init+0xbe/0x140
> > > > > > > [ 228.437554] ? __pfx_nouveau_connector_create+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > > [ 228.438137] ? nvif_outp_ctor+0x2d9/0x430 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > > [ 228.438236] ? __pfx_nouveau_drm_device_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > > [ 228.438236] ? __pfx_pci_update_current_state+0x10/0x10
> > > > > > > [ 228.438236] ? __kasan_check_byte+0x13/0x50
> > > > > > > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > > [ 228.438236] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> > > > > > > [ 228.438236] ? __pfx_nouveau_drm_probe+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > > [ 228.438236] ? __pfx_nouveau_drm_probe+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > > [ 228.438236] local_pci_probe+0xdd/0x190
> > > > > > > [ 228.438236] pci_device_probe+0x23a/0x770
> > > > > > > [ 228.438236] ? kernfs_add_one+0x2d8/0x450
> > > > > > > [ 228.438236] ? kernfs_get.part.0+0x4c/0x70
> > > > > > > [ 228.438236] ? __pfx_pci_device_probe+0x10/0x10
> > > > > > > [ 228.438236] ? kernfs_create_link+0x15f/0x230
> > > > > > > [ 228.438236] ? kernfs_put+0x1c/0x40
> > > > > > > [ 228.438236] ? sysfs_do_create_link_sd+0x8e/0x100
> > > > > > > [ 228.438236] really_probe+0x3e2/0xb80
> > > > > > > [ 228.438236] __driver_probe_device+0x18c/0x450
> > > > > > > [ 228.438236] ? __pfx_klist_iter_init_node+0x10/0x10
> > > > > > > [ 228.438236] driver_probe_device+0x4a/0x120
> > > > > > > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > > > > > > [ 228.438236] ? __pfx___driver_attach+0x10/0x10
> > > > > > > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > > > > > > [ 228.438236] ? __pfx__raw_spin_lock+0x10/0x10
> > > > > > > [ 228.438236] ? __pfx_bus_for_each_dev+0x10/0x10
> > > > > > > [ 228.438236] bus_add_driver+0x29e/0x570
> > > > > > > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > > [ 228.438236] driver_register+0x134/0x460
> > > > > > > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > > [ 228.438236] do_one_initcall+0x8e/0x310
> > > > > > > [ 228.438236] ? __pfx_do_one_initcall+0x10/0x10
> > > > > > > [ 228.438236] ? __kmem_cache_alloc_node+0x1b9/0x3b0
> > > > > > > [ 228.438236] ? do_init_module+0x4b/0x730
> > > > > > > [ 228.438236] ? kasan_unpoison+0x44/0x70
> > > > > > > [ 228.438236] do_init_module+0x238/0x730
> > > > > > > [ 228.438236] load_module+0x5b41/0x6dd0
> > > > > > > [ 228.438236] ? __pfx_load_module+0x10/0x10
> > > > > > > [ 228.438236] ? _raw_spin_lock+0x85/0xe0
> > > > > > > [ 228.438236] ? __pfx__raw_spin_lock+0x10/0x10
> > > > > > > [ 228.438236] ? find_vmap_area+0xab/0xe0
> > > > > > > [ 228.438236] ? __do_sys_init_module+0x1df/0x210
> > > > > > > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > > > > > > [ 228.438236] ? __pfx___do_sys_init_module+0x10/0x10
> > > > > > > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > > > > > > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > > > > > > [ 228.438236] ? __pfx_ksys_read+0x10/0x10
> > > > > > > [ 228.438236] do_syscall_64+0x60/0x90
> > > > > > > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > > > > > > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > > > > > > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > > > > > > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > > > > > > [ 228.438236] ? exc_page_fault+0x62/0xd0
> > > > > > > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> > > > > > > [ 228.438236] RIP: 0033:0x7f91ce119a5e
> > > > > > > [ 228.438236] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7a 03 0d 00 f7 d8 64 89 01 48
> > > > > > > [ 228.438236] RSP: 002b:00007ffce2813538 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> > > > > > > [ 228.438236] RAX: ffffffffffffffda RBX: 00005588462def10 RCX: 00007f91ce119a5e
> > > > > > > [ 228.438236] RDX: 00005588462e39c0 RSI: 0000000000fda8b2 RDI: 00007f91cc371010
> > > > > > > [ 228.438236] RBP: 00005588462e39c0 R08: 00005588462e3ce0 R09: 0000000000000000
> > > > > > > [ 228.438236] R10: 000000000005af11 R11: 0000000000000246 R12: 0000000000040000
> > > > > > > [ 228.438236] R13: 0000000000000000 R14: 0000000000000009 R15: 00005588462de7c0
> > > > > > > [ 228.438236] </TASK>
> > > > > > >
> > > > > > > [ 228.438236] Allocated by task 6174:
> > > > > > > [ 228.438236] kasan_save_stack+0x20/0x40
> > > > > > > [ 228.438236] kasan_set_track+0x25/0x30
> > > > > > > [ 228.438236] __kasan_kmalloc+0xaa/0xb0
> > > > > > > [ 228.438236] nouveau_connector_create+0x386/0x1cd0 [nouveau]
> > > > > > > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau]
> > > > > > > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau]
> > > > > > > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau]
> > > > > > > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau]
> > > > > > > [ 228.438236] local_pci_probe+0xdd/0x190
> > > > > > > [ 228.438236] pci_device_probe+0x23a/0x770
> > > > > > > [ 228.438236] really_probe+0x3e2/0xb80
> > > > > > > [ 228.438236] __driver_probe_device+0x18c/0x450
> > > > > > > [ 228.438236] driver_probe_device+0x4a/0x120
> > > > > > > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > > > > > > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > > > > > > [ 228.438236] bus_add_driver+0x29e/0x570
> > > > > > > [ 228.438236] driver_register+0x134/0x460
> > > > > > > [ 228.438236] do_one_initcall+0x8e/0x310
> > > > > > > [ 228.438236] do_init_module+0x238/0x730
> > > > > > > [ 228.438236] load_module+0x5b41/0x6dd0
> > > > > > > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > > > > > > [ 228.438236] do_syscall_64+0x60/0x90
> > > > > > > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> > > > > > >
> > > > > > > [ 228.438236] Freed by task 6174:
> > > > > > > [ 228.438236] kasan_save_stack+0x20/0x40
> > > > > > > [ 228.438236] kasan_set_track+0x25/0x30
> > > > > > > [ 228.438236] kasan_save_free_info+0x2e/0x50
> > > > > > > [ 228.438236] ____kasan_slab_free+0x169/0x1c0
> > > > > > > [ 228.438236] slab_free_freelist_hook+0xcd/0x190
> > > > > > > [ 228.438236] __kmem_cache_free+0x18a/0x2c0
> > > > > > > [ 228.438236] nouveau_connector_create+0x1423/0x1cd0 [nouveau]
> > > > > > > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau]
> > > > > > > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau]
> > > > > > > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau]
> > > > > > > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau]
> > > > > > > [ 228.438236] local_pci_probe+0xdd/0x190
> > > > > > > [ 228.438236] pci_device_probe+0x23a/0x770
> > > > > > > [ 228.438236] really_probe+0x3e2/0xb80
> > > > > > > [ 228.438236] __driver_probe_device+0x18c/0x450
> > > > > > > [ 228.438236] driver_probe_device+0x4a/0x120
> > > > > > > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > > > > > > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > > > > > > [ 228.438236] bus_add_driver+0x29e/0x570
> > > > > > > [ 228.438236] driver_register+0x134/0x460
> > > > > > > [ 228.438236] do_one_initcall+0x8e/0x310
> > > > > > > [ 228.438236] do_init_module+0x238/0x730
> > > > > > > [ 228.438236] load_module+0x5b41/0x6dd0
> > > > > > > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > > > > > > [ 228.438236] do_syscall_64+0x60/0x90
> > > > > > > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> > > > > > >
> > > > > > > [ 228.438236] The buggy address belongs to the object at ffff8881731ce000
> > > > > > > which belongs to the cache kmalloc-4k of size 4096
> > > > > > > [ 228.438236] The buggy address is located 80 bytes inside of
> > > > > > > freed 4096-byte region [ffff8881731ce000, ffff8881731cf000)
> > > > > > >
> > > > > > > [ 228.438236] The buggy address belongs to the physical page:
> > > > > > > [ 228.438236] page:00000000d1c274b4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1731c8
> > > > > > > [ 228.438236] head:00000000d1c274b4 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> > > > > > > [ 228.438236] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> > > > > > > [ 228.438236] page_type: 0xffffffff()
> > > > > > > [ 228.438236] raw: 0017ffffc0010200 ffff888100042140 dead000000000122 0000000000000000
> > > > > > > [ 228.438236] raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
> > > > > > > [ 228.438236] page dumped because: kasan: bad access detected
> > > > > > >
> > > > > > > [ 228.438236] Memory state around the buggy address:
> > > > > > > [ 228.438236] ffff8881731cdf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > > > > > > [ 228.438236] ffff8881731cdf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > > > > > > [ 228.438236] >ffff8881731ce000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > > > [ 228.438236] ^
> > > > > > > [ 228.438236] ffff8881731ce080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > > > [ 228.438236] ffff8881731ce100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > > > [ 228.438236] ==================================================================
> > > > > > >
> > > > > >
> > > > > > mind resolving those to file lines via decode_stacktrace.sh or
> > > > > > something, because looking at it, it makes no sense really.
> > > > >
> > > > > I don't own the machine, so it's a bit difficult from my side,
> > > > > unfortunately.
> > > > >
> > >
> > > also, you don't need to run it on the same machine if it's all
> > > distribution packaged. As long as you have the exact same binary
> > > available you can resolve the lines. Or just use gdb:
> > > https://docs.kernel.org/admin-guide/bug-hunting.html#gdb
> >
> > Unfortunately it's not possible, as it's a moving target (following
> > the upstream development), and the rpm packages used for the report
> > are already gone. What I can get now won't match any longer.
> >
> > But now I wonder whether this can be reproduced by injecting an
> > error artificially.
> >
>
> uhh, nvm. I was blind and I didn't see those lines of code:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nouveau_connector.c?h=v6.5-rc6#n1403
>
> Those lines probably should only be called after all error checks,
> I'll see if it's safe to do so.
>

I've sent a patch out to address this memory corruption
https://patchwork.freedesktop.org/patch/552642/

It might or might not fix regressions from the original I2C fix, so
please test and report if there are remaining issues.
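As a quick sanity check on the KASAN report quoted above: the 4-byte read at ffff8881731ce050 lies 0x50 = 80 bytes into the freed kmalloc-4k object that starts at ffff8881731ce000. The object's region ends at ffff8881731cf000, so the read hit an early field of the object allocated and freed in `nouveau_connector_create`. It did not run past the end of that object. The helper names below are hypothetical. The code only writes out the arithmetic behind the report's "located 80 bytes inside" line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte offset of a faulting address from the start of a slab object
 * (hypothetical helper, not a kernel API). */
uint64_t object_offset(uint64_t object_start, uint64_t bad_addr)
{
    assert(bad_addr >= object_start);
    return bad_addr - object_start;
}

/* True if an access of `size` bytes at `bad_addr` lies entirely within
 * the half-open range [object_start, object_start + object_size). */
bool access_within_object(uint64_t object_start, uint64_t object_size,
                          uint64_t bad_addr, uint64_t size)
{
    return bad_addr >= object_start &&
           bad_addr + size <= object_start + object_size;
}
```

With the values from the report, `object_offset(0xffff8881731ce000, 0xffff8881731ce050)` gives 80. A 4-byte access at that address stays inside the 4096-byte object, so this is a use-after-free and not an out-of-bounds access.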

> >
> > Takashi
> >


2023-08-14 15:32:30

by Karol Herbst

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Mon, Aug 14, 2023 at 3:35 PM Takashi Iwai <[email protected]> wrote:
>
> On Mon, 14 Aug 2023 15:19:11 +0200,
> Karol Herbst wrote:
> >
> > On Mon, Aug 14, 2023 at 2:56 PM Karol Herbst <[email protected]> wrote:
> > >
> > > On Mon, Aug 14, 2023 at 2:48 PM Takashi Iwai <[email protected]> wrote:
> > > >
> > > > On Mon, 14 Aug 2023 14:38:18 +0200,
> > > > Karol Herbst wrote:
> > > > >
> > > > > On Wed, Aug 9, 2023 at 6:16 PM Takashi Iwai <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, 09 Aug 2023 16:46:38 +0200,
> > > > > > Takashi Iwai wrote:
> > > > > > >
> > > > > > > On Wed, 09 Aug 2023 15:13:23 +0200,
> > > > > > > Takashi Iwai wrote:
> > > > > > > >
> > > > > > > > On Wed, 09 Aug 2023 14:19:23 +0200,
> > > > > > > > Karol Herbst wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Aug 9, 2023 at 1:46 PM Takashi Iwai <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, 09 Aug 2023 13:42:09 +0200,
> > > > > > > > > > Karol Herbst wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Aug 9, 2023 at 11:22 AM Takashi Iwai <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, 08 Aug 2023 12:39:32 +0200,
> > > > > > > > > > > > Karol Herbst wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Aug 7, 2023 at 5:05 PM Borislav Petkov <[email protected]> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Aug 07, 2023 at 01:49:42PM +0200, Karol Herbst wrote:
> > > > > > > > > > > > > > > in what way does it stop? Just not progressing? That would be kinda
> > > > > > > > > > > > > > > concerning. Mind tracing with what arguments `nvkm_uevent_add` is
> > > > > > > > > > > > > > > called with and without that patch?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Well, me dumping those args I guess made the box not freeze before
> > > > > > > > > > > > > > catching a #PF over serial. Does that help?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ....
> > > > > > > > > > > > > > [ 3.410135] Unpacking initramfs...
> > > > > > > > > > > > > > [ 3.416319] software IO TLB: mapped [mem 0x00000000a877d000-0x00000000ac77d000] (64MB)
> > > > > > > > > > > > > > [ 3.418227] Initialise system trusted keyrings
> > > > > > > > > > > > > > [ 3.432273] workingset: timestamp_bits=56 max_order=22 bucket_order=0
> > > > > > > > > > > > > > [ 3.439006] ntfs: driver 2.1.32 [Flags: R/W].
> > > > > > > > > > > > > > [ 3.443368] fuse: init (API version 7.38)
> > > > > > > > > > > > > > [ 3.447601] 9p: Installing v9fs 9p2000 file system support
> > > > > > > > > > > > > > [ 3.453223] Key type asymmetric registered
> > > > > > > > > > > > > > [ 3.457332] Asymmetric key parser 'x509' registered
> > > > > > > > > > > > > > [ 3.462236] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
> > > > > > > > > > > > > > [ 3.475865] efifb: probing for efifb
> > > > > > > > > > > > > > [ 3.479458] efifb: framebuffer at 0xf9000000, using 1920k, total 1920k
> > > > > > > > > > > > > > [ 3.485969] efifb: mode is 800x600x32, linelength=3200, pages=1
> > > > > > > > > > > > > > [ 3.491872] efifb: scrolling: redraw
> > > > > > > > > > > > > > [ 3.495438] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> > > > > > > > > > > > > > [ 3.502349] Console: switching to colour frame buffer device 100x37
> > > > > > > > > > > > > > [ 3.509564] fb0: EFI VGA frame buffer device
> > > > > > > > > > > > > > [ 3.514013] ACPI: \_PR_.CP00: Found 4 idle states
> > > > > > > > > > > > > > [ 3.518850] ACPI: \_PR_.CP01: Found 4 idle states
> > > > > > > > > > > > > > [ 3.523687] ACPI: \_PR_.CP02: Found 4 idle states
> > > > > > > > > > > > > > [ 3.528515] ACPI: \_PR_.CP03: Found 4 idle states
> > > > > > > > > > > > > > [ 3.533346] ACPI: \_PR_.CP04: Found 4 idle states
> > > > > > > > > > > > > > [ 3.538173] ACPI: \_PR_.CP05: Found 4 idle states
> > > > > > > > > > > > > > [ 3.543003] ACPI: \_PR_.CP06: Found 4 idle states
> > > > > > > > > > > > > > [ 3.544219] Freeing initrd memory: 8196K
> > > > > > > > > > > > > > [ 3.547844] ACPI: \_PR_.CP07: Found 4 idle states
> > > > > > > > > > > > > > [ 3.609542] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> > > > > > > > > > > > > > [ 3.616224] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> > > > > > > > > > > > > > [ 3.625552] serial 0000:00:16.3: enabling device (0000 -> 0003)
> > > > > > > > > > > > > > [ 3.633034] 0000:00:16.3: ttyS1 at I/O 0xf0a0 (irq = 17, base_baud = 115200) is a 16550A
> > > > > > > > > > > > > > [ 3.642451] Linux agpgart interface v0.103
> > > > > > > > > > > > > > [ 3.647141] ACPI: bus type drm_connector registered
> > > > > > > > > > > > > > [ 3.653261] Console: switching to colour dummy device 80x25
> > > > > > > > > > > > > > [ 3.659092] nouveau 0000:03:00.0: vgaarb: deactivate vga console
> > > > > > > > > > > > > > [ 3.665174] nouveau 0000:03:00.0: NVIDIA GT218 (0a8c00b1)
> > > > > > > > > > > > > > [ 3.784585] nouveau 0000:03:00.0: bios: version 70.18.83.00.08
> > > > > > > > > > > > > > [ 3.792244] nouveau 0000:03:00.0: fb: 512 MiB DDR3
> > > > > > > > > > > > > > [ 3.948786] nouveau 0000:03:00.0: DRM: VRAM: 512 MiB
> > > > > > > > > > > > > > [ 3.953755] nouveau 0000:03:00.0: DRM: GART: 1048576 MiB
> > > > > > > > > > > > > > [ 3.959073] nouveau 0000:03:00.0: DRM: TMDS table version 2.0
> > > > > > > > > > > > > > [ 3.964808] nouveau 0000:03:00.0: DRM: DCB version 4.0
> > > > > > > > > > > > > > [ 3.969938] nouveau 0000:03:00.0: DRM: DCB outp 00: 02000360 00000000
> > > > > > > > > > > > > > [ 3.976367] nouveau 0000:03:00.0: DRM: DCB outp 01: 02000362 00020010
> > > > > > > > > > > > > > [ 3.982792] nouveau 0000:03:00.0: DRM: DCB outp 02: 028003a6 0f220010
> > > > > > > > > > > > > > [ 3.989223] nouveau 0000:03:00.0: DRM: DCB outp 03: 01011380 00000000
> > > > > > > > > > > > > > [ 3.995647] nouveau 0000:03:00.0: DRM: DCB outp 04: 08011382 00020010
> > > > > > > > > > > > > > [ 4.002076] nouveau 0000:03:00.0: DRM: DCB outp 05: 088113c6 0f220010
> > > > > > > > > > > > > > [ 4.008511] nouveau 0000:03:00.0: DRM: DCB conn 00: 00101064
> > > > > > > > > > > > > > [ 4.014151] nouveau 0000:03:00.0: DRM: DCB conn 01: 00202165
> > > > > > > > > > > > > > [ 4.021710] nvkm_uevent_add: uevent: 0xffff888100242100, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > > > [ 4.033680] nvkm_uevent_add: uevent: 0xffff888100242300, event: 0xffff8881022de1a0, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > > > [ 4.045429] nouveau 0000:03:00.0: DRM: MM: using COPY for buffer copies
> > > > > > > > > > > > > > [ 4.052059] stackdepot: allocating hash table of 1048576 entries via kvcalloc
> > > > > > > > > > > > > > [ 4.067191] nvkm_uevent_add: uevent: 0xffff888100242800, event: 0xffff888104b3e260, id: 0x0, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > > > [ 4.078936] nvkm_uevent_add: uevent: 0xffff888100242900, event: 0xffff888104b3e260, id: 0x1, bits: 0x1, func: 0x0000000000000000
> > > > > > > > > > > > > > [ 4.090514] nvkm_uevent_add: uevent: 0xffff888100242a00, event: 0xffff888102091f28, id: 0x1, bits: 0x3, func: 0xffffffff8177b700
> > > > > > > > > > > > > > [ 4.102118] tsc: Refined TSC clocksource calibration: 3591.345 MHz
> > > > > > > > > > > > > > [ 4.108342] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x33c4635c383, max_idle_ns: 440795314831 ns
> > > > > > > > > > > > > > [ 4.108401] nvkm_uevent_add: uevent: 0xffff8881020b6000, event: 0xffff888102091f28, id: 0xf, bits: 0x3, func: 0xffffffff8177b700
> > > > > > > > > > > > > > [ 4.129864] clocksource: Switched to clocksource tsc
> > > > > > > > > > > > > > [ 4.131478] [drm] Initialized nouveau 1.3.1 20120801 for 0000:03:00.0 on minor 0
> > > > > > > > > > > > > > [ 4.143806] BUG: kernel NULL pointer dereference, address: 0000000000000020
> > > > > > > > > > > > >
> > > > > > > > > > > > > ahh, that would have been good to know :) Mind figuring out what's
> > > > > > > > > > > > > exactly NULL inside nvif_object_mthd? Or rather what line
> > > > > > > > > > > > > `nvif_object_mthd+0x136` belongs to, then it should be easy to figure
> > > > > > > > > > > > > out what's wrong here.
> > > > > > > > > > > >
> > > > > > > > > > > > FWIW, we've hit the bug on openSUSE Tumbleweed 6.4.8 kernel:
> > > > > > > > > > > > https://bugzilla.suse.com/show_bug.cgi?id=1214073
> > > > > > > > > > > > Confirmed that reverting the patch cured the issue.
> > > > > > > > > > > >
> > > > > > > > > > > > FWIW, loading nouveau showed a refcount_t warning just before the NULL
> > > > > > > > > > > > dereference:
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > mh, I wonder if one of those `return -EINVAL;` branches is hit where
> > > > > > > > > > > it wasn't before. Could some of you check if `nvkm_uconn_uevent`
> > > > > > > > > > > returns -EINVAL with that patch where it didn't before? I wonder if
> > > > > > > > > > > it's the `if (&outp->head == &conn->disp->outps) return -EINVAL;` and
> > > > > > > > > > > if removing it fixes the crash?
> > > > > > > > > >
> > > > > > > > > > Please give a patch, then I can build a kernel and let the reporter
> > > > > > > > > > test it :)
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > attached a patch.
> > > > > > > >
> > > > > > > > Thanks. Now I'm building a test kernel and asked the reporter to
> > > > > > > > test it.
> > > > > > >
> > > > > > > And the result was negative; the boot still hung.
> > > > > >
> > > > > > And below is another log from the 6.4.8 kernel with KASAN-enabled.
> > > > > > Some memory corruption seems happening.
> > > > > >
> > > > > > [ 228.422919] nouveau 0000:02:00.0: DRM: DCB conn 01: 0000a146
> > > > > > [ 228.428674] nouveau 0000:02:00.0: DRM: MM: using M2MF for buffer copies
> > > > > > [ 228.436682] ==================================================================
> > > > > > [ 228.436698] BUG: KASAN: slab-use-after-free in drm_connector_list_iter_next+0x176/0x320
> > > > > > [ 228.436715] Read of size 4 at addr ffff8881731ce050 by task modprobe/6174
> > > > > >
> > > > > > [ 228.436728] CPU: 0 PID: 6174 Comm: modprobe Not tainted 6.4.9-4.g5b9ad20-default #1 openSUSE Tumbleweed (unreleased) d0a6841e538b38d17513f6942fb58770372b54fd
> > > > > > [ 228.436740] Hardware name: Apple Inc. MacBook5,1/Mac-F42D89C8, BIOS MB51.88Z.007D.B03.0904271443 04/27/09
> > > > > > [ 228.436747] Call Trace:
> > > > > > [ 228.436753] <TASK>
> > > > > > [ 228.436759] dump_stack_lvl+0x47/0x60
> > > > > > [ 228.436773] print_report+0xcf/0x640
> > > > > > [ 228.436784] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> > > > > > [ 228.436797] ? drm_connector_list_iter_next+0x176/0x320
> > > > > > [ 228.436807] kasan_report+0xb1/0xe0
> > > > > > [ 228.436817] ? drm_connector_list_iter_next+0x176/0x320
> > > > > > [ 228.436828] kasan_check_range+0x105/0x1b0
> > > > > > [ 228.436837] drm_connector_list_iter_next+0x176/0x320
> > > > > > [ 228.436848] ? __pfx_drm_connector_list_iter_next+0x10/0x10
> > > > > > [ 228.436859] ? __kmem_cache_free+0x18a/0x2c0
> > > > > > [ 228.436868] nouveau_connector_create+0x170/0x1cd0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > [ 228.437540] ? drm_encoder_init+0xbe/0x140
> > > > > > [ 228.437554] ? __pfx_nouveau_connector_create+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > [ 228.438137] ? nvif_outp_ctor+0x2d9/0x430 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > [ 228.438236] ? __pfx_nouveau_drm_device_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > [ 228.438236] ? __pfx_pci_update_current_state+0x10/0x10
> > > > > > [ 228.438236] ? __kasan_check_byte+0x13/0x50
> > > > > > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > [ 228.438236] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> > > > > > [ 228.438236] ? __pfx_nouveau_drm_probe+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > [ 228.438236] ? __pfx_nouveau_drm_probe+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > [ 228.438236] local_pci_probe+0xdd/0x190
> > > > > > [ 228.438236] pci_device_probe+0x23a/0x770
> > > > > > [ 228.438236] ? kernfs_add_one+0x2d8/0x450
> > > > > > [ 228.438236] ? kernfs_get.part.0+0x4c/0x70
> > > > > > [ 228.438236] ? __pfx_pci_device_probe+0x10/0x10
> > > > > > [ 228.438236] ? kernfs_create_link+0x15f/0x230
> > > > > > [ 228.438236] ? kernfs_put+0x1c/0x40
> > > > > > [ 228.438236] ? sysfs_do_create_link_sd+0x8e/0x100
> > > > > > [ 228.438236] really_probe+0x3e2/0xb80
> > > > > > [ 228.438236] __driver_probe_device+0x18c/0x450
> > > > > > [ 228.438236] ? __pfx_klist_iter_init_node+0x10/0x10
> > > > > > [ 228.438236] driver_probe_device+0x4a/0x120
> > > > > > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > > > > > [ 228.438236] ? __pfx___driver_attach+0x10/0x10
> > > > > > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > > > > > [ 228.438236] ? __pfx__raw_spin_lock+0x10/0x10
> > > > > > [ 228.438236] ? __pfx_bus_for_each_dev+0x10/0x10
> > > > > > [ 228.438236] bus_add_driver+0x29e/0x570
> > > > > > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > [ 228.438236] driver_register+0x134/0x460
> > > > > > [ 228.438236] ? __pfx_nouveau_drm_init+0x10/0x10 [nouveau d0287dfba9984367c331e8149297392f67038244]
> > > > > > [ 228.438236] do_one_initcall+0x8e/0x310
> > > > > > [ 228.438236] ? __pfx_do_one_initcall+0x10/0x10
> > > > > > [ 228.438236] ? __kmem_cache_alloc_node+0x1b9/0x3b0
> > > > > > [ 228.438236] ? do_init_module+0x4b/0x730
> > > > > > [ 228.438236] ? kasan_unpoison+0x44/0x70
> > > > > > [ 228.438236] do_init_module+0x238/0x730
> > > > > > [ 228.438236] load_module+0x5b41/0x6dd0
> > > > > > [ 228.438236] ? __pfx_load_module+0x10/0x10
> > > > > > [ 228.438236] ? _raw_spin_lock+0x85/0xe0
> > > > > > [ 228.438236] ? __pfx__raw_spin_lock+0x10/0x10
> > > > > > [ 228.438236] ? find_vmap_area+0xab/0xe0
> > > > > > [ 228.438236] ? __do_sys_init_module+0x1df/0x210
> > > > > > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > > > > > [ 228.438236] ? __pfx___do_sys_init_module+0x10/0x10
> > > > > > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > > > > > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > > > > > [ 228.438236] ? __pfx_ksys_read+0x10/0x10
> > > > > > [ 228.438236] do_syscall_64+0x60/0x90
> > > > > > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > > > > > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > > > > > [ 228.438236] ? syscall_exit_to_user_mode+0x1b/0x40
> > > > > > [ 228.438236] ? do_syscall_64+0x6c/0x90
> > > > > > [ 228.438236] ? exc_page_fault+0x62/0xd0
> > > > > > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> > > > > > [ 228.438236] RIP: 0033:0x7f91ce119a5e
> > > > > > [ 228.438236] Code: c3 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7a 03 0d 00 f7 d8 64 89 01 48
> > > > > > [ 228.438236] RSP: 002b:00007ffce2813538 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> > > > > > [ 228.438236] RAX: ffffffffffffffda RBX: 00005588462def10 RCX: 00007f91ce119a5e
> > > > > > [ 228.438236] RDX: 00005588462e39c0 RSI: 0000000000fda8b2 RDI: 00007f91cc371010
> > > > > > [ 228.438236] RBP: 00005588462e39c0 R08: 00005588462e3ce0 R09: 0000000000000000
> > > > > > [ 228.438236] R10: 000000000005af11 R11: 0000000000000246 R12: 0000000000040000
> > > > > > [ 228.438236] R13: 0000000000000000 R14: 0000000000000009 R15: 00005588462de7c0
> > > > > > [ 228.438236] </TASK>
> > > > > >
> > > > > > [ 228.438236] Allocated by task 6174:
> > > > > > [ 228.438236] kasan_save_stack+0x20/0x40
> > > > > > [ 228.438236] kasan_set_track+0x25/0x30
> > > > > > [ 228.438236] __kasan_kmalloc+0xaa/0xb0
> > > > > > [ 228.438236] nouveau_connector_create+0x386/0x1cd0 [nouveau]
> > > > > > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau]
> > > > > > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau]
> > > > > > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau]
> > > > > > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau]
> > > > > > [ 228.438236] local_pci_probe+0xdd/0x190
> > > > > > [ 228.438236] pci_device_probe+0x23a/0x770
> > > > > > [ 228.438236] really_probe+0x3e2/0xb80
> > > > > > [ 228.438236] __driver_probe_device+0x18c/0x450
> > > > > > [ 228.438236] driver_probe_device+0x4a/0x120
> > > > > > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > > > > > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > > > > > [ 228.438236] bus_add_driver+0x29e/0x570
> > > > > > [ 228.438236] driver_register+0x134/0x460
> > > > > > [ 228.438236] do_one_initcall+0x8e/0x310
> > > > > > [ 228.438236] do_init_module+0x238/0x730
> > > > > > [ 228.438236] load_module+0x5b41/0x6dd0
> > > > > > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > > > > > [ 228.438236] do_syscall_64+0x60/0x90
> > > > > > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> > > > > >
> > > > > > [ 228.438236] Freed by task 6174:
> > > > > > [ 228.438236] kasan_save_stack+0x20/0x40
> > > > > > [ 228.438236] kasan_set_track+0x25/0x30
> > > > > > [ 228.438236] kasan_save_free_info+0x2e/0x50
> > > > > > [ 228.438236] ____kasan_slab_free+0x169/0x1c0
> > > > > > [ 228.438236] slab_free_freelist_hook+0xcd/0x190
> > > > > > [ 228.438236] __kmem_cache_free+0x18a/0x2c0
> > > > > > [ 228.438236] nouveau_connector_create+0x1423/0x1cd0 [nouveau]
> > > > > > [ 228.438236] nv50_display_create+0xe54/0x30d0 [nouveau]
> > > > > > [ 228.438236] nouveau_display_create+0x903/0x10c0 [nouveau]
> > > > > > [ 228.438236] nouveau_drm_device_init+0x3a4/0x19e0 [nouveau]
> > > > > > [ 228.438236] nouveau_drm_probe+0x1a2/0x6b0 [nouveau]
> > > > > > [ 228.438236] local_pci_probe+0xdd/0x190
> > > > > > [ 228.438236] pci_device_probe+0x23a/0x770
> > > > > > [ 228.438236] really_probe+0x3e2/0xb80
> > > > > > [ 228.438236] __driver_probe_device+0x18c/0x450
> > > > > > [ 228.438236] driver_probe_device+0x4a/0x120
> > > > > > [ 228.438236] __driver_attach+0x1e1/0x4a0
> > > > > > [ 228.438236] bus_for_each_dev+0xf4/0x170
> > > > > > [ 228.438236] bus_add_driver+0x29e/0x570
> > > > > > [ 228.438236] driver_register+0x134/0x460
> > > > > > [ 228.438236] do_one_initcall+0x8e/0x310
> > > > > > [ 228.438236] do_init_module+0x238/0x730
> > > > > > [ 228.438236] load_module+0x5b41/0x6dd0
> > > > > > [ 228.438236] __do_sys_init_module+0x1df/0x210
> > > > > > [ 228.438236] do_syscall_64+0x60/0x90
> > > > > > [ 228.438236] entry_SYSCALL_64_after_hwframe+0x77/0xe1
> > > > > >
> > > > > > [ 228.438236] The buggy address belongs to the object at ffff8881731ce000
> > > > > > which belongs to the cache kmalloc-4k of size 4096
> > > > > > [ 228.438236] The buggy address is located 80 bytes inside of
> > > > > > freed 4096-byte region [ffff8881731ce000, ffff8881731cf000)
> > > > > >
> > > > > > [ 228.438236] The buggy address belongs to the physical page:
> > > > > > [ 228.438236] page:00000000d1c274b4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1731c8
> > > > > > [ 228.438236] head:00000000d1c274b4 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> > > > > > [ 228.438236] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> > > > > > [ 228.438236] page_type: 0xffffffff()
> > > > > > [ 228.438236] raw: 0017ffffc0010200 ffff888100042140 dead000000000122 0000000000000000
> > > > > > [ 228.438236] raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
> > > > > > [ 228.438236] page dumped because: kasan: bad access detected
> > > > > >
> > > > > > [ 228.438236] Memory state around the buggy address:
> > > > > > [ 228.438236] ffff8881731cdf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > > > > > [ 228.438236] ffff8881731cdf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> > > > > > [ 228.438236] >ffff8881731ce000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > > [ 228.438236] ^
> > > > > > [ 228.438236] ffff8881731ce080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > > [ 228.438236] ffff8881731ce100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > > [ 228.438236] ==================================================================
> > > > > >
> > > > >
> > > > > Mind resolving those to file lines via decode_stacktrace.sh or
> > > > > something? Looking at it as is, it really makes no sense.
> > > >
> > > > I don't own the machine, so it's a bit difficult from my side,
> > > > unfortunately.
> > > >
> >
> > Also, you don't need to run it on the same machine if it's all
> > distribution-packaged. As long as you have the exact same binary
> > available you can resolve the lines. Or just use gdb:
> > https://docs.kernel.org/admin-guide/bug-hunting.html#gdb
>
> Unfortunately it's not possible, as it's a moving target (following
> the upstream development), and the rpm packages used for the report
> are already gone. What I can get now won't match any longer.
>
> But now I wonder whether this can be reproduced by injecting an
> error artificially.
>

uhh, nvm. I was blind and I didn't see those lines of code:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nouveau_connector.c?h=v6.5-rc6#n1403

Those lines should probably only be executed after all error checks;
I'll see if it's safe to do so.

>
> Takashi
>


2023-08-14 15:32:49

by Takashi Iwai

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Mon, 14 Aug 2023 16:51:08 +0200,
Karol Herbst wrote:
>
> I've sent a patch out to address this memory corruption
> https://patchwork.freedesktop.org/patch/552642/
>
> It might or might not fix regressions from the original I2C fix, so
> please test and report if there are remaining issues.

Thanks! I'll build a test kernel and ask the reporter for testing
with it. Let's cross fingers :)


Takashi

2023-08-19 20:26:06

by Takashi Iwai

Subject: Re: 2b5d1c29f6c4 ("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts")

On Mon, 14 Aug 2023 17:06:02 +0200,
Takashi Iwai wrote:
>
> On Mon, 14 Aug 2023 16:51:08 +0200,
> Karol Herbst wrote:
> >
> > I've sent a patch out to address this memory corruption
> > https://patchwork.freedesktop.org/patch/552642/
> >
> > It might or might not fix regressions from the original I2C fix, so
> > please test and report if there are remaining issues.
>
> Thanks! I'll build a test kernel and ask the reporter for testing
> with it. Let's cross fingers :)

The feedback is positive so far. It seems to fix the regression
reported for the 6.4.8 kernel.


thanks,

Takashi