2024-06-12 12:41:21

by Thorsten Leemhuis

Subject: Re: [REGRESSION] QXL display malfunction

[CCing a few more people and lists that get_maintainer.pl pointed out for qxl]

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Thomas, from here it looks like this report fell through the cracks. It
apparently is caused by a change of yours that went into 6.10-rc1
(b33651a5c98dbd ("drm/qxl: Do not pin buffer objects for vmap")). Or was
progress made to resolve this and I just missed it?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke


On 03.06.24 04:29, Kaplan, David wrote:
>> -----Original Message-----
>> From: Kaplan, David
>> Sent: Sunday, June 2, 2024 9:25 PM
>> To: [email protected]; [email protected]; Koenig,
>> Christian <[email protected]>; [email protected]
>> Cc: Petkov, Borislav <[email protected]>; [email protected]
>> Subject: [REGRESSION] QXL display malfunction
>>
>> Hi,
>>
>> I am running an Ubuntu 19.10 VM with a tip kernel using QXL video and I've
>> observed the VM graphics often malfunction after boot, sometimes failing to
>> load the Ubuntu desktop or even immediately shutting the guest down.
>> When it does load, the guest dmesg log often contains errors like
>>
>> [ 4.303586] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
>> [ 4.586883] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
>> [ 4.904036] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65335296x16777216+0+0
>> [ 5.374347] [drm:qxl_release_from_id_locked] *ERROR* failed to find id in release_idr
>>
>> I bisected the issue down to "drm/qxl: Do not pin buffer objects for vmap"
>> (b33651a5c98dbd5a919219d8c129d0674ef74299).
>>
>> The full guest .config and guest XML can be provided if desired. The guest
>> kernel has QXL support compiled in and the VM has
>>
>> <video>
>>   <model type="qxl" ram="65536" vram="65536" vgamem="16384"
>>          heads="1" primary="yes"/>
>>   <address type="pci" domain="0x0000" bus="0x00" slot="0x01"
>>            function="0x0"/>
>> </video>
>>
>> The host is Ubuntu 24.04 (stock) running QEMU version 8.2.2. The VM is run
>> under virt-manager 4.1.0. If other information would be helpful, just let me
>> know.
>>
>> Thanks --David Kaplan
>
> Fixing emails...sorry
>
> --David Kaplan
>
>


2024-06-12 14:26:26

by Thomas Zimmermann

Subject: Re: [REGRESSION] QXL display malfunction

Hi

On 12.06.24 at 14:41, Linux regression tracking (Thorsten Leemhuis) wrote:
> [CCing a few more people and lists that get_maintainers pointed out for qxl]
>
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
>
> Thomas, from here it looks like this report that apparently is caused by
> a change of yours that went into 6.10-rc1 (b33651a5c98dbd ("drm/qxl: Do
> not pin buffer objects for vmap")) fell through the cracks. Or was
> progress made to resolve this and I just missed this?
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
>
>
> On 03.06.24 04:29, Kaplan, David wrote:
>>> -----Original Message-----
>>> From: Kaplan, David
>>> Sent: Sunday, June 2, 2024 9:25 PM
>>> To: [email protected]; [email protected]; Koenig,
>>> Christian <[email protected]>; [email protected]
>>> Cc: Petkov, Borislav <[email protected]>; [email protected]
>>> Subject: [REGRESSION] QXL display malfunction
>>>
>>> Hi,
>>>
>>> I am running an Ubuntu 19.10 VM with a tip kernel using QXL video and I've
>>> observed the VM graphics often malfunction after boot, sometimes failing to
>>> load the Ubuntu desktop or even immediately shutting the guest down.
>>> When it does load, the guest dmesg log often contains errors like
>>>
>>> [ 4.303586] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
>>> [ 4.586883] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
>>> [ 4.904036] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65335296x16777216+0+0

I don't see how these messages are related. Did they already appear
before the broken commit was there?

>>> [ 5.374347] [drm:qxl_release_from_id_locked] *ERROR* failed to find id in release_idr

Is there only one such message in the log, or are there multiple/frequent ones?

Could you provide a stack trace of what happens before?

We sometimes draw into the buffer object from the CPU. For accessing the
buffer object's pages from the CPU, only a vmap operation should be
necessary, but it appears that qxl also requires a pin. My guess is that
the pin inserts the buffer object's host-side pages, and the code around
qxl_release_from_id_locked() appears to be garbage-collecting them.
Hence, without the pin, the GC complains about inconsistent state.
>>>
>>> I bisected the issue down to "drm/qxl: Do not pin buffer objects for vmap"
>>> (b33651a5c98dbd5a919219d8c129d0674ef74299).

Thanks for bisecting. Does it work if you revert that commit?

Best regards
Thomas


>>>
>>> The full guest .config and guest XML can be provided if desired. The guest
>>> kernel has QXL support compiled in and the VM has
>>>
>>> <video>
>>>   <model type="qxl" ram="65536" vram="65536" vgamem="16384"
>>>          heads="1" primary="yes"/>
>>>   <address type="pci" domain="0x0000" bus="0x00" slot="0x01"
>>>            function="0x0"/>
>>> </video>
>>>
>>> The host is Ubuntu 24.04 (stock) running QEMU version 8.2.2. The VM is run
>>> under virt-manager 4.1.0. If other information would be helpful, just let me
>>> know.
>>>
>>> Thanks --David Kaplan
>> Fixing emails...sorry
>>
>> --David Kaplan
>>
>>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)


2024-06-14 13:45:34

by Kaplan, David

Subject: RE: [REGRESSION] QXL display malfunction

[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: Thomas Zimmermann <[email protected]>
> Sent: Wednesday, June 12, 2024 9:26 AM
> To: Linux regressions mailing list <[email protected]>
> Cc: Petkov, Borislav <[email protected]>;
> [email protected]; [email protected]; Kaplan, David
> <[email protected]>; Koenig, Christian <[email protected]>;
> Dave Airlie <[email protected]>; Maarten Lankhorst
> <[email protected]>; Maxime Ripard
> <[email protected]>; LKML <[email protected]>; ML dri-devel
> <[email protected]>; [email protected];
> [email protected]
> Subject: Re: [REGRESSION] QXL display malfunction
>
> Hi
>
> On 12.06.24 at 14:41, Linux regression tracking (Thorsten Leemhuis) wrote:
> > [CCing a few more people and lists that get_maintainers pointed out
> > for qxl]
> >
> > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> > for once, to make this easily accessible to everyone.
> >
> > Thomas, from here it looks like this report that apparently is caused
> > by a change of yours that went into 6.10-rc1 (b33651a5c98dbd
> > ("drm/qxl: Do not pin buffer objects for vmap")) fell through the
> > cracks. Or was progress made to resolve this and I just missed this?
> >
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker'
> > hat)
> > --
> > Everything you wanna know about Linux kernel regression tracking:
> > https://linux-regtracking.leemhuis.info/about/#tldr
> > If I did something stupid, please tell me, as explained on that page.
> >
> > #regzbot poke
> >
> >
> > On 03.06.24 04:29, Kaplan, David wrote:
> >>> -----Original Message-----
> >>> From: Kaplan, David
> >>> Sent: Sunday, June 2, 2024 9:25 PM
> >>> To: [email protected]; [email protected]; Koenig,
> >>> Christian <[email protected]>; [email protected]
> >>> Cc: Petkov, Borislav <[email protected]>;
> >>> [email protected]
> >>> Subject: [REGRESSION] QXL display malfunction
> >>>
> >>> Hi,
> >>>
> >>> I am running an Ubuntu 19.10 VM with a tip kernel using QXL video
> >>> and I've observed the VM graphics often malfunction after boot,
> >>> sometimes failing to load the Ubuntu desktop or even immediately
> >>> shutting the guest down.
> >>> When it does load, the guest dmesg log often contains errors like
> >>>
> >>> [ 4.303586] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
> >>> [ 4.586883] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
> >>> [ 4.904036] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65335296x16777216+0+0
>
> I don't see how these messages are related. Did they already appear before
> the broken commit was there?

No, I did not observe them prior to the broken commit.

>
> >>> [ 5.374347] [drm:qxl_release_from_id_locked] *ERROR* failed to find id in release_idr
>
> Is there only one such message in the log? Or multiple/frequent ones.

I would usually only see one.

>
> Could you provide a stack trace of what happens before?

Here's the top of a backtrace when the error occurs:
#0 qxl_release_from_id_locked (qdev=qdev@entry=0xffff88810126e000, id=id@entry=262151)
    at drivers/gpu/drm/qxl/qxl_release.c:373
#1 0xffffffff819f5b6a in qxl_garbage_collect (qdev=0xffff88810126e000)
    at drivers/gpu/drm/qxl/qxl_cmd.c:222
#2 0xffffffff810e3aa8 in process_one_work (worker=worker@entry=0xffff888101680300,
    work=0xffff88810126f340) at kernel/workqueue.c:3231
#3 0xffffffff810e6281 in process_scheduled_works (worker=<optimized out>)
    at kernel/workqueue.c:3312
#4 worker_thread (__worker=0xffff888101680300) at kernel/workqueue.c:3393

>
> We sometimes draw into the buffer object from the CPU. For accessing the
> buffer object's pages from the CPU, only a vmap operation should be
> necessary. It appears as if qxl also requires a pin. My guess is that the pin
> inserts the buffer-object's host-side pages and the code around
> qxl_release_from_id_locked() appears to be garbage-collecting them.
> Hence without the pin, the GC complains about inconsistent state.
> >>>
> >>> I bisected the issue down to "drm/qxl: Do not pin buffer objects for
> >>> vmap" (b33651a5c98dbd5a919219d8c129d0674ef74299).
>
> Thanks for bisecting. Does it work if you revert that commit?

Yes

Thanks --David Kaplan