2021-07-30 01:13:06

by Dave Airlie

Subject: [git pull] drm fixes for 5.14-rc4

Hi Linus,

Regular drm fixes pull, seems about the right size, lots of small
fixes across the board, mostly amdgpu, but msm and i915 are in there
along with panel and ttm. There is an rc3 backmerge due to some
patches ending up in the gap between last and this week.

Dave.

drm-fixes-2021-07-30:
drm fixes for 5.14-rc4

amdgpu:
- Fix resource leak in an error path
- Avoid stack contents exposure in error path
- pmops check fix for S0ix vs S3
- DCN 2.1 display fixes
- DCN 2.0 display fix
- Backlight control fix for laptops with HDR panels
- Maintainers updates

i915:
- Fix vbt port mask
- Fix around reading the right DSC disable fuse in display_ver 10
- Split display version 9 and 10 in intel_setup_outputs

msm:
- iommu fault display fix
- misc dp compliance fixes
- dpu reg sizing fix

panel:
- Fix bpc for ytc700tlag_05_201c

ttm:
- debugfs init fixes

The following changes since commit ff1176468d368232b684f75e82563369208bc371:

Linux 5.14-rc3 (2021-07-25 15:35:14 -0700)

are available in the Git repository at:

git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2021-07-30

for you to fetch changes up to d28e2568ac26fff351c846bf74ba6ca5dded733e:

Merge tag 'amd-drm-fixes-5.14-2021-07-28' of
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes (2021-07-29
17:20:29 +1000)

----------------------------------------------------------------
drm fixes for 5.14-rc4

amdgpu:
- Fix resource leak in an error path
- Avoid stack contents exposure in error path
- pmops check fix for S0ix vs S3
- DCN 2.1 display fixes
- DCN 2.0 display fix
- Backlight control fix for laptops with HDR panels
- Maintainers updates

i915:
- Fix vbt port mask
- Fix around reading the right DSC disable fuse in display_ver 10
- Split display version 9 and 10 in intel_setup_outputs

msm:
- iommu fault display fix
- misc dp compliance fixes
- dpu reg sizing fix

panel:
- Fix bpc for ytc700tlag_05_201c

ttm:
- debugfs init fixes

----------------------------------------------------------------
Alex Deucher (1):
drm/amdgpu/display: only enable aux backlight control for OLED panels

Bjorn Andersson (1):
drm/msm/dp: Initialize the INTF_CONFIG register

Dale Zhao (1):
drm/amd/display: ensure dentist display clock update finished in DCN20

Dave Airlie (4):
Merge tag 'drm-msm-fixes-2021-07-27' of
https://gitlab.freedesktop.org/drm/msm into drm-fixes
Merge tag 'drm-misc-fixes-2021-07-28' of
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Merge tag 'drm-intel-fixes-2021-07-28' of
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
Merge tag 'amd-drm-fixes-5.14-2021-07-28' of
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

Jagan Teki (1):
drm/panel: panel-simple: Fix proper bpc for ytc700tlag_05_201c

Jason Ekstrand (1):
drm/ttm: Initialize debugfs from ttm_global_init()

Jiri Kosina (2):
drm/amdgpu: Fix resource leak on probe error path
drm/amdgpu: Avoid printing of stack contents on firmware load error

Kuogee Hsieh (2):
drm/msm/dp: use dp_ctrl_off_link_stream during PHY compliance test run
drm/msm/dp: signal audio plugged change at dp_pm_resume

Lucas De Marchi (2):
drm/i915: fix not reading DSC disable fuse in GLK
drm/i915/display: split DISPLAY_VER 9 and 10 in intel_setup_outputs()

Pratik Vishwakarma (1):
drm/amdgpu: Check pmops for desired suspend state

Rob Clark (1):
drm/msm: Fix display fault handling

Robert Foss (1):
drm/msm/dpu: Fix sm8250_mdp register length

Rodrigo Vivi (1):
drm/i915/bios: Fix ports mask

Sean Paul (1):
drm/msm/dp: Initialize dp->aux->drm_dev before registration

Simon Ser (1):
maintainers: add bugs and chat URLs for amdgpu

Thomas Zimmermann (1):
Merge drm/drm-fixes into drm-misc-fixes

Victor Lu (2):
drm/amd/display: Guard DST_Y_PREFETCH register overflow in DCN21
drm/amd/display: Add missing DCN21 IP parameter

MAINTAINERS | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++------
drivers/gpu/drm/amd/amdgpu/psp_v12_0.c | 7 +++----
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 ++--
.../gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c | 2 +-
drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 1 +
.../drm/amd/display/dc/dml/dcn21/display_mode_vba_21.c | 3 +++
drivers/gpu/drm/i915/display/intel_bios.c | 3 ++-
drivers/gpu/drm/i915/display/intel_display.c | 8 +++++++-
drivers/gpu/drm/i915/intel_device_info.c | 9 +++++----
drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 2 +-
drivers/gpu/drm/msm/dp/dp_catalog.c | 1 +
drivers/gpu/drm/msm/dp/dp_ctrl.c | 2 +-
drivers/gpu/drm/msm/dp/dp_display.c | 5 +++++
drivers/gpu/drm/msm/msm_iommu.c | 11 ++++++++++-
drivers/gpu/drm/panel/panel-simple.c | 2 +-
drivers/gpu/drm/ttm/ttm_device.c | 12 ++++++++++++
drivers/gpu/drm/ttm/ttm_module.c | 16 ----------------
19 files changed, 61 insertions(+), 40 deletions(-)


2021-07-30 05:14:11

by pr-tracker-bot

Subject: Re: [git pull] drm fixes for 5.14-rc4

The pull request you sent on Fri, 30 Jul 2021 11:11:27 +1000:

> git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2021-07-30

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/764a5bc89b12b82c18ce7ca5d7c1b10dd748a440

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

2021-08-05 20:17:50

by Linus Torvalds

Subject: Re: [git pull] drm fixes for 5.14-rc4

This might possibly have been fixed already by the previous drm pull,
but I wanted to report it anyway, just in case.

It happened after an uptime of over a week, so it might not be trivial
to reproduce.

It's a NULL pointer dereference in dc_stream_retain() with the code being

lock xadd %eax,0x390(%rdi) <-- trapping instruction

and that's just the

kref_get(&stream->refcount);

with a NULL 'stream' argument.
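
To illustrate why a NULL 'stream' faults at exactly that displacement, here is a minimal userspace sketch. Everything in it is hypothetical: struct fake_stream is padding plus a counter, not the real struct dc_stream_state, and the NULL check only shows one possible defensive pattern, not necessarily what the driver does or should do. The real kref_get() also uses an atomic increment, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the stream object. The padding exists only
 * to place the counter at offset 0x390, matching the displacement in
 * the trapping instruction; the real structure layout is not shown. */
struct fake_stream {
    unsigned char other_fields[0x390];
    int refcount;
};

/* The counter sits at the same offset the instruction encodes. */
static_assert(offsetof(struct fake_stream, refcount) == 0x390,
              "refcount must sit at offset 0x390 in this sketch");

/* Address that &stream->refcount refers to for a given base address.
 * Uses integer arithmetic only, so nothing is dereferenced. */
static uintptr_t refcount_address(uintptr_t stream_base)
{
    return stream_base + offsetof(struct fake_stream, refcount);
}

/* Assumed defensive variant of a "retain" helper: reject NULL instead
 * of faulting. Plain (non-atomic) increment, for illustration only. */
static int fake_stream_retain(struct fake_stream *stream)
{
    if (stream == NULL)
        return -1;
    stream->refcount++;
    return 0;
}
```

With a base of 0, refcount_address() yields 0x390, an unmapped low address, which is consistent with the access trapping when %rdi holds NULL.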

Call Trace:
dc_resource_state_copy_construct+0x13f/0x190 [amdgpu]
amdgpu_dm_atomic_commit_tail+0xd5/0x1540 [amdgpu]
commit_tail+0x97/0x180 [drm_kms_helper]
process_one_work+0x1df/0x3a0

the oops is followed by a stream of

[drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:55:crtc-1]
hw_done or flip_done timed out

and the machine was not usable afterwards.

lspci says this is a

49:00.0 VGA compatible controller [0300]:
Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere
[Radeon RX 470/480/570/570X/580/580X/590]
[1002:67df] (rev e7) (prog-if 00 [VGA controller])

Full oops in the attachment, but I think the above is all the really
salient details.

Linus


Attachments:
amd-gpu-ooops (3.13 kB)

2021-08-05 20:21:12

by Alex Deucher

Subject: Re: [git pull] drm fixes for 5.14-rc4

On Thu, Aug 5, 2021 at 2:14 PM Linus Torvalds wrote:
>
> This might possibly have been fixed already by the previous drm pull,
> but I wanted to report it anyway, just in case.
>
> It happened after an uptime of over a week, so it might not be trivial
> to reproduce.
>
> It's a NULL pointer dereference in dc_stream_retain() with the code being
>
> lock xadd %eax,0x390(%rdi) <-- trapping instruction
>
> and that's just the
>
> kref_get(&stream->refcount);
>
> with a NULL 'stream' argument.
>
> Call Trace:
> dc_resource_state_copy_construct+0x13f/0x190 [amdgpu]
> amdgpu_dm_atomic_commit_tail+0xd5/0x1540 [amdgpu]
> commit_tail+0x97/0x180 [drm_kms_helper]
> process_one_work+0x1df/0x3a0
>
> the oops is followed by a stream of
>
> [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:55:crtc-1]
> hw_done or flip_done timed out
>
> and the machine was not usable afterwards.
>
> lspci says this is a
>
> 49:00.0 VGA compatible controller [0300]:
> Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere
> [Radeon RX 470/480/570/570X/580/580X/590]
> [1002:67df] (rev e7) (prog-if 00 [VGA controller])
>
> Full oops in the attachment, but I think the above is all the really
> salient details.

Thanks for the report. Adding some display folks to take a look.

Alex

2021-08-07 00:06:41

by Daniel Vetter

Subject: Re: [git pull] drm fixes for 5.14-rc4

On Thu, Aug 5, 2021 at 8:14 PM Linus Torvalds wrote:
>
> This might possibly have been fixed already by the previous drm pull,
> but I wanted to report it anyway, just in case.
>
> It happened after an uptime of over a week, so it might not be trivial
> to reproduce.
>
> It's a NULL pointer dereference in dc_stream_retain() with the code being
>
> lock xadd %eax,0x390(%rdi) <-- trapping instruction
>
> and that's just the
>
> kref_get(&stream->refcount);
>
> with a NULL 'stream' argument.
>
> Call Trace:
> dc_resource_state_copy_construct+0x13f/0x190 [amdgpu]
> amdgpu_dm_atomic_commit_tail+0xd5/0x1540 [amdgpu]
> commit_tail+0x97/0x180 [drm_kms_helper]
> process_one_work+0x1df/0x3a0
>
> the oops is followed by a stream of
>
> [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:55:crtc-1]
> hw_done or flip_done timed out
>
> and the machine was not usable afterwards.

Hm, that part is a bit disappointing, because the atomic modeset commit
helpers are designed to recover from this (assuming we didn't fry the
hw). But amdgpu does these waits in amdgpu_dm_atomic_check(), which is
decidedly not great (you're not supposed to block on hw or a previous
commit in atomic_check, ever, because it can be called by userspace in
TEST_ONLY mode to figure out whether a desired config would work), and
then returns that error to userspace, which is worse.
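
As a rough illustration of the check/commit split described above, here is a small userspace model. All names (fake_hw, fake_state, fake_atomic_check, fake_atomic_commit) are hypothetical and are not DRM APIs. The sketch assumes only the property stated above: the check step validates the proposed state without waiting on or touching hardware, so a TEST_ONLY request can return straight after it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names are DRM APIs. */
struct fake_hw {
    int pending_flips;     /* work still in flight from earlier commits */
    int programmed_width;  /* what the "hardware" currently scans out */
};

struct fake_state {
    int width;             /* requested configuration */
    int max_width;         /* limit the configuration is checked against */
};

/* Check phase: pure validation. It reads only the proposed state and
 * never waits on or modifies the hardware model. */
static int fake_atomic_check(const struct fake_state *state)
{
    if (state->width <= 0 || state->width > state->max_width)
        return -EINVAL;
    return 0;
}

/* Commit phase: the only place that waits on earlier work and touches
 * the hardware model. A test-only request stops after validation. */
static int fake_atomic_commit(struct fake_hw *hw,
                              const struct fake_state *state,
                              bool test_only)
{
    int ret = fake_atomic_check(state);

    if (ret || test_only)
        return ret;

    /* Stand-in for waiting on a previous commit to finish. */
    while (hw->pending_flips > 0)
        hw->pending_flips--;

    hw->programmed_width = state->width;
    return 0;
}
```

In this model a test-only call leaves pending_flips and programmed_width untouched, whether the requested state is valid or not, so userspace can probe configurations without side effects or stalls.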

I guess that's another area where the integration between what atomic
modeset expects and what the DC backend provides is suboptimal. I think
we managed to fuse the data structures together fairly OK, but the
check/commit flow and semantics are a bit of a struggle.

Anyway, this was just an aside; I guess given the bug the driver
wouldn't have recovered anyway.
-Daniel

> lspci says this is a
>
> 49:00.0 VGA compatible controller [0300]:
> Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere
> [Radeon RX 470/480/570/570X/580/580X/590]
> [1002:67df] (rev e7) (prog-if 00 [VGA controller])
>
> Full oops in the attachment, but I think the above is all the really
> salient details.
>
> Linus



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch