2018-11-02 01:52:58

by Lyude Paul

[permalink] [raw]
Subject: [PATCH] drm/amd/amdgpu/dm: Fix dm_dp_create_fake_mst_encoder()

[why]
Removing connector reusage from DM to match the rest of the tree ended
up revealing an issue that was surprisingly subtle. The original amdgpu
code for DC that was submitted appears to have left a chunk in
dm_dp_create_fake_mst_encoder() that tries to find a "master encoder",
the likes of which isn't actually used or stored anywhere. It does so at
the wrong time as well by trying to access parts of the drm_connector
from the encoder init before it's actually been initialized. This
results in a NULL pointer deref on MST hotplugs:

[ 160.696613] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 160.697234] PGD 0 P4D 0
[ 160.697814] Oops: 0010 [#1] SMP PTI
[ 160.698430] CPU: 2 PID: 64 Comm: kworker/2:1 Kdump: loaded Tainted: G O 4.19.0Lyude-Test+ #2
[ 160.699020] Hardware name: HP HP ZBook 15 G4/8275, BIOS P70 Ver. 01.22 05/17/2018
[ 160.699672] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[ 160.700322] RIP: 0010: (null)
[ 160.700920] Code: Bad RIP value.
[ 160.701541] RSP: 0018:ffffc9000029fc78 EFLAGS: 00010206
[ 160.702183] RAX: 0000000000000000 RBX: ffff8804440ed468 RCX: ffff8804440e9158
[ 160.702778] RDX: 0000000000000000 RSI: ffff8804556c5700 RDI: ffff8804440ed000
[ 160.703408] RBP: ffff880458e21800 R08: 0000000000000002 R09: 000000005fca0a25
[ 160.704002] R10: ffff88045a077a3d R11: ffff88045a077a3c R12: ffff8804440ed000
[ 160.704614] R13: ffff880458e21800 R14: ffff8804440e9000 R15: ffff8804440e9000
[ 160.705260] FS: 0000000000000000(0000) GS:ffff88045f280000(0000) knlGS:0000000000000000
[ 160.705854] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 160.706478] CR2: ffffffffffffffd6 CR3: 000000000200a001 CR4: 00000000003606e0
[ 160.707124] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 160.707724] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 160.708372] Call Trace:
[ 160.708998] ? dm_dp_add_mst_connector+0xed/0x1d0 [amdgpu]
[ 160.709625] ? drm_dp_add_port+0x2fa/0x470 [drm_kms_helper]
[ 160.710284] ? wake_up_q+0x54/0x70
[ 160.710877] ? __mutex_unlock_slowpath.isra.18+0xb3/0x110
[ 160.711512] ? drm_dp_dpcd_access+0xe7/0x110 [drm_kms_helper]
[ 160.712161] ? drm_dp_send_link_address+0x155/0x1e0 [drm_kms_helper]
[ 160.712762] ? drm_dp_check_and_send_link_address+0xa3/0xd0 [drm_kms_helper]
[ 160.713408] ? drm_dp_mst_link_probe_work+0x4b/0x80 [drm_kms_helper]
[ 160.714013] ? process_one_work+0x1a1/0x3a0
[ 160.714667] ? worker_thread+0x30/0x380
[ 160.715326] ? wq_update_unbound_numa+0x10/0x10
[ 160.715939] ? kthread+0x112/0x130
[ 160.716591] ? kthread_create_worker_on_cpu+0x70/0x70
[ 160.717262] ? ret_from_fork+0x35/0x40
[ 160.717886] Modules linked in: amdgpu(O) vfat fat snd_hda_codec_generic joydev i915 chash gpu_sched ttm i2c_algo_bit drm_kms_helper snd_hda_codec_hdmi hp_wmi syscopyarea iTCO_wdt sysfillrect sparse_keymap sysimgblt fb_sys_fops snd_hda_intel usbhid wmi_bmof drm snd_hda_codec btusb snd_hda_core intel_rapl btrtl x86_pkg_temp_thermal btbcm btintel coretemp snd_pcm crc32_pclmul bluetooth psmouse snd_timer snd pcspkr i2c_i801 mei_me i2c_core soundcore mei tpm_tis wmi tpm_tis_core hp_accel ecdh_generic lis3lv02d tpm video rfkill acpi_pad input_polldev hp_wireless pcc_cpufreq crc32c_intel serio_raw tg3 xhci_pci xhci_hcd [last unloaded: amdgpu]
[ 160.720141] CR2: 0000000000000000

Somehow the connector reusage DM was using for MST connectors managed to
paper over this issue entirely; hence why this was never caught until
now.

[how]
Since this code isn't used anywhere and seems useless anyway, we can
just drop it entirely. This appears to fix the issue on my HP ZBook with
an AMD WX4150.

Signed-off-by: Lyude Paul <[email protected]>
---
Hey! This is the patch that I was talking about, feel free to review
it-we should make sure this goes in with the rest of the patches you've
got so far.

drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 6aa7359d61a3..5432a1862b41 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -285,12 +285,7 @@ dm_dp_create_fake_mst_encoder(struct amdgpu_dm_connector *connector)
struct amdgpu_device *adev = dev->dev_private;
struct amdgpu_encoder *amdgpu_encoder;
struct drm_encoder *encoder;
- const struct drm_connector_helper_funcs *connector_funcs =
- connector->base.helper_private;
- struct drm_encoder *enc_master =
- connector_funcs->best_encoder(&connector->base);

- DRM_DEBUG_KMS("enc master is %p\n", enc_master);
amdgpu_encoder = kzalloc(sizeof(*amdgpu_encoder), GFP_KERNEL);
if (!amdgpu_encoder)
return NULL;
--
2.19.1



2018-11-05 15:32:59

by Harry Wentland

[permalink] [raw]
Subject: Re: [PATCH] drm/amd/amdgpu/dm: Fix dm_dp_create_fake_mst_encoder()

On 2018-11-01 9:51 p.m., Lyude Paul wrote:
> [why]
> Removing connector reusage from DM to match the rest of the tree ended
> up revealing an issue that was surprisingly subtle. The original amdgpu
> code for DC that was submitted appears to have left a chunk in
> dm_dp_create_fake_mst_encoder() that tries to find a "master encoder",
> the likes of which isn't actually used or stored anywhere. It does so at
> the wrong time as well by trying to access parts of the drm_connector
> from the encoder init before it's actually been initialized. This
> results in a NULL pointer deref on MST hotplugs:
>
> [ 160.696613] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> [ 160.697234] PGD 0 P4D 0
> [ 160.697814] Oops: 0010 [#1] SMP PTI
> [ 160.698430] CPU: 2 PID: 64 Comm: kworker/2:1 Kdump: loaded Tainted: G O 4.19.0Lyude-Test+ #2
> [ 160.699020] Hardware name: HP HP ZBook 15 G4/8275, BIOS P70 Ver. 01.22 05/17/2018
> [ 160.699672] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
> [ 160.700322] RIP: 0010: (null)
> [ 160.700920] Code: Bad RIP value.
> [ 160.701541] RSP: 0018:ffffc9000029fc78 EFLAGS: 00010206
> [ 160.702183] RAX: 0000000000000000 RBX: ffff8804440ed468 RCX: ffff8804440e9158
> [ 160.702778] RDX: 0000000000000000 RSI: ffff8804556c5700 RDI: ffff8804440ed000
> [ 160.703408] RBP: ffff880458e21800 R08: 0000000000000002 R09: 000000005fca0a25
> [ 160.704002] R10: ffff88045a077a3d R11: ffff88045a077a3c R12: ffff8804440ed000
> [ 160.704614] R13: ffff880458e21800 R14: ffff8804440e9000 R15: ffff8804440e9000
> [ 160.705260] FS: 0000000000000000(0000) GS:ffff88045f280000(0000) knlGS:0000000000000000
> [ 160.705854] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 160.706478] CR2: ffffffffffffffd6 CR3: 000000000200a001 CR4: 00000000003606e0
> [ 160.707124] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 160.707724] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 160.708372] Call Trace:
> [ 160.708998] ? dm_dp_add_mst_connector+0xed/0x1d0 [amdgpu]
> [ 160.709625] ? drm_dp_add_port+0x2fa/0x470 [drm_kms_helper]
> [ 160.710284] ? wake_up_q+0x54/0x70
> [ 160.710877] ? __mutex_unlock_slowpath.isra.18+0xb3/0x110
> [ 160.711512] ? drm_dp_dpcd_access+0xe7/0x110 [drm_kms_helper]
> [ 160.712161] ? drm_dp_send_link_address+0x155/0x1e0 [drm_kms_helper]
> [ 160.712762] ? drm_dp_check_and_send_link_address+0xa3/0xd0 [drm_kms_helper]
> [ 160.713408] ? drm_dp_mst_link_probe_work+0x4b/0x80 [drm_kms_helper]
> [ 160.714013] ? process_one_work+0x1a1/0x3a0
> [ 160.714667] ? worker_thread+0x30/0x380
> [ 160.715326] ? wq_update_unbound_numa+0x10/0x10
> [ 160.715939] ? kthread+0x112/0x130
> [ 160.716591] ? kthread_create_worker_on_cpu+0x70/0x70
> [ 160.717262] ? ret_from_fork+0x35/0x40
> [ 160.717886] Modules linked in: amdgpu(O) vfat fat snd_hda_codec_generic joydev i915 chash gpu_sched ttm i2c_algo_bit drm_kms_helper snd_hda_codec_hdmi hp_wmi syscopyarea iTCO_wdt sysfillrect sparse_keymap sysimgblt fb_sys_fops snd_hda_intel usbhid wmi_bmof drm snd_hda_codec btusb snd_hda_core intel_rapl btrtl x86_pkg_temp_thermal btbcm btintel coretemp snd_pcm crc32_pclmul bluetooth psmouse snd_timer snd pcspkr i2c_i801 mei_me i2c_core soundcore mei tpm_tis wmi tpm_tis_core hp_accel ecdh_generic lis3lv02d tpm video rfkill acpi_pad input_polldev hp_wireless pcc_cpufreq crc32c_intel serio_raw tg3 xhci_pci xhci_hcd [last unloaded: amdgpu]
> [ 160.720141] CR2: 0000000000000000
>
> Somehow the connector reusage DM was using for MST connectors managed to
> paper over this issue entirely; hence why this was never caught until
> now.
>
> [how]
> Since this code isn't used anywhere and seems useless anyway, we can
> just drop it entirely. This appears to fix the issue on my HP ZBook with
> an AMD WX4150.
>
> Signed-off-by: Lyude Paul <[email protected]>

Good catch. Probably got broken with some of the best_encoder cleanup that happened in the last few months. I really should've caught it then.

Reviewed-by: Harry Wentland <[email protected]>

Harry

> ---
> Hey! This is the patch that I was talking about, feel free to review
> it-we should make sure this goes in with the rest of the patches you've
> got so far.
>
> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 5 -----
> 1 file changed, 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> index 6aa7359d61a3..5432a1862b41 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> @@ -285,12 +285,7 @@ dm_dp_create_fake_mst_encoder(struct amdgpu_dm_connector *connector)
> struct amdgpu_device *adev = dev->dev_private;
> struct amdgpu_encoder *amdgpu_encoder;
> struct drm_encoder *encoder;
> - const struct drm_connector_helper_funcs *connector_funcs =
> - connector->base.helper_private;
> - struct drm_encoder *enc_master =
> - connector_funcs->best_encoder(&connector->base);
>
> - DRM_DEBUG_KMS("enc master is %p\n", enc_master);
> amdgpu_encoder = kzalloc(sizeof(*amdgpu_encoder), GFP_KERNEL);
> if (!amdgpu_encoder)
> return NULL;
>