Following kernel crash noticed on linux next 20211020 tag.
while booting on arm64 architecture dragonboard 410c device.
I see the following config is enabled in 20211020 tag builds.
CONFIG_STACKDEPOT=y
Crash log,
[ 18.583097] Unable to handle kernel paging request at virtual
address 00000000007c4240
[ 18.583521] Mem abort info:
[ 18.590286] ESR = 0x96000004
[ 18.592920] EC = 0x25: DABT (current EL), IL = 32 bits
[ 18.596103] SET = 0, FnV = 0
[ 18.601512] EA = 0, S1PTW = 0
[ 18.604384] FSC = 0x04: level 0 translation fault
[ 18.607447] Data abort info:
[ 18.612296] ISV = 0, ISS = 0x00000004
[ 18.615451] CM = 0, WnR = 0
[ 18.618990] user pgtable: 4k pages, 48-bit VAs, pgdp=000000008b4c7000
[ 18.622054] [00000000007c4240] pgd=0000000000000000, p4d=0000000000000000
[ 18.628974] Internal error: Oops: 96000004 [#1] SMP
[ 18.635073] Modules linked in: adv7511 cec snd_soc_lpass_apq8016
snd_soc_lpass_cpu snd_soc_lpass_platform snd_soc_msm8916_digital
qcom_camss qrtr snd_soc_apq8016_sbc videobuf2_dma_sg qcom_pon
qcom_spmi_vadc snd_soc_qcom_common qcom_q6v5_mss qcom_vadc_common
rtc_pm8xxx qcom_spmi_temp_alarm msm qcom_pil_info v4l2_fwnode
qcom_q6v5 snd_soc_msm8916_analog qcom_sysmon qcom_common v4l2_async
qnoc_msm8916 qcom_rng gpu_sched qcom_glink_smem venus_core
videobuf2_memops icc_smd_rpm qmi_helpers drm_kms_helper v4l2_mem2mem
mdt_loader display_connector i2c_qcom_cci videobuf2_v4l2 crct10dif_ce
videobuf2_common socinfo drm rmtfs_mem fuse
[ 18.672948] CPU: 0 PID: 178 Comm: kworker/u8:3 Not tainted
5.15.0-rc6-next-20211020 #1
[ 18.695000] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 18.695012] Workqueue: events_unbound deferred_probe_work_func
[ 18.695033] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 18.715282] pc : __stack_depot_save+0x13c/0x4e0
[ 18.722130] lr : stack_depot_save+0x14/0x20
[ 18.726641] sp : ffff800014a23500
[ 18.730801] x29: ffff800014a23500 x28: 00000000000f8848 x27: ffff800013acdf68
[ 18.734294] x26: 0000000000000000 x25: 00000000007c4240 x24: ffff800014a23780
[ 18.741413] x23: 0000000000000008 x22: ffff800014a235b8 x21: 0000000000000008
[ 18.748530] x20: 00000000c32f8848 x19: ffff00001038cc18 x18: ffffffffffffffff
[ 18.755649] x17: ffff80002d9f8000 x16: ffff800010004000 x15: 000000000000c426
[ 18.762767] x14: 0000000000000000 x13: ffff800014a23780 x12: 0000000000000000
[ 18.769885] x11: ffff00001038cc80 x10: ffff8000136e9ba0 x9 : ffff800014a235f4
[ 18.777003] x8 : 0000000000000001 x7 : 00000000b664620b x6 : 0000000011a58b4a
[ 18.784121] x5 : 000000001aa43464 x4 : 000000009e7d8b67 x3 : 0000000000000001
[ 18.791239] x2 : 0000000000002800 x1 : ffff800013acd000 x0 : 00000000f2d429d8
[ 18.798358] Call trace:
[ 18.805451] __stack_depot_save+0x13c/0x4e0
[ 18.807716] stack_depot_save+0x14/0x20
[ 18.811881] __drm_stack_depot_save+0x44/0x70 [drm]
[ 18.815710] modeset_lock.part.0+0xe0/0x1a4 [drm]
[ 18.820571] drm_modeset_lock_all_ctx+0x2d4/0x334 [drm]
[ 18.825435] drm_client_firmware_config.constprop.0.isra.0+0xc0/0x5d0 [drm]
[ 18.830478] drm_client_modeset_probe+0x328/0xbb0 [drm]
[ 18.837413] __drm_fb_helper_initial_config_and_unlock+0x54/0x5b4
[drm_kms_helper]
[ 18.842633] drm_fb_helper_initial_config+0x5c/0x70 [drm_kms_helper]
[ 18.850266] msm_fbdev_init+0x98/0x100 [msm]
[ 18.856767] msm_drm_bind+0x650/0x720 [msm]
[ 18.861021] try_to_bring_up_master+0x230/0x320
[ 18.864926] __component_add+0xc8/0x1c4
[ 18.869435] component_add+0x20/0x30
[ 18.873253] mdp5_dev_probe+0xe0/0x11c [msm]
[ 18.877077] platform_probe+0x74/0xf0
[ 18.881328] really_probe+0xc4/0x470
[ 18.884883] __driver_probe_device+0x11c/0x190
[ 18.888534] driver_probe_device+0x48/0x110
[ 18.892786] __device_attach_driver+0xa4/0x140
[ 18.896869] bus_for_each_drv+0x84/0xe0
[ 18.901380] __device_attach+0xe4/0x1c0
[ 18.905112] device_initial_probe+0x20/0x30
[ 18.908932] bus_probe_device+0xac/0xb4
[ 18.913098] deferred_probe_work_func+0xc8/0x120
[ 18.916920] process_one_work+0x280/0x6a0
[ 18.921780] worker_thread+0x80/0x454
[ 18.925683] kthread+0x178/0x184
[ 18.929326] ret_from_fork+0x10/0x20
[ 18.932634] Code: d37d4e99 92404e9c f940077a 8b190359 (c8dfff33)
[ 18.936203] ---[ end trace 3e289b724840642d ]---
Full log,
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20211020/testrun/6177937/suite/linux-log-parser/test/check-kernel-oops-3786583/log
https://lkft.validation.linaro.org/scheduler/job/3786583#L2549
Build config:
https://builds.tuxbuild.com/1zlLlQrUyHVr1MQ1gcler3dKaE6/config
Reported-by: Linux Kernel Functional Testing <[email protected]>
steps to reproduce:
1) https://builds.tuxbuild.com/1zlLlQrUyHVr1MQ1gcler3dKaE6/tuxmake_reproducer.sh
2) Boot db410c device
--
Linaro LKFT
https://lkft.linaro.org
On 10/20/21 20:24, Naresh Kamboju wrote:
> Following kernel crash noticed on linux next 20211020 tag.
> while booting on arm64 architecture dragonboard 410c device.
>
> I see the following config is enabled in 20211020 tag builds.
> CONFIG_STACKDEPOT=y
>
> Crash log,
> [ 18.583097] Unable to handle kernel paging request at virtual
> address 00000000007c4240
> [ 18.583521] Mem abort info:
> [ 18.590286] ESR = 0x96000004
> [ 18.592920] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 18.596103] SET = 0, FnV = 0
> [ 18.601512] EA = 0, S1PTW = 0
> [ 18.604384] FSC = 0x04: level 0 translation fault
> [ 18.607447] Data abort info:
> [ 18.612296] ISV = 0, ISS = 0x00000004
> [ 18.615451] CM = 0, WnR = 0
> [ 18.618990] user pgtable: 4k pages, 48-bit VAs, pgdp=000000008b4c7000
> [ 18.622054] [00000000007c4240] pgd=0000000000000000, p4d=0000000000000000
> [ 18.628974] Internal error: Oops: 96000004 [#1] SMP
> [ 18.635073] Modules linked in: adv7511 cec snd_soc_lpass_apq8016
> snd_soc_lpass_cpu snd_soc_lpass_platform snd_soc_msm8916_digital
> qcom_camss qrtr snd_soc_apq8016_sbc videobuf2_dma_sg qcom_pon
> qcom_spmi_vadc snd_soc_qcom_common qcom_q6v5_mss qcom_vadc_common
> rtc_pm8xxx qcom_spmi_temp_alarm msm qcom_pil_info v4l2_fwnode
> qcom_q6v5 snd_soc_msm8916_analog qcom_sysmon qcom_common v4l2_async
> qnoc_msm8916 qcom_rng gpu_sched qcom_glink_smem venus_core
> videobuf2_memops icc_smd_rpm qmi_helpers drm_kms_helper v4l2_mem2mem
> mdt_loader display_connector i2c_qcom_cci videobuf2_v4l2 crct10dif_ce
> videobuf2_common socinfo drm rmtfs_mem fuse
> [ 18.672948] CPU: 0 PID: 178 Comm: kworker/u8:3 Not tainted
> 5.15.0-rc6-next-20211020 #1
> [ 18.695000] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> [ 18.695012] Workqueue: events_unbound deferred_probe_work_func
> [ 18.695033] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 18.715282] pc : __stack_depot_save+0x13c/0x4e0
> [ 18.722130] lr : stack_depot_save+0x14/0x20
> [ 18.726641] sp : ffff800014a23500
> [ 18.730801] x29: ffff800014a23500 x28: 00000000000f8848 x27: ffff800013acdf68
> [ 18.734294] x26: 0000000000000000 x25: 00000000007c4240 x24: ffff800014a23780
> [ 18.741413] x23: 0000000000000008 x22: ffff800014a235b8 x21: 0000000000000008
> [ 18.748530] x20: 00000000c32f8848 x19: ffff00001038cc18 x18: ffffffffffffffff
> [ 18.755649] x17: ffff80002d9f8000 x16: ffff800010004000 x15: 000000000000c426
> [ 18.762767] x14: 0000000000000000 x13: ffff800014a23780 x12: 0000000000000000
> [ 18.769885] x11: ffff00001038cc80 x10: ffff8000136e9ba0 x9 : ffff800014a235f4
> [ 18.777003] x8 : 0000000000000001 x7 : 00000000b664620b x6 : 0000000011a58b4a
> [ 18.784121] x5 : 000000001aa43464 x4 : 000000009e7d8b67 x3 : 0000000000000001
> [ 18.791239] x2 : 0000000000002800 x1 : ffff800013acd000 x0 : 00000000f2d429d8
> [ 18.798358] Call trace:
> [ 18.805451] __stack_depot_save+0x13c/0x4e0
> [ 18.807716] stack_depot_save+0x14/0x20
> [ 18.811881] __drm_stack_depot_save+0x44/0x70 [drm]
> [ 18.815710] modeset_lock.part.0+0xe0/0x1a4 [drm]
> [ 18.820571] drm_modeset_lock_all_ctx+0x2d4/0x334 [drm]
This stack_depot_save path appears to be new from Jani's commit
cd06ab2fd48f ("drm/locking: add backtrace for locking contended locks
without backoff")
And there's a semantic conflict with my patch in mmotm:
- sha1 (valid only in next-20211020) 5e6d063de5cd ("lib/stackdepot: allow
optional init and stack_table allocation by kvmalloc()")
- lore: https://lore.kernel.org/all/[email protected]/
- patchwork: https://patchwork.freedesktop.org/series/95549/#rev3
With my patch, to-be callers of stack_depot_save() need to call
stack_depot_init() at least once, to avoid unnecessary runtime overhead
otherwise I have added that calls into three DRM contexts in my patch, but
didn't see cd06ab2fd48f yet at the time.
This one seems a bit more tricky and I could really use some advice.
cd06ab2fd48f adds stackdepot usage to drm_modeset_lock which itself has a
number of different users and requiring those to call stack_depot_init()
would be likely error prone. Would it be ok to add the call of
stack_depot_init() (guarded by #ifdef CONFIG_DRM_DEBUG_MODESET_LOCK) to
drm_modeset_lock_init()? It will do a mutex_lock()/unlock(), and kvmalloc()
on first call.
I don't know how much of hotpath this is, but hopefully should be acceptable
in debug config. Or do you have better suggestion? Thanks.
Then we have to figure out how to order a fix between DRM and mmotm...
> [ 18.825435] drm_client_firmware_config.constprop.0.isra.0+0xc0/0x5d0 [drm]
> [ 18.830478] drm_client_modeset_probe+0x328/0xbb0 [drm]
> [ 18.837413] __drm_fb_helper_initial_config_and_unlock+0x54/0x5b4
> [drm_kms_helper]
> [ 18.842633] drm_fb_helper_initial_config+0x5c/0x70 [drm_kms_helper]
> [ 18.850266] msm_fbdev_init+0x98/0x100 [msm]
> [ 18.856767] msm_drm_bind+0x650/0x720 [msm]
> [ 18.861021] try_to_bring_up_master+0x230/0x320
> [ 18.864926] __component_add+0xc8/0x1c4
> [ 18.869435] component_add+0x20/0x30
> [ 18.873253] mdp5_dev_probe+0xe0/0x11c [msm]
> [ 18.877077] platform_probe+0x74/0xf0
> [ 18.881328] really_probe+0xc4/0x470
> [ 18.884883] __driver_probe_device+0x11c/0x190
> [ 18.888534] driver_probe_device+0x48/0x110
> [ 18.892786] __device_attach_driver+0xa4/0x140
> [ 18.896869] bus_for_each_drv+0x84/0xe0
> [ 18.901380] __device_attach+0xe4/0x1c0
> [ 18.905112] device_initial_probe+0x20/0x30
> [ 18.908932] bus_probe_device+0xac/0xb4
> [ 18.913098] deferred_probe_work_func+0xc8/0x120
> [ 18.916920] process_one_work+0x280/0x6a0
> [ 18.921780] worker_thread+0x80/0x454
> [ 18.925683] kthread+0x178/0x184
> [ 18.929326] ret_from_fork+0x10/0x20
> [ 18.932634] Code: d37d4e99 92404e9c f940077a 8b190359 (c8dfff33)
> [ 18.936203] ---[ end trace 3e289b724840642d ]---
>
> Full log,
> https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20211020/testrun/6177937/suite/linux-log-parser/test/check-kernel-oops-3786583/log
> https://lkft.validation.linaro.org/scheduler/job/3786583#L2549
>
> Build config:
> https://builds.tuxbuild.com/1zlLlQrUyHVr1MQ1gcler3dKaE6/config
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
>
> steps to reproduce:
> 1) https://builds.tuxbuild.com/1zlLlQrUyHVr1MQ1gcler3dKaE6/tuxmake_reproducer.sh
> 2) Boot db410c device
>
> --
> Linaro LKFT
> https://lkft.linaro.org
>
On Thu, 21 Oct 2021, Vlastimil Babka <[email protected]> wrote:
> On 10/20/21 20:24, Naresh Kamboju wrote:
>> Following kernel crash noticed on linux next 20211020 tag.
>> while booting on arm64 architecture dragonboard 410c device.
>>
>> I see the following config is enabled in 20211020 tag builds.
>> CONFIG_STACKDEPOT=y
>>
>> Crash log,
>> [ 18.583097] Unable to handle kernel paging request at virtual
>> address 00000000007c4240
>> [ 18.583521] Mem abort info:
>> [ 18.590286] ESR = 0x96000004
>> [ 18.592920] EC = 0x25: DABT (current EL), IL = 32 bits
>> [ 18.596103] SET = 0, FnV = 0
>> [ 18.601512] EA = 0, S1PTW = 0
>> [ 18.604384] FSC = 0x04: level 0 translation fault
>> [ 18.607447] Data abort info:
>> [ 18.612296] ISV = 0, ISS = 0x00000004
>> [ 18.615451] CM = 0, WnR = 0
>> [ 18.618990] user pgtable: 4k pages, 48-bit VAs, pgdp=000000008b4c7000
>> [ 18.622054] [00000000007c4240] pgd=0000000000000000, p4d=0000000000000000
>> [ 18.628974] Internal error: Oops: 96000004 [#1] SMP
>> [ 18.635073] Modules linked in: adv7511 cec snd_soc_lpass_apq8016
>> snd_soc_lpass_cpu snd_soc_lpass_platform snd_soc_msm8916_digital
>> qcom_camss qrtr snd_soc_apq8016_sbc videobuf2_dma_sg qcom_pon
>> qcom_spmi_vadc snd_soc_qcom_common qcom_q6v5_mss qcom_vadc_common
>> rtc_pm8xxx qcom_spmi_temp_alarm msm qcom_pil_info v4l2_fwnode
>> qcom_q6v5 snd_soc_msm8916_analog qcom_sysmon qcom_common v4l2_async
>> qnoc_msm8916 qcom_rng gpu_sched qcom_glink_smem venus_core
>> videobuf2_memops icc_smd_rpm qmi_helpers drm_kms_helper v4l2_mem2mem
>> mdt_loader display_connector i2c_qcom_cci videobuf2_v4l2 crct10dif_ce
>> videobuf2_common socinfo drm rmtfs_mem fuse
>> [ 18.672948] CPU: 0 PID: 178 Comm: kworker/u8:3 Not tainted
>> 5.15.0-rc6-next-20211020 #1
>> [ 18.695000] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>> [ 18.695012] Workqueue: events_unbound deferred_probe_work_func
>> [ 18.695033] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [ 18.715282] pc : __stack_depot_save+0x13c/0x4e0
>> [ 18.722130] lr : stack_depot_save+0x14/0x20
>> [ 18.726641] sp : ffff800014a23500
>> [ 18.730801] x29: ffff800014a23500 x28: 00000000000f8848 x27: ffff800013acdf68
>> [ 18.734294] x26: 0000000000000000 x25: 00000000007c4240 x24: ffff800014a23780
>> [ 18.741413] x23: 0000000000000008 x22: ffff800014a235b8 x21: 0000000000000008
>> [ 18.748530] x20: 00000000c32f8848 x19: ffff00001038cc18 x18: ffffffffffffffff
>> [ 18.755649] x17: ffff80002d9f8000 x16: ffff800010004000 x15: 000000000000c426
>> [ 18.762767] x14: 0000000000000000 x13: ffff800014a23780 x12: 0000000000000000
>> [ 18.769885] x11: ffff00001038cc80 x10: ffff8000136e9ba0 x9 : ffff800014a235f4
>> [ 18.777003] x8 : 0000000000000001 x7 : 00000000b664620b x6 : 0000000011a58b4a
>> [ 18.784121] x5 : 000000001aa43464 x4 : 000000009e7d8b67 x3 : 0000000000000001
>> [ 18.791239] x2 : 0000000000002800 x1 : ffff800013acd000 x0 : 00000000f2d429d8
>> [ 18.798358] Call trace:
>> [ 18.805451] __stack_depot_save+0x13c/0x4e0
>> [ 18.807716] stack_depot_save+0x14/0x20
>> [ 18.811881] __drm_stack_depot_save+0x44/0x70 [drm]
>> [ 18.815710] modeset_lock.part.0+0xe0/0x1a4 [drm]
>> [ 18.820571] drm_modeset_lock_all_ctx+0x2d4/0x334 [drm]
>
> This stack_depot_save path appears to be new from Jani's commit
> cd06ab2fd48f ("drm/locking: add backtrace for locking contended locks
> without backoff")
> And there's a semantic conflict with my patch in mmotm:
> - sha1 (valid only in next-20211020) 5e6d063de5cd ("lib/stackdepot: allow
> optional init and stack_table allocation by kvmalloc()")
> - lore: https://lore.kernel.org/all/[email protected]/
> - patchwork: https://patchwork.freedesktop.org/series/95549/#rev3
>
> With my patch, to-be callers of stack_depot_save() need to call
> stack_depot_init() at least once, to avoid unnecessary runtime overhead
> otherwise I have added that calls into three DRM contexts in my patch, but
> didn't see cd06ab2fd48f yet at the time.
Auch, I did see your changes fly by, but overlooked this type of
conflict with my patch. Sorry for the trouble.
> This one seems a bit more tricky and I could really use some advice.
> cd06ab2fd48f adds stackdepot usage to drm_modeset_lock which itself has a
> number of different users and requiring those to call stack_depot_init()
> would be likely error prone. Would it be ok to add the call of
> stack_depot_init() (guarded by #ifdef CONFIG_DRM_DEBUG_MODESET_LOCK) to
> drm_modeset_lock_init()? It will do a mutex_lock()/unlock(), and kvmalloc()
> on first call.
> I don't know how much of hotpath this is, but hopefully should be acceptable
> in debug config. Or do you have better suggestion? Thanks.
I think that should be fine.
Maybe add __drm_stack_depot_init() in the existing #if
IS_ENABLED(CONFIG_DRM_DEBUG_MODESET_LOCK), similar to the other
__drm_stack_depot_*() functions, with an empty stub for
CONFIG_DRM_DEBUG_MODESET_LOCK=n, and call it unconditionally in
drm_modeset_lock_init().
> Then we have to figure out how to order a fix between DRM and mmotm...
That is the question! The problem exists only in the merge of the
two. On current DRM side stack_depot_init() exists but it's __init and
does not look safe to call multiple times. And obviously my changes
don't exist at all in mmotm.
I guess one (admittedly hackish) option is to first add a patch in
drm-next (or drm-misc-next) that makes it safe to call
stack_depot_init() multiple times in non-init context. It would be
dropped in favour of your changes once the trees get merged together.
Or is there some way for __drm_stack_depot_init() to detect whether it
should call stack_depot_init() or not, i.e. whether your changes are
there or not?
BR,
Jani.
>
>> [ 18.825435] drm_client_firmware_config.constprop.0.isra.0+0xc0/0x5d0 [drm]
>> [ 18.830478] drm_client_modeset_probe+0x328/0xbb0 [drm]
>> [ 18.837413] __drm_fb_helper_initial_config_and_unlock+0x54/0x5b4
>> [drm_kms_helper]
>> [ 18.842633] drm_fb_helper_initial_config+0x5c/0x70 [drm_kms_helper]
>> [ 18.850266] msm_fbdev_init+0x98/0x100 [msm]
>> [ 18.856767] msm_drm_bind+0x650/0x720 [msm]
>> [ 18.861021] try_to_bring_up_master+0x230/0x320
>> [ 18.864926] __component_add+0xc8/0x1c4
>> [ 18.869435] component_add+0x20/0x30
>> [ 18.873253] mdp5_dev_probe+0xe0/0x11c [msm]
>> [ 18.877077] platform_probe+0x74/0xf0
>> [ 18.881328] really_probe+0xc4/0x470
>> [ 18.884883] __driver_probe_device+0x11c/0x190
>> [ 18.888534] driver_probe_device+0x48/0x110
>> [ 18.892786] __device_attach_driver+0xa4/0x140
>> [ 18.896869] bus_for_each_drv+0x84/0xe0
>> [ 18.901380] __device_attach+0xe4/0x1c0
>> [ 18.905112] device_initial_probe+0x20/0x30
>> [ 18.908932] bus_probe_device+0xac/0xb4
>> [ 18.913098] deferred_probe_work_func+0xc8/0x120
>> [ 18.916920] process_one_work+0x280/0x6a0
>> [ 18.921780] worker_thread+0x80/0x454
>> [ 18.925683] kthread+0x178/0x184
>> [ 18.929326] ret_from_fork+0x10/0x20
>> [ 18.932634] Code: d37d4e99 92404e9c f940077a 8b190359 (c8dfff33)
>> [ 18.936203] ---[ end trace 3e289b724840642d ]---
>>
>> Full log,
>> https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20211020/testrun/6177937/suite/linux-log-parser/test/check-kernel-oops-3786583/log
>> https://lkft.validation.linaro.org/scheduler/job/3786583#L2549
>>
>> Build config:
>> https://builds.tuxbuild.com/1zlLlQrUyHVr1MQ1gcler3dKaE6/config
>>
>> Reported-by: Linux Kernel Functional Testing <[email protected]>
>>
>> steps to reproduce:
>> 1) https://builds.tuxbuild.com/1zlLlQrUyHVr1MQ1gcler3dKaE6/tuxmake_reproducer.sh
>> 2) Boot db410c device
>>
>> --
>> Linaro LKFT
>> https://lkft.linaro.org
>>
>
--
Jani Nikula, Intel Open Source Graphics Center
On 10/21/21 10:40, Jani Nikula wrote:
> On Thu, 21 Oct 2021, Vlastimil Babka <[email protected]> wrote:
>> This one seems a bit more tricky and I could really use some advice.
>> cd06ab2fd48f adds stackdepot usage to drm_modeset_lock which itself has a
>> number of different users and requiring those to call stack_depot_init()
>> would be likely error prone. Would it be ok to add the call of
>> stack_depot_init() (guarded by #ifdef CONFIG_DRM_DEBUG_MODESET_LOCK) to
>> drm_modeset_lock_init()? It will do a mutex_lock()/unlock(), and kvmalloc()
>> on first call.
>> I don't know how much of hotpath this is, but hopefully should be acceptable
>> in debug config. Or do you have better suggestion? Thanks.
>
> I think that should be fine.
>
> Maybe add __drm_stack_depot_init() in the existing #if
> IS_ENABLED(CONFIG_DRM_DEBUG_MODESET_LOCK), similar to the other
> __drm_stack_depot_*() functions, with an empty stub for
> CONFIG_DRM_DEBUG_MODESET_LOCK=n, and call it unconditionally in
> drm_modeset_lock_init().
Good idea.
>> Then we have to figure out how to order a fix between DRM and mmotm...
>
> That is the question! The problem exists only in the merge of the
> two. On current DRM side stack_depot_init() exists but it's __init and
> does not look safe to call multiple times. And obviously my changes
> don't exist at all in mmotm.
>
> I guess one (admittedly hackish) option is to first add a patch in
> drm-next (or drm-misc-next) that makes it safe to call
> stack_depot_init() multiple times in non-init context. It would be
> dropped in favour of your changes once the trees get merged together.
>
> Or is there some way for __drm_stack_depot_init() to detect whether it
> should call stack_depot_init() or not, i.e. whether your changes are
> there or not?
Let's try the easiest approach first. AFAIK mmotm series is now split to
pre-next and post-next part and moving my patch
lib-stackdepot-allow-optional-init-and-stack_table-allocation-by-kvmalloc.patch
with the following fixup to the post-next part should solve this. Would that
work, Andrew? Thanks.
----8<----
From 719e91df5571034b62fa992f6738b00f8d29020e Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <[email protected]>
Date: Thu, 21 Oct 2021 19:43:33 +0200
Subject: [PATCH] lib/stackdepot: allow optional init and stack_table
allocation by kvmalloc() - fixup3
Due to cd06ab2fd48f ("drm/locking: add backtrace for locking contended locks
without backoff") landing recently to -next adding a new stack depot user in
drivers/gpu/drm/drm_modeset_lock.c we need to add an appropriate call to
stack_depot_init() there as well.
Signed-off-by: Vlastimil Babka <[email protected]>
---
drivers/gpu/drm/drm_modeset_lock.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/drm_modeset_lock.c b/drivers/gpu/drm/drm_modeset_lock.c
index c97323365675..918065982db4 100644
--- a/drivers/gpu/drm/drm_modeset_lock.c
+++ b/drivers/gpu/drm/drm_modeset_lock.c
@@ -107,6 +107,11 @@ static void __drm_stack_depot_print(depot_stack_handle_t stack_depot)
kfree(buf);
}
+
+static void __drm_stack_depot_init(void)
+{
+ stack_depot_init();
+}
#else /* CONFIG_DRM_DEBUG_MODESET_LOCK */
static depot_stack_handle_t __drm_stack_depot_save(void)
{
@@ -115,6 +120,9 @@ static depot_stack_handle_t __drm_stack_depot_save(void)
static void __drm_stack_depot_print(depot_stack_handle_t stack_depot)
{
}
+static void __drm_stack_depot_init(void)
+{
+}
#endif /* CONFIG_DRM_DEBUG_MODESET_LOCK */
/**
@@ -359,6 +367,7 @@ void drm_modeset_lock_init(struct drm_modeset_lock *lock)
{
ww_mutex_init(&lock->mutex, &crtc_ww_class);
INIT_LIST_HEAD(&lock->head);
+ __drm_stack_depot_init();
}
EXPORT_SYMBOL(drm_modeset_lock_init);
--
2.33.0
On Thu, 21 Oct 2021 19:51:20 +0200 Vlastimil Babka <[email protected]> wrote:
> >> Then we have to figure out how to order a fix between DRM and mmotm...
> >
> > That is the question! The problem exists only in the merge of the
> > two. On current DRM side stack_depot_init() exists but it's __init and
> > does not look safe to call multiple times. And obviously my changes
> > don't exist at all in mmotm.
> >
> > I guess one (admittedly hackish) option is to first add a patch in
> > drm-next (or drm-misc-next) that makes it safe to call
> > stack_depot_init() multiple times in non-init context. It would be
> > dropped in favour of your changes once the trees get merged together.
> >
> > Or is there some way for __drm_stack_depot_init() to detect whether it
> > should call stack_depot_init() or not, i.e. whether your changes are
> > there or not?
>
> Let's try the easiest approach first. AFAIK mmotm series is now split to
> pre-next and post-next part
It has been this way for many years!
> and moving my patch
> lib-stackdepot-allow-optional-init-and-stack_table-allocation-by-kvmalloc.patch
> with the following fixup to the post-next part should solve this. Would that
> work, Andrew? Thanks.
For this reason. No probs, thanks.
I merge up the post-linux-next parts late in the merge window. I do
need to manually check that the prerequisites are in mainline, because
sometimes the patches apply OK but don't make sense.
On 10/22/21 05:38, Andrew Morton wrote:
> On Thu, 21 Oct 2021 19:51:20 +0200 Vlastimil Babka <[email protected]> wrote:
>
>> >> Then we have to figure out how to order a fix between DRM and mmotm...
>> >
>> > That is the question! The problem exists only in the merge of the
>> > two. On current DRM side stack_depot_init() exists but it's __init and
>> > does not look safe to call multiple times. And obviously my changes
>> > don't exist at all in mmotm.
>> >
>> > I guess one (admittedly hackish) option is to first add a patch in
>> > drm-next (or drm-misc-next) that makes it safe to call
>> > stack_depot_init() multiple times in non-init context. It would be
>> > dropped in favour of your changes once the trees get merged together.
>> >
>> > Or is there some way for __drm_stack_depot_init() to detect whether it
>> > should call stack_depot_init() or not, i.e. whether your changes are
>> > there or not?
>>
>> Let's try the easiest approach first. AFAIK mmotm series is now split to
>> pre-next and post-next part
>
> It has been this way for many years!
Aha, great. Looks like I misinterpreted few months ago the thread about
adding folio tree to next.
>> and moving my patch
>> lib-stackdepot-allow-optional-init-and-stack_table-allocation-by-kvmalloc.patch
>> with the following fixup to the post-next part should solve this. Would that
>> work, Andrew? Thanks.
>
> For this reason. No probs, thanks.
Thanks!
> I merge up the post-linux-next parts late in the merge window. I do
> need to manually check that the prerequisites are in mainline, because
> sometimes the patches apply OK but don't make sense.
>