When probe is deferred, the drm device is not yet initialized. If a shutdown
is triggered externally at that point, the shutdown hook tries to clean up the
drm device and dereferences a NULL pointer. Add checks to skip drm device
cleanup in such cases.
BUG: unable to handle kernel NULL pointer dereference at virtual
address 00000000000000b8
Call trace:
drm_atomic_helper_shutdown+0x44/0x144
msm_pdev_shutdown+0x2c/0x38
platform_shutdown+0x2c/0x38
device_shutdown+0x158/0x210
kernel_restart_prepare+0x40/0x4c
kernel_restart+0x20/0x6c
__arm64_sys_reboot+0x194/0x23c
invoke_syscall+0x50/0x13c
el0_svc_common+0xa0/0x17c
do_el0_svc_compat+0x28/0x34
el0_svc_compat+0x20/0x70
el0t_32_sync_handler+0xa8/0xcc
el0t_32_sync+0x1a8/0x1ac
Changes in v2:
- Add fixes tag.
Fixes: 623f279c778 ("drm/msm: fix shutdown hook in case GPU components failed to bind")
Signed-off-by: Vinod Polimera <[email protected]>
---
drivers/gpu/drm/msm/msm_drv.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 4448536..d62ac66 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -142,6 +142,9 @@ static void msm_irq_uninstall(struct drm_device *dev)
 	struct msm_drm_private *priv = dev->dev_private;
 	struct msm_kms *kms = priv->kms;
 
+	if (!irq_has_action(kms->irq))
+		return;
+
 	kms->funcs->irq_uninstall(kms);
 	if (kms->irq_requested)
 		free_irq(kms->irq, dev);
@@ -259,6 +262,7 @@ static int msm_drm_uninit(struct device *dev)
 
 	ddev->dev_private = NULL;
 	drm_dev_put(ddev);
+	priv->dev = NULL;
 
 	destroy_workqueue(priv->wq);
 
@@ -1167,7 +1171,7 @@ void msm_drv_shutdown(struct platform_device *pdev)
 	struct msm_drm_private *priv = platform_get_drvdata(pdev);
 	struct drm_device *drm = priv ? priv->dev : NULL;
 
-	if (!priv || !priv->kms)
+	if (!priv || !priv->kms || !drm)
 		return;
 
 	drm_atomic_helper_shutdown(drm);
--
2.7.4
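
For readers who want to poke at the failure mode outside the kernel, the
msm_drv_shutdown() guard from the last hunk can be modeled in userspace
roughly as below. This is only a sketch: every mock_* type and function is a
hypothetical stand-in, not a kernel API, and it assumes (as the commit message
describes) that the private data and kms can be set while the drm device
pointer is still NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct drm_device / msm_kms / msm_drm_private. */
struct mock_drm_device {
	int shutdown_calls;
};

struct mock_kms_state {
	int unused;
};

struct mock_priv {
	struct mock_drm_device *dev;	/* NULL while probe is deferred */
	struct mock_kms_state *kms;
};

/* Stand-in for drm_atomic_helper_shutdown(): it dereferences its argument. */
static void mock_atomic_helper_shutdown(struct mock_drm_device *drm)
{
	assert(drm != NULL);	/* models the NULL dereference from the trace */
	drm->shutdown_calls++;
}

/* Returns true only if the helper was actually invoked. */
static bool mock_drv_shutdown(struct mock_priv *priv)
{
	struct mock_drm_device *drm = priv ? priv->dev : NULL;

	/* Same three-way guard as the patched msm_drv_shutdown(). */
	if (!priv || !priv->kms || !drm)
		return false;

	mock_atomic_helper_shutdown(drm);
	return true;
}
```

With the extra !drm test, a call made while the device pointer is NULL
returns early instead of reaching the helper.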
On 03/06/2022 12:42, Vinod Polimera wrote:
> Signed-off-by: Vinod Polimera <[email protected]>
Also please remove bouncing quicinc.com emails from cc list
--
With best wishes
Dmitry
On 03/06/2022 12:42, Vinod Polimera wrote:
> Changes in v2:
> - Add fixes tag.
I'm still waiting for an answer to the questions raised in v1 review.
--
With best wishes
Dmitry
On 03/06/2022 12:42, Vinod Polimera wrote:
> Signed-off-by: Vinod Polimera <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
--
With best wishes
Dmitry
On 03/06/2022 12:42, Vinod Polimera wrote:
> @@ -142,6 +142,9 @@ static void msm_irq_uninstall(struct drm_device *dev)
> struct msm_drm_private *priv = dev->dev_private;
> struct msm_kms *kms = priv->kms;
>
> + if (!irq_has_action(kms->irq))
> + return;
As a second thought I'd still prefer a variable here. irq_has_action
would check that there is _any_ IRQ handler for this IRQ. While we do
not have anybody sharing this IRQ, I'd prefer to be clear here, that we
do not want to uninstall our IRQ handler rather than any IRQ handler.
--
With best wishes
Dmitry
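
The distinction drawn in the review comment above (any handler on the line
versus our own handler) can be illustrated with a small userspace model. This
is a sketch under stated assumptions: the mock_* names are hypothetical and
only imitate the shape of irq_has_action(), request_irq() and free_irq(), and
the shared-line scenario is contrived, since the thread notes nobody shares
this IRQ today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for one interrupt line; not a kernel type. */
#define MOCK_MAX_ACTIONS 4

struct mock_irq_line {
	const void *owners[MOCK_MAX_ACTIONS];	/* dev_id cookie per handler */
	size_t nr_actions;
};

/* Stand-in for the driver-private KMS state. */
struct mock_kms {
	struct mock_irq_line *line;
	bool irq_requested;	/* set only when this driver installed a handler */
	int uninstall_calls;	/* counts how often teardown actually ran */
};

/* Like irq_has_action(): true if anyone has a handler on the line. */
static bool mock_irq_has_action(const struct mock_irq_line *line)
{
	return line->nr_actions > 0;
}

static bool mock_request_irq(struct mock_irq_line *line, const void *dev_id)
{
	if (line->nr_actions == MOCK_MAX_ACTIONS)
		return false;
	line->owners[line->nr_actions++] = dev_id;
	return true;
}

static void mock_free_irq(struct mock_irq_line *line, const void *dev_id)
{
	for (size_t i = 0; i < line->nr_actions; i++) {
		if (line->owners[i] == dev_id) {
			/* Move the last entry into the freed slot. */
			line->owners[i] = line->owners[--line->nr_actions];
			return;
		}
	}
}

static bool mock_kms_irq_install(struct mock_kms *kms)
{
	assert(!kms->irq_requested);
	kms->irq_requested = mock_request_irq(kms->line, kms);
	return kms->irq_requested;
}

/* Teardown guarded by the driver's own flag, not by the state of the line. */
static void mock_kms_irq_uninstall(struct mock_kms *kms)
{
	if (!kms->irq_requested)
		return;
	kms->uninstall_calls++;
	mock_free_irq(kms->line, kms);
	kms->irq_requested = false;
}
```

In this model, a second user of the line makes mock_irq_has_action() return
true even though the driver never installed its handler, while the flag-based
guard still skips the teardown.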
On 15/06/2022 15:23, Dmitry Baryshkov wrote:
> On 03/06/2022 12:42, Vinod Polimera wrote:
>> @@ -142,6 +142,9 @@ static void msm_irq_uninstall(struct drm_device *dev)
>> struct msm_drm_private *priv = dev->dev_private;
>> struct msm_kms *kms = priv->kms;
>> + if (!irq_has_action(kms->irq))
>> + return;
>
> As a second thought I'd still prefer a variable here. irq_has_action
> would check that there is _any_ IRQ handler for this IRQ. While we do
> not have anybody sharing this IRQ, I'd prefer to be clear here, that we
> do not want to uninstall our IRQ handler rather than any IRQ handler.
Vinod, do we still want to pursue this fix? If so, could you please
update it according to the comment.
--
With best wishes
Dmitry
> -----Original Message-----
> From: Dmitry Baryshkov <[email protected]>
> Sent: Friday, August 26, 2022 2:11 PM
> To: Vinod Polimera (QUIC) <[email protected]>; dri-
> [email protected]; [email protected];
> [email protected]; [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [v2] drm/msm: add null checks for drm device to avoid crash
> during probe defer
>
> On 15/06/2022 15:23, Dmitry Baryshkov wrote:
> > On 03/06/2022 12:42, Vinod Polimera wrote:
> >> @@ -142,6 +142,9 @@ static void msm_irq_uninstall(struct drm_device
> *dev)
> >> struct msm_drm_private *priv = dev->dev_private;
> >> struct msm_kms *kms = priv->kms;
> >> + if (!irq_has_action(kms->irq))
> >> + return;
> >
> > As a second thought I'd still prefer a variable here. irq_has_action
> > would check that there is _any_ IRQ handler for this IRQ. While we do
> > not have anybody sharing this IRQ, I'd prefer to be clear here, that we
> > do not want to uninstall our IRQ handler rather than any IRQ handler.
>
> Vinod, do we still want to pursue this fix? If so, could you please
> update it according to the comment.
>
I looked around and found that many kernel drivers use irq_has_action to check whether the interrupt has been requested, so it appears to me to be an acceptable way of doing it. Having a variable to track the state seems unnecessary, as it would need to be managed race-free. Let me know your views on it.
- Vinod P.
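
On the race-free concern raised in this reply: one way a tracking variable
could be kept consistent is to clear it with an atomic exchange, so that only
one caller ever performs the teardown. The following is a userspace C11
sketch, not kernel code; the mock_* names are hypothetical, and a real driver
would use the kernel's own atomic or locking primitives rather than
<stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state tracking whether our handler is installed. */
struct mock_irq_state {
	atomic_bool installed;
	atomic_int teardown_calls;
};

static void mock_irq_state_init(struct mock_irq_state *s)
{
	assert(s != NULL);
	atomic_init(&s->installed, false);
	atomic_init(&s->teardown_calls, 0);
}

/* Called once the handler has been installed successfully. */
static void mock_mark_installed(struct mock_irq_state *s)
{
	assert(s != NULL);
	atomic_store(&s->installed, true);
}

/*
 * Atomically read and clear the flag. Only the caller that observes
 * "true" runs the teardown; concurrent or repeated callers see "false".
 */
static bool mock_uninstall_once(struct mock_irq_state *s)
{
	assert(s != NULL);
	if (!atomic_exchange(&s->installed, false))
		return false;
	atomic_fetch_add(&s->teardown_calls, 1);
	return true;
}
```

Calling mock_uninstall_once() before installation, or a second time after a
successful teardown, is a no-op in this model.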
Hello Vinod and Dmitry,
On 9/27/22 09:31, Vinod Polimera wrote:
>> -----Original Message-----
>> From: Dmitry Baryshkov <[email protected]>
>> Sent: Friday, August 26, 2022 2:11 PM
>> To: Vinod Polimera (QUIC) <[email protected]>; dri-
>> [email protected]; [email protected];
>> [email protected]; [email protected]
>> Cc: [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]
>> Subject: Re: [v2] drm/msm: add null checks for drm device to avoid crash
>> during probe defer
>>
[...]
>> Vinod, do we still want to pursue this fix? If so, could you please
>> update it according to the comment.
>>
I don't think this patch is needed anymore, since AFAICT the issue has
been fixed by commit 0a58d2ae572a ("drm/msm: Make .remove and .shutdown
HW shutdown consistent") which is already in the drm/drm-next branch.
--
Best regards,
Javier Martinez Canillas
Core Platforms
Red Hat
> -----Original Message-----
> From: Javier Martinez Canillas <[email protected]>
> Sent: Tuesday, September 27, 2022 2:33 PM
> To: Vinod Polimera <[email protected]>;
> [email protected]; Vinod Polimera (QUIC)
> <[email protected]>; [email protected]; linux-arm-
> [email protected]; [email protected];
> [email protected]
> Cc: [email protected]; [email protected]; Abhinav Kumar
> <[email protected]>; [email protected];
> [email protected]; [email protected]
> Subject: Re: [v2] drm/msm: add null checks for drm device to avoid crash
> during probe defer
>
> Hello Vinod and Dmitry,
>
> I don't think this patch is needed anymore, since AFAICT the issue has
> been fixed by commit 0a58d2ae572a ("drm/msm: Make .remove and
> .shutdown
> HW shutdown consistent") which is already in the drm/drm-next branch.
Yes, the issue is fixed by commit 0a58d2ae572a ("drm/msm: Make .remove and .shutdown HW shutdown consistent"). Hence we can drop this patch.
- Vinod P.