The MSM DRM is currently broken in multiple ways with respect to probe
deferral. Not only does the driver currently fail to probe again after a
late deferral, but due to a related use-after-free bug this also
triggers NULL-pointer dereferences.
These bugs are not new but have become critical with the release of
5.19 where probe is deferred in case the aux-bus EP panel driver has not
yet been loaded.
The underlying problem is lifetime issues due to careless use of
device-managed resources.
Specifically, device-managed resources allocated post component bind
must be tied to the lifetime of the aggregate DRM device or they will
not necessarily be released when binding of the aggregate device is
deferred.
The following call chain and pseudo code serves as an illustration of
the problem:
- platform_probe(pdev1)
- dp_display_probe()
- component_add()
- platform_probe(pdev2) // last component
- dp_display_probe() // d0
- component_add()
- try_to_bring_up_aggregate_device()
- devres_open_group(adev->parent) // d1
- msm_drm_bind()
- msm_drm_init()
- component_bind_all()
- for_each_component()
- component_bind()
- devres_open_group(&pdev->dev) // d2
- dp_display_bind()
- devm_kzalloc(&pdev->dev) // a1, OK
- devres_close_group(&pdev->dev) // d3
- dpu_kms_hw_init()
- for_each_panel()
- msm_dp_modeset_init()
- dp_display_request_irq()
- devm_request_irq(&pdev->dev) // a2, BUG
- if (pdev == pdev2 && condition)
- return -EPROBE_DEFER;
- if (error)
- component_unbind_all()
- for_each_component()
- component_unbind()
- dp_display_unbind()
- devres_release_group(&pdev->dev) // d4, only a1 is freed
- if (error)
- devres_release_group(adev->parent) // d5
The device-managed allocation a2 is buggy as its lifetime is tied to the
component platform device and will not be released when the aggregate
device bind fails (e.g. due to a probe deferral).
When pdev2 is later probed again, the attempt to allocate the IRQ a
second time will fail for pdev1 (which is still bound to its platform
driver).
This series fixes the lifetime issues by tying the lifetime of a2 (and
similar allocations) to the lifetime of the aggregate device so that a2
is released at d5.
In some cases, such has for the DP IRQ, the above situation can also be
avoided by moving the allocation in question to the platform driver
probe (d0) or component bind (between d2 and d3). But as doing so is not
a general fix, this can be done later as a cleanup/optimisation.
Johan
Johan Hovold (7):
drm/msm: fix use-after-free on probe deferral
drm/msm: fix memory corruption with too many bridges
drm/msm/dp: fix IRQ lifetime
drm/msm/dp: fix aux-bus EP lifetime
drm/msm/dp: fix bridge lifetime
drm/msm/hdmi: fix IRQ lifetime
drm/msm: drop modeset sanity checks
drivers/gpu/drm/bridge/parade-ps8640.c | 2 +-
drivers/gpu/drm/display/drm_dp_aux_bus.c | 5 +++--
drivers/gpu/drm/msm/dp/dp_display.c | 16 +++++++++-------
drivers/gpu/drm/msm/dp/dp_parser.c | 6 +++---
drivers/gpu/drm/msm/dp/dp_parser.h | 5 +++--
drivers/gpu/drm/msm/dsi/dsi.c | 9 +++++----
drivers/gpu/drm/msm/hdmi/hdmi.c | 7 ++++++-
drivers/gpu/drm/msm/msm_drv.c | 1 +
include/drm/display/drm_dp_aux_bus.h | 6 +++---
9 files changed, 34 insertions(+), 23 deletions(-)
--
2.35.1
Drop the overly defensive modeset sanity checks of function parameters
which have already been checked or used by the callers.
Signed-off-by: Johan Hovold <[email protected]>
---
drivers/gpu/drm/msm/dp/dp_display.c | 7 +------
drivers/gpu/drm/msm/dsi/dsi.c | 7 +------
2 files changed, 2 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
index 393af1ea9ed8..8ad28bf81abe 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -1597,15 +1597,10 @@ static int dp_display_get_next_bridge(struct msm_dp *dp)
int msm_dp_modeset_init(struct msm_dp *dp_display, struct drm_device *dev,
struct drm_encoder *encoder)
{
- struct msm_drm_private *priv;
+ struct msm_drm_private *priv = dev->dev_private;
struct dp_display_private *dp_priv;
int ret;
- if (WARN_ON(!encoder) || WARN_ON(!dp_display) || WARN_ON(!dev))
- return -EINVAL;
-
- priv = dev->dev_private;
-
if (priv->num_bridges == ARRAY_SIZE(priv->bridges)) {
DRM_DEV_ERROR(dev->dev, "too many bridges\n");
return -ENOSPC;
diff --git a/drivers/gpu/drm/msm/dsi/dsi.c b/drivers/gpu/drm/msm/dsi/dsi.c
index 8a95c744972a..31fdee2052be 100644
--- a/drivers/gpu/drm/msm/dsi/dsi.c
+++ b/drivers/gpu/drm/msm/dsi/dsi.c
@@ -211,14 +211,9 @@ void __exit msm_dsi_unregister(void)
int msm_dsi_modeset_init(struct msm_dsi *msm_dsi, struct drm_device *dev,
struct drm_encoder *encoder)
{
- struct msm_drm_private *priv;
+ struct msm_drm_private *priv = dev->dev_private;
int ret;
- if (WARN_ON(!encoder) || WARN_ON(!msm_dsi) || WARN_ON(!dev))
- return -EINVAL;
-
- priv = dev->dev_private;
-
if (priv->num_bridges == ARRAY_SIZE(priv->bridges)) {
DRM_DEV_ERROR(dev->dev, "too many bridges\n");
return -ENOSPC;
--
2.35.1
Adding kuogee to this series
Hi Johan
Thanks for posting this.
We will take a look at this, re-validate and give our reviews/tested-bys.
Thanks
Abhinav
On 9/12/2022 8:40 AM, Johan Hovold wrote:
> The MSM DRM is currently broken in multiple ways with respect to probe
> deferral. Not only does the driver currently fail to probe again after a
> late deferral, but due to a related use-after-free bug this also
> triggers NULL-pointer dereferences.
>
> These bugs are not new but have become critical with the release of
> 5.19 where probe is deferred in case the aux-bus EP panel driver has not
> yet been loaded.
>
> The underlying problem is lifetime issues due to careless use of
> device-managed resources.
>
> Specifically, device-managed resources allocated post component bind
> must be tied to the lifetime of the aggregate DRM device or they will
> not necessarily be released when binding of the aggregate device is
> deferred.
>
> The following call chain and pseudo code serves as an illustration of
> the problem:
>
> - platform_probe(pdev1)
> - dp_display_probe()
> - component_add()
>
> - platform_probe(pdev2) // last component
> - dp_display_probe() // d0
> - component_add()
> - try_to_bring_up_aggregate_device()
> - devres_open_group(adev->parent) // d1
>
> - msm_drm_bind()
> - msm_drm_init()
> - component_bind_all()
> - for_each_component()
> - component_bind()
> - devres_open_group(&pdev->dev) // d2
> - dp_display_bind()
> - devm_kzalloc(&pdev->dev) // a1, OK
> - devres_close_group(&pdev->dev) // d3
>
> - dpu_kms_hw_init()
> - for_each_panel()
> - msm_dp_modeset_init()
> - dp_display_request_irq()
> - devm_request_irq(&pdev->dev) // a2, BUG
> - if (pdev == pdev2 && condition)
> - return -EPROBE_DEFER;
>
> - if (error)
> - component_unbind_all()
> - for_each_component()
> - component_unbind()
> - dp_display_unbind()
> - devres_release_group(&pdev->dev) // d4, only a1 is freed
>
> - if (error)
> - devres_release_group(adev->parent) // d5
>
> The device-managed allocation a2 is buggy as its lifetime is tied to the
> component platform device and will not be released when the aggregate
> device bind fails (e.g. due to a probe deferral).
>
> When pdev2 is later probed again, the attempt to allocate the IRQ a
> second time will fail for pdev1 (which is still bound to its platform
> driver).
>
> This series fixes the lifetime issues by tying the lifetime of a2 (and
> similar allocations) to the lifetime of the aggregate device so that a2
> is released at d5.
>
> In some cases, such has for the DP IRQ, the above situation can also be
> avoided by moving the allocation in question to the platform driver
> probe (d0) or component bind (between d2 and d3). But as doing so is not
> a general fix, this can be done later as a cleanup/optimisation.
>
> Johan
>
>
> Johan Hovold (7):
> drm/msm: fix use-after-free on probe deferral
> drm/msm: fix memory corruption with too many bridges
> drm/msm/dp: fix IRQ lifetime
> drm/msm/dp: fix aux-bus EP lifetime
> drm/msm/dp: fix bridge lifetime
> drm/msm/hdmi: fix IRQ lifetime
> drm/msm: drop modeset sanity checks
>
> drivers/gpu/drm/bridge/parade-ps8640.c | 2 +-
> drivers/gpu/drm/display/drm_dp_aux_bus.c | 5 +++--
> drivers/gpu/drm/msm/dp/dp_display.c | 16 +++++++++-------
> drivers/gpu/drm/msm/dp/dp_parser.c | 6 +++---
> drivers/gpu/drm/msm/dp/dp_parser.h | 5 +++--
> drivers/gpu/drm/msm/dsi/dsi.c | 9 +++++----
> drivers/gpu/drm/msm/hdmi/hdmi.c | 7 ++++++-
> drivers/gpu/drm/msm/msm_drv.c | 1 +
> include/drm/display/drm_dp_aux_bus.h | 6 +++---
> 9 files changed, 34 insertions(+), 23 deletions(-)
>
On 12/09/2022 18:40, Johan Hovold wrote:
> Drop the overly defensive modeset sanity checks of function parameters
> which have already been checked or used by the callers.
>
> Signed-off-by: Johan Hovold <[email protected]>
Again, please split into dp and dsi patches. After that:
Reviewed-by: Dmitry Baryshkov <[email protected]>
> ---
> drivers/gpu/drm/msm/dp/dp_display.c | 7 +------
> drivers/gpu/drm/msm/dsi/dsi.c | 7 +------
> 2 files changed, 2 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
> index 393af1ea9ed8..8ad28bf81abe 100644
> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> @@ -1597,15 +1597,10 @@ static int dp_display_get_next_bridge(struct msm_dp *dp)
> int msm_dp_modeset_init(struct msm_dp *dp_display, struct drm_device *dev,
> struct drm_encoder *encoder)
> {
> - struct msm_drm_private *priv;
> + struct msm_drm_private *priv = dev->dev_private;
> struct dp_display_private *dp_priv;
> int ret;
>
> - if (WARN_ON(!encoder) || WARN_ON(!dp_display) || WARN_ON(!dev))
> - return -EINVAL;
> -
> - priv = dev->dev_private;
> -
> if (priv->num_bridges == ARRAY_SIZE(priv->bridges)) {
> DRM_DEV_ERROR(dev->dev, "too many bridges\n");
> return -ENOSPC;
> diff --git a/drivers/gpu/drm/msm/dsi/dsi.c b/drivers/gpu/drm/msm/dsi/dsi.c
> index 8a95c744972a..31fdee2052be 100644
> --- a/drivers/gpu/drm/msm/dsi/dsi.c
> +++ b/drivers/gpu/drm/msm/dsi/dsi.c
> @@ -211,14 +211,9 @@ void __exit msm_dsi_unregister(void)
> int msm_dsi_modeset_init(struct msm_dsi *msm_dsi, struct drm_device *dev,
> struct drm_encoder *encoder)
> {
> - struct msm_drm_private *priv;
> + struct msm_drm_private *priv = dev->dev_private;
> int ret;
>
> - if (WARN_ON(!encoder) || WARN_ON(!msm_dsi) || WARN_ON(!dev))
> - return -EINVAL;
> -
> - priv = dev->dev_private;
> -
> if (priv->num_bridges == ARRAY_SIZE(priv->bridges)) {
> DRM_DEV_ERROR(dev->dev, "too many bridges\n");
> return -ENOSPC;
--
With best wishes
Dmitry
On Mon, Sep 12, 2022 at 09:06:28PM +0300, Dmitry Baryshkov wrote:
> On 12/09/2022 18:40, Johan Hovold wrote:
> > Drop the overly defensive modeset sanity checks of function parameters
> > which have already been checked or used by the callers.
> >
> > Signed-off-by: Johan Hovold <[email protected]>
>
> Again, please split into dp and dsi patches. After that:
Sure, if you prefer. But without the stable-tree argument I think
there's little point in splitting.
> Reviewed-by: Dmitry Baryshkov <[email protected]>
>
> > ---
> > drivers/gpu/drm/msm/dp/dp_display.c | 7 +------
> > drivers/gpu/drm/msm/dsi/dsi.c | 7 +------
> > 2 files changed, 2 insertions(+), 12 deletions(-)
Johan