Subject: [PATCH v3 1/3] drm/panfrost: Ignore core_mask for poweroff and disable PWRTRANS irq

Some SoCs may be equipped with a GPU containing two core groups
and this is exactly the case of Samsung's Exynos 5422 featuring
an ARM Mali-T628 MP6 GPU: the support for this GPU in Panfrost
is partial, as this driver currently supports using only one
core group and that's reflected on all parts of it, including
the power on (and power off, previously to this patch) function.

The issue with this is that even though executing the soft reset
operation should power off all cores unconditionally, on at least
one platform we're seeing a crash that seems to be happening due
to an interrupt firing which may be because we are calling power
transition only on the first core group, leaving the second one
unchanged, or because ISR execution was pending before entering
the panfrost_gpu_power_off() function and executed after powering
off the GPU cores, or all of the above.

Finally, solve this by:
- Avoid to enable the power transition interrupt on reset; and
- Ignoring the core_mask and ask the GPU to poweroff both core groups

Fixes: 22aa1a209018 ("drm/panfrost: Really power off GPU cores in panfrost_gpu_power_off()")
Signed-off-by: AngeloGioacchino Del Regno <[email protected]>
---
drivers/gpu/drm/panfrost/panfrost_gpu.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index 09f5e1563ebd..bd41617c5e4b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -78,7 +78,12 @@ int panfrost_gpu_soft_reset(struct panfrost_device *pfdev)
}

gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_MASK_ALL);
- gpu_write(pfdev, GPU_INT_MASK, GPU_IRQ_MASK_ALL);
+
+ /* Only enable the interrupts we care about */
+ gpu_write(pfdev, GPU_INT_MASK,
+ GPU_IRQ_MASK_ERROR |
+ GPU_IRQ_PERFCNT_SAMPLE_COMPLETED |
+ GPU_IRQ_CLEAN_CACHES_COMPLETED);

/*
* All in-flight jobs should have released their cycle
@@ -425,11 +430,10 @@ void panfrost_gpu_power_on(struct panfrost_device *pfdev)

void panfrost_gpu_power_off(struct panfrost_device *pfdev)
{
- u64 core_mask = panfrost_get_core_mask(pfdev);
int ret;
u32 val;

- gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present & core_mask);
+ gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present);
ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_PWRTRANS_LO,
val, !val, 1, 1000);
if (ret)
@@ -441,7 +445,7 @@ void panfrost_gpu_power_off(struct panfrost_device *pfdev)
if (ret)
dev_err(pfdev->dev, "tiler power transition timeout");

- gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present & core_mask);
+ gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present);
ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_LO,
val, !val, 0, 1000);
if (ret)
--
2.43.0


2023-12-01 10:59:53

by Boris Brezillon

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] drm/panfrost: Ignore core_mask for poweroff and disable PWRTRANS irq

On Fri, 1 Dec 2023 11:40:25 +0100
AngeloGioacchino Del Regno <[email protected]>
wrote:

> Some SoCs may be equipped with a GPU containing two core groups
> and this is exactly the case of Samsung's Exynos 5422 featuring
> an ARM Mali-T628 MP6 GPU: the support for this GPU in Panfrost
> is partial, as this driver currently supports using only one
> core group and that's reflected on all parts of it, including
> the power on (and power off, previously to this patch) function.
>
> The issue with this is that even though executing the soft reset
> operation should power off all cores unconditionally, on at least
> one platform we're seeing a crash that seems to be happening due
> to an interrupt firing which may be because we are calling power

^ might be caused by us doing a power
transition on the first core group only, leaving the second one
unchanged.

> transition only on the first core group, leaving the second one
> unchanged, or because ISR execution was pending before entering

unchanged. Our changes to the suspend logic seems to have
uncovered another issue where the GPU interrupt handler is called
after the device as entered the suspend state, which leads to invalid
register accesses because the GPU device is no longer
powered/clocked. Given the only addition that was done to the suspend
logic is the power-off requests, and the fact those generate PWRTRANS
interrupts, we have good reason to think the interrupts we are asked
to process in that case are the PWRTRANS ones.

> the panfrost_gpu_power_off() function and executed after powering
> off the GPU cores, or all of the above.
>
> Finally, solve this by:
> - Avoid to enable the power transition interrupt on reset; and
^ 'Avoiding' or maybe 'Not enabling power transition ...'

> - Ignoring the core_mask and ask the GPU to poweroff both core groups
>
> Fixes: 22aa1a209018 ("drm/panfrost: Really power off GPU cores in panfrost_gpu_power_off()")
> Signed-off-by: AngeloGioacchino Del Regno <[email protected]>

Still

Reviewed-by: Boris Brezillon <[email protected]>

> ---
> drivers/gpu/drm/panfrost/panfrost_gpu.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> index 09f5e1563ebd..bd41617c5e4b 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> @@ -78,7 +78,12 @@ int panfrost_gpu_soft_reset(struct panfrost_device *pfdev)
> }
>
> gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_MASK_ALL);
> - gpu_write(pfdev, GPU_INT_MASK, GPU_IRQ_MASK_ALL);
> +
> + /* Only enable the interrupts we care about */
> + gpu_write(pfdev, GPU_INT_MASK,
> + GPU_IRQ_MASK_ERROR |
> + GPU_IRQ_PERFCNT_SAMPLE_COMPLETED |
> + GPU_IRQ_CLEAN_CACHES_COMPLETED);
>
> /*
> * All in-flight jobs should have released their cycle
> @@ -425,11 +430,10 @@ void panfrost_gpu_power_on(struct panfrost_device *pfdev)
>
> void panfrost_gpu_power_off(struct panfrost_device *pfdev)
> {
> - u64 core_mask = panfrost_get_core_mask(pfdev);
> int ret;
> u32 val;
>
> - gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present & core_mask);
> + gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present);
> ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_PWRTRANS_LO,
> val, !val, 1, 1000);
> if (ret)
> @@ -441,7 +445,7 @@ void panfrost_gpu_power_off(struct panfrost_device *pfdev)
> if (ret)
> dev_err(pfdev->dev, "tiler power transition timeout");
>
> - gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present & core_mask);
> + gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present);
> ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_LO,
> val, !val, 0, 1000);
> if (ret)

2023-12-01 12:35:00

by Steven Price

[permalink] [raw]
Subject: Re: [PATCH v3 1/3] drm/panfrost: Ignore core_mask for poweroff and disable PWRTRANS irq

On 01/12/2023 10:40, AngeloGioacchino Del Regno wrote:
> Some SoCs may be equipped with a GPU containing two core groups
> and this is exactly the case of Samsung's Exynos 5422 featuring
> an ARM Mali-T628 MP6 GPU: the support for this GPU in Panfrost
> is partial, as this driver currently supports using only one
> core group and that's reflected on all parts of it, including
> the power on (and power off, previously to this patch) function.
>
> The issue with this is that even though executing the soft reset
> operation should power off all cores unconditionally, on at least
> one platform we're seeing a crash that seems to be happening due
> to an interrupt firing which may be because we are calling power
> transition only on the first core group, leaving the second one
> unchanged, or because ISR execution was pending before entering
> the panfrost_gpu_power_off() function and executed after powering
> off the GPU cores, or all of the above.
>
> Finally, solve this by:
> - Avoid to enable the power transition interrupt on reset; and
> - Ignoring the core_mask and ask the GPU to poweroff both core groups
>
> Fixes: 22aa1a209018 ("drm/panfrost: Really power off GPU cores in panfrost_gpu_power_off()")
> Signed-off-by: AngeloGioacchino Del Regno <[email protected]>

Reviewed-by: Steven Price <[email protected]>

> ---
> drivers/gpu/drm/panfrost/panfrost_gpu.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> index 09f5e1563ebd..bd41617c5e4b 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> @@ -78,7 +78,12 @@ int panfrost_gpu_soft_reset(struct panfrost_device *pfdev)
> }
>
> gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_MASK_ALL);
> - gpu_write(pfdev, GPU_INT_MASK, GPU_IRQ_MASK_ALL);
> +
> + /* Only enable the interrupts we care about */
> + gpu_write(pfdev, GPU_INT_MASK,
> + GPU_IRQ_MASK_ERROR |
> + GPU_IRQ_PERFCNT_SAMPLE_COMPLETED |
> + GPU_IRQ_CLEAN_CACHES_COMPLETED);
>
> /*
> * All in-flight jobs should have released their cycle
> @@ -425,11 +430,10 @@ void panfrost_gpu_power_on(struct panfrost_device *pfdev)
>
> void panfrost_gpu_power_off(struct panfrost_device *pfdev)
> {
> - u64 core_mask = panfrost_get_core_mask(pfdev);
> int ret;
> u32 val;
>
> - gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present & core_mask);
> + gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present);
> ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_PWRTRANS_LO,
> val, !val, 1, 1000);
> if (ret)
> @@ -441,7 +445,7 @@ void panfrost_gpu_power_off(struct panfrost_device *pfdev)
> if (ret)
> dev_err(pfdev->dev, "tiler power transition timeout");
>
> - gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present & core_mask);
> + gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present);
> ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_LO,
> val, !val, 0, 1000);
> if (ret)