2021-08-10 16:22:03

by Robert Foss

[permalink] [raw]
Subject: [PATCH v1] media: camss: vfe: Don't use vfe->base before it's assigned

vfe->ops->hw_version(vfe) being called before vfe->base has been assigned
is incorrect and causes crashes.

Fixes: b10b5334528a9 ("media: camss: vfe: Don't read hardware version needlessly")

Reported-by: Linux Kernel Functional Testing <[email protected]>
Signed-off-by: Robert Foss <[email protected]>
---
drivers/media/platform/qcom/camss/camss-vfe.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/media/platform/qcom/camss/camss-vfe.c b/drivers/media/platform/qcom/camss/camss-vfe.c
index 6b2f33fc9be22..1c8d2f0f81207 100644
--- a/drivers/media/platform/qcom/camss/camss-vfe.c
+++ b/drivers/media/platform/qcom/camss/camss-vfe.c
@@ -1299,7 +1299,6 @@ int msm_vfe_subdev_init(struct camss *camss, struct vfe_device *vfe,
return -EINVAL;
}
vfe->ops->subdev_init(dev, vfe);
- vfe->ops->hw_version(vfe);

/* Memory */

@@ -1309,6 +1308,8 @@ int msm_vfe_subdev_init(struct camss *camss, struct vfe_device *vfe,
return PTR_ERR(vfe->base);
}

+ vfe->ops->hw_version(vfe);
+
/* Interrupt */

r = platform_get_resource_byname(pdev, IORESOURCE_IRQ,
--
2.30.2


2021-08-11 07:50:31

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH v1] media: camss: vfe: Don't use vfe->base before it's assigned

On 10.08.2021 12:33, Robert Foss wrote:
> vfe->ops->hw_version(vfe) being called before vfe->base has been assigned
> is incorrect and causes crashes.
>
> Fixes: b10b5334528a9 ("media: camss: vfe: Don't read hardware version needlessly")
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
> Signed-off-by: Robert Foss <[email protected]>

With this patch applied on top of linux next-20210810 instead of the
NULL pointer dereference I get following error on DragonBoard410c while
loading kernel modules:

[   18.480608] qcom-venus 1d00000.video-codec: Adding to iommu group 1
[   18.536167] qcom-camss 1b0ac00.camss: Adding to iommu group 2
[   18.600373] Internal error: synchronous external abort: 96000010 [#1]
PREEMPT SMP

> ---
> drivers/media/platform/qcom/camss/camss-vfe.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/media/platform/qcom/camss/camss-vfe.c b/drivers/media/platform/qcom/camss/camss-vfe.c
> index 6b2f33fc9be22..1c8d2f0f81207 100644
> --- a/drivers/media/platform/qcom/camss/camss-vfe.c
> +++ b/drivers/media/platform/qcom/camss/camss-vfe.c
> @@ -1299,7 +1299,6 @@ int msm_vfe_subdev_init(struct camss *camss, struct vfe_device *vfe,
> return -EINVAL;
> }
> vfe->ops->subdev_init(dev, vfe);
> - vfe->ops->hw_version(vfe);
>
> /* Memory */
>
> @@ -1309,6 +1308,8 @@ int msm_vfe_subdev_init(struct camss *camss, struct vfe_device *vfe,
> return PTR_ERR(vfe->base);
> }
>
> + vfe->ops->hw_version(vfe);
> +
> /* Interrupt */
>
> r = platform_get_resource_byname(pdev, IORESOURCE_IRQ,

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2021-08-11 09:44:01

by Robert Foss

[permalink] [raw]
Subject: Re: [PATCH v1] media: camss: vfe: Don't use vfe->base before it's assigned

Hey Marek,

Thanks for testing this.

On Wed, 11 Aug 2021 at 09:48, Marek Szyprowski <[email protected]> wrote:
>
> On 10.08.2021 12:33, Robert Foss wrote:
> > vfe->ops->hw_version(vfe) being called before vfe->base has been assigned
> > is incorrect and causes crashes.
> >
> > Fixes: b10b5334528a9 ("media: camss: vfe: Don't read hardware version needlessly")
> >
> > Reported-by: Linux Kernel Functional Testing <[email protected]>
> > Signed-off-by: Robert Foss <[email protected]>
>
> With this patch applied on top of linux next-20210810 instead of the
> NULL pointer dereference I get following error on DragonBoard410c while
> loading kernel modules:
>
> [ 18.480608] qcom-venus 1d00000.video-codec: Adding to iommu group 1
> [ 18.536167] qcom-camss 1b0ac00.camss: Adding to iommu group 2
> [ 18.600373] Internal error: synchronous external abort: 96000010 [#1]
> PREEMPT SMP

I'll spin a v2 asap.

2021-08-11 13:44:27

by Robert Foss

[permalink] [raw]
Subject: Re: [PATCH v1] media: camss: vfe: Don't use vfe->base before it's assigned

On Wed, 11 Aug 2021 at 11:41, Robert Foss <[email protected]> wrote:
>
> Hey Marek,
>
> Thanks for testing this.
>
> On Wed, 11 Aug 2021 at 09:48, Marek Szyprowski <[email protected]> wrote:
> >
> > On 10.08.2021 12:33, Robert Foss wrote:
> > > vfe->ops->hw_version(vfe) being called before vfe->base has been assigned
> > > is incorrect and causes crashes.
> > >
> > > Fixes: b10b5334528a9 ("media: camss: vfe: Don't read hardware version needlessly")
> > >
> > > Reported-by: Linux Kernel Functional Testing <[email protected]>
> > > Signed-off-by: Robert Foss <[email protected]>
> >
> > With this patch applied on top of linux next-20210810 instead of the
> > NULL pointer dereference I get following error on DragonBoard410c while
> > loading kernel modules:
> >
> > [ 18.480608] qcom-venus 1d00000.video-codec: Adding to iommu group 1
> > [ 18.536167] qcom-camss 1b0ac00.camss: Adding to iommu group 2
> > [ 18.600373] Internal error: synchronous external abort: 96000010 [#1]
> > PREEMPT SMP
>

After testing this patch + linux-next[1] I'm not able to replicate the
'Internal error' above on the db410c. And I don't think it is related
to this patch.

Are you seeing the same error on [1]? And are you seeing it before the
b10b5334528a9 ("media: camss: vfe: Don't read hardware version
needlessly") patch?

[1] https://git.linaro.org/people/robert.foss/linux.git/log/?h=camss_print_fix_v1


Rob

2021-08-12 07:46:11

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH v1] media: camss: vfe: Don't use vfe->base before it's assigned

Hi Robert,

On 11.08.2021 15:42, Robert Foss wrote:
> On Wed, 11 Aug 2021 at 11:41, Robert Foss <[email protected]> wrote:
>> Hey Marek,
>>
>> Thanks for testing this.
>>
>> On Wed, 11 Aug 2021 at 09:48, Marek Szyprowski <[email protected]> wrote:
>>> On 10.08.2021 12:33, Robert Foss wrote:
>>>> vfe->ops->hw_version(vfe) being called before vfe->base has been assigned
>>>> is incorrect and causes crashes.
>>>>
>>>> Fixes: b10b5334528a9 ("media: camss: vfe: Don't read hardware version needlessly")
>>>>
>>>> Reported-by: Linux Kernel Functional Testing <[email protected]>
>>>> Signed-off-by: Robert Foss <[email protected]>
>>> With this patch applied on top of linux next-20210810 instead of the
>>> NULL pointer dereference I get following error on DragonBoard410c while
>>> loading kernel modules:
>>>
>>> [ 18.480608] qcom-venus 1d00000.video-codec: Adding to iommu group 1
>>> [ 18.536167] qcom-camss 1b0ac00.camss: Adding to iommu group 2
>>> [ 18.600373] Internal error: synchronous external abort: 96000010 [#1]
>>> PREEMPT SMP
> After testing this patch + linux-next[1] I'm not able to replicate the
> 'Internal error' above on the db410c. And I don't think it is related
> to this patch.
>
> Are you seeing the same error on [1]? And are you seeing it before the
> b10b5334528a9 ("media: camss: vfe: Don't read hardware version
> needlessly") patch?

I've checked once again on your branch. Yes, it is fully reproducible on
my DragonBoard410c. On your branch I get the above synchronous external
abort, while without your last patch I get Null ptr dereference.

Are you sure you have deployed the kernel modules for doing the test?
This issue happens when udev triggers loading kernel modules for the
detected hardware.

Here is the kernel configuration used for my tests on ARM64 based board:

# make ARCH=arm64 defconfig && ./scripts/config --set-val
CMA_SIZE_MBYTES 96 -e PROVE_LOCKING -e DEBUG_ATOMIC_SLEEP -e PM_DEBUG -e
PM_ADVANCED_DEBUG -d ARCH_SUNXI -d ARCH_ALPINE -d DRM_NOUVEAU -d
ARCH_BCM_IPROC -d ARCH_BERLIN -d ARCH_BRCMSTB -d ARCH_LAYERSCAPE -d
ARCH_LG1K -d ARCH_HISI -d ARCH_MEDIATEK -d ARCH_MVEBU -d ARCH_ROCKCHIP
-d ARCH_SEATTLE -d ARCH_SYNQUACER -d ARCH_RENESAS -d ARCH_STRATIX10 -d
ARCH_TEGRA -d ARCH_SPRD -d ARCH_THUNDER -d ARCH_THUNDER2 -d
ARCH_UNIPHIER -d ARCH_XGENE -d ARCH_ZX -d ARCH_ZYNQMP -d HIBERNATION -d
CLK_SUNXI -e BLK_DEV_RAM --set-val BLK_DEV_RAM_COUNT 4 --set-val
BLK_DEV_RAM_SIZE 65536 -d CONFIG_EFI -d CONFIG_TEE

Comparing to the arm64's defconfig, I've enabled some debugging options
and disabled some unused boards.

Best regards

--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2021-08-12 10:09:26

by Robert Foss

[permalink] [raw]
Subject: Re: [PATCH v1] media: camss: vfe: Don't use vfe->base before it's assigned

> >>>
> >>> [ 18.480608] qcom-venus 1d00000.video-codec: Adding to iommu group 1
> >>> [ 18.536167] qcom-camss 1b0ac00.camss: Adding to iommu group 2
> >>> [ 18.600373] Internal error: synchronous external abort: 96000010 [#1]
> >>> PREEMPT SMP
> > After testing this patch + linux-next[1] I'm not able to replicate the
> > 'Internal error' above on the db410c. And I don't think it is related
> > to this patch.
> >
> > Are you seeing the same error on [1]? And are you seeing it before the
> > b10b5334528a9 ("media: camss: vfe: Don't read hardware version
> > needlessly") patch?
>
> I've checked once again on your branch. Yes, it is fully reproducible on
> my DragonBoard410c. On your branch I get the above synchronous external
> abort, while without your last patch I get Null ptr dereference.
>
> Are you sure you have deployed the kernel modules for doing the test?
> This issue happens when udev triggers loading kernel modules for the
> detected hardware.
>
> Here is the kernel configuration used for my tests on ARM64 based board:
>
> # make ARCH=arm64 defconfig && ./scripts/config --set-val
> CMA_SIZE_MBYTES 96 -e PROVE_LOCKING -e DEBUG_ATOMIC_SLEEP -e PM_DEBUG -e
> PM_ADVANCED_DEBUG -d ARCH_SUNXI -d ARCH_ALPINE -d DRM_NOUVEAU -d
> ARCH_BCM_IPROC -d ARCH_BERLIN -d ARCH_BRCMSTB -d ARCH_LAYERSCAPE -d
> ARCH_LG1K -d ARCH_HISI -d ARCH_MEDIATEK -d ARCH_MVEBU -d ARCH_ROCKCHIP
> -d ARCH_SEATTLE -d ARCH_SYNQUACER -d ARCH_RENESAS -d ARCH_STRATIX10 -d
> ARCH_TEGRA -d ARCH_SPRD -d ARCH_THUNDER -d ARCH_THUNDER2 -d
> ARCH_UNIPHIER -d ARCH_XGENE -d ARCH_ZX -d ARCH_ZYNQMP -d HIBERNATION -d
> CLK_SUNXI -e BLK_DEV_RAM --set-val BLK_DEV_RAM_COUNT 4 --set-val
> BLK_DEV_RAM_SIZE 65536 -d CONFIG_EFI -d CONFIG_TEE
>
> Comparing to the arm64's defconfig, I've enabled some debugging options
> and disabled some unused boards.
>

I made a mistake when testing on the db410c earlier, but with it fixed
I can replicate the issue. It seems to stem from the hardware not
actually being powered up when the hw_version() call is being made.

Unfortunately there is no simple way to achieve the original intention
of the series that reduced the frequency of the hw_version() being
called, while avoiding the above issue. So let's revert to the
previous behaviour for the time being.

Thank you for your help Marek! I'll submit a v2 shortly.


Rob