2021-03-26 23:14:28

by Eric Anholt

Subject: [PATCH 1/2] iommu/arm-smmu-qcom: Skip the TTBR1 quirk for db820c.

db820c wants to use the qcom smmu path to get HUPCF set (which keeps
the GPU from wedging and then sometimes wedging the kernel after a
page fault), but it doesn't have separate pagetables support yet in
drm/msm so we can't go all the way to the TTBR1 path.

Signed-off-by: Eric Anholt <[email protected]>
---

We've been seeing a flaky test per day or so in Mesa CI where the
kernel gets wedged after an iommu fault turns into CP errors. With
this patch, the CI isn't throwing the string of CP errors on the
faults in any of the ~10 jobs I've run so far.

drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index bcda17012aee..51f22193e456 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -130,6 +130,16 @@ static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_doma
 	return __arm_smmu_alloc_bitmap(smmu->context_map, start, count);
 }

+static bool qcom_adreno_can_do_ttbr1(struct arm_smmu_device *smmu)
+{
+	const struct device_node *np = smmu->dev->of_node;
+
+	if (of_device_is_compatible(np, "qcom,msm8996-smmu-v2"))
+		return false;
+
+	return true;
+}
+
 static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 		struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
 {
@@ -144,7 +154,8 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 	 * be AARCH64 stage 1 but double check because the arm-smmu code assumes
 	 * that is the case when the TTBR1 quirk is enabled
 	 */
-	if ((smmu_domain->stage == ARM_SMMU_DOMAIN_S1) &&
+	if (qcom_adreno_can_do_ttbr1(smmu_domain->smmu) &&
+	    (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) &&
 	    (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
 		pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;

--
2.31.0
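
The condition changed by this patch can be modelled outside the kernel. The following is a minimal sketch, not the driver code: the `sketch_` enums, the `SKETCH_QUIRK_TTBR1` bit value and the `sketch_pick_quirks()` helper are all hypothetical stand-ins, and only the shape of the three-way check is taken from the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values for this sketch only; NOT the kernel's definitions. */
enum sketch_stage { SKETCH_STAGE_S1, SKETCH_STAGE_S2 };
enum sketch_fmt { SKETCH_FMT_AARCH32, SKETCH_FMT_AARCH64 };
#define SKETCH_QUIRK_TTBR1 (1u << 0)

/*
 * Mirrors the patched condition: the TTBR1 quirk is requested only when
 * the SMMU is allowed to use it AND the domain is stage 1 AND the
 * context format is AArch64. Any other combination leaves quirks empty.
 */
static uint32_t sketch_pick_quirks(bool can_do_ttbr1,
				   enum sketch_stage stage,
				   enum sketch_fmt fmt)
{
	uint32_t quirks = 0;

	if (can_do_ttbr1 &&
	    stage == SKETCH_STAGE_S1 &&
	    fmt == SKETCH_FMT_AARCH64)
		quirks |= SKETCH_QUIRK_TTBR1;

	return quirks;
}
```

With `can_do_ttbr1` false, which is what the patch arranges for the msm8996 SMMU, the quirk bit stays clear even for an AArch64 stage 1 domain.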


2021-03-26 23:15:02

by Eric Anholt

Subject: [PATCH 2/2] arm64: dts: msm8996: Mark the GPU's SMMU as an adreno one.

This enables the adreno-specific SMMU path that sets HUPCF so
(user-managed) page faults don't wedge the GPU.

Signed-off-by: Eric Anholt <[email protected]>
---

We've been seeing a flaky test per day or so in Mesa CI where the
kernel gets wedged after an iommu fault turns into CP errors. With
this patch, the CI isn't throwing the string of CP errors on the
faults in any of the ~10 jobs I've run so far.

arch/arm64/boot/dts/qcom/msm8996.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi
index 6de136e3add9..432b87ec9c5e 100644
--- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
@@ -1127,7 +1127,7 @@ cci_i2c1: [email protected] {
 		};

 		adreno_smmu: [email protected] {
-			compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2";
+			compatible = "qcom,msm8996-smmu-v2", "qcom,adreno-smmu", "qcom,smmu-v2";
 			reg = <0x00b40000 0x10000>;

 			#global-interrupts = <1>;
--
2.31.0
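
Why the two patches need to land together can be seen from how a compatible list is matched: a node matches if any entry in its list equals the queried string. The sketch below is a simplified model under that assumption; `sketch_is_compatible()` and `sketch_adreno_smmu_compat` are hypothetical names, and the real kernel helper does more than this exact-string loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `needle` appears anywhere in the node's compatible list. */
static bool sketch_is_compatible(const char *const *compat, size_t n,
				 const char *needle)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(compat[i], needle) == 0)
			return true;
	return false;
}

/* The GPU SMMU node's compatible list as it reads after the DT patch. */
static const char *const sketch_adreno_smmu_compat[] = {
	"qcom,msm8996-smmu-v2",
	"qcom,adreno-smmu",
	"qcom,smmu-v2",
};

#define SKETCH_ADRENO_SMMU_NCOMPAT \
	(sizeof(sketch_adreno_smmu_compat) / sizeof(sketch_adreno_smmu_compat[0]))
```

With this list, a query for "qcom,adreno-smmu" succeeds (selecting the adreno path that sets HUPCF), and so does a query for "qcom,msm8996-smmu-v2", which is the string the driver patch uses to skip the TTBR1 quirk.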

2021-03-29 14:49:12

by Will Deacon

Subject: Re: [PATCH 1/2] iommu/arm-smmu-qcom: Skip the TTBR1 quirk for db820c.

On Fri, Mar 26, 2021 at 04:13:02PM -0700, Eric Anholt wrote:
> db820c wants to use the qcom smmu path to get HUPCF set (which keeps
> the GPU from wedging and then sometimes wedging the kernel after a
> page fault), but it doesn't have separate pagetables support yet in
> drm/msm so we can't go all the way to the TTBR1 path.

What do you mean by "doesn't have separate pagetables support yet"? The
compatible string doesn't feel like the right way to determine this.

Will

2021-03-29 17:58:15

by Eric Anholt

Subject: Re: [PATCH 1/2] iommu/arm-smmu-qcom: Skip the TTBR1 quirk for db820c.

On Mon, Mar 29, 2021 at 7:47 AM Will Deacon <[email protected]> wrote:
>
> On Fri, Mar 26, 2021 at 04:13:02PM -0700, Eric Anholt wrote:
> > db820c wants to use the qcom smmu path to get HUPCF set (which keeps
> > the GPU from wedging and then sometimes wedging the kernel after a
> > page fault), but it doesn't have separate pagetables support yet in
> > drm/msm so we can't go all the way to the TTBR1 path.
>
> What do you mean by "doesn't have separate pagetables support yet"? The
> compatible string doesn't feel like the right way to determine this.

In my past experience with DT, having software look at the (existing)
board-specific compatibles has been the typical way to resolve
something like this: "ok, but you need to actually get down to what
board is involved here to figure out how to play along with the rest
of Linux that later attaches to other DT nodes". Do you have a
preferred mechanism here?

2021-03-30 03:26:01

by Bjorn Andersson

Subject: Re: [PATCH 2/2] arm64: dts: msm8996: Mark the GPU's SMMU as an adreno one.

On Fri 26 Mar 18:13 CDT 2021, Eric Anholt wrote:

> This enables the adreno-specific SMMU path that sets HUPCF so
> (user-managed) page faults don't wedge the GPU.
>
> Signed-off-by: Eric Anholt <[email protected]>

Acked-by: Bjorn Andersson <[email protected]>

@Will, can you pick this together with the driver patch? (So that they
land in order)

Regards,
Bjorn

> ---
>
> We've been seeing a flaky test per day or so in Mesa CI where the
> kernel gets wedged after an iommu fault turns into CP errors. With
> this patch, the CI isn't throwing the string of CP errors on the
> faults in any of the ~10 jobs I've run so far.
>
> arch/arm64/boot/dts/qcom/msm8996.dtsi | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi
> index 6de136e3add9..432b87ec9c5e 100644
> --- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
> +++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
> @@ -1127,7 +1127,7 @@ cci_i2c1: [email protected] {
> };
>
> adreno_smmu: [email protected] {
> - compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2";
> + compatible = "qcom,msm8996-smmu-v2", "qcom,adreno-smmu", "qcom,smmu-v2";
> reg = <0x00b40000 0x10000>;
>
> #global-interrupts = <1>;
> --
> 2.31.0
>

2021-03-30 03:26:53

by Bjorn Andersson

Subject: Re: [PATCH 1/2] iommu/arm-smmu-qcom: Skip the TTBR1 quirk for db820c.

On Fri 26 Mar 18:13 CDT 2021, Eric Anholt wrote:

> db820c wants to use the qcom smmu path to get HUPCF set (which keeps
> the GPU from wedging and then sometimes wedging the kernel after a
> page fault), but it doesn't have separate pagetables support yet in
> drm/msm so we can't go all the way to the TTBR1 path.
>
> Signed-off-by: Eric Anholt <[email protected]>

Reviewed-by: Bjorn Andersson <[email protected]>

Regards,
Bjorn

> ---
>
> We've been seeing a flaky test per day or so in Mesa CI where the
> kernel gets wedged after an iommu fault turns into CP errors. With
> this patch, the CI isn't throwing the string of CP errors on the
> faults in any of the ~10 jobs I've run so far.
>
> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index bcda17012aee..51f22193e456 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -130,6 +130,16 @@ static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_doma
>  	return __arm_smmu_alloc_bitmap(smmu->context_map, start, count);
>  }
>
> +static bool qcom_adreno_can_do_ttbr1(struct arm_smmu_device *smmu)
> +{
> +	const struct device_node *np = smmu->dev->of_node;
> +
> +	if (of_device_is_compatible(np, "qcom,msm8996-smmu-v2"))
> +		return false;
> +
> +	return true;
> +}
> +
>  static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>  		struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
>  {
> @@ -144,7 +154,8 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>  	 * be AARCH64 stage 1 but double check because the arm-smmu code assumes
>  	 * that is the case when the TTBR1 quirk is enabled
>  	 */
> -	if ((smmu_domain->stage == ARM_SMMU_DOMAIN_S1) &&
> +	if (qcom_adreno_can_do_ttbr1(smmu_domain->smmu) &&
> +	    (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) &&
>  	    (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
>  		pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
>
> --
> 2.31.0
>

2021-03-30 04:01:52

by Rob Clark

Subject: Re: [PATCH 1/2] iommu/arm-smmu-qcom: Skip the TTBR1 quirk for db820c.

On Mon, Mar 29, 2021 at 7:47 AM Will Deacon <[email protected]> wrote:
>
> On Fri, Mar 26, 2021 at 04:13:02PM -0700, Eric Anholt wrote:
> > db820c wants to use the qcom smmu path to get HUPCF set (which keeps
> > the GPU from wedging and then sometimes wedging the kernel after a
> > page fault), but it doesn't have separate pagetables support yet in
> > drm/msm so we can't go all the way to the TTBR1 path.
>
> What do you mean by "doesn't have separate pagetables support yet"? The
> compatible string doesn't feel like the right way to determine this.

the compatible string identifies what it is, not what the sw
limitations are, so in that regard it seems right to me..

BR,
-R

2021-03-30 09:36:33

by Will Deacon

Subject: Re: [PATCH 1/2] iommu/arm-smmu-qcom: Skip the TTBR1 quirk for db820c.

On Mon, Mar 29, 2021 at 09:02:50PM -0700, Rob Clark wrote:
> On Mon, Mar 29, 2021 at 7:47 AM Will Deacon <[email protected]> wrote:
> >
> > On Fri, Mar 26, 2021 at 04:13:02PM -0700, Eric Anholt wrote:
> > > db820c wants to use the qcom smmu path to get HUPCF set (which keeps
> > > the GPU from wedging and then sometimes wedging the kernel after a
> > > page fault), but it doesn't have separate pagetables support yet in
> > > drm/msm so we can't go all the way to the TTBR1 path.
> >
> > What do you mean by "doesn't have separate pagetables support yet"? The
> > compatible string doesn't feel like the right way to determine this.
>
> the compatible string identifies what it is, not what the sw
> limitations are, so in that regard it seems right to me..

Well it depends on what "doesn't have separate pagetables support yet"
means. I can't tell if it's a hardware issue, a firmware issue or a driver
issue.

Will

2021-03-30 15:02:01

by Rob Clark

Subject: Re: [PATCH 1/2] iommu/arm-smmu-qcom: Skip the TTBR1 quirk for db820c.

On Tue, Mar 30, 2021 at 2:34 AM Will Deacon <[email protected]> wrote:
>
> On Mon, Mar 29, 2021 at 09:02:50PM -0700, Rob Clark wrote:
> > On Mon, Mar 29, 2021 at 7:47 AM Will Deacon <[email protected]> wrote:
> > >
> > > On Fri, Mar 26, 2021 at 04:13:02PM -0700, Eric Anholt wrote:
> > > > db820c wants to use the qcom smmu path to get HUPCF set (which keeps
> > > > the GPU from wedging and then sometimes wedging the kernel after a
> > > > page fault), but it doesn't have separate pagetables support yet in
> > > > drm/msm so we can't go all the way to the TTBR1 path.
> > >
> > > What do you mean by "doesn't have separate pagetables support yet"? The
> > > compatible string doesn't feel like the right way to determine this.
> >
> > the compatible string identifies what it is, not what the sw
> > limitations are, so in that regard it seems right to me..
>
> Well it depends on what "doesn't have separate pagetables support yet"
> means. I can't tell if it's a hardware issue, a firmware issue or a driver
> issue.

Just a driver issue (and the fact that currently we don't have
physical access to a device... debugging a5xx per-process-pgtables by
pushing untested things to the CI farm is kind of a difficult way to
work)

BR,
-R

2021-03-30 15:33:09

by Will Deacon

Subject: Re: [PATCH 1/2] iommu/arm-smmu-qcom: Skip the TTBR1 quirk for db820c.

On Tue, Mar 30, 2021 at 08:03:36AM -0700, Rob Clark wrote:
> On Tue, Mar 30, 2021 at 2:34 AM Will Deacon <[email protected]> wrote:
> >
> > On Mon, Mar 29, 2021 at 09:02:50PM -0700, Rob Clark wrote:
> > > On Mon, Mar 29, 2021 at 7:47 AM Will Deacon <[email protected]> wrote:
> > > >
> > > > On Fri, Mar 26, 2021 at 04:13:02PM -0700, Eric Anholt wrote:
> > > > > db820c wants to use the qcom smmu path to get HUPCF set (which keeps
> > > > > the GPU from wedging and then sometimes wedging the kernel after a
> > > > > page fault), but it doesn't have separate pagetables support yet in
> > > > > drm/msm so we can't go all the way to the TTBR1 path.
> > > >
> > > > What do you mean by "doesn't have separate pagetables support yet"? The
> > > > compatible string doesn't feel like the right way to determine this.
> > >
> > > the compatible string identifies what it is, not what the sw
> > > limitations are, so in that regard it seems right to me..
> >
> > Well it depends on what "doesn't have separate pagetables support yet"
> > means. I can't tell if it's a hardware issue, a firmware issue or a driver
> > issue.
>
> Just a driver issue (and the fact that currently we don't have
> physical access to a device... debugging a5xx per-process-pgtables by
> pushing untested things to the CI farm is kind of a difficult way to
> work)

But then in that case, this is using the compatible string to identify a
driver issue, no?

Will

2021-03-30 16:08:05

by Rob Clark

Subject: Re: [PATCH 1/2] iommu/arm-smmu-qcom: Skip the TTBR1 quirk for db820c.

On Tue, Mar 30, 2021 at 8:31 AM Will Deacon <[email protected]> wrote:
>
> On Tue, Mar 30, 2021 at 08:03:36AM -0700, Rob Clark wrote:
> > On Tue, Mar 30, 2021 at 2:34 AM Will Deacon <[email protected]> wrote:
> > >
> > > On Mon, Mar 29, 2021 at 09:02:50PM -0700, Rob Clark wrote:
> > > > On Mon, Mar 29, 2021 at 7:47 AM Will Deacon <[email protected]> wrote:
> > > > >
> > > > > On Fri, Mar 26, 2021 at 04:13:02PM -0700, Eric Anholt wrote:
> > > > > > db820c wants to use the qcom smmu path to get HUPCF set (which keeps
> > > > > > the GPU from wedging and then sometimes wedging the kernel after a
> > > > > > page fault), but it doesn't have separate pagetables support yet in
> > > > > > drm/msm so we can't go all the way to the TTBR1 path.
> > > > >
> > > > > What do you mean by "doesn't have separate pagetables support yet"? The
> > > > > compatible string doesn't feel like the right way to determine this.
> > > >
> > > > the compatible string identifies what it is, not what the sw
> > > > limitations are, so in that regard it seems right to me..
> > >
> > > Well it depends on what "doesn't have separate pagetables support yet"
> > > means. I can't tell if it's a hardware issue, a firmware issue or a driver
> > > issue.
> >
> > Just a driver issue (and the fact that currently we don't have
> > physical access to a device... debugging a5xx per-process-pgtables by
> > pushing untested things to the CI farm is kind of a difficult way to
> > work)
>
> But then in that case, this is using the compatible string to identify a
> driver issue, no?
>

Well, I suppose yes.. but OTOH it is keeping the problem out of the
dtb. Once per-process pgtables works for a5xx, there would be no dtb
change, just a change to the quirk behavior in arm-smmu-qcom.

BR,
-R
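
The "no dtb change" argument can be illustrated with a driver-side denylist. This is a sketch under stated assumptions rather than anything proposed in the thread: `sketch_can_do_ttbr1()`, `sketch_node_compat` and `sketch_denylist_today` are hypothetical names, and the idea of keeping the check as a table is an illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns false if any of the node's compatible strings is on the
 * driver's denylist of SMMUs that cannot use TTBR1 yet.
 */
static bool sketch_can_do_ttbr1(const char *const *node_compat, size_t n_node,
				const char *const *denylist, size_t n_deny)
{
	for (size_t i = 0; i < n_node; i++)
		for (size_t j = 0; j < n_deny; j++)
			if (strcmp(node_compat[i], denylist[j]) == 0)
				return false;
	return true;
}

/* What the DTB describes; this stays the same in both cases below. */
static const char *const sketch_node_compat[] = {
	"qcom,msm8996-smmu-v2", "qcom,adreno-smmu", "qcom,smmu-v2",
};

/* Driver-side knowledge today: msm8996 is excluded from TTBR1. */
static const char *const sketch_denylist_today[] = {
	"qcom,msm8996-smmu-v2",
};
```

With `sketch_denylist_today` the check returns false for the node; with an empty denylist (the state after a5xx per-process pagetables work) the same `sketch_node_compat` yields true, so only the driver changes.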