(resend with expanded CC list so everybody can see all the patches)
This short series adds support for the adreno-smmu implementation of the
arm-smmu driver and the device-tree bindings to turn on the implementation
for the sm845 and sc7180 GPUs. These changes are the last ones needed to enable
per-instance pagetables in the drm/msm driver.
No deltas in this patchset since the last go-around for 5.10 [1].
[1] https://patchwork.freedesktop.org/series/81393/
Jordan Crouse (3):
iommu/arm-smmu-qcom: Add implementation for the adreno GPU SMMU
dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU
arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU
Rob Clark (1):
iommu/arm-smmu: Add a way for implementations to influence SCTLR
.../devicetree/bindings/iommu/arm,smmu.yaml | 9 +-
arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi | 9 +
arch/arm64/boot/dts/qcom/sdm845.dtsi | 2 +-
drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 3 +
drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 157 +++++++++++++++++-
drivers/iommu/arm/arm-smmu/arm-smmu.c | 3 +
drivers/iommu/arm/arm-smmu/arm-smmu.h | 4 +
7 files changed, 182 insertions(+), 5 deletions(-)
--
2.25.1
Set the qcom,adreno-smmu compatible string for the GPU SMMU to enable
split pagetables and per-instance pagetables for drm/msm.
Signed-off-by: Jordan Crouse <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Bjorn Andersson <[email protected]>
---
arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi | 9 +++++++++
arch/arm64/boot/dts/qcom/sdm845.dtsi | 2 +-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi b/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
index 64fc1bfd66fa..39f23cdcbd02 100644
--- a/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
@@ -633,6 +633,15 @@ &mdss_mdp {
status = "okay";
};
+/*
+ * Cheza fw does not properly program the GPU aperture to allow the
+ * GPU to update the SMMU pagetables for context switches. Work
+ * around this by dropping the "qcom,adreno-smmu" compat string.
+ */
+&adreno_smmu {
+ compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2";
+};
+
&mss_pil {
iommus = <&apps_smmu 0x781 0x0>,
<&apps_smmu 0x724 0x3>;
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 40e8c11f23ab..0508e86140bd 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -4103,7 +4103,7 @@ opp-257000000 {
};
adreno_smmu: iommu@5040000 {
- compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2";
+ compatible = "qcom,sdm845-smmu-v2", "qcom,adreno-smmu", "qcom,smmu-v2";
reg = <0 0x5040000 0 0x10000>;
#iommu-cells = <1>;
#global-interrupts = <2>;
--
2.25.1
From: Rob Clark <[email protected]>
For the Adreno GPU's SMMU, we want SCTLR.HUPCF set to ensure that
pending translations are not terminated on iova fault. Otherwise
a terminated CP read could hang the GPU by returning invalid
command-stream data.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Bjorn Andersson <[email protected]>
Signed-off-by: Jordan Crouse <[email protected]>
---
drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 6 ++++++
drivers/iommu/arm/arm-smmu/arm-smmu.c | 3 +++
drivers/iommu/arm/arm-smmu/arm-smmu.h | 3 +++
3 files changed, 12 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 1e942eed2dfc..0663d7d26908 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -129,6 +129,12 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
(smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
+ /*
+ * On the GPU device we want to process subsequent transactions after a
+ * fault to keep the GPU from hanging
+ */
+ smmu_domain->cfg.sctlr_set |= ARM_SMMU_SCTLR_HUPCF;
+
/*
* Initialize private interface with GPU:
*/
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index dad7fa86fbd4..1f06ab219819 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -617,6 +617,9 @@ void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
reg |= ARM_SMMU_SCTLR_E;
+ reg |= cfg->sctlr_set;
+ reg &= ~cfg->sctlr_clr;
+
arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
}
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 6c5ff9999eae..ddf2ca4c923d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -144,6 +144,7 @@ enum arm_smmu_cbar_type {
#define ARM_SMMU_CB_SCTLR 0x0
#define ARM_SMMU_SCTLR_S1_ASIDPNE BIT(12)
#define ARM_SMMU_SCTLR_CFCFG BIT(7)
+#define ARM_SMMU_SCTLR_HUPCF BIT(8)
#define ARM_SMMU_SCTLR_CFIE BIT(6)
#define ARM_SMMU_SCTLR_CFRE BIT(5)
#define ARM_SMMU_SCTLR_E BIT(4)
@@ -341,6 +342,8 @@ struct arm_smmu_cfg {
u16 asid;
u16 vmid;
};
+ u32 sctlr_set; /* extra bits to set in SCTLR */
+ u32 sctlr_clr; /* bits to mask in SCTLR */
enum arm_smmu_cbar_type cbar;
enum arm_smmu_context_fmt fmt;
};
--
2.25.1
Add a special implementation for the SMMU attached to most Adreno GPU
target triggered from the qcom,adreno-smmu compatible string.
The new Adreno SMMU implementation will enable split pagetables
(TTBR1) for the domain attached to the GPU device (SID 0) and
hard code it context bank 0 so the GPU hardware can implement
per-instance pagetables.
Co-developed-by: Rob Clark <[email protected]>
Signed-off-by: Jordan Crouse <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Bjorn Andersson <[email protected]>
---
drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 3 +
drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 151 ++++++++++++++++++++-
drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 +
3 files changed, 153 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
index 88f17cc33023..d199b4bff15d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -223,6 +223,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
of_device_is_compatible(np, "qcom,sm8250-smmu-500"))
return qcom_smmu_impl_init(smmu);
+ if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu"))
+ return qcom_adreno_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "marvell,ap806-smmu-500"))
smmu->impl = &mrvl_mmu500_impl;
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index be4318044f96..1e942eed2dfc 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -3,6 +3,7 @@
* Copyright (c) 2019, The Linux Foundation. All rights reserved.
*/
+#include <linux/adreno-smmu-priv.h>
#include <linux/of_device.h>
#include <linux/qcom_scm.h>
@@ -12,6 +13,134 @@ struct qcom_smmu {
struct arm_smmu_device smmu;
};
+#define QCOM_ADRENO_SMMU_GPU_SID 0
+
+static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
+{
+ struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+ int i;
+
+ /*
+ * The GPU will always use SID 0 so that is a handy way to uniquely
+ * identify it and configure it for per-instance pagetables
+ */
+ for (i = 0; i < fwspec->num_ids; i++) {
+ u16 sid = FIELD_GET(ARM_SMMU_SMR_ID, fwspec->ids[i]);
+
+ if (sid == QCOM_ADRENO_SMMU_GPU_SID)
+ return true;
+ }
+
+ return false;
+}
+
+static const struct io_pgtable_cfg *qcom_adreno_smmu_get_ttbr1_cfg(
+ const void *cookie)
+{
+ struct arm_smmu_domain *smmu_domain = (void *)cookie;
+ struct io_pgtable *pgtable =
+ io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+ return &pgtable->cfg;
+}
+
+/*
+ * Local implementation to configure TTBR0 with the specified pagetable config.
+ * The GPU driver will call this to enable TTBR0 when per-instance pagetables
+ * are active
+ */
+
+static int qcom_adreno_smmu_set_ttbr0_cfg(const void *cookie,
+ const struct io_pgtable_cfg *pgtbl_cfg)
+{
+ struct arm_smmu_domain *smmu_domain = (void *)cookie;
+ struct io_pgtable *pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+ struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+ struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
+
+ /* The domain must have split pagetables already enabled */
+ if (cb->tcr[0] & ARM_SMMU_TCR_EPD1)
+ return -EINVAL;
+
+ /* If the pagetable config is NULL, disable TTBR0 */
+ if (!pgtbl_cfg) {
+ /* Do nothing if it is already disabled */
+ if ((cb->tcr[0] & ARM_SMMU_TCR_EPD0))
+ return -EINVAL;
+
+ /* Set TCR to the original configuration */
+ cb->tcr[0] = arm_smmu_lpae_tcr(&pgtable->cfg);
+ cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
+ } else {
+ u32 tcr = cb->tcr[0];
+
+ /* Don't call this again if TTBR0 is already enabled */
+ if (!(cb->tcr[0] & ARM_SMMU_TCR_EPD0))
+ return -EINVAL;
+
+ tcr |= arm_smmu_lpae_tcr(pgtbl_cfg);
+ tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1);
+
+ cb->tcr[0] = tcr;
+ cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+ cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
+ }
+
+ arm_smmu_write_context_bank(smmu_domain->smmu, cb->cfg->cbndx);
+
+ return 0;
+}
+
+static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_domain,
+ struct arm_smmu_device *smmu,
+ struct device *dev, int start)
+{
+ int count;
+
+ /*
+ * Assign context bank 0 to the GPU device so the GPU hardware can
+ * switch pagetables
+ */
+ if (qcom_adreno_smmu_is_gpu_device(dev)) {
+ start = 0;
+ count = 1;
+ } else {
+ start = 1;
+ count = smmu->num_context_banks;
+ }
+
+ return __arm_smmu_alloc_bitmap(smmu->context_map, start, count);
+}
+
+static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+ struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
+{
+ struct adreno_smmu_priv *priv;
+
+ /* Only enable split pagetables for the GPU device (SID 0) */
+ if (!qcom_adreno_smmu_is_gpu_device(dev))
+ return 0;
+
+ /*
+ * All targets that use the qcom,adreno-smmu compatible string *should*
+ * be AARCH64 stage 1 but double check because the arm-smmu code assumes
+ * that is the case when the TTBR1 quirk is enabled
+ */
+ if ((smmu_domain->stage == ARM_SMMU_DOMAIN_S1) &&
+ (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
+ pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
+
+ /*
+ * Initialize private interface with GPU:
+ */
+
+ priv = dev_get_drvdata(dev);
+ priv->cookie = smmu_domain;
+ priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
+ priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
+
+ return 0;
+}
+
static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
{ .compatible = "qcom,adreno" },
{ .compatible = "qcom,mdp4" },
@@ -65,7 +194,15 @@ static const struct arm_smmu_impl qcom_smmu_impl = {
.reset = qcom_smmu500_reset,
};
-struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
+static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
+ .init_context = qcom_adreno_smmu_init_context,
+ .def_domain_type = qcom_smmu_def_domain_type,
+ .reset = qcom_smmu500_reset,
+ .alloc_context_bank = qcom_adreno_smmu_alloc_context_bank,
+};
+
+static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
+ const struct arm_smmu_impl *impl)
{
struct qcom_smmu *qsmmu;
@@ -75,8 +212,18 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
qsmmu->smmu = *smmu;
- qsmmu->smmu.impl = &qcom_smmu_impl;
+ qsmmu->smmu.impl = impl;
devm_kfree(smmu->dev, smmu);
return &qsmmu->smmu;
}
+
+struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+ return qcom_smmu_create(smmu, &qcom_smmu_impl);
+}
+
+struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+ return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl);
+}
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 1a746476927c..6c5ff9999eae 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -520,6 +520,7 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page,
struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu);
struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
+struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu);
void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx);
int arm_mmu500_reset(struct arm_smmu_device *smmu);
--
2.25.1
Every Qcom Adreno GPU has an embedded SMMU for its own use. These
devices depend on unique features such as split pagetables,
different stall/halt requirements and other settings. Identify them
with a compatible string so that they can be identified in the
arm-smmu implementation specific code.
Signed-off-by: Jordan Crouse <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Bjorn Andersson <[email protected]>
---
Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 503160a7b9a0..3b63f2ae24db 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -28,8 +28,6 @@ properties:
- enum:
- qcom,msm8996-smmu-v2
- qcom,msm8998-smmu-v2
- - qcom,sc7180-smmu-v2
- - qcom,sdm845-smmu-v2
- const: qcom,smmu-v2
- description: Qcom SoCs implementing "arm,mmu-500"
@@ -40,6 +38,13 @@ properties:
- qcom,sm8150-smmu-500
- qcom,sm8250-smmu-500
- const: arm,mmu-500
+ - description: Qcom Adreno GPUs implementing "arm,smmu-v2"
+ items:
+ - enum:
+ - qcom,sc7180-smmu-v2
+ - qcom,sdm845-smmu-v2
+ - const: qcom,adreno-smmu
+ - const: qcom,smmu-v2
- description: Marvell SoCs implementing "arm,mmu-500"
items:
- const: marvell,ap806-smmu-500
--
2.25.1
On 2020-11-02 17:14, Jordan Crouse wrote:
> Add a special implementation for the SMMU attached to most Adreno GPU
> target triggered from the qcom,adreno-smmu compatible string.
>
> The new Adreno SMMU implementation will enable split pagetables
> (TTBR1) for the domain attached to the GPU device (SID 0) and
> hard code it context bank 0 so the GPU hardware can implement
> per-instance pagetables.
>
> Co-developed-by: Rob Clark <[email protected]>
> Signed-off-by: Jordan Crouse <[email protected]>
> Signed-off-by: Rob Clark <[email protected]>
> Reviewed-by: Bjorn Andersson <[email protected]>
> ---
>
> drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 3 +
> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 151 ++++++++++++++++++++-
> drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 +
> 3 files changed, 153 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
> index 88f17cc33023..d199b4bff15d 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
> @@ -223,6 +223,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
> of_device_is_compatible(np, "qcom,sm8250-smmu-500"))
> return qcom_smmu_impl_init(smmu);
>
> + if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu"))
> + return qcom_adreno_smmu_impl_init(smmu);
> +
> if (of_device_is_compatible(np, "marvell,ap806-smmu-500"))
> smmu->impl = &mrvl_mmu500_impl;
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index be4318044f96..1e942eed2dfc 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -3,6 +3,7 @@
> * Copyright (c) 2019, The Linux Foundation. All rights reserved.
> */
>
> +#include <linux/adreno-smmu-priv.h>
> #include <linux/of_device.h>
> #include <linux/qcom_scm.h>
>
> @@ -12,6 +13,134 @@ struct qcom_smmu {
> struct arm_smmu_device smmu;
> };
>
> +#define QCOM_ADRENO_SMMU_GPU_SID 0
> +
> +static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
> +{
> + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> + int i;
> +
> + /*
> + * The GPU will always use SID 0 so that is a handy way to uniquely
> + * identify it and configure it for per-instance pagetables
> + */
> + for (i = 0; i < fwspec->num_ids; i++) {
> + u16 sid = FIELD_GET(ARM_SMMU_SMR_ID, fwspec->ids[i]);
> +
> + if (sid == QCOM_ADRENO_SMMU_GPU_SID)
> + return true;
> + }
> +
> + return false;
> +}
> +
> +static const struct io_pgtable_cfg *qcom_adreno_smmu_get_ttbr1_cfg(
> + const void *cookie)
> +{
> + struct arm_smmu_domain *smmu_domain = (void *)cookie;
> + struct io_pgtable *pgtable =
> + io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
> + return &pgtable->cfg;
> +}
> +
> +/*
> + * Local implementation to configure TTBR0 with the specified pagetable config.
> + * The GPU driver will call this to enable TTBR0 when per-instance pagetables
> + * are active
> + */
> +
> +static int qcom_adreno_smmu_set_ttbr0_cfg(const void *cookie,
> + const struct io_pgtable_cfg *pgtbl_cfg)
> +{
> + struct arm_smmu_domain *smmu_domain = (void *)cookie;
> + struct io_pgtable *pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
> + struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> + struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
> +
> + /* The domain must have split pagetables already enabled */
> + if (cb->tcr[0] & ARM_SMMU_TCR_EPD1)
> + return -EINVAL;
> +
> + /* If the pagetable config is NULL, disable TTBR0 */
> + if (!pgtbl_cfg) {
> + /* Do nothing if it is already disabled */
> + if ((cb->tcr[0] & ARM_SMMU_TCR_EPD0))
> + return -EINVAL;
> +
> + /* Set TCR to the original configuration */
> + cb->tcr[0] = arm_smmu_lpae_tcr(&pgtable->cfg);
> + cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
> + } else {
> + u32 tcr = cb->tcr[0];
> +
> + /* Don't call this again if TTBR0 is already enabled */
> + if (!(cb->tcr[0] & ARM_SMMU_TCR_EPD0))
> + return -EINVAL;
> +
> + tcr |= arm_smmu_lpae_tcr(pgtbl_cfg);
> + tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1);
> +
> + cb->tcr[0] = tcr;
> + cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> + cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
> + }
> +
> + arm_smmu_write_context_bank(smmu_domain->smmu, cb->cfg->cbndx);
> +
> + return 0;
> +}
> +
> +static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_domain,
> + struct arm_smmu_device *smmu,
> + struct device *dev, int start)
> +{
> + int count;
> +
> + /*
> + * Assign context bank 0 to the GPU device so the GPU hardware can
> + * switch pagetables
> + */
> + if (qcom_adreno_smmu_is_gpu_device(dev)) {
> + start = 0;
> + count = 1;
> + } else {
> + start = 1;
> + count = smmu->num_context_banks;
> + }
> +
> + return __arm_smmu_alloc_bitmap(smmu->context_map, start, count);
> +}
> +static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> + struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
> +{
> + struct adreno_smmu_priv *priv;
> +
> + /* Only enable split pagetables for the GPU device (SID 0) */
> + if (!qcom_adreno_smmu_is_gpu_device(dev))
> + return 0;
> +
> + /*
> + * All targets that use the qcom,adreno-smmu compatible string *should*
> + * be AARCH64 stage 1 but double check because the arm-smmu code assumes
> + * that is the case when the TTBR1 quirk is enabled
> + */
> + if ((smmu_domain->stage == ARM_SMMU_DOMAIN_S1) &&
> + (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
> + pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
> +
> + /*
> + * Initialize private interface with GPU:
> + */
> +
> + priv = dev_get_drvdata(dev);
> + priv->cookie = smmu_domain;
> + priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
> + priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
I still think it would have been logical to reserve context bank 0
outright in cfg_probe, then just swizzle cbndx/irptndx at this point
once everything else has proven that this is to be the One Special
Domain. I guess this way at least you don't have to intervene in
domain_free, but by the same token that means you never get to clean up
the dangling pointer in priv->cookie, which is a little bit yuck. Oh well...
Acked-by: Robin Murphy <[email protected]>
Thanks,
Robin.
> +
> + return 0;
> +}
> +
> static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
> { .compatible = "qcom,adreno" },
> { .compatible = "qcom,mdp4" },
> @@ -65,7 +194,15 @@ static const struct arm_smmu_impl qcom_smmu_impl = {
> .reset = qcom_smmu500_reset,
> };
>
> -struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
> +static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
> + .init_context = qcom_adreno_smmu_init_context,
> + .def_domain_type = qcom_smmu_def_domain_type,
> + .reset = qcom_smmu500_reset,
> + .alloc_context_bank = qcom_adreno_smmu_alloc_context_bank,
> +};
> +
> +static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
> + const struct arm_smmu_impl *impl)
> {
> struct qcom_smmu *qsmmu;
>
> @@ -75,8 +212,18 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
>
> qsmmu->smmu = *smmu;
>
> - qsmmu->smmu.impl = &qcom_smmu_impl;
> + qsmmu->smmu.impl = impl;
> devm_kfree(smmu->dev, smmu);
>
> return &qsmmu->smmu;
> }
> +
> +struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
> +{
> + return qcom_smmu_create(smmu, &qcom_smmu_impl);
> +}
> +
> +struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu)
> +{
> + return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl);
> +}
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> index 1a746476927c..6c5ff9999eae 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> @@ -520,6 +520,7 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page,
> struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
> struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu);
> struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
> +struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu);
>
> void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx);
> int arm_mmu500_reset(struct arm_smmu_device *smmu);
>
On 2020-11-02 17:14, Jordan Crouse wrote:
> From: Rob Clark <[email protected]>
>
> For the Adreno GPU's SMMU, we want SCTLR.HUPCF set to ensure that
> pending translations are not terminated on iova fault. Otherwise
> a terminated CP read could hang the GPU by returning invalid
> command-stream data.
>
> Signed-off-by: Rob Clark <[email protected]>
> Reviewed-by: Bjorn Andersson <[email protected]>
> Signed-off-by: Jordan Crouse <[email protected]>
> ---
>
> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 6 ++++++
> drivers/iommu/arm/arm-smmu/arm-smmu.c | 3 +++
> drivers/iommu/arm/arm-smmu/arm-smmu.h | 3 +++
> 3 files changed, 12 insertions(+)
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index 1e942eed2dfc..0663d7d26908 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -129,6 +129,12 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
> pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
>
> + /*
> + * On the GPU device we want to process subsequent transactions after a
> + * fault to keep the GPU from hanging
> + */
> + smmu_domain->cfg.sctlr_set |= ARM_SMMU_SCTLR_HUPCF;
> +
> /*
> * Initialize private interface with GPU:
> */
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index dad7fa86fbd4..1f06ab219819 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -617,6 +617,9 @@ void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
> if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
> reg |= ARM_SMMU_SCTLR_E;
>
> + reg |= cfg->sctlr_set;
> + reg &= ~cfg->sctlr_clr;
Since we now have a write_s2cr hook, I'm inclined to think that the
consistency of a write_sctlr hook that could similarly apply its own
arbitrary tweaks would make sense for this. Does anyone have any strong
opinions?
Robin.
> +
> arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
> }
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> index 6c5ff9999eae..ddf2ca4c923d 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> @@ -144,6 +144,7 @@ enum arm_smmu_cbar_type {
> #define ARM_SMMU_CB_SCTLR 0x0
> #define ARM_SMMU_SCTLR_S1_ASIDPNE BIT(12)
> #define ARM_SMMU_SCTLR_CFCFG BIT(7)
> +#define ARM_SMMU_SCTLR_HUPCF BIT(8)
> #define ARM_SMMU_SCTLR_CFIE BIT(6)
> #define ARM_SMMU_SCTLR_CFRE BIT(5)
> #define ARM_SMMU_SCTLR_E BIT(4)
> @@ -341,6 +342,8 @@ struct arm_smmu_cfg {
> u16 asid;
> u16 vmid;
> };
> + u32 sctlr_set; /* extra bits to set in SCTLR */
> + u32 sctlr_clr; /* bits to mask in SCTLR */
> enum arm_smmu_cbar_type cbar;
> enum arm_smmu_context_fmt fmt;
> };
>
On 2020-11-02 17:14, Jordan Crouse wrote:
> Every Qcom Adreno GPU has an embedded SMMU for its own use. These
> devices depend on unique features such as split pagetables,
> different stall/halt requirements and other settings. Identify them
> with a compatible string so that they can be identified in the
> arm-smmu implementation specific code.
>
> Signed-off-by: Jordan Crouse <[email protected]>
> Reviewed-by: Rob Herring <[email protected]>
> Signed-off-by: Rob Clark <[email protected]>
> Reviewed-by: Bjorn Andersson <[email protected]>
> ---
>
> Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> index 503160a7b9a0..3b63f2ae24db 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> @@ -28,8 +28,6 @@ properties:
> - enum:
> - qcom,msm8996-smmu-v2
> - qcom,msm8998-smmu-v2
> - - qcom,sc7180-smmu-v2
> - - qcom,sdm845-smmu-v2
What about the "Apps SMMU" instances? Those are distinct and don't
have/need the GPU special behaviour, right?
Robin.
> - const: qcom,smmu-v2
>
> - description: Qcom SoCs implementing "arm,mmu-500"
> @@ -40,6 +38,13 @@ properties:
> - qcom,sm8150-smmu-500
> - qcom,sm8250-smmu-500
> - const: arm,mmu-500
> + - description: Qcom Adreno GPUs implementing "arm,smmu-v2"
> + items:
> + - enum:
> + - qcom,sc7180-smmu-v2
> + - qcom,sdm845-smmu-v2
> + - const: qcom,adreno-smmu
> + - const: qcom,smmu-v2
> - description: Marvell SoCs implementing "arm,mmu-500"
> items:
> - const: marvell,ap806-smmu-500
>
On 2020-11-02 18:22, Robin Murphy wrote:
> On 2020-11-02 17:14, Jordan Crouse wrote:
>> Every Qcom Adreno GPU has an embedded SMMU for its own use. These
>> devices depend on unique features such as split pagetables,
>> different stall/halt requirements and other settings. Identify them
>> with a compatible string so that they can be identified in the
>> arm-smmu implementation specific code.
>>
>> Signed-off-by: Jordan Crouse <[email protected]>
>> Reviewed-by: Rob Herring <[email protected]>
>> Signed-off-by: Rob Clark <[email protected]>
>> Reviewed-by: Bjorn Andersson <[email protected]>
>> ---
>>
>> Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
>> b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
>> index 503160a7b9a0..3b63f2ae24db 100644
>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
>> @@ -28,8 +28,6 @@ properties:
>> - enum:
>> - qcom,msm8996-smmu-v2
>> - qcom,msm8998-smmu-v2
>> - - qcom,sc7180-smmu-v2
>> - - qcom,sdm845-smmu-v2
>
> What about the "Apps SMMU" instances? Those are distinct and don't
> have/need the GPU special behaviour, right?
Oh, having looked at patch #4, which prompted me go and look at the 845
DTSI in context, now I realise the subtlety I overlooked. So I guess it
really was worth resending, ha! Sorry for being thick :)
Reviewed-by: Robin Murphy <[email protected]>
>
> Robin.
>
>> - const: qcom,smmu-v2
>> - description: Qcom SoCs implementing "arm,mmu-500"
>> @@ -40,6 +38,13 @@ properties:
>> - qcom,sm8150-smmu-500
>> - qcom,sm8250-smmu-500
>> - const: arm,mmu-500
>> + - description: Qcom Adreno GPUs implementing "arm,smmu-v2"
>> + items:
>> + - enum:
>> + - qcom,sc7180-smmu-v2
>> + - qcom,sdm845-smmu-v2
>> + - const: qcom,adreno-smmu
>> + - const: qcom,smmu-v2
>> - description: Marvell SoCs implementing "arm,mmu-500"
>> items:
>> - const: marvell,ap806-smmu-500
>>
On Mon, Nov 02, 2020 at 06:18:45PM +0000, Robin Murphy wrote:
> On 2020-11-02 17:14, Jordan Crouse wrote:
> >From: Rob Clark <[email protected]>
> >
> >For the Adreno GPU's SMMU, we want SCTLR.HUPCF set to ensure that
> >pending translations are not terminated on iova fault. Otherwise
> >a terminated CP read could hang the GPU by returning invalid
> >command-stream data.
> >
> >Signed-off-by: Rob Clark <[email protected]>
> >Reviewed-by: Bjorn Andersson <[email protected]>
> >Signed-off-by: Jordan Crouse <[email protected]>
> >---
> >
> > drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 6 ++++++
> > drivers/iommu/arm/arm-smmu/arm-smmu.c | 3 +++
> > drivers/iommu/arm/arm-smmu/arm-smmu.h | 3 +++
> > 3 files changed, 12 insertions(+)
> >
> >diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >index 1e942eed2dfc..0663d7d26908 100644
> >--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> >@@ -129,6 +129,12 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> > (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
> > pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
> >+ /*
> >+ * On the GPU device we want to process subsequent transactions after a
> >+ * fault to keep the GPU from hanging
> >+ */
> >+ smmu_domain->cfg.sctlr_set |= ARM_SMMU_SCTLR_HUPCF;
> >+
> > /*
> > * Initialize private interface with GPU:
> > */
> >diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >index dad7fa86fbd4..1f06ab219819 100644
> >--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >@@ -617,6 +617,9 @@ void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
> > if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
> > reg |= ARM_SMMU_SCTLR_E;
> >+ reg |= cfg->sctlr_set;
> >+ reg &= ~cfg->sctlr_clr;
>
> Since we now have a write_s2cr hook, I'm inclined to think that the
> consistency of a write_sctlr hook that could similarly apply its own
> arbitrary tweaks would make sense for this. Does anyone have any strong
> opinions?
None from me. That would make an eventual stall-on-fault implementation easier
too.
Jordan
> Robin.
>
> >+
> > arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
> > }
> >diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> >index 6c5ff9999eae..ddf2ca4c923d 100644
> >--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> >+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> >@@ -144,6 +144,7 @@ enum arm_smmu_cbar_type {
> > #define ARM_SMMU_CB_SCTLR 0x0
> > #define ARM_SMMU_SCTLR_S1_ASIDPNE BIT(12)
> > #define ARM_SMMU_SCTLR_CFCFG BIT(7)
> >+#define ARM_SMMU_SCTLR_HUPCF BIT(8)
> > #define ARM_SMMU_SCTLR_CFIE BIT(6)
> > #define ARM_SMMU_SCTLR_CFRE BIT(5)
> > #define ARM_SMMU_SCTLR_E BIT(4)
> >@@ -341,6 +342,8 @@ struct arm_smmu_cfg {
> > u16 asid;
> > u16 vmid;
> > };
> >+ u32 sctlr_set; /* extra bits to set in SCTLR */
> >+ u32 sctlr_clr; /* bits to mask in SCTLR */
> > enum arm_smmu_cbar_type cbar;
> > enum arm_smmu_context_fmt fmt;
> > };
> >
--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
On Mon 02 Nov 12:18 CST 2020, Robin Murphy wrote:
> On 2020-11-02 17:14, Jordan Crouse wrote:
> > From: Rob Clark <[email protected]>
> >
> > For the Adreno GPU's SMMU, we want SCTLR.HUPCF set to ensure that
> > pending translations are not terminated on iova fault. Otherwise
> > a terminated CP read could hang the GPU by returning invalid
> > command-stream data.
> >
> > Signed-off-by: Rob Clark <[email protected]>
> > Reviewed-by: Bjorn Andersson <[email protected]>
> > Signed-off-by: Jordan Crouse <[email protected]>
> > ---
> >
> > drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 6 ++++++
> > drivers/iommu/arm/arm-smmu/arm-smmu.c | 3 +++
> > drivers/iommu/arm/arm-smmu/arm-smmu.h | 3 +++
> > 3 files changed, 12 insertions(+)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > index 1e942eed2dfc..0663d7d26908 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > @@ -129,6 +129,12 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> > (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
> > pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
> > + /*
> > + * On the GPU device we want to process subsequent transactions after a
> > + * fault to keep the GPU from hanging
> > + */
> > + smmu_domain->cfg.sctlr_set |= ARM_SMMU_SCTLR_HUPCF;
> > +
> > /*
> > * Initialize private interface with GPU:
> > */
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > index dad7fa86fbd4..1f06ab219819 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > @@ -617,6 +617,9 @@ void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
> > if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
> > reg |= ARM_SMMU_SCTLR_E;
> > + reg |= cfg->sctlr_set;
> > + reg &= ~cfg->sctlr_clr;
>
> Since we now have a write_s2cr hook, I'm inclined to think that the
> consistency of a write_sctlr hook that could similarly apply its own
> arbitrary tweaks would make sense for this. Does anyone have any strong
> opinions?
>
I like it.
Regards,
Bjorn
> Robin.
>
> > +
> > arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
> > }
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > index 6c5ff9999eae..ddf2ca4c923d 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > @@ -144,6 +144,7 @@ enum arm_smmu_cbar_type {
> > #define ARM_SMMU_CB_SCTLR 0x0
> > #define ARM_SMMU_SCTLR_S1_ASIDPNE BIT(12)
> > #define ARM_SMMU_SCTLR_CFCFG BIT(7)
> > +#define ARM_SMMU_SCTLR_HUPCF BIT(8)
> > #define ARM_SMMU_SCTLR_CFIE BIT(6)
> > #define ARM_SMMU_SCTLR_CFRE BIT(5)
> > #define ARM_SMMU_SCTLR_E BIT(4)
> > @@ -341,6 +342,8 @@ struct arm_smmu_cfg {
> > u16 asid;
> > u16 vmid;
> > };
> > + u32 sctlr_set; /* extra bits to set in SCTLR */
> > + u32 sctlr_clr; /* bits to mask in SCTLR */
> > enum arm_smmu_cbar_type cbar;
> > enum arm_smmu_context_fmt fmt;
> > };
> >
On Tue, Nov 03, 2020 at 10:28:13AM -0700, Jordan Crouse wrote:
> On Mon, Nov 02, 2020 at 06:18:45PM +0000, Robin Murphy wrote:
> > On 2020-11-02 17:14, Jordan Crouse wrote:
> > >From: Rob Clark <[email protected]>
> > >
> > >For the Adreno GPU's SMMU, we want SCTLR.HUPCF set to ensure that
> > >pending translations are not terminated on iova fault. Otherwise
> > >a terminated CP read could hang the GPU by returning invalid
> > >command-stream data.
> > >
> > >Signed-off-by: Rob Clark <[email protected]>
> > >Reviewed-by: Bjorn Andersson <[email protected]>
> > >Signed-off-by: Jordan Crouse <[email protected]>
> > >---
> > >
> > > drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 6 ++++++
> > > drivers/iommu/arm/arm-smmu/arm-smmu.c | 3 +++
> > > drivers/iommu/arm/arm-smmu/arm-smmu.h | 3 +++
> > > 3 files changed, 12 insertions(+)
> > >
> > >diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > >index 1e942eed2dfc..0663d7d26908 100644
> > >--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > >+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > >@@ -129,6 +129,12 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> > > (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
> > > pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
> > >+ /*
> > >+ * On the GPU device we want to process subsequent transactions after a
> > >+ * fault to keep the GPU from hanging
> > >+ */
> > >+ smmu_domain->cfg.sctlr_set |= ARM_SMMU_SCTLR_HUPCF;
> > >+
> > > /*
> > > * Initialize private interface with GPU:
> > > */
> > >diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > >index dad7fa86fbd4..1f06ab219819 100644
> > >--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > >+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > >@@ -617,6 +617,9 @@ void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
> > > if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
> > > reg |= ARM_SMMU_SCTLR_E;
> > >+ reg |= cfg->sctlr_set;
> > >+ reg &= ~cfg->sctlr_clr;
> >
> > Since we now have a write_s2cr hook, I'm inclined to think that the
> > consistency of a write_sctlr hook that could similarly apply its own
> > arbitrary tweaks would make sense for this. Does anyone have any strong
> > opinions?
>
> None from me. That would make an eventual stall-on-fault implementation easier
> too.
Sounds like people like this idea, so please can you spin a new version with
that so that I can queue the first three patches for 5.11?
Cheers,
Will