2018-03-02 17:35:51

by Vivek Gautam

Subject: [PATCH v8 0/5] iommu/arm-smmu: Add runtime pm/sleep support

This series adds support for turning on the arm-smmu's
clocks/power domains using runtime pm. This is done using the
recently introduced device links patches, which let the smmu's
runtime pm follow the master's runtime pm, so the smmu remains
powered only when the masters use it.
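
As a rough illustration of the device-link behaviour described above, the
following userspace C sketch models a supplier (the smmu) that stays
powered as long as any consumer (a master) holds a runtime pm reference.
The struct dev_model type and the model_* helpers are hypothetical names
invented for this sketch; they are not kernel APIs, and the real logic
lives in the driver core.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with a runtime pm usage counter. */
struct dev_model {
	int usage_count;            /* outstanding "get" references */
	int active;                 /* 1 while the device is "powered" */
	struct dev_model *supplier; /* set by model_link_add(), may be NULL */
};

/* Models a device link with runtime pm semantics. */
static void model_link_add(struct dev_model *consumer,
			   struct dev_model *supplier)
{
	consumer->supplier = supplier;
}

/* Take a reference; on the 0 -> 1 transition, power the supplier first. */
static void model_get(struct dev_model *dev)
{
	if (dev->usage_count++ > 0)
		return;
	if (dev->supplier)
		model_get(dev->supplier);
	dev->active = 1;
}

/* Drop a reference; on the 1 -> 0 transition, release the supplier too. */
static void model_put(struct dev_model *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count > 0)
		return;
	dev->active = 0;
	if (dev->supplier)
		model_put(dev->supplier);
}
```

With two masters linked to one smmu in this model, the smmu only drops
to inactive after the last master has released its reference.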

It also adds support for Qcom's arm-smmu-v2 variant that
has different clocks and power requirements.

Took some reference from the exynos runtime patches [1].

After another round of discussion [3], we now finally seem to be
in agreement to add a flag based on compatible, a flag that would
indicate if a particular implementation of arm-smmu supports
runtime pm or not.
This lets us use the much-argued pm_runtime_get_sync/put_sync()
calls in map/unmap callbacks so that the clients do not have to
worry about handling any of the arm-smmu's power.
The patch that exported a couple of pm_runtime suppliers APIs, viz.
pm_runtime_get_suppliers() and pm_runtime_put_suppliers(), can be
dropped since we don't have a user for these APIs now.
Thanks Rafael for reviewing the changes, but it looks like we don't
need to export those APIs for some more time. :)
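
The flag-gated get/put bracketing described above can be sketched in
plain userspace C as follows. This is only a model under stated
assumptions: struct smmu_flag_model and the model_* functions are
hypothetical stand-ins (the usage counter loosely imitates what
pm_runtime_get_sync()/pm_runtime_put() maintain), not the driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the SMMU device state. */
struct smmu_flag_model {
	bool rpm_supported; /* mirrors the per-compatible flag */
	int usage_count;    /* models the runtime pm usage counter */
	int resume_calls;   /* how often the "hardware" was powered up */
};

/* Conditional get: does nothing unless the implementation opted in. */
static int model_rpm_get(struct smmu_flag_model *smmu)
{
	if (!smmu->rpm_supported)
		return 0;
	if (smmu->usage_count++ == 0)
		smmu->resume_calls++; /* 0 -> 1 transition powers up */
	return 0;
}

/* Conditional put: drops one reference if the flag is set. */
static void model_rpm_put(struct smmu_flag_model *smmu)
{
	if (!smmu->rpm_supported)
		return;
	assert(smmu->usage_count > 0);
	smmu->usage_count--;
}

/* A map-like operation bracketed by get/put; returns 0 on success. */
static int model_map(struct smmu_flag_model *smmu)
{
	int ret = model_rpm_get(smmu);

	if (ret < 0)
		return ret;
	/* The page table update would happen here. */
	model_rpm_put(smmu);
	return 0;
}
```

In this model, implementations without the flag never touch the counter,
so existing platforms keep their current behaviour.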

Previous version of this patch series is @ [5].

[v8]
* Major change -
- Added a flag 'rpm_supported' which each platform that supports
runtime pm can enable; we enable runtime pm on the arm-smmu
only when this flag is set.
- Added the conditional pm_runtime_get/put() calls to the .map, .unmap
and .attach_dev ops.
- Dropped the patch [6] that exported pm_runtime_get/put_suppliers(),
and also dropped the user driver patch [7] for these APIs.

* Clock code further cleanup
- Doing only clk_bulk_enable() and clk_bulk_disable() in the runtime pm
callbacks. We shouldn't be taking a slow path (clk_prepare/unprepare())
from these runtime pm callbacks. Therefore, moved clk_bulk_prepare() to
arm_smmu_device_probe(), and clk_bulk_unprepare() to
arm_smmu_device_remove().
- Moved clk data filling to a common method arm_smmu_fill_clk_data() that
fills the clock ids and number of clocks.

* Addressed other nits and comments
- device_link_add() error path fixed.
- Fix for checking negative error value from pm_runtime_get_sync().
- Documentation redo.

* Added another patch fixing the error path in arm_smmu_attach_dev()
to destroy allocated domain context.
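
The clock split listed under "Clock code further cleanup" (prepare once
at probe, enable/disable in the runtime pm callbacks, unprepare at
remove) can be modelled in userspace C as below. struct clk_model and
the model_* functions are hypothetical and only count calls; they do not
use the real clk API.

```c
#include <assert.h>

/* Hypothetical model of one clock with separate prepare/enable counts. */
struct clk_model {
	int prepare_count; /* slow path, done once per probe */
	int enable_count;  /* fast path, toggled by runtime pm */
};

/* Probe: take the slow path once. */
static void model_probe(struct clk_model *clk)
{
	clk->prepare_count++;
}

/* Runtime resume: fast path only; fails if the clock is unprepared. */
static int model_runtime_resume(struct clk_model *clk)
{
	if (clk->prepare_count == 0)
		return -1;
	clk->enable_count++;
	return 0;
}

/* Runtime suspend: fast path only. */
static void model_runtime_suspend(struct clk_model *clk)
{
	assert(clk->enable_count > 0);
	clk->enable_count--;
}

/* Remove: undo the prepare; the clock must already be disabled. */
static void model_remove(struct clk_model *clk)
{
	assert(clk->enable_count == 0);
	clk->prepare_count--;
}
```

Repeated resume/suspend cycles in this model leave the prepare count
untouched, which is the point of keeping the slow path out of the
runtime pm callbacks.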

[v7]
* Addressed review comments given by Robin Murphy -
- Added device_link_del() in .remove_device path.
- Error path cleanup in arm_smmu_add_device().
- Added pm_runtime_get/put_sync() in .remove path, and replaced
pm_runtime_force_suspend() with pm_runtime_disable().
- clk_names cleanup in arm_smmu_init_clks()
* Added 'Reviewed-by' given by Rob H.

[V6]
* Added Ack given by Rafael to first patch in the series.
* Addressed Rob Herring's comment for adding soc specific compatible
string as well besides 'qcom,smmu-v2'.

[V5]
* Dropped runtime pm calls from "arm_smmu_unmap" op as discussed over
the list [3] for the last patch series.
* Added a patch to export pm_runtime_get/put_suppliers() APIs to the
series as agreed with Rafael [4].
* Added the related patch for msm drm iommu layer to use
pm_runtime_get/put_suppliers() APIs in msm_mmu_funcs.
* Dropped arm-mmu500 clock patch since that would break existing
platforms.
* Changed compatible 'qcom,msm8996-smmu-v2' to 'qcom,smmu-v2' to reflect
the IP version rather than the platform on which it is used.
The same IP is used across multiple platforms including msm8996,
and sdm845 etc.
* Using clock bulk APIs to handle the clocks available to the IP as
suggested by Stephen Boyd.
* The first patch in v4 version of the patch-series:
("iommu/arm-smmu: Fix the error path in arm_smmu_add_device") has
already made it to mainline.

[V4]
* Reworked the clock handling part. We now take clock names as data
in the driver for supported compatible versions, and loop over them
to get, enable, and disable the clocks.
* Using qcom,msm8996 based compatibles for bindings instead of a generic
qcom compatible.
* Refactor MMU500 patch to just add the necessary clock names data and
corresponding bindings.
* Added the pm_runtime_get/put() calls in .unmap iommu op (fix added by
Stanimir on top of previous patch version).
* Added a patch to fix error path in arm_smmu_add_device()
* Removed patch 3/5 of V3 patch series that added qcom,smmu-v2 bindings.

[V3]
* Reworked the patches to keep the clocks init/enabling function
separately for each compatible.

* Added clocks bindings for MMU40x/500.

* Added a new compatible for qcom,smmu-v2 implementation and
the clock bindings for the same.

* Rebased on top of 4.11-rc1

[V2]
* Split the patches a little differently.

* Addressed comments.

* Removed the patch #4 [2] from previous post
for arm-smmu context save restore. Planning to
post this separately after reworking/addressing Robin's
feedback.

* Reversed the sequence in which clocks are disabled, relative to
the order in which they are enabled.
This was required for those cases where the
clocks are populated in a dependent order from DT.

[1] https://lkml.org/lkml/2016/10/20/70
[2] https://patchwork.kernel.org/patch/9389717/
[3] https://patchwork.kernel.org/patch/10204925/
[4] https://patchwork.kernel.org/patch/10102445/
[5] https://lkml.org/lkml/2018/2/7/144
[6] https://patchwork.kernel.org/patch/10204945/
[7] https://patchwork.kernel.org/patch/10204925/

Sricharan R (3):
iommu/arm-smmu: Add pm_runtime/sleep ops
iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
iommu/arm-smmu: Add the device_link between masters and smmu

Vivek Gautam (2):
iommu/arm-smmu: Destroy domain context in failure path
iommu/arm-smmu: Add support for qcom,smmu-v2 variant

.../devicetree/bindings/iommu/arm,smmu.txt | 42 +++++
drivers/iommu/arm-smmu.c | 199 +++++++++++++++++++--
2 files changed, 230 insertions(+), 11 deletions(-)

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



2018-03-02 10:12:57

by Vivek Gautam

Subject: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

From: Sricharan R <[email protected]>

The smmu device probe/remove and add/remove master device callbacks
get called when the smmu is not linked to its master, that is, without
the context of the master device. So call the runtime pm APIs in those
places separately.

Signed-off-by: Sricharan R <[email protected]>
[vivek: Cleanup pm runtime calls]
Signed-off-by: Vivek Gautam <[email protected]>
---
drivers/iommu/arm-smmu.c | 96 ++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 88 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c8b16f53f597..3d6a1875431f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -209,6 +209,8 @@ struct arm_smmu_device {
struct clk_bulk_data *clks;
int num_clks;

+ bool rpm_supported;
+
u32 cavium_id_base; /* Specific to Cavium */

spinlock_t global_sync_lock;
@@ -268,6 +270,20 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
{ 0, NULL},
};

+static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
+{
+ if (smmu->rpm_supported)
+ return pm_runtime_get_sync(smmu->dev);
+
+ return 0;
+}
+
+static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
+{
+ if (smmu->rpm_supported)
+ pm_runtime_put(smmu->dev);
+}
+
static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
{
return container_of(dom, struct arm_smmu_domain, domain);
@@ -913,11 +929,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_device *smmu = smmu_domain->smmu;
struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
- int irq;
+ int ret, irq;

if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY)
return;

+ ret = arm_smmu_rpm_get(smmu);
+ if (ret < 0)
+ return;
+
/*
* Disable the context bank and free the page tables before freeing
* it.
@@ -932,6 +952,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)

free_io_pgtable_ops(smmu_domain->pgtbl_ops);
__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
+
+ arm_smmu_rpm_put(smmu);
}

static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
@@ -1213,10 +1235,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
return -ENODEV;

smmu = fwspec_smmu(fwspec);
+
+ ret = arm_smmu_rpm_get(smmu);
+ if (ret < 0)
+ return ret;
+
/* Ensure that the domain is finalised */
ret = arm_smmu_init_domain_context(domain, smmu);
if (ret < 0)
- return ret;
+ goto rpm_put;

/*
* Sanity check the domain. We don't support domains across
@@ -1231,10 +1258,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
}

/* Looks ok, so add the device to the domain */
- return arm_smmu_domain_add_master(smmu_domain, fwspec);
+ ret = arm_smmu_domain_add_master(smmu_domain, fwspec);
+
+ arm_smmu_rpm_put(smmu);
+
+ return ret;

destroy_domain:
arm_smmu_destroy_domain_context(domain);
+rpm_put:
+ arm_smmu_rpm_put(smmu);
+
return ret;
}

@@ -1242,22 +1276,36 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot)
{
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+ struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+ struct arm_smmu_device *smmu = smmu_domain->smmu;
+ int ret;

if (!ops)
return -ENODEV;

- return ops->map(ops, iova, paddr, size, prot);
+ arm_smmu_rpm_get(smmu);
+ ret = ops->map(ops, iova, paddr, size, prot);
+ arm_smmu_rpm_put(smmu);
+
+ return ret;
}

static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
size_t size)
{
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+ struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+ struct arm_smmu_device *smmu = smmu_domain->smmu;
+ size_t ret;

if (!ops)
return 0;

- return ops->unmap(ops, iova, size);
+ arm_smmu_rpm_get(smmu);
+ ret = ops->unmap(ops, iova, size);
+ arm_smmu_rpm_put(smmu);
+
+ return ret;
}

static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
@@ -1412,14 +1460,22 @@ static int arm_smmu_add_device(struct device *dev)
while (i--)
cfg->smendx[i] = INVALID_SMENDX;

+ ret = arm_smmu_rpm_get(smmu);
+ if (ret < 0)
+ goto out_cfg_free;
+
ret = arm_smmu_master_alloc_smes(dev);
if (ret)
- goto out_cfg_free;
+ goto out_rpm_put;

iommu_device_link(&smmu->iommu, dev);

+ arm_smmu_rpm_put(smmu);
+
return 0;

+out_rpm_put:
+ arm_smmu_rpm_put(smmu);
out_cfg_free:
kfree(cfg);
out_free:
@@ -1432,7 +1488,7 @@ static void arm_smmu_remove_device(struct device *dev)
struct iommu_fwspec *fwspec = dev->iommu_fwspec;
struct arm_smmu_master_cfg *cfg;
struct arm_smmu_device *smmu;
-
+ int ret;

if (!fwspec || fwspec->ops != &arm_smmu_ops)
return;
@@ -1440,8 +1496,15 @@ static void arm_smmu_remove_device(struct device *dev)
cfg = fwspec->iommu_priv;
smmu = cfg->smmu;

+ ret = arm_smmu_rpm_get(smmu);
+ if (ret < 0)
+ return;
+
iommu_device_unlink(&smmu->iommu, dev);
arm_smmu_master_free_smes(fwspec);
+
+ arm_smmu_rpm_put(smmu);
+
iommu_group_remove_device(dev);
kfree(fwspec->iommu_priv);
iommu_fwspec_free(dev);
@@ -1907,6 +1970,7 @@ struct arm_smmu_match_data {
enum arm_smmu_implementation model;
const char * const *clks;
int num_clks;
+ bool rpm_supported;
};

#define ARM_SMMU_MATCH_DATA(name, ver, imp) \
@@ -2029,6 +2093,7 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
smmu->version = data->version;
smmu->model = data->model;
smmu->num_clks = data->num_clks;
+ smmu->rpm_supported = data->rpm_supported;

arm_smmu_fill_clk_data(smmu, data->clks);

@@ -2129,6 +2194,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
smmu->irqs[i] = irq;
}

+ platform_set_drvdata(pdev, smmu);
+
err = devm_clk_bulk_get(smmu->dev, smmu->num_clks, smmu->clks);
if (err)
return err;
@@ -2137,6 +2204,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
if (err)
return err;

+ if (smmu->rpm_supported)
+ pm_runtime_enable(dev);
+
+ err = arm_smmu_rpm_get(smmu);
+ if (err < 0)
+ return err;
+
err = arm_smmu_device_cfg_probe(smmu);
if (err)
return err;
@@ -2178,10 +2252,11 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
return err;
}

- platform_set_drvdata(pdev, smmu);
arm_smmu_device_reset(smmu);
arm_smmu_test_smr_masks(smmu);

+ arm_smmu_rpm_put(smmu);
+
/*
* For ACPI and generic DT bindings, an SMMU will be probed before
* any device which might need it, so we want the bus ops in place
@@ -2217,9 +2292,14 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
if (!bitmap_empty(smmu->context_map, ARM_SMMU_MAX_CBS))
dev_err(&pdev->dev, "removing device with active domains!\n");

+ arm_smmu_rpm_get(smmu);
/* Turn the thing off */
writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);

+ arm_smmu_rpm_put(smmu);
+ if (smmu->rpm_supported)
+ pm_runtime_disable(smmu->dev);
+
clk_bulk_unprepare(smmu->num_clks, smmu->clks);

return 0;
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


2018-03-02 10:13:15

by Vivek Gautam

Subject: [PATCH v8 5/5] iommu/arm-smmu: Add support for qcom,smmu-v2 variant

qcom,smmu-v2 is an arm,smmu-v2 implementation with specific
clock and power requirements. This smmu core is used with
multiple masters on msm8996, viz. mdss, video, etc.
Add bindings for the same.

Signed-off-by: Vivek Gautam <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
---
.../devicetree/bindings/iommu/arm,smmu.txt | 42 ++++++++++++++++++++++
drivers/iommu/arm-smmu.c | 15 ++++++++
2 files changed, 57 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 8a6ffce12af5..6ea27bd4f785 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -17,10 +17,19 @@ conditions.
"arm,mmu-401"
"arm,mmu-500"
"cavium,smmu-v2"
+ "qcom,<soc>-smmu-v2", "qcom,smmu-v2"

depending on the particular implementation and/or the
version of the architecture implemented.

+ A number of Qcom SoCs use qcom,smmu-v2 version of the IP.
+ "qcom,<soc>-smmu-v2" represents a soc specific compatible
+ string that should be present along with the "qcom,smmu-v2"
+ to facilitate SoC specific clocks/power connections and to
+ address specific bug fixes.
+ An example string would be -
+ "qcom,msm8996-smmu-v2", "qcom,smmu-v2".
+
- reg : Base address and size of the SMMU.

- #global-interrupts : The number of global interrupts exposed by the
@@ -71,6 +80,22 @@ conditions.
or using stream matching with #iommu-cells = <2>, and
may be ignored if present in such cases.

+- clock-names: List of the names of clocks input to the device. The
+ required list depends on particular implementation and
+ is as follows:
+ - for "qcom,smmu-v2":
+ - "bus": clock required for downstream bus access and
+ for the smmu ptw,
+ - "iface": clock required to access smmu's registers
+ through the TCU's programming interface.
+ - unspecified for other implementations.
+
+- clocks: Specifiers for all clocks listed in the clock-names property,
+ as per generic clock bindings.
+
+- power-domains: Specifiers for power domains required to be powered on for
+ the SMMU to operate, as per generic power domain bindings.
+
** Deprecated properties:

- mmu-masters (deprecated in favour of the generic "iommus" binding) :
@@ -137,3 +162,20 @@ conditions.
iommu-map = <0 &smmu3 0 0x400>;
...
};
+
+ /* Qcom's arm,smmu-v2 implementation */
+ smmu4: iommu {
+ compatible = "qcom,msm8996-smmu-v2", "qcom,smmu-v2";
+ reg = <0xd00000 0x10000>;
+
+ #global-interrupts = <1>;
+ interrupts = <GIC_SPI 73 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 320 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 321 IRQ_TYPE_LEVEL_HIGH>;
+ #iommu-cells = <1>;
+ power-domains = <&mmcc MDSS_GDSC>;
+
+ clocks = <&mmcc SMMU_MDP_AXI_CLK>,
+ <&mmcc SMMU_MDP_AHB_CLK>;
+ clock-names = "bus", "iface";
+ };
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index bb1ea82c1003..7a96c924ae22 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -119,6 +119,7 @@ enum arm_smmu_implementation {
GENERIC_SMMU,
ARM_MMU500,
CAVIUM_SMMUV2,
+ QCOM_SMMUV2,
};

struct arm_smmu_s2cr {
@@ -2003,6 +2004,18 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);

+static const char * const qcom_smmuv2_clks[] = {
+ "bus", "iface",
+};
+
+static const struct arm_smmu_match_data qcom_smmuv2 = {
+ .version = ARM_SMMU_V2,
+ .model = QCOM_SMMUV2,
+ .clks = qcom_smmuv2_clks,
+ .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
+ .rpm_supported = true,
+};
+
static const struct of_device_id arm_smmu_of_match[] = {
{ .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
{ .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 },
@@ -2010,6 +2023,7 @@ static const struct of_device_id arm_smmu_of_match[] = {
{ .compatible = "arm,mmu-401", .data = &arm_mmu401 },
{ .compatible = "arm,mmu-500", .data = &arm_mmu500 },
{ .compatible = "cavium,smmu-v2", .data = &cavium_smmuv2 },
+ { .compatible = "qcom,smmu-v2", .data = &qcom_smmuv2 },
{ },
};
MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
@@ -2379,6 +2393,7 @@ IOMMU_OF_DECLARE(arm_mmu400, "arm,mmu-400");
IOMMU_OF_DECLARE(arm_mmu401, "arm,mmu-401");
IOMMU_OF_DECLARE(arm_mmu500, "arm,mmu-500");
IOMMU_OF_DECLARE(cavium_smmuv2, "cavium,smmu-v2");
+IOMMU_OF_DECLARE(qcom_smmuv2, "qcom,smmu-v2");

MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations");
MODULE_AUTHOR("Will Deacon <[email protected]>");
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


2018-03-02 10:14:35

by Vivek Gautam

Subject: [PATCH v8 4/5] iommu/arm-smmu: Add the device_link between masters and smmu

From: Sricharan R <[email protected]>

Finally add the device link between the master device and
smmu, so that the smmu gets runtime enabled/disabled only when the
master needs it. This is done from add_device callback which gets
called once when the master is added to the smmu.

Signed-off-by: Sricharan R <[email protected]>
Signed-off-by: Vivek Gautam <[email protected]>
---
drivers/iommu/arm-smmu.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 3d6a1875431f..bb1ea82c1003 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -217,6 +217,9 @@ struct arm_smmu_device {

/* IOMMU core code handle */
struct iommu_device iommu;
+
+ /* runtime PM link to master */
+ struct device_link *link;
};

enum arm_smmu_context_fmt {
@@ -1470,10 +1473,26 @@ static int arm_smmu_add_device(struct device *dev)

iommu_device_link(&smmu->iommu, dev);

+ /*
+ * Establish the link between smmu and master, so that the
+ * smmu gets runtime enabled/disabled as per the master's
+ * needs.
+ */
+ smmu->link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
+ if (!smmu->link) {
+ dev_warn(smmu->dev, "Unable to create device link between %s and %s\n",
+ dev_name(smmu->dev), dev_name(dev));
+ ret = -ENODEV;
+ goto out_unlink;
+ }
+
arm_smmu_rpm_put(smmu);

return 0;

+out_unlink:
+ iommu_device_unlink(&smmu->iommu, dev);
+ arm_smmu_master_free_smes(fwspec);
out_rpm_put:
arm_smmu_rpm_put(smmu);
out_cfg_free:
@@ -1496,6 +1515,8 @@ static void arm_smmu_remove_device(struct device *dev)
cfg = fwspec->iommu_priv;
smmu = cfg->smmu;

+ device_link_del(smmu->link);
+
ret = arm_smmu_rpm_get(smmu);
if (ret < 0)
return;
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


2018-03-02 10:16:32

by Vivek Gautam

Subject: [PATCH v8 1/5] iommu/arm-smmu: Destroy domain context in failure path

If we fail after initializing domain_context, we should destroy
the context to free up resources.

Signed-off-by: Vivek Gautam <[email protected]>
---

* New patch added in this series.

drivers/iommu/arm-smmu.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 69e7c60792a8..ffc152c36002 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1223,11 +1223,16 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
dev_err(dev,
"cannot attach to SMMU %s whilst already attached to domain on SMMU %s\n",
dev_name(smmu_domain->smmu->dev), dev_name(smmu->dev));
- return -EINVAL;
+ ret = -EINVAL;
+ goto destroy_domain;
}

/* Looks ok, so add the device to the domain */
return arm_smmu_domain_add_master(smmu_domain, fwspec);
+
+destroy_domain:
+ arm_smmu_destroy_domain_context(domain);
+ return ret;
}

static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


2018-03-02 10:18:07

by Vivek Gautam

Subject: [PATCH v8 2/5] iommu/arm-smmu: Add pm_runtime/sleep ops

From: Sricharan R <[email protected]>

The smmu needs to be functional only when the respective
masters using it are active. The device_link feature
helps to track such functional dependencies, so that the
iommu gets powered when the master device enables itself
using pm_runtime. So by adapting the smmu driver for
runtime pm, the above dependency can be addressed.

This patch adds the pm runtime/sleep callbacks to the
driver and also the functions to parse the smmu clocks
from DT and enable/disable them in resume/suspend.

Signed-off-by: Sricharan R <[email protected]>
Signed-off-by: Archit Taneja <[email protected]>
[vivek: Clock rework to request bulk of clocks]
Signed-off-by: Vivek Gautam <[email protected]>
---
drivers/iommu/arm-smmu.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index ffc152c36002..c8b16f53f597 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -48,6 +48,7 @@
#include <linux/of_iommu.h>
#include <linux/pci.h>
#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

@@ -205,6 +206,8 @@ struct arm_smmu_device {
u32 num_global_irqs;
u32 num_context_irqs;
unsigned int *irqs;
+ struct clk_bulk_data *clks;
+ int num_clks;

u32 cavium_id_base; /* Specific to Cavium */

@@ -1902,10 +1905,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
struct arm_smmu_match_data {
enum arm_smmu_arch_version version;
enum arm_smmu_implementation model;
+ const char * const *clks;
+ int num_clks;
};

#define ARM_SMMU_MATCH_DATA(name, ver, imp) \
-static struct arm_smmu_match_data name = { .version = ver, .model = imp }
+static const struct arm_smmu_match_data name = { .version = ver, .model = imp }

ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
@@ -1924,6 +1929,23 @@ static const struct of_device_id arm_smmu_of_match[] = {
};
MODULE_DEVICE_TABLE(of, arm_smmu_of_match);

+static void arm_smmu_fill_clk_data(struct arm_smmu_device *smmu,
+ const char * const *clks)
+{
+ int i;
+
+ if (smmu->num_clks < 1)
+ return;
+
+ smmu->clks = devm_kcalloc(smmu->dev, smmu->num_clks,
+ sizeof(*smmu->clks), GFP_KERNEL);
+ if (!smmu->clks)
+ return;
+
+ for (i = 0; i < smmu->num_clks; i++)
+ smmu->clks[i].id = clks[i];
+}
+
#ifdef CONFIG_ACPI
static int acpi_smmu_get_data(u32 model, struct arm_smmu_device *smmu)
{
@@ -2006,6 +2028,9 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
data = of_device_get_match_data(dev);
smmu->version = data->version;
smmu->model = data->model;
+ smmu->num_clks = data->num_clks;
+
+ arm_smmu_fill_clk_data(smmu, data->clks);

parse_driver_options(smmu);

@@ -2104,6 +2129,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
smmu->irqs[i] = irq;
}

+ err = devm_clk_bulk_get(smmu->dev, smmu->num_clks, smmu->clks);
+ if (err)
+ return err;
+
+ err = clk_bulk_prepare(smmu->num_clks, smmu->clks);
+ if (err)
+ return err;
+
err = arm_smmu_device_cfg_probe(smmu);
if (err)
return err;
@@ -2186,6 +2219,9 @@ static int arm_smmu_device_remove(struct platform_device *pdev)

/* Turn the thing off */
writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
+
+ clk_bulk_unprepare(smmu->num_clks, smmu->clks);
+
return 0;
}

@@ -2202,7 +2238,27 @@ static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
return 0;
}

-static SIMPLE_DEV_PM_OPS(arm_smmu_pm_ops, NULL, arm_smmu_pm_resume);
+static int __maybe_unused arm_smmu_runtime_resume(struct device *dev)
+{
+ struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+ return clk_bulk_enable(smmu->num_clks, smmu->clks);
+}
+
+static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
+{
+ struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+ clk_bulk_disable(smmu->num_clks, smmu->clks);
+
+ return 0;
+}
+
+static const struct dev_pm_ops arm_smmu_pm_ops = {
+ SET_SYSTEM_SLEEP_PM_OPS(NULL, arm_smmu_pm_resume)
+ SET_RUNTIME_PM_OPS(arm_smmu_runtime_suspend,
+ arm_smmu_runtime_resume, NULL)
+};

static struct platform_driver arm_smmu_driver = {
.driver = {
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


2018-03-05 14:18:32

by Tomasz Figa

Subject: Re: [PATCH v8 0/5] iommu/arm-smmu: Add runtime pm/sleep support

Hi Vivek,

On Fri, Mar 2, 2018 at 7:10 PM, Vivek Gautam
<[email protected]> wrote:
> This series provides the support for turning on the arm-smmu's
> clocks/power domains using runtime pm. This is done using the
> recently introduced device links patches, which lets the smmu's
> runtime to follow the master's runtime pm, so the smmu remains
> powered only when the masters use it.
>
> It also adds support for Qcom's arm-smmu-v2 variant that
> has different clocks and power requirements.
>
> Took some reference from the exynos runtime patches [1].
>
> After another round of discussion [3], we now finally seem to be
> in agreement to add a flag based on compatible, a flag that would
> indicate if a particular implementation of arm-smmu supports
> runtime pm or not.
> This lets us to use the much-argued pm_runtime_get_sync/put_sync()
> calls in map/unmap callbacks so that the clients do not have to
> worry about handling any of the arm-smmu's power.
> The patch that exported couple of pm_runtime suppliers APIS, viz.
> pm_runtime_get_suppliers(), and pm_runtime_put_suppliers() can be
> dropped since we don't have a user now for these APIs.
> Thanks Rafael for reviewing the changes, but looks like we don't
> need to export those APIs for some more time. :)
>
> Previous version of this patch series is @ [5].

Thanks for addressing my comments. There is still a bit of space for
improving the granularity of power management, as far as I understood
how it works on SDM845 correctly, but as a first step, this should at
least let things work.

Reviewed-by: Tomasz Figa <[email protected]>

Best regards,
Tomasz

2018-03-05 17:20:47

by Vivek Gautam

Subject: Re: [PATCH v8 0/5] iommu/arm-smmu: Add runtime pm/sleep support

Hi Tomasz,


On 3/5/2018 6:55 PM, Tomasz Figa wrote:
> Hi Vivek,
>
> On Fri, Mar 2, 2018 at 7:10 PM, Vivek Gautam
> <[email protected]> wrote:
>> This series provides the support for turning on the arm-smmu's
>> clocks/power domains using runtime pm. This is done using the
>> recently introduced device links patches, which lets the smmu's
>> runtime to follow the master's runtime pm, so the smmu remains
>> powered only when the masters use it.
>>
>> It also adds support for Qcom's arm-smmu-v2 variant that
>> has different clocks and power requirements.
>>
>> Took some reference from the exynos runtime patches [1].
>>
>> After another round of discussion [3], we now finally seem to be
>> in agreement to add a flag based on compatible, a flag that would
>> indicate if a particular implementation of arm-smmu supports
>> runtime pm or not.
>> This lets us to use the much-argued pm_runtime_get_sync/put_sync()
>> calls in map/unmap callbacks so that the clients do not have to
>> worry about handling any of the arm-smmu's power.
>> The patch that exported couple of pm_runtime suppliers APIS, viz.
>> pm_runtime_get_suppliers(), and pm_runtime_put_suppliers() can be
>> dropped since we don't have a user now for these APIs.
>> Thanks Rafael for reviewing the changes, but looks like we don't
>> need to export those APIs for some more time. :)
>>
>> Previous version of this patch series is @ [5].
> Thanks for addressing my comments. There is still a bit of space for
> improving the granularity of power management, as far as I understood
> how it works on SDM845 correctly, but as a first step, this should at
> least let things work.

Sure. I will be sending a patch, based on this series, to add
'qcom,smmu-500' that enables the *rpm_supported* flag for us.
We can try to take care of some of the things with that.
> Reviewed-by: Tomasz Figa <[email protected]>

Thanks for the review.

regards
Vivek
>
> Best regards,
> Tomasz


2018-03-07 12:22:18

by Robin Murphy

Subject: Re: [PATCH v8 1/5] iommu/arm-smmu: Destroy domain context in failure path

On 02/03/18 10:10, Vivek Gautam wrote:
> If we fail after initializing domain_context, we should destroy
> the context to free up resources.

Have another think about why the "problem" this patch caters for cannot
ever happen (hint: consider how domain->smmu is used in
arm_smmu_init_domain_context()). And then also about the really
catastrophically bad problem it actually introduces (hint:
"iommu_attach(domain, good_dev); iommu_attach(domain, bad_dev);")

Robin.
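
One possible reading of the two hints above, sketched as a userspace C
model rather than driver code: the first attach binds the domain to an
SMMU, so a mismatch can only be seen on a domain that an earlier attach
already finalised, and destroying the context on that path pulls it out
from under the device that is still attached. struct domain_model,
model_attach() and model_domain_broken() are hypothetical names, and the
destroy_on_mismatch parameter exists only to compare the two behaviours.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a domain; smmu_id 0 means "not finalised yet". */
struct domain_model {
	int smmu_id;
	bool context_live;
	int attached_devs;
};

static int model_attach(struct domain_model *d, int dev_smmu_id,
			bool destroy_on_mismatch)
{
	/* Models domain finalisation: the first attach binds the SMMU. */
	if (d->smmu_id == 0) {
		d->smmu_id = dev_smmu_id;
		d->context_live = true;
	}

	/* A mismatch is only possible if an earlier attach bound the domain. */
	if (d->smmu_id != dev_smmu_id) {
		if (destroy_on_mismatch)
			d->context_live = false; /* still used by others */
		return -EINVAL;
	}

	d->attached_devs++;
	return 0;
}

/* True if some device is attached to a domain whose context is gone. */
static bool model_domain_broken(const struct domain_model *d)
{
	assert(d->attached_devs >= 0);
	return d->attached_devs > 0 && !d->context_live;
}
```

Attaching a device behind SMMU 1 and then one behind SMMU 2 leaves the
model "broken" only when destroy_on_mismatch is true.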

> Signed-off-by: Vivek Gautam <[email protected]>
> ---
>
> * New patch added in this series.
>
> drivers/iommu/arm-smmu.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 69e7c60792a8..ffc152c36002 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1223,11 +1223,16 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
> dev_err(dev,
> "cannot attach to SMMU %s whilst already attached to domain on SMMU %s\n",
> dev_name(smmu_domain->smmu->dev), dev_name(smmu->dev));
> - return -EINVAL;
> + ret = -EINVAL;
> + goto destroy_domain;
> }
>
> /* Looks ok, so add the device to the domain */
> return arm_smmu_domain_add_master(smmu_domain, fwspec);
> +
> +destroy_domain:
> + arm_smmu_destroy_domain_context(domain);
> + return ret;
> }
>
> static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>

2018-03-07 12:40:39

by Robin Murphy

Subject: Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

On 02/03/18 10:10, Vivek Gautam wrote:
> From: Sricharan R <[email protected]>
>
> The smmu device probe/remove and add/remove master device callbacks
> gets called when the smmu is not linked to its master, that is without
> the context of the master device. So calling runtime apis in those places
> separately.
>
> Signed-off-by: Sricharan R <[email protected]>
> [vivek: Cleanup pm runtime calls]
> Signed-off-by: Vivek Gautam <[email protected]>
> ---
> drivers/iommu/arm-smmu.c | 96 ++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 88 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index c8b16f53f597..3d6a1875431f 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -209,6 +209,8 @@ struct arm_smmu_device {
> struct clk_bulk_data *clks;
> int num_clks;
>
> + bool rpm_supported;
> +

Can we not automatically infer this from whether clocks and/or power
domains are specified or not, then just use pm_runtime_enabled() as the
fast-path check as Tomasz originally proposed?

I worry that relying on statically-defined matchdata is just going to
blow up the driver and DT binding into a maintenance nightmare; I really
don't want to start needing separate definitions for e.g.
"arm,juno-etr-mmu-401" and "arm,juno-hdlcd-mmu-401" just because one
otherwise-identical instance within the SoC is in a separate
controllable power domain while the others aren't.

Robin.
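
For illustration, the pm_runtime_enabled() fast-path check suggested above could look roughly like the sketch below. The struct layouts and the bodies of the pm_runtime_*() functions are userspace stand-ins (assumptions, not the kernel definitions) so that the snippet compiles outside the kernel; only the shape of arm_smmu_rpm_get()/arm_smmu_rpm_put() without a per-compatible flag is the point.

```c
#include <stdbool.h>

/* Userspace stand-ins for the kernel types; not the real definitions. */
struct device {
	bool rpm_enabled;	/* models the state pm_runtime_enabled() reports */
	int usage_count;	/* models the runtime PM usage counter */
};

struct arm_smmu_device {
	struct device *dev;
};

/* Stubs that borrow the kernel API names, for illustration only. */
static bool pm_runtime_enabled(struct device *dev)
{
	return dev->rpm_enabled;
}

static int pm_runtime_get_sync(struct device *dev)
{
	dev->usage_count++;
	return 0;
}

static int pm_runtime_put(struct device *dev)
{
	dev->usage_count--;
	return 0;
}

/* No rpm_supported flag: do nothing unless runtime PM was enabled at probe. */
static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
{
	if (pm_runtime_enabled(smmu->dev))
		return pm_runtime_get_sync(smmu->dev);

	return 0;
}

static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
{
	if (pm_runtime_enabled(smmu->dev))
		pm_runtime_put(smmu->dev);
}
```

With this shape, an SMMU that never had runtime PM enabled takes neither the get nor the put path, so the map/unmap fast paths stay untouched on such systems.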

> u32 cavium_id_base; /* Specific to Cavium */
>
> spinlock_t global_sync_lock;
> @@ -268,6 +270,20 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
> { 0, NULL},
> };
>
> +static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
> +{
> + if (smmu->rpm_supported)
> + return pm_runtime_get_sync(smmu->dev);
> +
> + return 0;
> +}
> +
> +static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
> +{
> + if (smmu->rpm_supported)
> + pm_runtime_put(smmu->dev);
> +}
> +
> static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
> {
> return container_of(dom, struct arm_smmu_domain, domain);
> @@ -913,11 +929,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> struct arm_smmu_device *smmu = smmu_domain->smmu;
> struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> - int irq;
> + int ret, irq;
>
> if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY)
> return;
>
> + ret = arm_smmu_rpm_get(smmu);
> + if (ret < 0)
> + return;
> +
> /*
> * Disable the context bank and free the page tables before freeing
> * it.
> @@ -932,6 +952,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
>
> free_io_pgtable_ops(smmu_domain->pgtbl_ops);
> __arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
> +
> + arm_smmu_rpm_put(smmu);
> }
>
> static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
> @@ -1213,10 +1235,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
> return -ENODEV;
>
> smmu = fwspec_smmu(fwspec);
> +
> + ret = arm_smmu_rpm_get(smmu);
> + if (ret < 0)
> + return ret;
> +
> /* Ensure that the domain is finalised */
> ret = arm_smmu_init_domain_context(domain, smmu);
> if (ret < 0)
> - return ret;
> + goto rpm_put;
>
> /*
> * Sanity check the domain. We don't support domains across
> @@ -1231,10 +1258,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
> }
>
> /* Looks ok, so add the device to the domain */
> - return arm_smmu_domain_add_master(smmu_domain, fwspec);
> + ret = arm_smmu_domain_add_master(smmu_domain, fwspec);
> +
> + arm_smmu_rpm_put(smmu);
> +
> + return ret;
>
> destroy_domain:
> arm_smmu_destroy_domain_context(domain);
> +rpm_put:
> + arm_smmu_rpm_put(smmu);
> +
> return ret;
> }
>
> @@ -1242,22 +1276,36 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> phys_addr_t paddr, size_t size, int prot)
> {
> struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> + struct arm_smmu_device *smmu = smmu_domain->smmu;
> + int ret;
>
> if (!ops)
> return -ENODEV;
>
> - return ops->map(ops, iova, paddr, size, prot);
> + arm_smmu_rpm_get(smmu);
> + ret = ops->map(ops, iova, paddr, size, prot);
> + arm_smmu_rpm_put(smmu);
> +
> + return ret;
> }
>
> static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
> size_t size)
> {
> struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> + struct arm_smmu_device *smmu = smmu_domain->smmu;
> + size_t ret;
>
> if (!ops)
> return 0;
>
> - return ops->unmap(ops, iova, size);
> + arm_smmu_rpm_get(smmu);
> + ret = ops->unmap(ops, iova, size);
> + arm_smmu_rpm_put(smmu);
> +
> + return ret;
> }
>
> static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
> @@ -1412,14 +1460,22 @@ static int arm_smmu_add_device(struct device *dev)
> while (i--)
> cfg->smendx[i] = INVALID_SMENDX;
>
> + ret = arm_smmu_rpm_get(smmu);
> + if (ret < 0)
> + goto out_cfg_free;
> +
> ret = arm_smmu_master_alloc_smes(dev);
> if (ret)
> - goto out_cfg_free;
> + goto out_rpm_put;
>
> iommu_device_link(&smmu->iommu, dev);
>
> + arm_smmu_rpm_put(smmu);
> +
> return 0;
>
> +out_rpm_put:
> + arm_smmu_rpm_put(smmu);
> out_cfg_free:
> kfree(cfg);
> out_free:
> @@ -1432,7 +1488,7 @@ static void arm_smmu_remove_device(struct device *dev)
> struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> struct arm_smmu_master_cfg *cfg;
> struct arm_smmu_device *smmu;
> -
> + int ret;
>
> if (!fwspec || fwspec->ops != &arm_smmu_ops)
> return;
> @@ -1440,8 +1496,15 @@ static void arm_smmu_remove_device(struct device *dev)
> cfg = fwspec->iommu_priv;
> smmu = cfg->smmu;
>
> + ret = arm_smmu_rpm_get(smmu);
> + if (ret < 0)
> + return;
> +
> iommu_device_unlink(&smmu->iommu, dev);
> arm_smmu_master_free_smes(fwspec);
> +
> + arm_smmu_rpm_put(smmu);
> +
> iommu_group_remove_device(dev);
> kfree(fwspec->iommu_priv);
> iommu_fwspec_free(dev);
> @@ -1907,6 +1970,7 @@ struct arm_smmu_match_data {
> enum arm_smmu_implementation model;
> const char * const *clks;
> int num_clks;
> + bool rpm_supported;
> };
>
> #define ARM_SMMU_MATCH_DATA(name, ver, imp) \
> @@ -2029,6 +2093,7 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev,
> smmu->version = data->version;
> smmu->model = data->model;
> smmu->num_clks = data->num_clks;
> + smmu->rpm_supported = data->rpm_supported;
>
> arm_smmu_fill_clk_data(smmu, data->clks);
>
> @@ -2129,6 +2194,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
> smmu->irqs[i] = irq;
> }
>
> + platform_set_drvdata(pdev, smmu);
> +
> err = devm_clk_bulk_get(smmu->dev, smmu->num_clks, smmu->clks);
> if (err)
> return err;
> @@ -2137,6 +2204,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
> if (err)
> return err;
>
> + if (smmu->rpm_supported)
> + pm_runtime_enable(dev);
> +
> + err = arm_smmu_rpm_get(smmu);
> + if (err < 0)
> + return err;
> +
> err = arm_smmu_device_cfg_probe(smmu);
> if (err)
> return err;
> @@ -2178,10 +2252,11 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
> return err;
> }
>
> - platform_set_drvdata(pdev, smmu);
> arm_smmu_device_reset(smmu);
> arm_smmu_test_smr_masks(smmu);
>
> + arm_smmu_rpm_put(smmu);
> +
> /*
> * For ACPI and generic DT bindings, an SMMU will be probed before
> * any device which might need it, so we want the bus ops in place
> @@ -2217,9 +2292,14 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
> if (!bitmap_empty(smmu->context_map, ARM_SMMU_MAX_CBS))
> dev_err(&pdev->dev, "removing device with active domains!\n");
>
> + arm_smmu_rpm_get(smmu);
> /* Turn the thing off */
> writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
>
> + arm_smmu_rpm_put(smmu);
> + if (smmu->rpm_supported)
> + pm_runtime_disable(smmu->dev);
> +
> clk_bulk_unprepare(smmu->num_clks, smmu->clks);
>
> return 0;
>

2018-03-07 12:50:05

by Robin Murphy

Subject: Re: [PATCH v8 4/5] iommu/arm-smmu: Add the device_link between masters and smmu

On 02/03/18 10:10, Vivek Gautam wrote:
> From: Sricharan R <[email protected]>
>
> Finally add the device link between the master device and
> smmu, so that the smmu gets runtime enabled/disabled only when the
> master needs it. This is done from add_device callback which gets
> called once when the master is added to the smmu.
>
> Signed-off-by: Sricharan R <[email protected]>
> Signed-off-by: Vivek Gautam <[email protected]>
> ---
> drivers/iommu/arm-smmu.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 3d6a1875431f..bb1ea82c1003 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -217,6 +217,9 @@ struct arm_smmu_device {
>
> /* IOMMU core code handle */
> struct iommu_device iommu;
> +
> + /* runtime PM link to master */
> + struct device_link *link;

Just the one?

> };
>
> enum arm_smmu_context_fmt {
> @@ -1470,10 +1473,26 @@ static int arm_smmu_add_device(struct device *dev)
>
> iommu_device_link(&smmu->iommu, dev);
>
> + /*
> + * Establish the link between smmu and master, so that the
> + * smmu gets runtime enabled/disabled as per the master's
> + * needs.
> + */
> + smmu->link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);

Maybe I've misunderstood how the API works, but AFAICS the second and
subsequent devices are all just going to overwrite (and leak) the link
of the previous one...

> + if (!smmu->link) {
> + dev_warn(smmu->dev, "Unable to create device link between %s and %s\n",
> + dev_name(smmu->dev), dev_name(dev));
> + ret = -ENODEV;
> + goto out_unlink;
> + }
> +
> arm_smmu_rpm_put(smmu);
>
> return 0;
>
> +out_unlink:
> + iommu_device_unlink(&smmu->iommu, dev);
> + arm_smmu_master_free_smes(fwspec);
> out_rpm_put:
> arm_smmu_rpm_put(smmu);
> out_cfg_free:
> @@ -1496,6 +1515,8 @@ static void arm_smmu_remove_device(struct device *dev)
> cfg = fwspec->iommu_priv;
> smmu = cfg->smmu;
>
> + device_link_del(smmu->link);

...and equivalently you end up with a double-free (or more) here of a
link which may not have belonged to dev anyway.

Robin.
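
One way to avoid the overwrite and the double-free described above would be to keep the link in per-master state instead of in the shared SMMU structure. The sketch below is a userspace model only: struct master_cfg, master_add() and master_remove() are hypothetical names, the device_link_add()/device_link_del() bodies are stubs built on malloc()/free(), and the DL_FLAG_PM_RUNTIME value is picked just for this example.

```c
#include <stdlib.h>

/* Userspace stand-ins for the kernel types; not the real definitions. */
struct device {
	const char *name;
};

struct device_link {
	struct device *consumer;
	struct device *supplier;
};

#define DL_FLAG_PM_RUNTIME (1U << 2)	/* value chosen for this sketch only */

static int live_links;	/* links created and not yet deleted */

/* Stub with the same argument order as the kernel's device_link_add(). */
static struct device_link *device_link_add(struct device *consumer,
					   struct device *supplier,
					   unsigned int flags)
{
	struct device_link *link = malloc(sizeof(*link));

	(void)flags;
	if (!link)
		return NULL;

	link->consumer = consumer;
	link->supplier = supplier;
	live_links++;
	return link;
}

static void device_link_del(struct device_link *link)
{
	free(link);
	live_links--;
}

/* Hypothetical per-master state: each master owns exactly one link. */
struct master_cfg {
	struct device *smmu_dev;
	struct device_link *link;
};

static int master_add(struct master_cfg *cfg, struct device *master,
		      struct device *smmu_dev)
{
	cfg->smmu_dev = smmu_dev;
	cfg->link = device_link_add(master, smmu_dev, DL_FLAG_PM_RUNTIME);
	return cfg->link ? 0 : -1;
}

static void master_remove(struct master_cfg *cfg)
{
	/* Only ever delete the link this master created, and only once. */
	if (cfg->link) {
		device_link_del(cfg->link);
		cfg->link = NULL;
	}
}
```

Because every master carries its own pointer, adding a second master cannot clobber the first one's link, and removing a master cannot free a link that belonged to a different device.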

> +
> ret = arm_smmu_rpm_get(smmu);
> if (ret < 0)
> return;
>

2018-03-07 13:54:28

by Tomasz Figa

Subject: Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy <[email protected]> wrote:
> On 02/03/18 10:10, Vivek Gautam wrote:
>>
>> From: Sricharan R <[email protected]>
>>
>> The smmu device probe/remove and add/remove master device callbacks
>> gets called when the smmu is not linked to its master, that is without
>> the context of the master device. So calling runtime apis in those places
>> separately.
>>
>> Signed-off-by: Sricharan R <[email protected]>
>> [vivek: Cleanup pm runtime calls]
>> Signed-off-by: Vivek Gautam <[email protected]>
>> ---
>> drivers/iommu/arm-smmu.c | 96
>> ++++++++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 88 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index c8b16f53f597..3d6a1875431f 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>> struct clk_bulk_data *clks;
>> int num_clks;
>> + bool rpm_supported;
>> +
>
>
> Can we not automatically infer this from whether clocks and/or power domains
> are specified or not, then just use pm_runtime_enabled() as the fast-path
> check as Tomasz originally proposed?

I wouldn't tie this to presence of clocks, since as a next step we
would want to actually control the clocks separately. (As far as I
understand, on QCom SoCs we might want to have runtime PM active for
the translation to work, but clocks gated whenever access to SMMU
registers is not needed.) Moreover, you might still have some super
high scale thousand-core systems that require clocks to be
prepare-enabled, but runtime PM would be undesirable for the reasons
we discussed before.

>
> I worry that relying on statically-defined matchdata is just going to blow
> up the driver and DT binding into a maintenance nightmare; I really don't
> want to start needing separate definitions for e.g. "arm,juno-etr-mmu-401"
> and "arm,juno-hdlcd-mmu-401" just because one otherwise-identical instance
> within the SoC is in a separate controllable power domain while the others
> aren't.

I don't see a reason why both couldn't just have RPM supported
regardless of whether there is a real power domain. It would
effectively be just a no-op for those that don't have one. IMHO the
only reason to avoid having the RPM enabled is the scalability issue
we discussed before.

Best regards,
Tomasz

2018-03-07 17:01:16

by Robin Murphy

Subject: Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

On 07/03/18 13:52, Tomasz Figa wrote:
> On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy <[email protected]> wrote:
>> On 02/03/18 10:10, Vivek Gautam wrote:
>>>
>>> From: Sricharan R <[email protected]>
>>>
>>> The smmu device probe/remove and add/remove master device callbacks
>>> gets called when the smmu is not linked to its master, that is without
>>> the context of the master device. So calling runtime apis in those places
>>> separately.
>>>
>>> Signed-off-by: Sricharan R <[email protected]>
>>> [vivek: Cleanup pm runtime calls]
>>> Signed-off-by: Vivek Gautam <[email protected]>
>>> ---
>>> drivers/iommu/arm-smmu.c | 96
>>> ++++++++++++++++++++++++++++++++++++++++++++----
>>> 1 file changed, 88 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index c8b16f53f597..3d6a1875431f 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>>> struct clk_bulk_data *clks;
>>> int num_clks;
>>> + bool rpm_supported;
>>> +
>>
>>
>> Can we not automatically infer this from whether clocks and/or power domains
>> are specified or not, then just use pm_runtime_enabled() as the fast-path
>> check as Tomasz originally proposed?
>
> I wouldn't tie this to presence of clocks, since as a next step we
> would want to actually control the clocks separately. (As far as I
> understand, on QCom SoCs we might want to have runtime PM active for
> the translation to work, but clocks gated whenever access to SMMU
> registers is not needed.) Moreover, you might still have some super
> high scale thousand-core systems that require clocks to be
> prepare-enabled, but runtime PM would be undesirable for the reasons
> we discussed before.
>
>>
>> I worry that relying on statically-defined matchdata is just going to blow
>> up the driver and DT binding into a maintenance nightmare; I really don't
>> want to start needing separate definitions for e.g. "arm,juno-etr-mmu-401"
>> and "arm,juno-hdlcd-mmu-401" just because one otherwise-identical instance
>> within the SoC is in a separate controllable power domain while the others
>> aren't.
>
> I don't see a reason why both couldn't just have RPM supported
> regardless of whether there is a real power domain. It would
> effectively be just a no-op for those that don't have one.

Because you're then effectively defining "compatible" values for the
sake of attaching software policy to them, rather than actually
describing different hardware implementations.

The fact that RPM can't do anything meaningful unless relevant
clock/power aspects *are* described, however, means that we shouldn't
need additional information redundant with that. Much like the fact that
we don't *already* have an "arm,juno-hdlcd-mmu-401" compatible to
account for those being integrated such that IDR0.CTTW has the wrong
value, since the presence or not of the "dma-coherent" property already
describes the truth in that regard.

> IMHO the
> only reason to avoid having the RPM enabled is the scalability issue
> we discussed before.

Yes, but that's kind of my point; in reality high throughput/minimal
latency and aggressive power management are more or less mutually
exclusive. Mobile SoCs with fine-grained clock trees and power domains
won't have multiple 40GbE/NVMf/whatever links running flat out in
parallel; conversely networking/infrastructure/server SoCs aren't
designed around saving every last microamp of leakage current - even in
the (fairly unlikely) case of the interconnect clocks being
software-gateable at all I would be very surprised if that were ever
exposed directly to Linux (FWIW I believe ACPI essentially *requires*
clocks to be abstracted behind firmware).

Realistically then, explicit clocks are only expected on systems which
care about power management. We can always revisit that assumption if
anything crazy where it isn't the case ever becomes non-theoretical, but
for now it's one I'm entirely comfortable with. If on the other hand it
turns out that we can rely on just a power domain being present wherever
we want RPM, making clocks moot, then all the better.

Robin.

2018-03-08 04:36:38

by Tomasz Figa

Subject: Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

On Thu, Mar 8, 2018 at 1:58 AM, Robin Murphy <[email protected]> wrote:
> On 07/03/18 13:52, Tomasz Figa wrote:
>>
>> On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy <[email protected]> wrote:
>>>
>>> On 02/03/18 10:10, Vivek Gautam wrote:
>>>>
>>>>
>>>> From: Sricharan R <[email protected]>
>>>>
>>>> The smmu device probe/remove and add/remove master device callbacks
>>>> gets called when the smmu is not linked to its master, that is without
>>>> the context of the master device. So calling runtime apis in those
>>>> places
>>>> separately.
>>>>
>>>> Signed-off-by: Sricharan R <[email protected]>
>>>> [vivek: Cleanup pm runtime calls]
>>>> Signed-off-by: Vivek Gautam <[email protected]>
>>>> ---
>>>> drivers/iommu/arm-smmu.c | 96
>>>> ++++++++++++++++++++++++++++++++++++++++++++----
>>>> 1 file changed, 88 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>> index c8b16f53f597..3d6a1875431f 100644
>>>> --- a/drivers/iommu/arm-smmu.c
>>>> +++ b/drivers/iommu/arm-smmu.c
>>>> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>>>> struct clk_bulk_data *clks;
>>>> int num_clks;
>>>> + bool rpm_supported;
>>>> +
>>>
>>>
>>>
>>> Can we not automatically infer this from whether clocks and/or power
>>> domains
>>> are specified or not, then just use pm_runtime_enabled() as the fast-path
>>> check as Tomasz originally proposed?
>>
>>
>> I wouldn't tie this to presence of clocks, since as a next step we
>> would want to actually control the clocks separately. (As far as I
>> understand, on QCom SoCs we might want to have runtime PM active for
>> the translation to work, but clocks gated whenever access to SMMU
>> registers is not needed.) Moreover, you might still have some super
>> high scale thousand-core systems that require clocks to be
>> prepare-enabled, but runtime PM would be undesirable for the reasons
>> we discussed before.
>>
>>>
>>> I worry that relying on statically-defined matchdata is just going to
>>> blow
>>> up the driver and DT binding into a maintenance nightmare; I really don't
>>> want to start needing separate definitions for e.g.
>>> "arm,juno-etr-mmu-401"
>>> and "arm,juno-hdlcd-mmu-401" just because one otherwise-identical
>>> instance
>>> within the SoC is in a separate controllable power domain while the
>>> others
>>> aren't.
>>
>>
>> I don't see a reason why both couldn't just have RPM supported
>> regardless of whether there is a real power domain. It would
>> effectively be just a no-op for those that don't have one.
>
>
> Because you're then effectively defining "compatible" values for the sake of
> attaching software policy to them, rather than actually describing different
> hardware implementations.
>
> The fact that RPM can't do anything meaningful unless relevant clock/power
> aspects *are* described, however, means that we shouldn't need additional
> information redundant with that. Much like the fact that we don't *already*
> have an "arm,juno-hdlcd-mmu-401" compatible to account for those being
> integrated such that IDR0.CTTW has the wrong value, since the presence or
> not of the "dma-coherent" property already describes the truth in that
> regard.

Fair enough.

>
>> IMHO the
>> only reason to avoid having the RPM enabled is the scalability issue
>> we discussed before.
>
>
> Yes, but that's kind of my point; in reality high throughput/minimal latency
> and aggressive power management are more or less mutually exclusive. Mobile
> SoCs with fine-grained clock trees and power domains won't have multiple
> 40GBe/NVMf/whatever links running flat out in parallel; conversely
> networking/infrastructure/server SoCs aren't designed around saving every
> last microamp of leakage current - even in the (fairly unlikely) case of the
> interconnect clocks being software-gateable at all I would be very surprised
> if that were ever exposed directly to Linux (FWIW I believe ACPI essentially
> *requires* clocks to be abstracted behind firmware).
>
> Realistically then, explicit clocks are only expected on systems which care
> about power management. We can always revisit that assumption if anything
> crazy where it isn't the case ever becomes non-theoretical, but for now it's
> one I'm entirely comfortable with. If on the other hand it turns out that we
> can rely on just a power domain being present wherever we want RPM, making
> clocks moot, then all the better.

Alright. Since Qcom would be the only user of clock and power handling
for the time being, I think checking power domain presence could work
for us. +/- the fact that clocks need to be handled even if power
domain is not present, but we should normally always have both.

Now we need a way to do the check. Perhaps for the time being it would
be enough to just check for the power-domains property in DT?

Best regards,
Tomasz
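
As a rough illustration of the DT-based check floated above, the sketch below looks for a "power-domains" property on the SMMU's node. It is a userspace model: struct property and struct device_node are cut-down stand-ins, of_find_property() is a stub that only mimics a lookup by name, and smmu_has_power_domain() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Cut-down stand-ins for the kernel's DT structures. */
struct property {
	const char *name;
	struct property *next;
};

struct device_node {
	struct property *properties;
};

/* Stub mimicking a property lookup by name; lenp is optional. */
static struct property *of_find_property(const struct device_node *np,
					 const char *name, int *lenp)
{
	struct property *pp;

	if (!np)
		return NULL;

	for (pp = np->properties; pp; pp = pp->next) {
		if (strcmp(pp->name, name) == 0) {
			if (lenp)
				*lenp = 0;	/* value length is not modelled */
			return pp;
		}
	}

	return NULL;
}

/* Hypothetical helper: does this node describe a power domain at all? */
static bool smmu_has_power_domain(const struct device_node *np)
{
	return of_find_property(np, "power-domains", NULL) != NULL;
}
```

A check of this kind only says that the property is present in the device tree; it does not show that a power domain driver has actually attached to the device.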

2018-03-08 05:34:12

by Vivek Gautam

Subject: Re: [PATCH v8 1/5] iommu/arm-smmu: Destroy domain context in failure path

On Wed, Mar 7, 2018 at 5:50 PM, Robin Murphy <[email protected]> wrote:
> On 02/03/18 10:10, Vivek Gautam wrote:
>>
>> If we fail after initializing domain_context, we should destroy
>> the context to free up resources.
>
>
> Have another think about why the "problem" this patch caters for cannot ever
> happen (hint: consider how domain->smmu is used in
> arm_smmu_init_domain_context()). And then also about the really
> catastrophically bad problem it actually introduces (hint:
> "iommu_attach(domain, good_dev); iommu_attach(domain, bad_dev);")

Got it, we would end up destroying good_dev's domain context with this.
Thanks

regards
Vivek
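
The failure mode acknowledged above can be modelled in a few lines of userspace C. This is a simplified sketch, not the driver code: struct domain, struct master, attach_dev() and the destroy_on_mismatch switch are hypothetical, and "ctx_live" merely stands for the context bank and page tables being intact. As in the hint about domain->smmu, the init step returns early once the domain is already bound to an SMMU, so the mismatch check can only fail for a domain that an earlier attach already finalised.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct smmu {
	int id;
};

struct master {
	struct smmu *smmu;	/* the SMMU this master sits behind */
};

struct domain {
	struct smmu *smmu;	/* set once the context is initialised */
	bool ctx_live;		/* models context bank + page tables */
};

static int init_domain_context(struct domain *d, struct smmu *s)
{
	if (d->smmu)		/* already finalised by an earlier attach */
		return 0;

	d->smmu = s;
	d->ctx_live = true;
	return 0;
}

static void destroy_domain_context(struct domain *d)
{
	d->ctx_live = false;
}

/*
 * destroy_on_mismatch == true models the proposed "goto destroy_domain";
 * false models the existing plain "return -EINVAL".
 */
static int attach_dev(struct domain *d, struct master *m,
		      bool destroy_on_mismatch)
{
	int ret = init_domain_context(d, m->smmu);

	if (ret < 0)
		return ret;

	if (d->smmu != m->smmu) {
		if (destroy_on_mismatch)
			destroy_domain_context(d);
		return -EINVAL;
	}

	return 0;
}
```

Attaching good_dev and then a bad_dev that sits behind a different SMMU leaves ctx_live false when destroy_on_mismatch is set, even though good_dev is still attached to that domain; with the plain return the context survives.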

>
> Robin.
>
>
>> Signed-off-by: Vivek Gautam <[email protected]>
>> ---
>>
>> * New patch added in this series.
>>
>> drivers/iommu/arm-smmu.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 69e7c60792a8..ffc152c36002 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -1223,11 +1223,16 @@ static int arm_smmu_attach_dev(struct iommu_domain
>> *domain, struct device *dev)
>> dev_err(dev,
>> "cannot attach to SMMU %s whilst already attached
>> to domain on SMMU %s\n",
>> dev_name(smmu_domain->smmu->dev),
>> dev_name(smmu->dev));
>> - return -EINVAL;
>> + ret = -EINVAL;
>> + goto destroy_domain;
>> }
>> /* Looks ok, so add the device to the domain */
>> return arm_smmu_domain_add_master(smmu_domain, fwspec);
>> +
>> +destroy_domain:
>> + arm_smmu_destroy_domain_context(domain);
>> + return ret;
>> }
>> static int arm_smmu_map(struct iommu_domain *domain, unsigned long
>> iova,
>>
>



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-03-08 06:36:40

by Vivek Gautam

Subject: Re: [PATCH v8 4/5] iommu/arm-smmu: Add the device_link between masters and smmu

On Wed, Mar 7, 2018 at 6:17 PM, Robin Murphy <[email protected]> wrote:
> On 02/03/18 10:10, Vivek Gautam wrote:
>>
>> From: Sricharan R <[email protected]>
>>
>> Finally add the device link between the master device and
>> smmu, so that the smmu gets runtime enabled/disabled only when the
>> master needs it. This is done from add_device callback which gets
>> called once when the master is added to the smmu.
>>
>> Signed-off-by: Sricharan R <[email protected]>
>> Signed-off-by: Vivek Gautam <[email protected]>
>> ---
>> drivers/iommu/arm-smmu.c | 21 +++++++++++++++++++++
>> 1 file changed, 21 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 3d6a1875431f..bb1ea82c1003 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -217,6 +217,9 @@ struct arm_smmu_device {
>> /* IOMMU core code handle */
>> struct iommu_device iommu;
>> +
>> + /* runtime PM link to master */
>> + struct device_link *link;
>
>
> Just the one?
>
>> };
>> enum arm_smmu_context_fmt {
>> @@ -1470,10 +1473,26 @@ static int arm_smmu_add_device(struct device *dev)
>> iommu_device_link(&smmu->iommu, dev);
>> + /*
>> + * Establish the link between smmu and master, so that the
>> + * smmu gets runtime enabled/disabled as per the master's
>> + * needs.
>> + */
>> + smmu->link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
>
>
> Maybe I've misunderstood how the API works, but AFAICS the second and
> subsequent devices are all just going to overwrite (and leak) the link of
> the previous one...

Sorry, my bad. Will take care of this.

regards
Vivek

>
>> + if (!smmu->link) {
>> + dev_warn(smmu->dev, "Unable to create device link between
>> %s and %s\n",
>> + dev_name(smmu->dev), dev_name(dev));
>> + ret = -ENODEV;
>> + goto out_unlink;
>> + }
>> +
>> arm_smmu_rpm_put(smmu);
>> return 0;
>> +out_unlink:
>> + iommu_device_unlink(&smmu->iommu, dev);
>> + arm_smmu_master_free_smes(fwspec);
>> out_rpm_put:
>> arm_smmu_rpm_put(smmu);
>> out_cfg_free:
>> @@ -1496,6 +1515,8 @@ static void arm_smmu_remove_device(struct device
>> *dev)
>> cfg = fwspec->iommu_priv;
>> smmu = cfg->smmu;
>> + device_link_del(smmu->link);
>
>
> ...and equivalently you end up with a double-free (or more) here of a link
> which may not have belonged to dev anyway.
>
> Robin.
>
>
>> +
>> ret = arm_smmu_rpm_get(smmu);
>> if (ret < 0)
>> return;
>>
>




2018-03-08 12:13:39

by Robin Murphy

Subject: Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

On 08/03/18 04:33, Tomasz Figa wrote:
> On Thu, Mar 8, 2018 at 1:58 AM, Robin Murphy <[email protected]> wrote:
>> On 07/03/18 13:52, Tomasz Figa wrote:
>>>
>>> On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy <[email protected]> wrote:
>>>>
>>>> On 02/03/18 10:10, Vivek Gautam wrote:
>>>>>
>>>>>
>>>>> From: Sricharan R <[email protected]>
>>>>>
>>>>> The smmu device probe/remove and add/remove master device callbacks
>>>>> gets called when the smmu is not linked to its master, that is without
>>>>> the context of the master device. So calling runtime apis in those
>>>>> places
>>>>> separately.
>>>>>
>>>>> Signed-off-by: Sricharan R <[email protected]>
>>>>> [vivek: Cleanup pm runtime calls]
>>>>> Signed-off-by: Vivek Gautam <[email protected]>
>>>>> ---
>>>>> drivers/iommu/arm-smmu.c | 96
>>>>> ++++++++++++++++++++++++++++++++++++++++++++----
>>>>> 1 file changed, 88 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>>> index c8b16f53f597..3d6a1875431f 100644
>>>>> --- a/drivers/iommu/arm-smmu.c
>>>>> +++ b/drivers/iommu/arm-smmu.c
>>>>> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>>>>> struct clk_bulk_data *clks;
>>>>> int num_clks;
>>>>> + bool rpm_supported;
>>>>> +
>>>>
>>>>
>>>>
>>>> Can we not automatically infer this from whether clocks and/or power
>>>> domains
>>>> are specified or not, then just use pm_runtime_enabled() as the fast-path
>>>> check as Tomasz originally proposed?
>>>
>>>
>>> I wouldn't tie this to presence of clocks, since as a next step we
>>> would want to actually control the clocks separately. (As far as I
>>> understand, on QCom SoCs we might want to have runtime PM active for
>>> the translation to work, but clocks gated whenever access to SMMU
>>> registers is not needed.) Moreover, you might still have some super
>>> high scale thousand-core systems that require clocks to be
>>> prepare-enabled, but runtime PM would be undesirable for the reasons
>>> we discussed before.
>>>
>>>>
>>>> I worry that relying on statically-defined matchdata is just going to
>>>> blow
>>>> up the driver and DT binding into a maintenance nightmare; I really don't
>>>> want to start needing separate definitions for e.g.
>>>> "arm,juno-etr-mmu-401"
>>>> and "arm,juno-hdlcd-mmu-401" just because one otherwise-identical
>>>> instance
>>>> within the SoC is in a separate controllable power domain while the
>>>> others
>>>> aren't.
>>>
>>>
>>> I don't see a reason why both couldn't just have RPM supported
>>> regardless of whether there is a real power domain. It would
>>> effectively be just a no-op for those that don't have one.
>>
>>
>> Because you're then effectively defining "compatible" values for the sake of
>> attaching software policy to them, rather than actually describing different
>> hardware implementations.
>>
>> The fact that RPM can't do anything meaningful unless relevant clock/power
>> aspects *are* described, however, means that we shouldn't need additional
>> information redundant with that. Much like the fact that we don't *already*
>> have an "arm,juno-hdlcd-mmu-401" compatible to account for those being
>> integrated such that IDR0.CTTW has the wrong value, since the presence or
>> not of the "dma-coherent" property already describes the truth in that
>> regard.
>
> Fair enough.
>
>>
>>> IMHO the
>>> only reason to avoid having the RPM enabled is the scalability issue
>>> we discussed before.
>>
>>
>> Yes, but that's kind of my point; in reality high throughput/minimal latency
>> and aggressive power management are more or less mutually exclusive. Mobile
>> SoCs with fine-grained clock trees and power domains won't have multiple
>> 40GBe/NVMf/whatever links running flat out in parallel; conversely
>> networking/infrastructure/server SoCs aren't designed around saving every
>> last microamp of leakage current - even in the (fairly unlikely) case of the
>> interconnect clocks being software-gateable at all I would be very surprised
>> if that were ever exposed directly to Linux (FWIW I believe ACPI essentially
>> *requires* clocks to be abstracted behind firmware).
>>
>> Realistically then, explicit clocks are only expected on systems which care
>> about power management. We can always revisit that assumption if anything
>> crazy where it isn't the case ever becomes non-theoretical, but for now it's
>> one I'm entirely comfortable with. If on the other hand it turns out that we
>> can rely on just a power domain being present wherever we want RPM, making
>> clocks moot, then all the better.
>
> Alright. Since Qcom would be the only user of clock and power handling
> for the time being, I think checking power domain presence could work
> for us. +/- the fact that clocks need to be handled even if power
> domain is not present, but we should normally always have both.

Great! (the issue of Qcom-specific clock handling is a separate argument
which I don't feel like reigniting just now...)

> Now we need a way to do the check. Perhaps for the time being it would
> be enough to just check for the power-domains property in DT?

AFAICS, it might be as simple as arm_smmu_device_probe() doing this:

	/*
	 * We want to avoid touching dev->power.lock in fastpaths unless
	 * it's really going to do something useful - pm_runtime_enabled()
	 * can serve as an ideal proxy for that decision.
	 */
	if (dev->pm_domain)
		pm_runtime_enable(dev);

or maybe even just gate all the calls with "if (smmu->dev->pm_domain)"
directly (like pcie-mediatek does), but I'm not sure which would be
conceptually cleaner.

Robin.
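
For comparison, the second option mentioned above (gating every call directly on the device's pm_domain pointer) might look something like the sketch below. The types and the pm_runtime_*() bodies are userspace stand-ins and not the kernel definitions; the helper names are reused from the patch under discussion purely for familiarity.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel types; not the real definitions. */
struct dev_pm_domain {
	int unused;
};

struct device {
	struct dev_pm_domain *pm_domain;	/* NULL when no power domain */
	int usage_count;			/* models the usage counter */
};

struct arm_smmu_device {
	struct device *dev;
};

/* Stubs that borrow the kernel API names, for illustration only. */
static int pm_runtime_get_sync(struct device *dev)
{
	dev->usage_count++;
	return 0;
}

static int pm_runtime_put(struct device *dev)
{
	dev->usage_count--;
	return 0;
}

/* Gate directly on the power domain rather than on runtime PM state. */
static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
{
	if (smmu->dev->pm_domain)
		return pm_runtime_get_sync(smmu->dev);

	return 0;
}

static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
{
	if (smmu->dev->pm_domain)
		pm_runtime_put(smmu->dev);
}
```

Either variant keeps devices without a power domain off the runtime PM path; the difference is only whether the decision is taken once at probe time or repeated at each call site.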

2018-03-09 04:52:08

by Tomasz Figa

Subject: Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

On Thu, Mar 8, 2018 at 9:12 PM, Robin Murphy <[email protected]> wrote:
> On 08/03/18 04:33, Tomasz Figa wrote:
>>
>> On Thu, Mar 8, 2018 at 1:58 AM, Robin Murphy <[email protected]> wrote:
>>>
>>> On 07/03/18 13:52, Tomasz Figa wrote:
>>>>
>>>>
>>>> On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy <[email protected]>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 02/03/18 10:10, Vivek Gautam wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> From: Sricharan R <[email protected]>
>>>>>>
>>>>>> The smmu device probe/remove and add/remove master device callbacks
>>>>>> gets called when the smmu is not linked to its master, that is without
>>>>>> the context of the master device. So calling runtime apis in those
>>>>>> places
>>>>>> separately.
>>>>>>
>>>>>> Signed-off-by: Sricharan R <[email protected]>
>>>>>> [vivek: Cleanup pm runtime calls]
>>>>>> Signed-off-by: Vivek Gautam <[email protected]>
>>>>>> ---
>>>>>> drivers/iommu/arm-smmu.c | 96
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++----
>>>>>> 1 file changed, 88 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>>>> index c8b16f53f597..3d6a1875431f 100644
>>>>>> --- a/drivers/iommu/arm-smmu.c
>>>>>> +++ b/drivers/iommu/arm-smmu.c
>>>>>> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>>>>>> struct clk_bulk_data *clks;
>>>>>> int num_clks;
>>>>>> + bool rpm_supported;
>>>>>> +
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Can we not automatically infer this from whether clocks and/or power
>>>>> domains
>>>>> are specified or not, then just use pm_runtime_enabled() as the
>>>>> fast-path
>>>>> check as Tomasz originally proposed?
>>>>
>>>>
>>>>
>>>> I wouldn't tie this to presence of clocks, since as a next step we
>>>> would want to actually control the clocks separately. (As far as I
>>>> understand, on QCom SoCs we might want to have runtime PM active for
>>>> the translation to work, but clocks gated whenever access to SMMU
>>>> registers is not needed.) Moreover, you might still have some super
>>>> high scale thousand-core systems that require clocks to be
>>>> prepare-enabled, but runtime PM would be undesirable for the reasons
>>>> we discussed before.
>>>>
>>>>>
>>>>> I worry that relying on statically-defined matchdata is just going to
>>>>> blow
>>>>> up the driver and DT binding into a maintenance nightmare; I really
>>>>> don't
>>>>> want to start needing separate definitions for e.g.
>>>>> "arm,juno-etr-mmu-401"
>>>>> and "arm,juno-hdlcd-mmu-401" just because one otherwise-identical
>>>>> instance
>>>>> within the SoC is in a separate controllable power domain while the
>>>>> others
>>>>> aren't.
>>>>
>>>>
>>>>
>>>> I don't see a reason why both couldn't just have RPM supported
>>>> regardless of whether there is a real power domain. It would
>>>> effectively be just a no-op for those that don't have one.
>>>
>>>
>>>
>>> Because you're then effectively defining "compatible" values for the sake
>>> of
>>> attaching software policy to them, rather than actually describing
>>> different
>>> hardware implementations.
>>>
>>> The fact that RPM can't do anything meaningful unless relevant
>>> clock/power
>>> aspects *are* described, however, means that we shouldn't need additional
>>> information redundant with that. Much like the fact that we don't
>>> *already*
>>> have an "arm,juno-hdlcd-mmu-401" compatible to account for those being
>>> integrated such that IDR0.CTTW has the wrong value, since the presence or
>>> not of the "dma-coherent" property already describes the truth in that
>>> regard.
>>
>>
>> Fair enough.
>>
>>>
>>>> IMHO the
>>>> only reason to avoid having the RPM enabled is the scalability issue
>>>> we discussed before.
>>>
>>>
>>>
>>> Yes, but that's kind of my point; in reality high throughput/minimal
>>> latency
>>> and aggressive power management are more or less mutually exclusive.
>>> Mobile
>>> SoCs with fine-grained clock trees and power domains won't have multiple
>>> 40GBe/NVMf/whatever links running flat out in parallel; conversely
>>> networking/infrastructure/server SoCs aren't designed around saving every
>>> last microamp of leakage current - even in the (fairly unlikely) case of
>>> the
>>> interconnect clocks being software-gateable at all I would be very
>>> surprised
>>> if that were ever exposed directly to Linux (FWIW I believe ACPI
>>> essentially
>>> *requires* clocks to be abstracted behind firmware).
>>>
>>> Realistically then, explicit clocks are only expected on systems which
>>> care
>>> about power management. We can always revisit that assumption if anything
>>> crazy where it isn't the case ever becomes non-theoretical, but for now
>>> it's
>>> one I'm entirely comfortable with. If on the other hand it turns out that
>>> we
>>> can rely on just a power domain being present wherever we want RPM,
>>> making
>>> clocks moot, then all the better.
>>
>>
>> Alright. Since Qcom would be the only user of clock and power handling
>> for the time being, I think checking power domain presence could work
>> for us. +/- the fact that clocks need to be handled even if power
>> domain is not present, but we should normally always have both.
>
>
> Great! (the issue of Qcom-specific clock handling is a separate argument
> which I don't feel like reigniting just now...)
>
>> Now we need a way to do the check. Perhaps for the time being it would
>> be enough to just check for the power-domains property in DT?
>
>
> AFAICS, it might be as simple as arm_smmu_probe() doing this:
>
> /*
> * We want to avoid touching dev->power.lock in fastpaths unless
> * it's really going to do something useful - pm_runtime_enabled()
> * can serve as an ideal proxy for that decision.
> */
> if (dev->pm_domain)
> pm_runtime_enable(dev);
>
> or maybe even just gate all the calls with "if (smmu->dev.pm_domain)"
> directly (like pcie-mediatek does), but I'm not sure which would be
> conceptually cleaner.

Okay, that was easier than I expected. Thanks. :)

Actually, there is one more thing that might need rechecking. Are you
sure that dev->pm_domain is NULL for the devices for which we don't
want runtime PM to be enabled? I think ACPI was mentioned, and ACPI
includes the concept of PM domains.

Best regards,
Tomasz

2018-03-09 07:13:00

by Vivek Gautam

Subject: Re: [PATCH v8 4/5] iommu/arm-smmu: Add the device_link between masters and smmu

On Thu, Mar 8, 2018 at 10:29 AM, Vivek Gautam
<[email protected]> wrote:
> On Wed, Mar 7, 2018 at 6:17 PM, Robin Murphy <[email protected]> wrote:
>> On 02/03/18 10:10, Vivek Gautam wrote:
>>>
>>> From: Sricharan R <[email protected]>
>>>
>>> Finally add the device link between the master device and
>>> smmu, so that the smmu gets runtime enabled/disabled only when the
>>> master needs it. This is done from add_device callback which gets
>>> called once when the master is added to the smmu.
>>>
>>> Signed-off-by: Sricharan R <[email protected]>
>>> Signed-off-by: Vivek Gautam <[email protected]>
>>> ---
>>> drivers/iommu/arm-smmu.c | 21 +++++++++++++++++++++
>>> 1 file changed, 21 insertions(+)
>>>
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index 3d6a1875431f..bb1ea82c1003 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -217,6 +217,9 @@ struct arm_smmu_device {
>>> /* IOMMU core code handle */
>>> struct iommu_device iommu;
>>> +
>>> + /* runtime PM link to master */
>>> + struct device_link *link;
>>
>>
>> Just the one?

We will either have to count all the devices that are present on the
iommu bus, or maintain a list to which all the links can be added.
But to add the list, we will have to initialize a LIST_HEAD in struct
device_link as well.

Or, I think we don't even need to keep a pointer to the link in the smmu.
In arm_smmu_remove_device(), we can find the correct link and delete it.

	list_for_each_entry(link, &dev->links.suppliers, c_node) {
		if (link->supplier == smmu->dev) {
			device_link_del(link);
			break;
		}
	}

Should that be fine?

Rafael, does the above snippet look right to you? Context: smmu->dev
is the supplier, and dev is the consumer. We want to find the link,
and delete it.

regards
Vivek
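
The lookup-and-delete idea above can be modelled in plain C. This is a
sketch under stated assumptions, not kernel code: struct link is a
simplified stand-in for struct device_link, the supplier list is a
singly linked list rather than a kernel list_head, and
model_link_add() and model_link_del_supplier() are hypothetical names.
The point it illustrates is that the walk must stop as soon as the
matching node has been removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct device;

/* Simplified stand-in for struct device_link: one node per supplier. */
struct link {
	struct device *supplier;
	struct link *next;
};

struct device {
	const char *name;
	struct link *suppliers; /* models dev->links.suppliers */
};

/* Add a link from consumer to supplier at the head of the list. */
struct link *model_link_add(struct device *consumer, struct device *supplier)
{
	struct link *l = malloc(sizeof(*l));

	if (!l)
		return NULL;
	l->supplier = supplier;
	l->next = consumer->suppliers;
	consumer->suppliers = l;
	return l;
}

/*
 * Walk the consumer's supplier list and delete the first link whose
 * supplier matches. Returns 1 if a link was removed, 0 otherwise.
 */
int model_link_del_supplier(struct device *consumer, struct device *supplier)
{
	struct link **pp;

	for (pp = &consumer->suppliers; *pp; pp = &(*pp)->next) {
		if ((*pp)->supplier == supplier) {
			struct link *victim = *pp;

			*pp = victim->next; /* unlink before freeing */
			free(victim);
			return 1;           /* stop: the node is gone */
		}
	}
	return 0;
}
```

With this shape the smmu does not need to store any link pointer; the
consumer's own list is the only bookkeeping.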

>>
>>> };
>>> enum arm_smmu_context_fmt {
>>> @@ -1470,10 +1473,26 @@ static int arm_smmu_add_device(struct device *dev)
>>> iommu_device_link(&smmu->iommu, dev);
>>> + /*
>>> + * Establish the link between smmu and master, so that the
>>> + * smmu gets runtime enabled/disabled as per the master's
>>> + * needs.
>>> + */
>>> + smmu->link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
>>
>>
>> Maybe I've misunderstood how the API works, but AFAICS the second and
>> subsequent devices are all just going to overwrite (and leak) the link of
>> the previous one...
>
> Sorry, my bad. Will take care of this.
>
> regards
> Vivek
>
>>
>>> + if (!smmu->link) {
>>> + dev_warn(smmu->dev, "Unable to create device link between
>>> %s and %s\n",
>>> + dev_name(smmu->dev), dev_name(dev));
>>> + ret = -ENODEV;
>>> + goto out_unlink;
>>> + }
>>> +
>>> arm_smmu_rpm_put(smmu);
>>> return 0;
>>> +out_unlink:
>>> + iommu_device_unlink(&smmu->iommu, dev);
>>> + arm_smmu_master_free_smes(fwspec);
>>> out_rpm_put:
>>> arm_smmu_rpm_put(smmu);
>>> out_cfg_free:
>>> @@ -1496,6 +1515,8 @@ static void arm_smmu_remove_device(struct device
>>> *dev)
>>> cfg = fwspec->iommu_priv;
>>> smmu = cfg->smmu;
>>> + device_link_del(smmu->link);
>>
>>
>> ...and equivalently you end up with a double-free (or more) here of a link
>> which may not have belonged to dev anyway.
>>
>> Robin.
>>
>>
>>> +
>>> ret = arm_smmu_rpm_get(smmu);
>>> if (ret < 0)
>>> return;
>>>
>>
>
>
>



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2018-03-09 10:41:34

by Vivek Gautam

Subject: Re: [PATCH v8 4/5] iommu/arm-smmu: Add the device_link between masters and smmu

On Wed, Mar 7, 2018 at 6:17 PM, Robin Murphy <[email protected]> wrote:
> On 02/03/18 10:10, Vivek Gautam wrote:
>>
>> From: Sricharan R <[email protected]>
>>
>> Finally add the device link between the master device and
>> smmu, so that the smmu gets runtime enabled/disabled only when the
>> master needs it. This is done from add_device callback which gets
>> called once when the master is added to the smmu.
>>
>> Signed-off-by: Sricharan R <[email protected]>
>> Signed-off-by: Vivek Gautam <[email protected]>
>> ---
>> drivers/iommu/arm-smmu.c | 21 +++++++++++++++++++++
>> 1 file changed, 21 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 3d6a1875431f..bb1ea82c1003 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -217,6 +217,9 @@ struct arm_smmu_device {
>> /* IOMMU core code handle */
>> struct iommu_device iommu;
>> +
>> + /* runtime PM link to master */
>> + struct device_link *link;
>
>
> Just the one?
>
>> };
>> enum arm_smmu_context_fmt {
>> @@ -1470,10 +1473,26 @@ static int arm_smmu_add_device(struct device *dev)
>> iommu_device_link(&smmu->iommu, dev);
>> + /*
>> + * Establish the link between smmu and master, so that the
>> + * smmu gets runtime enabled/disabled as per the master's
>> + * needs.
>> + */
>> + smmu->link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
>
>
> Maybe I've misunderstood how the API works, but AFAICS the second and
> subsequent devices are all just going to overwrite (and leak) the link of
> the previous one...

Also, I noticed one more thing while testing on sdm845. When we
conditionally enable runtime pm, we should create the device link
conditionally too, i.e. we should create the link between the smmu and
the master device only when smmu->dev has runtime pm enabled.
Otherwise, when the master does a pm_runtime_get() on itself, the
device link ensures that pm_runtime_get() is done for the smmu first.
That fails when runtime pm is not enabled on the smmu, and so the
master device's pm_runtime_get() fails too.
Will fix this in the next version.

Thanks
Vivek
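
The failure described above can be reproduced in a small userspace
model. This is only a sketch of the reported behaviour, not the PM
core: struct device is a stand-in that allows at most one supplier, and
model_rpm_get() and model_add_link_if_rpm() are hypothetical names. It
assumes that a get on a device with runtime PM disabled returns
-EACCES and that a supplier's failure propagates to the consumer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct device {
	bool rpm_enabled;        /* models pm_runtime_enabled(dev) */
	int usage_count;         /* models the runtime PM usage counter */
	struct device *supplier; /* at most one linked supplier in this model */
};

/* Resume the supplier first, then the device itself. */
int model_rpm_get(struct device *dev)
{
	if (!dev->rpm_enabled)
		return -EACCES;  /* runtime PM is disabled for this device */

	if (dev->supplier) {
		int ret = model_rpm_get(dev->supplier);

		if (ret < 0)
			return ret;  /* supplier failure propagates to the consumer */
	}
	dev->usage_count++;
	return 0;
}

/* Create the link only when the supplier can actually be runtime resumed. */
void model_add_link_if_rpm(struct device *consumer, struct device *supplier)
{
	if (supplier->rpm_enabled)
		consumer->supplier = supplier;
}
```

Linking unconditionally to a supplier with runtime PM disabled makes
the consumer's get fail in this model; creating the link through
model_add_link_if_rpm() avoids that.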

>
>> + if (!smmu->link) {
>> + dev_warn(smmu->dev, "Unable to create device link between
>> %s and %s\n",
>> + dev_name(smmu->dev), dev_name(dev));
>> + ret = -ENODEV;
>> + goto out_unlink;
>> + }
>> +
>> arm_smmu_rpm_put(smmu);
>> return 0;
>> +out_unlink:
>> + iommu_device_unlink(&smmu->iommu, dev);
>> + arm_smmu_master_free_smes(fwspec);
>> out_rpm_put:
>> arm_smmu_rpm_put(smmu);
>> out_cfg_free:
>> @@ -1496,6 +1515,8 @@ static void arm_smmu_remove_device(struct device
>> *dev)
>> cfg = fwspec->iommu_priv;
>> smmu = cfg->smmu;
>> + device_link_del(smmu->link);
>
>
> ...and equivalently you end up with a double-free (or more) here of a link
> which may not have belonged to dev anyway.
>
> Robin.
>
>
>> +
>> ret = arm_smmu_rpm_get(smmu);
>> if (ret < 0)
>> return;
>>
>




2018-03-09 12:35:57

by Robin Murphy

Subject: Re: [PATCH v8 4/5] iommu/arm-smmu: Add the device_link between masters and smmu

On 09/03/18 07:11, Vivek Gautam wrote:
> On Thu, Mar 8, 2018 at 10:29 AM, Vivek Gautam
> <[email protected]> wrote:
>> On Wed, Mar 7, 2018 at 6:17 PM, Robin Murphy <[email protected]> wrote:
>>> On 02/03/18 10:10, Vivek Gautam wrote:
>>>>
>>>> From: Sricharan R <[email protected]>
>>>>
>>>> Finally add the device link between the master device and
>>>> smmu, so that the smmu gets runtime enabled/disabled only when the
>>>> master needs it. This is done from add_device callback which gets
>>>> called once when the master is added to the smmu.
>>>>
>>>> Signed-off-by: Sricharan R <[email protected]>
>>>> Signed-off-by: Vivek Gautam <[email protected]>
>>>> ---
>>>> drivers/iommu/arm-smmu.c | 21 +++++++++++++++++++++
>>>> 1 file changed, 21 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>> index 3d6a1875431f..bb1ea82c1003 100644
>>>> --- a/drivers/iommu/arm-smmu.c
>>>> +++ b/drivers/iommu/arm-smmu.c
>>>> @@ -217,6 +217,9 @@ struct arm_smmu_device {
>>>> /* IOMMU core code handle */
>>>> struct iommu_device iommu;
>>>> +
>>>> + /* runtime PM link to master */
>>>> + struct device_link *link;
>>>
>>>
>>> Just the one?
>
> we will either have to count all the devices that are present on the
> iommu bus, or
> maintain a list to which all the links can be added.
> But to add the list, we will have to initialize a LIST_HEAD in struct
> device_link
> as well.
>
> Or, I think we don't even need to maintain a pointer to link with smmu.
> In arm_smmu_remove_device(), we can find out the correct link, and delete it.
>
> 	list_for_each_entry(link, &dev->links.suppliers, c_node) {
> 		if (link->supplier == smmu->dev) {
> 			device_link_del(link);
> 			break;
> 		}
> 	}
>
> Should that be fine?
>
> Rafael, does the above snippet looks right to you? Context: smmu->dev
> is the supplier, and dev is the consumer. We want to find the link,
> and delete it.

Actually, looking at the existing code, it seems like device_link_add()
will in fact look up and return any existing link between a given
supplier and consumer - is that intentional API behaviour that users may
rely on to avoid keeping track of explicit link pointers? (or
conversely, might it be reasonable to factor out a device_link_find()
function?)

Robin.
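
The two options raised above, relying on the add call to return an
existing link or factoring out a lookup helper, can be illustrated with
a userspace model. This is a sketch, not the driver core: the fixed-size
table, struct link, and the functions model_link_find(),
model_link_add() and model_link_del() are all hypothetical, and the
find-or-create behaviour is an assumption taken from the description in
this message.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LINKS 8

struct device { const char *name; };

struct link {
	struct device *consumer;
	struct device *supplier;
	int in_use;
};

static struct link link_table[MAX_LINKS];

/* Hypothetical lookup helper, along the lines of the suggested device_link_find(). */
struct link *model_link_find(struct device *consumer, struct device *supplier)
{
	for (size_t i = 0; i < MAX_LINKS; i++) {
		struct link *l = &link_table[i];

		if (l->in_use && l->consumer == consumer && l->supplier == supplier)
			return l;
	}
	return NULL;
}

/* Return the existing link for the pair if there is one, else create it. */
struct link *model_link_add(struct device *consumer, struct device *supplier)
{
	struct link *l = model_link_find(consumer, supplier);

	if (l)
		return l;
	for (size_t i = 0; i < MAX_LINKS; i++) {
		if (!link_table[i].in_use) {
			link_table[i] = (struct link){ consumer, supplier, 1 };
			return &link_table[i];
		}
	}
	return NULL; /* table full */
}

/* Remove the link for a pair without the caller holding a stored pointer. */
void model_link_del(struct device *consumer, struct device *supplier)
{
	struct link *l = model_link_find(consumer, supplier);

	if (l)
		l->in_use = 0;
}
```

Because each link is keyed by its (consumer, supplier) pair, two
masters behind the same smmu get distinct links, and removing one
master leaves the other's link untouched.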

2018-03-09 17:38:57

by Robin Murphy

Subject: Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

[ +Lorenzo ]

On 09/03/18 04:50, Tomasz Figa wrote:
[...]
>>> Now we need a way to do the check. Perhaps for the time being it would
>>> be enough to just check for the power-domains property in DT?
>>
>>
>> AFAICS, it might be as simple as arm_smmu_probe() doing this:
>>
>> /*
>> * We want to avoid touching dev->power.lock in fastpaths unless
>> * it's really going to do something useful - pm_runtime_enabled()
>> * can serve as an ideal proxy for that decision.
>> */
>> if (dev->pm_domain)
>> pm_runtime_enable(dev);
>>
>> or maybe even just gate all the calls with "if (smmu->dev.pm_domain)"
>> directly (like pcie-mediatek does), but I'm not sure which would be
>> conceptually cleaner.
>
> Okay, that was easier than I expected. Thanks. :)
>
> Actually, there is one more thing that might need rechecking. Are you
> sure that dev->pm_domain is NULL for the devices, for which we don't
> want runtime PM to be enabled? I think ACPI was mentioned and ACPI
> includes the concept of PM domains.

Thanks for pointing that out - thankfully, I've confirmed that the SMMUs
on my Juno don't have dev->pm_domain set when booting with ACPI, and
double-checking the ACPI code I think we're OK here. Since the SMMUs are
only described in the static IORT table and not in the ACPI namespace,
they won't have the ACPI companion device that acpi_dev_pm_attach()
looks for, and thus should always be ignored. Lorenzo, do I have that right?

Robin.

2018-03-12 10:22:58

by Vivek Gautam

Subject: Re: [PATCH v8 4/5] iommu/arm-smmu: Add the device_link between masters and smmu

On Fri, Mar 9, 2018 at 6:04 PM, Robin Murphy <[email protected]> wrote:
> On 09/03/18 07:11, Vivek Gautam wrote:
>>
>> On Thu, Mar 8, 2018 at 10:29 AM, Vivek Gautam
>> <[email protected]> wrote:
>>>
>>> On Wed, Mar 7, 2018 at 6:17 PM, Robin Murphy <[email protected]>
>>> wrote:
>>>>
>>>> On 02/03/18 10:10, Vivek Gautam wrote:
>>>>>
>>>>>
>>>>> From: Sricharan R <[email protected]>
>>>>>
>>>>> Finally add the device link between the master device and
>>>>> smmu, so that the smmu gets runtime enabled/disabled only when the
>>>>> master needs it. This is done from add_device callback which gets
>>>>> called once when the master is added to the smmu.
>>>>>
>>>>> Signed-off-by: Sricharan R <[email protected]>
>>>>> Signed-off-by: Vivek Gautam <[email protected]>
>>>>> ---
>>>>> drivers/iommu/arm-smmu.c | 21 +++++++++++++++++++++
>>>>> 1 file changed, 21 insertions(+)
>>>>>
>>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>>> index 3d6a1875431f..bb1ea82c1003 100644
>>>>> --- a/drivers/iommu/arm-smmu.c
>>>>> +++ b/drivers/iommu/arm-smmu.c
>>>>> @@ -217,6 +217,9 @@ struct arm_smmu_device {
>>>>> /* IOMMU core code handle */
>>>>> struct iommu_device iommu;
>>>>> +
>>>>> + /* runtime PM link to master */
>>>>> + struct device_link *link;
>>>>
>>>>
>>>>
>>>> Just the one?
>>
>>
>> we will either have to count all the devices that are present on the
>> iommu bus, or
>> maintain a list to which all the links can be added.
>> But to add the list, we will have to initialize a LIST_HEAD in struct
>> device_link
>> as well.
>>
>> Or, I think we don't even need to maintain a pointer to link with smmu.
>> In arm_smmu_remove_device(), we can find out the correct link, and delete
>> it.
>>
>> 	list_for_each_entry(link, &dev->links.suppliers, c_node) {
>> 		if (link->supplier == smmu->dev) {
>> 			device_link_del(link);
>> 			break;
>> 		}
>> 	}
>>
>> Should that be fine?
>>
>> Rafael, does the above snippet looks right to you? Context: smmu->dev
>> is the supplier, and dev is the consumer. We want to find the link,
>> and delete it.
>
>
> Actually, looking at the existing code, it seems like device_link_add() will
> in fact look up and return any existing link between a given supplier and
> consumer - is that intentional API behaviour that users may rely on to avoid
> keeping track of explicit link pointers?
> (or conversely, might it be
> reasonable to factor out a device_link_find() function?)

Yea, that sounds better.

regards
Vivek

>
> Robin.
>


