I replied[1] to this patch series[2] and described how I think interconnect
bandwidth voting should be captured in DT and how it should work.
So I'm sending out a patch series implementing that. This patch series does
the following:
- Allow required-opps to point to any device's OPP
- Add support to the devfreq passive governor to look at the "parent"
  devfreq's required-opps to decide the "slave" device's frequency
- Add bandwidth OPP table support
- Add a devfreq library for interconnect paths
Interconnects and interconnect paths quantify their performance levels in
terms of bandwidth. So, similar to how we have frequency-based OPP tables
in DT and in the OPP framework, this patch series adds bandwidth OPP
table support in the OPP framework and in DT.
To simplify voting for interconnects, this patch series adds helper
functions to create devfreq devices out of interconnect paths. This
allows drivers to add a single line of code to add interconnect voting
capability.
To add a devfreq device for the "gpu-mem" interconnect path and use
required-opps to decide the gpu-mem bandwidth:
icc_create_devfreq(dev, "gpu-mem", gpu_devfreq);
To add a devfreq device for the "gpu-mem" interconnect path and use some
other governor to decide the gpu-mem bandwidth:
icc_create_devfreq(dev, "gpu-mem", NULL);
So with the DT bindings added in this patch series, the DT for a GPU
that does bandwidth voting from GPU to Cache and GPU to DDR would look
something like this:
gpu_cache_opp_table: gpu_cache_opp_table {
    compatible = "operating-points-v2";

    gpu_cache_3000: opp-3000 {
        opp-peak-KBps = <3000>;
        opp-avg-KBps = <1000>;
    };
    gpu_cache_6000: opp-6000 {
        opp-peak-KBps = <6000>;
        opp-avg-KBps = <2000>;
    };
    gpu_cache_9000: opp-9000 {
        opp-peak-KBps = <9000>;
        opp-avg-KBps = <9000>;
    };
};

gpu_ddr_opp_table: gpu_ddr_opp_table {
    compatible = "operating-points-v2";

    gpu_ddr_1525: opp-1525 {
        opp-peak-KBps = <1525>;
        opp-avg-KBps = <452>;
    };
    gpu_ddr_3051: opp-3051 {
        opp-peak-KBps = <3051>;
        opp-avg-KBps = <915>;
    };
    gpu_ddr_7500: opp-7500 {
        opp-peak-KBps = <7500>;
        opp-avg-KBps = <3000>;
    };
};

gpu_opp_table: gpu_opp_table {
    compatible = "operating-points-v2";
    opp-shared;

    opp-200000000 {
        opp-hz = /bits/ 64 <200000000>;
        required-opps = <&gpu_cache_3000>, <&gpu_ddr_1525>;
    };
    opp-400000000 {
        opp-hz = /bits/ 64 <400000000>;
        required-opps = <&gpu_cache_6000>, <&gpu_ddr_3051>;
    };
};

gpu@7864000 {
    ...
    operating-points-v2 = <&gpu_opp_table>, <&gpu_cache_opp_table>, <&gpu_ddr_opp_table>;
    interconnects = <&mmnoc MASTER_GPU_1 &bimc SLAVE_SYSTEM_CACHE>,
                    <&mmnoc MASTER_GPU_1 &bimc SLAVE_DDR>;
    interconnect-names = "gpu-cache", "gpu-mem";
    interconnect-opp-table = <&gpu_cache_opp_table>, <&gpu_ddr_opp_table>;
};
Cheers,
Saravana
[1] - https://lore.kernel.org/lkml/[email protected]/
[2] - https://lore.kernel.org/lkml/[email protected]/
Saravana Kannan (11):
OPP: Allow required-opps even if the device doesn't have power-domains
OPP: Add function to look up required OPP's for a given OPP
PM / devfreq: Add required OPPs support to passive governor
dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
OPP: Add support for bandwidth OPP tables
OPP: Add helper function for bandwidth OPP tables
OPP: Add API to find an OPP table from its DT node
dt-bindings: interconnect: Add interconnect-opp-table property
interconnect: Add OPP table support for interconnects
OPP: Allow copying OPPs tables between devices
interconnect: Add devfreq support
.../bindings/interconnect/interconnect.txt | 8 +
Documentation/devicetree/bindings/opp/opp.txt | 15 +-
drivers/devfreq/governor_passive.c | 25 ++-
drivers/interconnect/Makefile | 2 +-
drivers/interconnect/core.c | 27 ++-
drivers/interconnect/icc-devfreq.c | 156 ++++++++++++++++++
drivers/opp/core.c | 115 ++++++++++++-
drivers/opp/of.c | 90 +++++++---
drivers/opp/opp.h | 4 +-
include/linux/interconnect.h | 21 +++
include/linux/pm_opp.h | 44 +++++
11 files changed, 474 insertions(+), 33 deletions(-)
create mode 100644 drivers/interconnect/icc-devfreq.c
--
2.22.0.rc2.383.gf4fbbf30c2-goog
A Device-A can have a (minimum) performance requirement on another
Device-B to be able to function correctly. This performance requirement
on Device-B can also change based on the current performance level of
Device-A.
The existing required-opps feature fits well to describe this need. So,
instead of limiting required-opps to point to only PM-domain devices,
allow it to point to any device.
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/opp/core.c | 2 +-
drivers/opp/of.c | 14 --------------
2 files changed, 1 insertion(+), 15 deletions(-)
diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 0e7703fe733f..74c7bdc6f463 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -710,7 +710,7 @@ static int _set_required_opps(struct device *dev,
return 0;
/* Single genpd case */
- if (!genpd_virt_devs) {
+ if (!genpd_virt_devs && required_opp_tables[0]->is_genpd) {
pstate = opp->required_opps[0]->pstate;
ret = dev_pm_genpd_set_performance_state(dev, pstate);
if (ret) {
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index c10c782d15aa..7c8336e94aff 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -195,9 +195,6 @@ static void _opp_table_alloc_required_tables(struct opp_table *opp_table,
*/
count_pd = of_count_phandle_with_args(dev->of_node, "power-domains",
"#power-domain-cells");
- if (!count_pd)
- goto put_np;
-
if (count_pd > 1) {
genpd_virt_devs = kcalloc(count, sizeof(*genpd_virt_devs),
GFP_KERNEL);
@@ -226,17 +223,6 @@ static void _opp_table_alloc_required_tables(struct opp_table *opp_table,
if (IS_ERR(required_opp_tables[i]))
goto free_required_tables;
-
- /*
- * We only support genpd's OPPs in the "required-opps" for now,
- * as we don't know how much about other cases. Error out if the
- * required OPP doesn't belong to a genpd.
- */
- if (!required_opp_tables[i]->is_genpd) {
- dev_err(dev, "required-opp doesn't belong to genpd: %pOF\n",
- required_np);
- goto free_required_tables;
- }
}
goto put_np;
--
2.22.0.rc2.383.gf4fbbf30c2-goog
Add a function that allows looking up required OPPs given a source OPP
table, destination OPP table and the source OPP.
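For illustration, a rough usage sketch (hypothetical parent/slave devices and
parent_opp handle; error checks on the table lookups are trimmed):

    struct opp_table *src_table, *dst_table;
    struct dev_pm_opp *slave_opp;
    unsigned long slave_freq;

    src_table = dev_pm_opp_get_opp_table(parent_dev);
    dst_table = dev_pm_opp_get_opp_table(slave_dev);

    /* Find the OPP in dst_table that parent_opp's required-opps points to */
    slave_opp = dev_pm_opp_xlate_opp(src_table, dst_table, parent_opp);
    if (slave_opp) {
        slave_freq = dev_pm_opp_get_freq(slave_opp);
        dev_pm_opp_put(slave_opp);
    }

    dev_pm_opp_put_opp_table(dst_table);
    dev_pm_opp_put_opp_table(src_table);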
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/opp/core.c | 54 ++++++++++++++++++++++++++++++++++++++++++
include/linux/pm_opp.h | 11 +++++++++
2 files changed, 65 insertions(+)
diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 74c7bdc6f463..4f7870bffbf8 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -1830,6 +1830,60 @@ void dev_pm_opp_put_genpd_virt_dev(struct opp_table *opp_table,
dev_err(virt_dev, "Failed to find required device entry\n");
}
+/**
+ * dev_pm_opp_xlate_opp() - Find required OPP for src_table OPP.
+ * @src_table: OPP table which has dst_table as one of its required OPP table.
+ * @dst_table: Required OPP table of the src_table.
+ * @src_opp: OPP of the src_table.
+ *
+ * This function returns the OPP (present in @dst_table) pointed out by the
+ * "required-opps" property of the OPP (present in @src_table).
+ *
+ * The callers are required to call dev_pm_opp_put() for the returned OPP after
+ * use.
+ *
+ * Return: destination table OPP on success, otherwise NULL on errors.
+ */
+struct dev_pm_opp *dev_pm_opp_xlate_opp(struct opp_table *src_table,
+ struct opp_table *dst_table,
+ struct dev_pm_opp *src_opp)
+{
+ struct dev_pm_opp *opp, *dest_opp = NULL;
+ int i;
+
+ if (!src_table || !dst_table || !src_opp)
+ return NULL;
+
+ for (i = 0; i < src_table->required_opp_count; i++) {
+ if (src_table->required_opp_tables[i]->np == dst_table->np)
+ break;
+ }
+
+ if (unlikely(i == src_table->required_opp_count)) {
+ pr_err("%s: Couldn't find matching OPP table (%p: %p)\n",
+ __func__, src_table, dst_table);
+ return NULL;
+ }
+
+ mutex_lock(&src_table->lock);
+
+ list_for_each_entry(opp, &src_table->opp_list, node) {
+ if (opp == src_opp) {
+ dest_opp = opp->required_opps[i];
+ dev_pm_opp_get(dest_opp);
+ goto unlock;
+ }
+ }
+
+ pr_err("%s: Couldn't find matching OPP (%p: %p)\n", __func__, src_table,
+ dst_table);
+
+unlock:
+ mutex_unlock(&src_table->lock);
+
+ return dest_opp;
+}
+
/**
* dev_pm_opp_xlate_performance_state() - Find required OPP's pstate for src_table.
* @src_table: OPP table which has dst_table as one of its required OPP table.
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index b150fe97ce5a..bc5c68bdfc8d 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -134,6 +134,9 @@ void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table);
struct opp_table *dev_pm_opp_set_genpd_virt_dev(struct device *dev, struct device *virt_dev, int index);
void dev_pm_opp_put_genpd_virt_dev(struct opp_table *opp_table, struct device *virt_dev);
int dev_pm_opp_xlate_performance_state(struct opp_table *src_table, struct opp_table *dst_table, unsigned int pstate);
+struct dev_pm_opp *dev_pm_opp_xlate_opp(struct opp_table *src_table,
+ struct opp_table *dst_table,
+ struct dev_pm_opp *src_opp);
int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);
int dev_pm_opp_set_sharing_cpus(struct device *cpu_dev, const struct cpumask *cpumask);
int dev_pm_opp_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask);
@@ -307,6 +310,14 @@ static inline int dev_pm_opp_xlate_performance_state(struct opp_table *src_table
return -ENOTSUPP;
}
+static inline struct dev_pm_opp *dev_pm_opp_xlate_opp(
+ struct opp_table *src_table,
+ struct opp_table *dst_table,
+ struct dev_pm_opp *src_opp)
+{
+ return NULL;
+}
+
static inline int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
{
return -ENOTSUPP;
--
2.22.0.rc2.383.gf4fbbf30c2-goog
Look at the required OPPs of the "parent" device to determine the OPP that
is required from the slave device managed by the passive governor. This
allows having mappings between a parent device and a slave device even when
they don't have the same number of OPPs.
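For context, a minimal sketch of the consumer side (hypothetical dev,
slave_profile and parent_devfreq handles; the slave device registers with the
passive governor and hands over the parent via devfreq_passive_data):

    static struct devfreq_passive_data passive_data;

    /* In the slave driver's probe() */
    passive_data.parent = parent_devfreq;
    slave_df = devm_devfreq_add_device(dev, &slave_profile,
                                       DEVFREQ_GOV_PASSIVE, &passive_data);
    if (IS_ERR(slave_df))
        return PTR_ERR(slave_df);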
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/devfreq/governor_passive.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
index 3bc29acbd54e..bd4a98bb15b1 100644
--- a/drivers/devfreq/governor_passive.c
+++ b/drivers/devfreq/governor_passive.c
@@ -21,8 +21,9 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
struct devfreq_passive_data *p_data
= (struct devfreq_passive_data *)devfreq->data;
struct devfreq *parent_devfreq = (struct devfreq *)p_data->parent;
+ struct opp_table *opp_table = NULL, *c_opp_table = NULL;
unsigned long child_freq = ULONG_MAX;
- struct dev_pm_opp *opp;
+ struct dev_pm_opp *opp = NULL, *c_opp = NULL;
int i, count, ret = 0;
/*
@@ -65,7 +66,20 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
goto out;
}
- dev_pm_opp_put(opp);
+ opp_table = dev_pm_opp_get_opp_table(parent_devfreq->dev.parent);
+ if (IS_ERR_OR_NULL(opp_table)) {
+ ret = PTR_ERR(opp_table);
+ goto out;
+ }
+
+ c_opp_table = dev_pm_opp_get_opp_table(devfreq->dev.parent);
+ if (!IS_ERR_OR_NULL(c_opp_table))
+ c_opp = dev_pm_opp_xlate_opp(opp_table, c_opp_table, opp);
+ if (c_opp) {
+ *freq = dev_pm_opp_get_freq(c_opp);
+ dev_pm_opp_put(c_opp);
+ goto out;
+ }
/*
* Get the OPP table's index of decided freqeuncy by governor
@@ -92,6 +106,13 @@ static int devfreq_passive_get_target_freq(struct devfreq *devfreq,
*freq = child_freq;
out:
+ if (!IS_ERR_OR_NULL(opp_table))
+ dev_pm_opp_put_opp_table(opp_table);
+ if (!IS_ERR_OR_NULL(c_opp_table))
+ dev_pm_opp_put_opp_table(c_opp_table);
+ if (!IS_ERR_OR_NULL(opp))
+ dev_pm_opp_put(opp);
+
return ret;
}
--
2.22.0.rc2.383.gf4fbbf30c2-goog
Not all devices quantify their performance points in terms of frequency.
Devices like interconnects quantify their performance points in terms of
bandwidth. We need a way to represent these bandwidth levels in OPP. So,
add support for parsing bandwidth OPPs from DT.
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/opp/of.c | 34 ++++++++++++++++++++++++++++++++--
drivers/opp/opp.h | 4 +++-
2 files changed, 35 insertions(+), 3 deletions(-)
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index 7c8336e94aff..d5815289518a 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -538,6 +538,35 @@ void dev_pm_opp_of_remove_table(struct device *dev)
}
EXPORT_SYMBOL_GPL(dev_pm_opp_of_remove_table);
+static int _read_opp_key(struct dev_pm_opp *new_opp, struct device_node *np)
+{
+ int ret;
+ u64 rate;
+ u32 bw;
+
+ ret = of_property_read_u64(np, "opp-hz", &rate);
+ if (!ret) {
+ /*
+ * Rate is defined as an unsigned long in clk API, and so
+ * casting explicitly to its type. Must be fixed once rate is 64
+ * bit guaranteed in clk API.
+ */
+ new_opp->rate = (unsigned long)rate;
+ return 0;
+ }
+
+ ret = of_property_read_u32(np, "opp-peak-KBps", &bw);
+ if (ret)
+ return ret;
+ new_opp->rate = bw;
+
+ ret = of_property_read_u32(np, "opp-avg-KBps", &bw);
+ if (!ret)
+ new_opp->avg_bw = bw;
+
+ return 0;
+}
+
/**
* _opp_add_static_v2() - Allocate static OPPs (As per 'v2' DT bindings)
* @opp_table: OPP table
@@ -575,11 +604,12 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
if (!new_opp)
return ERR_PTR(-ENOMEM);
- ret = of_property_read_u64(np, "opp-hz", &rate);
+ ret = _read_opp_key(new_opp, np);
if (ret < 0) {
/* "opp-hz" is optional for devices like power domains. */
if (!opp_table->is_genpd) {
- dev_err(dev, "%s: opp-hz not found\n", __func__);
+ dev_err(dev, "%s: opp-hz or opp-peak-KBps not found\n",
+ __func__);
goto free_opp;
}
diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
index 569b3525aa67..ead2cdafe957 100644
--- a/drivers/opp/opp.h
+++ b/drivers/opp/opp.h
@@ -59,7 +59,8 @@ extern struct list_head opp_tables;
* @turbo: true if turbo (boost) OPP
* @suspend: true if suspend OPP
* @pstate: Device's power domain's performance state.
- * @rate: Frequency in hertz
+ * @rate: Frequency in hertz OR Peak bandwidth in kilobytes per second
+ * @avg_bw: Average bandwidth in kilobytes per second
* @level: Performance level
* @supplies: Power supplies voltage/current values
* @clock_latency_ns: Latency (in nanoseconds) of switching to this OPP's
@@ -81,6 +82,7 @@ struct dev_pm_opp {
bool suspend;
unsigned int pstate;
unsigned long rate;
+ unsigned long avg_bw;
unsigned int level;
struct dev_pm_opp_supply *supplies;
--
2.22.0.rc2.383.gf4fbbf30c2-goog
This allows finding a device's OPP table (when it has multiple) from a
phandle to the OPP table in DT.
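A rough consumer sketch (hypothetical; uses the interconnect-opp-table
property added later in this series):

    struct device_node *opp_np;
    struct opp_table *opp_table;

    opp_np = of_parse_phandle(dev->of_node, "interconnect-opp-table", 0);
    if (opp_np) {
        opp_table = dev_pm_opp_of_find_table_from_node(opp_np);
        of_node_put(opp_np);
        if (opp_table) {
            /* ... use the table, e.g. attach it to another device ... */
        }
    }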
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/opp/of.c | 42 ++++++++++++++++++++++++++++++++++--------
include/linux/pm_opp.h | 7 +++++++
2 files changed, 41 insertions(+), 8 deletions(-)
diff --git a/drivers/opp/of.c b/drivers/opp/of.c
index d5815289518a..e8697a4ca3bc 100644
--- a/drivers/opp/of.c
+++ b/drivers/opp/of.c
@@ -42,14 +42,9 @@ struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev)
}
EXPORT_SYMBOL_GPL(dev_pm_opp_of_get_opp_desc_node);
-struct opp_table *_managed_opp(struct device *dev, int index)
+struct opp_table *_find_opp_table_from_node(struct device_node *np)
{
struct opp_table *opp_table, *managed_table = NULL;
- struct device_node *np;
-
- np = _opp_of_get_opp_desc_node(dev->of_node, index);
- if (!np)
- return NULL;
list_for_each_entry(opp_table, &opp_tables, node) {
if (opp_table->np == np) {
@@ -69,11 +64,42 @@ struct opp_table *_managed_opp(struct device *dev, int index)
}
}
- of_node_put(np);
-
return managed_table;
}
+/**
+ * dev_pm_opp_of_find_table_from_node() - Find OPP table from its DT node
+ * @np: DT node used for finding the OPP table
+ *
+ * Return: OPP table corresponding to the DT node, else NULL on failure.
+ *
+ * The caller needs to put the node with of_node_put() after using it.
+ */
+struct opp_table *dev_pm_opp_of_find_table_from_node(struct device_node *np)
+{
+ struct opp_table *opp_table;
+
+ mutex_lock(&opp_table_lock);
+ opp_table = _find_opp_table_from_node(np);
+ mutex_unlock(&opp_table_lock);
+ return opp_table;
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_of_find_table_from_node);
+
+struct opp_table *_managed_opp(struct device *dev, int index)
+{
+ struct device_node *np;
+ struct opp_table *opp_table;
+
+ np = _opp_of_get_opp_desc_node(dev->of_node, index);
+ if (!np)
+ return NULL;
+
+ opp_table = _find_opp_table_from_node(np);
+ of_node_put(np);
+ return opp_table;
+}
+
/* The caller must call dev_pm_opp_put() after the OPP is used */
static struct dev_pm_opp *_find_opp_of_np(struct opp_table *opp_table,
struct device_node *opp_np)
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index 2e122af26b8e..d9156b62d966 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -370,6 +370,7 @@ int dev_pm_opp_of_cpumask_add_table(const struct cpumask *cpumask);
void dev_pm_opp_of_cpumask_remove_table(const struct cpumask *cpumask);
int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask);
struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev);
+struct opp_table *dev_pm_opp_of_find_table_from_node(struct device_node *np);
struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp);
int of_get_required_opp_performance_state(struct device_node *np, int index);
void dev_pm_opp_of_register_em(struct cpumask *cpus);
@@ -407,6 +408,12 @@ static inline struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device
return NULL;
}
+static inline struct opp_table *dev_pm_opp_of_find_table_from_node(
+ struct device_node *np)
+{
+ return NULL;
+}
+
static inline struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp)
{
return NULL;
--
2.22.0.rc2.383.gf4fbbf30c2-goog
Add support for listing bandwidth OPP tables for each interconnect path
listed using the interconnects property.
Signed-off-by: Saravana Kannan <[email protected]>
---
.../devicetree/bindings/interconnect/interconnect.txt | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/Documentation/devicetree/bindings/interconnect/interconnect.txt b/Documentation/devicetree/bindings/interconnect/interconnect.txt
index 6f5d23a605b7..fc5b75b76a2c 100644
--- a/Documentation/devicetree/bindings/interconnect/interconnect.txt
+++ b/Documentation/devicetree/bindings/interconnect/interconnect.txt
@@ -55,10 +55,18 @@ interconnect-names : List of interconnect path name strings sorted in the same
* dma-mem: Path from the device to the main memory of
the system
+interconnect-opp-table: List of phandles to OPP tables (bandwidth OPP tables)
+ that specify the OPPs for the interconnect paths listed
+ in the interconnects property. This property can only
+ point to OPP tables that belong to the device and are
+ listed in the device's operating-points-v2 property.
+
Example:
sdhci@7864000 {
+ operating-points-v2 = <&sdhc_opp_table>, <&sdhc_mem_opp_table>;
...
interconnects = <&pnoc MASTER_SDCC_1 &bimc SLAVE_EBI_CH0>;
interconnect-names = "sdhc-mem";
+ interconnect-opp-table = <&sdhc_mem_opp_table>;
};
--
2.22.0.rc2.383.gf4fbbf30c2-goog
Some hardware devices might create multiple child devices to manage
different components of the hardware. In these cases, it might be necessary
for the original hardware device to copy specific OPP tables to a specific
new child device. Add dev_pm_opp_add_opp_table() to do that.
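A rough sketch of the intended use (hypothetical parent/child devices; whether
the temporary table reference can be dropped immediately depends on who else
holds a reference to it):

    struct opp_table *opp_table;

    /* Share the parent's already-registered OPP table with a child device */
    opp_table = dev_pm_opp_get_opp_table(parent_dev);
    if (!IS_ERR(opp_table)) {
        ret = dev_pm_opp_add_opp_table(child_dev, opp_table);
        dev_pm_opp_put_opp_table(opp_table);
    }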
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/opp/core.c | 8 ++++++++
include/linux/pm_opp.h | 7 +++++++
2 files changed, 15 insertions(+)
diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index c9914afd508a..668a377f59a5 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -943,6 +943,14 @@ struct opp_device *_add_opp_dev(const struct device *dev,
return opp_dev;
}
+int dev_pm_opp_add_opp_table(struct device *dev, struct opp_table *opp_table)
+{
+ if (!dev || !opp_table)
+ return -EINVAL;
+
+ return _add_opp_dev(dev, opp_table) ? 0 : -ENOMEM;
+}
+
static struct opp_table *_allocate_opp_table(struct device *dev, int index)
{
struct opp_table *opp_table;
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index d9156b62d966..3694d703817f 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -81,6 +81,7 @@ struct dev_pm_set_opp_data {
struct opp_table *dev_pm_opp_get_opp_table(struct device *dev);
struct opp_table *dev_pm_opp_get_opp_table_indexed(struct device *dev, int index);
void dev_pm_opp_put_opp_table(struct opp_table *opp_table);
+int dev_pm_opp_add_opp_table(struct device *dev, struct opp_table *opp_table);
unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp);
@@ -158,6 +159,12 @@ static inline struct opp_table *dev_pm_opp_get_opp_table_indexed(struct device *
static inline void dev_pm_opp_put_opp_table(struct opp_table *opp_table) {}
+static inline int dev_pm_opp_add_opp_table(struct device *dev,
+ struct opp_table *opp_table)
+{
+ return -ENOTSUPP;
+}
+
static inline unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp)
{
return 0;
--
2.22.0.rc2.383.gf4fbbf30c2-goog
The frequency OPP tables have helper functions to search for entries in the
table based on frequency and get the frequency values for a given (or
suspend) OPP entry.
Add similar helper functions for bandwidth OPP tables to search for entries
in the table based on peak bandwidth and to get the peak and average
bandwidth for a given (or suspend) OPP entry.
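A rough sketch of the new helpers used together (hypothetical; dev has a
bandwidth OPP table and path is an interconnect path handle):

    struct dev_pm_opp *opp;
    unsigned long peak_bw = ULONG_MAX, avg_bw = 0;

    /* Highest bandwidth OPP at or below peak_bw */
    opp = dev_pm_opp_find_peak_bw_floor(dev, &peak_bw);
    if (!IS_ERR(opp)) {
        peak_bw = dev_pm_opp_get_bw(opp, &avg_bw);
        dev_pm_opp_put(opp);
        icc_set_bw(path, avg_bw, peak_bw);
    }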
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/opp/core.c | 51 ++++++++++++++++++++++++++++++++++++++++++
include/linux/pm_opp.h | 19 ++++++++++++++++
2 files changed, 70 insertions(+)
diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 4f7870bffbf8..c9914afd508a 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -130,6 +130,29 @@ unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp)
}
EXPORT_SYMBOL_GPL(dev_pm_opp_get_freq);
+/**
+ * dev_pm_opp_get_bw() - Gets the bandwidth corresponding to an available opp
+ * @opp: opp for which bandwidth has to be returned for
+ * @avg_bw: Pointer where the corresponding average bandwidth is stored.
+ * Can be NULL.
+ *
+ * Return: Peak bandwidth in KBps corresponding to the opp, else
+ * return 0
+ */
+unsigned long dev_pm_opp_get_bw(struct dev_pm_opp *opp, unsigned long *avg_bw)
+{
+ if (IS_ERR_OR_NULL(opp) || !opp->available) {
+ pr_err("%s: Invalid parameters\n", __func__);
+ return 0;
+ }
+
+ if (avg_bw)
+ *avg_bw = opp->avg_bw;
+
+ return opp->rate;
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_get_bw);
+
/**
* dev_pm_opp_get_level() - Gets the level corresponding to an available opp
* @opp: opp for which level value has to be returned for
@@ -302,6 +325,34 @@ unsigned long dev_pm_opp_get_suspend_opp_freq(struct device *dev)
}
EXPORT_SYMBOL_GPL(dev_pm_opp_get_suspend_opp_freq);
+/**
+ * dev_pm_opp_get_suspend_opp_bw() - Get peak bandwidth of suspend opp in KBps
+ * @dev: device for which we do this operation
+ * @avg_bw: Pointer where the corresponding average bandwidth is stored.
+ * Can be NULL.
+ *
+ * Return: This function returns the peak bandwidth of the OPP marked as
+ * suspend_opp if one is available, else returns 0;
+ */
+unsigned long dev_pm_opp_get_suspend_opp_bw(struct device *dev,
+ unsigned long *avg_bw)
+{
+ struct opp_table *opp_table;
+ unsigned long peak_bw = 0;
+
+ opp_table = _find_opp_table(dev);
+ if (IS_ERR(opp_table))
+ return 0;
+
+ if (opp_table->suspend_opp && opp_table->suspend_opp->available)
+ peak_bw = dev_pm_opp_get_bw(opp_table->suspend_opp, avg_bw);
+
+ dev_pm_opp_put_opp_table(opp_table);
+
+ return peak_bw;
+}
+EXPORT_SYMBOL_GPL(dev_pm_opp_get_suspend_opp_bw);
+
int _get_opp_count(struct opp_table *opp_table)
{
struct dev_pm_opp *opp;
diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
index bc5c68bdfc8d..2e122af26b8e 100644
--- a/include/linux/pm_opp.h
+++ b/include/linux/pm_opp.h
@@ -85,6 +85,7 @@ void dev_pm_opp_put_opp_table(struct opp_table *opp_table);
unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp);
unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp);
+unsigned long dev_pm_opp_get_bw(struct dev_pm_opp *opp, unsigned long *avg_bw);
unsigned int dev_pm_opp_get_level(struct dev_pm_opp *opp);
@@ -95,6 +96,8 @@ unsigned long dev_pm_opp_get_max_clock_latency(struct device *dev);
unsigned long dev_pm_opp_get_max_volt_latency(struct device *dev);
unsigned long dev_pm_opp_get_max_transition_latency(struct device *dev);
unsigned long dev_pm_opp_get_suspend_opp_freq(struct device *dev);
+unsigned long dev_pm_opp_get_suspend_opp_bw(struct device *dev,
+ unsigned long *avg_bw);
struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev,
unsigned long freq,
@@ -164,6 +167,11 @@ static inline unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp)
{
return 0;
}
+static inline unsigned long dev_pm_opp_get_bw(struct dev_pm_opp *opp,
+ unsigned long *avg_bw)
+{
+ return 0;
+}
static inline unsigned int dev_pm_opp_get_level(struct dev_pm_opp *opp)
{
@@ -200,6 +208,12 @@ static inline unsigned long dev_pm_opp_get_suspend_opp_freq(struct device *dev)
return 0;
}
+static inline unsigned long dev_pm_opp_get_suspend_opp_bw(struct device *dev,
+ unsigned long *avg_bw)
+{
+ return 0;
+}
+
static inline struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev,
unsigned long freq, bool available)
{
@@ -343,6 +357,11 @@ static inline void dev_pm_opp_cpumask_remove_table(const struct cpumask *cpumask
#endif /* CONFIG_PM_OPP */
+#define dev_pm_opp_find_peak_bw_exact dev_pm_opp_find_freq_exact
+#define dev_pm_opp_find_peak_bw_floor dev_pm_opp_find_freq_floor
+#define dev_pm_opp_find_peak_bw_ceil_by_volt dev_pm_opp_find_freq_ceil_by_volt
+#define dev_pm_opp_find_peak_bw_ceil dev_pm_opp_find_freq_ceil
+
#if defined(CONFIG_PM_OPP) && defined(CONFIG_OF)
int dev_pm_opp_of_add_table(struct device *dev);
int dev_pm_opp_of_add_table_indexed(struct device *dev, int index);
--
2.22.0.rc2.383.gf4fbbf30c2-goog
Interconnect paths can have different performance points. Now that OPP
framework supports bandwidth OPP tables, add OPP table support for
interconnects.
Devices can use the interconnect-opp-table DT property to specify OPP
tables for interconnect paths. And the driver can obtain the OPP table for
an interconnect path by calling icc_get_opp_table().
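A rough consumer-side sketch (hypothetical; assumes the device's DT lists the
path and its interconnect-opp-table as in the binding patch):

    struct icc_path *path;
    struct opp_table *opp_table;

    path = of_icc_get(dev, "gpu-mem");
    if (IS_ERR(path))
        return PTR_ERR(path);

    opp_table = icc_get_opp_table(path);
    if (opp_table) {
        /* e.g. hand the table to a devfreq device or walk its OPPs */
    }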
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/interconnect/core.c | 27 ++++++++++++++++++++++++++-
include/linux/interconnect.h | 7 +++++++
2 files changed, 33 insertions(+), 1 deletion(-)
diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
index 871eb4bc4efc..881bac80bc1e 100644
--- a/drivers/interconnect/core.c
+++ b/drivers/interconnect/core.c
@@ -47,6 +47,7 @@ struct icc_req {
*/
struct icc_path {
size_t num_nodes;
+ struct opp_table *opp_table;
struct icc_req reqs[];
};
@@ -313,7 +314,7 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
{
struct icc_path *path = ERR_PTR(-EPROBE_DEFER);
struct icc_node *src_node, *dst_node;
- struct device_node *np = NULL;
+ struct device_node *np = NULL, *opp_node;
struct of_phandle_args src_args, dst_args;
int idx = 0;
int ret;
@@ -381,10 +382,34 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
dev_err(dev, "%s: invalid path=%ld\n", __func__, PTR_ERR(path));
mutex_unlock(&icc_lock);
+ if (!IS_ERR(path)) {
+ opp_node = of_parse_phandle(np, "interconnect-opp-table", idx);
+ if (opp_node) {
+ path->opp_table = dev_pm_opp_of_find_table_from_node(opp_node);
+ of_node_put(opp_node);
+ }
+ }
+
return path;
}
EXPORT_SYMBOL_GPL(of_icc_get);
+/**
+ * icc_get_opp_table() - Get the OPP table that corresponds to a path
+ * @path: reference to the path returned by icc_get()
+ *
+ * This function will return the OPP table that corresponds to a path handle.
+ * If the interconnect API is disabled, NULL is returned and the consumer
+ * drivers will still build. Drivers are free to handle this specifically, but
+ * they don't have to.
+ *
+ * Return: opp_table pointer on success. NULL is returned when the API is
+ * disabled or the OPP table is missing.
+ */
+struct opp_table *icc_get_opp_table(struct icc_path *path)
+{
+ return path->opp_table;
+}
+
/**
* icc_set_bw() - set bandwidth constraints on an interconnect path
* @path: reference to the path returned by icc_get()
diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
index dc25864755ba..0c0bc55f0e89 100644
--- a/include/linux/interconnect.h
+++ b/include/linux/interconnect.h
@@ -9,6 +9,7 @@
#include <linux/mutex.h>
#include <linux/types.h>
+#include <linux/pm_opp.h>
/* macros for converting to icc units */
#define Bps_to_icc(x) ((x) / 1000)
@@ -28,6 +29,7 @@ struct device;
struct icc_path *icc_get(struct device *dev, const int src_id,
const int dst_id);
struct icc_path *of_icc_get(struct device *dev, const char *name);
+struct opp_table *icc_get_opp_table(struct icc_path *path);
void icc_put(struct icc_path *path);
int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
@@ -49,6 +51,11 @@ static inline void icc_put(struct icc_path *path)
{
}
+static inline struct opp_table *icc_get_opp_table(struct icc_path *path)
+{
+ return NULL;
+}
+
static inline int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
{
return 0;
--
2.22.0.rc2.383.gf4fbbf30c2-goog
Add icc_create_devfreq() and icc_remove_devfreq() to create and remove
devfreq devices for interconnect paths. A driver can create/remove devfreq
devices for the interconnects needed by its device by calling these APIs.
This would allow various devfreq governors to work with interconnect paths,
so the device driver itself doesn't have to actively manage the bandwidth
votes for the interconnects.
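A rough probe/remove sketch for a consumer such as a GPU driver (hypothetical;
gpu_devfreq is assumed to be the driver's existing devfreq device):

    /* probe(): let the passive governor follow the GPU's OPP votes */
    icc_df = icc_create_devfreq(dev, "gpu-mem", gpu_devfreq);
    if (IS_ERR(icc_df))
        return PTR_ERR(icc_df);

    /* remove() */
    icc_remove_devfreq(icc_df);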
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/interconnect/Makefile | 2 +-
drivers/interconnect/icc-devfreq.c | 156 +++++++++++++++++++++++++++++
include/linux/interconnect.h | 14 +++
3 files changed, 171 insertions(+), 1 deletion(-)
create mode 100644 drivers/interconnect/icc-devfreq.c
diff --git a/drivers/interconnect/Makefile b/drivers/interconnect/Makefile
index 28f2ab0824d5..ddfb65b7fa55 100644
--- a/drivers/interconnect/Makefile
+++ b/drivers/interconnect/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
-icc-core-objs := core.o
+icc-core-objs := core.o icc-devfreq.o
obj-$(CONFIG_INTERCONNECT) += icc-core.o
obj-$(CONFIG_INTERCONNECT_QCOM) += qcom/
diff --git a/drivers/interconnect/icc-devfreq.c b/drivers/interconnect/icc-devfreq.c
new file mode 100644
index 000000000000..608716fbe612
--- /dev/null
+++ b/drivers/interconnect/icc-devfreq.c
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * A devfreq device driver for interconnect paths
+ *
+ * Copyright (C) 2019 Google, Inc
+ */
+
+#include <linux/devfreq.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_opp.h>
+#include <linux/slab.h>
+#include <linux/interconnect.h>
+#include <linux/limits.h>
+
+struct icc_devfreq {
+ struct device icc_dev;
+ struct icc_path *path;
+ struct devfreq_dev_profile dp;
+ struct devfreq *df;
+ unsigned long peak_bw;
+ unsigned long avg_bw;
+#if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
+ struct devfreq_passive_data p_data;
+#endif
+};
+
+static int icc_target(struct device *dev, unsigned long *freq,
+ u32 flags)
+{
+ struct icc_devfreq *d = dev_get_drvdata(dev);
+ struct dev_pm_opp *opp;
+ unsigned long peak_bw, avg_bw;
+
+ opp = devfreq_recommended_opp(dev, freq, flags);
+ if (IS_ERR(opp)) {
+ dev_err(dev, "Failed to find opp for %lu KBps\n", *freq);
+ return PTR_ERR(opp);
+ }
+ peak_bw = dev_pm_opp_get_bw(opp, &avg_bw);
+ dev_pm_opp_put(opp);
+
+ if (!icc_set_bw(d->path, avg_bw, peak_bw)) {
+ *freq = peak_bw;
+ d->peak_bw = peak_bw;
+ d->avg_bw = avg_bw;
+ }
+
+ return 0;
+}
+
+static int icc_get_dev_status(struct device *dev,
+ struct devfreq_dev_status *stat)
+{
+ struct icc_devfreq *d = dev_get_drvdata(dev);
+
+ stat->current_frequency = d->peak_bw;
+ return 0;
+}
+
+#define icc_dev_to_data(DEV) container_of((DEV), struct icc_devfreq, icc_dev)
+static void icc_dev_release(struct device *dev)
+{
+ struct icc_devfreq *d = icc_dev_to_data(dev);
+
+ kfree(d);
+}
+
+struct icc_devfreq *icc_create_devfreq(struct device *dev, char *name,
+ struct devfreq *devf)
+{
+ struct icc_devfreq *d;
+ struct icc_path *path;
+ struct opp_table *opp_table;
+ struct dev_pm_opp *opp;
+ void *p_data = NULL;
+ unsigned long peak_bw = U32_MAX, avg_bw = U32_MAX;
+ int err;
+
+ if (!name)
+ return ERR_PTR(-EINVAL);
+
+ path = of_icc_get(dev, name);
+ if (IS_ERR(path))
+ return (void *) path;
+
+ opp_table = icc_get_opp_table(path);
+ if (!opp_table) {
+ err = -EINVAL;
+ goto out_path;
+ }
+
+ d = kzalloc(sizeof(*d), GFP_KERNEL);
+ if (!d) {
+ err = -ENOMEM;
+ goto out_path;
+ }
+ d->path = path;
+
+ d->icc_dev.parent = dev;
+ d->icc_dev.release = icc_dev_release;
+ dev_set_name(&d->icc_dev, "%s-icc-%s", dev_name(dev), name);
+ err = device_register(&d->icc_dev);
+ if (err) {
+ put_device(&d->icc_dev);
+ goto out_path;
+ }
+
+ dev_set_drvdata(&d->icc_dev, d);
+ dev_pm_opp_add_opp_table(&d->icc_dev, opp_table);
+ opp = dev_pm_opp_find_peak_bw_floor(&d->icc_dev, &peak_bw);
+ if (!IS_ERR(opp)) {
+ peak_bw = dev_pm_opp_get_bw(opp, &avg_bw);
+ dev_pm_opp_put(opp);
+ }
+
+ if (!icc_set_bw(d->path, avg_bw, peak_bw)) {
+ d->peak_bw = peak_bw;
+ d->avg_bw = avg_bw;
+ }
+
+#if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE)
+ if (devf) {
+ d->p_data.parent = devf;
+ p_data = &d->p_data;
+ }
+#endif
+ d->dp.initial_freq = peak_bw;
+ d->dp.polling_ms = 0;
+ d->dp.target = icc_target;
+ d->dp.get_dev_status = icc_get_dev_status;
+ d->df = devm_devfreq_add_device(&d->icc_dev,
+ &d->dp,
+ p_data ? "passive" : "performance",
+ p_data);
+ if (IS_ERR(d->df)) {
+ err = PTR_ERR(d->df);
+ goto out_dev;
+ }
+
+ return d;
+out_dev:
+ put_device(&d->icc_dev);
+out_path:
+ icc_put(path);
+ return ERR_PTR(err);
+}
+EXPORT_SYMBOL_GPL(icc_create_devfreq);
+
+void icc_remove_devfreq(struct icc_devfreq *d)
+{
+ icc_put(d->path);
+ put_device(&d->icc_dev);
+}
+EXPORT_SYMBOL_GPL(icc_remove_devfreq);
+
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("interconnect devfreq device driver");
+MODULE_AUTHOR("Saravana Kannan <[email protected]>");
diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
index 0c0bc55f0e89..ec5879a79301 100644
--- a/include/linux/interconnect.h
+++ b/include/linux/interconnect.h
@@ -10,6 +10,7 @@
#include <linux/mutex.h>
#include <linux/types.h>
#include <linux/pm_opp.h>
+#include <linux/devfreq.h>
/* macros for converting to icc units */
#define Bps_to_icc(x) ((x) / 1000)
@@ -23,6 +24,7 @@
struct icc_path;
struct device;
+struct icc_devfreq;
#if IS_ENABLED(CONFIG_INTERCONNECT)
@@ -32,6 +34,9 @@ struct icc_path *of_icc_get(struct device *dev, const char *name);
struct opp_table *icc_get_opp_table(struct icc_path *path);
void icc_put(struct icc_path *path);
int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
+struct icc_devfreq *icc_create_devfreq(struct device *dev, char *name,
+ struct devfreq *devf);
+void icc_remove_devfreq(struct icc_devfreq *d);
#else
@@ -61,6 +66,15 @@ static inline int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw)
return 0;
}
+static inline struct icc_devfreq *icc_create_devfreq(struct device *dev,
+ char *name,
+ struct devfreq *devf)
+{
+ return NULL;
+}
+
+static inline void icc_remove_devfreq(struct icc_devfreq *d) {}
+
#endif /* CONFIG_INTERCONNECT */
#endif /* __LINUX_INTERCONNECT_H */
--
2.22.0.rc2.383.gf4fbbf30c2-goog
Interconnects often quantify their performance points in terms of
bandwidth. So, add opp-peak-KBps (required) and opp-avg-KBps (optional) to
allow specifying Bandwidth OPP tables in DT.
opp-peak-KBps is a required property that replaces opp-hz for bandwidth OPP
tables.
opp-avg-KBps is an optional property that can be used in Bandwidth OPP
tables.
Signed-off-by: Saravana Kannan <[email protected]>
---
Documentation/devicetree/bindings/opp/opp.txt | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
index 76b6c79604a5..c869e87caa2a 100644
--- a/Documentation/devicetree/bindings/opp/opp.txt
+++ b/Documentation/devicetree/bindings/opp/opp.txt
@@ -83,9 +83,14 @@ properties.
Required properties:
- opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a
- required property for all device nodes but devices like power domains. The
- power domain nodes must have another (implementation dependent) property which
- uniquely identifies the OPP nodes.
+ required property for all device nodes, except for devices like power domains
+ or bandwidth OPP tables. The power domain nodes must have another
+ (implementation dependent) property which uniquely identifies the OPP nodes.
+ The interconnect OPPs are required to have the opp-peak-KBps property.
+
+- opp-peak-KBps: Peak bandwidth in kilobytes per second, expressed as a 32-bit
+ big-endian integer. This is a required property for all devices that don't
+ have opp-hz. For example, bandwidth OPP tables for interconnect paths.
Optional properties:
- opp-microvolt: voltage in micro Volts.
@@ -132,6 +137,10 @@ Optional properties:
- opp-level: A value representing the performance level of the device,
expressed as a 32-bit integer.
+- opp-avg-KBps: Average bandwidth in kilobytes per second, expressed as a
+ 32-bit big-endian integer. This property is only meaningful in OPP tables
+ where opp-peak-KBps is present.
+
- clock-latency-ns: Specifies the maximum possible transition latency (in
nanoseconds) for switching to this OPP from any other OPP.
--
2.22.0.rc2.383.gf4fbbf30c2-goog
Hi Saravana,
On 6/14/19 07:17, Saravana Kannan wrote:
> Add a icc_create_devfreq() and icc_remove_devfreq() to create and remove
> devfreq devices for interconnect paths. A driver can create/remove devfreq
> devices for the interconnects needed for its device by calling these APIs.
> This would allow various devfreq governors to work with interconnect paths
> and the device driver itself doesn't have to actively manage the bandwidth
> votes for the interconnects.
Thanks for the patches, but creating devfreq devices for each interconnect path
seems odd to me - at least for consumers that already use a governor. So for DDR
scaling for example, are you suggesting that we add a devfreq device from the
cpufreq driver in order to scale the interconnect between CPU<->DDR? Also if the
GPU is already using devfreq, should we add a devfreq per each interconnect
path? What would be the benefit in this case - using different governors for
bandwidth scaling maybe?
Thanks,
Georgi
On Mon, Jun 17, 2019 at 8:44 AM Georgi Djakov <[email protected]> wrote:
>
> Hi Saravana,
>
> On 6/14/19 07:17, Saravana Kannan wrote:
> > Add a icc_create_devfreq() and icc_remove_devfreq() to create and remove
> > devfreq devices for interconnect paths. A driver can create/remove devfreq
> > devices for the interconnects needed for its device by calling these APIs.
> > This would allow various devfreq governors to work with interconnect paths
> > and the device driver itself doesn't have to actively manage the bandwidth
> > votes for the interconnects.
>
> Thanks for the patches, but creating devfreq devices for each interconnect path
> seems odd to me - at least for consumers that already use a governor.
Each governor instance always handles one "frequency" (more like
performance) domain at a time. So if a consumer is already using a
governor to scale the hardware block, then using another governor to
scale the interconnect performance points is the right way to go about
it. In fact, that's exactly what devfreq passive governor's
documentation even says it's meant for. That's also what cpufreq does
for each cluster/CPU frequency domain too.
> So for DDR
> scaling for example, are you suggesting that we add a devfreq device from the
> cpufreq driver in order to scale the interconnect between CPU<->DDR?
Yes in general. Although, CPUs are a special case because CPUs don't
go through devfreq. So passive governor as it stands today won't work.
CPU<->DDR scaling might need a separate governor (unlikely) or some
changes to the passive governor that I'm happy to work on once we
settle this for general devices like GPU, etc. But the DT format for
CPUs will be identical to GPUs or any other device.
> Also if the
> GPU is already using devfreq, should we add a devfreq per each interconnect
> path? What would be the benefit in this case - using different governors for
> bandwidth scaling maybe?
When saying "separate/different governors" in this email, I mean both
different instance of the same governor logic with different tunables
AND actually different algorithms/governor logic entirely.
The heuristics to use for each interconnect path might be (more like,
will be) different based on hardware characteristics (Eg: what voltage
domains the interconnect is sitting on) and what interconnect
information is available (Eg: Just busy time vs bandwidth count vs no
information etc) -- so having separate governors for each interconnect
path makes a lot of sense. It also allows userspace to control the
policy for scaling each of those paths based on product use cases.
For example, when the GPU is just doing simple UI rendering, userspace
can use the max_freq sysfs file for the devfreq device to disallow high
bandwidth OPPs on the GPU<->DDR path, but those higher OPPs could be
allowed by userspace when the GPU is used for games. Having a devfreq
device for each interconnect path also makes it easy to debug
performance issues -- you can independently change the votes for each
path to figure out what is causing the bottleneck, etc.
Adding a devfreq device for interconnect voting with a few lines gives
all these features "for free".
This doesn't mean all users of interconnect framework NEED to use
devfreq for interconnect. They might do it simply based on
calculations based on the use case (Eg: display driver from display
resolution). But if they are trying to use any kind of
algorithm/heuristics, writing it as a devfreq governor should be
encouraged.
Also want to point out that BW OPPs also work for drivers that don't
use devfreq at all. The interconnect-opp-table just lists the
meaningful OPP levels for the path, and the device driver can pick one
entry from the table based on the use case.
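As a rough sketch of that non-devfreq case (hypothetical helper and device
names; icc_dev is a device the bandwidth OPP table has been attached to, and
bandwidth values are in KBps):

    /* Pick the lowest OPP that satisfies the computed bandwidth */
    peak_bw = bw_for_resolution(mode);  /* hypothetical helper */
    opp = dev_pm_opp_find_peak_bw_ceil(icc_dev, &peak_bw);
    if (!IS_ERR(opp)) {
        peak_bw = dev_pm_opp_get_bw(opp, &avg_bw);
        dev_pm_opp_put(opp);
        icc_set_bw(path, avg_bw, peak_bw);
    }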
Thanks,
Saravana
Hey Saravana,
On 6/18/19 2:48 AM, Saravana Kannan wrote:
> On Mon, Jun 17, 2019 at 8:44 AM Georgi Djakov <[email protected]> wrote:
>>
>> Hi Saravana,
>>
>> On 6/14/19 07:17, Saravana Kannan wrote:
>>> Add a icc_create_devfreq() and icc_remove_devfreq() to create and remove
>>> devfreq devices for interconnect paths. A driver can create/remove devfreq
>>> devices for the interconnects needed for its device by calling these APIs.
>>> This would allow various devfreq governors to work with interconnect paths
>>> and the device driver itself doesn't have to actively manage the bandwidth
>>> votes for the interconnects.
>>
>> Thanks for the patches, but creating devfreq devices for each interconnect path
>> seems odd to me - at least for consumers that already use a governor.
>
> Each governor instance always handles one "frequency" (more like
> performance) domain at a time. So if a consumer is already using a
> governor to scale the hardware block, then using another governor to
> scale the interconnect performance points is the right way to go about
> it. In fact, that's exactly what devfreq passive governor's
> documentation even says it's meant for. That's also what cpufreq does
> for each cluster/CPU frequency domain too.
>
>> So for DDR
>> scaling for example, are you suggesting that we add a devfreq device from the
>> cpufreq driver in order to scale the interconnect between CPU<->DDR?
>
> Yes in general. Although, CPUs are a special case because CPUs don't
> go through devfreq. So passive governor as it stands today won't work.
> CPU<->DDR scaling might need a separate governor (unlikely) or some
> changes to the passive governor that I'm happy to work on once we
> settle this for general devices like GPU, etc. But the DT format for
> CPUs will be identical to GPUs or any other device.
Using icc_create_devfreq() from the cpufreq-hw driver on the SDM845 SoC
to scale CPU<->DDR would cause a circular dependency, i.e. with
the addition of the cpufreq notifier to the passive governor as in
https://patchwork.kernel.org/patch/11046147/, devm_devfreq_add_device()
would require the cpufreq transition notifier registration and
cpufreq_cpu_get() to go through. Please add your thoughts on addressing this.
>
>> Also if the
>> GPU is already using devfreq, should we add a devfreq per each interconnect
>> path? What would be the benefit in this case - using different governors for
>> bandwidth scaling maybe?
>
> When saying "separate/different governors" in this email, I mean both
> different instance of the same governor logic with different tunables
> AND actually different algorithms/governor logic entirely.
>
> The heuristics to use for each interconnect path might be (more like,
> will be) different based on hardware characteristics (Eg: what voltage
> domains the interconnect is sitting on) and what interconnect
> information is available (Eg: Just busy time vs bandwidth count vs no
> information etc) -- so having separate governors for each interconnect
> path makes a lot of sense. It also allows userspace to control the
> policy for scaling each of those paths based on product use cases.
>
> For example, when the GPU is just doing simple UI rendering, userspace
> can use the max_freq sysfs file for the devfreq device to disallow high
> bandwidth OPPs on the GPU<->DDR path, but those higher OPPs could be
> allowed by userspace when the GPU is used for games. Having devfreq
> device for each interconnect path also make it easy to debug
> performance issues -- you can independently change the votes for each
> path to figure out what is causing the bottleneck, etc.
>
> Adding a devfreq device for interconnect voting with a few lines gives
> all these features "for free".
>
> This doesn't mean all users of interconnect framework NEED to use
> devfreq for interconnect. They might do it simply based on
> calculations based on the use case (Eg: display driver from display
> resolution). But if they are trying to use any kind of
> algorithm/heuristics, writing it as a devfreq governor should be
> encouraged.
>
> Also want to point out that BW OPPs also work for drivers that don't
> use devfreq at all. The interconnect-opp-table just lists the
> meaningful OPP leveld for the path and the device driver can pick one
> entry from the table based on the use case.
>
> Thanks,
> Saravana
>
>
>
--
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc, is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
On Tue, Jul 16, 2019 at 11:13 AM Sibi Sankar <[email protected]> wrote:
>
> Hey Saravana,
>
> On 6/18/19 2:48 AM, Saravana Kannan wrote:
> > On Mon, Jun 17, 2019 at 8:44 AM Georgi Djakov <[email protected]> wrote:
> >>
> >> Hi Saravana,
> >>
> >> On 6/14/19 07:17, Saravana Kannan wrote:
> >>> Add a icc_create_devfreq() and icc_remove_devfreq() to create and remove
> >>> devfreq devices for interconnect paths. A driver can create/remove devfreq
> >>> devices for the interconnects needed for its device by calling these APIs.
> >>> This would allow various devfreq governors to work with interconnect paths
> >>> and the device driver itself doesn't have to actively manage the bandwidth
> >>> votes for the interconnects.
> >>
> >> Thanks for the patches, but creating devfreq devices for each interconnect path
> >> seems odd to me - at least for consumers that already use a governor.
> >
> > Each governor instance always handles one "frequency" (more like
> > performance) domain at a time. So if a consumer is already using a
> > governor to scale the hardware block, then using another governor to
> > scale the interconnect performance points is the right way to go about
> > it. In fact, that's exactly what devfreq passive governor's
> > documentation even says it's meant for. That's also what cpufreq does
> > for each cluster/CPU frequency domain too.
> >
> >> So for DDR
> >> scaling for example, are you suggesting that we add a devfreq device from the
> >> cpufreq driver in order to scale the interconnect between CPU<->DDR?
> >
> > Yes in general. Although, CPUs are a special case because CPUs don't
> > go through devfreq. So passive governor as it stands today won't work.
> > CPU<->DDR scaling might need a separate governor (unlikely) or some
> > changes to the passive governor that I'm happy to work on once we
> > settle this for general devices like GPU, etc. But the DT format for
> > CPUs will be identical to GPUs or any other device.
>
> using icc_create_devfreq from the cpufreq-hw driver on SDM845 SoC
> to scale CPU<->DDR would cause a circular dependency. (i.e) with
> the addition of cpufreq notifier to the passive governor as in
> https://patchwork.kernel.org/patch/11046147/ devm_devfreq_add_device
> would require the cpufreq transistion notifier register and cpu
> freq_cpu_get to go through. Please add your thought on addressing this.
This is an old series. So not going to dive into this much.
But to answer your question, I wrote the cpufreq_map governor a long
time ago. So not surprised if you are finding issues with it -- it
needs a rewrite anyway.
-Saravana
> >
> >> Also if the
> >> GPU is already using devfreq, should we add a devfreq per each interconnect
> >> path? What would be the benefit in this case - using different governors for
> >> bandwidth scaling maybe?
> >
> > When saying "separate/different governors" in this email, I mean both
> > different instance of the same governor logic with different tunables
> > AND actually different algorithms/governor logic entirely.
> >
> > The heuristics to use for each interconnect path might be (more like,
> > will be) different based on hardware characteristics (Eg: what voltage
> > domains the interconnect is sitting on) and what interconnect
> > information is available (Eg: Just busy time vs bandwidth count vs no
> > information etc) -- so having separate governors for each interconnect
> > path makes a lot of sense. It also allows userspace to control the
> > policy for scaling each of those paths based on product use cases.
> >
> > For example, when the GPU is just doing simple UI rendering, userspace
> > can use the max_freq sysfs file for the devfreq device to disallow high
> > bandwidth OPPs on the GPU<->DDR path, but those higher OPPs could be
> > allowed by userspace when the GPU is used for games. Having devfreq
> > device for each interconnect path also make it easy to debug
> > performance issues -- you can independently change the votes for each
> > path to figure out what is causing the bottleneck, etc.
> >
> > Adding a devfreq device for interconnect voting with a few lines gives
> > all these features "for free".
> >
> > This doesn't mean all users of interconnect framework NEED to use
> > devfreq for interconnect. They might do it simply based on
> > calculations based on the use case (Eg: display driver from display
> > resolution). But if they are trying to use any kind of
> > algorithm/heuristics, writing it as a devfreq governor should be
> > encouraged.
> >
> > Also want to point out that BW OPPs also work for drivers that don't
> > use devfreq at all. The interconnect-opp-table just lists the
> > meaningful OPP leveld for the path and the device driver can pick one
> > entry from the table based on the use case.
> >
> > Thanks,
> > Saravana
> >
> >
> >
>
> --
> Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc, is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>