This series adds the throttler driver, for non-thermal throttling of
CPUs and devfreq devices. A use case for non-thermal throttling could
be the detection of a high battery discharge current, close to the
over-current protection (OCP) limit of the battery.
To support throttling of devfreq devices the series introduces the
concept of a devfreq policy and the DEVFREQ_ADJUST notifier (similar
to CPUFREQ_ADJUST). It also includes related devfreq bug fixes and
improvements, since they change code that the policy changes touch
as well.
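The clamping that an ADJUST-style policy notifier performs can be modeled
in plain userspace C. This is only a sketch under assumptions, not the
devfreq code from this series: struct freq_policy, verify_within_limits()
and throttle_adjust() are hypothetical names, and the clamping order is
modeled on cpufreq_verify_within_limits().

```c
/* Hypothetical stand-in for a frequency policy; limits are in Hz. */
struct freq_policy {
	unsigned long min;
	unsigned long max;
};

/* Clamp the policy limits into [min, max] (modeled on cpufreq). */
static void verify_within_limits(struct freq_policy *p,
				 unsigned long min, unsigned long max)
{
	if (p->min < min)
		p->min = min;
	if (p->max < min)
		p->max = min;
	if (p->min > max)
		p->min = max;
	if (p->max > max)
		p->max = max;
	if (p->min > p->max)
		p->min = p->max;
}

/*
 * What a throttling callback does on an ADJUST event: it only ever
 * lowers the maximum, so several callbacks can run on the same policy
 * and the lowest clamp frequency wins.
 */
static void throttle_adjust(struct freq_policy *p, unsigned long clamp_freq)
{
	if (clamp_freq < p->max)
		verify_within_limits(p, 0, clamp_freq);
}
```

Because each callback can only reduce the maximum, the order in which
several throttlers run does not matter in this model.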
Matthias Kaehlcke (12):
PM / devfreq: Init user limits from OPP limits, not viceversa
PM / devfreq: Fix handling of min/max_freq == 0
PM / devfreq: Don't adjust to user limits in governors
PM / devfreq: Add struct devfreq_policy
PM / devfreq: Add support for policy notifiers
PM / devfreq: Make update_devfreq() public
PM / devfreq: export devfreq_class
cpufreq: Add stub for cpufreq_update_policy()
dt-bindings: misc: add bindings for throttler
misc: throttler: Add core support for non-thermal throttling
misc: throttler: Add Chrome OS EC throttler
mfd: cros_ec: Add throttler sub-device
.../devicetree/bindings/misc/throttler.txt | 13 +
MAINTAINERS | 7 +
drivers/devfreq/devfreq.c | 222 +++---
drivers/devfreq/governor.h | 6 +-
drivers/devfreq/governor_passive.c | 4 +-
drivers/devfreq/governor_performance.c | 5 +-
drivers/devfreq/governor_powersave.c | 2 +-
drivers/devfreq/governor_simpleondemand.c | 12 +-
drivers/devfreq/governor_userspace.c | 16 +-
drivers/mfd/cros_ec_dev.c | 19 +
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/throttler/Kconfig | 33 +
drivers/misc/throttler/Makefile | 2 +
drivers/misc/throttler/core.c | 697 ++++++++++++++++++
drivers/misc/throttler/cros_ec_throttler.c | 111 +++
include/linux/cpufreq.h | 1 +
include/linux/devfreq.h | 113 ++-
include/linux/throttler.h | 21 +
19 files changed, 1161 insertions(+), 125 deletions(-)
create mode 100644 Documentation/devicetree/bindings/misc/throttler.txt
create mode 100644 drivers/misc/throttler/Kconfig
create mode 100644 drivers/misc/throttler/Makefile
create mode 100644 drivers/misc/throttler/core.c
create mode 100644 drivers/misc/throttler/cros_ec_throttler.c
create mode 100644 include/linux/throttler.h
--
2.18.0.203.gfac676dfb9-goog
The purpose of the throttler is to provide support for non-thermal
throttling. Throttling is triggered by an external event, e.g. the
detection of a high battery discharge current, close to the OCP limit
of the battery. The throttler is only in charge of the throttling, not
the monitoring, which is done by another (possibly platform specific)
driver.
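The mapping from a throttling level to a frequency limit can be sketched
in userspace C as follows. This is a model of the lookup logic only; the
names struct freq_table, cmp_desc() and throttling_freq() are hypothetical,
and any frequencies fed to it are illustrative rather than taken from a
real OPP table.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table of throttling frequencies in Hz. */
struct freq_table {
	uint32_t *freqs;
	size_t n_entries;
};

/* qsort() comparator that orders frequencies from highest to lowest. */
static int cmp_desc(const void *a, const void *b)
{
	const uint32_t *pa = a, *pb = b;

	if (*pa < *pb)
		return 1;
	if (*pa > *pb)
		return -1;
	return 0;
}

/* Sort the table so that level 1 maps to the highest frequency. */
static void freq_table_sort(struct freq_table *ft)
{
	qsort(ft->freqs, ft->n_entries, sizeof(*ft->freqs), cmp_desc);
}

/*
 * Level 0 means "no limit". Levels beyond the end of the table fall
 * back to the last (lowest) entry. The table must not be empty.
 */
static unsigned long throttling_freq(const struct freq_table *ft,
				     unsigned int level)
{
	if (level == 0)
		return ULONG_MAX;

	if (level <= ft->n_entries)
		return ft->freqs[level - 1];

	return ft->freqs[ft->n_entries - 1];
}
```

Sorting in descending order is what makes higher levels more aggressive,
and the fallback lets devices with different numbers of levels share one
global throttling level.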
Signed-off-by: Matthias Kaehlcke <[email protected]>
---
Changes in v5:
- read throttling OPPs using new device tree binding (throttler-opps = <...>)
- thr_get_throttling_freq(): removed warning
- thr_handle_cpufreq_event(): merged similar loops
- thr_handle_devfreq_event(): read level once and save it in
local variable
- thr_init_freq_table(): renamed nchilds to n_opps and nfreqs to n_freqs
Changes in v4:
- remove OOM logs
- "does have no" => "has no" in log message
- changed 'level' to unsigned int
- hold mutex in throttler_set_level() when checking if level has changed
- remove debugfs dir in throttler_teardown()
- consolidated update of all devfreq devices in thr_update_devfreq()
- added field 'shutting_down' to struct throttler
- refactored teardown to avoid deadlocks
Changes in v3:
- Kconfig: don't select CPU_FREQ and PM_DEVFREQ
- added CONFIG_THROTTLER_DEBUG option
- changed 'level' sysfs attribute to debugfs
- introduced thr_<level> macros for logging
- added debug logs
- added field clamp_freq to struct cpufreq_thrdev and devfreq_thrdev
to keep track of the current frequency limits and avoid spammy logs
Changes in v2:
- completely reworked the driver to support configuration through OPPs, which
requires a more dynamic handling
- added sysfs attribute to set the level for debugging and testing
- Makefile: depend on Kconfig option to traverse throttler directory
- Kconfig: removed 'default n'
- added SPDX line instead of license boiler-plate
- added entry to MAINTAINERS file
---
MAINTAINERS | 7 +
drivers/misc/Kconfig | 1 +
drivers/misc/Makefile | 1 +
drivers/misc/throttler/Kconfig | 23 ++
drivers/misc/throttler/Makefile | 1 +
drivers/misc/throttler/core.c | 697 ++++++++++++++++++++++++++++++++
include/linux/throttler.h | 21 +
7 files changed, 751 insertions(+)
create mode 100644 drivers/misc/throttler/Kconfig
create mode 100644 drivers/misc/throttler/Makefile
create mode 100644 drivers/misc/throttler/core.c
create mode 100644 include/linux/throttler.h
diff --git a/MAINTAINERS b/MAINTAINERS
index dc241b04d1bd..df974c0d85a4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14057,6 +14057,13 @@ T: git git://repo.or.cz/linux-2.6/linux-acpi-2.6/ibm-acpi-2.6.git
S: Maintained
F: drivers/platform/x86/thinkpad_acpi.c
+THROTTLER DRIVERS
+M: Matthias Kaehlcke <[email protected]>
+L: [email protected]
+S: Maintained
+F: drivers/misc/throttler/
+F: include/linux/throttler.h
+
THUNDERBOLT DRIVER
M: Andreas Noever <[email protected]>
M: Michael Jamet <[email protected]>
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 3726eacdf65d..717fa3bd0e09 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -527,4 +527,5 @@ source "drivers/misc/echo/Kconfig"
source "drivers/misc/cxl/Kconfig"
source "drivers/misc/ocxl/Kconfig"
source "drivers/misc/cardreader/Kconfig"
+source "drivers/misc/throttler/Kconfig"
endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index af22bbc3d00c..0f4ecc6a7532 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -58,3 +58,4 @@ obj-$(CONFIG_ASPEED_LPC_SNOOP) += aspeed-lpc-snoop.o
obj-$(CONFIG_PCI_ENDPOINT_TEST) += pci_endpoint_test.o
obj-$(CONFIG_OCXL) += ocxl/
obj-$(CONFIG_MISC_RTSX) += cardreader/
+obj-$(CONFIG_THROTTLER) += throttler/
diff --git a/drivers/misc/throttler/Kconfig b/drivers/misc/throttler/Kconfig
new file mode 100644
index 000000000000..8b2e63b2ef48
--- /dev/null
+++ b/drivers/misc/throttler/Kconfig
@@ -0,0 +1,23 @@
+# SPDX-License-Identifier: GPL-2.0
+
+menuconfig THROTTLER
+ bool "Throttler support"
+ depends on OF
+ help
+ This option enables core support for non-thermal throttling of CPUs
+ and devfreq devices.
+
+ Note that you also need an event monitor module, usually called
+ *_throttler.
+
+if THROTTLER
+
+menuconfig THROTTLER_DEBUG
+ bool "Enable throttler debugging"
+ help
+ This option enables throttler debugging features like additional
+ logging and a debugfs attribute for setting the throttling level.
+
+ Choose N unless you want to debug throttler drivers.
+
+endif # THROTTLER
diff --git a/drivers/misc/throttler/Makefile b/drivers/misc/throttler/Makefile
new file mode 100644
index 000000000000..c8d920cee315
--- /dev/null
+++ b/drivers/misc/throttler/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_THROTTLER) += core.o
diff --git a/drivers/misc/throttler/core.c b/drivers/misc/throttler/core.c
new file mode 100644
index 000000000000..44c99e380aa0
--- /dev/null
+++ b/drivers/misc/throttler/core.c
@@ -0,0 +1,697 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Core code for non-thermal throttling
+ *
+ * Copyright (C) 2018 Google, Inc.
+ */
+
+#include <linux/cpu.h>
+#include <linux/cpufreq.h>
+#include <linux/cpumask.h>
+#include <linux/debugfs.h>
+#include <linux/devfreq.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/pm_opp.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+#include <linux/throttler.h>
+
+/*
+ * Non-thermal throttling: throttling of system components in response to
+ * external events (e.g. high battery discharge current).
+ *
+ * The throttler supports throttling through cpufreq and devfreq. Multiple
+ * levels of throttling can be configured. At level 0 no throttling is
+ * active on behalf of the throttler, for values > 0 throttling is typically
+ * configured to be increasingly aggressive with each level.
+ * The number of throttling levels is not limited by the throttler (though
+ * it is likely limited by the throttling devices). It is not necessary to
+ * configure the same number of levels for all throttling devices. If the
+ * requested throttling level for a device is higher than the maximum level
+ * of the device the throttler will select the maximum throttling level of
+ * the device.
+ *
+ * Non-thermal throttling is split in two parts:
+ *
+ * - throttler core
+ * - parses the throttling configuration
+ * - applies throttling settings for a requested level of throttling
+ *
+ * - event monitor driver
+ * - monitors events that trigger throttling
+ * - determines the throttling level (often limited to on/off)
+ * - asks throttler core to apply throttling settings
+ *
+ * It is possible for a system to have more than one throttler and the
+ * throttlers may make use of the same throttling devices. In case of
+ * conflicting settings for a device the more aggressive values will be
+ * applied.
+ *
+ */
+
+#define ci_to_throttler(ci) \
+ container_of(ci, struct throttler, devfreq.class_iface)
+
+struct thr_freq_table {
+ uint32_t *freqs;
+ int n_entries;
+};
+
+struct cpufreq_thrdev {
+ uint32_t cpu;
+ struct thr_freq_table freq_table;
+ uint32_t clamp_freq;
+ struct list_head node;
+};
+
+struct devfreq_thrdev {
+ struct devfreq *devfreq;
+ struct thr_freq_table freq_table;
+ uint32_t clamp_freq;
+ struct throttler *thr;
+ struct notifier_block nb;
+ struct list_head node;
+};
+
+struct __thr_cpufreq {
+ struct list_head list;
+ cpumask_t cm_initialized;
+ cpumask_t cm_ignore;
+ struct notifier_block nb;
+};
+
+struct __thr_devfreq {
+ struct list_head list;
+ struct class_interface class_iface;
+};
+
+struct __thr_debugfs {
+ struct dentry *dir;
+ struct dentry *attr_level;
+};
+
+struct throttler {
+ struct device *dev;
+ unsigned int level;
+ struct __thr_cpufreq cpufreq;
+ struct __thr_devfreq devfreq;
+ struct mutex lock;
+ bool shutting_down;
+#ifdef CONFIG_THROTTLER_DEBUG
+ struct __thr_debugfs debugfs;
+#endif
+};
+
+static inline int cmp_freqs(const void *a, const void *b)
+{
+ const uint32_t *pa = a, *pb = b;
+
+ if (*pa < *pb)
+ return 1;
+ else if (*pa > *pb)
+ return -1;
+
+ return 0;
+}
+
+static int thr_handle_devfreq_event(struct notifier_block *nb,
+ unsigned long event, void *data);
+
+static unsigned long thr_get_throttling_freq(struct thr_freq_table *ft,
+ unsigned int level)
+{
+ if (level == 0)
+ return ULONG_MAX;
+
+ if (level <= ft->n_entries)
+ return ft->freqs[level - 1];
+ else
+ return ft->freqs[ft->n_entries - 1];
+}
+
+static int thr_init_freq_table(struct throttler *thr, struct device *opp_dev,
+ struct thr_freq_table *ft)
+{
+ struct device_node *np_opp_desc;
+ int n_opps;
+ int n_thr_opps;
+ int i;
+ uint32_t *freqs;
+ int n_freqs = 0;
+ int err = 0;
+
+ np_opp_desc = dev_pm_opp_of_get_opp_desc_node(opp_dev);
+ if (!np_opp_desc)
+ return -EINVAL;
+
+ n_opps = of_get_child_count(np_opp_desc);
+ if (!n_opps) {
+ err = -EINVAL;
+ goto out_node_put;
+ }
+
+ freqs = kzalloc(n_opps * sizeof(uint32_t), GFP_KERNEL);
+ if (!freqs) {
+ err = -ENOMEM;
+ goto out_node_put;
+ }
+
+ n_thr_opps = of_property_count_u32_elems(thr->dev->of_node,
+ "throttler-opps");
+ if (n_thr_opps <= 0) {
+ thr_err(thr, "No OPPs configured for throttling\n");
+ err = -EINVAL;
+ goto out_free;
+ }
+
+ for (i = 0; i < n_thr_opps; i++) {
+ struct device_node *np_opp;
+ u64 rate;
+
+ np_opp = of_parse_phandle(thr->dev->of_node, "throttler-opps",
+ i);
+ if (!np_opp) {
+ thr_err(thr,
+ "failed to parse phandle %d in %s\n", i,
+ thr->dev->of_node->full_name);
+ continue;
+ }
+
+ if (of_get_parent(np_opp) != np_opp_desc) {
+ of_node_put(np_opp);
+ continue;
+ }
+
+ err = of_property_read_u64(np_opp, "opp-hz",
+ &rate);
+ if (!err) {
+ freqs[n_freqs] = rate;
+ n_freqs++;
+
+ thr_dbg(thr,
+ "OPP %s (%llu MHz) is used for throttling\n",
+ np_opp->full_name,
+ rate / 1000000);
+ } else {
+ thr_err(thr, "opp-hz not found: %s\n",
+ np_opp->full_name);
+ }
+
+ of_node_put(np_opp);
+ }
+
+ if (n_freqs > 0) {
+ /* sort frequencies in descending order */
+ sort(freqs, n_freqs, sizeof(*freqs), cmp_freqs, NULL);
+
+ ft->n_entries = n_freqs;
+ ft->freqs = devm_kzalloc(thr->dev,
+ n_freqs * sizeof(*freqs), GFP_KERNEL);
+ if (!ft->freqs) {
+ err = -ENOMEM;
+ goto out_free;
+ }
+
+ memcpy(ft->freqs, freqs, n_freqs * sizeof(*freqs));
+ } else {
+ err = -ENODEV;
+ }
+
+out_free:
+ kfree(freqs);
+
+out_node_put:
+ of_node_put(np_opp_desc);
+
+ return err;
+}
+
+static void thr_cpufreq_init(struct throttler *thr, int cpu)
+{
+ struct device *cpu_dev;
+ struct thr_freq_table ft;
+ struct cpufreq_thrdev *cpufreq_dev;
+ int err;
+
+ WARN_ON(!mutex_is_locked(&thr->lock));
+
+ cpu_dev = get_cpu_device(cpu);
+ if (!cpu_dev) {
+ dev_err_ratelimited(thr->dev, "failed to get CPU %d\n", cpu);
+ return;
+ }
+
+ err = thr_init_freq_table(thr, cpu_dev, &ft);
+ if (err) {
+ /* CPU is not throttled or initialization failed */
+ if (err != -ENODEV)
+ thr_err(thr, "failed to initialize CPU %d: %d\n", cpu,
+ err);
+
+ cpumask_set_cpu(cpu, &thr->cpufreq.cm_ignore);
+ return;
+ }
+
+ cpufreq_dev = devm_kzalloc(thr->dev, sizeof(*cpufreq_dev), GFP_KERNEL);
+ if (!cpufreq_dev)
+ return;
+
+ cpufreq_dev->cpu = cpu;
+ memcpy(&cpufreq_dev->freq_table, &ft, sizeof(ft));
+ list_add_tail(&cpufreq_dev->node, &thr->cpufreq.list);
+
+ cpumask_set_cpu(cpu, &thr->cpufreq.cm_initialized);
+}
+
+static void thr_devfreq_init(struct device *dev, void *data)
+{
+ struct throttler *thr = data;
+ struct thr_freq_table ft;
+ struct devfreq_thrdev *dftd;
+ int err;
+
+ WARN_ON(!mutex_is_locked(&thr->lock));
+
+ err = thr_init_freq_table(thr, dev->parent, &ft);
+ if (err) {
+ if (err == -ENODEV)
+ return;
+
+ thr_err(thr, "failed to init frequency table of device %s: %d\n",
+ dev_name(dev), err);
+ return;
+ }
+
+ dftd = devm_kzalloc(thr->dev, sizeof(*dftd), GFP_KERNEL);
+ if (!dftd)
+ return;
+
+ dftd->thr = thr;
+ dftd->devfreq = container_of(dev, struct devfreq, dev);
+ memcpy(&dftd->freq_table, &ft, sizeof(ft));
+
+ dftd->nb.notifier_call = thr_handle_devfreq_event;
+ err = devm_devfreq_register_notifier(thr->dev, dftd->devfreq,
+ &dftd->nb, DEVFREQ_POLICY_NOTIFIER);
+ if (err < 0) {
+ thr_err(thr, "failed to register devfreq notifier\n");
+ devm_kfree(thr->dev, dftd);
+ return;
+ }
+
+ list_add_tail(&dftd->node, &thr->devfreq.list);
+
+ thr_dbg(thr, "device '%s' is used for throttling\n",
+ dev_name(dev));
+}
+
+static int thr_handle_cpufreq_event(struct notifier_block *nb,
+ unsigned long event, void *data)
+{
+ struct throttler *thr =
+ container_of(nb, struct throttler, cpufreq.nb);
+ struct cpufreq_policy *policy = data;
+ struct cpufreq_thrdev *cftd;
+
+ if ((event != CPUFREQ_ADJUST) || thr->shutting_down)
+ return NOTIFY_DONE;
+
+ mutex_lock(&thr->lock);
+
+ if (cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_ignore))
+ goto out;
+
+ if (!cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_initialized)) {
+ thr_cpufreq_init(thr, policy->cpu);
+
+ if (cpumask_test_cpu(policy->cpu, &thr->cpufreq.cm_ignore))
+ goto out;
+
+ thr_dbg(thr, "CPU%d is used for throttling\n", policy->cpu);
+ }
+
+ list_for_each_entry(cftd, &thr->cpufreq.list, node) {
+ unsigned long clamp_freq;
+
+ if (cftd->cpu != policy->cpu)
+ continue;
+
+ if (thr->level == 0) {
+ if (cftd->clamp_freq != 0) {
+ thr_dbg(thr, "unthrottling CPU%d\n", cftd->cpu);
+ cftd->clamp_freq = 0;
+ }
+
+ continue;
+ }
+
+ clamp_freq = thr_get_throttling_freq(&cftd->freq_table,
+ thr->level) / 1000;
+ if (cftd->clamp_freq != clamp_freq) {
+ thr_dbg(thr, "throttling CPU%d to %lu MHz\n", cftd->cpu,
+ clamp_freq / 1000);
+ cftd->clamp_freq = clamp_freq;
+ }
+
+ if (clamp_freq < policy->max)
+ cpufreq_verify_within_limits(policy, 0, clamp_freq);
+ }
+
+out:
+ mutex_unlock(&thr->lock);
+
+ return NOTIFY_DONE;
+}
+
+/*
+ * Notifier called by devfreq. Can't acquire thr->lock since it might
+ * already be held by throttler_set_level(). It isn't necessary to
+ * acquire the lock for the following reasons:
+ *
+ * Only the devfreq_thrdev and thr->level are accessed in this function.
+ * The devfreq device won't go away (or change) during the execution of
+ * this function, since we are called from the devfreq core. Theoretically
+ * thr->level could change and we'd apply an outdated setting, however in
+ * this case the function would run again shortly after and apply the
+ * correct value.
+ */
+static int thr_handle_devfreq_event(struct notifier_block *nb,
+ unsigned long event, void *data)
+{
+ struct devfreq_thrdev *dftd =
+ container_of(nb, struct devfreq_thrdev, nb);
+ struct throttler *thr = dftd->thr;
+ struct devfreq_policy *policy = data;
+ unsigned int level = READ_ONCE(thr->level);
+ unsigned long clamp_freq;
+
+ if ((event != DEVFREQ_ADJUST) || thr->shutting_down)
+ return NOTIFY_DONE;
+
+ if (level == 0) {
+ if (dftd->clamp_freq != 0) {
+ thr_dbg(thr, "unthrottling '%s'\n",
+ dev_name(&dftd->devfreq->dev));
+ dftd->clamp_freq = 0;
+ }
+
+ return NOTIFY_DONE;
+ }
+
+ clamp_freq = thr_get_throttling_freq(&dftd->freq_table, level);
+ if (clamp_freq != dftd->clamp_freq) {
+ thr_dbg(thr, "throttling '%s' to %lu MHz\n",
+ dev_name(&dftd->devfreq->dev), clamp_freq / 1000000);
+ dftd->clamp_freq = clamp_freq;
+ }
+
+ if (clamp_freq < policy->max)
+ devfreq_verify_within_limits(policy, 0, clamp_freq);
+
+ return NOTIFY_DONE;
+}
+
+static void thr_cpufreq_update_policy(struct throttler *thr)
+{
+ struct cpufreq_thrdev *cftd;
+
+ WARN_ON(!mutex_is_locked(&thr->lock));
+
+ list_for_each_entry(cftd, &thr->cpufreq.list, node) {
+ struct cpufreq_policy *policy = cpufreq_cpu_get(cftd->cpu);
+
+ if (!policy) {
+ thr_warn(thr, "CPU%d has no cpufreq policy!\n",
+ cftd->cpu);
+ continue;
+ }
+
+ /*
+ * The lock isn't really needed in this function, the list
+ * of cpufreq devices can be extended, but no items are
+ * deleted during the lifetime of the throttler. Releasing
+ * the lock is necessary since cpufreq_update_policy() ends
+ * up calling thr_handle_cpufreq_event(), which needs to
+ * acquire the lock.
+ */
+ mutex_unlock(&thr->lock);
+ cpufreq_update_policy(cftd->cpu);
+ mutex_lock(&thr->lock);
+
+ cpufreq_cpu_put(policy);
+ }
+}
+
+static void thr_update_devfreq(struct throttler *thr)
+{
+ struct devfreq_thrdev *dftd;
+
+ WARN_ON(!mutex_is_locked(&thr->lock));
+
+ list_for_each_entry(dftd, &thr->devfreq.list, node) {
+ mutex_lock(&dftd->devfreq->lock);
+ update_devfreq(dftd->devfreq);
+ mutex_unlock(&dftd->devfreq->lock);
+ }
+}
+
+static int thr_handle_devfreq_added(struct device *dev,
+ struct class_interface *ci)
+{
+ struct throttler *thr = ci_to_throttler(ci);
+
+ mutex_lock(&thr->lock);
+ thr_devfreq_init(dev, thr);
+ mutex_unlock(&thr->lock);
+
+ return 0;
+}
+
+static void thr_handle_devfreq_removed(struct device *dev,
+ struct class_interface *ci)
+{
+ struct devfreq_thrdev *dftd;
+ struct throttler *thr = ci_to_throttler(ci);
+
+ mutex_lock(&thr->lock);
+
+ list_for_each_entry(dftd, &thr->devfreq.list, node) {
+ if (dev == &dftd->devfreq->dev) {
+ list_del(&dftd->node);
+ devm_kfree(thr->dev, dftd->freq_table.freqs);
+ devm_kfree(thr->dev, dftd);
+ break;
+ }
+ }
+
+ mutex_unlock(&thr->lock);
+}
+
+void throttler_set_level(struct throttler *thr, unsigned int level)
+{
+ mutex_lock(&thr->lock);
+
+ if ((level == thr->level) || thr->shutting_down) {
+ mutex_unlock(&thr->lock);
+ return;
+ }
+
+ thr_dbg(thr, "throttling level: %u\n", level);
+ thr->level = level;
+
+ if (!list_empty(&thr->cpufreq.list))
+ thr_cpufreq_update_policy(thr);
+
+ thr_update_devfreq(thr);
+
+ mutex_unlock(&thr->lock);
+}
+EXPORT_SYMBOL_GPL(throttler_set_level);
+
+#ifdef CONFIG_THROTTLER_DEBUG
+
+static ssize_t thr_level_read(struct file *file, char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ struct throttler *thr = file->f_inode->i_private;
+ char buf[12];
+ int len;
+
+ len = scnprintf(buf, sizeof(buf), "%u\n", thr->level);
+
+ return simple_read_from_buffer(user_buf, count, ppos, buf, len);
+}
+
+static ssize_t thr_level_write(struct file *file,
+ const char __user *user_buf,
+ size_t count, loff_t *ppos)
+{
+ int rc;
+ unsigned int level;
+ struct throttler *thr = file->f_inode->i_private;
+
+ rc = kstrtouint_from_user(user_buf, count, 10, &level);
+ if (rc)
+ return rc;
+
+ throttler_set_level(thr, level);
+
+ return count;
+}
+
+static const struct file_operations level_debugfs_ops = {
+ .owner = THIS_MODULE,
+ .read = thr_level_read,
+ .write = thr_level_write,
+};
+#endif
+
+struct throttler *throttler_setup(struct device *dev)
+{
+ struct throttler *thr;
+ struct device_node *np = dev->of_node;
+ struct class_interface *ci;
+ int cpu;
+ int err;
+
+ if (!np)
+ /* should never happen */
+ return ERR_PTR(-EINVAL);
+
+ thr = devm_kzalloc(dev, sizeof(*thr), GFP_KERNEL);
+ if (!thr)
+ return ERR_PTR(-ENOMEM);
+
+ mutex_init(&thr->lock);
+ thr->dev = dev;
+
+ cpumask_clear(&thr->cpufreq.cm_ignore);
+ cpumask_clear(&thr->cpufreq.cm_initialized);
+
+ INIT_LIST_HEAD(&thr->cpufreq.list);
+ INIT_LIST_HEAD(&thr->devfreq.list);
+
+ thr->cpufreq.nb.notifier_call = thr_handle_cpufreq_event;
+ err = cpufreq_register_notifier(&thr->cpufreq.nb,
+ CPUFREQ_POLICY_NOTIFIER);
+ if (err < 0) {
+ thr_err(thr, "failed to register cpufreq notifier\n");
+ return ERR_PTR(err);
+ }
+
+ /*
+ * The CPU throttling configuration is parsed at runtime, when the
+ * cpufreq policy notifier is called for a CPU that hasn't been
+ * initialized yet.
+ *
+ * This is done for two reasons:
+ * - when the throttler is probed the CPU might not yet have a policy
+ * - CPUs that were offline at probe time might be hotplugged
+ *
+ * The notifier is called when the policy is added/set
+ */
+ for_each_online_cpu(cpu) {
+ struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+
+ if (!policy)
+ continue;
+
+ cpufreq_update_policy(cpu);
+ cpufreq_cpu_put(policy);
+ }
+
+ /*
+ * devfreq devices can be added and removed at runtime, hence they
+ * must also be handled dynamically. The class_interface notifies us
+ * whenever a device is added or removed. When the interface is
+ * registered ci->add_dev() is called for all existing devfreq
+ * devices.
+ */
+ ci = &thr->devfreq.class_iface;
+ ci->class = devfreq_class;
+ ci->add_dev = thr_handle_devfreq_added;
+ ci->remove_dev = thr_handle_devfreq_removed;
+
+ err = class_interface_register(ci);
+ if (err) {
+ thr_err(thr, "failed to register devfreq class interface: %d\n",
+ err);
+ cpufreq_unregister_notifier(&thr->cpufreq.nb,
+ CPUFREQ_POLICY_NOTIFIER);
+ return ERR_PTR(err);
+ }
+
+#ifdef CONFIG_THROTTLER_DEBUG
+ thr->debugfs.dir = debugfs_create_dir(dev_name(thr->dev), NULL);
+ if (IS_ERR(thr->debugfs.dir)) {
+ thr_warn(thr, "failed to create debugfs directory: %ld\n",
+ PTR_ERR(thr->debugfs.dir));
+ thr->debugfs.dir = NULL;
+ goto skip_debugfs;
+ }
+
+ thr->debugfs.attr_level = debugfs_create_file("level", 0644,
+ thr->debugfs.dir, thr,
+ &level_debugfs_ops);
+ if (IS_ERR(thr->debugfs.attr_level)) {
+ thr_warn(thr, "failed to create debugfs attribute: %ld\n",
+ PTR_ERR(thr->debugfs.attr_level));
+ debugfs_remove(thr->debugfs.dir);
+ thr->debugfs.dir = NULL;
+ }
+
+skip_debugfs:
+#endif
+
+ return thr;
+}
+EXPORT_SYMBOL_GPL(throttler_setup);
+
+void throttler_teardown(struct throttler *thr)
+{
+#ifdef CONFIG_THROTTLER_DEBUG
+ debugfs_remove_recursive(thr->debugfs.dir);
+#endif
+
+ /*
+ * Tell notifiers and throttler_set_level() that we are shutting down.
+ * If a notifier starts before the flag is set it may still apply
+ * throttling settings. This is not a problem since we explicitly
+ * trigger the notifiers (again) below to unthrottle CPUs and
+ * devfreq devices.
+ */
+ thr->shutting_down = true;
+
+ /*
+ * Unregister without the lock being held to avoid possible
+ * deadlock with notifier calls.
+ */
+ cpufreq_unregister_notifier(&thr->cpufreq.nb,
+ CPUFREQ_POLICY_NOTIFIER);
+
+ mutex_lock(&thr->lock);
+
+ if (thr->level) {
+ /* Unthrottle CPUs */
+ if (!list_empty(&thr->cpufreq.list))
+ thr_cpufreq_update_policy(thr);
+
+ /* Unthrottle devfreq devices */
+ thr_update_devfreq(thr);
+ }
+
+ mutex_unlock(&thr->lock);
+
+ /*
+ * Unregistering the class interface must be done without holding the
+ * lock, since it results in calling thr_handle_devfreq_removed(),
+ * which acquires the lock.
+ */
+ class_interface_unregister(&thr->devfreq.class_iface);
+}
+EXPORT_SYMBOL_GPL(throttler_teardown);
diff --git a/include/linux/throttler.h b/include/linux/throttler.h
new file mode 100644
index 000000000000..c43ab1acd96b
--- /dev/null
+++ b/include/linux/throttler.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_THROTTLER_H__
+#define __LINUX_THROTTLER_H__
+
+struct throttler;
+
+extern struct throttler *throttler_setup(struct device *dev);
+extern void throttler_teardown(struct throttler *thr);
+extern void throttler_set_level(struct throttler *thr, unsigned int level);
+
+#ifdef CONFIG_THROTTLER_DEBUG
+#define thr_dbg(thr, fmt, ...) dev_info(thr->dev, fmt, ##__VA_ARGS__)
+#else
+#define thr_dbg(thr, fmt, ...) dev_dbg(thr->dev, fmt, ##__VA_ARGS__)
+#endif
+
+#define thr_info(thr, fmt, ...) dev_info(thr->dev, fmt, ##__VA_ARGS__)
+#define thr_warn(thr, fmt, ...) dev_warn(thr->dev, fmt, ##__VA_ARGS__)
+#define thr_err(thr, fmt, ...) dev_err(thr->dev, fmt, ##__VA_ARGS__)
+
+#endif /* __LINUX_THROTTLER_H__ */
--
2.18.0.203.gfac676dfb9-goog
Instantiate the CrOS EC throttler if it is enabled in the kernel
configuration.
Signed-off-by: Matthias Kaehlcke <[email protected]>
Reviewed-by: Brian Norris <[email protected]>
---
Changes in v5:
- added 'Reviewed-by: Brian Norris <[email protected]>' tag
Changes in v4:
- register throttler in cros_ec_dev.c instead of cros_ec.c
Changes in v3:
- patch added to series
---
drivers/mfd/cros_ec_dev.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/mfd/cros_ec_dev.c b/drivers/mfd/cros_ec_dev.c
index eafd06f62a3a..9e56c2793075 100644
--- a/drivers/mfd/cros_ec_dev.c
+++ b/drivers/mfd/cros_ec_dev.c
@@ -383,6 +383,22 @@ static void cros_ec_sensors_register(struct cros_ec_dev *ec)
kfree(msg);
}
+static const struct mfd_cell ec_throttler_cells[] = {
+ { .name = "cros-ec-throttler" }
+};
+
+static void cros_ec_throttler_register(struct cros_ec_dev *ec)
+{
+ int ret;
+
+ ret = mfd_add_devices(ec->dev, 0, ec_throttler_cells,
+ ARRAY_SIZE(ec_throttler_cells),
+ NULL, 0, NULL);
+ if (ret)
+ dev_err(ec->dev,
+ "failed to add cros-ec-throttler device: %d\n", ret);
+}
+
static int ec_device_probe(struct platform_device *pdev)
{
int retval = -ENOMEM;
@@ -422,6 +438,9 @@ static int ec_device_probe(struct platform_device *pdev)
if (cros_ec_check_features(ec, EC_FEATURE_MOTION_SENSE))
cros_ec_sensors_register(ec);
+ if (IS_ENABLED(CONFIG_CROS_EC_THROTTLER))
+ cros_ec_throttler_register(ec);
+
/* Take control of the lightbar from the EC. */
lb_manual_suspend_ctrl(ec, 1);
--
2.18.0.203.gfac676dfb9-goog
Exporting the device class allows other parts of the kernel to enumerate
the devfreq devices and receive notification when a devfreq device is
added or removed.
Signed-off-by: Matthias Kaehlcke <[email protected]>
Reviewed-by: Brian Norris <[email protected]>
---
Changes in v5:
- none
Changes in v4:
- added 'Reviewed-by: Brian Norris <[email protected]>' tag
Changes in v3:
- none
Changes in v2:
- patch added to series
---
drivers/devfreq/devfreq.c | 3 ++-
include/linux/devfreq.h | 2 ++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 4cbaa7ad1972..38b90b64fc6e 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -31,7 +31,8 @@
#define MAX(a,b) ((a > b) ? a : b)
#define MIN(a,b) ((a < b) ? a : b)
-static struct class *devfreq_class;
+struct class *devfreq_class;
+EXPORT_SYMBOL_GPL(devfreq_class);
/*
* devfreq core provides delayed work based load monitoring helper
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index c4f84a769cb5..964e064a951f 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -206,6 +206,8 @@ struct devfreq_freqs {
};
#if defined(CONFIG_PM_DEVFREQ)
+extern struct class *devfreq_class;
+
extern struct devfreq *devfreq_add_device(struct device *dev,
struct devfreq_dev_profile *profile,
const char *governor_name,
--
2.18.0.203.gfac676dfb9-goog
cpufreq stubs out some functions when CONFIG_CPU_FREQ=n, but
cpufreq_update_policy() is not among them. The throttler driver
(https://patchwork.kernel.org/patch/10453351/) uses cpufreq as one
possible throttling mechanism, but it can still be useful without
cpufreq. Stubbing out cpufreq_update_policy() allows the throttler
driver to be built without ugly #ifdef'ery when cpufreq is disabled.
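The stub pattern itself can be illustrated outside the kernel. In this
sketch HAVE_FREQ_SCALING is a hypothetical stand-in for CONFIG_CPU_FREQ,
and update_policy() and refresh_all() are hypothetical names for the
stubbed function and its caller.

```c
/* HAVE_FREQ_SCALING is a hypothetical stand-in for CONFIG_CPU_FREQ. */
#ifdef HAVE_FREQ_SCALING
/* The real implementation lives in another translation unit. */
void update_policy(unsigned int cpu);
#else
/* Feature disabled: an empty inline stub keeps callers compiling. */
static inline void update_policy(unsigned int cpu)
{
	(void)cpu;
}
#endif

/*
 * The caller needs no #ifdef: with the stub in place the loop compiles
 * either way and does nothing useful when the feature is disabled.
 * Returns the number of CPUs visited.
 */
static unsigned int refresh_all(unsigned int n_cpus)
{
	unsigned int cpu;

	for (cpu = 0; cpu < n_cpus; cpu++)
		update_policy(cpu);

	return cpu;
}
```

The conditional lives in one header, so every caller stays free of
preprocessor checks.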
Signed-off-by: Matthias Kaehlcke <[email protected]>
Reviewed-by: Brian Norris <[email protected]>
---
Changes in v5:
- none
Changes in v4:
- added 'Reviewed-by: Brian Norris <[email protected]>' tag
Changes in v3:
- patch added to series
---
include/linux/cpufreq.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 882a9b9e34bc..dba8c4951e2e 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -210,6 +210,7 @@ static inline unsigned int cpufreq_quick_get_max(unsigned int cpu)
return 0;
}
static inline void disable_cpufreq(void) { }
+static inline void cpufreq_update_policy(unsigned int cpu) { }
#endif
#ifdef CONFIG_CPU_FREQ_STAT
--
2.18.0.203.gfac676dfb9-goog
The driver subscribes to throttling events from the Chrome OS
embedded controller and enables/disables system throttling based
on these events.
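The event handling reduces to a small mapping from a host event bitmask
to a throttling level. The sketch below models that mapping in userspace
C; EV_THROTTLE_START, EV_THROTTLE_STOP, EV_MASK() and event_to_level()
are hypothetical names, and the event codes are illustrative assumptions
rather than values taken from this patch.

```c
#include <stdint.h>

/* Hypothetical event codes; the numeric values are assumptions. */
enum {
	EV_THROTTLE_START = 18,
	EV_THROTTLE_STOP = 19,
};

/* Event codes are 1-based, so code N maps to bit N - 1. */
#define EV_MASK(code) (UINT32_C(1) << ((code) - 1))

/*
 * Translate a host event bitmask into a throttling level.
 * Returns 1 and updates *level if the event was handled, 0 otherwise.
 * A start event takes precedence when both bits are set.
 */
static int event_to_level(uint32_t host_event, unsigned int *level)
{
	if (host_event & EV_MASK(EV_THROTTLE_START)) {
		*level = 1;
		return 1;
	}

	if (host_event & EV_MASK(EV_THROTTLE_STOP)) {
		*level = 0;
		return 1;
	}

	return 0;
}
```

A monitor with only two states needs just levels 0 and 1; a monitor that
can distinguish more severity steps could request higher levels instead.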
Signed-off-by: Matthias Kaehlcke <[email protected]>
Reviewed-by: Enric Balletbo i Serra <[email protected]>
Reviewed-by: Brian Norris <[email protected]>
---
Changes in v5:
- none
Changes in v4:
- adapt to parent being struct cros_ec_dev instead of struct
cros_ec_device
- added 'Reviewed-by: Brian Norris <[email protected]>' tag
Changes in v3:
- change module license to GPL v2 as in the SPDX identifier
- don't instantiate the throttler through the DT (instantiation
by CrOS EC MFD in a separate patch)
Changes in v2:
- added SPDX line instead of license boiler-plate
- use macro to avoid splitting line
- changed variable name for throttler from 'cte' to 'ce_thr'
- formatting fixes
- Kconfig: removed odd dashes around 'help'
- added 'Reviewed-by' tag
---
drivers/misc/throttler/Kconfig | 10 ++
drivers/misc/throttler/Makefile | 1 +
drivers/misc/throttler/cros_ec_throttler.c | 111 +++++++++++++++++++++
3 files changed, 122 insertions(+)
create mode 100644 drivers/misc/throttler/cros_ec_throttler.c
diff --git a/drivers/misc/throttler/Kconfig b/drivers/misc/throttler/Kconfig
index 8b2e63b2ef48..ef984c96f67b 100644
--- a/drivers/misc/throttler/Kconfig
+++ b/drivers/misc/throttler/Kconfig
@@ -20,4 +20,14 @@ menuconfig THROTTLER_DEBUG
Choose N unless you want to debug throttler drivers.
+config CROS_EC_THROTTLER
+ tristate "Throttler event monitor for the Chrome OS Embedded Controller"
+ depends on MFD_CROS_EC
+ help
+ This driver adds support to throttle the system in reaction to
+ Chrome OS EC events.
+
+ To compile this driver as a module, choose M here: the
+ module will be called cros_ec_throttler.
+
endif # THROTTLER
diff --git a/drivers/misc/throttler/Makefile b/drivers/misc/throttler/Makefile
index c8d920cee315..d9b2a77dabc9 100644
--- a/drivers/misc/throttler/Makefile
+++ b/drivers/misc/throttler/Makefile
@@ -1 +1,2 @@
obj-$(CONFIG_THROTTLER) += core.o
+obj-$(CONFIG_CROS_EC_THROTTLER) += cros_ec_throttler.o
diff --git a/drivers/misc/throttler/cros_ec_throttler.c b/drivers/misc/throttler/cros_ec_throttler.c
new file mode 100644
index 000000000000..82a25415a264
--- /dev/null
+++ b/drivers/misc/throttler/cros_ec_throttler.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for throttling triggered by events from the Chrome OS Embedded
+ * Controller.
+ *
+ * Copyright (C) 2018 Google, Inc.
+ */
+
+#include <linux/kernel.h>
+#include <linux/mfd/cros_ec.h>
+#include <linux/module.h>
+#include <linux/notifier.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/throttler.h>
+
+#define nb_to_ce_thr(nb) container_of(nb, struct cros_ec_throttler, nb)
+
+struct cros_ec_throttler {
+ struct cros_ec_device *ec;
+ struct throttler *throttler;
+ struct notifier_block nb;
+};
+
+static int cros_ec_throttler_event(struct notifier_block *nb,
+ unsigned long queued_during_suspend, void *_notify)
+{
+ struct cros_ec_throttler *ce_thr = nb_to_ce_thr(nb);
+ u32 host_event;
+
+ host_event = cros_ec_get_host_event(ce_thr->ec);
+ if (host_event & EC_HOST_EVENT_MASK(EC_HOST_EVENT_THROTTLE_START)) {
+ throttler_set_level(ce_thr->throttler, 1);
+
+ return NOTIFY_OK;
+ } else if (host_event &
+ EC_HOST_EVENT_MASK(EC_HOST_EVENT_THROTTLE_STOP)) {
+ throttler_set_level(ce_thr->throttler, 0);
+
+ return NOTIFY_OK;
+ }
+
+ return NOTIFY_DONE;
+}
+
+static int cros_ec_throttler_probe(struct platform_device *pdev)
+{
+ struct cros_ec_throttler *ce_thr;
+ struct device *dev = &pdev->dev;
+ struct cros_ec_dev *ec;
+ int ret;
+
+ ce_thr = devm_kzalloc(dev, sizeof(*ce_thr), GFP_KERNEL);
+ if (!ce_thr)
+ return -ENOMEM;
+
+ ec = dev_get_drvdata(pdev->dev.parent);
+ ce_thr->ec = ec->ec_dev;
+
+ /*
+ * The core code uses the DT node of the throttler to identify its
+ * throttling devices and rates. The CrOS EC throttler is a sub-device
+ * of the CrOS EC MFD device and doesn't have its own device node. Use
+ * the node of the MFD device instead.
+ */
+ dev->of_node = ce_thr->ec->dev->of_node;
+
+ ce_thr->throttler = throttler_setup(dev);
+ if (IS_ERR(ce_thr->throttler))
+ return PTR_ERR(ce_thr->throttler);
+
+ dev_set_drvdata(dev, ce_thr);
+
+ ce_thr->nb.notifier_call = cros_ec_throttler_event;
+ ret = blocking_notifier_chain_register(&ce_thr->ec->event_notifier,
+ &ce_thr->nb);
+ if (ret < 0) {
+ dev_err(dev, "failed to register notifier\n");
+ throttler_teardown(ce_thr->throttler);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int cros_ec_throttler_remove(struct platform_device *pdev)
+{
+ struct cros_ec_throttler *ce_thr = platform_get_drvdata(pdev);
+
+ blocking_notifier_chain_unregister(&ce_thr->ec->event_notifier,
+ &ce_thr->nb);
+
+ throttler_teardown(ce_thr->throttler);
+
+ return 0;
+}
+
+static struct platform_driver cros_ec_throttler_driver = {
+ .driver = {
+ .name = "cros-ec-throttler",
+ },
+ .probe = cros_ec_throttler_probe,
+ .remove = cros_ec_throttler_remove,
+};
+
+module_platform_driver(cros_ec_throttler_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Matthias Kaehlcke <[email protected]>");
+MODULE_DESCRIPTION("Chrome OS EC Throttler");
--
2.18.0.203.gfac676dfb9-goog
The Throttler is used for non-thermal throttling of system components
like CPUs or devfreq devices.
Signed-off-by: Matthias Kaehlcke <[email protected]>
---
Changes in v5:
- patch added to the series (replacing "dt-bindings: PM / OPP: add
opp-throttlers property")
---
.../devicetree/bindings/misc/throttler.txt | 13 +++++++++++++
1 file changed, 13 insertions(+)
create mode 100644 Documentation/devicetree/bindings/misc/throttler.txt
diff --git a/Documentation/devicetree/bindings/misc/throttler.txt b/Documentation/devicetree/bindings/misc/throttler.txt
new file mode 100644
index 000000000000..2ea80c62dbe1
--- /dev/null
+++ b/Documentation/devicetree/bindings/misc/throttler.txt
@@ -0,0 +1,13 @@
+Throttler driver
+
+The Throttler is used for non-thermal throttling of system components like
+CPUs or devfreq devices.
+
+Required properties:
+- throttler-opps Array of OPP-v2 phandles with the OPPs used for
+ throttling.
+
+Example:
+ throttler {
+ throttler-opps = <&cpu0_opp03, &cpu1_opp02, &gpu_opp03>;
+ };
--
2.18.0.203.gfac676dfb9-goog
Move variables related to devfreq policy changes from struct devfreq
to the new struct devfreq_policy and add a policy field to struct devfreq.
The following variables are moved:
df->min/max_freq => p->user.min/max_freq
df->scaling_min/max_freq => p->devinfo.min/max_freq
df->governor => p->governor
df->governor_name => p->governor_name
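For illustration only, here is a standalone userspace model of the new
layout (not part of this patch). The struct fields follow the patch; the
effective_*() helper names are made up for the example. It shows how the
values reported by min_freq_show()/max_freq_show() combine the two sets of
limits after the move:

```c
/* Userspace model of the limits part of struct devfreq_policy. */
struct devfreq_freq_limits {
	unsigned long min_freq;
	unsigned long max_freq;
};

struct devfreq_policy {
	struct devfreq_freq_limits user;	/* was df->min/max_freq */
	struct devfreq_freq_limits devinfo;	/* was df->scaling_min/max_freq */
	/* governor and governor_name are omitted in this model */
};

/* Hypothetical helper: the larger of the two minimums wins. */
unsigned long effective_min_freq(const struct devfreq_policy *p)
{
	return p->devinfo.min_freq > p->user.min_freq ?
		p->devinfo.min_freq : p->user.min_freq;
}

/* Hypothetical helper: the smaller of the two maximums wins. */
unsigned long effective_max_freq(const struct devfreq_policy *p)
{
	return p->devinfo.max_freq < p->user.max_freq ?
		p->devinfo.max_freq : p->user.max_freq;
}
```

With user limits of 300..1000 and device limits of 200..800, the model
yields an effective range of 300..800.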
Signed-off-by: Matthias Kaehlcke <[email protected]>
Reviewed-by: Brian Norris <[email protected]>
---
Changes in v5:
- none
Changes in v4:
- added 'Reviewed-by: Brian Norris <[email protected]>' tag
Changes in v3:
- none
Changes in v2:
- performance, powersave and simpleondemand governors don't need changes
with "PM / devfreq: Don't adjust to user limits in governors"
- formatting fixes
---
drivers/devfreq/devfreq.c | 137 ++++++++++++++++-------------
drivers/devfreq/governor_passive.c | 4 +-
include/linux/devfreq.h | 38 +++++---
3 files changed, 103 insertions(+), 76 deletions(-)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 6f604f8b2b81..21604d6ae2b8 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -255,6 +255,7 @@ static int devfreq_notify_transition(struct devfreq *devfreq,
*/
int update_devfreq(struct devfreq *devfreq)
{
+ struct devfreq_policy *policy = &devfreq->policy;
struct devfreq_freqs freqs;
unsigned long freq, cur_freq, min_freq, max_freq;
int err = 0;
@@ -265,11 +266,11 @@ int update_devfreq(struct devfreq *devfreq)
return -EINVAL;
}
- if (!devfreq->governor)
+ if (!policy->governor)
return -EINVAL;
/* Reevaluate the proper frequency */
- err = devfreq->governor->get_target_freq(devfreq, &freq);
+ err = policy->governor->get_target_freq(devfreq, &freq);
if (err)
return err;
@@ -280,8 +281,8 @@ int update_devfreq(struct devfreq *devfreq)
* max_freq
* min_freq
*/
- max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
- min_freq = MAX(devfreq->scaling_min_freq, devfreq->min_freq);
+ max_freq = MIN(policy->devinfo.max_freq, policy->user.max_freq);
+ min_freq = MAX(policy->devinfo.min_freq, policy->user.min_freq);
if (freq < min_freq) {
freq = min_freq;
@@ -493,18 +494,19 @@ static int devfreq_notifier_call(struct notifier_block *nb, unsigned long type,
void *devp)
{
struct devfreq *devfreq = container_of(nb, struct devfreq, nb);
+ struct devfreq_policy *policy = &devfreq->policy;
int ret;
mutex_lock(&devfreq->lock);
- devfreq->scaling_min_freq = find_available_min_freq(devfreq);
- if (!devfreq->scaling_min_freq) {
+ policy->devinfo.min_freq = find_available_min_freq(devfreq);
+ if (!policy->devinfo.min_freq) {
mutex_unlock(&devfreq->lock);
return -EINVAL;
}
- devfreq->scaling_max_freq = find_available_max_freq(devfreq);
- if (!devfreq->scaling_max_freq) {
+ policy->devinfo.max_freq = find_available_max_freq(devfreq);
+ if (!policy->devinfo.max_freq) {
mutex_unlock(&devfreq->lock);
return -EINVAL;
}
@@ -524,6 +526,7 @@ static int devfreq_notifier_call(struct notifier_block *nb, unsigned long type,
static void devfreq_dev_release(struct device *dev)
{
struct devfreq *devfreq = to_devfreq(dev);
+ struct devfreq_policy *policy = &devfreq->policy;
mutex_lock(&devfreq_list_lock);
if (IS_ERR(find_device_devfreq(devfreq->dev.parent))) {
@@ -534,9 +537,9 @@ static void devfreq_dev_release(struct device *dev)
list_del(&devfreq->node);
mutex_unlock(&devfreq_list_lock);
- if (devfreq->governor)
- devfreq->governor->event_handler(devfreq,
- DEVFREQ_GOV_STOP, NULL);
+ if (policy->governor)
+ policy->governor->event_handler(devfreq,
+ DEVFREQ_GOV_STOP, NULL);
if (devfreq->profile->exit)
devfreq->profile->exit(devfreq->dev.parent);
@@ -559,6 +562,7 @@ struct devfreq *devfreq_add_device(struct device *dev,
void *data)
{
struct devfreq *devfreq;
+ struct devfreq_policy *policy;
struct devfreq_governor *governor;
static atomic_t devfreq_no = ATOMIC_INIT(-1);
int err = 0;
@@ -584,13 +588,14 @@ struct devfreq *devfreq_add_device(struct device *dev,
goto err_out;
}
+ policy = &devfreq->policy;
mutex_init(&devfreq->lock);
mutex_lock(&devfreq->lock);
devfreq->dev.parent = dev;
devfreq->dev.class = devfreq_class;
devfreq->dev.release = devfreq_dev_release;
devfreq->profile = profile;
- strncpy(devfreq->governor_name, governor_name, DEVFREQ_NAME_LEN);
+ strncpy(policy->governor_name, governor_name, DEVFREQ_NAME_LEN);
devfreq->previous_freq = profile->initial_freq;
devfreq->last_status.current_frequency = profile->initial_freq;
devfreq->data = data;
@@ -604,21 +609,21 @@ struct devfreq *devfreq_add_device(struct device *dev,
mutex_lock(&devfreq->lock);
}
- devfreq->scaling_min_freq = find_available_min_freq(devfreq);
- if (!devfreq->scaling_min_freq) {
+ policy->devinfo.min_freq = find_available_min_freq(devfreq);
+ if (!policy->devinfo.min_freq) {
mutex_unlock(&devfreq->lock);
err = -EINVAL;
goto err_dev;
}
- devfreq->min_freq = devfreq->scaling_min_freq;
+ policy->user.min_freq = policy->devinfo.min_freq;
- devfreq->scaling_max_freq = find_available_max_freq(devfreq);
- if (!devfreq->scaling_max_freq) {
+ policy->devinfo.max_freq = find_available_max_freq(devfreq);
+ if (!policy->devinfo.max_freq) {
mutex_unlock(&devfreq->lock);
err = -EINVAL;
goto err_dev;
}
- devfreq->max_freq = devfreq->scaling_max_freq;
+ policy->user.max_freq = policy->devinfo.max_freq;
dev_set_name(&devfreq->dev, "devfreq%d",
atomic_inc_return(&devfreq_no));
@@ -646,7 +651,7 @@ struct devfreq *devfreq_add_device(struct device *dev,
mutex_lock(&devfreq_list_lock);
list_add(&devfreq->node, &devfreq_list);
- governor = find_devfreq_governor(devfreq->governor_name);
+ governor = find_devfreq_governor(policy->governor_name);
if (IS_ERR(governor)) {
dev_err(dev, "%s: Unable to find governor for the device\n",
__func__);
@@ -654,9 +659,9 @@ struct devfreq *devfreq_add_device(struct device *dev,
goto err_init;
}
- devfreq->governor = governor;
- err = devfreq->governor->event_handler(devfreq, DEVFREQ_GOV_START,
- NULL);
+ policy->governor = governor;
+ err = policy->governor->event_handler(devfreq, DEVFREQ_GOV_START,
+ NULL);
if (err) {
dev_err(dev, "%s: Unable to start governor for the device\n",
__func__);
@@ -817,10 +822,10 @@ int devfreq_suspend_device(struct devfreq *devfreq)
if (!devfreq)
return -EINVAL;
- if (!devfreq->governor)
+ if (!devfreq->policy.governor)
return 0;
- return devfreq->governor->event_handler(devfreq,
+ return devfreq->policy.governor->event_handler(devfreq,
DEVFREQ_GOV_SUSPEND, NULL);
}
EXPORT_SYMBOL(devfreq_suspend_device);
@@ -838,10 +843,10 @@ int devfreq_resume_device(struct devfreq *devfreq)
if (!devfreq)
return -EINVAL;
- if (!devfreq->governor)
+ if (!devfreq->policy.governor)
return 0;
- return devfreq->governor->event_handler(devfreq,
+ return devfreq->policy.governor->event_handler(devfreq,
DEVFREQ_GOV_RESUME, NULL);
}
EXPORT_SYMBOL(devfreq_resume_device);
@@ -875,30 +880,31 @@ int devfreq_add_governor(struct devfreq_governor *governor)
list_for_each_entry(devfreq, &devfreq_list, node) {
int ret = 0;
struct device *dev = devfreq->dev.parent;
+ struct devfreq_policy *policy = &devfreq->policy;
- if (!strncmp(devfreq->governor_name, governor->name,
+ if (!strncmp(policy->governor_name, governor->name,
DEVFREQ_NAME_LEN)) {
/* The following should never occur */
- if (devfreq->governor) {
+ if (policy->governor) {
dev_warn(dev,
"%s: Governor %s already present\n",
- __func__, devfreq->governor->name);
- ret = devfreq->governor->event_handler(devfreq,
+ __func__, policy->governor->name);
+ ret = policy->governor->event_handler(devfreq,
DEVFREQ_GOV_STOP, NULL);
if (ret) {
dev_warn(dev,
"%s: Governor %s stop = %d\n",
__func__,
- devfreq->governor->name, ret);
+ policy->governor->name, ret);
}
/* Fall through */
}
- devfreq->governor = governor;
- ret = devfreq->governor->event_handler(devfreq,
+ policy->governor = governor;
+ ret = policy->governor->event_handler(devfreq,
DEVFREQ_GOV_START, NULL);
if (ret) {
dev_warn(dev, "%s: Governor %s start=%d\n",
- __func__, devfreq->governor->name,
+ __func__, policy->governor->name,
ret);
}
}
@@ -937,24 +943,25 @@ int devfreq_remove_governor(struct devfreq_governor *governor)
list_for_each_entry(devfreq, &devfreq_list, node) {
int ret;
struct device *dev = devfreq->dev.parent;
+ struct devfreq_policy *policy = &devfreq->policy;
- if (!strncmp(devfreq->governor_name, governor->name,
+ if (!strncmp(policy->governor_name, governor->name,
DEVFREQ_NAME_LEN)) {
/* we should have a devfreq governor! */
- if (!devfreq->governor) {
+ if (!policy->governor) {
dev_warn(dev, "%s: Governor %s NOT present\n",
__func__, governor->name);
continue;
/* Fall through */
}
- ret = devfreq->governor->event_handler(devfreq,
+ ret = policy->governor->event_handler(devfreq,
DEVFREQ_GOV_STOP, NULL);
if (ret) {
dev_warn(dev, "%s: Governor %s stop=%d\n",
- __func__, devfreq->governor->name,
+ __func__, policy->governor->name,
ret);
}
- devfreq->governor = NULL;
+ policy->governor = NULL;
}
}
@@ -969,16 +976,17 @@ EXPORT_SYMBOL(devfreq_remove_governor);
static ssize_t governor_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
- if (!to_devfreq(dev)->governor)
+ if (!to_devfreq(dev)->policy.governor)
return -EINVAL;
- return sprintf(buf, "%s\n", to_devfreq(dev)->governor->name);
+ return sprintf(buf, "%s\n", to_devfreq(dev)->policy.governor->name);
}
static ssize_t governor_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
struct devfreq *df = to_devfreq(dev);
+ struct devfreq_policy *policy = &df->policy;
int ret;
char str_governor[DEVFREQ_NAME_LEN + 1];
struct devfreq_governor *governor;
@@ -993,29 +1001,30 @@ static ssize_t governor_store(struct device *dev, struct device_attribute *attr,
ret = PTR_ERR(governor);
goto out;
}
- if (df->governor == governor) {
+ if (policy->governor == governor) {
ret = 0;
goto out;
- } else if ((df->governor && df->governor->immutable) ||
+ } else if ((policy->governor && policy->governor->immutable) ||
governor->immutable) {
ret = -EINVAL;
goto out;
}
- if (df->governor) {
- ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
+ if (policy->governor) {
+ ret = policy->governor->event_handler(df, DEVFREQ_GOV_STOP,
+ NULL);
if (ret) {
dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
- __func__, df->governor->name, ret);
+ __func__, policy->governor->name, ret);
goto out;
}
}
- df->governor = governor;
- strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN);
- ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
+ policy->governor = governor;
+ strncpy(policy->governor_name, governor->name, DEVFREQ_NAME_LEN);
+ ret = policy->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
if (ret)
dev_warn(dev, "%s: Governor %s not started(%d)\n",
- __func__, df->governor->name, ret);
+ __func__, policy->governor->name, ret);
out:
mutex_unlock(&devfreq_list_lock);
@@ -1030,6 +1039,7 @@ static ssize_t available_governors_show(struct device *d,
char *buf)
{
struct devfreq *df = to_devfreq(d);
+ struct devfreq_policy *policy = &df->policy;
ssize_t count = 0;
mutex_lock(&devfreq_list_lock);
@@ -1038,9 +1048,9 @@ static ssize_t available_governors_show(struct device *d,
* The devfreq with immutable governor (e.g., passive) shows
* only own governor.
*/
- if (df->governor->immutable) {
+ if (policy->governor->immutable) {
count = scnprintf(&buf[count], DEVFREQ_NAME_LEN,
- "%s ", df->governor_name);
+ "%s ", policy->governor_name);
/*
* The devfreq device shows the registered governor except for
* immutable governors such as passive governor .
@@ -1100,17 +1110,18 @@ static ssize_t polling_interval_store(struct device *dev,
const char *buf, size_t count)
{
struct devfreq *df = to_devfreq(dev);
+ struct devfreq_policy *policy = &df->policy;
unsigned int value;
int ret;
- if (!df->governor)
+ if (!policy->governor)
return -EINVAL;
ret = sscanf(buf, "%u", &value);
if (ret != 1)
return -EINVAL;
- df->governor->event_handler(df, DEVFREQ_GOV_INTERVAL, &value);
+ policy->governor->event_handler(df, DEVFREQ_GOV_INTERVAL, &value);
ret = count;
return ret;
@@ -1132,7 +1143,7 @@ static ssize_t min_freq_store(struct device *dev, struct device_attribute *attr,
mutex_lock(&df->lock);
if (value) {
- if (value > df->max_freq) {
+ if (value > df->policy.user.max_freq) {
ret = -EINVAL;
goto unlock;
}
@@ -1145,7 +1156,7 @@ static ssize_t min_freq_store(struct device *dev, struct device_attribute *attr,
value = freq_table[df->profile->max_state - 1];
}
- df->min_freq = value;
+ df->policy.user.min_freq = value;
update_devfreq(df);
ret = count;
unlock:
@@ -1156,9 +1167,10 @@ static ssize_t min_freq_store(struct device *dev, struct device_attribute *attr,
static ssize_t min_freq_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
- struct devfreq *df = to_devfreq(dev);
+ struct devfreq_policy *policy = &to_devfreq(dev)->policy;
- return sprintf(buf, "%lu\n", MAX(df->scaling_min_freq, df->min_freq));
+ return sprintf(buf, "%lu\n",
+ MAX(policy->devinfo.min_freq, policy->user.min_freq));
}
static ssize_t max_freq_store(struct device *dev, struct device_attribute *attr,
@@ -1176,7 +1188,7 @@ static ssize_t max_freq_store(struct device *dev, struct device_attribute *attr,
mutex_lock(&df->lock);
if (value) {
- if (value < df->min_freq) {
+ if (value < df->policy.user.min_freq) {
ret = -EINVAL;
goto unlock;
}
@@ -1189,7 +1201,7 @@ static ssize_t max_freq_store(struct device *dev, struct device_attribute *attr,
value = freq_table[0];
}
- df->max_freq = value;
+ df->policy.user.max_freq = value;
update_devfreq(df);
ret = count;
unlock:
@@ -1201,9 +1213,10 @@ static DEVICE_ATTR_RW(min_freq);
static ssize_t max_freq_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
- struct devfreq *df = to_devfreq(dev);
+ struct devfreq_policy *policy = &to_devfreq(dev)->policy;
- return sprintf(buf, "%lu\n", MIN(df->scaling_max_freq, df->max_freq));
+ return sprintf(buf, "%lu\n",
+ MIN(policy->devinfo.max_freq, policy->user.max_freq));
}
static DEVICE_ATTR_RW(max_freq);
diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
index 3bc29acbd54e..e0987c749ec2 100644
--- a/drivers/devfreq/governor_passive.c
+++ b/drivers/devfreq/governor_passive.c
@@ -99,12 +99,12 @@ static int update_devfreq_passive(struct devfreq *devfreq, unsigned long freq)
{
int ret;
- if (!devfreq->governor)
+ if (!devfreq->policy.governor)
return -EINVAL;
mutex_lock_nested(&devfreq->lock, SINGLE_DEPTH_NESTING);
- ret = devfreq->governor->get_target_freq(devfreq, &freq);
+ ret = devfreq->policy.governor->get_target_freq(devfreq, &freq);
if (ret < 0)
goto out;
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 3aae5b3af87c..9bf23b976f4d 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -109,6 +109,30 @@ struct devfreq_dev_profile {
unsigned int max_state;
};
+/**
+ * struct devfreq_freq_limits - Devfreq frequency limits
+ * @min_freq: minimum frequency
+ * @max_freq: maximum frequency
+ */
+struct devfreq_freq_limits {
+ unsigned long min_freq;
+ unsigned long max_freq;
+};
+
+/**
+ * struct devfreq_policy - Devfreq policy
+ * @user: frequency limits requested by the user
+ * @devinfo: frequency limits of the device (available OPPs)
+ * @governor: method how to choose frequency based on the usage.
+ * @governor_name: devfreq governor name for use with this devfreq
+ */
+struct devfreq_policy {
+ struct devfreq_freq_limits user;
+ struct devfreq_freq_limits devinfo;
+ const struct devfreq_governor *governor;
+ char governor_name[DEVFREQ_NAME_LEN];
+};
+
/**
* struct devfreq - Device devfreq structure
* @node: list node - contains the devices with devfreq that have been
@@ -117,8 +141,6 @@ struct devfreq_dev_profile {
* @dev: device registered by devfreq class. dev.parent is the device
* using devfreq.
* @profile: device-specific devfreq profile
- * @governor: method how to choose frequency based on the usage.
- * @governor_name: devfreq governor name for use with this devfreq
* @nb: notifier block used to notify devfreq object that it should
* reevaluate operable frequencies. Devfreq users may use
* devfreq.nb to the corresponding register notifier call chain.
@@ -126,10 +148,7 @@ struct devfreq_dev_profile {
* @previous_freq: previously configured frequency value.
* @data: Private data of the governor. The devfreq framework does not
* touch this.
- * @min_freq: Limit minimum frequency requested by user (0: none)
- * @max_freq: Limit maximum frequency requested by user (0: none)
- * @scaling_min_freq: Limit minimum frequency requested by OPP interface
- * @scaling_max_freq: Limit maximum frequency requested by OPP interface
+ * @policy: Policy for frequency adjustments
* @stop_polling: devfreq polling status of a device.
* @total_trans: Number of devfreq transitions
* @trans_table: Statistics of devfreq transitions
@@ -151,8 +170,6 @@ struct devfreq {
struct mutex lock;
struct device dev;
struct devfreq_dev_profile *profile;
- const struct devfreq_governor *governor;
- char governor_name[DEVFREQ_NAME_LEN];
struct notifier_block nb;
struct delayed_work work;
@@ -161,10 +178,7 @@ struct devfreq {
void *data; /* private data for governors */
- unsigned long min_freq;
- unsigned long max_freq;
- unsigned long scaling_min_freq;
- unsigned long scaling_max_freq;
+ struct devfreq_policy policy;
bool stop_polling;
/* information for device frequency transition */
--
2.18.0.203.gfac676dfb9-goog
Policy notifiers are called before a frequency change and may narrow
the min/max frequency range in devfreq_policy, which is used to adjust
the target frequency if it is beyond this range.
Also add a few helpers:
- devfreq_verify_within_[dev_]limits()
- should be used by the notifiers for policy adjustments.
- dev_to_devfreq()
- look up a devfreq struct from a device pointer
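The flow described above can be sketched as a standalone userspace model
(illustration only, not kernel code). The clamping steps mirror
devfreq_verify_within_limits() from this patch; struct policy_model,
throttle_adjust() and model_update() are hypothetical names, and
throttle_adjust() stands in for whatever a DEVFREQ_ADJUST notifier
callback would do. The model uses unsigned long throughout, which is an
assumption of the sketch:

```c
/* Userspace model; field names follow struct devfreq_policy. */
struct freq_limits {
	unsigned long min_freq;
	unsigned long max_freq;
};

struct policy_model {
	unsigned long min;	/* adjustable by policy notifiers */
	unsigned long max;	/* adjustable by policy notifiers */
	struct freq_limits user;
	struct freq_limits devinfo;
};

/* Same clamping steps as devfreq_verify_within_limits(). */
void verify_within_limits(struct policy_model *p,
			  unsigned long min, unsigned long max)
{
	if (p->min < min)
		p->min = min;
	if (p->max < min)
		p->max = min;
	if (p->min > max)
		p->min = max;
	if (p->max > max)
		p->max = max;
	if (p->min > p->max)
		p->min = p->max;
}

/* Hypothetical notifier body: cap the range, then stay within the device. */
void throttle_adjust(struct policy_model *p, unsigned long throttle_max)
{
	verify_within_limits(p, 0, throttle_max);
	verify_within_limits(p, p->devinfo.min_freq, p->devinfo.max_freq);
}

/* Models the part of update_devfreq() touched by this patch. */
unsigned long model_update(struct policy_model *p, unsigned long target,
			   unsigned long throttle_max)
{
	unsigned long min_freq, max_freq;

	/* Start from the device limits on every update. */
	p->min = p->devinfo.min_freq;
	p->max = p->devinfo.max_freq;

	/* Stands in for the DEVFREQ_ADJUST notifier call chain. */
	throttle_adjust(p, throttle_max);

	max_freq = p->max < p->user.max_freq ? p->max : p->user.max_freq;
	min_freq = p->min > p->user.min_freq ? p->min : p->user.min_freq;

	if (target < min_freq)
		target = min_freq;
	if (target > max_freq)
		target = max_freq;

	return target;
}
```

Because min/max are reset from devinfo on each update, a notifier that
stops narrowing the range restores the full device range on the next
update without any extra bookkeeping.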
Signed-off-by: Matthias Kaehlcke <[email protected]>
Reviewed-by: Brian Norris <[email protected]>
---
Changes in v5:
- none
Changes in v4:
- Fixed typo in commit message: devfreg => devfreq
- added 'Reviewed-by: Brian Norris <[email protected]>' tag
Changes in v3:
- devfreq.h: fixed misspelling of struct devfreq_policy
Changes in v2:
- performance, powersave and simpleondemand governors don't need changes
with "PM / devfreq: Don't adjust to user limits in governors"
- formatting fixes
---
drivers/devfreq/devfreq.c | 48 ++++++++++++++++++++++-------
include/linux/devfreq.h | 65 +++++++++++++++++++++++++++++++++++++++
2 files changed, 102 insertions(+), 11 deletions(-)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 21604d6ae2b8..4cbaa7ad1972 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -72,6 +72,21 @@ static struct devfreq *find_device_devfreq(struct device *dev)
return ERR_PTR(-ENODEV);
}
+/**
+ * dev_to_devfreq() - find devfreq struct using device pointer
+ * @dev: device pointer used to lookup device devfreq.
+ */
+struct devfreq *dev_to_devfreq(struct device *dev)
+{
+ struct devfreq *devfreq;
+
+ mutex_lock(&devfreq_list_lock);
+ devfreq = find_device_devfreq(dev);
+ mutex_unlock(&devfreq_list_lock);
+
+ return devfreq;
+}
+
static unsigned long find_available_min_freq(struct devfreq *devfreq)
{
struct dev_pm_opp *opp;
@@ -269,20 +284,21 @@ int update_devfreq(struct devfreq *devfreq)
if (!policy->governor)
return -EINVAL;
+ policy->min = policy->devinfo.min_freq;
+ policy->max = policy->devinfo.max_freq;
+
+ srcu_notifier_call_chain(&devfreq->policy_notifier_list,
+ DEVFREQ_ADJUST, policy);
+
/* Reevaluate the proper frequency */
err = policy->governor->get_target_freq(devfreq, &freq);
if (err)
return err;
- /*
- * Adjust the frequency with user freq, QoS and available freq.
- *
- * List from the highest priority
- * max_freq
- * min_freq
- */
- max_freq = MIN(policy->devinfo.max_freq, policy->user.max_freq);
- min_freq = MAX(policy->devinfo.min_freq, policy->user.min_freq);
+ /* Adjust the frequency */
+
+ max_freq = MIN(policy->max, policy->user.max_freq);
+ min_freq = MAX(policy->min, policy->user.min_freq);
if (freq < min_freq) {
freq = min_freq;
@@ -645,6 +661,7 @@ struct devfreq *devfreq_add_device(struct device *dev,
devfreq->last_stat_updated = jiffies;
srcu_init_notifier_head(&devfreq->transition_notifier_list);
+ srcu_init_notifier_head(&devfreq->policy_notifier_list);
mutex_unlock(&devfreq->lock);
@@ -1445,7 +1462,7 @@ EXPORT_SYMBOL(devm_devfreq_unregister_opp_notifier);
* devfreq_register_notifier() - Register a driver with devfreq
* @devfreq: The devfreq object.
* @nb: The notifier block to register.
- * @list: DEVFREQ_TRANSITION_NOTIFIER.
+ * @list: DEVFREQ_TRANSITION_NOTIFIER or DEVFREQ_POLICY_NOTIFIER.
*/
int devfreq_register_notifier(struct devfreq *devfreq,
struct notifier_block *nb,
@@ -1461,6 +1478,10 @@ int devfreq_register_notifier(struct devfreq *devfreq,
ret = srcu_notifier_chain_register(
&devfreq->transition_notifier_list, nb);
break;
+ case DEVFREQ_POLICY_NOTIFIER:
+ ret = srcu_notifier_chain_register(
+ &devfreq->policy_notifier_list, nb);
+ break;
default:
ret = -EINVAL;
}
@@ -1473,7 +1494,7 @@ EXPORT_SYMBOL(devfreq_register_notifier);
* devfreq_unregister_notifier() - Unregister a driver with devfreq
* @devfreq: The devfreq object.
* @nb: The notifier block to be unregistered.
- * @list: DEVFREQ_TRANSITION_NOTIFIER.
+ * @list: DEVFREQ_TRANSITION_NOTIFIER or DEVFREQ_POLICY_NOTIFIER.
*/
int devfreq_unregister_notifier(struct devfreq *devfreq,
struct notifier_block *nb,
@@ -1489,6 +1510,11 @@ int devfreq_unregister_notifier(struct devfreq *devfreq,
ret = srcu_notifier_chain_unregister(
&devfreq->transition_notifier_list, nb);
break;
+ case DEVFREQ_POLICY_NOTIFIER:
+ ret = srcu_notifier_chain_unregister(
+ &devfreq->policy_notifier_list, nb);
+ break;
+
default:
ret = -EINVAL;
}
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 9bf23b976f4d..7c8dce96db73 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -33,6 +33,10 @@
#define DEVFREQ_PRECHANGE (0)
#define DEVFREQ_POSTCHANGE (1)
+#define DEVFREQ_POLICY_NOTIFIER 1
+
+#define DEVFREQ_ADJUST 0
+
struct devfreq;
struct devfreq_governor;
@@ -121,12 +125,16 @@ struct devfreq_freq_limits {
/**
* struct devfreq_policy - Devfreq policy
+ * @min: minimum frequency (adjustable by policy notifiers)
+ * @max: maximum frequency (adjustable by policy notifiers)
* @user: frequency limits requested by the user
* @devinfo: frequency limits of the device (available OPPs)
* @governor: method how to choose frequency based on the usage.
* @governor_name: devfreq governor name for use with this devfreq
*/
struct devfreq_policy {
+ unsigned long min;
+ unsigned long max;
struct devfreq_freq_limits user;
struct devfreq_freq_limits devinfo;
const struct devfreq_governor *governor;
@@ -155,6 +163,7 @@ struct devfreq_policy {
* @time_in_state: Statistics of devfreq states
* @last_stat_updated: The last time stat updated
* @transition_notifier_list: list head of DEVFREQ_TRANSITION_NOTIFIER notifier
+ * @policy_notifier_list: list head of DEVFREQ_POLICY_NOTIFIER notifier
*
* This structure stores the devfreq information for a give device.
*
@@ -188,6 +197,7 @@ struct devfreq {
unsigned long last_stat_updated;
struct srcu_notifier_head transition_notifier_list;
+ struct srcu_notifier_head policy_notifier_list;
};
struct devfreq_freqs {
@@ -240,6 +250,45 @@ extern void devm_devfreq_unregister_notifier(struct device *dev,
extern struct devfreq *devfreq_get_devfreq_by_phandle(struct device *dev,
int index);
+/**
+ * devfreq_verify_within_limits() - Adjust a devfreq policy if needed to make
+ * sure its min/max values are within a
+ * specified range.
+ * @policy: the policy
+ * @min: the minimum frequency
+ * @max: the maximum frequency
+ */
+static inline void devfreq_verify_within_limits(struct devfreq_policy *policy,
+ unsigned int min, unsigned int max)
+{
+ if (policy->min < min)
+ policy->min = min;
+ if (policy->max < min)
+ policy->max = min;
+ if (policy->min > max)
+ policy->min = max;
+ if (policy->max > max)
+ policy->max = max;
+ if (policy->min > policy->max)
+ policy->min = policy->max;
+}
+
+/**
+ * devfreq_verify_within_dev_limits() - Adjust a devfreq policy if needed to
+ * make sure its min/max values are within
+ * the frequency range supported by the
+ * device.
+ * @policy: the policy
+ */
+static inline void
+devfreq_verify_within_dev_limits(struct devfreq_policy *policy)
+{
+ devfreq_verify_within_limits(policy, policy->devinfo.min_freq,
+ policy->devinfo.max_freq);
+}
+
+struct devfreq *dev_to_devfreq(struct device *dev);
+
#if IS_ENABLED(CONFIG_DEVFREQ_GOV_SIMPLE_ONDEMAND)
/**
* struct devfreq_simple_ondemand_data - void *data fed to struct devfreq
@@ -394,10 +443,26 @@ static inline struct devfreq *devfreq_get_devfreq_by_phandle(struct device *dev,
return ERR_PTR(-ENODEV);
}
+static inline void devfreq_verify_within_limits(struct devfreq_policy *policy,
+ unsigned int min, unsigned int max)
+{
+}
+
+static inline void
+devfreq_verify_within_dev_limits(struct devfreq_policy *policy)
+{
+}
+
static inline int devfreq_update_stats(struct devfreq *df)
{
return -EINVAL;
}
+
+static inline struct devfreq *dev_to_devfreq(struct device *dev)
+{
+ return NULL;
+}
+
#endif /* CONFIG_PM_DEVFREQ */
#endif /* __LINUX_DEVFREQ_H__ */
--
2.18.0.203.gfac676dfb9-goog
Several governors use the user space limits df->min/max_freq to adjust
the target frequency. This is not necessary, since update_devfreq()
already takes care of this. Instead the governor can request the available
min/max frequency by setting the target frequency to DEVFREQ_MIN/MAX_FREQ
and let update_devfreq() take care of any adjustments.
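A minimal userspace sketch of the idea (illustration only, not kernel
code): the governor callbacks return a sentinel and a single clamp picks
the real frequency. DEVFREQ_MIN_FREQ and DEVFREQ_MAX_FREQ have the values
defined in this patch; performance_target(), powersave_target() and
clamp_target() are hypothetical stand-ins for the governor callbacks and
for the adjustment done in update_devfreq():

```c
#include <limits.h>

#define DEVFREQ_MIN_FREQ	0
#define DEVFREQ_MAX_FREQ	ULONG_MAX

/* Models the performance governor: always ask for the maximum. */
unsigned long performance_target(void)
{
	return DEVFREQ_MAX_FREQ;
}

/* Models the powersave governor: always ask for the minimum. */
unsigned long powersave_target(void)
{
	return DEVFREQ_MIN_FREQ;
}

/* Models the adjustment in update_devfreq(): clamp into [min, max]. */
unsigned long clamp_target(unsigned long freq, unsigned long min_freq,
			   unsigned long max_freq)
{
	if (freq < min_freq)
		freq = min_freq;
	if (freq > max_freq)
		freq = max_freq;

	return freq;
}
```

With limits of 200..800, the performance request resolves to 800 and the
powersave request to 200, so the governors need no knowledge of the
limits themselves.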
Signed-off-by: Matthias Kaehlcke <[email protected]>
Reviewed-by: Brian Norris <[email protected]>
---
Changes in v5:
- none
Changes in v4:
- added 'Reviewed-by: Brian Norris <[email protected]>' tag
Changes in v3:
- none
Changes in v2:
- squashed "PM / devfreq: Remove redundant frequency adjustment from governors"
and "PM / devfreq: governors: Return device frequency limits instead of user
limits"
- updated subject and commit message
- use DEVFREQ_MIN/MAX_FREQ instead of df->scaling_min/max_freq
---
drivers/devfreq/governor.h | 3 +++
drivers/devfreq/governor_performance.c | 5 +----
drivers/devfreq/governor_powersave.c | 2 +-
drivers/devfreq/governor_simpleondemand.c | 12 +++---------
drivers/devfreq/governor_userspace.c | 16 ++++------------
5 files changed, 12 insertions(+), 26 deletions(-)
diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h
index cfc50a61a90d..b81700244ce3 100644
--- a/drivers/devfreq/governor.h
+++ b/drivers/devfreq/governor.h
@@ -25,6 +25,9 @@
#define DEVFREQ_GOV_SUSPEND 0x4
#define DEVFREQ_GOV_RESUME 0x5
+#define DEVFREQ_MIN_FREQ 0
+#define DEVFREQ_MAX_FREQ ULONG_MAX
+
/**
* struct devfreq_governor - Devfreq policy governor
* @node: list node - contains registered devfreq governors
diff --git a/drivers/devfreq/governor_performance.c b/drivers/devfreq/governor_performance.c
index 4d23ecfbd948..ded429fd51be 100644
--- a/drivers/devfreq/governor_performance.c
+++ b/drivers/devfreq/governor_performance.c
@@ -20,10 +20,7 @@ static int devfreq_performance_func(struct devfreq *df,
* target callback should be able to get floor value as
* said in devfreq.h
*/
- if (!df->max_freq)
- *freq = UINT_MAX;
- else
- *freq = df->max_freq;
+ *freq = DEVFREQ_MAX_FREQ;
return 0;
}
diff --git a/drivers/devfreq/governor_powersave.c b/drivers/devfreq/governor_powersave.c
index 0c42f23249ef..9e8897f5ac42 100644
--- a/drivers/devfreq/governor_powersave.c
+++ b/drivers/devfreq/governor_powersave.c
@@ -20,7 +20,7 @@ static int devfreq_powersave_func(struct devfreq *df,
* target callback should be able to get ceiling value as
* said in devfreq.h
*/
- *freq = df->min_freq;
+ *freq = DEVFREQ_MIN_FREQ;
return 0;
}
diff --git a/drivers/devfreq/governor_simpleondemand.c b/drivers/devfreq/governor_simpleondemand.c
index 28e0f2de7100..c0417f0e081e 100644
--- a/drivers/devfreq/governor_simpleondemand.c
+++ b/drivers/devfreq/governor_simpleondemand.c
@@ -27,7 +27,6 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
unsigned int dfso_upthreshold = DFSO_UPTHRESHOLD;
unsigned int dfso_downdifferential = DFSO_DOWNDIFFERENCTIAL;
struct devfreq_simple_ondemand_data *data = df->data;
- unsigned long max = (df->max_freq) ? df->max_freq : UINT_MAX;
err = devfreq_update_stats(df);
if (err)
@@ -47,7 +46,7 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
/* Assume MAX if it is going to be divided by zero */
if (stat->total_time == 0) {
- *freq = max;
+ *freq = DEVFREQ_MAX_FREQ;
return 0;
}
@@ -60,13 +59,13 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
/* Set MAX if it's busy enough */
if (stat->busy_time * 100 >
stat->total_time * dfso_upthreshold) {
- *freq = max;
+ *freq = DEVFREQ_MAX_FREQ;
return 0;
}
/* Set MAX if we do not know the initial frequency */
if (stat->current_frequency == 0) {
- *freq = max;
+ *freq = DEVFREQ_MAX_FREQ;
return 0;
}
@@ -85,11 +84,6 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
b = div_u64(b, (dfso_upthreshold - dfso_downdifferential / 2));
*freq = (unsigned long) b;
- if (df->min_freq && *freq < df->min_freq)
- *freq = df->min_freq;
- if (df->max_freq && *freq > df->max_freq)
- *freq = df->max_freq;
-
return 0;
}
diff --git a/drivers/devfreq/governor_userspace.c b/drivers/devfreq/governor_userspace.c
index 080607c3f34d..378d84c011df 100644
--- a/drivers/devfreq/governor_userspace.c
+++ b/drivers/devfreq/governor_userspace.c
@@ -26,19 +26,11 @@ static int devfreq_userspace_func(struct devfreq *df, unsigned long *freq)
{
struct userspace_data *data = df->data;
- if (data->valid) {
- unsigned long adjusted_freq = data->user_frequency;
-
- if (df->max_freq && adjusted_freq > df->max_freq)
- adjusted_freq = df->max_freq;
-
- if (df->min_freq && adjusted_freq < df->min_freq)
- adjusted_freq = df->min_freq;
-
- *freq = adjusted_freq;
- } else {
+ if (data->valid)
+ *freq = data->user_frequency;
+ else
*freq = df->previous_freq; /* No user freq specified yet */
- }
+
return 0;
}
--
2.18.0.203.gfac676dfb9-goog
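For readers following along, the split of responsibilities after this patch (governors only return a target or one of the DEVFREQ_MIN/MAX_FREQ sentinels, and the core clamps) can be sketched in plain userspace C. This is only a model under stated assumptions, not kernel code: struct model_limits and the model_*() functions are hypothetical names made up for illustration, and only the two sentinel macros are taken from the patch.

```c
#include <assert.h>
#include <limits.h>

/* Sentinels as added to governor.h by the patch. */
#define DEVFREQ_MIN_FREQ 0
#define DEVFREQ_MAX_FREQ ULONG_MAX

/* Hypothetical stand-in for the effective limits the core applies. */
struct model_limits {
	unsigned long min_freq;
	unsigned long max_freq;
};

/* "performance"-style governor: ask for the highest possible frequency. */
static unsigned long model_performance_target(void)
{
	return DEVFREQ_MAX_FREQ;
}

/* "powersave"-style governor: ask for the lowest possible frequency. */
static unsigned long model_powersave_target(void)
{
	return DEVFREQ_MIN_FREQ;
}

/* Core-side clamp: the only place where limits are applied. */
static unsigned long model_clamp(const struct model_limits *l,
				 unsigned long freq)
{
	assert(l->min_freq <= l->max_freq);

	if (freq < l->min_freq)
		freq = l->min_freq;
	if (freq > l->max_freq)
		freq = l->max_freq;
	return freq;
}
```

With limits of 200 MHz and 800 MHz, model_clamp() maps the performance target to 800 MHz and the powersave target to 200 MHz, without either governor knowing the limits.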
Currently update_devfreq() is only visible to devfreq governors outside
of devfreq.c. Make it public to allow drivers that adjust devfreq policies
to cause a re-evaluation of the frequency after a policy change.
Signed-off-by: Matthias Kaehlcke <[email protected]>
Acked-by: MyungJoo Ham <[email protected]>
Reviewed-by: Brian Norris <[email protected]>
---
Changes in v5:
- none
Changes in v4:
- added 'Reviewed-by: Brian Norris <[email protected]>' tag
Changes in v3:
- none
Changes in v2:
- added 'Acked-by: MyungJoo Ham <[email protected]>' tag
---
drivers/devfreq/governor.h | 3 ---
include/linux/devfreq.h | 8 ++++++++
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h
index b81700244ce3..f53339ca610f 100644
--- a/drivers/devfreq/governor.h
+++ b/drivers/devfreq/governor.h
@@ -57,9 +57,6 @@ struct devfreq_governor {
unsigned int event, void *data);
};
-/* Caution: devfreq->lock must be locked before calling update_devfreq */
-extern int update_devfreq(struct devfreq *devfreq);
-
extern void devfreq_monitor_start(struct devfreq *devfreq);
extern void devfreq_monitor_stop(struct devfreq *devfreq);
extern void devfreq_monitor_suspend(struct devfreq *devfreq);
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 7c8dce96db73..c4f84a769cb5 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -222,6 +222,14 @@ extern void devm_devfreq_remove_device(struct device *dev,
extern int devfreq_suspend_device(struct devfreq *devfreq);
extern int devfreq_resume_device(struct devfreq *devfreq);
+/**
+ * update_devfreq() - Reevaluate the device and configure frequency
+ * @devfreq: the devfreq device
+ *
+ * Note: devfreq->lock must be held
+ */
+extern int update_devfreq(struct devfreq *devfreq);
+
/* Helper functions for devfreq user device driver with OPP. */
extern struct dev_pm_opp *devfreq_recommended_opp(struct device *dev,
unsigned long *freq, u32 flags);
--
2.18.0.203.gfac676dfb9-goog
Commit ab8f58ad72c4 ("PM / devfreq: Set min/max_freq when adding
the devfreq device") introduced the initialization of the user
limits min/max_freq from the lowest/highest available OPPs. Later
commit f1d981eaecf8 ("PM / devfreq: Use the available min/max
frequency") added scaling_min/max_freq, which actually represent
the frequencies of the lowest/highest available OPP. scaling_min/
max_freq are initialized with the values from min/max_freq, which
is totally correct in the context, but a bit awkward to read.
Swap the initialization and assign scaling_min/max_freq with the
OPP freqs and then the user limits min/max_freq with scaling_min/
max_freq.
Needless to say that this change is a NOP, intended to improve
readability.
Signed-off-by: Matthias Kaehlcke <[email protected]>
Reviewed-by: Chanwoo Choi <[email protected]>
Reviewed-by: Brian Norris <[email protected]>
---
Changes in v5:
- none
Changes in v4:
- added 'Reviewed-by: Brian Norris <[email protected]>' tag
Changes in v3:
- none
Changes in v2:
- added 'Reviewed-by: Chanwoo Choi <[email protected]>' tag
---
drivers/devfreq/devfreq.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index fe2af6aa88fc..0057ef5b0a98 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -604,21 +604,21 @@ struct devfreq *devfreq_add_device(struct device *dev,
mutex_lock(&devfreq->lock);
}
- devfreq->min_freq = find_available_min_freq(devfreq);
- if (!devfreq->min_freq) {
+ devfreq->scaling_min_freq = find_available_min_freq(devfreq);
+ if (!devfreq->scaling_min_freq) {
mutex_unlock(&devfreq->lock);
err = -EINVAL;
goto err_dev;
}
- devfreq->scaling_min_freq = devfreq->min_freq;
+ devfreq->min_freq = devfreq->scaling_min_freq;
- devfreq->max_freq = find_available_max_freq(devfreq);
- if (!devfreq->max_freq) {
+ devfreq->scaling_max_freq = find_available_max_freq(devfreq);
+ if (!devfreq->scaling_max_freq) {
mutex_unlock(&devfreq->lock);
err = -EINVAL;
goto err_dev;
}
- devfreq->scaling_max_freq = devfreq->max_freq;
+ devfreq->max_freq = devfreq->scaling_max_freq;
dev_set_name(&devfreq->dev, "devfreq%d",
atomic_inc_return(&devfreq_no));
--
2.18.0.203.gfac676dfb9-goog
Commit ab8f58ad72c4 ("PM / devfreq: Set min/max_freq when adding the
devfreq device") initializes df->min/max_freq with the min/max OPP when
the device is added. Later commit f1d981eaecf8 ("PM / devfreq: Use the
available min/max frequency") adds df->scaling_min/max_freq and the
following to the frequency adjustment code:
max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
With the current handling of min/max_freq this is incorrect:
Even though df->max_freq is now initialized to a non-zero value, user
space can still set it to 0. In that case max_freq would be 0 instead of
df->scaling_max_freq as intended, and as a consequence the frequency
adjustment is not performed:
if (max_freq && freq > max_freq) {
freq = max_freq;
To fix this, set df->min/max_freq to the min/max OPP in min/max_freq_store,
when the user passes a value of 0. This also prevents df->max_freq from
being set below the min OPP when df->min_freq is 0, and similar for
min_freq. Since it is now guaranteed that df->min/max_freq can't be 0 the
checks for this case can be removed.
Fixes: f1d981eaecf8 ("PM / devfreq: Use the available min/max frequency")
Signed-off-by: Matthias Kaehlcke <[email protected]>
Reviewed-by: Brian Norris <[email protected]>
---
Changes in v5:
- none
Changes in v4:
- added 'Reviewed-by: Brian Norris <[email protected]>' tag
Changes in v3:
- none
Changes in v2:
- handle freq tables sorted in ascending and descending order in
min/max_freq_store()
- use same order for conditional statements in min/max_freq_store()
---
drivers/devfreq/devfreq.c | 42 ++++++++++++++++++++++++++++-----------
1 file changed, 30 insertions(+), 12 deletions(-)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 0057ef5b0a98..6f604f8b2b81 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -283,11 +283,11 @@ int update_devfreq(struct devfreq *devfreq)
max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
min_freq = MAX(devfreq->scaling_min_freq, devfreq->min_freq);
- if (min_freq && freq < min_freq) {
+ if (freq < min_freq) {
freq = min_freq;
flags &= ~DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use GLB */
}
- if (max_freq && freq > max_freq) {
+ if (freq > max_freq) {
freq = max_freq;
flags |= DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use LUB */
}
@@ -1122,18 +1122,27 @@ static ssize_t min_freq_store(struct device *dev, struct device_attribute *attr,
{
struct devfreq *df = to_devfreq(dev);
unsigned long value;
+ unsigned long *freq_table;
int ret;
- unsigned long max;
ret = sscanf(buf, "%lu", &value);
if (ret != 1)
return -EINVAL;
mutex_lock(&df->lock);
- max = df->max_freq;
- if (value && max && value > max) {
- ret = -EINVAL;
- goto unlock;
+
+ if (value) {
+ if (value > df->max_freq) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+ } else {
+ freq_table = df->profile->freq_table;
+ /* typical order is ascending, some drivers use descending */
+ if (freq_table[0] < freq_table[df->profile->max_state - 1])
+ value = freq_table[0];
+ else
+ value = freq_table[df->profile->max_state - 1];
}
df->min_freq = value;
@@ -1157,18 +1166,27 @@ static ssize_t max_freq_store(struct device *dev, struct device_attribute *attr,
{
struct devfreq *df = to_devfreq(dev);
unsigned long value;
+ unsigned long *freq_table;
int ret;
- unsigned long min;
ret = sscanf(buf, "%lu", &value);
if (ret != 1)
return -EINVAL;
mutex_lock(&df->lock);
- min = df->min_freq;
- if (value && min && value < min) {
- ret = -EINVAL;
- goto unlock;
+
+ if (value) {
+ if (value < df->min_freq) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+ } else {
+ freq_table = df->profile->freq_table;
+ /* typical order is ascending, some drivers use descending */
+ if (freq_table[0] < freq_table[df->profile->max_state - 1])
+ value = freq_table[df->profile->max_state - 1];
+ else
+ value = freq_table[0];
}
df->max_freq = value;
--
2.18.0.203.gfac676dfb9-goog
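The failure mode described in the commit message can be reproduced with a small userspace model. The sketch below is an illustration only; old_adjust_max(), store_user_max() and new_adjust_max() are hypothetical helpers that mirror the logic of the diff, not functions from devfreq.c.

```c
#include <assert.h>
#include <limits.h>

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Old behaviour: a user limit of 0 makes MIN() return 0, and the
 * "max_freq &&" guard then skips the clamp entirely.
 */
static unsigned long old_adjust_max(unsigned long freq,
				    unsigned long scaling_max,
				    unsigned long user_max)
{
	unsigned long max_freq = min_ul(scaling_max, user_max);

	if (max_freq && freq > max_freq)
		freq = max_freq;
	return freq;
}

/* New store path: a user value of 0 is replaced by the max OPP. */
static unsigned long store_user_max(unsigned long value,
				    unsigned long opp_max)
{
	assert(opp_max != 0);
	return value ? value : opp_max;
}

/* New behaviour: user_max can no longer be 0, so clamp unconditionally. */
static unsigned long new_adjust_max(unsigned long freq,
				    unsigned long scaling_max,
				    unsigned long user_max)
{
	unsigned long max_freq = min_ul(scaling_max, user_max);

	if (freq > max_freq)
		freq = max_freq;
	return freq;
}
```

In this model, a request of ULONG_MAX with a user limit of 0 passes through old_adjust_max() unclamped, while the new path limits it to the highest OPP.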
Hi Matthias,
On 2018-07-04 08:46, Matthias Kaehlcke wrote:
> Commit ab8f58ad72c4 ("PM / devfreq: Set min/max_freq when adding the
> devfreq device") initializes df->min/max_freq with the min/max OPP when
> the device is added. Later commit f1d981eaecf8 ("PM / devfreq: Use the
> available min/max frequency") adds df->scaling_min/max_freq and the
> following to the frequency adjustment code:
>
> max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
>
> With the current handling of min/max_freq this is incorrect:
>
> Even though df->max_freq is now initialized to a non-zero value, user
> space can still set it to 0. In that case max_freq would be 0 instead of
> df->scaling_max_freq as intended, and as a consequence the frequency
> adjustment is not performed:
>
> if (max_freq && freq > max_freq) {
> freq = max_freq;
>
> To fix this, set df->min/max_freq to the min/max OPP in min/max_freq_store,
> when the user passes a value of 0. This also prevents df->max_freq from
> being set below the min OPP when df->min_freq is 0, and similar for
> min_freq. Since it is now guaranteed that df->min/max_freq can't be 0 the
> checks for this case can be removed.
>
> Fixes: f1d981eaecf8 ("PM / devfreq: Use the available min/max frequency")
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> Reviewed-by: Brian Norris <[email protected]>
> ---
> Changes in v5:
> - none
>
> Changes in v4:
> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>
> Changes in v3:
> - none
>
> Changes in v2:
> - handle freq tables sorted in ascending and descending order in
> min/max_freq_store()
> - use same order for conditional statements in min/max_freq_store()
> ---
> drivers/devfreq/devfreq.c | 42 ++++++++++++++++++++++++++++-----------
> 1 file changed, 30 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> index 0057ef5b0a98..6f604f8b2b81 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -283,11 +283,11 @@ int update_devfreq(struct devfreq *devfreq)
> max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
> min_freq = MAX(devfreq->scaling_min_freq, devfreq->min_freq);
>
> - if (min_freq && freq < min_freq) {
> + if (freq < min_freq) {
> freq = min_freq;
> flags &= ~DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use GLB */
> }
> - if (max_freq && freq > max_freq) {
> + if (freq > max_freq) {
> freq = max_freq;
> flags |= DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use LUB */
> }
> @@ -1122,18 +1122,27 @@ static ssize_t min_freq_store(struct device *dev, struct device_attribute *attr,
> {
> struct devfreq *df = to_devfreq(dev);
> unsigned long value;
> + unsigned long *freq_table;
You can move the declaration of 'freq_table' into the 'else' block.
> int ret;
> - unsigned long max;
>
> ret = sscanf(buf, "%lu", &value);
> if (ret != 1)
> return -EINVAL;
>
> mutex_lock(&df->lock);
> - max = df->max_freq;
> - if (value && max && value > max) {
> - ret = -EINVAL;
> - goto unlock;
> +
> + if (value) {
> + if (value > df->max_freq) {
> + ret = -EINVAL;
> + goto unlock;
> + }
> + } else {
> + freq_table = df->profile->freq_table;
> + /* typical order is ascending, some drivers use descending */
It would be better to explain what the following code does.
How about changing the comment as follows?
/* Get minimum frequency according to sorting order */
> + if (freq_table[0] < freq_table[df->profile->max_state - 1])
> + value = freq_table[0];
> + else
> + value = freq_table[df->profile->max_state - 1];
> }
>
> df->min_freq = value;
> @@ -1157,18 +1166,27 @@ static ssize_t max_freq_store(struct device *dev, struct device_attribute *attr,
> {
> struct devfreq *df = to_devfreq(dev);
> unsigned long value;
> + unsigned long *freq_table;
Ditto. You can move the declaration of 'freq_table' into the 'else' block.
> int ret;
> - unsigned long min;
>
> ret = sscanf(buf, "%lu", &value);
> if (ret != 1)
> return -EINVAL;
>
> mutex_lock(&df->lock);
> - min = df->min_freq;
> - if (value && min && value < min) {
> - ret = -EINVAL;
> - goto unlock;
> +
> + if (value) {
> + if (value < df->min_freq) {
> + ret = -EINVAL;
> + goto unlock;
> + }
> + } else {
> + freq_table = df->profile->freq_table;
> + /* typical order is ascending, some drivers use descending */
Ditto.
/* Get maximum frequency according to sorting order */
> + if (freq_table[0] < freq_table[df->profile->max_state - 1])
> + value = freq_table[df->profile->max_state - 1];
> + else
> + value = freq_table[0];
> }
>
> df->max_freq = value;
>
If you agree with my comments and update this patch accordingly,
feel free to add my review tag:
Reviewed-by: Chanwoo Choi <[email protected]>
--
Best Regards,
Chanwoo Choi
Samsung Electronics
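To make the review point concrete, the lookup of the lowest/highest table entry independent of sort order can be factored into two small helpers. This is a userspace sketch under the same assumption the patch makes (the table is sorted, either ascending or descending); table_min_freq() and table_max_freq() are hypothetical names and not part of devfreq.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the lowest frequency of a sorted table. Only the first and
 * last entries are compared, so the table must be sorted (ascending
 * or descending) for the result to be meaningful.
 */
static unsigned long table_min_freq(const unsigned long *freq_table,
				    size_t max_state)
{
	unsigned long first, last;

	assert(freq_table != NULL && max_state > 0);
	first = freq_table[0];
	last = freq_table[max_state - 1];
	return first < last ? first : last;
}

/* Return the highest frequency of a sorted table (either order). */
static unsigned long table_max_freq(const unsigned long *freq_table,
				    size_t max_state)
{
	unsigned long first, last;

	assert(freq_table != NULL && max_state > 0);
	first = freq_table[0];
	last = freq_table[max_state - 1];
	return first < last ? last : first;
}
```

With helpers like these, min_freq_store() and max_freq_store() would each need a single call in their 'else' branch, and the comment about sorting order would live in one place.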
>Commit ab8f58ad72c4 ("PM / devfreq: Set min/max_freq when adding
>the devfreq device") introduced the initialization of the user
>limits min/max_freq from the lowest/highest available OPPs. Later
>commit f1d981eaecf8 ("PM / devfreq: Use the available min/max
>frequency") added scaling_min/max_freq, which actually represent
>the frequencies of the lowest/highest available OPP. scaling_min/
>max_freq are initialized with the values from min/max_freq, which
>is totally correct in the context, but a bit awkward to read.
>
>Swap the initialization and assign scaling_min/max_freq with the
>OPP freqs and then the user limits min/max_freq with scaling_min/
>max_freq.
>
>Needless to say that this change is a NOP, intended to improve
>readability.
>
>Signed-off-by: Matthias Kaehlcke <[email protected]>
>Reviewed-by: Chanwoo Choi <[email protected]>
>Reviewed-by: Brian Norris <[email protected]>
>---
>Changes in v5:
>- none
>
>Changes in v4:
>- added 'Reviewed-by: Brian Norris <[email protected]>' tag
>
>Changes in v3:
>- none
>
>Changes in v2:
>- added 'Reviewed-by: Chanwoo Choi <[email protected]>' tag
>---
> drivers/devfreq/devfreq.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
Acked-by: MyungJoo Ham <[email protected]>
This can be applied independently from other commits in this series.
Hi Matthias,
On 2018-07-04 08:46, Matthias Kaehlcke wrote:
> Several governors use the user space limits df->min/max_freq to adjust
> the target frequency. This is not necessary, since update_devfreq()
> already takes care of this. Instead the governor can request the available
> min/max frequency by setting the target frequency to DEVFREQ_MIN/MAX_FREQ
> and let update_devfreq() take care of any adjustments.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> Reviewed-by: Brian Norris <[email protected]>
> ---
> Changes in v5:
> - none
>
> Changes in v4:
> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>
> Changes in v3:
> - none
>
> Changes in v2:
> - squashed "PM / devfreq: Remove redundant frequency adjustment from governors"
> and "PM / devfreq: governors: Return device frequency limits instead of user
> limits"
> - updated subject and commit message
> - use DEVFREQ_MIN/MAX_FREQ instead of df->scaling_min/max_freq
> ---
> drivers/devfreq/governor.h | 3 +++
> drivers/devfreq/governor_performance.c | 5 +----
> drivers/devfreq/governor_powersave.c | 2 +-
> drivers/devfreq/governor_simpleondemand.c | 12 +++---------
> drivers/devfreq/governor_userspace.c | 16 ++++------------
> 5 files changed, 12 insertions(+), 26 deletions(-)
Actually, I would have preferred 'df->scaling_min/max_freq' over
DEVFREQ_MIN/MAX_FREQ, but DEVFREQ_MIN/MAX_FREQ is another valid way
to do it. So, looks good to me.
Reviewed-by: Chanwoo Choi <[email protected]>
[snip]
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi,
On 2018-07-04 08:46, Matthias Kaehlcke wrote:
> Move variables related with devfreq policy changes from struct devfreq
> to the new struct devfreq_policy and add a policy field to struct devfreq.
>
> The following variables are moved:
>
> df->min/max_freq => p->user.min/max_freq
> df->scaling_min/max_freq => p->devinfo.min/max_freq
> df->governor => p->governor
> df->governor_name => p->governor_name
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> Reviewed-by: Brian Norris <[email protected]>
> ---
> Changes in v5:
> - none
>
> Changes in v4:
> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>
> Changes in v3:
> - none
>
> Changes in v2:
> - performance, powersave and simpleondemand governors don't need changes
> with "PM / devfreq: Don't adjust to user limits in governors"
> - formatting fixes
> ---
> drivers/devfreq/devfreq.c | 137 ++++++++++++++++-------------
> drivers/devfreq/governor_passive.c | 4 +-
> include/linux/devfreq.h | 38 +++++---
> 3 files changed, 103 insertions(+), 76 deletions(-)
>
(skip)
>
> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
> index 3bc29acbd54e..e0987c749ec2 100644
> --- a/drivers/devfreq/governor_passive.c
> +++ b/drivers/devfreq/governor_passive.c
> @@ -99,12 +99,12 @@ static int update_devfreq_passive(struct devfreq *devfreq, unsigned long freq)
> {
> int ret;
>
> - if (!devfreq->governor)
> + if (!devfreq->policy.governor)
> return -EINVAL;
>
> mutex_lock_nested(&devfreq->lock, SINGLE_DEPTH_NESTING);
>
> - ret = devfreq->governor->get_target_freq(devfreq, &freq);
> + ret = devfreq->policy.governor->get_target_freq(devfreq, &freq);
> if (ret < 0)
> goto out;
>
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index 3aae5b3af87c..9bf23b976f4d 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -109,6 +109,30 @@ struct devfreq_dev_profile {
> unsigned int max_state;
> };
>
> +/**
> + * struct devfreq_freq_limits - Devfreq frequency limits
> + * @min_freq: minimum frequency
> + * @max_freq: maximum frequency
> + */
> +struct devfreq_freq_limits {
> + unsigned long min_freq;
> + unsigned long max_freq;
> +};
> +
> +/**
> + * struct devfreq_policy - Devfreq policy
> + * @user: frequency limits requested by the user
> + * @devinfo: frequency limits of the device (available OPPs)
> + * @governor: method how to choose frequency based on the usage.
Nitpick: remove the '.' at the end of the line.
> + * @governor_name: devfreq governor name for use with this devfreq
> + */
> +struct devfreq_policy {
> + struct devfreq_freq_limits user;
> + struct devfreq_freq_limits devinfo;
> + const struct devfreq_governor *governor;
> + char governor_name[DEVFREQ_NAME_LEN];
> +};
> +
> /**
> * struct devfreq - Device devfreq structure
> * @node: list node - contains the devices with devfreq that have been
> @@ -117,8 +141,6 @@ struct devfreq_dev_profile {
> * @dev: device registered by devfreq class. dev.parent is the device
> * using devfreq.
> * @profile: device-specific devfreq profile
> - * @governor: method how to choose frequency based on the usage.
> - * @governor_name: devfreq governor name for use with this devfreq
> * @nb: notifier block used to notify devfreq object that it should
> * reevaluate operable frequencies. Devfreq users may use
> * devfreq.nb to the corresponding register notifier call chain.
> @@ -126,10 +148,7 @@ struct devfreq_dev_profile {
> * @previous_freq: previously configured frequency value.
> * @data: Private data of the governor. The devfreq framework does not
> * touch this.
> - * @min_freq: Limit minimum frequency requested by user (0: none)
> - * @max_freq: Limit maximum frequency requested by user (0: none)
> - * @scaling_min_freq: Limit minimum frequency requested by OPP interface
> - * @scaling_max_freq: Limit maximum frequency requested by OPP interface
> + * @policy: Policy for frequency adjustments
struct devfreq_policy contains both the frequency range and the governor
information, but this description only covers the frequency. Please
describe 'policy' more accurately.
> * @stop_polling: devfreq polling status of a device.
> * @total_trans: Number of devfreq transitions
> * @trans_table: Statistics of devfreq transitions
> @@ -151,8 +170,6 @@ struct devfreq {
> struct mutex lock;
> struct device dev;
> struct devfreq_dev_profile *profile;
> - const struct devfreq_governor *governor;
> - char governor_name[DEVFREQ_NAME_LEN];
> struct notifier_block nb;
> struct delayed_work work;
>
> @@ -161,10 +178,7 @@ struct devfreq {
>
> void *data; /* private data for governors */
>
> - unsigned long min_freq;
> - unsigned long max_freq;
> - unsigned long scaling_min_freq;
> - unsigned long scaling_max_freq;
> + struct devfreq_policy policy;
I recommend moving this right below 'struct devfreq_dev_profile *profile',
as follows:
struct devfreq_dev_profile *profile;
struct devfreq_policy policy;
> bool stop_polling;
>
> /* information for device frequency transition */
>
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi,
I haven't seen any framework that exports its class instance.
It is very dangerous: any device driver would be able to reset
the 'devfreq_class' instance. I can't agree with this approach.
Regards,
Chanwoo Choi
On 2018-07-04 08:47, Matthias Kaehlcke wrote:
> Exporting the device class allows other parts of the kernel to enumerate
> the devfreq devices and receive notification when a devfreq device is
> added or removed.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> Reviewed-by: Brian Norris <[email protected]>
> ---
> Changes in v5:
> - none
>
> Changes in v4:
> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>
> Changes in v3:
> - none
>
> Changes in v2:
> - patch added to series
> ---
> drivers/devfreq/devfreq.c | 3 ++-
> include/linux/devfreq.h | 2 ++
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> index 4cbaa7ad1972..38b90b64fc6e 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -31,7 +31,8 @@
> #define MAX(a,b) ((a > b) ? a : b)
> #define MIN(a,b) ((a < b) ? a : b)
>
> -static struct class *devfreq_class;
> +struct class *devfreq_class;
> +EXPORT_SYMBOL_GPL(devfreq_class);
>
> /*
> * devfreq core provides delayed work based load monitoring helper
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index c4f84a769cb5..964e064a951f 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -206,6 +206,8 @@ struct devfreq_freqs {
> };
>
> #if defined(CONFIG_PM_DEVFREQ)
> +extern struct class *devfreq_class;
> +
> extern struct devfreq *devfreq_add_device(struct device *dev,
> struct devfreq_dev_profile *profile,
> const char *governor_name,
>
Hi Matthias,
Firstly,
I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
devfreq already uses the OPP interface by default. This means that code
outside of 'drivers/devfreq', such as drivers/thermal/devfreq_cooling.c,
can disable/enable frequencies. When a device driver disables/enables
a specific frequency, the devfreq core takes that into account.
So devfreq doesn't need devfreq_verify_within_limits(), because it
already supports an interface to change the minimum/maximum frequency
of a devfreq device.
In the cpufreq subsystem, cpufreq only provides cpufreq_verify_within_limits()
to change the minimum/maximum frequency of a CPU; device drivers cannot
change the minimum/maximum frequency through the OPP interface.
In the devfreq subsystem, however, as I explained above, the OPP interface
is supported by default, so devfreq doesn't need to add another way to
change the minimum/maximum frequency.
Secondly,
This patch sends the 'struct devfreq_policy' instance as the data
when sending the notification, as follows:
srcu_notifier_call_chain(&devfreq->policy_notifier_list,
DEVFREQ_ADJUST, policy);
But I think it would be enough for the devfreq core to send a
'struct devfreq_freq_limits' instance instead of 'struct devfreq_policy',
because a receiver of DEVFREQ_ADJUST will only use the min_freq/max_freq
variables.
I looked at how cpufreq handles this. The device drivers using
CPUFREQ_POLICY_NOTIFIER use only the following fields of
'struct cpufreq_policy', which means that receivers of
CPUFREQ_POLICY_NOTIFIER don't need any information other than the
min/max frequency:
- policy->min
- policy->max
- policy->cpuinfo.max_freq
- policy->cpuinfo.min_freq
- policy->cpu (not related to devfreq)
- policy->related_cpus (not related to devfreq)
List of device drivers using CPUFREQ_POLICY_NOTIFIER (Linux v4.18-rc1):
$ grep -rn "CPUFREQ_POLICY_NOTIFIER" .
./drivers/macintosh/windfarm_cpufreq_clamp.c
./drivers/thermal/cpu_cooling.c
./drivers/thermal/cpu_cooling.c
./drivers/acpi/processor_thermal.c
./drivers/acpi/processor_thermal.c
./drivers/acpi/processor_perflib.c
./drivers/acpi/processor_perflib.c
./drivers/base/arch_topology.c
./drivers/base/arch_topology.c
./drivers/video/fbdev/sa1100fb.c
./drivers/video/fbdev/pxafb.c
./drivers/cpufreq/ppc_cbe_cpufreq_pmi.c
./drivers/cpufreq/cpufreq.c
./drivers/cpufreq/cpufreq.c
./drivers/cpufreq/cpufreq.c
./drivers/cpufreq/cpufreq.c
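As an illustration of the suggestion above, the clamping done by devfreq_verify_within_limits() (quoted further down in this mail) only needs a min/max pair to operate on. The following userspace sketch models that; struct model_freq_limits and model_verify_within_limits() are hypothetical names, and unsigned long is used for all values here as an assumption to keep the types uniform.

```c
#include <assert.h>

/* Hypothetical model of a bare min/max pair passed to a notifier. */
struct model_freq_limits {
	unsigned long min_freq;
	unsigned long max_freq;
};

/*
 * Narrow the limits so that both values end up inside [min, max],
 * using the same sequence of comparisons as the helper in the patch.
 */
static void model_verify_within_limits(struct model_freq_limits *l,
				       unsigned long min,
				       unsigned long max)
{
	assert(min <= max);

	if (l->min_freq < min)
		l->min_freq = min;
	if (l->max_freq < min)
		l->max_freq = min;
	if (l->min_freq > max)
		l->min_freq = max;
	if (l->max_freq > max)
		l->max_freq = max;
	if (l->min_freq > l->max_freq)
		l->min_freq = l->max_freq;
}
```

A throttling notifier in this model would call model_verify_within_limits(limits, 0, cap) to lower the maximum to cap; nothing beyond the two limit fields is read or written.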
On 2018-07-04 08:46, Matthias Kaehlcke wrote:
> Policy notifiers are called before a frequency change and may narrow
> the min/max frequency range in devfreq_policy, which is used to adjust
> the target frequency if it is beyond this range.
>
> Also add a few helpers:
> - devfreq_verify_within_[dev_]limits()
> - should be used by the notifiers for policy adjustments.
> - dev_to_devfreq()
> - lookup a devfreq struct from a device pointer
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> Reviewed-by: Brian Norris <[email protected]>
> ---
> Changes in v5:
> - none
>
> Changes in v4:
> - Fixed typo in commit message: devfreg => devfreq
> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>
> Changes in v3:
> - devfreq.h: fixed misspelling of struct devfreq_policy
>
> Changes in v2:
> - performance, powersave and simpleondemand governors don't need changes
> with "PM / devfreq: Don't adjust to user limits in governors"
> - formatting fixes
> ---
> drivers/devfreq/devfreq.c | 48 ++++++++++++++++++++++-------
> include/linux/devfreq.h | 65 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 102 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> index 21604d6ae2b8..4cbaa7ad1972 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -72,6 +72,21 @@ static struct devfreq *find_device_devfreq(struct device *dev)
> return ERR_PTR(-ENODEV);
> }
>
> +/**
> + * dev_to_devfreq() - find devfreq struct using device pointer
> + * @dev: device pointer used to lookup device devfreq.
> + */
> +struct devfreq *dev_to_devfreq(struct device *dev)
> +{
> + struct devfreq *devfreq;
> +
> + mutex_lock(&devfreq_list_lock);
> + devfreq = find_device_devfreq(dev);
> + mutex_unlock(&devfreq_list_lock);
> +
> + return devfreq;
> +}
> +
> static unsigned long find_available_min_freq(struct devfreq *devfreq)
> {
> struct dev_pm_opp *opp;
> @@ -269,20 +284,21 @@ int update_devfreq(struct devfreq *devfreq)
> if (!policy->governor)
> return -EINVAL;
>
> + policy->min = policy->devinfo.min_freq;
> + policy->max = policy->devinfo.max_freq;
Why don't you take 'policy->user.max/min_freq' into account here, as follows?
As I already commented, I think that 'struct devfreq_freq_limits' is enough
instead of 'struct devfreq_policy'.
->max_freq = MIN(policy->devinfo.max_freq, policy->user.max_freq);
->min_freq = MAX(policy->devinfo.min_freq, policy->user.min_freq);
> +
> + srcu_notifier_call_chain(&devfreq->policy_notifier_list,
> + DEVFREQ_ADJUST, policy);
> +
> /* Reevaluate the proper frequency */
> err = policy->governor->get_target_freq(devfreq, &freq);
> if (err)
> return err;
>
> - /*
> - * Adjust the frequency with user freq, QoS and available freq.
> - *
> - * List from the highest priority
> - * max_freq
> - * min_freq
> - */
> - max_freq = MIN(policy->devinfo.max_freq, policy->user.max_freq);
> - min_freq = MAX(policy->devinfo.min_freq, policy->user.min_freq);
> + /* Adjust the frequency */
> +
> + max_freq = MIN(policy->max, policy->user.max_freq);
> + min_freq = MAX(policy->min, policy->user.min_freq);
>
> if (freq < min_freq) {
> freq = min_freq;
> @@ -645,6 +661,7 @@ struct devfreq *devfreq_add_device(struct device *dev,
> devfreq->last_stat_updated = jiffies;
>
> srcu_init_notifier_head(&devfreq->transition_notifier_list);
> + srcu_init_notifier_head(&devfreq->policy_notifier_list);
>
> mutex_unlock(&devfreq->lock);
>
> @@ -1445,7 +1462,7 @@ EXPORT_SYMBOL(devm_devfreq_unregister_opp_notifier);
> * devfreq_register_notifier() - Register a driver with devfreq
> * @devfreq: The devfreq object.
> * @nb: The notifier block to register.
> - * @list: DEVFREQ_TRANSITION_NOTIFIER.
> + * @list: DEVFREQ_TRANSITION_NOTIFIER or DEVFREQ_POLICY_NOTIFIER.
> */
> int devfreq_register_notifier(struct devfreq *devfreq,
> struct notifier_block *nb,
> @@ -1461,6 +1478,10 @@ int devfreq_register_notifier(struct devfreq *devfreq,
> ret = srcu_notifier_chain_register(
> &devfreq->transition_notifier_list, nb);
> break;
> + case DEVFREQ_POLICY_NOTIFIER:
> + ret = srcu_notifier_chain_register(
> + &devfreq->policy_notifier_list, nb);
> + break;
> default:
> ret = -EINVAL;
> }
> @@ -1473,7 +1494,7 @@ EXPORT_SYMBOL(devfreq_register_notifier);
> * devfreq_unregister_notifier() - Unregister a driver with devfreq
> * @devfreq: The devfreq object.
> * @nb: The notifier block to be unregistered.
> - * @list: DEVFREQ_TRANSITION_NOTIFIER.
> + * @list: DEVFREQ_TRANSITION_NOTIFIER or DEVFREQ_POLICY_NOTIFIER.
> */
> int devfreq_unregister_notifier(struct devfreq *devfreq,
> struct notifier_block *nb,
> @@ -1489,6 +1510,11 @@ int devfreq_unregister_notifier(struct devfreq *devfreq,
> ret = srcu_notifier_chain_unregister(
> &devfreq->transition_notifier_list, nb);
> break;
> + case DEVFREQ_POLICY_NOTIFIER:
> + ret = srcu_notifier_chain_unregister(
> + &devfreq->policy_notifier_list, nb);
> + break;
> +
> default:
> ret = -EINVAL;
> }
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index 9bf23b976f4d..7c8dce96db73 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -33,6 +33,10 @@
> #define DEVFREQ_PRECHANGE (0)
> #define DEVFREQ_POSTCHANGE (1)
>
> +#define DEVFREQ_POLICY_NOTIFIER 1
> +
> +#define DEVFREQ_ADJUST 0
> +
> struct devfreq;
> struct devfreq_governor;
>
> @@ -121,12 +125,16 @@ struct devfreq_freq_limits {
>
> /**
> * struct devfreq_policy - Devfreq policy
> + * @min: minimum frequency (adjustable by policy notifiers)
> + * @max: maximum frequency (adjustable by policy notifiers)
> * @user: frequency limits requested by the user
> * @devinfo: frequency limits of the device (available OPPs)
> * @governor: method how to choose frequency based on the usage.
> * @governor_name: devfreq governor name for use with this devfreq
> */
> struct devfreq_policy {
> + unsigned long min;
> + unsigned long max;
> struct devfreq_freq_limits user;
> struct devfreq_freq_limits devinfo;
> const struct devfreq_governor *governor;
> @@ -155,6 +163,7 @@ struct devfreq_policy {
> * @time_in_state: Statistics of devfreq states
> * @last_stat_updated: The last time stat updated
> * @transition_notifier_list: list head of DEVFREQ_TRANSITION_NOTIFIER notifier
> + * @policy_notifier_list: list head of DEVFREQ_POLICY_NOTIFIER notifier
> *
> * This structure stores the devfreq information for a give device.
> *
> @@ -188,6 +197,7 @@ struct devfreq {
> unsigned long last_stat_updated;
>
> struct srcu_notifier_head transition_notifier_list;
> + struct srcu_notifier_head policy_notifier_list;
> };
>
> struct devfreq_freqs {
> @@ -240,6 +250,45 @@ extern void devm_devfreq_unregister_notifier(struct device *dev,
> extern struct devfreq *devfreq_get_devfreq_by_phandle(struct device *dev,
> int index);
>
> +/**
> + * devfreq_verify_within_limits() - Adjust a devfreq policy if needed to make
> + * sure its min/max values are within a
> + * specified range.
> + * @policy: the policy
> + * @min: the minimum frequency
> + * @max: the maximum frequency
> + */
> +static inline void devfreq_verify_within_limits(struct devfreq_policy *policy,
> + unsigned int min, unsigned int max)
> +{
> + if (policy->min < min)
> + policy->min = min;
> + if (policy->max < min)
> + policy->max = min;
> + if (policy->min > max)
> + policy->min = max;
> + if (policy->max > max)
> + policy->max = max;
> + if (policy->min > policy->max)
> + policy->min = policy->max;
> +}
> +
> +/**
> + * devfreq_verify_within_dev_limits() - Adjust a devfreq policy if needed to
> + * make sure its min/max values are within
> + * the frequency range supported by the
> + * device.
> + * @policy: the policy
> + */
> +static inline void
> +devfreq_verify_within_dev_limits(struct devfreq_policy *policy)
> +{
> + devfreq_verify_within_limits(policy, policy->devinfo.min_freq,
> + policy->devinfo.max_freq);
> +}
> +
> +struct devfreq *dev_to_devfreq(struct device *dev);
> +
> #if IS_ENABLED(CONFIG_DEVFREQ_GOV_SIMPLE_ONDEMAND)
> /**
> * struct devfreq_simple_ondemand_data - void *data fed to struct devfreq
> @@ -394,10 +443,26 @@ static inline struct devfreq *devfreq_get_devfreq_by_phandle(struct device *dev,
> return ERR_PTR(-ENODEV);
> }
>
> +static inline void devfreq_verify_within_limits(struct devfreq_policy *policy,
> + unsigned int min, unsigned int max)
> +{
> +}
> +
> +static inline void
> +devfreq_verify_within_dev_limits(struct devfreq_policy *policy)
> +{
> +}
> +
> static inline int devfreq_update_stats(struct devfreq *df)
> {
> return -EINVAL;
> }
> +
> +static inline struct devfreq *dev_to_devfreq(struct device *dev)
> +{
> + return NULL;
> +}
> +
> #endif /* CONFIG_PM_DEVFREQ */
>
> #endif /* __LINUX_DEVFREQ_H__ */
>
--
Best Regards,
Chanwoo Choi
Samsung Electronics
On Tue, 03 Jul 2018, Matthias Kaehlcke wrote:
> Instantiate the CrOS EC throttler if it is enabled in the kernel
> configuration.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> Reviewed-by: Brian Norris <[email protected]>
> ---
> Changes in v5:
> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>
> Changes in v4:
> - register throttler in cros_ec_dev.c instead of cros_ec.c
>
> Changes in v3:
> - patch added to series
> ---
> drivers/mfd/cros_ec_dev.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
For my own reference:
Acked-for-MFD-by: Lee Jones <[email protected]>
--
Lee Jones [李琼斯]
Linaro Services Technical Lead
Linaro.org │ Open source software for ARM SoCs
On 03-07-18, 16:47, Matthias Kaehlcke wrote:
> The Throttler is used for non-thermal throttling of system components
> like CPUs or devfreq devices.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> --
> Changes in v5:
> - patch added to the series (replacing "dt-bindings: PM / OPP: add
> opp-throttlers property")
> ---
> .../devicetree/bindings/misc/throttler.txt | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/misc/throttler.txt
>
> diff --git a/Documentation/devicetree/bindings/misc/throttler.txt b/Documentation/devicetree/bindings/misc/throttler.txt
> new file mode 100644
> index 000000000000..2ea80c62dbe1
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/misc/throttler.txt
> @@ -0,0 +1,13 @@
> +Throttler driver
> +
> +The Throttler is used for non-thermal throttling of system components like
> +CPUs or devfreq devices.
> +
> +Required properties:
> +- throttler-opps Array of OPP-v2 phandles with the OPPs used for
> + throttling.
> +
> +Example:
> + throttler {
> + throttler-opps = <&cpu0_opp03, &cpu1_opp02, &gpu_opp03>;
> + };
All you do with these phandles for now is that you parse them and read
"opp-hz" value. For that purpose current bindings look sufficient.
From OPP point of view:
Acked-by: Viresh Kumar <[email protected]>
--
viresh
On Wednesday, July 4, 2018 1:47:01 AM CEST Matthias Kaehlcke wrote:
> cpufreq stubs out some functions when CONFIG_CPU_FREQ=n , but
> cpufreq_update_policy() is not among them. The throttler driver
> (https://patchwork.kernel.org/patch/10453351/) uses cpufreq as one
> possible throttling mechanism, but it can still be useful without
> cpufreq. Stubbing out cpufreq_update_policy() allows the throttler
> driver to be built without ugly #ifdef'ery when cpufreq is disabled.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> Reviewed-by: Brian Norris <[email protected]>
> ---
> Changes in v5:
> - none
>
> Changes in v4:
> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>
> Changes in v3:
> - patch added to series
> ---
> include/linux/cpufreq.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 882a9b9e34bc..dba8c4951e2e 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -210,6 +210,7 @@ static inline unsigned int cpufreq_quick_get_max(unsigned int cpu)
> return 0;
> }
> static inline void disable_cpufreq(void) { }
> +static inline void cpufreq_update_policy(unsigned int cpu) { }
> #endif
>
> #ifdef CONFIG_CPU_FREQ_STAT
>
I can take this patch if you want me to.
Thanks,
Rafael
On 03-07-18, 16:47, Matthias Kaehlcke wrote:
> cpufreq stubs out some functions when CONFIG_CPU_FREQ=n , but
> cpufreq_update_policy() is not among them. The throttler driver
> (https://patchwork.kernel.org/patch/10453351/) uses cpufreq as one
> possible throttling mechanism, but it can still be useful without
> cpufreq. Stubbing out cpufreq_update_policy() allows the throttler
> driver to be built without ugly #ifdef'ery when cpufreq is disabled.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> Reviewed-by: Brian Norris <[email protected]>
> ---
> Changes in v5:
> - none
>
> Changes in v4:
> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>
> Changes in v3:
> - patch added to series
> ---
> include/linux/cpufreq.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 882a9b9e34bc..dba8c4951e2e 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -210,6 +210,7 @@ static inline unsigned int cpufreq_quick_get_max(unsigned int cpu)
> return 0;
> }
> static inline void disable_cpufreq(void) { }
> +static inline void cpufreq_update_policy(unsigned int cpu) { }
> #endif
>
> #ifdef CONFIG_CPU_FREQ_STAT
Acked-by: Viresh Kumar <[email protected]>
--
viresh
Hi Chanwoo,
On Wed, Jul 04, 2018 at 11:20:31AM +0900, Chanwoo Choi wrote:
> Hi Matthias,
>
> On 2018년 07월 04일 08:46, Matthias Kaehlcke wrote:
> > Commit ab8f58ad72c4 ("PM / devfreq: Set min/max_freq when adding the
> > devfreq device") initializes df->min/max_freq with the min/max OPP when
> > the device is added. Later commit f1d981eaecf8 ("PM / devfreq: Use the
> > available min/max frequency") adds df->scaling_min/max_freq and the
> > following to the frequency adjustment code:
> >
> > max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
> >
> > With the current handling of min/max_freq this is incorrect:
> >
> > Even though df->max_freq is now initialized to a value != 0 user space
> > can still set it to 0, in this case max_freq would be 0 instead of
> > df->scaling_max_freq as intended. In consequence the frequency adjustment
> > is not performed:
> >
> > if (max_freq && freq > max_freq) {
> > freq = max_freq;
> >
> > To fix this set df->min/max_freq to the min/max OPP in min/max_freq_store,
> > when the user passes a value of 0. This also prevents df->max_freq from
> > being set below the min OPP when df->min_freq is 0, and similar for
> > min_freq. Since it is now guaranteed that df->min/max_freq can't be 0 the
> > checks for this case can be removed.
> >
> > Fixes: f1d981eaecf8 ("PM / devfreq: Use the available min/max frequency")
> > Signed-off-by: Matthias Kaehlcke <[email protected]>
> > Reviewed-by: Brian Norris <[email protected]>
> > ---
> > Changes in v5:
> > - none
> >
> > Changes in v4:
> > - added 'Reviewed-by: Brian Norris <[email protected]>' tag
> >
> > Changes in v3:
> > - none
> >
> > Changes in v2:
> > - handle freq tables sorted in ascending and descending order in
> > min/max_freq_store()
> > - use same order for conditional statements in min/max_freq_store()
> > ---
> > drivers/devfreq/devfreq.c | 42 ++++++++++++++++++++++++++++-----------
> > 1 file changed, 30 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> > index 0057ef5b0a98..6f604f8b2b81 100644
> > --- a/drivers/devfreq/devfreq.c
> > +++ b/drivers/devfreq/devfreq.c
> > @@ -283,11 +283,11 @@ int update_devfreq(struct devfreq *devfreq)
> > max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
> > min_freq = MAX(devfreq->scaling_min_freq, devfreq->min_freq);
> >
> > - if (min_freq && freq < min_freq) {
> > + if (freq < min_freq) {
> > freq = min_freq;
> > flags &= ~DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use GLB */
> > }
> > - if (max_freq && freq > max_freq) {
> > + if (freq > max_freq) {
> > freq = max_freq;
> > flags |= DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use LUB */
> > }
> > @@ -1122,18 +1122,27 @@ static ssize_t min_freq_store(struct device *dev, struct device_attribute *attr,
> > {
> > struct devfreq *df = to_devfreq(dev);
> > unsigned long value;
> > + unsigned long *freq_table;
>
> You can move 'freq_table' under 'else' statement.
Will do
> > int ret;
> > - unsigned long max;
> >
> > ret = sscanf(buf, "%lu", &value);
> > if (ret != 1)
> > return -EINVAL;
> >
> > mutex_lock(&df->lock);
> > - max = df->max_freq;
> > - if (value && max && value > max) {
> > - ret = -EINVAL;
> > - goto unlock;
> > +
> > + if (value) {
> > + if (value > df->max_freq) {
> > + ret = -EINVAL;
> > + goto unlock;
> > + }
> > + } else {
> > + freq_table = df->profile->freq_table;
> > + /* typical order is ascending, some drivers use descending */
>
> It would be better to explain what the following code does.
> How about modifying it as following?
>
> /* Get minimum frequency according to sorting way */
Ok, will slightly modify it to 'sorting order' if you don't mind.
> > + if (freq_table[0] < freq_table[df->profile->max_state - 1])
> > + value = freq_table[0];
> > + else
> > + value = freq_table[df->profile->max_state - 1];
> > }
> >
> > df->min_freq = value;
> > @@ -1157,18 +1166,27 @@ static ssize_t max_freq_store(struct device *dev, struct device_attribute *attr,
> > {
> > struct devfreq *df = to_devfreq(dev);
> > unsigned long value;
> > + unsigned long *freq_table;
>
> ditto. You can move 'freq_table' under 'else' statement.
Will do
> > int ret;
> > - unsigned long min;
> >
> > ret = sscanf(buf, "%lu", &value);
> > if (ret != 1)
> > return -EINVAL;
> >
> > mutex_lock(&df->lock);
> > - min = df->min_freq;
> > - if (value && min && value < min) {
> > - ret = -EINVAL;
> > - goto unlock;
> > +
> > + if (value) {
> > + if (value < df->min_freq) {
> > + ret = -EINVAL;
> > + goto unlock;
> > + }
> > + } else {
> > + freq_table = df->profile->freq_table;
> > + /* typical order is ascending, some drivers use descending */
>
> ditto.
> /* Get maximum frequency according to sorting way */
Ok
> > + if (freq_table[0] < freq_table[df->profile->max_state - 1])
> > + value = freq_table[df->profile->max_state - 1];
> > + else
> > + value = freq_table[0];
> > }
> >
> > df->max_freq = value;
> >
>
> If you agree my comment and modify this patch according to my comment,
> feel free to add my review tag.
> - Reviewed-by: Chanwoo Choi <[email protected]>
Thanks for the review!
Matthias
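For reference, the 'value == 0' branch of min/max_freq_store() reviewed
above can be sketched in userspace roughly as follows. This is only an
illustration, not code from the series: table_min_freq() and
table_max_freq() are hypothetical helper names, and the table is
assumed to be sorted in either ascending or descending order, as in the
patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the lowest frequency of a table that is
 * sorted in either ascending or descending order. Only the first and
 * the last entry need to be compared.
 */
static unsigned long table_min_freq(const unsigned long *freq_table,
				    size_t max_state)
{
	assert(freq_table && max_state > 0);

	/* Get minimum frequency according to sorting order */
	if (freq_table[0] < freq_table[max_state - 1])
		return freq_table[0];

	return freq_table[max_state - 1];
}

/* Hypothetical helper: same idea for the highest frequency. */
static unsigned long table_max_freq(const unsigned long *freq_table,
				    size_t max_state)
{
	assert(freq_table && max_state > 0);

	/* Get maximum frequency according to sorting order */
	if (freq_table[0] < freq_table[max_state - 1])
		return freq_table[max_state - 1];

	return freq_table[0];
}
```

With helpers like these, a user write of 0 resolves to the min/max OPP
instead of leaving 0 in df->min/max_freq, which is what allows the
'min_freq &&' and 'max_freq &&' checks in update_devfreq() to be
dropped.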
Hi,
On Wed, Jul 04, 2018 at 11:51:30AM +0900, Chanwoo Choi wrote:
> Hi,
>
> On 2018년 07월 04일 08:46, Matthias Kaehlcke wrote:
> > Move variables related with devfreq policy changes from struct devfreq
> > to the new struct devfreq_policy and add a policy field to struct devfreq.
> >
> > The following variables are moved:
> >
> > df->min/max_freq => p->user.min/max_freq
> > df->scaling_min/max_freq => p->devinfo.min/max_freq
> > df->governor => p->governor
> > df->governor_name => p->governor_name
> >
> > Signed-off-by: Matthias Kaehlcke <[email protected]>
> > Reviewed-by: Brian Norris <[email protected]>
> > ---
> > Changes in v5:
> > - none
> >
> > Changes in v4:
> > - added 'Reviewed-by: Brian Norris <[email protected]>' tag
> >
> > Changes in v3:
> > - none
> >
> > Changes in v2:
> > - performance, powersave and simpleondemand governors don't need changes
> > with "PM / devfreq: Don't adjust to user limits in governors"
> > - formatting fixes
> > ---
> > drivers/devfreq/devfreq.c | 137 ++++++++++++++++-------------
> > drivers/devfreq/governor_passive.c | 4 +-
> > include/linux/devfreq.h | 38 +++++---
> > 3 files changed, 103 insertions(+), 76 deletions(-)
> >
>
> (skip)
>
> >
> > diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
> > index 3bc29acbd54e..e0987c749ec2 100644
> > --- a/drivers/devfreq/governor_passive.c
> > +++ b/drivers/devfreq/governor_passive.c
> > @@ -99,12 +99,12 @@ static int update_devfreq_passive(struct devfreq *devfreq, unsigned long freq)
> > {
> > int ret;
> >
> > - if (!devfreq->governor)
> > + if (!devfreq->policy.governor)
> > return -EINVAL;
> >
> > mutex_lock_nested(&devfreq->lock, SINGLE_DEPTH_NESTING);
> >
> > - ret = devfreq->governor->get_target_freq(devfreq, &freq);
> > + ret = devfreq->policy.governor->get_target_freq(devfreq, &freq);
> > if (ret < 0)
> > goto out;
> >
> > diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> > index 3aae5b3af87c..9bf23b976f4d 100644
> > --- a/include/linux/devfreq.h
> > +++ b/include/linux/devfreq.h
> > @@ -109,6 +109,30 @@ struct devfreq_dev_profile {
> > unsigned int max_state;
> > };
> >
> > +/**
> > + * struct devfreq_freq_limits - Devfreq frequency limits
> > + * @min_freq: minimum frequency
> > + * @max_freq: maximum frequency
> > + */
> > +struct devfreq_freq_limits {
> > + unsigned long min_freq;
> > + unsigned long max_freq;
> > +};
> > +
> > +/**
> > + * struct devfreq_policy - Devfreq policy
> > + * @user: frequency limits requested by the user
> > + * @devinfo: frequency limits of the device (available OPPs)
> > + * @governor: method how to choose frequency based on the usage.
>
> nitpick. remove '.' on the end of line.
Ok
> > + * @governor_name: devfreq governor name for use with this devfreq
> > + */
> > +struct devfreq_policy {
> > + struct devfreq_freq_limits user;
> > + struct devfreq_freq_limits devinfo;
> > + const struct devfreq_governor *governor;
> > + char governor_name[DEVFREQ_NAME_LEN];
> > +};
> > +
> > /**
> > * struct devfreq - Device devfreq structure
> > * @node: list node - contains the devices with devfreq that have been
> > @@ -117,8 +141,6 @@ struct devfreq_dev_profile {
> > * @dev: device registered by devfreq class. dev.parent is the device
> > * using devfreq.
> > * @profile: device-specific devfreq profile
> > - * @governor: method how to choose frequency based on the usage.
> > - * @governor_name: devfreq governor name for use with this devfreq
> > * @nb: notifier block used to notify devfreq object that it should
> > * reevaluate operable frequencies. Devfreq users may use
> > * devfreq.nb to the corresponding register notifier call chain.
> > @@ -126,10 +148,7 @@ struct devfreq_dev_profile {
> > * @previous_freq: previously configured frequency value.
> > * @data: Private data of the governor. The devfreq framework does not
> > * touch this.
> > - * @min_freq: Limit minimum frequency requested by user (0: none)
> > - * @max_freq: Limit maximum frequency requested by user (0: none)
> > - * @scaling_min_freq: Limit minimum frequency requested by OPP interface
> > - * @scaling_max_freq: Limit maximum frequency requested by OPP interface
> > + * @policy: Policy for frequency adjustments
>
> The devfreq_policy contains the range of frequency and governor information.
> But, this description focus on the frequency. You need to explain the more
> correct description of 'policy'.
I wouldn't say that the focus is on 'frequency', but on 'frequency
adjustments', and the governor is an integral part of them.
I can change it to "Policy for frequency adjustments, including
frequency limits and the governor" if you prefer. I'm open to other
suggestions.
> > * @stop_polling: devfreq polling status of a device.
> > * @total_trans: Number of devfreq transitions
> > * @trans_table: Statistics of devfreq transitions
> > @@ -151,8 +170,6 @@ struct devfreq {
> > struct mutex lock;
> > struct device dev;
> > struct devfreq_dev_profile *profile;
> > - const struct devfreq_governor *governor;
> > - char governor_name[DEVFREQ_NAME_LEN];
> > struct notifier_block nb;
> > struct delayed_work work;
> >
> > @@ -161,10 +178,7 @@ struct devfreq {
> >
> > void *data; /* private data for governors */
> >
> > - unsigned long min_freq;
> > - unsigned long max_freq;
> > - unsigned long scaling_min_freq;
> > - unsigned long scaling_max_freq;
> > + struct devfreq_policy policy;
>
> I recommend that you better to move under 'struct devfreq_dev_profile'
> as following:
>
> struct devfreq_dev_profile *profile;
> struct devfreq_policy policy;
Will do
Thanks for the review!
Hi Chanwoo,
On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
> Firstly,
> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
>
> devfreq already used the OPP interface as default. It means that
> the outside of 'drivers/devfreq' can disable/enable the frequency
> such as drivers/thermal/devfreq_cooling.c. Also, when some device
> drivers disable/enable the specific frequency, the devfreq core
> consider them.
>
> So, devfreq doesn't need to devfreq_verify_within_limits() because
> already support some interface to change the minimum/maximum frequency
> of devfreq device.
>
> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_within_limits()'
> to change the minimum/maximum frequency of cpu. some device driver cannot
> change the minimum/maximum frequency through OPP interface.
>
> But, in case of devfreq subsystem, as I explained already, devfreq support
> the OPP interface as default way. devfreq subsystem doesn't need to add
> other way to change the minimum/maximum frequency.
Using the OPP interface exclusively works as long as
enabling/disabling of OPPs is limited to a single driver
(drivers/thermal/devfreq_cooling.c). When multiple drivers are
involved you need a way to resolve conflicts, that's the purpose of
devfreq_verify_within_limits(). Please let me know if there are
existing mechanisms for conflict resolution that I overlooked.
Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
devfreq_verify_within_limits() instead of the OPP interface if
desired, however this seems beyond the scope of this series.
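To make the conflict-resolution point concrete, here is a minimal
userspace sketch. It is not part of the series: struct policy_range is
a stand-in for the min/max fields of struct devfreq_policy,
verify_within_limits() repeats the clamping steps of
devfreq_verify_within_limits() from the patch, and apply_two_caps() is
a hypothetical stand-in for two independent policy notifiers that each
request a frequency cap.

```c
#include <assert.h>

/* Userspace stand-in for the min/max part of struct devfreq_policy. */
struct policy_range {
	unsigned long min;
	unsigned long max;
};

/* Same clamping steps as devfreq_verify_within_limits() in the patch. */
static void verify_within_limits(struct policy_range *p,
				 unsigned long min, unsigned long max)
{
	assert(min <= max);

	if (p->min < min)
		p->min = min;
	if (p->max < min)
		p->max = min;
	if (p->min > max)
		p->min = max;
	if (p->max > max)
		p->max = max;
	if (p->min > p->max)
		p->min = p->max;
}

/*
 * Hypothetical: two independent clients (e.g. a thermal driver and a
 * throttler) each cap the maximum frequency, one after the other.
 */
static struct policy_range apply_two_caps(unsigned long dev_min,
					  unsigned long dev_max,
					  unsigned long cap_a,
					  unsigned long cap_b)
{
	struct policy_range p = { .min = dev_min, .max = dev_max };

	verify_within_limits(&p, dev_min, cap_a);
	verify_within_limits(&p, dev_min, cap_b);

	return p;
}
```

Each call can only narrow the range, so the most restrictive cap wins
regardless of the order in which the clients run. Two drivers that
enable and disable the same OPPs directly have no such guarantee, since
one driver re-enabling an OPP undoes the other's restriction.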
> Secondly,
> This patch send the 'struct devfreq_policy' instance as the data
> when sending the notification as following:
>
> srcu_notifier_call_chain(&devfreq->policy_notifier_list,
> DEVFREQ_ADJUST, policy);
>
> But, I think that if devfreq core sends the 'struct devfreq_freq_limits'
> instance instead of 'struct devfreq_policy', it is enough.
> Because receiver of DEVFREQ_ADJUST just will use the min_freq/max_freq variables.
>
> So, I tried to find the cpufreq's case. The some device drivers using
> CPUFREQ_POLICY_NOTIFIER uses following variables of 'struct cpufreq_policy'.
> It means that receiver of CPUFREQ_POLICY_NOTIFIER don't need to other
> information/variables except for min/max frequency.
>
> - policy->min
> - policy->max
> - policy->cpuinfo.max_freq
> - policy->cpuinfo.min_freq
> - policy->cpu (not related to devfreq)
> - policy->related_cpus (not related to devfreq)
>
> - list of device drivers using CPUFREQ_POLICY_NOTIFIER (linux kernel is v4.18-rc1)
> $ grep -rn "CPUFREQ_POLICY_NOTIFIER" .
> ./drivers/macintosh/windfarm_cpufreq_clamp.c
> ./drivers/thermal/cpu_cooling.c
> ./drivers/thermal/cpu_cooling.c
> ./drivers/acpi/processor_thermal.c
> ./drivers/acpi/processor_thermal.c
> ./drivers/acpi/processor_perflib.c
> ./drivers/acpi/processor_perflib.c
> ./drivers/base/arch_topology.c
> ./drivers/base/arch_topology.c
> ./drivers/video/fbdev/sa1100fb.c
> ./drivers/video/fbdev/pxafb.c
> ./drivers/cpufreq/ppc_cbe_cpufreq_pmi.c
> ./drivers/cpufreq/cpufreq.c
> ./drivers/cpufreq/cpufreq.c
> ./drivers/cpufreq/cpufreq.c
> ./drivers/cpufreq/cpufreq.c
Thanks for your investigation.
I decided to mirror the cpufreq interface for consistency, but I agree
that 'struct devfreq_freq_limits' could be passed instead of the
policy object. I'm fine with changing that.
> On 2018년 07월 04일 08:46, Matthias Kaehlcke wrote:
> > Policy notifiers are called before a frequency change and may narrow
> > the min/max frequency range in devfreq_policy, which is used to adjust
> > the target frequency if it is beyond this range.
> >
> > Also add a few helpers:
> > - devfreq_verify_within_[dev_]limits()
> > - should be used by the notifiers for policy adjustments.
> > - dev_to_devfreq()
> > - lookup a devfreq struct from a device pointer
> >
> > Signed-off-by: Matthias Kaehlcke <[email protected]>
> > Reviewed-by: Brian Norris <[email protected]>
> > ---
> > Changes in v5:
> > - none
> >
> > Changes in v4:
> > - Fixed typo in commit message: devfreg => devfreq
> > - added 'Reviewed-by: Brian Norris <[email protected]>' tag
> >
> > Changes in v3:
> > - devfreq.h: fixed misspelling of struct devfreq_policy
> >
> > Changes in v2:
> > - performance, powersave and simpleondemand governors don't need changes
> > with "PM / devfreq: Don't adjust to user limits in governors"
> > - formatting fixes
> > ---
> > drivers/devfreq/devfreq.c | 48 ++++++++++++++++++++++-------
> > include/linux/devfreq.h | 65 +++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 102 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> > index 21604d6ae2b8..4cbaa7ad1972 100644
> > --- a/drivers/devfreq/devfreq.c
> > +++ b/drivers/devfreq/devfreq.c
> > @@ -72,6 +72,21 @@ static struct devfreq *find_device_devfreq(struct device *dev)
> > return ERR_PTR(-ENODEV);
> > }
> >
> > +/**
> > + * dev_to_devfreq() - find devfreq struct using device pointer
> > + * @dev: device pointer used to lookup device devfreq.
> > + */
> > +struct devfreq *dev_to_devfreq(struct device *dev)
> > +{
> > + struct devfreq *devfreq;
> > +
> > + mutex_lock(&devfreq_list_lock);
> > + devfreq = find_device_devfreq(dev);
> > + mutex_unlock(&devfreq_list_lock);
> > +
> > + return devfreq;
> > +}
> > +
> > static unsigned long find_available_min_freq(struct devfreq *devfreq)
> > {
> > struct dev_pm_opp *opp;
> > @@ -269,20 +284,21 @@ int update_devfreq(struct devfreq *devfreq)
> > if (!policy->governor)
> > return -EINVAL;
> >
> > + policy->min = policy->devinfo.min_freq;
> > + policy->max = policy->devinfo.max_freq;
>
> Why don't you consider 'policy->user.max/min_freq' as following?
> As I already commented, I think that 'struct devfreq_freq_limits' is enough
> instead of 'struct devfreq_policy'.
>
> ->max_freq = MIN(policy->devinfo.max_freq, policy->user.max_freq);
> ->min_freq = MAX(policy->devinfo.min_freq, policy->user.min_freq);
You mean limiting the frequency range with user.min/max before
DEVFREQ_ADJUST instead of adjusting it afterwards? That's fine with
me.
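A rough userspace sketch of that ordering, under the assumption that
the range is seeded from both the device and the user limits before
the notifiers run. The struct and function names below are stand-ins
for illustration, not the final kernel code.

```c
#include <assert.h>

/* Userspace stand-ins for the structs in the patch (names shortened). */
struct freq_limits {
	unsigned long min_freq;
	unsigned long max_freq;
};

struct policy_sketch {
	unsigned long min;
	unsigned long max;
	struct freq_limits user;
	struct freq_limits devinfo;
};

/*
 * Seed the adjustable range with the intersection of the device and
 * the user limits. Policy notifiers (DEVFREQ_ADJUST) would run after
 * this step and could only narrow the range further.
 */
static void init_adjust_range(struct policy_sketch *p)
{
	p->max = p->devinfo.max_freq < p->user.max_freq ?
		 p->devinfo.max_freq : p->user.max_freq;
	p->min = p->devinfo.min_freq > p->user.min_freq ?
		 p->devinfo.min_freq : p->user.min_freq;
}

/* Clamp the governor's target frequency to the final range. */
static unsigned long clamp_target(const struct policy_sketch *p,
				  unsigned long freq)
{
	assert(p->min <= p->max);

	if (freq < p->min)
		return p->min;
	if (freq > p->max)
		return p->max;

	return freq;
}
```

With this ordering the notifiers already see the user limits, so no
separate adjustment to user.min/max_freq is needed after
DEVFREQ_ADJUST.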
Thanks
Matthias
Hi,
On Wed, Jul 04, 2018 at 02:30:32PM +0900, Chanwoo Choi wrote:
> I didn't see any framework which exporting the class instance.
> It is very dangerous. Unknown device drivers is able to reset
> the 'devfreq_class' instance. I can't agree this approach.
While I agree that it is potentially dangerous it is actually a common
practice to export the class:
grep "extern struct class " include/linux/ -R
include/linux/rio.h:extern struct class rio_mport_class;
include/linux/tty.h:extern struct class *tty_class;
include/linux/fb.h:extern struct class *fb_class;
include/linux/ide.h:extern struct class *ide_port_class;
include/linux/device.h:extern struct class * __must_check __class_create(struct module *owner,
include/linux/devfreq.h:extern struct class *devfreq_class;
include/linux/switchtec.h:extern struct class *switchtec_class;
include/linux/input.h:extern struct class input_class;
include/linux/genhd.h:extern struct class block_class;
include/linux/power_supply.h:extern struct class *power_supply_class;
include/linux/rtc.h:extern struct class *rtc_class;
struct class_interface and class_interface_register() would be
pointless without exported classes.
My understanding is that the kernel is often lax on encapsulation and
exposes private/delicate data pragmatically within the kernel when
needed because "the kernel trusts itself".
Thanks
Matthias
On Wed, Jul 04, 2018 at 12:41:21PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, July 4, 2018 1:47:01 AM CEST Matthias Kaehlcke wrote:
> > cpufreq stubs out some functions when CONFIG_CPU_FREQ=n , but
> > cpufreq_update_policy() is not among them. The throttler driver
> > (https://patchwork.kernel.org/patch/10453351/) uses cpufreq as one
> > possible throttling mechanism, but it can still be useful without
> > cpufreq. Stubbing out cpufreq_update_policy() allows the throttler
> > driver to be built without ugly #ifdef'ery when cpufreq is disabled.
> >
> > Signed-off-by: Matthias Kaehlcke <[email protected]>
> > Reviewed-by: Brian Norris <[email protected]>
> > ---
> > Changes in v5:
> > - none
> >
> > Changes in v4:
> > - added 'Reviewed-by: Brian Norris <[email protected]>' tag
> >
> > Changes in v3:
> > - patch added to series
> > ---
> > include/linux/cpufreq.h | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > index 882a9b9e34bc..dba8c4951e2e 100644
> > --- a/include/linux/cpufreq.h
> > +++ b/include/linux/cpufreq.h
> > @@ -210,6 +210,7 @@ static inline unsigned int cpufreq_quick_get_max(unsigned int cpu)
> > return 0;
> > }
> > static inline void disable_cpufreq(void) { }
> > +static inline void cpufreq_update_policy(unsigned int cpu) { }
> > #endif
> >
> > #ifdef CONFIG_CPU_FREQ_STAT
> >
>
> I can take this patch if you want me to.
Sounds good.
This series is moving forward slower than I had hoped and there are a
few patches that are useful independently of the throttler at the end
of the series. It probably makes sense to start integrating them
rather than carrying them around unchanged from version to version and
repeatedly spam <world>.
Thanks
Matthias
Hi Matthias,
On 2018년 07월 07일 01:36, Matthias Kaehlcke wrote:
> Hi Chanwoo,
>
> On Wed, Jul 04, 2018 at 11:20:31AM +0900, Chanwoo Choi wrote:
>> Hi Matthias,
>>
>> On 2018년 07월 04일 08:46, Matthias Kaehlcke wrote:
>>> Commit ab8f58ad72c4 ("PM / devfreq: Set min/max_freq when adding the
>>> devfreq device") initializes df->min/max_freq with the min/max OPP when
>>> the device is added. Later commit f1d981eaecf8 ("PM / devfreq: Use the
>>> available min/max frequency") adds df->scaling_min/max_freq and the
>>> following to the frequency adjustment code:
>>>
>>> max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
>>>
>>> With the current handling of min/max_freq this is incorrect:
>>>
>>> Even though df->max_freq is now initialized to a value != 0 user space
>>> can still set it to 0, in this case max_freq would be 0 instead of
>>> df->scaling_max_freq as intended. In consequence the frequency adjustment
>>> is not performed:
>>>
>>> if (max_freq && freq > max_freq) {
>>> freq = max_freq;
>>>
>>> To fix this set df->min/max_freq to the min/max OPP in min/max_freq_store,
>>> when the user passes a value of 0. This also prevents df->max_freq from
>>> being set below the min OPP when df->min_freq is 0, and similar for
>>> min_freq. Since it is now guaranteed that df->min/max_freq can't be 0 the
>>> checks for this case can be removed.
>>>
>>> Fixes: f1d981eaecf8 ("PM / devfreq: Use the available min/max frequency")
>>> Signed-off-by: Matthias Kaehlcke <[email protected]>
>>> Reviewed-by: Brian Norris <[email protected]>
>>> ---
>>> Changes in v5:
>>> - none
>>>
>>> Changes in v4:
>>> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>>>
>>> Changes in v3:
>>> - none
>>>
>>> Changes in v2:
>>> - handle freq tables sorted in ascending and descending order in
>>> min/max_freq_store()
>>> - use same order for conditional statements in min/max_freq_store()
>>> ---
>>> drivers/devfreq/devfreq.c | 42 ++++++++++++++++++++++++++++-----------
>>> 1 file changed, 30 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>>> index 0057ef5b0a98..6f604f8b2b81 100644
>>> --- a/drivers/devfreq/devfreq.c
>>> +++ b/drivers/devfreq/devfreq.c
>>> @@ -283,11 +283,11 @@ int update_devfreq(struct devfreq *devfreq)
>>> max_freq = MIN(devfreq->scaling_max_freq, devfreq->max_freq);
>>> min_freq = MAX(devfreq->scaling_min_freq, devfreq->min_freq);
>>>
>>> - if (min_freq && freq < min_freq) {
>>> + if (freq < min_freq) {
>>> freq = min_freq;
>>> flags &= ~DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use GLB */
>>> }
>>> - if (max_freq && freq > max_freq) {
>>> + if (freq > max_freq) {
>>> freq = max_freq;
>>> flags |= DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use LUB */
>>> }
>>> @@ -1122,18 +1122,27 @@ static ssize_t min_freq_store(struct device *dev, struct device_attribute *attr,
>>> {
>>> struct devfreq *df = to_devfreq(dev);
>>> unsigned long value;
>>> + unsigned long *freq_table;
>>
>> You can move 'freq_table' under 'else' statement.
>
> Will do
>
>>> int ret;
>>> - unsigned long max;
>>>
>>> ret = sscanf(buf, "%lu", &value);
>>> if (ret != 1)
>>> return -EINVAL;
>>>
>>> mutex_lock(&df->lock);
>>> - max = df->max_freq;
>>> - if (value && max && value > max) {
>>> - ret = -EINVAL;
>>> - goto unlock;
>>> +
>>> + if (value) {
>>> + if (value > df->max_freq) {
>>> + ret = -EINVAL;
>>> + goto unlock;
>>> + }
>>> + } else {
>>> + freq_table = df->profile->freq_table;
>>> + /* typical order is ascending, some drivers use descending */
>>
>> You better to explain what is doing of following code.
>> How about modifying it as following?
>>
>> /* Get minimum frequency according to sorting way */
>
> Ok, will slightly modify it to 'sorting order' if you don't mind.
I don't mind 'sorting order'. Thanks.
(snip)
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi Matthias,
On 2018년 07월 07일 02:07, Matthias Kaehlcke wrote:
> Hi,
>
> On Wed, Jul 04, 2018 at 11:51:30AM +0900, Chanwoo Choi wrote:
>> Hi,
>>
>> On 2018년 07월 04일 08:46, Matthias Kaehlcke wrote:
>>> Move variables related with devfreq policy changes from struct devfreq
>>> to the new struct devfreq_policy and add a policy field to struct devfreq.
>>>
>>> The following variables are moved:
>>>
>>> df->min/max_freq => p->user.min/max_freq
>>> df->scaling_min/max_freq => p->devinfo.min/max_freq
>>> df->governor => p->governor
>>> df->governor_name => p->governor_name
>>>
>>> Signed-off-by: Matthias Kaehlcke <[email protected]>
>>> Reviewed-by: Brian Norris <[email protected]>
>>> ---
>>> Changes in v5:
>>> - none
>>>
>>> Changes in v4:
>>> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>>>
>>> Changes in v3:
>>> - none
>>>
>>> Changes in v2:
>>> - performance, powersave and simpleondemand governors don't need changes
>>> with "PM / devfreq: Don't adjust to user limits in governors"
>>> - formatting fixes
>>> ---
>>> drivers/devfreq/devfreq.c | 137 ++++++++++++++++-------------
>>> drivers/devfreq/governor_passive.c | 4 +-
>>> include/linux/devfreq.h | 38 +++++---
>
>
>>> 3 files changed, 103 insertions(+), 76 deletions(-)
>>>
>>
>> (skip)
>>
>>>
>>> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
>>> index 3bc29acbd54e..e0987c749ec2 100644
>>> --- a/drivers/devfreq/governor_passive.c
>>> +++ b/drivers/devfreq/governor_passive.c
>>> @@ -99,12 +99,12 @@ static int update_devfreq_passive(struct devfreq *devfreq, unsigned long freq)
>>> {
>>> int ret;
>>>
>>> - if (!devfreq->governor)
>>> + if (!devfreq->policy.governor)
>>> return -EINVAL;
>>>
>>> mutex_lock_nested(&devfreq->lock, SINGLE_DEPTH_NESTING);
>>>
>>> - ret = devfreq->governor->get_target_freq(devfreq, &freq);
>>> + ret = devfreq->policy.governor->get_target_freq(devfreq, &freq);
>>> if (ret < 0)
>>> goto out;
>>>
>>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>>> index 3aae5b3af87c..9bf23b976f4d 100644
>>> --- a/include/linux/devfreq.h
>>> +++ b/include/linux/devfreq.h
>>> @@ -109,6 +109,30 @@ struct devfreq_dev_profile {
>>> unsigned int max_state;
>>> };
>>>
>>> +/**
>>> + * struct devfreq_freq_limits - Devfreq frequency limits
>>> + * @min_freq: minimum frequency
>>> + * @max_freq: maximum frequency
>>> + */
>>> +struct devfreq_freq_limits {
>>> + unsigned long min_freq;
>>> + unsigned long max_freq;
>>> +};
>>> +
>>> +/**
>>> + * struct devfreq_policy - Devfreq policy
>>> + * @user: frequency limits requested by the user
>>> + * @devinfo: frequency limits of the device (available OPPs)
>>> + * @governor: method how to choose frequency based on the usage.
>>
>> nitpick. remove '.' on the end of line.
>
> Ok
>
>>> + * @governor_name: devfreq governor name for use with this devfreq
>>> + */
>>> +struct devfreq_policy {
>>> + struct devfreq_freq_limits user;
>>> + struct devfreq_freq_limits devinfo;
>>> + const struct devfreq_governor *governor;
>>> + char governor_name[DEVFREQ_NAME_LEN];
>>> +};
>>> +
>>> /**
>>> * struct devfreq - Device devfreq structure
>>> * @node: list node - contains the devices with devfreq that have been
>>> @@ -117,8 +141,6 @@ struct devfreq_dev_profile {
>>> * @dev: device registered by devfreq class. dev.parent is the device
>>> * using devfreq.
>>> * @profile: device-specific devfreq profile
>>> - * @governor: method how to choose frequency based on the usage.
>>> - * @governor_name: devfreq governor name for use with this devfreq
>>> * @nb: notifier block used to notify devfreq object that it should
>>> * reevaluate operable frequencies. Devfreq users may use
>>> * devfreq.nb to the corresponding register notifier call chain.
>>> @@ -126,10 +148,7 @@ struct devfreq_dev_profile {
>>> * @previous_freq: previously configured frequency value.
>>> * @data: Private data of the governor. The devfreq framework does not
>>> * touch this.
>>> - * @min_freq: Limit minimum frequency requested by user (0: none)
>>> - * @max_freq: Limit maximum frequency requested by user (0: none)
>>> - * @scaling_min_freq: Limit minimum frequency requested by OPP interface
>>> - * @scaling_max_freq: Limit maximum frequency requested by OPP interface
>>> + * @policy: Policy for frequency adjustments
>>
>> The devfreq_policy contains the range of frequency and governor information.
>> But, this description focus on the frequency. You need to explain the more
>> correct description of 'policy'.
>
> I wouldn't say that the focus is on 'frequency', but on 'frequency
> adjustments', and the governor is an integral part of them.
OK. I agree with your original description.
>
> I can change it to "Policy for frequency adjustments, including
> frequency limits and the governor" if you prefer. I'm open to other
> suggestions.
(snip)
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi Matthias,
On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote:
> Hi Chanwoo,
>
> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
>
>> Firstly,
>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
>>
>> devfreq already used the OPP interface as default. It means that
>> the outside of 'drivers/devfreq' can disable/enable the frequency
>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
>> drivers disable/enable the specific frequency, the devfreq core
>> consider them.
>>
>> So, devfreq doesn't need to devfreq_verify_within_limits() because
>> already support some interface to change the minimum/maximum frequency
>> of devfreq device.
>>
>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_within_limits()'
>> to change the minimum/maximum frequency of cpu. some device driver cannot
>> change the minimum/maximum frequency through OPP interface.
>>
>> But, in case of devfreq subsystem, as I explained already, devfreq support
>> the OPP interface as default way. devfreq subsystem doesn't need to add
>> other way to change the minimum/maximum frequency.
>
> Using the OPP interface exclusively works as long as a
> enabling/disabling of OPPs is limited to a single driver
> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
> involved you need a way to resolve conflicts, that's the purpose of
> devfreq_verify_within_limits(). Please let me know if there are
> existing mechanisms for conflict resolution that I overlooked.
>
> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
> devfreq_verify_within_limits() instead of the OPP interface if
> desired, however this seems beyond the scope of this series.
Actually, this approach doesn't support multiple drivers either.
If non-throttler drivers use devfreq_verify_within_limits(), a conflict
can still happen.
To resolve conflicts between multiple device drivers, maybe the OPP interface
has to support a 'usage_count', like clk_enable/disable().
>
>> Secondly,
>> This patch send the 'struct devfreq_policy' instance as the data
>> when sending the notification as following:
>>
>> srcu_notifier_call_chain(&devfreq->policy_notifier_list,
>> DEVFREQ_ADJUST, policy);
>>
>> But, I think that if devfreq core sends the 'struct devfreq_freq_limits'
>> instance instead of 'struct devfreq_policy', it is enough.
>> Because receiver of DEVFREQ_ADJUST just will use the min_freq/max_freq variables.
>>
>> So, I tried to find the cpufreq's case. The some device drivers using
>> CPUFREQ_POLICY_NOTIFIER uses following variables of 'struct cpufreq_policy'.
>> It means that receiver of CPUFREQ_POLICY_NOTIFIER don't need to other
>> information/variables except for min/max frequency.
>>
>> - policy->min
>> - policy->max
>> - policy->cpuinfo.max_freq
>> - policy->cpuinfo.min_freq
>> - policy->cpu : not related to devfreq)
>> - policy->related_cpus : not related to devfreq)
>>
>> - list of device drivers using CPUFREQ_POLICY_NOTIFIER (linux kernel is v4.18-rc1)
>> $ grep -rn "CPUFREQ_POLICY_NOTIFIER" .
>> ./drivers/macintosh/windfarm_cpufreq_clamp.c
>> ./drivers/thermal/cpu_cooling.c
>> ./drivers/thermal/cpu_cooling.c
>> ./drivers/acpi/processor_thermal.c
>> ./drivers/acpi/processor_thermal.c
>> ./drivers/acpi/processor_perflib.c
>> ./drivers/acpi/processor_perflib.c
>> ./drivers/base/arch_topology.c
>> ./drivers/base/arch_topology.c
>> ./drivers/video/fbdev/sa1100fb.c
>> ./drivers/video/fbdev/pxafb.c
>> ./drivers/cpufreq/ppc_cbe_cpufreq_pmi.c
>> ./drivers/cpufreq/cpufreq.c
>> ./drivers/cpufreq/cpufreq.c
>> ./drivers/cpufreq/cpufreq.c
>> ./drivers/cpufreq/cpufreq.c
>
> Thanks for your investigation.
>
> I decided to mirror the cpufreq interface for consistency, but I agree
> that 'struct devfreq_freq_limits' could be passed instead of the
> policy object. I'm fine with changing that.
>
>> On 2018년 07월 04일 08:46, Matthias Kaehlcke wrote:
>>> Policy notifiers are called before a frequency change and may narrow
>>> the min/max frequency range in devfreq_policy, which is used to adjust
>>> the target frequency if it is beyond this range.
>>>
>>> Also add a few helpers:
>>> - devfreq_verify_within_[dev_]limits()
>>> - should be used by the notifiers for policy adjustments.
>>> - dev_to_devfreq()
>>> - lookup a devfreq struct from a device pointer
>>>
>>> Signed-off-by: Matthias Kaehlcke <[email protected]>
>>> Reviewed-by: Brian Norris <[email protected]>
>>> ---
>>> Changes in v5:
>>> - none
>>>
>>> Changes in v4:
>>> - Fixed typo in commit message: devfreg => devfreq
>>> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>>>
>>> Changes in v3:
>>> - devfreq.h: fixed misspelling of struct devfreq_policy
>>>
>>> Changes in v2:
>>> - performance, powersave and simpleondemand governors don't need changes
>>> with "PM / devfreq: Don't adjust to user limits in governors"
>>> - formatting fixes
>>> ---
>>> drivers/devfreq/devfreq.c | 48 ++++++++++++++++++++++-------
>>> include/linux/devfreq.h | 65 +++++++++++++++++++++++++++++++++++++++
>>> 2 files changed, 102 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>>> index 21604d6ae2b8..4cbaa7ad1972 100644
>>> --- a/drivers/devfreq/devfreq.c
>>> +++ b/drivers/devfreq/devfreq.c
>>> @@ -72,6 +72,21 @@ static struct devfreq *find_device_devfreq(struct device *dev)
>>> return ERR_PTR(-ENODEV);
>>> }
>>>
>>> +/**
>>> + * dev_to_devfreq() - find devfreq struct using device pointer
>>> + * @dev: device pointer used to lookup device devfreq.
>>> + */
>>> +struct devfreq *dev_to_devfreq(struct device *dev)
>>> +{
>>> + struct devfreq *devfreq;
>>> +
>>> + mutex_lock(&devfreq_list_lock);
>>> + devfreq = find_device_devfreq(dev);
>>> + mutex_unlock(&devfreq_list_lock);
>>> +
>>> + return devfreq;
>>> +}
>>> +
>>> static unsigned long find_available_min_freq(struct devfreq *devfreq)
>>> {
>>> struct dev_pm_opp *opp;
>>> @@ -269,20 +284,21 @@ int update_devfreq(struct devfreq *devfreq)
>>> if (!policy->governor)
>>> return -EINVAL;
>>>
>>> + policy->min = policy->devinfo.min_freq;
>>> + policy->max = policy->devinfo.max_freq;
>>
>> Why don't you consider 'policy->user.max/min_freq' as following?
>> As I already commented, I think that 'struct devfreq_freq_limits' is enough
>> instead of 'struct devfreq_policy'.
>>
>> ->max_freq = MIN(policy->devinfo.max_freq, policy->user.max_freq);
>> ->min_freq = MAX(policy->devinfo.min_freq, policy->user.min_freq);
>
> You mean limiting the frequency range with user.min/max before
> DEVFREQ_ADJUST instead of adjusting it afterwards? That's fine with
> me.
>
> Thanks
>
> Matthias
>
>
>
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi Matthias,
On 2018년 07월 07일 03:09, Matthias Kaehlcke wrote:
> Hi,
>
> On Wed, Jul 04, 2018 at 02:30:32PM +0900, Chanwoo Choi wrote:
>
>> I didn't see any framework which exporting the class instance.
>> It is very dangerous. Unknown device drivers is able to reset
>> the 'devfreq_class' instance. I can't agree this approach.
>
> While I agree that it is potential dangerous it is actually a common
> practice to export the class:
>
I tried to find the real usage of the exported class instances
and added a comment for each of them below. Almost all exported class
instances are used only in their own directory, and some exported classes
like rio_mport_class/switchtec_class are created by a specific device driver
instead of a subsystem.
Only the following two cases are used outside of the subsystem directory.
devtmpfs.c and alarmtimer.c are core features of the Linux kernel.
drivers/base/devtmpfs.c uses 'block_class'.
kernel/time/alarmtimer.c uses 'rtc_class'.
I cannot yet agree with this approach, since only block_class and rtc_class are used that way.
You added the following comment on why the devfreq_class instance is necessary.
Actually, I don't know the best solution right now. But all device drivers
can be added or removed if the driver is built as a module. It is not a problem
specific to devfreq instances.
/*
+ * devfreq devices can be added and removed at runtime, hence they
+ * must also be handled dynamically. The class_interface notifies us
+ * whenever a device is added or removed. When the interface is
+ * registered ci->add_dev() is called for all existing devfreq
+ * devices.
*/
> grep "extern struct class " include/linux/ -R
> include/linux/rio.h:extern struct class rio_mport_class;
rio_mport_class is created in drivers/rapidio/rio-drivers.c.
It means that a device driver creates the 'rio_mport_class' class,
not a Linux kernel subsystem.
> include/linux/tty.h:extern struct class *tty_class;
tty_class is not used outside of drivers/tty.
> include/linux/fb.h:extern struct class *fb_class;
fb_class is not used outside of drivers/video/fbdev.
> include/linux/ide.h:extern struct class *ide_port_class;
ide_port_class is not used outside of drivers/ide.
> include/linux/device.h:extern struct class * __must_check __class_create(struct module *owner,
> include/linux/devfreq.h:extern struct class *devfreq_class;
not yet
> include/linux/switchtec.h:extern struct class *switchtec_class;
switchtec_class is created in drivers/pci/switch/switchtec.c
and is then only used in drivers/ntb/hw/mscc/ntb_hw_switchtec.c.
It is not a subsystem. The switchtec.c device driver just creates its own class.
> include/linux/input.h:extern struct class input_class;
input_class is not used outside of drivers/input.
> include/linux/power_supply.h:extern struct class *power_supply_class;
power_supply_class is not used outside of drivers/power/supply.
> include/linux/genhd.h:extern struct class block_class;
drivers/base/devtmpfs.c uses 'block_class'.
> include/linux/rtc.h:extern struct class *rtc_class;
kernel/time/alarmtimer.c uses 'rtc_class'.
>
> struct class_interface and class_interface_register() would be
> pointless without exported classes.
>
> My understanding is that the kernel is often lax on encapsulation and
> exposes private/delicate data pragmatically within the kernel when
> needed because "the kernel trusts itself".
>
> Thanks
>
> Matthias
>
>
>
--
Best Regards,
Chanwoo Choi
Samsung Electronics
On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
> Hi Matthias,
>
> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote:
> > Hi Chanwoo,
> >
> > On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
> >
> >> Firstly,
> >> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
> >>
> >> devfreq already used the OPP interface as default. It means that
> >> the outside of 'drivers/devfreq' can disable/enable the frequency
> >> such as drivers/thermal/devfreq_cooling.c. Also, when some device
> >> drivers disable/enable the specific frequency, the devfreq core
> >> consider them.
> >>
> >> So, devfreq doesn't need to devfreq_verify_within_limits() because
> >> already support some interface to change the minimum/maximum frequency
> >> of devfreq device.
> >>
> >> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_within_limits()'
> >> to change the minimum/maximum frequency of cpu. some device driver cannot
> >> change the minimum/maximum frequency through OPP interface.
> >>
> >> But, in case of devfreq subsystem, as I explained already, devfreq support
> >> the OPP interface as default way. devfreq subsystem doesn't need to add
> >> other way to change the minimum/maximum frequency.
> >
> > Using the OPP interface exclusively works as long as a
> > enabling/disabling of OPPs is limited to a single driver
> > (drivers/thermal/devfreq_cooling.c). When multiple drivers are
> > involved you need a way to resolve conflicts, that's the purpose of
> > devfreq_verify_within_limits(). Please let me know if there are
> > existing mechanisms for conflict resolution that I overlooked.
> >
> > Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
> > devfreq_verify_within_limits() instead of the OPP interface if
> > desired, however this seems beyond the scope of this series.
>
> Actually, if we uses this approach, it doesn't support the multiple drivers too.
> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
> happen.
As long as drivers limit the max freq there is no conflict, the lowest
max freq wins. I expect this to be the usual case, apparently it
worked for cpufreq for 10+ years.
However it is correct that there would be a conflict if a driver
requests a min freq that is higher than the max freq requested by
another. In this case devfreq_verify_within_limits() resolves the
conflict by raising p->max to the min freq. Not sure if this is
something that would ever occur in practice though.
If we are really concerned about this case it would also be an option
to limit the adjustment to the max frequency.
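To make the conflict resolution concrete, here is a small user-space sketch. It follows the clamping order of cpufreq_verify_within_limits(); that the devfreq helper uses exactly the same order is an assumption, and struct freq_range / verify_within_limits() are illustrative names, not the code from the patch.

```c
/* Simplified user-space stand-in for the frequency range of a policy. */
struct freq_range {
	unsigned long min;
	unsigned long max;
};

/*
 * Narrow 'r' so that it lies within [min, max]. The order of the checks
 * is modeled on cpufreq_verify_within_limits(); treating the devfreq
 * helper as identical is an assumption of this sketch.
 */
void verify_within_limits(struct freq_range *r,
			  unsigned long min, unsigned long max)
{
	if (r->min < min)
		r->min = min;
	if (r->max < min)
		r->max = min;	/* conflict: max is raised to the requested min */
	if (r->min > max)
		r->min = max;
	if (r->max > max)
		r->max = max;
	if (r->min > r->max)
		r->min = r->max;
}
```

Starting from a range of 100..1000, two notifiers that cap the max at 600 and 800 leave 100..600, so the lowest max wins. If a third notifier then requests a min of 700 with no upper bound, the range becomes 700..700, which is the "raise max to the min freq" case described above.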
> To resolve the conflict for multiple device driver, maybe OPP interface
> have to support 'usage_count' such as clk_enable/disable().
This would require supporting negative usage count values, since an OPP
should not be enabled if e.g. thermal enables it but the throttler
disabled it or vice versa.
Theoretically there could also be conflicts, like one driver disabling
the higher OPPs and another the lower ones, with the outcome of all
OPPs being disabled, which would be a more drastic conflict resolution
than that of devfreq_verify_within_limits().
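A rough user-space model of such a signed usage count is sketched below. Nothing like this exists in the OPP API today: struct opp_vote and the opp_vote_*() helpers are hypothetical, and the model assumes each driver only re-enables OPPs that it disabled itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-OPP bookkeeping; not part of the existing OPP API. */
struct opp_vote {
	unsigned long freq;
	int usage_count;	/* 0 = nobody objects, < 0 = disabled by someone */
};

/* Each disable request lowers the count, each matching enable raises it. */
void opp_vote_disable(struct opp_vote *opp) { opp->usage_count--; }
void opp_vote_enable(struct opp_vote *opp) { opp->usage_count++; }

/* An OPP is usable only while no driver has an outstanding disable. */
bool opp_vote_available(const struct opp_vote *opp)
{
	return opp->usage_count >= 0;
}

/* Count the OPPs that are still usable. */
size_t opp_votes_available(const struct opp_vote *opps, size_t n)
{
	size_t i, avail = 0;

	for (i = 0; i < n; i++)
		if (opp_vote_available(&opps[i]))
			avail++;
	return avail;
}
```

With this model an OPP disabled by both thermal and the throttler stays disabled after thermal re-enables it. The model also shows the drastic case: if one driver disables the upper OPPs and another the lower ones, opp_votes_available() drops to zero.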
Viresh, what do you think about an OPP usage count?
Thanks
Matthias
Hi Chanwoo,
On Thu, Jul 12, 2018 at 06:08:36PM +0900, Chanwoo Choi wrote:
> Hi Matthias,
>
> On 2018년 07월 07일 03:09, Matthias Kaehlcke wrote:
> > Hi,
> >
> > On Wed, Jul 04, 2018 at 02:30:32PM +0900, Chanwoo Choi wrote:
> >
> >> I didn't see any framework which exporting the class instance.
> >> It is very dangerous. Unknown device drivers is able to reset
> >> the 'devfreq_class' instance. I can't agree this approach.
> >
> > While I agree that it is potential dangerous it is actually a common
> > practice to export the class:
> >
>
> I tried to find the real usage of exported class instance
> and I add the comment for each class instance. Almost exported class
> instance are used in the their own director or some exported class
> like rio_mport_class/switchtec_class are created from specific device driver
> instead of subsystem.
>
> Only following two cases are used on outside of subsystem directory.
> devtmpfs.c and alarmtimer.c are core feature of linux kernel.
>
> drivers/base/devtmpfs.c uses 'block_class'.
> kernel/time/alarmtimer.c uses 'rtc_class'.
>
> I cannot yet agree this approach due to only block_class and rtc_class.
I thought your main concern was that the class is exported, which is
what several other subsystems do. That the class isn't used outside of
the subsystem directory most likely means that there is no need for
it, rather than that it shouldn't be done at all (depending on the
type of use of course).
In any case not exporting the class object provides a limited
protection against potential abuse of the class at best. To use the
class API all that is needed is a 'struct device' of a devfreq device,
which has a pointer to the class object (dev->class).
Theoretically I could register a fake devfreq device to obtain access
to the class object, though that doesn't seem a very neat approach ;-)
> You added the following comment why devfreq_class instance is necessary.
> Actullay, I don't know the best solution right now. But, all device drivers
> can be added or removed if driver uses the module type. It is not a problem
> for only devfreq instance.
Certainly it's not a problem limited to devfreq devices. In many other
cases bus notifiers can be used, but since devfreq devices aren't
tied to a specific bus this is not an option here.
If you really don't want to export the class we could add wrappers
for (un)registering a class interface:
int devfreq_class_interface_register(struct class_interface *)
void devfreq_class_interface_unregister(struct class_interface *)
The wrappers would have to assign ci->class since the throttler
can't see the class object.
Or add notifiers for device addition/removal, though the throttler
relies on the behavior of the class_interface which also notifies
about devices added before registration. This might not be what other
potential users of the notifiers expect.
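The wrapper idea, together with the replay behavior the throttler relies on, can be modeled in user space roughly as follows. The struct definitions and class_interface_register() here are simplified stand-ins for the driver core, and devfreq_class_model, devfreq_model_add_device() and throttler_add_dev() are hypothetical names used only for this sketch.

```c
#include <stddef.h>

/* Minimal stand-ins for driver-core types; the real ones have more members. */
struct device { const char *name; };
struct class_interface;

struct class {
	struct device *devs[8];
	size_t ndevs;
	struct class_interface *intf;	/* one interface is enough here */
};

struct class_interface {
	struct class *class;
	int (*add_dev)(struct device *dev, struct class_interface *ci);
	void (*remove_dev)(struct device *dev, struct class_interface *ci);
};

/* Stand-in for the driver-core call: also replays existing devices. */
static int class_interface_register(struct class_interface *ci)
{
	size_t i;

	if (!ci || !ci->class)
		return -1;	/* the kernel returns -ENODEV here */
	ci->class->intf = ci;
	for (i = 0; i < ci->class->ndevs; i++)
		if (ci->add_dev)
			ci->add_dev(ci->class->devs[i], ci);
	return 0;
}

/* The class object stays private to "devfreq" in this model. */
static struct class devfreq_class_model;

/* Wrapper: fills in ci->class so that callers never see the class. */
int devfreq_class_interface_register(struct class_interface *ci)
{
	ci->class = &devfreq_class_model;
	return class_interface_register(ci);
}

/* Model of a devfreq device being added at runtime. */
void devfreq_model_add_device(struct device *dev)
{
	struct class *cls = &devfreq_class_model;

	if (cls->ndevs >= sizeof(cls->devs) / sizeof(cls->devs[0]))
		return;		/* sketch only: silently ignore overflow */
	cls->devs[cls->ndevs++] = dev;
	if (cls->intf && cls->intf->add_dev)
		cls->intf->add_dev(dev, cls->intf);
}

/* Example client callback: counts the devices it was told about. */
int throttler_devs_seen;

int throttler_add_dev(struct device *dev, struct class_interface *ci)
{
	(void)dev;
	(void)ci;
	throttler_devs_seen++;
	return 0;
}
```

A client that registers after two devices already exist gets add_dev() called twice during registration and once more for each device added later. A plain add/remove notifier chain would only deliver the later events, which is the behavioral difference mentioned above.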
Thanks
Matthias
On Mon, Jul 16, 2018 at 12:41:14PM -0700, Matthias Kaehlcke wrote:
> Hi Chanwoo,
>
> On Thu, Jul 12, 2018 at 06:08:36PM +0900, Chanwoo Choi wrote:
> > Hi Matthias,
> >
> > On 2018년 07월 07일 03:09, Matthias Kaehlcke wrote:
> > > Hi,
> > >
> > > On Wed, Jul 04, 2018 at 02:30:32PM +0900, Chanwoo Choi wrote:
> > >
> > >> I didn't see any framework which exporting the class instance.
> > >> It is very dangerous. Unknown device drivers is able to reset
> > >> the 'devfreq_class' instance. I can't agree this approach.
> > >
> > > While I agree that it is potential dangerous it is actually a common
> > > practice to export the class:
> > >
> >
> > I tried to find the real usage of exported class instance
> > and I add the comment for each class instance. Almost exported class
> > instance are used in the their own director or some exported class
> > like rio_mport_class/switchtec_class are created from specific device driver
> > instead of subsystem.
> >
> > Only following two cases are used on outside of subsystem directory.
> > devtmpfs.c and alarmtimer.c are core feature of linux kernel.
> >
> > drivers/base/devtmpfs.c uses 'block_class'.
> > kernel/time/alarmtimer.c uses 'rtc_class'.
> >
> > I cannot yet agree this approach due to only block_class and rtc_class.
>
> I thought your main concern was that the class is exported, which is
> what several other subsystems do. That the class isn't used outside of
> the subsystem directory most likely means that there is no need for
> it, rather than that it shouldn't be done at all (depending on the
> type of use of course).
>
> In any case not exporting the class object provides a limited
> protection against potential abuse of the class at best. To use the
> class API all that is needed is a 'struct device' of a devfreq device,
> which has a pointer to the class object (dev->class).
>
> Theoretically I could register a fake devfreq device to obtain access
> to the class object, though that doesn't seem a very neat approach ;-)
>
> > You added the following comment why devfreq_class instance is necessary.
> > Actullay, I don't know the best solution right now. But, all device drivers
> > can be added or removed if driver uses the module type. It is not a problem
> > for only devfreq instance.
>
> Certainly it's not a problem limited to devfreq devices. In many other
> cases bus notifiers can be used, but since devfreq devices areen't
> tied to a specific bus this is not an option here.
>
> If you really don't want to export the class we could add wrappers
> for (un)registering a class interface:
>
> int devfreq_class_interface_register(struct class_interface *)
> void devfreq_class_interface_unregister(struct class_interface *)
>
> The wrappers would have to assign ci->class since the throttler
> can't see the class object.
>
> Or add notifiers for device addition/removal, though the throttler
> relies on the behavior of the class_interface which also notifies
> about devices added before registration. This might not be what other
> potential users of the notifiers expect.
Ping
Could we please try to find a solution/reach a conclusion for this?
Not that it should affect the outcome of this discussion, but I want
to mention that from my point of view it is a bit unfortunate that
this and other fundamental concerns were only raised after I spent
significant time on repeatedly refactoring the throttler driver to
address other comments. Since you and MyungJoo Ham previously had only
minor comments on the other devfreq patches in this series I assumed
there were no major concerns from your side :(
On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
> > Hi Matthias,
> >
> > On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote:
> > > Hi Chanwoo,
> > >
> > > On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
> > >
> > >> Firstly,
> > >> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
> > >>
> > >> devfreq already used the OPP interface as default. It means that
> > >> the outside of 'drivers/devfreq' can disable/enable the frequency
> > >> such as drivers/thermal/devfreq_cooling.c. Also, when some device
> > >> drivers disable/enable the specific frequency, the devfreq core
> > >> consider them.
> > >>
> > >> So, devfreq doesn't need to devfreq_verify_within_limits() because
> > >> already support some interface to change the minimum/maximum frequency
> > >> of devfreq device.
> > >>
> > >> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_within_limits()'
> > >> to change the minimum/maximum frequency of cpu. some device driver cannot
> > >> change the minimum/maximum frequency through OPP interface.
> > >>
> > >> But, in case of devfreq subsystem, as I explained already, devfreq support
> > >> the OPP interface as default way. devfreq subsystem doesn't need to add
> > >> other way to change the minimum/maximum frequency.
> > >
> > > Using the OPP interface exclusively works as long as a
> > > enabling/disabling of OPPs is limited to a single driver
> > > (drivers/thermal/devfreq_cooling.c). When multiple drivers are
> > > involved you need a way to resolve conflicts, that's the purpose of
> > > devfreq_verify_within_limits(). Please let me know if there are
> > > existing mechanisms for conflict resolution that I overlooked.
> > >
> > > Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
> > > devfreq_verify_within_limits() instead of the OPP interface if
> > > desired, however this seems beyond the scope of this series.
> >
> > Actually, if we uses this approach, it doesn't support the multiple drivers too.
> > If non throttler drivers uses devfreq_verify_within_limits(), the conflict
> > happen.
>
> As long as drivers limit the max freq there is no conflict, the lowest
> max freq wins. I expect this to be the usual case, apparently it
> worked for cpufreq for 10+ years.
>
> However it is correct that there would be a conflict if a driver
> requests a min freq that is higher than the max freq requested by
> another. In this case devfreq_verify_within_limits() resolves the
> conflict by raising p->max to the min freq. Not sure if this is
> something that would ever occur in practice though.
>
> If we are really concerned about this case it would also be an option
> to limit the adjustment to the max frequency.
>
> > To resolve the conflict for multiple device driver, maybe OPP interface
> > have to support 'usage_count' such as clk_enable/disable().
>
> This would require supporting negative usage count values, since a OPP
> should not be enabled if e.g. thermal enables it but the throttler
> disabled it or viceversa.
>
> Theoretically there could also be conflicts, like one driver disabling
> the higher OPPs and another the lower ones, with the outcome of all
> OPPs being disabled, which would be a more drastic conflict resolution
> than that of devfreq_verify_within_limits().
>
> Viresh, what do you think about an OPP usage count?
Ping, can we try to reach a conclusion on this or at least keep the
discussion going?
Not that it matters, but my preferred solution continues to be
devfreq_verify_within_limits(). It solves conflicts in some way (which
could be adjusted if needed) and has proven to work in practice for
10+ years in a very similar sub-system.
On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote:
> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
>>> Hi Matthias,
>>>
>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote:
>>>> Hi Chanwoo,
>>>>
>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
>>>>
>>>>> Firstly,
>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
>>>>>
>>>>> devfreq already used the OPP interface as default. It means that
>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
>>>>> drivers disable/enable the specific frequency, the devfreq core
>>>>> consider them.
>>>>>
>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
>>>>> already support some interface to change the minimum/maximum frequency
>>>>> of devfreq device.
>>>>>
>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_within_limits()'
>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
>>>>> change the minimum/maximum frequency through OPP interface.
>>>>>
>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
>>>>> other way to change the minimum/maximum frequency.
>>>>
>>>> Using the OPP interface exclusively works as long as a
>>>> enabling/disabling of OPPs is limited to a single driver
>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
>>>> involved you need a way to resolve conflicts, that's the purpose of
>>>> devfreq_verify_within_limits(). Please let me know if there are
>>>> existing mechanisms for conflict resolution that I overlooked.
>>>>
>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
>>>> devfreq_verify_within_limits() instead of the OPP interface if
>>>> desired, however this seems beyond the scope of this series.
>>>
>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
>>> happen.
>>
>> As long as drivers limit the max freq there is no conflict, the lowest
>> max freq wins. I expect this to be the usual case, apparently it
>> worked for cpufreq for 10+ years.
>>
>> However it is correct that there would be a conflict if a driver
>> requests a min freq that is higher than the max freq requested by
>> another. In this case devfreq_verify_within_limits() resolves the
>> conflict by raising p->max to the min freq. Not sure if this is
>> something that would ever occur in practice though.
>>
>> If we are really concerned about this case it would also be an option
>> to limit the adjustment to the max frequency.
>>
>>> To resolve the conflict for multiple device driver, maybe OPP interface
>>> have to support 'usage_count' such as clk_enable/disable().
>>
>> This would require supporting negative usage count values, since a OPP
>> should not be enabled if e.g. thermal enables it but the throttler
>> disabled it or viceversa.
>>
>> Theoretically there could also be conflicts, like one driver disabling
>> the higher OPPs and another the lower ones, with the outcome of all
>> OPPs being disabled, which would be a more drastic conflict resolution
>> than that of devfreq_verify_within_limits().
>>
>> Viresh, what do you think about an OPP usage count?
>
> Ping, can we try to reach a conclusion on this or at least keep the
> discussion going?
>
> Not that it matters, but my preferred solution continues to be
> devfreq_verify_within_limits(). It solves conflicts in some way (which
> could be adjusted if needed) and has proven to work in practice for
> 10+ years in a very similar sub-system.
That is not true. The current cpufreq subsystem doesn't support external
OPP control to enable/disable an OPP entry. If a device driver tries to
control the OPP entries of a cpufreq driver with opp_disable/enable(),
the operation has no effect, because cpufreq only considers the limits
set through 'cpufreq_verify_within_limits()'.
As I already commented [1], there is a difference between cpufreq and devfreq.
[1] https://lkml.org/lkml/2018/7/4/80
The devfreq subsystem already uses the OPP interface to control
specific OPP entries. I don't want to provide two external methods
to control the frequency of a devfreq driver, since that might cause confusion.
I want to use only the OPP interface to enable/disable frequencies,
even if we have to modify the OPP interface.
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi Matthias,
On 2018-08-01 04:29, Matthias Kaehlcke wrote:
> On Mon, Jul 16, 2018 at 12:41:14PM -0700, Matthias Kaehlcke wrote:
>> Hi Chanwoo,
>>
>> On Thu, Jul 12, 2018 at 06:08:36PM +0900, Chanwoo Choi wrote:
>>> Hi Matthias,
>>>
>>> On 2018-07-07 03:09, Matthias Kaehlcke wrote:
>>>> Hi,
>>>>
>>>> On Wed, Jul 04, 2018 at 02:30:32PM +0900, Chanwoo Choi wrote:
>>>>
>>>>> I didn't see any framework which exporting the class instance.
>>>>> It is very dangerous. Unknown device drivers is able to reset
>>>>> the 'devfreq_class' instance. I can't agree this approach.
>>>>
>>>> While I agree that it is potential dangerous it is actually a common
>>>> practice to export the class:
>>>>
>>>
>>> I tried to find the real usage of exported class instance
>>> and I add the comment for each class instance. Almost exported class
>>> instance are used in the their own director or some exported class
>>> like rio_mport_class/switchtec_class are created from specific device driver
>>> instead of subsystem.
>>>
>>> Only following two cases are used on outside of subsystem directory.
>>> devtmpfs.c and alarmtimer.c are core feature of linux kernel.
>>>
>>> drivers/base/devtmpfs.c uses 'block_class'.
>>> kernel/time/alarmtimer.c uses 'rtc_class'.
>>>
>>> I cannot yet agree this approach due to only block_class and rtc_class.
>>
>> I thought your main concern was that the class is exported, which is
>> what several other subsystems do. That the class isn't used outside of
>> the subsystem directory most likely means that there is no need for
>> it, rather than that it shouldn't be done at all (depending on the
>> type of use of course).
>>
>> In any case not exporting the class object provides a limited
>> protection against potential abuse of the class at best. To use the
>> class API all that is needed is a 'struct device' of a devfreq device,
>> which has a pointer to the class object (dev->class).
>>
>> Theoretically I could register a fake devfreq device to obtain access
>> to the class object, though that doesn't seem a very neat approach ;-)
>>
>>> You added the following comment why devfreq_class instance is necessary.
>>> Actually, I don't know the best solution right now. But, all device drivers
>>> can be added or removed if driver uses the module type. It is not a problem
>>> for only devfreq instance.
>>
>> Certainly it's not a problem limited to devfreq devices. In many other
>> cases bus notifiers can be used, but since devfreq devices aren't
>> tied to a specific bus this is not an option here.
>>
>> If you really don't want to export the class we could add wrappers
>> for (un)registering a class interface:
>>
>> int devfreq_class_interface_register(struct class_interface *)
>> void devfreq_class_interface_unregister(struct class_interface *)
I agree with this approach because, as you commented, it doesn't export
the devfreq_class instance.
>>
>> The wrappers would have to assign ci->class since the throttler
>> can't see the class object.
>>
>> Or add notifiers for device addition/removal, though the throttler
>> relies on the behavior of the class_interface which also notifies
>> about devices added before registration. This might not be what other
>> potential users of the notifiers expect.
>
> Ping
>
> Could we please try to find a solution/reach a conclusion for this?
>
> Not that it should affect the outcome of this discussion, but I want
> to mention that from my point of view it is a bit unfortunate that
> this and other fundamental concerns were only raised after I spent
> significant time on repeatedly refactoring the throttler driver to
> address other comments. Since you and MyungJoo Ham previously had only
> minor comments on the other devfreq patches in this series I assumed
> there were no major concerns from your side :(
>
>
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi Matthias,
On 2018-07-04 08:47, Matthias Kaehlcke wrote:
> The Throttler is used for non-thermal throttling of system components
> like CPUs or devfreq devices.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> --
> Changes in v5:
> - patch added to the series (replacing "dt-bindings: PM / OPP: add
> opp-throttlers property")
> ---
> .../devicetree/bindings/misc/throttler.txt | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/misc/throttler.txt
>
> diff --git a/Documentation/devicetree/bindings/misc/throttler.txt b/Documentation/devicetree/bindings/misc/throttler.txt
> new file mode 100644
> index 000000000000..2ea80c62dbe1
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/misc/throttler.txt
> @@ -0,0 +1,13 @@
> +Throttler driver
> +
> +The Throttler is used for non-thermal throttling of system components like
> +CPUs or devfreq devices.
> +
> +Required properties:
> +- throttler-opps Array of OPP-v2 phandles with the OPPs used for
> + throttling.
> +
> +Example:
> + throttler {
> + throttler-opps = <&cpu0_opp03, &cpu1_opp02, &gpu_opp03>;
> + };
>
If possible, I'd like a more detailed example for "&cpu0_opp03, &cpu1_opp02, &gpu_opp03",
because I'm confused about the meaning of the 'cpu0_opp03' phandle.
Does cpu0_opp03 indicate one specific OPP entry among the set of OPP entries of the
CPU0 cpufreq, or does it indicate the whole set of OPP entries for the CPU0 cpufreq?
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi Matthias,
On 2018-07-04 08:46, Matthias Kaehlcke wrote:
> Currently update_devfreq() is only visible to devfreq governors outside
> of devfreq.c. Make it public to allow drivers that adjust devfreq policies
> to cause a re-evaluation of the frequency after a policy change.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> Acked-by: MyungJoo Ham <[email protected]>
> Reviewed-by: Brian Norris <[email protected]>
> --
> Changes in v5:
> - none
>
> Changed in v4:
> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>
> Changes in v3:
> - none
>
> Changes in v2:
> - added 'Acked-by: MyungJoo Ham <[email protected]>' tag
> ---
> drivers/devfreq/governor.h | 3 ---
> include/linux/devfreq.h | 8 ++++++++
> 2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h
> index b81700244ce3..f53339ca610f 100644
> --- a/drivers/devfreq/governor.h
> +++ b/drivers/devfreq/governor.h
> @@ -57,9 +57,6 @@ struct devfreq_governor {
> unsigned int event, void *data);
> };
>
> -/* Caution: devfreq->lock must be locked before calling update_devfreq */
> -extern int update_devfreq(struct devfreq *devfreq);
> -
> extern void devfreq_monitor_start(struct devfreq *devfreq);
> extern void devfreq_monitor_stop(struct devfreq *devfreq);
> extern void devfreq_monitor_suspend(struct devfreq *devfreq);
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index 7c8dce96db73..c4f84a769cb5 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -222,6 +222,14 @@ extern void devm_devfreq_remove_device(struct device *dev,
> extern int devfreq_suspend_device(struct devfreq *devfreq);
> extern int devfreq_resume_device(struct devfreq *devfreq);
>
> +/**
> + * update_devfreq() - Reevaluate the device and configure frequency
> + * @devfreq: the devfreq device
> + *
> + * Note: devfreq->lock must be held
> + */
> +extern int update_devfreq(struct devfreq *devfreq);
> +
> /* Helper functions for devfreq user device driver with OPP. */
> extern struct dev_pm_opp *devfreq_recommended_opp(struct device *dev,
> unsigned long *freq, u32 flags);
>
Reviewed-by: Chanwoo Choi <[email protected]>
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi Chanwoo,
On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
> On 2018-08-01 04:39, Matthias Kaehlcke wrote:
> > On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
> >> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
> >>> Hi Matthias,
> >>>
> >>> On 2018-07-07 02:53, Matthias Kaehlcke wrote:
> >>>> Hi Chanwoo,
> >>>>
> >>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
> >>>>
> >>>>> Firstly,
> >>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
> >>>>>
> >>>>> devfreq already used the OPP interface as default. It means that
> >>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
> >>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
> >>>>> drivers disable/enable the specific frequency, the devfreq core
> >>>>> consider them.
> >>>>>
> >>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
> >>>>> already support some interface to change the minimum/maximum frequency
> >>>>> of devfreq device.
> >>>>>
> >>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()'
> >>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
> >>>>> change the minimum/maximum frequency through OPP interface.
> >>>>>
> >>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
> >>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
> >>>>> other way to change the minimum/maximum frequency.
> >>>>
> >>>> Using the OPP interface exclusively works as long as a
> >>>> enabling/disabling of OPPs is limited to a single driver
> >>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
> >>>> involved you need a way to resolve conflicts, that's the purpose of
> >>>> devfreq_verify_within_limits(). Please let me know if there are
> >>>> existing mechanisms for conflict resolution that I overlooked.
> >>>>
> >>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
> >>>> devfreq_verify_within_limits() instead of the OPP interface if
> >>>> desired, however this seems beyond the scope of this series.
> >>>
> >>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
> >>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
> >>> happen.
> >>
> >> As long as drivers limit the max freq there is no conflict, the lowest
> >> max freq wins. I expect this to be the usual case, apparently it
> >> worked for cpufreq for 10+ years.
> >>
> >> However it is correct that there would be a conflict if a driver
> >> requests a min freq that is higher than the max freq requested by
> >> another. In this case devfreq_verify_within_limits() resolves the
> >> conflict by raising p->max to the min freq. Not sure if this is
> >> something that would ever occur in practice though.
> >>
> >> If we are really concerned about this case it would also be an option
> >> to limit the adjustment to the max frequency.
> >>
> >>> To resolve the conflict for multiple device driver, maybe OPP interface
> >>> have to support 'usage_count' such as clk_enable/disable().
> >>
> >> This would require supporting negative usage count values, since a OPP
> >> should not be enabled if e.g. thermal enables it but the throttler
> >> disabled it or viceversa.
> >>
> >> Theoretically there could also be conflicts, like one driver disabling
> >> the higher OPPs and another the lower ones, with the outcome of all
> >> OPPs being disabled, which would be a more drastic conflict resolution
> >> than that of devfreq_verify_within_limits().
> >>
> >> Viresh, what do you think about an OPP usage count?
> >
> > Ping, can we try to reach a conclusion on this or at least keep the
> > discussion going?
> >
> > Not that it matters, but my preferred solution continues to be
> > devfreq_verify_within_limits(). It solves conflicts in some way (which
> > could be adjusted if needed) and has proven to work in practice for
> > 10+ years in a very similar sub-system.
>
> It is not true. Current cpufreq subsystem doesn't support external OPP
> control to enable/disable the OPP entry. If some device driver
> controls the OPP entry of cpufreq driver with opp_disable/enable(),
> the operation is not working. Because cpufreq considers the limit
> through 'cpufreq_verify_with_limits()' only.
Ok, we can probably agree that using cpufreq_verify_within_limits()
exclusively seems to have worked well for cpufreq, and that in their
overall purpose cpufreq and devfreq are similar subsystems.
The current throttler series with devfreq_verify_within_limits() takes
the enabled OPPs into account, the lowest and highest OPP are used as
a starting point for the frequency adjustment and (in theory) the
frequency range should only be narrowed by
devfreq_verify_within_limits().
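To illustrate the narrowing behavior described above, here is a minimal
userspace sketch modeled on cpufreq_verify_within_limits(). The struct
policy_range type and the verify_within_limits() name are stand-ins chosen
for this example and are not taken from the series; the devfreq version in
the patches may differ in detail.

```c
#include <assert.h>

/* Hypothetical stand-in for a policy's frequency range (names are assumptions). */
struct policy_range {
	unsigned long min;
	unsigned long max;
};

/*
 * Clamp the range in @p to [@min, @max]. The order of the checks follows
 * cpufreq_verify_within_limits(): if a requested @min lies above the
 * current p->max, p->max is raised to @min.
 */
static void verify_within_limits(struct policy_range *p,
				 unsigned long min, unsigned long max)
{
	assert(min <= max);	/* callers must pass a sane window */

	if (p->min < min)
		p->min = min;
	if (p->max < min)
		p->max = min;
	if (p->min > max)
		p->min = max;
	if (p->max > max)
		p->max = max;
	if (p->min > p->max)
		p->min = p->max;
}
```

When every caller only lowers the maximum, the calls can be applied in any
order and the lowest requested maximum ends up in p->max. A caller that
requests a minimum above the current maximum is the conflict case discussed
in this thread.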
> As I already commented[1], there is different between cpufreq and devfreq.
> [1] https://lkml.org/lkml/2018/7/4/80
>
> Already, subsystem already used OPP interface in order to control
> specific OPP entry. I don't want to provide two outside method
> to control the frequency of devfreq driver. It might make the confusion.
I understand your point, it would indeed be preferable to have a
single method. However I'm not convinced that the OPP interface is
a suitable solution, as I explained earlier in this thread (quoted
below).
I would like you to at least consider the possibility of changing
drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
Besides that it's not what is currently used, do you see any technical
concerns that would make devfreq_verify_within_limits() an unsuitable
or inferior solution?
> I want to use only OPP interface to enable/disable frequency
> even if we have to modify the OPP interface.
These are the concerns I raised earlier about a solution with OPP
usage counts:
"This would require supporting negative usage count values, since an OPP
should not be enabled if e.g. thermal enables it but the throttler
disabled it or vice versa.
Theoretically there could also be conflicts, like one driver disabling
the higher OPPs and another the lower ones, with the outcome of all
OPPs being disabled, which would be a more drastic conflict resolution
than that of devfreq_verify_within_limits()."
What do you think about these points?
The negative usage counts aren't necessarily a dealbreaker in a
technical sense, though I'm not a friend of quirky interfaces that
don't behave like a typical user would expect (e.g. an OPP isn't
necessarily enabled after dev_pm_opp_enable()).
I can send an RFC with OPP usage counts, though due to the above
concerns I have doubts it will be well received.
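To make the second concern concrete, here is a small userspace model of
per-OPP counting. It is only a sketch under assumptions: struct opp_model,
the disable_count field and the helper names are invented for this example
and are not part of the OPP library, and each driver is assumed to undo only
its own disables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-OPP state: how many drivers currently disable this OPP. */
struct opp_model {
	unsigned long freq_mhz;
	int disable_count;
};

/* Disable the OPPs with index first..last (inclusive) on behalf of one driver. */
static void opp_model_disable_range(struct opp_model *opps, size_t n,
				    size_t first, size_t last)
{
	size_t i;

	assert(first <= last && last < n);
	for (i = first; i <= last; i++)
		opps[i].disable_count++;
}

/* Undo a previous opp_model_disable_range() call by the same driver. */
static void opp_model_enable_range(struct opp_model *opps, size_t n,
				   size_t first, size_t last)
{
	size_t i;

	assert(first <= last && last < n);
	for (i = first; i <= last; i++)
		opps[i].disable_count--;
}

/* An OPP is usable only while no driver disables it. */
static size_t opp_model_num_available(const struct opp_model *opps, size_t n)
{
	size_t i, avail = 0;

	for (i = 0; i < n; i++)
		if (opps[i].disable_count == 0)
			avail++;
	return avail;
}
```

With four OPPs, one driver disabling the upper two and another disabling the
lower two leaves no usable OPP at all, whereas a min/max based policy would
still end up with a valid (if narrow) range.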
Thanks
Matthias
On Wed, Aug 01, 2018 at 05:18:40PM +0900, Chanwoo Choi wrote:
> Hi Matthias,
>
> On 2018-08-01 04:29, Matthias Kaehlcke wrote:
> > On Mon, Jul 16, 2018 at 12:41:14PM -0700, Matthias Kaehlcke wrote:
> >> Hi Chanwoo,
> >>
> >> On Thu, Jul 12, 2018 at 06:08:36PM +0900, Chanwoo Choi wrote:
> >>> Hi Matthias,
> >>>
> >>> On 2018-07-07 03:09, Matthias Kaehlcke wrote:
> >>>> Hi,
> >>>>
> >>>> On Wed, Jul 04, 2018 at 02:30:32PM +0900, Chanwoo Choi wrote:
> >>>>
> >>>>> I didn't see any framework which exporting the class instance.
> >>>>> It is very dangerous. Unknown device drivers is able to reset
> >>>>> the 'devfreq_class' instance. I can't agree this approach.
> >>>>
> >>>> While I agree that it is potential dangerous it is actually a common
> >>>> practice to export the class:
> >>>>
> >>>
> >>> I tried to find the real usage of exported class instance
> >>> and I add the comment for each class instance. Almost exported class
> >>> instance are used in the their own director or some exported class
> >>> like rio_mport_class/switchtec_class are created from specific device driver
> >>> instead of subsystem.
> >>>
> >>> Only following two cases are used on outside of subsystem directory.
> >>> devtmpfs.c and alarmtimer.c are core feature of linux kernel.
> >>>
> >>> drivers/base/devtmpfs.c uses 'block_class'.
> >>> kernel/time/alarmtimer.c uses 'rtc_class'.
> >>>
> >>> I cannot yet agree this approach due to only block_class and rtc_class.
> >>
> >> I thought your main concern was that the class is exported, which is
> >> what several other subsystems do. That the class isn't used outside of
> >> the subsystem directory most likely means that there is no need for
> >> it, rather than that it shouldn't be done at all (depending on the
> >> type of use of course).
> >>
> >> In any case not exporting the class object provides a limited
> >> protection against potential abuse of the class at best. To use the
> >> class API all that is needed is a 'struct device' of a devfreq device,
> >> which has a pointer to the class object (dev->class).
> >>
> >> Theoretically I could register a fake devfreq device to obtain access
> >> to the class object, though that doesn't seem a very neat approach ;-)
> >>
> >>> You added the following comment why devfreq_class instance is necessary.
> >>> Actually, I don't know the best solution right now. But, all device drivers
> >>> can be added or removed if driver uses the module type. It is not a problem
> >>> for only devfreq instance.
> >>
> >> Certainly it's not a problem limited to devfreq devices. In many other
> >> cases bus notifiers can be used, but since devfreq devices aren't
> >> tied to a specific bus this is not an option here.
> >>
> >> If you really don't want to export the class we could add wrappers
> >> for (un)registering a class interface:
> >>
> >> int devfreq_class_interface_register(struct class_interface *)
> >> void devfreq_class_interface_unregister(struct class_interface *)
>
> About this approach, I agree because it doesn't export the devfreq_class
> instance as you commented.
Great, I'll change it in the next revision!
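A rough userspace sketch of what such wrappers could look like. The struct
definitions, the devfreq_class_model object and the stubbed
class_interface_register()/class_interface_unregister() below are stand-ins
for the kernel ones so that the example builds on its own; the code in the
next revision may well differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types (assumptions for this sketch). */
struct class {
	const char *name;
};

struct class_interface {
	struct class *class;
};

/* Stays private to the subsystem, only the wrappers below touch it. */
static struct class devfreq_class_model = { .name = "devfreq" };

/* Stub bookkeeping so the sketch can be exercised outside the kernel. */
static int registered_count;

static int class_interface_register(struct class_interface *ci)
{
	if (!ci || !ci->class)
		return -ENODEV;
	registered_count++;
	return 0;
}

static void class_interface_unregister(struct class_interface *ci)
{
	if (ci && ci->class)
		registered_count--;
}

/* The wrapper assigns ci->class, since callers can't see the class object. */
static int devfreq_class_interface_register(struct class_interface *ci)
{
	assert(ci != NULL);
	ci->class = &devfreq_class_model;
	return class_interface_register(ci);
}

static void devfreq_class_interface_unregister(struct class_interface *ci)
{
	class_interface_unregister(ci);
}
```

The point of the wrappers is that a user such as the throttler only passes
its class_interface and never needs a reference to the class object itself.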
Hi Chanwoo,
On Wed, Aug 01, 2018 at 05:27:57PM +0900, Chanwoo Choi wrote:
> Hi Matthias,
>
> On 2018-07-04 08:47, Matthias Kaehlcke wrote:
> > The Throttler is used for non-thermal throttling of system components
> > like CPUs or devfreq devices.
> >
> > Signed-off-by: Matthias Kaehlcke <[email protected]>
>
> If possible, I hope the more detailed example for "cpu0_opp03, &cpu1_opp02, &gpu_opp03"
> because I'm confusing the meaning of 'cpu0_opp03' phandle.
I omitted the OPP definitions to avoid cluttering the example. I'm
happy to add them if DT folks also think it would add value.
> cpu0_opp03 indicates the only one specific OPP entry among set of
> OPP entries of CPU0 cpufreq? or cpu0_opp03 indicates the set of OPP
> entries for CPU0 cpufreq ?
cpu0_opp03 and the other phandles are specific OPP entries that
tell the throttler which frequencies/OPPs should be used for
throttling. For devices/CPUs with multiple levels of throttling,
multiple entries from their OPP set are listed. In the example a single
level of throttling is used for all devices/CPUs.
Hi Matthias,
On 2018-08-02 02:08, Matthias Kaehlcke wrote:
> Hi Chanwoo,
>
> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
>> On 2018-08-01 04:39, Matthias Kaehlcke wrote:
>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
>>>>> Hi Matthias,
>>>>>
>>>>> On 2018-07-07 02:53, Matthias Kaehlcke wrote:
>>>>>> Hi Chanwoo,
>>>>>>
>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
>>>>>>
>>>>>>> Firstly,
>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
>>>>>>>
>>>>>>> devfreq already used the OPP interface as default. It means that
>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
>>>>>>> drivers disable/enable the specific frequency, the devfreq core
>>>>>>> consider them.
>>>>>>>
>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
>>>>>>> already support some interface to change the minimum/maximum frequency
>>>>>>> of devfreq device.
>>>>>>>
>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()'
>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
>>>>>>> change the minimum/maximum frequency through OPP interface.
>>>>>>>
>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
>>>>>>> other way to change the minimum/maximum frequency.
>>>>>>
>>>>>> Using the OPP interface exclusively works as long as a
>>>>>> enabling/disabling of OPPs is limited to a single driver
>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
>>>>>> involved you need a way to resolve conflicts, that's the purpose of
>>>>>> devfreq_verify_within_limits(). Please let me know if there are
>>>>>> existing mechanisms for conflict resolution that I overlooked.
>>>>>>
>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
>>>>>> devfreq_verify_within_limits() instead of the OPP interface if
>>>>>> desired, however this seems beyond the scope of this series.
>>>>>
>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
>>>>> happen.
>>>>
>>>> As long as drivers limit the max freq there is no conflict, the lowest
>>>> max freq wins. I expect this to be the usual case, apparently it
>>>> worked for cpufreq for 10+ years.
>>>>
>>>> However it is correct that there would be a conflict if a driver
>>>> requests a min freq that is higher than the max freq requested by
>>>> another. In this case devfreq_verify_within_limits() resolves the
>>>> conflict by raising p->max to the min freq. Not sure if this is
>>>> something that would ever occur in practice though.
>>>>
>>>> If we are really concerned about this case it would also be an option
>>>> to limit the adjustment to the max frequency.
>>>>
>>>>> To resolve the conflict for multiple device driver, maybe OPP interface
>>>>> have to support 'usage_count' such as clk_enable/disable().
>>>>
>>>> This would require supporting negative usage count values, since a OPP
>>>> should not be enabled if e.g. thermal enables it but the throttler
>>>> disabled it or viceversa.
>>>>
>>>> Theoretically there could also be conflicts, like one driver disabling
>>>> the higher OPPs and another the lower ones, with the outcome of all
>>>> OPPs being disabled, which would be a more drastic conflict resolution
>>>> than that of devfreq_verify_within_limits().
>>>>
>>>> Viresh, what do you think about an OPP usage count?
>>>
>>> Ping, can we try to reach a conclusion on this or at least keep the
>>> discussion going?
>>>
>>> Not that it matters, but my preferred solution continues to be
>>> devfreq_verify_within_limits(). It solves conflicts in some way (which
>>> could be adjusted if needed) and has proven to work in practice for
>>> 10+ years in a very similar sub-system.
>>
>> It is not true. Current cpufreq subsystem doesn't support external OPP
>> control to enable/disable the OPP entry. If some device driver
>> controls the OPP entry of cpufreq driver with opp_disable/enable(),
>> the operation is not working. Because cpufreq considers the limit
>> through 'cpufreq_verify_with_limits()' only.
>
> Ok, we can probably agree that using cpufreq_verify_with_limits()
> exclusively seems to have worked well for cpufreq, and that in their
> overall purpose cpufreq and devfreq are similar subsystems.
>
> The current throttler series with devfreq_verify_within_limits() takes
> the enabled OPPs into account, the lowest and highest OPP are used as
> a starting point for the frequency adjustment and (in theory) the
> frequency range should only be narrowed by
> devfreq_verify_within_limits().
>
>> As I already commented[1], there is different between cpufreq and devfreq.
>> [1] https://lkml.org/lkml/2018/7/4/80
>>
>> Already, subsystem already used OPP interface in order to control
>> specific OPP entry. I don't want to provide two outside method
>> to control the frequency of devfreq driver. It might make the confusion.
>
> I understand your point, it would indeed be preferable to have a
> single method. However I'm not convinced that the OPP interface is
> a suitable solution, as I exposed earlier in this thread (quoted
> below).
>
> I would like you to at least consider the possibility of changing
> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
> Besides that it's not what is currently used, do you see any technical
> concerns that would make devfreq_verify_within_limits() an unsuitable
> or inferior solution?
As we already discussed, devfreq_verify_within_limits() doesn't support
multiple outside controllers (e.g., devfreq-cooling.c).
With the throttler core you are proposing, there are at least two
outside controllers (e.g., devfreq-cooling.c and the throttler driver).
Since I know about the conflict problem, I cannot agree to a temporary
method. The OPP interface is mandatory for devfreq in order to control
the OPPs (frequency/voltage). In this situation, we have to try to
find a method based on the OPP interface.
We can refer to regulator/clock. Multiple device drivers can use
a regulator/clock without any problem. I think that the usage of OPPs
is similar to that of regulator/clock. As you mentioned, the OPP
interface may have to handle a negative count. Even though
opp_enable/opp_disable() would have to handle the negative count and
support use by multiple device drivers, I think that
this approach is right.
>
>> I want to use only OPP interface to enable/disable frequency
>> even if we have to modify the OPP interface.
>
> These are the concerns I raised earlier about a solution with OPP
> usage counts:
>
> "This would require supporting negative usage count values, since a OPP
> should not be enabled if e.g. thermal enables it but the throttler
> disabled it or viceversa.
I already replied about the negative usage count. I think that a negative usage
count is not a problem if this approach can resolve the issue.
>
> Theoretically there could also be conflicts, like one driver disabling
> the higher OPPs and another the lower ones, with the outcome of all
> OPPs being disabled, which would be a more drastic conflict resolution
> than that of devfreq_verify_within_limits()."
>
> What do you think about these points?
It depends on how multiple device drivers use the OPP interface.
If devfreq/OPP provides the control method and outside device drivers
misuse it, that is a problem of the user.
On the other hand, if we use the OPP interface, we can check through the
usage count why an OPP entry is disabled or enabled.
>
> The negative usage counts aren't necessarily a dealbreaker in a
> technical sense, though I'm not a friend of quirky interfaces that
> don't behave like a typical user would expect (e.g. an OPP isn't
> necessarily enabled after dev_pm_opp_enable()).
>
> I can sent an RFC with OPP usage counts, though due to the above
> concerns I have doubts it will be well received.
Please add me to Cc list.
>
> Thanks
>
> Matthias
>
>
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi Chanwoo,
On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote:
> Hi Matthias,
>
> On 2018-08-02 02:08, Matthias Kaehlcke wrote:
> > Hi Chanwoo,
> >
> > On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
> >> On 2018-08-01 04:39, Matthias Kaehlcke wrote:
> >>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
> >>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
> >>>>> Hi Matthias,
> >>>>>
> >>>>> On 2018-07-07 02:53, Matthias Kaehlcke wrote:
> >>>>>> Hi Chanwoo,
> >>>>>>
> >>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
> >>>>>>
> >>>>>>> Firstly,
> >>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
> >>>>>>>
> >>>>>>> devfreq already used the OPP interface as default. It means that
> >>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
> >>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
> >>>>>>> drivers disable/enable the specific frequency, the devfreq core
> >>>>>>> consider them.
> >>>>>>>
> >>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
> >>>>>>> already support some interface to change the minimum/maximum frequency
> >>>>>>> of devfreq device.
> >>>>>>>
> >>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()'
> >>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
> >>>>>>> change the minimum/maximum frequency through OPP interface.
> >>>>>>>
> >>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
> >>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
> >>>>>>> other way to change the minimum/maximum frequency.
> >>>>>>
> >>>>>> Using the OPP interface exclusively works as long as a
> >>>>>> enabling/disabling of OPPs is limited to a single driver
> >>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
> >>>>>> involved you need a way to resolve conflicts, that's the purpose of
> >>>>>> devfreq_verify_within_limits(). Please let me know if there are
> >>>>>> existing mechanisms for conflict resolution that I overlooked.
> >>>>>>
> >>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
> >>>>>> devfreq_verify_within_limits() instead of the OPP interface if
> >>>>>> desired, however this seems beyond the scope of this series.
> >>>>>
> >>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
> >>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
> >>>>> happen.
> >>>>
> >>>> As long as drivers limit the max freq there is no conflict, the lowest
> >>>> max freq wins. I expect this to be the usual case, apparently it
> >>>> worked for cpufreq for 10+ years.
> >>>>
> >>>> However it is correct that there would be a conflict if a driver
> >>>> requests a min freq that is higher than the max freq requested by
> >>>> another. In this case devfreq_verify_within_limits() resolves the
> >>>> conflict by raising p->max to the min freq. Not sure if this is
> >>>> something that would ever occur in practice though.
> >>>>
> >>>> If we are really concerned about this case it would also be an option
> >>>> to limit the adjustment to the max frequency.
> >>>>
> >>>>> To resolve the conflict for multiple device driver, maybe OPP interface
> >>>>> have to support 'usage_count' such as clk_enable/disable().
> >>>>
> >>>> This would require supporting negative usage count values, since a OPP
> >>>> should not be enabled if e.g. thermal enables it but the throttler
> >>>> disabled it or viceversa.
> >>>>
> >>>> Theoretically there could also be conflicts, like one driver disabling
> >>>> the higher OPPs and another the lower ones, with the outcome of all
> >>>> OPPs being disabled, which would be a more drastic conflict resolution
> >>>> than that of devfreq_verify_within_limits().
> >>>>
> >>>> Viresh, what do you think about an OPP usage count?
> >>>
> >>> Ping, can we try to reach a conclusion on this or at least keep the
> >>> discussion going?
> >>>
> >>> Not that it matters, but my preferred solution continues to be
> >>> devfreq_verify_within_limits(). It solves conflicts in some way (which
> >>> could be adjusted if needed) and has proven to work in practice for
> >>> 10+ years in a very similar sub-system.
> >>
> >> It is not true. Current cpufreq subsystem doesn't support external OPP
> >> control to enable/disable the OPP entry. If some device driver
> >> controls the OPP entry of cpufreq driver with opp_disable/enable(),
> >> the operation is not working. Because cpufreq considers the limit
> >> through 'cpufreq_verify_with_limits()' only.
> >
> > Ok, we can probably agree that using cpufreq_verify_with_limits()
> > exclusively seems to have worked well for cpufreq, and that in their
> > overall purpose cpufreq and devfreq are similar subsystems.
> >
> > The current throttler series with devfreq_verify_within_limits() takes
> > the enabled OPPs into account, the lowest and highest OPP are used as
> > a starting point for the frequency adjustment and (in theory) the
> > frequency range should only be narrowed by
> > devfreq_verify_within_limits().
> >
> >> As I already commented[1], there is different between cpufreq and devfreq.
> >> [1] https://lkml.org/lkml/2018/7/4/80
> >>
> >> Already, subsystem already used OPP interface in order to control
> >> specific OPP entry. I don't want to provide two outside method
> >> to control the frequency of devfreq driver. It might make the confusion.
> >
> > I understand your point, it would indeed be preferable to have a
> > single method. However I'm not convinced that the OPP interface is
> > a suitable solution, as I exposed earlier in this thread (quoted
> > below).
> >
> > I would like you to at least consider the possibility of changing
> > drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
> > Besides that it's not what is currently used, do you see any technical
> > concerns that would make devfreq_verify_within_limits() an unsuitable
> > or inferior solution?
>
> As we already discussed, devfreq_verify_within_limits() doesn't support
> the multiple outside controllers (e.g., devfreq-cooling.c).
That's incorrect; its purpose is precisely that.

Are you suggesting that cpufreq with its use of
cpufreq_verify_within_limits() (the inspiration for
devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c
and other drivers when receiving a CPUFREQ_ADJUST event, which is
essentially what I am proposing with DEVFREQ_ADJUST.

Could you elaborate on why this model wouldn't work for devfreq? "OPP
interface is mandatory for devfreq" isn't really a technical argument.
Is it mandatory for any other reason than that it is the interface
that is currently used?
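
To make the arbitration argument concrete, here is a minimal user-space
sketch of the clamping logic. It is modelled on what
cpufreq_verify_within_limits() does; the names struct freq_policy and
verify_within_limits() are made up for this illustration and are not
the code from the series.

```c
#include <limits.h>	/* ULONG_MAX: callers pass it to mean "no upper limit" */

/* Hypothetical stand-in for a frequency policy (not a kernel struct). */
struct freq_policy {
	unsigned long min;
	unsigned long max;
};

/*
 * Narrow the policy to [min, max]. Each controller calls this from its
 * notifier with its own limits; the order of the calls does not matter
 * as long as the controllers only lower the maximum.
 */
void verify_within_limits(struct freq_policy *p, unsigned long min,
			  unsigned long max)
{
	if (p->min < min)
		p->min = min;
	if (p->max < min)
		p->max = min;	/* conflict: requested min above current max */
	if (p->min > max)
		p->min = max;
	if (p->max > max)
		p->max = max;
	if (p->min > p->max)
		p->min = p->max;
}
```

With a starting range of 200..800, one controller capping at 600 and
another at 400 leaves the maximum at 400 (the lowest max wins). A later
request for a minimum of 500 raises both ends to 500, which is the
conflict case mentioned earlier in the thread.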
> After you are suggesting the throttler core, there are at least two
> outside controllers (e.g., devfreq-cooling.c and throttler driver).
> As I knew the problem about conflict, I cannot agree the temporary
> method. OPP interface is mandatory for devfreq in order to control
> the OPP (frequency/voltage). In this situation, we have to try to
> find the method through OPP interface.
What do you mean by "temporary method"?

We can try to find a method through the OPP interface, but at this
point I'm not convinced that it is technically necessary or even
preferable.

Another inconvenience of the OPP approach for both devfreq_cooling.c
and the throttler is that they have to bother with disabling all OPPs
above/below the max/min (something they don't/shouldn't have to care
about), instead of just telling devfreq the max/min.
> We can refer to regulator/clock. Multiple device driver can use
> the regulator/clock without any problem. I think that usage of OPP
> is similiar with regulator/clock. As you mentioned, maybe OPP
> would handle the negative count. Although opp_enable/opp_disable()
> have to handle the negative count and opp_enable/opp_disable()
> can support the multiple usage from device drivers, I think that
> this approach is right.
The regulator/clock approach with the typical usage counts seems more
intuitive to me. Personally, I wouldn't write an interface with a
negative usage count if I could reasonably avoid it.
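
For reference, one possible reading of a "negative usage count" is
sketched below. This is purely hypothetical: struct opp_vote and its
helpers do not exist in the OPP library, and an actual RFC could define
the semantics differently.

```c
#include <stdbool.h>

/*
 * Hypothetical per-OPP vote counter: 0 means available, a negative
 * value means that many drivers currently want the OPP disabled.
 */
struct opp_vote {
	int count;
};

/* A driver asks for the OPP to be disabled. */
void opp_vote_disable(struct opp_vote *o)
{
	o->count--;
}

/* A driver withdraws its disable request; never goes above zero. */
void opp_vote_enable(struct opp_vote *o)
{
	if (o->count < 0)
		o->count++;
}

/* The OPP is usable only when nobody has it disabled. */
bool opp_vote_available(const struct opp_vote *o)
{
	return o->count == 0;
}
```

The sketch shows the behaviour I find unintuitive: after thermal and
the throttler both disable an OPP, a single enable call leaves it
disabled.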
> >> I want to use only OPP interface to enable/disable frequency
> >> even if we have to modify the OPP interface.
> >
> > These are the concerns I raised earlier about a solution with OPP
> > usage counts:
> >
> > "This would require supporting negative usage count values, since a OPP
> > should not be enabled if e.g. thermal enables it but the throttler
> > disabled it or viceversa.
>
> Already replied about negative usage count. I think that negative usage count
> is not problem if this approach could resolve the issue.
>
> >
> > Theoretically there could also be conflicts, like one driver disabling
> > the higher OPPs and another the lower ones, with the outcome of all
> > OPPs being disabled, which would be a more drastic conflict resolution
> > than that of devfreq_verify_within_limits()."
> >
> > What do you think about these points?
>
> It depends on how to use OPP interface on multiple device driver.
> Even if devfreq/opp provides the control method, outside device driver
> are misusing them. It is problem of user.
I wouldn't call it misusing if two independent drivers take
contradictory actions on an interface that doesn't provide
arbitration. How can driver A know that it shouldn't disable OPPs a, b
and c because driver B disabled d, e and f? Who is misusing the
interface, driver A or driver B?
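
A small user-space model of that scenario, assuming a flat OPP table
with an enabled flag per entry (struct opp_entry and the helpers are
made up for this example and are not kernel API):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one OPP table entry. */
struct opp_entry {
	unsigned long freq;
	bool enabled;
};

/* Driver A: disable every OPP above its maximum. */
void disable_above(struct opp_entry *t, size_t n, unsigned long max)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (t[i].freq > max)
			t[i].enabled = false;
}

/* Driver B: disable every OPP below its minimum. */
void disable_below(struct opp_entry *t, size_t n, unsigned long min)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (t[i].freq < min)
			t[i].enabled = false;
}

/* Number of OPPs that are still usable. */
size_t count_enabled(const struct opp_entry *t, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (t[i].enabled)
			count++;
	return count;
}
```

With OPPs at 200, 400, 600 and 800, driver A capping at 400 and driver
B requiring at least 600 leaves no OPP enabled, even though neither
driver did anything wrong from its own point of view.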
> Instead, if we use the OPP interface, we can check why OPP entry
> is disabled or enabled through usage count.
>
> >
> > The negative usage counts aren't necessarily a dealbreaker in a
> > technical sense, though I'm not a friend of quirky interfaces that
> > don't behave like a typical user would expect (e.g. an OPP isn't
> > necessarily enabled after dev_pm_opp_enable()).
> >
> > I can sent an RFC with OPP usage counts, though due to the above
> > concerns I have doubts it will be well received.
>
> Please add me to Cc list.
Will do
Thanks
Matthias
Hi Chanwoo,

this patch and "PM / devfreq: Fix handling of min/max_freq == 0"
address issues not directly related to the throttler. It seems it
could still take a while for the throttler to move forward. Do you
want me to spin out these two patches so that they can get merged
independently of the rest of the series?
Thanks
Matthias
On Tue, Jul 03, 2018 at 04:46:56PM -0700, Matthias Kaehlcke wrote:
> Several governors use the user space limits df->min/max_freq to adjust
> the target frequency. This is not necessary, since update_devfreq()
> already takes care of this. Instead the governor can request the available
> min/max frequency by setting the target frequency to DEVFREQ_MIN/MAX_FREQ
> and let update_devfreq() take care of any adjustments.
>
> Signed-off-by: Matthias Kaehlcke <[email protected]>
> Reviewed-by: Brian Norris <[email protected]>
> ---
> Changes in v5:
> - none
>
> Changes in v4:
> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>
> Changes in v3:
> - none
>
> Changes in v2:
> - squashed "PM / devfreq: Remove redundant frequency adjustment from governors"
> and "PM / devfreq: governors: Return device frequency limits instead of user
> limits"
> - updated subject and commit message
> - use DEVFREQ_MIN/MAX_FREQ instead of df->scaling_min/max_freq
> ---
> drivers/devfreq/governor.h | 3 +++
> drivers/devfreq/governor_performance.c | 5 +----
> drivers/devfreq/governor_powersave.c | 2 +-
> drivers/devfreq/governor_simpleondemand.c | 12 +++---------
> drivers/devfreq/governor_userspace.c | 16 ++++------------
> 5 files changed, 12 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h
> index cfc50a61a90d..b81700244ce3 100644
> --- a/drivers/devfreq/governor.h
> +++ b/drivers/devfreq/governor.h
> @@ -25,6 +25,9 @@
> #define DEVFREQ_GOV_SUSPEND 0x4
> #define DEVFREQ_GOV_RESUME 0x5
>
> +#define DEVFREQ_MIN_FREQ 0
> +#define DEVFREQ_MAX_FREQ ULONG_MAX
> +
> /**
> * struct devfreq_governor - Devfreq policy governor
> * @node: list node - contains registered devfreq governors
> diff --git a/drivers/devfreq/governor_performance.c b/drivers/devfreq/governor_performance.c
> index 4d23ecfbd948..ded429fd51be 100644
> --- a/drivers/devfreq/governor_performance.c
> +++ b/drivers/devfreq/governor_performance.c
> @@ -20,10 +20,7 @@ static int devfreq_performance_func(struct devfreq *df,
> * target callback should be able to get floor value as
> * said in devfreq.h
> */
> - if (!df->max_freq)
> - *freq = UINT_MAX;
> - else
> - *freq = df->max_freq;
> + *freq = DEVFREQ_MAX_FREQ;
> return 0;
> }
>
> diff --git a/drivers/devfreq/governor_powersave.c b/drivers/devfreq/governor_powersave.c
> index 0c42f23249ef..9e8897f5ac42 100644
> --- a/drivers/devfreq/governor_powersave.c
> +++ b/drivers/devfreq/governor_powersave.c
> @@ -20,7 +20,7 @@ static int devfreq_powersave_func(struct devfreq *df,
> * target callback should be able to get ceiling value as
> * said in devfreq.h
> */
> - *freq = df->min_freq;
> + *freq = DEVFREQ_MIN_FREQ;
> return 0;
> }
>
> diff --git a/drivers/devfreq/governor_simpleondemand.c b/drivers/devfreq/governor_simpleondemand.c
> index 28e0f2de7100..c0417f0e081e 100644
> --- a/drivers/devfreq/governor_simpleondemand.c
> +++ b/drivers/devfreq/governor_simpleondemand.c
> @@ -27,7 +27,6 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
> unsigned int dfso_upthreshold = DFSO_UPTHRESHOLD;
> unsigned int dfso_downdifferential = DFSO_DOWNDIFFERENCTIAL;
> struct devfreq_simple_ondemand_data *data = df->data;
> - unsigned long max = (df->max_freq) ? df->max_freq : UINT_MAX;
>
> err = devfreq_update_stats(df);
> if (err)
> @@ -47,7 +46,7 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
>
> /* Assume MAX if it is going to be divided by zero */
> if (stat->total_time == 0) {
> - *freq = max;
> + *freq = DEVFREQ_MAX_FREQ;
> return 0;
> }
>
> @@ -60,13 +59,13 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
> /* Set MAX if it's busy enough */
> if (stat->busy_time * 100 >
> stat->total_time * dfso_upthreshold) {
> - *freq = max;
> + *freq = DEVFREQ_MAX_FREQ;
> return 0;
> }
>
> /* Set MAX if we do not know the initial frequency */
> if (stat->current_frequency == 0) {
> - *freq = max;
> + *freq = DEVFREQ_MAX_FREQ;
> return 0;
> }
>
> @@ -85,11 +84,6 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
> b = div_u64(b, (dfso_upthreshold - dfso_downdifferential / 2));
> *freq = (unsigned long) b;
>
> - if (df->min_freq && *freq < df->min_freq)
> - *freq = df->min_freq;
> - if (df->max_freq && *freq > df->max_freq)
> - *freq = df->max_freq;
> -
> return 0;
> }
>
> diff --git a/drivers/devfreq/governor_userspace.c b/drivers/devfreq/governor_userspace.c
> index 080607c3f34d..378d84c011df 100644
> --- a/drivers/devfreq/governor_userspace.c
> +++ b/drivers/devfreq/governor_userspace.c
> @@ -26,19 +26,11 @@ static int devfreq_userspace_func(struct devfreq *df, unsigned long *freq)
> {
> struct userspace_data *data = df->data;
>
> - if (data->valid) {
> - unsigned long adjusted_freq = data->user_frequency;
> -
> - if (df->max_freq && adjusted_freq > df->max_freq)
> - adjusted_freq = df->max_freq;
> -
> - if (df->min_freq && adjusted_freq < df->min_freq)
> - adjusted_freq = df->min_freq;
> -
> - *freq = adjusted_freq;
> - } else {
> + if (data->valid)
> + *freq = data->user_frequency;
> + else
> *freq = df->previous_freq; /* No user freq specified yet */
> - }
> +
> return 0;
> }
>
On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote:
> Hi Chanwoo,
>
> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote:
> > Hi Matthias,
> >
> > On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote:
> > > Hi Chanwoo,
> > >
> > > On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
> > >> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote:
> > >>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
> > >>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
> > >>>>> Hi Matthias,
> > >>>>>
> > >>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote:
> > >>>>>> Hi Chanwoo,
> > >>>>>>
> > >>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
> > >>>>>>
> > >>>>>>> Firstly,
> > >>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
> > >>>>>>>
> > >>>>>>> devfreq already used the OPP interface as default. It means that
> > >>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
> > >>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
> > >>>>>>> drivers disable/enable the specific frequency, the devfreq core
> > >>>>>>> consider them.
> > >>>>>>>
> > >>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
> > >>>>>>> already support some interface to change the minimum/maximum frequency
> > >>>>>>> of devfreq device.
> > >>>>>>>
> > >>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()'
> > >>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
> > >>>>>>> change the minimum/maximum frequency through OPP interface.
> > >>>>>>>
> > >>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
> > >>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
> > >>>>>>> other way to change the minimum/maximum frequency.
> > >>>>>>
> > >>>>>> Using the OPP interface exclusively works as long as a
> > >>>>>> enabling/disabling of OPPs is limited to a single driver
> > >>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
> > >>>>>> involved you need a way to resolve conflicts, that's the purpose of
> > >>>>>> devfreq_verify_within_limits(). Please let me know if there are
> > >>>>>> existing mechanisms for conflict resolution that I overlooked.
> > >>>>>>
> > >>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
> > >>>>>> devfreq_verify_within_limits() instead of the OPP interface if
> > >>>>>> desired, however this seems beyond the scope of this series.
> > >>>>>
> > >>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
> > >>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
> > >>>>> happen.
> > >>>>
> > >>>> As long as drivers limit the max freq there is no conflict, the lowest
> > >>>> max freq wins. I expect this to be the usual case, apparently it
> > >>>> worked for cpufreq for 10+ years.
> > >>>>
> > >>>> However it is correct that there would be a conflict if a driver
> > >>>> requests a min freq that is higher than the max freq requested by
> > >>>> another. In this case devfreq_verify_within_limits() resolves the
> > >>>> conflict by raising p->max to the min freq. Not sure if this is
> > >>>> something that would ever occur in practice though.
> > >>>>
> > >>>> If we are really concerned about this case it would also be an option
> > >>>> to limit the adjustment to the max frequency.
> > >>>>
> > >>>>> To resolve the conflict for multiple device driver, maybe OPP interface
> > >>>>> have to support 'usage_count' such as clk_enable/disable().
> > >>>>
> > >>>> This would require supporting negative usage count values, since a OPP
> > >>>> should not be enabled if e.g. thermal enables it but the throttler
> > >>>> disabled it or viceversa.
> > >>>>
> > >>>> Theoretically there could also be conflicts, like one driver disabling
> > >>>> the higher OPPs and another the lower ones, with the outcome of all
> > >>>> OPPs being disabled, which would be a more drastic conflict resolution
> > >>>> than that of devfreq_verify_within_limits().
> > >>>>
> > >>>> Viresh, what do you think about an OPP usage count?
> > >>>
> > >>> Ping, can we try to reach a conclusion on this or at least keep the
> > >>> discussion going?
> > >>>
> > >>> Not that it matters, but my preferred solution continues to be
> > >>> devfreq_verify_within_limits(). It solves conflicts in some way (which
> > >>> could be adjusted if needed) and has proven to work in practice for
> > >>> 10+ years in a very similar sub-system.
> > >>
> > >> It is not true. Current cpufreq subsystem doesn't support external OPP
> > >> control to enable/disable the OPP entry. If some device driver
> > >> controls the OPP entry of cpufreq driver with opp_disable/enable(),
> > >> the operation is not working. Because cpufreq considers the limit
> > >> through 'cpufreq_verify_with_limits()' only.
> > >
> > > Ok, we can probably agree that using cpufreq_verify_with_limits()
> > > exclusively seems to have worked well for cpufreq, and that in their
> > > overall purpose cpufreq and devfreq are similar subsystems.
> > >
> > > The current throttler series with devfreq_verify_within_limits() takes
> > > the enabled OPPs into account, the lowest and highest OPP are used as
> > > a starting point for the frequency adjustment and (in theory) the
> > > frequency range should only be narrowed by
> > > devfreq_verify_within_limits().
> > >
> > >> As I already commented[1], there is different between cpufreq and devfreq.
> > >> [1] https://lkml.org/lkml/2018/7/4/80
> > >>
> > >> Already, subsystem already used OPP interface in order to control
> > >> specific OPP entry. I don't want to provide two outside method
> > >> to control the frequency of devfreq driver. It might make the confusion.
> > >
> > > I understand your point, it would indeed be preferable to have a
> > > single method. However I'm not convinced that the OPP interface is
> > > a suitable solution, as I exposed earlier in this thread (quoted
> > > below).
> > >
> > > I would like you to at least consider the possibility of changing
> > > drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
> > > Besides that it's not what is currently used, do you see any technical
> > > concerns that would make devfreq_verify_within_limits() an unsuitable
> > > or inferior solution?
> >
> > As we already discussed, devfreq_verify_within_limits() doesn't support
> > the multiple outside controllers (e.g., devfreq-cooling.c).
>
> That's incorrect, its purpose is precisely that.
>
> Are you suggesting that cpufreq with its use of
> cpufreq_verify_within_limits() (the inspiration for
> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c
> and other drivers when receiving a CPUFREQ_ADJUST event, essentially
> what I am proposing with DEVFREQ_ADJUST.
>
> Could you elaborate why this model wouldn't work for devfreq? "OPP
> interface is mandatory for devfreq" isn't really a technical argument,
> is it mandatory for any other reason than that it is the interface
> that is currently used?
>
> > After you are suggesting the throttler core, there are at least two
> > outside controllers (e.g., devfreq-cooling.c and throttler driver).
> > As I knew the problem about conflict, I cannot agree the temporary
> > method. OPP interface is mandatory for devfreq in order to control
> > the OPP (frequency/voltage). In this situation, we have to try to
> > find the method through OPP interface.
>
> What do you mean with "temporary method"?
>
> We can try to find a method through the OPP interface, but at this
> point I'm not convinced that it is technically necessary or even
> preferable.
>
> Another inconvenient of the OPP approach for both devfreq-cooling.c
> and the throttler is that they have to bother with disabling all OPPs
> above/below the max/min (they don't/shouldn't have to care), instead
> of just telling devfreq the max/min.
And a more important one: both drivers now have to keep track of which
OPPs they enabled/disabled previously; gone are the days of a simple
dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is
possible and not very complex to implement, but is it really the
best (or even a good) solution?
Hi Matthias,
On 2018-08-03 08:13, Matthias Kaehlcke wrote:
> Hi Chanwoo,
>
> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote:
>> Hi Matthias,
>>
>> On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote:
>>> Hi Chanwoo,
>>>
>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
>>>> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote:
>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
>>>>>>> Hi Matthias,
>>>>>>>
>>>>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote:
>>>>>>>> Hi Chanwoo,
>>>>>>>>
>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
>>>>>>>>
>>>>>>>>> Firstly,
>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
>>>>>>>>>
>>>>>>>>> devfreq already used the OPP interface as default. It means that
>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core
>>>>>>>>> consider them.
>>>>>>>>>
>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
>>>>>>>>> already support some interface to change the minimum/maximum frequency
>>>>>>>>> of devfreq device.
>>>>>>>>>
>>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()'
>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
>>>>>>>>> change the minimum/maximum frequency through OPP interface.
>>>>>>>>>
>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
>>>>>>>>> other way to change the minimum/maximum frequency.
>>>>>>>>
>>>>>>>> Using the OPP interface exclusively works as long as a
>>>>>>>> enabling/disabling of OPPs is limited to a single driver
>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of
>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are
>>>>>>>> existing mechanisms for conflict resolution that I overlooked.
>>>>>>>>
>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if
>>>>>>>> desired, however this seems beyond the scope of this series.
>>>>>>>
>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
>>>>>>> happen.
>>>>>>
>>>>>> As long as drivers limit the max freq there is no conflict, the lowest
>>>>>> max freq wins. I expect this to be the usual case, apparently it
>>>>>> worked for cpufreq for 10+ years.
>>>>>>
>>>>>> However it is correct that there would be a conflict if a driver
>>>>>> requests a min freq that is higher than the max freq requested by
>>>>>> another. In this case devfreq_verify_within_limits() resolves the
>>>>>> conflict by raising p->max to the min freq. Not sure if this is
>>>>>> something that would ever occur in practice though.
>>>>>>
>>>>>> If we are really concerned about this case it would also be an option
>>>>>> to limit the adjustment to the max frequency.
>>>>>>
>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface
>>>>>>> have to support 'usage_count' such as clk_enable/disable().
>>>>>>
>>>>>> This would require supporting negative usage count values, since a OPP
>>>>>> should not be enabled if e.g. thermal enables it but the throttler
>>>>>> disabled it or viceversa.
>>>>>>
>>>>>> Theoretically there could also be conflicts, like one driver disabling
>>>>>> the higher OPPs and another the lower ones, with the outcome of all
>>>>>> OPPs being disabled, which would be a more drastic conflict resolution
>>>>>> than that of devfreq_verify_within_limits().
>>>>>>
>>>>>> Viresh, what do you think about an OPP usage count?
>>>>>
>>>>> Ping, can we try to reach a conclusion on this or at least keep the
>>>>> discussion going?
>>>>>
>>>>> Not that it matters, but my preferred solution continues to be
>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which
>>>>> could be adjusted if needed) and has proven to work in practice for
>>>>> 10+ years in a very similar sub-system.
>>>>
>>>> It is not true. Current cpufreq subsystem doesn't support external OPP
>>>> control to enable/disable the OPP entry. If some device driver
>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(),
>>>> the operation is not working. Because cpufreq considers the limit
>>>> through 'cpufreq_verify_with_limits()' only.
>>>
>>> Ok, we can probably agree that using cpufreq_verify_with_limits()
>>> exclusively seems to have worked well for cpufreq, and that in their
>>> overall purpose cpufreq and devfreq are similar subsystems.
>>>
>>> The current throttler series with devfreq_verify_within_limits() takes
>>> the enabled OPPs into account, the lowest and highest OPP are used as
>>> a starting point for the frequency adjustment and (in theory) the
>>> frequency range should only be narrowed by
>>> devfreq_verify_within_limits().
>>>
>>>> As I already commented[1], there is different between cpufreq and devfreq.
>>>> [1] https://lkml.org/lkml/2018/7/4/80
>>>>
>>>> Already, subsystem already used OPP interface in order to control
>>>> specific OPP entry. I don't want to provide two outside method
>>>> to control the frequency of devfreq driver. It might make the confusion.
>>>
>>> I understand your point, it would indeed be preferable to have a
>>> single method. However I'm not convinced that the OPP interface is
>>> a suitable solution, as I exposed earlier in this thread (quoted
>>> below).
>>>
>>> I would like you to at least consider the possibility of changing
>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
>>> Besides that it's not what is currently used, do you see any technical
>>> concerns that would make devfreq_verify_within_limits() an unsuitable
>>> or inferior solution?
>>
>> As we already discussed, devfreq_verify_within_limits() doesn't support
>> the multiple outside controllers (e.g., devfreq-cooling.c).
>
> That's incorrect, its purpose is precisely that.
>
> Are you suggesting that cpufreq with its use of
> cpufreq_verify_within_limits() (the inspiration for
> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c
> and other drivers when receiving a CPUFREQ_ADJUST event, essentially
> what I am proposing with DEVFREQ_ADJUST.
>
> Could you elaborate why this model wouldn't work for devfreq? "OPP
I didn't say that this model doesn't work. As I already commented[1],
devfreq uses the OPP interface so that code outside of the devfreq driver
can control OPP entries. Because devfreq uses the OPP interface, I would
like to provide only the OPP method for controlling the frequency from
outside of devfreq.
[1] https://lkml.org/lkml/2018/7/4/80
> interface is mandatory for devfreq" isn't really a technical argument,
> is it mandatory for any other reason than that it is the interface
> that is currently used?
For controlling the frequency, the OPP interface is mandatory for devfreq.

cpufreq uses cpufreq_verify_within_limits(). If an outside driver disables
a specific OPP entry, cpufreq does not take that into account, because after
getting the frequencies from the devicetree cpufreq doesn't use the OPP
interface for disabling/enabling. cpufreq considers the range of
minimum/maximum frequency only if the outside driver uses
cpufreq_verify_within_limits(). The cpufreq core doesn't use the
'dev_pm_opp_find_*' functions, which means that the cpufreq code doesn't
consider the state set by opp_disable/enable.

devfreq uses the OPP interface. If an outside driver disables a specific OPP
entry, devfreq takes that into account.
To find the available minimum frequency, devfreq uses the OPP interface. (find_available_min_freq)
To find the available maximum frequency, devfreq uses the OPP interface. (find_available_max_freq)
To build the freq_table of a devfreq device, devfreq uses the OPP interface. (set_freq_table)
When an outside driver disables or enables an OPP entry, devfreq receives the
notification from the OPP interface and then updates
scaling_min_freq/scaling_max_freq by using the OPP interface. (devfreq_notifier_call)

Wherever devfreq uses scaling_min_freq/scaling_max_freq, it is in effect
using the OPP interface, because devfreq determines
scaling_min_freq/scaling_max_freq through the OPP interface.

If outside drivers use the OPP interface to control the frequency,
the devfreq core works well without any modification.
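
As a rough illustration of the point above, the following user-space
sketch derives the scaling limits from the enabled entries of an OPP
table. It only models the idea; struct opp_entry and the model_*
functions are hypothetical and are not the devfreq implementation,
which goes through the dev_pm_opp lookup helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one OPP table entry. */
struct opp_entry {
	unsigned long freq;
	bool enabled;
};

/* Lowest enabled frequency, or 0 if every entry is disabled. */
unsigned long model_available_min_freq(const struct opp_entry *t, size_t n)
{
	unsigned long min = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (t[i].enabled && (min == 0 || t[i].freq < min))
			min = t[i].freq;
	return min;
}

/* Highest enabled frequency, or 0 if every entry is disabled. */
unsigned long model_available_max_freq(const struct opp_entry *t, size_t n)
{
	unsigned long max = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (t[i].enabled && t[i].freq > max)
			max = t[i].freq;
	return max;
}
```

In this model, whenever an outside driver flips an enabled flag, the
core would recompute both values, in the same way devfreq refreshes
scaling_min_freq/scaling_max_freq from its OPP notifier.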
>
>> After you are suggesting the throttler core, there are at least two
>> outside controllers (e.g., devfreq-cooling.c and throttler driver).
>> As I knew the problem about conflict, I cannot agree the temporary
>> method. OPP interface is mandatory for devfreq in order to control
>> the OPP (frequency/voltage). In this situation, we have to try to
>> find the method through OPP interface.
>
> What do you mean with "temporary method"?
That expression may not have been appropriate. Please ignore it.
>
> We can try to find a method through the OPP interface, but at this
> point I'm not convinced that it is technically necessary or even
> preferable.
I have replied to this below.
>
> Another inconvenient of the OPP approach for both devfreq-cooling.c
> and the throttler is that they have to bother with disabling all OPPs
> above/below the max/min (they don't/shouldn't have to care), instead
> of just telling devfreq the max/min.
I don't think that matters. We can enable/disable the OPP entries by traversing them;
partition_enable_opps() in drivers/thermal/devfreq_cooling.c already does so.
>
>> We can refer to regulator/clock. Multiple device driver can use
>> the regulator/clock without any problem. I think that usage of OPP
>> is similiar with regulator/clock. As you mentioned, maybe OPP
>> would handle the negative count. Although opp_enable/opp_disable()
>> have to handle the negative count and opp_enable/opp_disable()
>> can support the multiple usage from device drivers, I think that
>> this approach is right.
>
> The regulator/clock approach with the typical usage counts seems more
> intuitive to me, personally I wouldn't write an interface with
> negative usage count if I could reasonably avoid it.
I think using a negative usage count is not a problem if it is required.
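For illustration only, one possible reading of such a usage count is
sketched below in user-space C. It is not the OPP library API: struct
opp_state and the *_counted() helpers are hypothetical names, and the
sketch assumes that every driver only re-enables an OPP it disabled
before (balanced calls). The counter starts at 0 and goes negative with
every outstanding disable request.

```c
#include <stdbool.h>

/*
 * Hypothetical per-OPP state. 'disable_count' is 0 when no driver has
 * disabled the OPP and drops by one for every outstanding disable.
 */
struct opp_state {
	unsigned long freq;
	int disable_count;
};

/* Hypothetical counted disable: record one more outstanding request. */
static void opp_disable_counted(struct opp_state *opp)
{
	opp->disable_count--;
}

/*
 * Hypothetical counted enable: drops one outstanding request. It is
 * assumed to balance an earlier disable issued by the same driver.
 */
static void opp_enable_counted(struct opp_state *opp)
{
	opp->disable_count++;
}

/* The OPP is usable only when no disable request is outstanding. */
static bool opp_is_available(const struct opp_state *opp)
{
	return opp->disable_count >= 0;
}
```

With this model, if thermal and the throttler both disable the same OPP
and thermal enables it again, the OPP stays unavailable until the
throttler also enables it.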
>
>>>> I want to use only OPP interface to enable/disable frequency
>>>> even if we have to modify the OPP interface.
>>>
>>> These are the concerns I raised earlier about a solution with OPP
>>> usage counts:
>>>
>>> "This would require supporting negative usage count values, since a OPP
>>> should not be enabled if e.g. thermal enables it but the throttler
>>> disabled it or viceversa.
>>
>> Already replied about negative usage count. I think that negative usage count
>> is not problem if this approach could resolve the issue.
>>
>>>
>>> Theoretically there could also be conflicts, like one driver disabling
>>> the higher OPPs and another the lower ones, with the outcome of all
>>> OPPs being disabled, which would be a more drastic conflict resolution
>>> than that of devfreq_verify_within_limits()."
>>>
>>> What do you think about these points?
>>
>> It depends on how to use OPP interface on multiple device driver.
>> Even if devfreq/opp provides the control method, outside device driver
>> are misusing them. It is problem of user.
>
> I wouldn't call it misusing if two independent drivers take
> contradictory actions on an interface that doesn't provide
> arbitration. How can driver A know that it shouldn't disable OPPs a, b
> and c because driver B disabled d, e and f? Who is misusing the
> interface, driver A or driver B?
Each outside driver has its own throttling policy to control OPP entries.
It doesn't care about the requirements of other drivers and cannot know them.
Only the devfreq core can recognize them.
>
>> Instead, if we use the OPP interface, we can check why OPP entry
>> is disabled or enabled through usage count.
>>
>>>
>>> The negative usage counts aren't necessarily a dealbreaker in a
>>> technical sense, though I'm not a friend of quirky interfaces that
>>> don't behave like a typical user would expect (e.g. an OPP isn't
>>> necessarily enabled after dev_pm_opp_enable()).
>>>
>>> I can sent an RFC with OPP usage counts, though due to the above
>>> concerns I have doubts it will be well received.
>>
>> Please add me to Cc list.
>
> Will do
OK. Thanks.
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi Matthias,
On 2018-08-03 08:36, Matthias Kaehlcke wrote:
> Hi Chanwoo,
>
> this patch and "PM / devfreq: Fix handling of min/max_freq == 0"
> address issues not directly related with the throttler. It seems it
> could still take a while for the throttler to move forward, do you
> want me to spin out these two patches so that they can get merged
> independently from the rest of the series?
How about resending the devfreq patches (patch1/2/3/4/6), which don't depend on
the throttler core, with my Reviewed-by tag? It may be easier to merge them through Myungjoo.
Regards,
Chanwoo Choi
>
> Thanks
>
> Matthias
>
> On Tue, Jul 03, 2018 at 04:46:56PM -0700, Matthias Kaehlcke wrote:
>> Several governors use the user space limits df->min/max_freq to adjust
>> the target frequency. This is not necessary, since update_devfreq()
>> already takes care of this. Instead the governor can request the available
>> min/max frequency by setting the target frequency to DEVFREQ_MIN/MAX_FREQ
>> and let update_devfreq() take care of any adjustments.
>>
>> Signed-off-by: Matthias Kaehlcke <[email protected]>
>> Reviewed-by: Brian Norris <[email protected]>
>> ---
>> Changes in v5:
>> - none
>>
>> Changes in v4:
>> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>>
>> Changes in v3:
>> - none
>>
>> Changes in v2:
>> - squashed "PM / devfreq: Remove redundant frequency adjustment from governors"
>> and "PM / devfreq: governors: Return device frequency limits instead of user
>> limits"
>> - updated subject and commit message
>> - use DEVFREQ_MIN/MAX_FREQ instead of df->scaling_min/max_freq
>> ---
>> drivers/devfreq/governor.h | 3 +++
>> drivers/devfreq/governor_performance.c | 5 +----
>> drivers/devfreq/governor_powersave.c | 2 +-
>> drivers/devfreq/governor_simpleondemand.c | 12 +++---------
>> drivers/devfreq/governor_userspace.c | 16 ++++------------
>> 5 files changed, 12 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h
>> index cfc50a61a90d..b81700244ce3 100644
>> --- a/drivers/devfreq/governor.h
>> +++ b/drivers/devfreq/governor.h
>> @@ -25,6 +25,9 @@
>> #define DEVFREQ_GOV_SUSPEND 0x4
>> #define DEVFREQ_GOV_RESUME 0x5
>>
>> +#define DEVFREQ_MIN_FREQ 0
>> +#define DEVFREQ_MAX_FREQ ULONG_MAX
>> +
>> /**
>> * struct devfreq_governor - Devfreq policy governor
>> * @node: list node - contains registered devfreq governors
>> diff --git a/drivers/devfreq/governor_performance.c b/drivers/devfreq/governor_performance.c
>> index 4d23ecfbd948..ded429fd51be 100644
>> --- a/drivers/devfreq/governor_performance.c
>> +++ b/drivers/devfreq/governor_performance.c
>> @@ -20,10 +20,7 @@ static int devfreq_performance_func(struct devfreq *df,
>> * target callback should be able to get floor value as
>> * said in devfreq.h
>> */
>> - if (!df->max_freq)
>> - *freq = UINT_MAX;
>> - else
>> - *freq = df->max_freq;
>> + *freq = DEVFREQ_MAX_FREQ;
>> return 0;
>> }
>>
>> diff --git a/drivers/devfreq/governor_powersave.c b/drivers/devfreq/governor_powersave.c
>> index 0c42f23249ef..9e8897f5ac42 100644
>> --- a/drivers/devfreq/governor_powersave.c
>> +++ b/drivers/devfreq/governor_powersave.c
>> @@ -20,7 +20,7 @@ static int devfreq_powersave_func(struct devfreq *df,
>> * target callback should be able to get ceiling value as
>> * said in devfreq.h
>> */
>> - *freq = df->min_freq;
>> + *freq = DEVFREQ_MIN_FREQ;
>> return 0;
>> }
>>
>> diff --git a/drivers/devfreq/governor_simpleondemand.c b/drivers/devfreq/governor_simpleondemand.c
>> index 28e0f2de7100..c0417f0e081e 100644
>> --- a/drivers/devfreq/governor_simpleondemand.c
>> +++ b/drivers/devfreq/governor_simpleondemand.c
>> @@ -27,7 +27,6 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
>> unsigned int dfso_upthreshold = DFSO_UPTHRESHOLD;
>> unsigned int dfso_downdifferential = DFSO_DOWNDIFFERENCTIAL;
>> struct devfreq_simple_ondemand_data *data = df->data;
>> - unsigned long max = (df->max_freq) ? df->max_freq : UINT_MAX;
>>
>> err = devfreq_update_stats(df);
>> if (err)
>> @@ -47,7 +46,7 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
>>
>> /* Assume MAX if it is going to be divided by zero */
>> if (stat->total_time == 0) {
>> - *freq = max;
>> + *freq = DEVFREQ_MAX_FREQ;
>> return 0;
>> }
>>
>> @@ -60,13 +59,13 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
>> /* Set MAX if it's busy enough */
>> if (stat->busy_time * 100 >
>> stat->total_time * dfso_upthreshold) {
>> - *freq = max;
>> + *freq = DEVFREQ_MAX_FREQ;
>> return 0;
>> }
>>
>> /* Set MAX if we do not know the initial frequency */
>> if (stat->current_frequency == 0) {
>> - *freq = max;
>> + *freq = DEVFREQ_MAX_FREQ;
>> return 0;
>> }
>>
>> @@ -85,11 +84,6 @@ static int devfreq_simple_ondemand_func(struct devfreq *df,
>> b = div_u64(b, (dfso_upthreshold - dfso_downdifferential / 2));
>> *freq = (unsigned long) b;
>>
>> - if (df->min_freq && *freq < df->min_freq)
>> - *freq = df->min_freq;
>> - if (df->max_freq && *freq > df->max_freq)
>> - *freq = df->max_freq;
>> -
>> return 0;
>> }
>>
>> diff --git a/drivers/devfreq/governor_userspace.c b/drivers/devfreq/governor_userspace.c
>> index 080607c3f34d..378d84c011df 100644
>> --- a/drivers/devfreq/governor_userspace.c
>> +++ b/drivers/devfreq/governor_userspace.c
>> @@ -26,19 +26,11 @@ static int devfreq_userspace_func(struct devfreq *df, unsigned long *freq)
>> {
>> struct userspace_data *data = df->data;
>>
>> - if (data->valid) {
>> - unsigned long adjusted_freq = data->user_frequency;
>> -
>> - if (df->max_freq && adjusted_freq > df->max_freq)
>> - adjusted_freq = df->max_freq;
>> -
>> - if (df->min_freq && adjusted_freq < df->min_freq)
>> - adjusted_freq = df->min_freq;
>> -
>> - *freq = adjusted_freq;
>> - } else {
>> + if (data->valid)
>> + *freq = data->user_frequency;
>> + else
>> *freq = df->previous_freq; /* No user freq specified yet */
>> - }
>> +
>> return 0;
>> }
>>
>
>
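As a rough user-space illustration of the idea in the quoted patch, the
sketch below lets the governors return only the DEVFREQ_MIN_FREQ /
DEVFREQ_MAX_FREQ sentinels and applies the limits in a single place.
The two defines are taken from the governor.h hunk above; struct limits,
clamp_target() and the *_target() helpers are hypothetical stand-ins,
and the actual update_devfreq() logic is not shown in this message.

```c
#include <limits.h>

/* Sentinels as defined in the quoted governor.h hunk. */
#define DEVFREQ_MIN_FREQ 0
#define DEVFREQ_MAX_FREQ ULONG_MAX

/* Hypothetical stand-in for the limits known to the core. */
struct limits {
	unsigned long min_freq;
	unsigned long max_freq;
};

/* Governor side (performance): ask for "as high as possible". */
static unsigned long performance_target(void)
{
	return DEVFREQ_MAX_FREQ;
}

/* Governor side (powersave): ask for "as low as possible". */
static unsigned long powersave_target(void)
{
	return DEVFREQ_MIN_FREQ;
}

/* Core side: the only place where the requested target is clamped. */
static unsigned long clamp_target(unsigned long freq, const struct limits *l)
{
	if (freq < l->min_freq)
		freq = l->min_freq;
	if (freq > l->max_freq)
		freq = l->max_freq;
	return freq;
}
```

The governors no longer need to know df->min_freq/max_freq; whatever
they request ends up inside the range held by the core.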
Hi Matthias,
On 2018-07-12 17:38, Chanwoo Choi wrote:
> Hi Matthias,
>
> On 2018-07-07 02:07, Matthias Kaehlcke wrote:
>> Hi,
>>
>> On Wed, Jul 04, 2018 at 11:51:30AM +0900, Chanwoo Choi wrote:
>>> Hi,
>>>
>>> On 2018-07-04 08:46, Matthias Kaehlcke wrote:
>>>> Move variables related with devfreq policy changes from struct devfreq
>>>> to the new struct devfreq_policy and add a policy field to struct devfreq.
>>>>
>>>> The following variables are moved:
>>>>
>>>> df->min/max_freq => p->user.min/max_freq
>>>> df->scaling_min/max_freq => p->devinfo.min/max_freq
>>>> df->governor => p->governor
>>>> df->governor_name => p->governor_name
>>>>
>>>> Signed-off-by: Matthias Kaehlcke <[email protected]>
>>>> Reviewed-by: Brian Norris <[email protected]>
>>>> ---
>>>> Changes in v5:
>>>> - none
>>>>
>>>> Changes in v4:
>>>> - added 'Reviewed-by: Brian Norris <[email protected]>' tag
>>>>
>>>> Changes in v3:
>>>> - none
>>>>
>>>> Changes in v2:
>>>> - performance, powersave and simpleondemand governors don't need changes
>>>> with "PM / devfreq: Don't adjust to user limits in governors"
>>>> - formatting fixes
>>>> ---
>>>> drivers/devfreq/devfreq.c | 137 ++++++++++++++++-------------
>>>> drivers/devfreq/governor_passive.c | 4 +-
>>>> include/linux/devfreq.h | 38 +++++---
>>
>>
>>>> 3 files changed, 103 insertions(+), 76 deletions(-)
>>>>
>>>
>>> (skip)
>>>
>>>>
>>>> diff --git a/drivers/devfreq/governor_passive.c b/drivers/devfreq/governor_passive.c
>>>> index 3bc29acbd54e..e0987c749ec2 100644
>>>> --- a/drivers/devfreq/governor_passive.c
>>>> +++ b/drivers/devfreq/governor_passive.c
>>>> @@ -99,12 +99,12 @@ static int update_devfreq_passive(struct devfreq *devfreq, unsigned long freq)
>>>> {
>>>> int ret;
>>>>
>>>> - if (!devfreq->governor)
>>>> + if (!devfreq->policy.governor)
>>>> return -EINVAL;
>>>>
>>>> mutex_lock_nested(&devfreq->lock, SINGLE_DEPTH_NESTING);
>>>>
>>>> - ret = devfreq->governor->get_target_freq(devfreq, &freq);
>>>> + ret = devfreq->policy.governor->get_target_freq(devfreq, &freq);
>>>> if (ret < 0)
>>>> goto out;
>>>>
>>>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>>>> index 3aae5b3af87c..9bf23b976f4d 100644
>>>> --- a/include/linux/devfreq.h
>>>> +++ b/include/linux/devfreq.h
>>>> @@ -109,6 +109,30 @@ struct devfreq_dev_profile {
>>>> unsigned int max_state;
>>>> };
>>>>
>>>> +/**
>>>> + * struct devfreq_freq_limits - Devfreq frequency limits
>>>> + * @min_freq: minimum frequency
>>>> + * @max_freq: maximum frequency
>>>> + */
>>>> +struct devfreq_freq_limits {
>>>> + unsigned long min_freq;
>>>> + unsigned long max_freq;
>>>> +};
>>>> +
>>>> +/**
>>>> + * struct devfreq_policy - Devfreq policy
>>>> + * @user: frequency limits requested by the user
>>>> + * @devinfo: frequency limits of the device (available OPPs)
>>>> + * @governor: method how to choose frequency based on the usage.
>>>
>>> nitpick. remove '.' on the end of line.
>>
>> Ok
>>
>>>> + * @governor_name: devfreq governor name for use with this devfreq
>>>> + */
>>>> +struct devfreq_policy {
>>>> + struct devfreq_freq_limits user;
>>>> + struct devfreq_freq_limits devinfo;
>>>> + const struct devfreq_governor *governor;
>>>> + char governor_name[DEVFREQ_NAME_LEN];
>>>> +};
>>>> +
>>>> /**
>>>> * struct devfreq - Device devfreq structure
>>>> * @node: list node - contains the devices with devfreq that have been
>>>> @@ -117,8 +141,6 @@ struct devfreq_dev_profile {
>>>> * @dev: device registered by devfreq class. dev.parent is the device
>>>> * using devfreq.
>>>> * @profile: device-specific devfreq profile
>>>> - * @governor: method how to choose frequency based on the usage.
>>>> - * @governor_name: devfreq governor name for use with this devfreq
>>>> * @nb: notifier block used to notify devfreq object that it should
>>>> * reevaluate operable frequencies. Devfreq users may use
>>>> * devfreq.nb to the corresponding register notifier call chain.
>>>> @@ -126,10 +148,7 @@ struct devfreq_dev_profile {
>>>> * @previous_freq: previously configured frequency value.
>>>> * @data: Private data of the governor. The devfreq framework does not
>>>> * touch this.
>>>> - * @min_freq: Limit minimum frequency requested by user (0: none)
>>>> - * @max_freq: Limit maximum frequency requested by user (0: none)
>>>> - * @scaling_min_freq: Limit minimum frequency requested by OPP interface
>>>> - * @scaling_max_freq: Limit maximum frequency requested by OPP interface
>>>> + * @policy: Policy for frequency adjustments
>>>
>>> The devfreq_policy contains the range of frequency and governor information.
>>> But, this description focus on the frequency. You need to explain the more
>>> correct description of 'policy'.
>>
>> I wouldn't say that the focus is on 'frequency', but on 'frequency
>> adjustments', and the governor is an integral part of them.
>
> OK. I agree your original description.
>
>>
>> I can change it to "Policy for frequency adjustments, including
>> frequency limits and the governor" if you prefer. I'm open to other
>> suggestions.
>
> (snip)
>
When you resend the next version of this patch with my comments addressed,
except for the one on the description of 'policy', feel free to add my Reviewed-by tag.
Reviewed-by: Chanwoo Choi <[email protected]>
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi Matthias,
On 2018-08-03 08:48, Matthias Kaehlcke wrote:
> On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote:
>> Hi Chanwoo,
>>
>> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote:
>>> Hi Matthias,
>>>
>>> On 2018-08-02 02:08, Matthias Kaehlcke wrote:
>>>> Hi Chanwoo,
>>>>
>>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
>>>>> On 2018-08-01 04:39, Matthias Kaehlcke wrote:
>>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
>>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
>>>>>>>> Hi Matthias,
>>>>>>>>
>>>>>>>> On 2018-07-07 02:53, Matthias Kaehlcke wrote:
>>>>>>>>> Hi Chanwoo,
>>>>>>>>>
>>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
>>>>>>>>>
>>>>>>>>>> Firstly,
>>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
>>>>>>>>>>
>>>>>>>>>> devfreq already used the OPP interface as default. It means that
>>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
>>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
>>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core
>>>>>>>>>> consider them.
>>>>>>>>>>
>>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
>>>>>>>>>> already support some interface to change the minimum/maximum frequency
>>>>>>>>>> of devfreq device.
>>>>>>>>>>
>>>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()'
>>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
>>>>>>>>>> change the minimum/maximum frequency through OPP interface.
>>>>>>>>>>
>>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
>>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
>>>>>>>>>> other way to change the minimum/maximum frequency.
>>>>>>>>>
>>>>>>>>> Using the OPP interface exclusively works as long as a
>>>>>>>>> enabling/disabling of OPPs is limited to a single driver
>>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
>>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of
>>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are
>>>>>>>>> existing mechanisms for conflict resolution that I overlooked.
>>>>>>>>>
>>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
>>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if
>>>>>>>>> desired, however this seems beyond the scope of this series.
>>>>>>>>
>>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
>>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
>>>>>>>> happen.
>>>>>>>
>>>>>>> As long as drivers limit the max freq there is no conflict, the lowest
>>>>>>> max freq wins. I expect this to be the usual case, apparently it
>>>>>>> worked for cpufreq for 10+ years.
>>>>>>>
>>>>>>> However it is correct that there would be a conflict if a driver
>>>>>>> requests a min freq that is higher than the max freq requested by
>>>>>>> another. In this case devfreq_verify_within_limits() resolves the
>>>>>>> conflict by raising p->max to the min freq. Not sure if this is
>>>>>>> something that would ever occur in practice though.
>>>>>>>
>>>>>>> If we are really concerned about this case it would also be an option
>>>>>>> to limit the adjustment to the max frequency.
>>>>>>>
>>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface
>>>>>>>> have to support 'usage_count' such as clk_enable/disable().
>>>>>>>
>>>>>>> This would require supporting negative usage count values, since a OPP
>>>>>>> should not be enabled if e.g. thermal enables it but the throttler
>>>>>>> disabled it or viceversa.
>>>>>>>
>>>>>>> Theoretically there could also be conflicts, like one driver disabling
>>>>>>> the higher OPPs and another the lower ones, with the outcome of all
>>>>>>> OPPs being disabled, which would be a more drastic conflict resolution
>>>>>>> than that of devfreq_verify_within_limits().
>>>>>>>
>>>>>>> Viresh, what do you think about an OPP usage count?
>>>>>>
>>>>>> Ping, can we try to reach a conclusion on this or at least keep the
>>>>>> discussion going?
>>>>>>
>>>>>> Not that it matters, but my preferred solution continues to be
>>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which
>>>>>> could be adjusted if needed) and has proven to work in practice for
>>>>>> 10+ years in a very similar sub-system.
>>>>>
>>>>> It is not true. Current cpufreq subsystem doesn't support external OPP
>>>>> control to enable/disable the OPP entry. If some device driver
>>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(),
>>>>> the operation is not working. Because cpufreq considers the limit
>>>>> through 'cpufreq_verify_with_limits()' only.
>>>>
>>>> Ok, we can probably agree that using cpufreq_verify_with_limits()
>>>> exclusively seems to have worked well for cpufreq, and that in their
>>>> overall purpose cpufreq and devfreq are similar subsystems.
>>>>
>>>> The current throttler series with devfreq_verify_within_limits() takes
>>>> the enabled OPPs into account, the lowest and highest OPP are used as
>>>> a starting point for the frequency adjustment and (in theory) the
>>>> frequency range should only be narrowed by
>>>> devfreq_verify_within_limits().
>>>>
>>>>> As I already commented[1], there is different between cpufreq and devfreq.
>>>>> [1] https://lkml.org/lkml/2018/7/4/80
>>>>>
>>>>> Already, subsystem already used OPP interface in order to control
>>>>> specific OPP entry. I don't want to provide two outside method
>>>>> to control the frequency of devfreq driver. It might make the confusion.
>>>>
>>>> I understand your point, it would indeed be preferable to have a
>>>> single method. However I'm not convinced that the OPP interface is
>>>> a suitable solution, as I exposed earlier in this thread (quoted
>>>> below).
>>>>
>>>> I would like you to at least consider the possibility of changing
>>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
>>>> Besides that it's not what is currently used, do you see any technical
>>>> concerns that would make devfreq_verify_within_limits() an unsuitable
>>>> or inferior solution?
>>>
>>> As we already discussed, devfreq_verify_within_limits() doesn't support
>>> the multiple outside controllers (e.g., devfreq-cooling.c).
>>
>> That's incorrect, its purpose is precisely that.
>>
>> Are you suggesting that cpufreq with its use of
>> cpufreq_verify_within_limits() (the inspiration for
>> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c
>> and other drivers when receiving a CPUFREQ_ADJUST event, essentially
>> what I am proposing with DEVFREQ_ADJUST.
>>
>> Could you elaborate why this model wouldn't work for devfreq? "OPP
>> interface is mandatory for devfreq" isn't really a technical argument,
>> is it mandatory for any other reason than that it is the interface
>> that is currently used?
>>
>>> After you are suggesting the throttler core, there are at least two
>>> outside controllers (e.g., devfreq-cooling.c and throttler driver).
>>> As I knew the problem about conflict, I cannot agree the temporary
>>> method. OPP interface is mandatory for devfreq in order to control
>>> the OPP (frequency/voltage). In this situation, we have to try to
>>> find the method through OPP interface.
>>
>> What do you mean with "temporary method"?
>>
>> We can try to find a method through the OPP interface, but at this
>> point I'm not convinced that it is technically necessary or even
>> preferable.
>>
>> Another inconvenient of the OPP approach for both devfreq-cooling.c
>> and the throttler is that they have to bother with disabling all OPPs
>> above/below the max/min (they don't/shouldn't have to care), instead
>> of just telling devfreq the max/min.
>
> And a more important one: both drivers now have to keep track which
> OPPs they enabled/disabled previously, done are the days of a simple
> dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is
> possible and not very complex to implement, but is it really the
> best/a good solution?
As I replied just before, each outside driver has its own throttling
policy to control OPP entries. It doesn't care about the requirements of
other drivers and cannot know them. Only the devfreq core can recognize
them, and it then considers only the enabled OPP entries, ignoring the disabled ones.
Example 1:
        | devfreq-cooling | throttler
-------------------------------------
500MHz  | disabled        | disabled
400MHz  | disabled        | disabled
300MHz  |                 | disabled
200MHz  |                 |
100MHz  |                 |
=> the devfreq driver can use only 100/200MHz

Example 2:
        | devfreq-cooling | throttler
-------------------------------------
500MHz  | disabled        | disabled
400MHz  | disabled        |
300MHz  | disabled        |
200MHz  |                 |
100MHz  |                 |
=> the devfreq driver can use only 100/200MHz

Example 3:
        | devfreq-cooling | throttler
-------------------------------------
500MHz  | disabled        | disabled
400MHz  |                 |
300MHz  |                 |
200MHz  |                 | disabled
100MHz  |                 | disabled
=> the devfreq driver can use only 300/400MHz
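
The three examples can be reproduced with a small user-space sketch.
It is not kernel code: the OPP table, the per-driver bitmasks and the
helper names usable_opps() and print_usable() are assumptions made only
for illustration. An OPP stays usable only if no outside driver has
disabled it.

```c
#include <stdio.h>

#define NUM_OPPS 5

/* Hypothetical OPP table in MHz, highest first, as in the tables above. */
static const unsigned int opp_mhz[NUM_OPPS] = { 500, 400, 300, 200, 100 };

/*
 * Each outside driver is modelled as a bitmask of the OPPs it disabled
 * (bit i corresponds to opp_mhz[i]). The usable set is everything that
 * is not in the union of the two masks.
 */
static unsigned int usable_opps(unsigned int cooling_disabled,
				unsigned int throttler_disabled)
{
	unsigned int all = (1u << NUM_OPPS) - 1;

	return all & ~(cooling_disabled | throttler_disabled);
}

/* Print the frequencies that are still usable, highest first. */
static void print_usable(unsigned int mask)
{
	for (int i = 0; i < NUM_OPPS; i++)
		if (mask & (1u << i))
			printf("%uMHz ", opp_mhz[i]);
	printf("\n");
}
```

For example 1, usable_opps(0x03, 0x07) returns the mask 0x18, which
print_usable() shows as 200MHz and 100MHz. Example 3 corresponds to
usable_opps(0x01, 0x19), which leaves 400MHz and 300MHz.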
--
Best Regards,
Chanwoo Choi
Samsung Electronics
On Fri, Aug 03, 2018 at 09:03:30AM +0900, Chanwoo Choi wrote:
> Hi Matthias,
>
> On 2018-08-03 08:36, Matthias Kaehlcke wrote:
> > Hi Chanwoo,
> >
> > this patch and "PM / devfreq: Fix handling of min/max_freq == 0"
> > address issues not directly related with the throttler. It seems it
> > could still take a while for the throttler to move forward, do you
> > want me to spin out these two patches so that they can get merged
> > independently from the rest of the series?
>
> How about resend the devfreq patches(patch1/2/3/4/6) which don't depend on
> throttler core with my reviewed tag? Maybe, it is easy to merge them through Myungjoo.
Sure, I can do this if you think it is reasonable to merge all these
patches without the throttler.
These are the patches we are talking about and my interpretation of
their status:
[01] PM / devfreq: Init user limits from OPP limits, not viceversa
landed in Rafael's tree
[02] PM / devfreq: Fix handling of min/max_freq == 0
independent fix, can land
[03] PM / devfreq: Don't adjust to user limits in governors
independent improvement, can land
[04] PM / devfreq: Add struct devfreq_policy
edge case, can land if devfreq maintainers think that factoring out
some fields to the policy struct is an improvement independently of
the throttler
[05] PM / devfreq: Add support for policy notifiers
under heavy discussion ;-), can't land
[06] PM / devfreq: Make update_devfreq() public
has no user without the throttler, not sure if it should be merged
without it. Up to the devfreq maintainers.
Please let me know what you think.
Thanks
Matthias
Hi Matthias,
On 2018-08-03 09:24, Matthias Kaehlcke wrote:
> On Fri, Aug 03, 2018 at 09:03:30AM +0900, Chanwoo Choi wrote:
>> Hi Matthias,
>>
>> On 2018-08-03 08:36, Matthias Kaehlcke wrote:
>>> Hi Chanwoo,
>>>
>>> this patch and "PM / devfreq: Fix handling of min/max_freq == 0"
>>> address issues not directly related with the throttler. It seems it
>>> could still take a while for the throttler to move forward, do you
>>> want me to spin out these two patches so that they can get merged
>>> independently from the rest of the series?
>>
>> How about resend the devfreq patches(patch1/2/3/4/6) which don't depend on
>> throttler core with my reviewed tag? Maybe, it is easy to merge them through Myungjoo.
>
> Sure, I can do this if you think it is reasonable to merge all these
> patches without the throttler.
IMO, patch1/2/3/6 look good. I replied to them with my Reviewed-by tag.
patch4 defines 'struct devfreq_policy', and patch5 in the original series
sends the notification with 'struct devfreq_policy'.
But when we discussed patch5, it seemed better for the new devfreq
notification to send 'struct devfreq_freq_limits' rather than
'struct devfreq_policy'. So patch4 needs more discussion. If Myungjoo
agrees with the current patch4, I'm okay with it.
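
For reference, a user-space sketch of how a notifier that receives
'struct devfreq_freq_limits' could narrow the range. The struct layout
is taken from patch4; verify_within_limits() is modelled on
cpufreq_verify_within_limits() and is only an assumption about how the
devfreq variant discussed for patch5 behaves.

```c
/* Layout as proposed in patch4. */
struct devfreq_freq_limits {
	unsigned long min_freq;
	unsigned long max_freq;
};

/*
 * Narrow the limits to [min, max]. Modelled on
 * cpufreq_verify_within_limits(); the devfreq counterpart is assumed
 * to behave the same way.
 */
static void verify_within_limits(struct devfreq_freq_limits *l,
				 unsigned long min, unsigned long max)
{
	if (l->min_freq < min)
		l->min_freq = min;
	if (l->max_freq < min)
		l->max_freq = min;	/* conflict: max is raised to min */
	if (l->min_freq > max)
		l->min_freq = max;
	if (l->max_freq > max)
		l->max_freq = max;
	if (l->min_freq > l->max_freq)
		l->min_freq = l->max_freq;
}
```

When two callers only lower the maximum, the lowest maximum wins. When
one caller requests a minimum above the current maximum, the maximum is
raised to that minimum.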
>
> These are the patches we are talking about and my interpretation of
> their status:
>
> [01] PM / devfreq: Init user limits from OPP limits, not viceversa
> landed in Rafaels tree
>
> [02] PM / devfreq: Fix handling of min/max_freq == 0
> independent fix, can land
>
> [03] PM / devfreq: Don't adjust to user limits in governors
> independent improvement, can land
>
> [04] PM / devfreq: Add struct devfreq_policy
> edge case, can land if devfreq maintainers think that factoring out
> some fields to the policy struct is an improvement independently of
> the throttler
>
> [05] PM / devfreq: Add support for policy notifiers
> under heavy discussion ;-), can't land
>
> [06] PM / devfreq: Make update_devfreq() public
> has no user without the throttler, not sure if it should be merged
> without it. up to devfreq maintainers.
>
> Please let me know what you think
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi Chanwoo,
On Fri, Aug 03, 2018 at 08:56:57AM +0900, Chanwoo Choi wrote:
> Hi Matthias,
>
> On 2018-08-03 08:13, Matthias Kaehlcke wrote:
> > Hi Chanwoo,
> >
> > On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote:
> >> Hi Matthias,
> >>
> >> On 2018-08-02 02:08, Matthias Kaehlcke wrote:
> >>> Hi Chanwoo,
> >>>
> >>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
> >>>> On 2018-08-01 04:39, Matthias Kaehlcke wrote:
> >>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
> >>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
> >>>>>>> Hi Matthias,
> >>>>>>>
> >>>>>>> On 2018-07-07 02:53, Matthias Kaehlcke wrote:
> >>>>>>>> Hi Chanwoo,
> >>>>>>>>
> >>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
> >>>>>>>>
> >>>>>>>>> Firstly,
> >>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
> >>>>>>>>>
> >>>>>>>>> devfreq already used the OPP interface as default. It means that
> >>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
> >>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
> >>>>>>>>> drivers disable/enable the specific frequency, the devfreq core
> >>>>>>>>> consider them.
> >>>>>>>>>
> >>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
> >>>>>>>>> already support some interface to change the minimum/maximum frequency
> >>>>>>>>> of devfreq device.
> >>>>>>>>>
> >>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()'
> >>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
> >>>>>>>>> change the minimum/maximum frequency through OPP interface.
> >>>>>>>>>
> >>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
> >>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
> >>>>>>>>> other way to change the minimum/maximum frequency.
> >>>>>>>>
> >>>>>>>> Using the OPP interface exclusively works as long as a
> >>>>>>>> enabling/disabling of OPPs is limited to a single driver
> >>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
> >>>>>>>> involved you need a way to resolve conflicts, that's the purpose of
> >>>>>>>> devfreq_verify_within_limits(). Please let me know if there are
> >>>>>>>> existing mechanisms for conflict resolution that I overlooked.
> >>>>>>>>
> >>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
> >>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if
> >>>>>>>> desired, however this seems beyond the scope of this series.
> >>>>>>>
> >>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
> >>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
> >>>>>>> happen.
> >>>>>>
> >>>>>> As long as drivers limit the max freq there is no conflict, the lowest
> >>>>>> max freq wins. I expect this to be the usual case, apparently it
> >>>>>> worked for cpufreq for 10+ years.
> >>>>>>
> >>>>>> However it is correct that there would be a conflict if a driver
> >>>>>> requests a min freq that is higher than the max freq requested by
> >>>>>> another. In this case devfreq_verify_within_limits() resolves the
> >>>>>> conflict by raising p->max to the min freq. Not sure if this is
> >>>>>> something that would ever occur in practice though.
> >>>>>>
> >>>>>> If we are really concerned about this case it would also be an option
> >>>>>> to limit the adjustment to the max frequency.
> >>>>>>
> >>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface
> >>>>>>> have to support 'usage_count' such as clk_enable/disable().
> >>>>>>
> >>>>>> This would require supporting negative usage count values, since a OPP
> >>>>>> should not be enabled if e.g. thermal enables it but the throttler
> >>>>>> disabled it or viceversa.
> >>>>>>
> >>>>>> Theoretically there could also be conflicts, like one driver disabling
> >>>>>> the higher OPPs and another the lower ones, with the outcome of all
> >>>>>> OPPs being disabled, which would be a more drastic conflict resolution
> >>>>>> than that of devfreq_verify_within_limits().
> >>>>>>
> >>>>>> Viresh, what do you think about an OPP usage count?
> >>>>>
> >>>>> Ping, can we try to reach a conclusion on this or at least keep the
> >>>>> discussion going?
> >>>>>
> >>>>> Not that it matters, but my preferred solution continues to be
> >>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which
> >>>>> could be adjusted if needed) and has proven to work in practice for
> >>>>> 10+ years in a very similar sub-system.
> >>>>
> >>>> It is not true. Current cpufreq subsystem doesn't support external OPP
> >>>> control to enable/disable the OPP entry. If some device driver
> >>>> controls the OPP entry of cpufreq driver with opp_disable/enable(),
> >>>> the operation is not working. Because cpufreq considers the limit
> >>>> through 'cpufreq_verify_with_limits()' only.
> >>>
> >>> Ok, we can probably agree that using cpufreq_verify_with_limits()
> >>> exclusively seems to have worked well for cpufreq, and that in their
> >>> overall purpose cpufreq and devfreq are similar subsystems.
> >>>
> >>> The current throttler series with devfreq_verify_within_limits() takes
> >>> the enabled OPPs into account, the lowest and highest OPP are used as
> >>> a starting point for the frequency adjustment and (in theory) the
> >>> frequency range should only be narrowed by
> >>> devfreq_verify_within_limits().
> >>>
> >>>> As I already commented[1], there is different between cpufreq and devfreq.
> >>>> [1] https://lkml.org/lkml/2018/7/4/80
> >>>>
> >>>> Already, subsystem already used OPP interface in order to control
> >>>> specific OPP entry. I don't want to provide two outside method
> >>>> to control the frequency of devfreq driver. It might make the confusion.
> >>>
> >>> I understand your point, it would indeed be preferable to have a
> >>> single method. However I'm not convinced that the OPP interface is
> >>> a suitable solution, as I exposed earlier in this thread (quoted
> >>> below).
> >>>
> >>> I would like you to at least consider the possibility of changing
> >>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
> >>> Besides that it's not what is currently used, do you see any technical
> >>> concerns that would make devfreq_verify_within_limits() an unsuitable
> >>> or inferior solution?
> >>
> >> As we already discussed, devfreq_verify_within_limits() doesn't support
> >> the multiple outside controllers (e.g., devfreq-cooling.c).
> >
> > That's incorrect, its purpose is precisely that.
> >
> > Are you suggesting that cpufreq with its use of
> > cpufreq_verify_within_limits() (the inspiration for
> > devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c
> > and other drivers when receiving a CPUFREQ_ADJUST event, essentially
> > what I am proposing with DEVFREQ_ADJUST.
> >
> > Could you elaborate why this model wouldn't work for devfreq? "OPP
>
> I don't mention that this model is not working. As I already commented[1],
> devfreq used OPP interface to control OPP entry on outside of devfreq driver.
> Because devfreq used OPP interface, I hope to provide only OPP method
> to control the frequency on outside of devfreq.
> [1] https://lkml.org/lkml/2018/7/4/80
>
> > interface is mandatory for devfreq" isn't really a technical argument,
> > is it mandatory for any other reason than that it is the interface
> > that is currently used?
>
> In case of controlling the frequency, OPP interface is mandatory for devfreq.
>
> cpufreq used cpufreq_verify_within_limits(). If outside driver disable
> specific OPP entry, cpufreq don't consider them because after getting the frequency
> from devicetree, cpufreq don't use the OPP interface for disabling/enabling.
> Only if outside driver used cpufreq_verify_within_limits(), cpufreq consider
> the range of minimum/maximum frequency. cpufreq core doesn't use 'dev_pm_opp_find_*'
> function. It means that cpufreq code doesn't consider the state of opp_disable/enable.
>
> devfreq used OPP interface. If outside driver disable specific OPP entry, devfreq consider them.

What exactly is this 'outside driver' you are referring to? The driver
that 'owns' the devfreq device, e.g. a GPU driver? Or just any
non-devfreq driver, like devfreq-cooling.c?

If it's the first case then this isn't currently working as intended
when the devfreq device is used as a cooling device, since the cooling
device would overwrite the state set by the 'owner' in
partition_enable_opps().

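
The overwrite described above can be illustrated with a minimal userspace
sketch. It is not kernel code: struct model_opp and the model_* functions
are hypothetical names, and the loop only follows the simplified form of
partition_enable_opps() discussed in this thread (enable everything at or
below the cap, disable everything above it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of an OPP; not the kernel's struct dev_pm_opp. */
struct model_opp {
	unsigned long freq_mhz;
	bool enabled;
};

/*
 * Simplified model of partition_enable_opps(): every OPP at or below
 * max_mhz is enabled and every OPP above it is disabled, regardless of
 * who changed the OPP before.
 */
void model_partition_enable_opps(struct model_opp *opps, size_t n,
				 unsigned long max_mhz)
{
	assert(opps != NULL);

	for (size_t i = 0; i < n; i++)
		opps[i].enabled = opps[i].freq_mhz <= max_mhz;
}

/* Returns true if the owner's decision to disable 300 MHz survives. */
bool model_owner_disable_survives(void)
{
	struct model_opp opps[] = {
		{ 100, true }, { 200, true }, { 300, true },
		{ 400, true }, { 500, true },
	};
	size_t n = sizeof(opps) / sizeof(opps[0]);

	opps[2].enabled = false;			/* owner disables 300 MHz */
	model_partition_enable_opps(opps, n, 400);	/* cooling caps at 400 MHz */

	return !opps[2].enabled;
}
```

In this model model_owner_disable_survives() returns false: the cooling
pass re-enables the 300 MHz OPP that the owner had disabled.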
> When find available minimum frequency, devfreq used OPP interface. (find_available_min_freq)
> When find available maximum frequency, devfreq used OPP interface. (find_available_max_freq)
> When make freq_table of devfreq device, devfreq used OPP interface. (set_freq_table)
> When outside driver disable or enable OPP entry, devfreq receives the notification
> from OPP interface and then update the scaling_min_freq/scaling_max_freq by using
> OPP interface. (devfreq_notifier_call)
> At this point of using scaling_min_freq/scaling_max_freq on devfreq, it indicates
> that devfreq used OPP interface because devfreq tried to find scaling_min_freq/scaling_max_freq
> through OPP interface.
>
> If outside driver use OPP interface in order to control frequency,
> devfreq core is well working without any modification of devfreq
> core.

Thanks for elaborating!

I understand that this is how it currently works, but unless I'm
missing something about the outside driver disabling an OPP I still
essentially read this as 'the OPP interface is mandatory because it's
what is currently used by the devfreq core to limit the frequency
range', rather than that using the OPP interface makes it possible to
provide a particular feature or is inherently better in some other way.

I don't propose to completely strip the OPP interface out of devfreq,
but mainly to switch devfreq-cooling.c to
devfreq_verify_within_limits() to avoid having two mechanisms for
limiting the frequency range. Besides being simpler, this would make
it possible to support the case where the 'owner' disables a certain
OPP and devfreq respects that. The code required in the devfreq core
to support this would be minimal (this patch).

> >> After you are suggesting the throttler core, there are at least two
> >> outside controllers (e.g., devfreq-cooling.c and throttler driver).
> >> As I knew the problem about conflict, I cannot agree the temporary
> >> method. OPP interface is mandatory for devfreq in order to control
> >> the OPP (frequency/voltage). In this situation, we have to try to
> >> find the method through OPP interface.
> >
> > What do you mean with "temporary method"?
>
> this expression might be not proper. Please ignore this expression.
>
> >
> > We can try to find a method through the OPP interface, but at this
> > point I'm not convinced that it is technically necessary or even
> > preferable.
>
> I replied it about this as following.
>
> >
> > Another inconvenient of the OPP approach for both devfreq-cooling.c
> > and the throttler is that they have to bother with disabling all OPPs
> > above/below the max/min (they don't/shouldn't have to care), instead
> > of just telling devfreq the max/min.
>
> I think it doesn't matter. We can enable/disable the OPP entry by traversing.
> partition_enable_opps() in drivers/thermal/devfreq_cooling.c have already done so.
>
> >
> >> We can refer to regulator/clock. Multiple device driver can use
> >> the regulator/clock without any problem. I think that usage of OPP
> >> is similar with regulator/clock. As you mentioned, maybe OPP
> >> would handle the negative count. Although opp_enable/opp_disable()
> >> have to handle the negative count and opp_enable/opp_disable()
> >> can support the multiple usage from device drivers, I think that
> >> this approach is right.
> >
> > The regulator/clock approach with the typical usage counts seems more
> > intuitive to me, personally I wouldn't write an interface with
> > negative usage count if I could reasonably avoid it.
>
> I think the use of negative usage count is not problem if it's required.
>
> >
> >>>> I want to use only OPP interface to enable/disable frequency
> >>>> even if we have to modify the OPP interface.
> >>>
> >>> These are the concerns I raised earlier about a solution with OPP
> >>> usage counts:
> >>>
> >>> "This would require supporting negative usage count values, since a OPP
> >>> should not be enabled if e.g. thermal enables it but the throttler
> >>> disabled it or viceversa.
> >>
> >> Already replied about negative usage count. I think that negative usage count
> >> is not problem if this approach could resolve the issue.
> >>
> >>>
> >>> Theoretically there could also be conflicts, like one driver disabling
> >>> the higher OPPs and another the lower ones, with the outcome of all
> >>> OPPs being disabled, which would be a more drastic conflict resolution
> >>> than that of devfreq_verify_within_limits()."
> >>>
> >>> What do you think about these points?
> >>
> >> It depends on how to use OPP interface on multiple device driver.
> >> Even if devfreq/opp provides the control method, outside device driver
> >> are misusing them. It is problem of user.
> >
> > I wouldn't call it misusing if two independent drivers take
> > contradictory actions on an interface that doesn't provide
> > arbitration. How can driver A know that it shouldn't disable OPPs a, b
> > and c because driver B disabled d, e and f? Who is misusing the
> > interface, driver A or driver B?
>
> Each outside driver has their own throttling policy to control OPP entries.
> They don't care the requirement of other driver and cannot know the requirement
> of other driver. devfreq core can only recognize them.
>
> >
> >> Instead, if we use the OPP interface, we can check why OPP entry
> >> is disabled or enabled through usage count.
> >>
> >>>
> >>> The negative usage counts aren't necessarily a dealbreaker in a
> >>> technical sense, though I'm not a friend of quirky interfaces that
> >>> don't behave like a typical user would expect (e.g. an OPP isn't
> >>> necessarily enabled after dev_pm_opp_enable()).
> >>>
> >>> I can sent an RFC with OPP usage counts, though due to the above
> >>> concerns I have doubts it will be well received.
> >>
> >> Please add me to Cc list.
> >
> > Will do
>
> OK. Thanks.

This might take a while for a few reasons. Before posting anything I
would like to experiment a bit with it and find time to do so between
other tasks (admittedly I'm also procrastinating a bit, because I'm
unconvinced). And I will be out of office for two weeks starting
next week; it's probably not best to post and then disappear from
the discussion. I might post the RFC if I can advance it in the next
48 hours, otherwise I think it is better to delay until I'm back from
vacation.

Cheers

Matthias

Hi Chanwoo,
On Fri, Aug 03, 2018 at 09:14:46AM +0900, Chanwoo Choi wrote:
> Hi Matthias,
>
> On 2018년 08월 03일 08:48, Matthias Kaehlcke wrote:
> > On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote:
> >> Hi Chanwoo,
> >>
> >> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote:
> >>> Hi Matthias,
> >>>
> >>> On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote:
> >>>> Hi Chanwoo,
> >>>>
> >>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
> >>>>> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote:
> >>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
> >>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
> >>>>>>>> Hi Matthias,
> >>>>>>>>
> >>>>>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote:
> >>>>>>>>> Hi Chanwoo,
> >>>>>>>>>
> >>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
> >>>>>>>>>
> >>>>>>>>>> Firstly,
> >>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
> >>>>>>>>>>
> >>>>>>>>>> devfreq already used the OPP interface as default. It means that
> >>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
> >>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
> >>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core
> >>>>>>>>>> consider them.
> >>>>>>>>>>
> >>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
> >>>>>>>>>> already support some interface to change the minimum/maximum frequency
> >>>>>>>>>> of devfreq device.
> >>>>>>>>>>
> >>>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_within_limits()'
> >>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
> >>>>>>>>>> change the minimum/maximum frequency through OPP interface.
> >>>>>>>>>>
> >>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
> >>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
> >>>>>>>>>> other way to change the minimum/maximum frequency.
> >>>>>>>>>
> >>>>>>>>> Using the OPP interface exclusively works as long as a
> >>>>>>>>> enabling/disabling of OPPs is limited to a single driver
> >>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
> >>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of
> >>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are
> >>>>>>>>> existing mechanisms for conflict resolution that I overlooked.
> >>>>>>>>>
> >>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
> >>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if
> >>>>>>>>> desired, however this seems beyond the scope of this series.
> >>>>>>>>
> >>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
> >>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
> >>>>>>>> happen.
> >>>>>>>
> >>>>>>> As long as drivers limit the max freq there is no conflict, the lowest
> >>>>>>> max freq wins. I expect this to be the usual case, apparently it
> >>>>>>> worked for cpufreq for 10+ years.
> >>>>>>>
> >>>>>>> However it is correct that there would be a conflict if a driver
> >>>>>>> requests a min freq that is higher than the max freq requested by
> >>>>>>> another. In this case devfreq_verify_within_limits() resolves the
> >>>>>>> conflict by raising p->max to the min freq. Not sure if this is
> >>>>>>> something that would ever occur in practice though.
> >>>>>>>
> >>>>>>> If we are really concerned about this case it would also be an option
> >>>>>>> to limit the adjustment to the max frequency.
> >>>>>>>
> >>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface
> >>>>>>>> have to support 'usage_count' such as clk_enable/disable().
> >>>>>>>
> >>>>>>> This would require supporting negative usage count values, since a OPP
> >>>>>>> should not be enabled if e.g. thermal enables it but the throttler
> >>>>>>> disabled it or viceversa.
> >>>>>>>
> >>>>>>> Theoretically there could also be conflicts, like one driver disabling
> >>>>>>> the higher OPPs and another the lower ones, with the outcome of all
> >>>>>>> OPPs being disabled, which would be a more drastic conflict resolution
> >>>>>>> than that of devfreq_verify_within_limits().
> >>>>>>>
> >>>>>>> Viresh, what do you think about an OPP usage count?
> >>>>>>
> >>>>>> Ping, can we try to reach a conclusion on this or at least keep the
> >>>>>> discussion going?
> >>>>>>
> >>>>>> Not that it matters, but my preferred solution continues to be
> >>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which
> >>>>>> could be adjusted if needed) and has proven to work in practice for
> >>>>>> 10+ years in a very similar sub-system.
> >>>>>
> >>>>> It is not true. Current cpufreq subsystem doesn't support external OPP
> >>>>> control to enable/disable the OPP entry. If some device driver
> >>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(),
> >>>>> the operation is not working. Because cpufreq considers the limit
> >>>>> through 'cpufreq_verify_within_limits()' only.
> >>>>
> >>>> Ok, we can probably agree that using cpufreq_verify_within_limits()
> >>>> exclusively seems to have worked well for cpufreq, and that in their
> >>>> overall purpose cpufreq and devfreq are similar subsystems.
> >>>>
> >>>> The current throttler series with devfreq_verify_within_limits() takes
> >>>> the enabled OPPs into account, the lowest and highest OPP are used as
> >>>> a starting point for the frequency adjustment and (in theory) the
> >>>> frequency range should only be narrowed by
> >>>> devfreq_verify_within_limits().
> >>>>
> >>>>> As I already commented[1], there is different between cpufreq and devfreq.
> >>>>> [1] https://lkml.org/lkml/2018/7/4/80
> >>>>>
> >>>>> Already, subsystem already used OPP interface in order to control
> >>>>> specific OPP entry. I don't want to provide two outside method
> >>>>> to control the frequency of devfreq driver. It might make the confusion.
> >>>>
> >>>> I understand your point, it would indeed be preferable to have a
> >>>> single method. However I'm not convinced that the OPP interface is
> >>>> a suitable solution, as I exposed earlier in this thread (quoted
> >>>> below).
> >>>>
> >>>> I would like you to at least consider the possibility of changing
> >>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
> >>>> Besides that it's not what is currently used, do you see any technical
> >>>> concerns that would make devfreq_verify_within_limits() an unsuitable
> >>>> or inferior solution?
> >>>
> >>> As we already discussed, devfreq_verify_within_limits() doesn't support
> >>> the multiple outside controllers (e.g., devfreq-cooling.c).
> >>
> >> That's incorrect, its purpose is precisely that.
> >>
> >> Are you suggesting that cpufreq with its use of
> >> cpufreq_verify_within_limits() (the inspiration for
> >> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c
> >> and other drivers when receiving a CPUFREQ_ADJUST event, essentially
> >> what I am proposing with DEVFREQ_ADJUST.
> >>
> >> Could you elaborate why this model wouldn't work for devfreq? "OPP
> >> interface is mandatory for devfreq" isn't really a technical argument,
> >> is it mandatory for any other reason than that it is the interface
> >> that is currently used?
> >>
> >>> After you are suggesting the throttler core, there are at least two
> >>> outside controllers (e.g., devfreq-cooling.c and throttler driver).
> >>> As I knew the problem about conflict, I cannot agree the temporary
> >>> method. OPP interface is mandatory for devfreq in order to control
> >>> the OPP (frequency/voltage). In this situation, we have to try to
> >>> find the method through OPP interface.
> >>
> >> What do you mean with "temporary method"?
> >>
> >> We can try to find a method through the OPP interface, but at this
> >> point I'm not convinced that it is technically necessary or even
> >> preferable.
> >>
> >> Another inconvenient of the OPP approach for both devfreq-cooling.c
> >> and the throttler is that they have to bother with disabling all OPPs
> >> above/below the max/min (they don't/shouldn't have to care), instead
> >> of just telling devfreq the max/min.
> >
> > And a more important one: both drivers now have to keep track which
> > OPPs they enabled/disabled previously, gone are the days of a simple
> > dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is
> > possible and not very complex to implement, but is it really the
> > best/a good solution?
>
>
> As I replied them right before, Each outside driver has their own throttling
> policy to control OPP entries. They don't care the requirement of other
> driver and cannot know the requirement of other driver. devfreq core can only
> recognize them and then only consider enabled OPP entries without disabled OPP entries.
>
> For example1,
> | devfreq-cooling| throttler
> ---------------------------------------
> 500Mhz | disabled | disabled
> 400Mhz | disabled | disabled
> 300Mhz | | disabled
> 200Mhz | |
> 100Mhz | |
> => devfreq driver can use only 100/200Mhz
>
>
> For example2,
> | devfreq-cooling| throttler
> ---------------------------------------
> 500Mhz | disabled | disabled
> 400Mhz | disabled |
> 300Mhz | disabled |
> 200Mhz | |
> 100Mhz | |
> => devfreq driver can use only 100/200Mhz
>
>
> For example3,
> | devfreq-cooling| throttler
> ---------------------------------------
> 500Mhz | disabled | disabled
> 400Mhz | |
> 300Mhz | |
> 200Mhz | | disabled
> 100Mhz | | disabled
> => devfreq driver can use only 300/400Mhz

These are all cases without conflicts; my concern is about this:

> | devfreq-cooling| throttler
> ---------------------------------------
> 500Mhz | disabled |
> 400Mhz | disabled |
> 300Mhz | | disabled
> 200Mhz | | disabled
> 100Mhz | | disabled
> => devfreq driver can't use any frequency?
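
The outcome of the tables above can be reproduced with a small userspace
sketch. It is not kernel code: model_freqs and model_usable_opps() are
hypothetical names, and the sketch simply assumes that an OPP is usable
only when neither controller has disabled it, with one controller capping
from above and the other disabling everything below a floor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_OPPS 5

/* Frequencies of the example tables, in MHz, lowest first. */
static const unsigned long model_freqs[MODEL_NUM_OPPS] = {
	100, 200, 300, 400, 500,
};

/*
 * Count the OPPs that remain usable when an OPP is available only if
 * neither controller has disabled it. cooling_max disables everything
 * above it, throttler_min disables everything below it.
 */
size_t model_usable_opps(unsigned long cooling_max,
			 unsigned long throttler_min)
{
	size_t usable = 0;

	assert(cooling_max > 0);

	for (size_t i = 0; i < MODEL_NUM_OPPS; i++) {
		bool cooling_ok = model_freqs[i] <= cooling_max;
		bool throttler_ok = model_freqs[i] >= throttler_min;

		if (cooling_ok && throttler_ok)
			usable++;
	}

	return usable;
}
```

With cooling_max = 300 and throttler_min = 400 (the last table) the
function returns 0, so no frequency is left. With cooling_max = 400 and
throttler_min = 300 (example3) it returns 2.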

Actually my above comment wasn't about this case, but about the
added complexity in devfreq-cooling.c and the throttler:

Slightly simplified, partition_enable_opps() currently does this:

  for_each_opp(opp) {
    if (opp->freq <= max)
      opp_enable(opp)
    else
      opp_disable(opp)
  }

With the OPP usage/disable count this doesn't work any longer. Now we
need to keep track of the enabled/disabled state of the OPP, something
like:

  for_each_opp(opp) {
    if (opp->freq <= max) {
      if (opp->freq > prev_max)
        opp_enable(opp)
    } else {
      if (opp->freq <= prev_max)
        opp_disable(opp)
    }
  }

And duplicate the same in the throttler (and other possible
drivers). Obviously it can be done, but is there really any gain
from it?
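
For illustration, the per-driver bookkeeping could look roughly like the
userspace sketch below. All names (struct counted_opp, struct
opp_controller, controller_set_max()) are hypothetical, and the sketch
assumes a per-OPP disable count rather than the signed usage count
discussed earlier in the thread; no such count exists in the OPP
interface today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical OPP with a disable count; usable only while the count is 0. */
struct counted_opp {
	unsigned long freq_mhz;
	int disable_count;
};

/* Per-controller bookkeeping: the cap this controller applied last time. */
struct opp_controller {
	unsigned long prev_max;
};

/*
 * Apply a new cap. Only OPPs whose state changes for this controller
 * are touched, otherwise the counts would drift on repeated calls.
 */
void controller_set_max(struct opp_controller *c, struct counted_opp *opps,
			size_t n, unsigned long max)
{
	assert(c != NULL && opps != NULL);

	for (size_t i = 0; i < n; i++) {
		unsigned long f = opps[i].freq_mhz;

		if (f <= max && f > c->prev_max)
			opps[i].disable_count--;	/* undo our earlier disable */
		else if (f > max && f <= c->prev_max)
			opps[i].disable_count++;	/* add our disable */
	}
	c->prev_max = max;
}

size_t count_usable(const struct counted_opp *opps, size_t n)
{
	size_t usable = 0;

	for (size_t i = 0; i < n; i++)
		if (opps[i].disable_count == 0)
			usable++;
	return usable;
}

/* Cooling caps at 300 MHz, the throttler at 400 MHz, then cooling releases. */
size_t scenario_usable_after_cooling_release(void)
{
	struct counted_opp opps[] = {
		{ 100, 0 }, { 200, 0 }, { 300, 0 }, { 400, 0 }, { 500, 0 },
	};
	size_t n = sizeof(opps) / sizeof(opps[0]);
	struct opp_controller cooling = { 500 }, throttler = { 500 };

	controller_set_max(&cooling, opps, n, 300);
	controller_set_max(&throttler, opps, n, 400);
	controller_set_max(&cooling, opps, n, 300);	/* repeated call: no change */
	controller_set_max(&cooling, opps, n, 500);	/* cooling lifts its cap */

	return count_usable(opps, n);
}
```

In this scenario four OPPs are usable at the end: 500 MHz stays disabled
because the throttler still holds its disable. Every controller has to
carry its own prev_max for this to work.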
Instead they could just do:

  devfreq_verify_within_limits(policy/freq_pair, 0, max_freq)

without being concerned about implementation details of devfreq.
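
As a rough userspace model of that clamping, the sketch below follows the
order of checks used by cpufreq_verify_within_limits(). struct freq_pair
and the model_* functions are hypothetical stand-ins, and it is only an
assumption that the devfreq helper proposed in this series behaves the
same way.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the min/max pair carried by a devfreq policy. */
struct freq_pair {
	unsigned long min;
	unsigned long max;
};

/* Clamp the pair into [min, max]; modeled on cpufreq_verify_within_limits(). */
void model_verify_within_limits(struct freq_pair *p,
				unsigned long min, unsigned long max)
{
	assert(p != NULL && min <= max);

	if (p->min < min)
		p->min = min;
	if (p->max < min)
		p->max = min;
	if (p->min > max)
		p->min = max;
	if (p->max > max)
		p->max = max;
	if (p->min > p->max)
		p->min = p->max;
}

/* Two controllers cap the maximum: the lowest cap wins. */
struct freq_pair model_two_caps(void)
{
	struct freq_pair p = { 100, 500 };	/* lowest and highest OPP, MHz */

	model_verify_within_limits(&p, 0, 300);	/* e.g. cooling */
	model_verify_within_limits(&p, 0, 400);	/* e.g. throttler */
	return p;
}

/* A floor above an earlier cap: the maximum is raised to the floor. */
struct freq_pair model_floor_above_cap(void)
{
	struct freq_pair p = { 100, 500 };

	model_verify_within_limits(&p, 0, 300);
	model_verify_within_limits(&p, 400, ULONG_MAX);
	return p;
}
```

model_two_caps() ends with the range 100..300, and
model_floor_above_cap() ends with 400..400, which is the "raise p->max to
the min freq" resolution mentioned earlier in the thread. The callers
carry no state between calls.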
Thanks
Matthias
Hi Matthias,
On 2018년 08월 07일 03:46, Matthias Kaehlcke wrote:
> Hi Chanwoo,
>
> On Fri, Aug 03, 2018 at 08:56:57AM +0900, Chanwoo Choi wrote:
>> Hi Matthias,
>>
>> On 2018년 08월 03일 08:13, Matthias Kaehlcke wrote:
>>> Hi Chanwoo,
>>>
>>> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote:
>>>> Hi Matthias,
>>>>
>>>> On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote:
>>>>> Hi Chanwoo,
>>>>>
>>>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
>>>>>> On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote:
>>>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
>>>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
>>>>>>>>> Hi Matthias,
>>>>>>>>>
>>>>>>>>> On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote:
>>>>>>>>>> Hi Chanwoo,
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
>>>>>>>>>>
>>>>>>>>>>> Firstly,
>>>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
>>>>>>>>>>>
>>>>>>>>>>> devfreq already used the OPP interface as default. It means that
>>>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
>>>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
>>>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core
>>>>>>>>>>> consider them.
>>>>>>>>>>>
>>>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
>>>>>>>>>>> already support some interface to change the minimum/maximum frequency
>>>>>>>>>>> of devfreq device.
>>>>>>>>>>>
>>>>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_within_limits()'
>>>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
>>>>>>>>>>> change the minimum/maximum frequency through OPP interface.
>>>>>>>>>>>
>>>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
>>>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
>>>>>>>>>>> other way to change the minimum/maximum frequency.
>>>>>>>>>>
>>>>>>>>>> Using the OPP interface exclusively works as long as a
>>>>>>>>>> enabling/disabling of OPPs is limited to a single driver
>>>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
>>>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of
>>>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are
>>>>>>>>>> existing mechanisms for conflict resolution that I overlooked.
>>>>>>>>>>
>>>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
>>>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if
>>>>>>>>>> desired, however this seems beyond the scope of this series.
>>>>>>>>>
>>>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
>>>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
>>>>>>>>> happen.
>>>>>>>>
>>>>>>>> As long as drivers limit the max freq there is no conflict, the lowest
>>>>>>>> max freq wins. I expect this to be the usual case, apparently it
>>>>>>>> worked for cpufreq for 10+ years.
>>>>>>>>
>>>>>>>> However it is correct that there would be a conflict if a driver
>>>>>>>> requests a min freq that is higher than the max freq requested by
>>>>>>>> another. In this case devfreq_verify_within_limits() resolves the
>>>>>>>> conflict by raising p->max to the min freq. Not sure if this is
>>>>>>>> something that would ever occur in practice though.
>>>>>>>>
>>>>>>>> If we are really concerned about this case it would also be an option
>>>>>>>> to limit the adjustment to the max frequency.
>>>>>>>>
>>>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface
>>>>>>>>> have to support 'usage_count' such as clk_enable/disable().
>>>>>>>>
>>>>>>>> This would require supporting negative usage count values, since a OPP
>>>>>>>> should not be enabled if e.g. thermal enables it but the throttler
>>>>>>>> disabled it or viceversa.
>>>>>>>>
>>>>>>>> Theoretically there could also be conflicts, like one driver disabling
>>>>>>>> the higher OPPs and another the lower ones, with the outcome of all
>>>>>>>> OPPs being disabled, which would be a more drastic conflict resolution
>>>>>>>> than that of devfreq_verify_within_limits().
>>>>>>>>
>>>>>>>> Viresh, what do you think about an OPP usage count?
>>>>>>>
>>>>>>> Ping, can we try to reach a conclusion on this or at least keep the
>>>>>>> discussion going?
>>>>>>>
>>>>>>> Not that it matters, but my preferred solution continues to be
>>>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which
>>>>>>> could be adjusted if needed) and has proven to work in practice for
>>>>>>> 10+ years in a very similar sub-system.
>>>>>>
>>>>>> It is not true. Current cpufreq subsystem doesn't support external OPP
>>>>>> control to enable/disable the OPP entry. If some device driver
>>>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(),
>>>>>> the operation is not working. Because cpufreq considers the limit
>>>>>> through 'cpufreq_verify_within_limits()' only.
>>>>>
>>>>> Ok, we can probably agree that using cpufreq_verify_within_limits()
>>>>> exclusively seems to have worked well for cpufreq, and that in their
>>>>> overall purpose cpufreq and devfreq are similar subsystems.
>>>>>
>>>>> The current throttler series with devfreq_verify_within_limits() takes
>>>>> the enabled OPPs into account, the lowest and highest OPP are used as
>>>>> a starting point for the frequency adjustment and (in theory) the
>>>>> frequency range should only be narrowed by
>>>>> devfreq_verify_within_limits().
>>>>>
>>>>>> As I already commented[1], there is different between cpufreq and devfreq.
>>>>>> [1] https://lkml.org/lkml/2018/7/4/80
>>>>>>
>>>>>> Already, subsystem already used OPP interface in order to control
>>>>>> specific OPP entry. I don't want to provide two outside method
>>>>>> to control the frequency of devfreq driver. It might make the confusion.
>>>>>
>>>>> I understand your point, it would indeed be preferable to have a
>>>>> single method. However I'm not convinced that the OPP interface is
>>>>> a suitable solution, as I exposed earlier in this thread (quoted
>>>>> below).
>>>>>
>>>>> I would like you to at least consider the possibility of changing
>>>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
>>>>> Besides that it's not what is currently used, do you see any technical
>>>>> concerns that would make devfreq_verify_within_limits() an unsuitable
>>>>> or inferior solution?
>>>>
>>>> As we already discussed, devfreq_verify_within_limits() doesn't support
>>>> the multiple outside controllers (e.g., devfreq-cooling.c).
>>>
>>> That's incorrect, its purpose is precisely that.
>>>
>>> Are you suggesting that cpufreq with its use of
>>> cpufreq_verify_within_limits() (the inspiration for
>>> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c
>>> and other drivers when receiving a CPUFREQ_ADJUST event, essentially
>>> what I am proposing with DEVFREQ_ADJUST.
>>>
>>> Could you elaborate why this model wouldn't work for devfreq? "OPP
>>
>> I don't mention that this model is not working. As I already commented[1],
>> devfreq used OPP interface to control OPP entry on outside of devfreq driver.
>> Because devfreq used OPP interface, I hope to provide only OPP method
>> to control the frequency on outside of devfreq.
>> [1] https://lkml.org/lkml/2018/7/4/80
>>
>>> interface is mandatory for devfreq" isn't really a technical argument,
>>> is it mandatory for any other reason than that it is the interface
>>> that is currently used?
>>
>> In case of controlling the frequency, OPP interface is mandatory for devfreq.
>>
>> cpufreq used cpufreq_verify_within_limits(). If outside driver disable
>> specific OPP entry, cpufreq don't consider them because after getting the frequency
>> from devicetree, cpufreq don't use the OPP interface for disabling/enabling.
>> Only if outside driver used cpufreq_verify_within_limits(), cpufreq consider
>> the range of minimum/maximum frequency. cpufreq core doesn't use 'dev_pm_opp_find_*'
>> function. It means that cpufreq code doesn't consider the state of opp_disable/enable.
>>
>> devfreq used OPP interface. If outside driver disable specific OPP entry, devfreq consider them.
>
> What exactly is this 'outside driver' you are referring to? The driver
> that 'owns' the devfreq device, e.g. a GPU driver? Or just any
> non-devfreq driver, like devfreq-cooling.c?
>
> If it's the first case then this isn't currently working as intended
> when the devfreq device is used as a cooling device, since the cooling
> device would overwrite the state set by the 'owner' in
> partition_enable_opps().
>
>> When find available minimum frequency, devfreq used OPP interface. (find_available_min_freq)
>> When find available maximum frequency, devfreq used OPP interface. (find_available_max_freq)
>> When make freq_table of devfreq device, devfreq used OPP interface. (set_freq_table)
>> When outside driver disable or enable OPP entry, devfreq receives the notification
>> from OPP interface and then update the scaling_min_freq/scaling_max_freq by using
>> OPP interface. (devfreq_notifier_call)
>> At this point of using scaling_min_freq/scaling_max_freq on devfreq, it indicates
>> that devfreq used OPP interface because devfreq tried to find scaling_min_freq/scaling_max_freq
>> through OPP interface.
>>
>> If outside driver use OPP interface in order to control frequency,
>> devfreq core is well working without any modification of devfreq
>> core.
>
> Thanks for elaborating!
>
> I understand that this is how it currently works, but unless I'm
> missing something about the outside driver disabling an OPP I still
> essentially read this as 'the OPP interface is mandatory because it's
> what is currently used by the devfreq core to limit the frequency
> range', rather than that using the OPP interface allows to provide a
> particular feature or is inherently better in some other way.
>
> I don't propose to completely strip the OPP interface out of devfreq,
> but mainly to switch devfreq-cooling.c to
> devfreq_verify_within_limits() to avoid having two mechanisms for
> limiting the frequency range. Besides being simpler this would allow
> to support the case where the 'owner' disables a certain OPP and
> devfreq respects that. The code required in the devfreq core to
> support this would be minimal (this patch).
>
>>>> After you are suggesting the throttler core, there are at least two
>>>> outside controllers (e.g., devfreq-cooling.c and throttler driver).
>>>> As I knew the problem about conflict, I cannot agree the temporary
>>>> method. OPP interface is mandatory for devfreq in order to control
>>>> the OPP (frequency/voltage). In this situation, we have to try to
>>>> find the method through OPP interface.
>>>
>>> What do you mean with "temporary method"?
>>
>> this expression might be not proper. Please ignore this expression.
>>
>>>
>>> We can try to find a method through the OPP interface, but at this
>>> point I'm not convinced that it is technically necessary or even
>>> preferable.
>>
>> I replied it about this as following.
>>
>>>
>>> Another inconvenient of the OPP approach for both devfreq-cooling.c
>>> and the throttler is that they have to bother with disabling all OPPs
>>> above/below the max/min (they don't/shouldn't have to care), instead
>>> of just telling devfreq the max/min.
>>
>> I think it doesn't matter. We can enable/disable the OPP entries by traversing them.
>> partition_enable_opps() in drivers/thermal/devfreq_cooling.c already does so.
>>
>>>
>>>> We can refer to regulator/clock. Multiple device drivers can use
>>>> the regulator/clock without any problem. I think that the usage of OPP
>>>> is similar to regulator/clock. As you mentioned, maybe OPP
>>>> would have to handle the negative count. Even though
>>>> opp_enable/opp_disable() would have to handle the negative count
>>>> in order to support usage by multiple device drivers, I think that
>>>> this approach is right.
>>>
>>> The regulator/clock approach with the typical usage counts seems more
>>> intuitive to me, personally I wouldn't write an interface with
>>> negative usage count if I could reasonably avoid it.
>>
>> I think the use of a negative usage count is not a problem if it's required.
>>
>>>
>>>>>> I want to use only OPP interface to enable/disable frequency
>>>>>> even if we have to modify the OPP interface.
>>>>>
>>>>> These are the concerns I raised earlier about a solution with OPP
>>>>> usage counts:
>>>>>
>>>>> "This would require supporting negative usage count values, since an OPP
>>>>> should not be enabled if e.g. thermal enables it but the throttler
>>>>> disabled it or vice versa.
>>>>
>>>> I already replied about the negative usage count. I think that a negative usage count
>>>> is not a problem if this approach can resolve the issue.
>>>>
>>>>>
>>>>> Theoretically there could also be conflicts, like one driver disabling
>>>>> the higher OPPs and another the lower ones, with the outcome of all
>>>>> OPPs being disabled, which would be a more drastic conflict resolution
>>>>> than that of devfreq_verify_within_limits()."
>>>>>
>>>>> What do you think about these points?
>>>>
>>>> It depends on how the OPP interface is used by multiple device drivers.
>>>> Even if devfreq/opp provides the control method, outside device drivers
>>>> may misuse it. That is a problem of the user.
>>>
>>> I wouldn't call it misusing if two independent drivers take
>>> contradictory actions on an interface that doesn't provide
>>> arbitration. How can driver A know that it shouldn't disable OPPs a, b
>>> and c because driver B disabled d, e and f? Who is misusing the
>>> interface, driver A or driver B?
>>
>> Each outside driver has its own throttling policy to control OPP entries.
>> They don't care about the requirements of other drivers and cannot know
>> them. Only the devfreq core can recognize them.
>>
>>>
>>>> Instead, if we use the OPP interface, we can check why OPP entry
>>>> is disabled or enabled through usage count.
>>>>
>>>>>
>>>>> The negative usage counts aren't necessarily a dealbreaker in a
>>>>> technical sense, though I'm not a friend of quirky interfaces that
>>>>> don't behave like a typical user would expect (e.g. an OPP isn't
>>>>> necessarily enabled after dev_pm_opp_enable()).
>>>>>
>>>>> I can send an RFC with OPP usage counts, though due to the above
>>>>> concerns I have doubts it will be well received.
>>>>
>>>> Please add me to Cc list.
>>>
>>> Will do
>>
>> OK. Thanks.
>
> This might take a bit for a few reasons. Before posting anything I
> would like to experiment a bit with it and find time to do so between
> other tasks (admittedly I'm also procrastinating a bit, because I'm
> unconvinced). And I will be out of office for two weeks starting
> next week, it's probably not the best to post and then disappear from
> the discussion. I might post the RFC if I can advance it in the next
> 48 hours, otherwise I think it is better to delay until I'm back from
> vacation.
I agree it would be better to do this after your vacation.
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi Matthias,
On 2018-08-07 04:21, Matthias Kaehlcke wrote:
> Hi Chanwoo,
>
> On Fri, Aug 03, 2018 at 09:14:46AM +0900, Chanwoo Choi wrote:
>> Hi Matthias,
>>
>> On 2018-08-03 08:48, Matthias Kaehlcke wrote:
>>> On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote:
>>>> Hi Chanwoo,
>>>>
>>>> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote:
>>>>> Hi Matthias,
>>>>>
>>>>> On 2018-08-02 02:08, Matthias Kaehlcke wrote:
>>>>>> Hi Chanwoo,
>>>>>>
>>>>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
>>>>>>> On 2018-08-01 04:39, Matthias Kaehlcke wrote:
>>>>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
>>>>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
>>>>>>>>>> Hi Matthias,
>>>>>>>>>>
>>>>>>>>>> On 2018-07-07 02:53, Matthias Kaehlcke wrote:
>>>>>>>>>>> Hi Chanwoo,
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Firstly,
>>>>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
>>>>>>>>>>>>
>>>>>>>>>>>> devfreq already used the OPP interface as default. It means that
>>>>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
>>>>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
>>>>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core
>>>>>>>>>>>> consider them.
>>>>>>>>>>>>
>>>>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
>>>>>>>>>>>> already support some interface to change the minimum/maximum frequency
>>>>>>>>>>>> of devfreq device.
>>>>>>>>>>>>
>>>>>>>>>>>> In case of the cpufreq subsystem, cpufreq only provides 'cpufreq_verify_within_limits()'
>>>>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
>>>>>>>>>>>> change the minimum/maximum frequency through OPP interface.
>>>>>>>>>>>>
>>>>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
>>>>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
>>>>>>>>>>>> other way to change the minimum/maximum frequency.
>>>>>>>>>>>
>>>>>>>>>>> Using the OPP interface exclusively works as long as a
>>>>>>>>>>> enabling/disabling of OPPs is limited to a single driver
>>>>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
>>>>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of
>>>>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are
>>>>>>>>>>> existing mechanisms for conflict resolution that I overlooked.
>>>>>>>>>>>
>>>>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
>>>>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if
>>>>>>>>>>> desired, however this seems beyond the scope of this series.
>>>>>>>>>>
>>>>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
>>>>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
>>>>>>>>>> happen.
>>>>>>>>>
>>>>>>>>> As long as drivers limit the max freq there is no conflict, the lowest
>>>>>>>>> max freq wins. I expect this to be the usual case, apparently it
>>>>>>>>> worked for cpufreq for 10+ years.
>>>>>>>>>
>>>>>>>>> However it is correct that there would be a conflict if a driver
>>>>>>>>> requests a min freq that is higher than the max freq requested by
>>>>>>>>> another. In this case devfreq_verify_within_limits() resolves the
>>>>>>>>> conflict by raising p->max to the min freq. Not sure if this is
>>>>>>>>> something that would ever occur in practice though.
>>>>>>>>>
>>>>>>>>> If we are really concerned about this case it would also be an option
>>>>>>>>> to limit the adjustment to the max frequency.
>>>>>>>>>
>>>>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface
>>>>>>>>>> have to support 'usage_count' such as clk_enable/disable().
>>>>>>>>>
>>>>>>>>> This would require supporting negative usage count values, since a OPP
>>>>>>>>> should not be enabled if e.g. thermal enables it but the throttler
>>>>>>>>> disabled it or viceversa.
>>>>>>>>>
>>>>>>>>> Theoretically there could also be conflicts, like one driver disabling
>>>>>>>>> the higher OPPs and another the lower ones, with the outcome of all
>>>>>>>>> OPPs being disabled, which would be a more drastic conflict resolution
>>>>>>>>> than that of devfreq_verify_within_limits().
>>>>>>>>>
>>>>>>>>> Viresh, what do you think about an OPP usage count?
>>>>>>>>
>>>>>>>> Ping, can we try to reach a conclusion on this or at least keep the
>>>>>>>> discussion going?
>>>>>>>>
>>>>>>>> Not that it matters, but my preferred solution continues to be
>>>>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which
>>>>>>>> could be adjusted if needed) and has proven to work in practice for
>>>>>>>> 10+ years in a very similar sub-system.
>>>>>>>
>>>>>>> It is not true. Current cpufreq subsystem doesn't support external OPP
>>>>>>> control to enable/disable the OPP entry. If some device driver
>>>>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(),
>>>>>>> the operation is not working. Because cpufreq considers the limit
>>>>>>> through 'cpufreq_verify_within_limits()' only.
>>>>>>
>>>>>> Ok, we can probably agree that using cpufreq_verify_within_limits()
>>>>>> exclusively seems to have worked well for cpufreq, and that in their
>>>>>> overall purpose cpufreq and devfreq are similar subsystems.
>>>>>>
>>>>>> The current throttler series with devfreq_verify_within_limits() takes
>>>>>> the enabled OPPs into account, the lowest and highest OPP are used as
>>>>>> a starting point for the frequency adjustment and (in theory) the
>>>>>> frequency range should only be narrowed by
>>>>>> devfreq_verify_within_limits().
>>>>>>
>>>>>>> As I already commented[1], there is different between cpufreq and devfreq.
>>>>>>> [1] https://lkml.org/lkml/2018/7/4/80
>>>>>>>
>>>>>>> Already, subsystem already used OPP interface in order to control
>>>>>>> specific OPP entry. I don't want to provide two outside method
>>>>>>> to control the frequency of devfreq driver. It might make the confusion.
>>>>>>
>>>>>> I understand your point, it would indeed be preferable to have a
>>>>>> single method. However I'm not convinced that the OPP interface is
>>>>>> a suitable solution, as I exposed earlier in this thread (quoted
>>>>>> below).
>>>>>>
>>>>>> I would like you to at least consider the possibility of changing
>>>>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
>>>>>> Besides that it's not what is currently used, do you see any technical
>>>>>> concerns that would make devfreq_verify_within_limits() an unsuitable
>>>>>> or inferior solution?
>>>>>
>>>>> As we already discussed, devfreq_verify_within_limits() doesn't support
>>>>> the multiple outside controllers (e.g., devfreq-cooling.c).
>>>>
>>>> That's incorrect, its purpose is precisely that.
>>>>
>>>> Are you suggesting that cpufreq with its use of
>>>> cpufreq_verify_within_limits() (the inspiration for
>>>> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c
>>>> and other drivers when receiving a CPUFREQ_ADJUST event, essentially
>>>> what I am proposing with DEVFREQ_ADJUST.
>>>>
>>>> Could you elaborate why this model wouldn't work for devfreq? "OPP
>>>> interface is mandatory for devfreq" isn't really a technical argument,
>>>> is it mandatory for any other reason than that it is the interface
>>>> that is currently used?
>>>>
>>>>> After you are suggesting the throttler core, there are at least two
>>>>> outside controllers (e.g., devfreq-cooling.c and throttler driver).
>>>>> As I knew the problem about conflict, I cannot agree the temporary
>>>>> method. OPP interface is mandatory for devfreq in order to control
>>>>> the OPP (frequency/voltage). In this situation, we have to try to
>>>>> find the method through OPP interface.
>>>>
>>>> What do you mean with "temporary method"?
>>>>
>>>> We can try to find a method through the OPP interface, but at this
>>>> point I'm not convinced that it is technically necessary or even
>>>> preferable.
>>>>
>>>> Another inconvenient of the OPP approach for both devfreq-cooling.c
>>>> and the throttler is that they have to bother with disabling all OPPs
>>>> above/below the max/min (they don't/shouldn't have to care), instead
>>>> of just telling devfreq the max/min.
>>>
>>> And a more important one: both drivers now have to keep track of which
>>> OPPs they enabled/disabled previously, gone are the days of a simple
>>> dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is
>>> possible and not very complex to implement, but is it really the
>>> best/a good solution?
>>
>>
>> As I replied right before, each outside driver has its own throttling
>> policy to control OPP entries. They don't care about the requirements of other
>> drivers and cannot know them. Only the devfreq core can recognize them,
>> and it then considers only the enabled OPP entries, not the disabled ones.
>>
>> For example1,
>> | devfreq-cooling| throttler
>> ---------------------------------------
>> 500Mhz | disabled | disabled
>> 400Mhz | disabled | disabled
>> 300Mhz | | disabled
>> 200Mhz | |
>> 100Mhz | |
>> => devfreq driver can use only 100/200Mhz
>>
>>
>> For example2,
>> | devfreq-cooling| throttler
>> ---------------------------------------
>> 500Mhz | disabled | disabled
>> 400Mhz | disabled |
>> 300Mhz | disabled |
>> 200Mhz | |
>> 100Mhz | |
>> => devfreq driver can use only 100/200Mhz
>>
>>
>> For example3,
>> | devfreq-cooling| throttler
>> ---------------------------------------
>> 500Mhz | disabled | disabled
>> 400Mhz | |
>> 300Mhz | |
>> 200Mhz | | disabled
>> 100Mhz | | disabled
>> => devfreq driver can use only 300/400Mhz
>
> These are all cases without conflicts, my concern is about this:
>
>> | devfreq-cooling| throttler
>> ---------------------------------------
>> 500Mhz | disabled |
>> 400Mhz | disabled |
>> 300Mhz | | disabled
>> 200Mhz | | disabled
>> 100Mhz | | disabled
>> => devfreq driver can't use any frequency?
There is no enabled frequency, because the device drivers
(devfreq-cooling, throttler) disabled all frequencies.
Outside drivers (devfreq-cooling, throttler) can enable/disable
specific OPP entries. As I already commented, each outside driver
doesn't consider the policy of the other device drivers regarding OPP entries.
The OPP interface is independent of devfreq and just controls OPP entries.
After that, devfreq considers only the enabled OPP entries.
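
The conflicting table above can be modeled in a few lines of userspace C.
This is only a sketch of the behaviour described here, not kernel code:
struct opp_votes, disable_above(), disable_below() and count_usable() are
hypothetical names, and the frequencies are the ones from the tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_OPPS 5

/* OPP frequencies in MHz, taken from the tables in this thread. */
static const unsigned int opp_mhz[NUM_OPPS] = { 100, 200, 300, 400, 500 };

/* Hypothetical per-driver view: true means "this driver disabled the OPP". */
struct opp_votes {
	bool disabled[NUM_OPPS];
};

/* Model of a cooling-style driver: disable every OPP above max_mhz. */
static void disable_above(struct opp_votes *v, unsigned int max_mhz)
{
	size_t i;

	assert(v != NULL);
	for (i = 0; i < NUM_OPPS; i++)
		v->disabled[i] = opp_mhz[i] > max_mhz;
}

/* Model of a driver that disables every OPP below min_mhz. */
static void disable_below(struct opp_votes *v, unsigned int min_mhz)
{
	size_t i;

	assert(v != NULL);
	for (i = 0; i < NUM_OPPS; i++)
		v->disabled[i] = opp_mhz[i] < min_mhz;
}

/* An OPP stays usable only if neither driver disabled it. */
static size_t count_usable(const struct opp_votes *a, const struct opp_votes *b)
{
	size_t i, n = 0;

	for (i = 0; i < NUM_OPPS; i++)
		if (!a->disabled[i] && !b->disabled[i])
			n++;
	return n;
}
```

With devfreq-cooling modeled as disable_above(300) and the throttler as
disable_below(400), count_usable() returns 0, which is the case in the last
table: neither driver did anything unusual on its own, yet no OPP is left.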
>
> Actually my above comment wasn't about this case, but about the
> added complexity in devfreq-cooling.c and the throttler:
>
> A bit simplified partition_enable_opps() currently does this:
>
> for_each_opp(opp) {
> if (opp->freq <= max)
> opp_enable(opp)
> else
> opp_disable(opp)
> }
>
> With the OPP usage/disable count this doesn't work any longer. Now we
> need to keep track of the enabled/disabled state of the OPP, something
> like:
>
> dev_pm_opp_enable(opp) {
> if (opp->freq <= max) {
> if (opp->freq > prev_max)
> opp_enable(opp)
> } else {
> if (opp->freq < prev_max)
> opp_disable(opp)
> }
> }
>
> And duplicate the same in the throttler (and other possible
> drivers). Obviously it can be done, but is there really any gain
> from it?
>
> Instead they just could do:
>
> devfreq_verify_within_limits(policy/freq_pair, 0, max_freq)
>
> without being concerned about implementation details of devfreq.
>
I don't think so. dev_pm_opp_enable()/dev_pm_opp_disable()
only have to consider a single OPP entry, independently of any other OPP entry.
dev_pm_opp_enable()/dev_pm_opp_disable() can never know about the other
OPP entries. After some driver (devfreq-cooling.c or the throttler)
enables or disables specific OPP entries, the devfreq driver
considers the OPP entries that remain enabled.
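
For reference, the driver-side bookkeeping discussed in the quoted text can
be sketched as a small userspace model. This is not kernel code and not an
existing OPP API: disable_count[], struct limit_client and client_set_max()
are hypothetical names, and the per-OPP disable count is only the idea under
discussion. The sketch compares with "<= prev_max" in the disable branch so
that an OPP exactly at the previous limit is covered; that boundary choice is
an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_OPPS 5
#define NO_LIMIT_MHZ UINT_MAX

static const unsigned int opp_mhz[NUM_OPPS] = { 100, 200, 300, 400, 500 };

/* Hypothetical shared state: how many drivers currently disable each OPP. */
static unsigned int disable_count[NUM_OPPS];

/* Hypothetical per-driver state: the maximum this driver requested last. */
struct limit_client {
	unsigned int prev_max_mhz;
};

/* Touch only the OPPs between the previous and the new maximum. */
static void client_set_max(struct limit_client *c, unsigned int max_mhz)
{
	size_t i;

	for (i = 0; i < NUM_OPPS; i++) {
		unsigned int f = opp_mhz[i];

		if (f <= max_mhz && f > c->prev_max_mhz) {
			/* This driver disabled the OPP earlier: drop its vote. */
			assert(disable_count[i] > 0);
			disable_count[i]--;
		} else if (f > max_mhz && f <= c->prev_max_mhz) {
			/* The OPP is newly above the limit: add a vote. */
			disable_count[i]++;
		}
	}
	c->prev_max_mhz = max_mhz;
}

/* An OPP is usable while no driver holds a disable vote on it. */
static bool opp_usable(size_t i)
{
	return disable_count[i] == 0;
}
```

Each driver starts with prev_max_mhz set to NO_LIMIT_MHZ. The point of the
model is that every driver needs its own prev_max_mhz so that it never
releases a vote it did not place.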
--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi Viresh Kumar,
I have a question about dev_pm_opp_enable() and dev_pm_opp_disable().
Both functions use the 'available' field to indicate the status of a specific OPP.
If different device drivers try to control the same OPP,
only the last call to dev_pm_opp_enable() or dev_pm_opp_disable() takes effect.
This means that an OPP should be enabled/disabled by only one device driver.
For example,
opp_table of driver a(dev_a)
- 500Mhz
- 400Mhz
- 300Mhz
- 200Mhz
- 100Mhz
Driver B, opp_disable(dev_a, 500)
Driver C, opp_enable(dev_a, 500)
-> 500Mhz is enabled, but driver B might still want 500Mhz to stay disabled at this time, e.g. for cooling.
I think that if the OPP interface is to support use by multiple device drivers,
dev_pm_opp_enable() and dev_pm_opp_disable() should support a usage count,
as regulator/clock do.
I would like to hear your opinion.
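
The difference between a single 'available' flag and a counted variant can
be sketched in userspace C as follows. This is a model only: struct
opp_model and the flag_*/counted_* helpers are hypothetical and do not
correspond to the kernel's struct dev_pm_opp or to any existing OPP call.
The counted variant assumes balanced calls, i.e. a driver only re-enables
an OPP that it disabled itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one OPP entry. */
struct opp_model {
	bool available;           /* single flag: the last caller wins */
	unsigned int disable_cnt; /* counted variant: active disable requests */
};

/* Flag model: every call simply overwrites the state. */
static void flag_disable(struct opp_model *o)
{
	o->available = false;
}

static void flag_enable(struct opp_model *o)
{
	o->available = true;
}

/* Counted model: a disable adds a request, an enable removes one. */
static void counted_disable(struct opp_model *o)
{
	o->disable_cnt++;
}

static void counted_enable(struct opp_model *o)
{
	/* Balanced use is assumed: an enable must match an earlier disable. */
	assert(o->disable_cnt > 0);
	o->disable_cnt--;
}

/* The OPP is available only while nobody holds a disable request. */
static bool counted_available(const struct opp_model *o)
{
	return o->disable_cnt == 0;
}
```

In the flag model, driver B's disable followed by driver C's enable leaves
the OPP available and B's request is lost. In the counted model the OPP
stays unavailable until B itself calls counted_enable().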
Regards,
Chanwoo Choi
On 2018-08-07 07:31, Chanwoo Choi wrote:
> Hi Matthias,
>
> On 2018-08-07 04:21, Matthias Kaehlcke wrote:
>> Hi Chanwoo,
>>
>> On Fri, Aug 03, 2018 at 09:14:46AM +0900, Chanwoo Choi wrote:
>>> Hi Matthias,
>>>
>>> On 2018-08-03 08:48, Matthias Kaehlcke wrote:
>>>> On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote:
>>>>> Hi Chanwoo,
>>>>>
>>>>> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote:
>>>>>> Hi Matthias,
>>>>>>
>>>>>> On 2018-08-02 02:08, Matthias Kaehlcke wrote:
>>>>>>> Hi Chanwoo,
>>>>>>>
>>>>>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
>>>>>>>> On 2018-08-01 04:39, Matthias Kaehlcke wrote:
>>>>>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
>>>>>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
>>>>>>>>>>> Hi Matthias,
>>>>>>>>>>>
>>>>>>>>>>> On 2018-07-07 02:53, Matthias Kaehlcke wrote:
>>>>>>>>>>>> Hi Chanwoo,
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Firstly,
>>>>>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
>>>>>>>>>>>>>
>>>>>>>>>>>>> devfreq already used the OPP interface as default. It means that
>>>>>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
>>>>>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
>>>>>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core
>>>>>>>>>>>>> consider them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
>>>>>>>>>>>>> already support some interface to change the minimum/maximum frequency
>>>>>>>>>>>>> of devfreq device.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In case of the cpufreq subsystem, cpufreq only provides 'cpufreq_verify_within_limits()'
>>>>>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
>>>>>>>>>>>>> change the minimum/maximum frequency through OPP interface.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
>>>>>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
>>>>>>>>>>>>> other way to change the minimum/maximum frequency.
>>>>>>>>>>>>
>>>>>>>>>>>> Using the OPP interface exclusively works as long as a
>>>>>>>>>>>> enabling/disabling of OPPs is limited to a single driver
>>>>>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
>>>>>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of
>>>>>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are
>>>>>>>>>>>> existing mechanisms for conflict resolution that I overlooked.
>>>>>>>>>>>>
>>>>>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
>>>>>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if
>>>>>>>>>>>> desired, however this seems beyond the scope of this series.
>>>>>>>>>>>
>>>>>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
>>>>>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
>>>>>>>>>>> happen.
>>>>>>>>>>
>>>>>>>>>> As long as drivers limit the max freq there is no conflict, the lowest
>>>>>>>>>> max freq wins. I expect this to be the usual case, apparently it
>>>>>>>>>> worked for cpufreq for 10+ years.
>>>>>>>>>>
>>>>>>>>>> However it is correct that there would be a conflict if a driver
>>>>>>>>>> requests a min freq that is higher than the max freq requested by
>>>>>>>>>> another. In this case devfreq_verify_within_limits() resolves the
>>>>>>>>>> conflict by raising p->max to the min freq. Not sure if this is
>>>>>>>>>> something that would ever occur in practice though.
>>>>>>>>>>
>>>>>>>>>> If we are really concerned about this case it would also be an option
>>>>>>>>>> to limit the adjustment to the max frequency.
>>>>>>>>>>
>>>>>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface
>>>>>>>>>>> have to support 'usage_count' such as clk_enable/disable().
>>>>>>>>>>
>>>>>>>>>> This would require supporting negative usage count values, since a OPP
>>>>>>>>>> should not be enabled if e.g. thermal enables it but the throttler
>>>>>>>>>> disabled it or viceversa.
>>>>>>>>>>
>>>>>>>>>> Theoretically there could also be conflicts, like one driver disabling
>>>>>>>>>> the higher OPPs and another the lower ones, with the outcome of all
>>>>>>>>>> OPPs being disabled, which would be a more drastic conflict resolution
>>>>>>>>>> than that of devfreq_verify_within_limits().
>>>>>>>>>>
>>>>>>>>>> Viresh, what do you think about an OPP usage count?
>>>>>>>>>
>>>>>>>>> Ping, can we try to reach a conclusion on this or at least keep the
>>>>>>>>> discussion going?
>>>>>>>>>
>>>>>>>>> Not that it matters, but my preferred solution continues to be
>>>>>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which
>>>>>>>>> could be adjusted if needed) and has proven to work in practice for
>>>>>>>>> 10+ years in a very similar sub-system.
>>>>>>>>
>>>>>>>> It is not true. Current cpufreq subsystem doesn't support external OPP
>>>>>>>> control to enable/disable the OPP entry. If some device driver
>>>>>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(),
>>>>>>>> the operation is not working. Because cpufreq considers the limit
>>>>>>>> through 'cpufreq_verify_within_limits()' only.
>>>>>>>
>>>>>>> Ok, we can probably agree that using cpufreq_verify_within_limits()
>>>>>>> exclusively seems to have worked well for cpufreq, and that in their
>>>>>>> overall purpose cpufreq and devfreq are similar subsystems.
>>>>>>>
>>>>>>> The current throttler series with devfreq_verify_within_limits() takes
>>>>>>> the enabled OPPs into account, the lowest and highest OPP are used as
>>>>>>> a starting point for the frequency adjustment and (in theory) the
>>>>>>> frequency range should only be narrowed by
>>>>>>> devfreq_verify_within_limits().
>>>>>>>
>>>>>>>> As I already commented[1], there is different between cpufreq and devfreq.
>>>>>>>> [1] https://lkml.org/lkml/2018/7/4/80
>>>>>>>>
>>>>>>>> Already, subsystem already used OPP interface in order to control
>>>>>>>> specific OPP entry. I don't want to provide two outside method
>>>>>>>> to control the frequency of devfreq driver. It might make the confusion.
>>>>>>>
>>>>>>> I understand your point, it would indeed be preferable to have a
>>>>>>> single method. However I'm not convinced that the OPP interface is
>>>>>>> a suitable solution, as I exposed earlier in this thread (quoted
>>>>>>> below).
>>>>>>>
>>>>>>> I would like you to at least consider the possibility of changing
>>>>>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
>>>>>>> Besides that it's not what is currently used, do you see any technical
>>>>>>> concerns that would make devfreq_verify_within_limits() an unsuitable
>>>>>>> or inferior solution?
>>>>>>
>>>>>> As we already discussed, devfreq_verify_within_limits() doesn't support
>>>>>> the multiple outside controllers (e.g., devfreq-cooling.c).
>>>>>
>>>>> That's incorrect, its purpose is precisely that.
>>>>>
>>>>> Are you suggesting that cpufreq with its use of
>>>>> cpufreq_verify_within_limits() (the inspiration for
>>>>> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c
>>>>> and other drivers when receiving a CPUFREQ_ADJUST event, essentially
>>>>> what I am proposing with DEVFREQ_ADJUST.
>>>>>
>>>>> Could you elaborate why this model wouldn't work for devfreq? "OPP
>>>>> interface is mandatory for devfreq" isn't really a technical argument,
>>>>> is it mandatory for any other reason than that it is the interface
>>>>> that is currently used?
>>>>>
>>>>>> After you are suggesting the throttler core, there are at least two
>>>>>> outside controllers (e.g., devfreq-cooling.c and throttler driver).
>>>>>> As I knew the problem about conflict, I cannot agree the temporary
>>>>>> method. OPP interface is mandatory for devfreq in order to control
>>>>>> the OPP (frequency/voltage). In this situation, we have to try to
>>>>>> find the method through OPP interface.
>>>>>
>>>>> What do you mean with "temporary method"?
>>>>>
>>>>> We can try to find a method through the OPP interface, but at this
>>>>> point I'm not convinced that it is technically necessary or even
>>>>> preferable.
>>>>>
>>>>> Another inconvenient of the OPP approach for both devfreq-cooling.c
>>>>> and the throttler is that they have to bother with disabling all OPPs
>>>>> above/below the max/min (they don't/shouldn't have to care), instead
>>>>> of just telling devfreq the max/min.
>>>>
>>>> And a more important one: both drivers now have to keep track of which
>>>> OPPs they enabled/disabled previously, gone are the days of a simple
>>>> dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is
>>>> possible and not very complex to implement, but is it really the
>>>> best/a good solution?
>>>
>>>
>>> As I replied right before, each outside driver has its own throttling
>>> policy to control OPP entries. They don't care about the requirements of other
>>> drivers and cannot know them. Only the devfreq core can recognize them,
>>> and it then considers only the enabled OPP entries, not the disabled ones.
>>>
>>> For example1,
>>> | devfreq-cooling| throttler
>>> ---------------------------------------
>>> 500Mhz | disabled | disabled
>>> 400Mhz | disabled | disabled
>>> 300Mhz | | disabled
>>> 200Mhz | |
>>> 100Mhz | |
>>> => devfreq driver can use only 100/200Mhz
>>>
>>>
>>> For example2,
>>> | devfreq-cooling| throttler
>>> ---------------------------------------
>>> 500Mhz | disabled | disabled
>>> 400Mhz | disabled |
>>> 300Mhz | disabled |
>>> 200Mhz | |
>>> 100Mhz | |
>>> => devfreq driver can use only 100/200Mhz
>>>
>>>
>>> For example3,
>>> | devfreq-cooling| throttler
>>> ---------------------------------------
>>> 500Mhz | disabled | disabled
>>> 400Mhz | |
>>> 300Mhz | |
>>> 200Mhz | | disabled
>>> 100Mhz | | disabled
>>> => devfreq driver can use only 300/400Mhz
>>
>> These are all cases without conflicts, my concern is about this:
>>
>>> | devfreq-cooling| throttler
>>> ---------------------------------------
>>> 500Mhz | disabled |
>>> 400Mhz | disabled |
>>> 300Mhz | | disabled
>>> 200Mhz | | disabled
>>> 100Mhz | | disabled
>>> => devfreq driver can't use any frequency?
>
> There are no any enabled frequency. Because device driver
> (devfreq-cooling, throttler) disable all frequencies.
>
> Outside drivers(devfreq-cooling, throttler) can enable/disable
> specific OPP entries. As I already commented, each outside driver
> doesn't consider the policy of other device driver about OPP entries.
>
> OPP interface is independent on devfreq and just control OPP entries.
> After that, devfreq just consider the only enabled OPP entries.
In this case, at least one OPP should remain enabled.
Maybe the OPP interface should provide a way to prevent a
specific OPP entry from being disabled.
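
One possible shape of such a safeguard, sketched as a userspace model: a
disable request is refused when it would leave no enabled OPP. The names
struct opp_table_model, enabled_count() and guarded_disable() are
hypothetical, and nothing here reflects an existing OPP function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_OPPS 5

/* Hypothetical OPP table model: enabled[i] mirrors an 'available' flag. */
struct opp_table_model {
	bool enabled[NUM_OPPS];
};

/* Count the entries that are still enabled. */
static size_t enabled_count(const struct opp_table_model *t)
{
	size_t i, n = 0;

	for (i = 0; i < NUM_OPPS; i++)
		if (t->enabled[i])
			n++;
	return n;
}

/*
 * Disable entry idx unless it is the last enabled one.
 * Returns 0 on success and -1 if the request was refused.
 */
static int guarded_disable(struct opp_table_model *t, size_t idx)
{
	assert(t != NULL && idx < NUM_OPPS);

	if (!t->enabled[idx])
		return 0;	/* already disabled, nothing to do */
	if (enabled_count(t) == 1)
		return -1;	/* would leave no usable frequency */

	t->enabled[idx] = false;
	return 0;
}
```

Note that which OPP survives then depends on the order of the requests, so
a guard like this prevents an empty table but does not decide which driver's
policy should win.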
>
>>
>> Actually my above comment wasn't about this case, but about the
>> added complexity in devfreq-cooling.c and the throttler:
>>
>> A bit simplified partition_enable_opps() currently does this:
>>
>> for_each_opp(opp) {
>> if (opp->freq <= max)
>> opp_enable(opp)
>> else
>> opp_disable(opp)
>> }
>>
>> With the OPP usage/disable count this doesn't work any longer. Now we
>> need to keep track of the enabled/disabled state of the OPP, something
>> like:
>>
>> dev_pm_opp_enable(opp) {
>> if (opp->freq <= max) {
>> if (opp->freq > prev_max)
>> opp_enable(opp)
>> } else {
>> if (opp->freq < prev_max)
>> opp_disable(opp)
>> }
>> }
>>
>> And duplicate the same in the throttler (and other possible
>> drivers). Obviously it can be done, but is there really any gain
>> from it?
>>
>> Instead they just could do:
>>
>> devfreq_verify_within_limits(policy/freq_pair, 0, max_freq)
>>
>> without being concerned about implementation details of devfreq.
>>
>
> I don't think so. dev_pm_opp_enable()/dev_pm_opp_disable()
> have to consider only one OPP entry without any other OPP entry.
>
> dev_pm_opp_enable()/dev_pm_opp_disable() can never know the other
> OPP entries. After some driver(devfreq-cooling.c and throttler)
> enable or disable specific OPP entries, the remaining OPP entry
> with enabled state will be considered on devfreq driver.
>
Hi Chanwoo,
On Tue, Aug 07, 2018 at 07:31:16AM +0900, Chanwoo Choi wrote:
> Hi Matthias,
>
> On 2018-08-07 04:21, Matthias Kaehlcke wrote:
> > Hi Chanwoo,
> >
> > On Fri, Aug 03, 2018 at 09:14:46AM +0900, Chanwoo Choi wrote:
> >> Hi Matthias,
> >>
> >> On 2018-08-03 08:48, Matthias Kaehlcke wrote:
> >>> On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote:
> >>>> Hi Chanwoo,
> >>>>
> >>>> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote:
> >>>>> Hi Matthias,
> >>>>>
> >>>>> On 2018-08-02 02:08, Matthias Kaehlcke wrote:
> >>>>>> Hi Chanwoo,
> >>>>>>
> >>>>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
> >>>>>>> On 2018-08-01 04:39, Matthias Kaehlcke wrote:
> >>>>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
> >>>>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
> >>>>>>>>>> Hi Matthias,
> >>>>>>>>>>
> >>>>>>>>>> On 2018-07-07 02:53, Matthias Kaehlcke wrote:
> >>>>>>>>>>> Hi Chanwoo,
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Firstly,
> >>>>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
> >>>>>>>>>>>>
> >>>>>>>>>>>> devfreq already used the OPP interface as default. It means that
> >>>>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
> >>>>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
> >>>>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core
> >>>>>>>>>>>> consider them.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
> >>>>>>>>>>>> already support some interface to change the minimum/maximum frequency
> >>>>>>>>>>>> of devfreq device.
> >>>>>>>>>>>>
> >>>>>>>>>>>> In case of the cpufreq subsystem, cpufreq only provides 'cpufreq_verify_within_limits()'
> >>>>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
> >>>>>>>>>>>> change the minimum/maximum frequency through OPP interface.
> >>>>>>>>>>>>
> >>>>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
> >>>>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
> >>>>>>>>>>>> other way to change the minimum/maximum frequency.
> >>>>>>>>>>>
> >>>>>>>>>>> Using the OPP interface exclusively works as long as a
> >>>>>>>>>>> enabling/disabling of OPPs is limited to a single driver
> >>>>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
> >>>>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of
> >>>>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are
> >>>>>>>>>>> existing mechanisms for conflict resolution that I overlooked.
> >>>>>>>>>>>
> >>>>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
> >>>>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if
> >>>>>>>>>>> desired, however this seems beyond the scope of this series.
> >>>>>>>>>>
> >>>>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
> >>>>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
> >>>>>>>>>> happen.
> >>>>>>>>>
> >>>>>>>>> As long as drivers limit the max freq there is no conflict, the lowest
> >>>>>>>>> max freq wins. I expect this to be the usual case, apparently it
> >>>>>>>>> worked for cpufreq for 10+ years.
> >>>>>>>>>
> >>>>>>>>> However it is correct that there would be a conflict if a driver
> >>>>>>>>> requests a min freq that is higher than the max freq requested by
> >>>>>>>>> another. In this case devfreq_verify_within_limits() resolves the
> >>>>>>>>> conflict by raising p->max to the min freq. Not sure if this is
> >>>>>>>>> something that would ever occur in practice though.
> >>>>>>>>>
> >>>>>>>>> If we are really concerned about this case it would also be an option
> >>>>>>>>> to limit the adjustment to the max frequency.
> >>>>>>>>>
> >>>>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface
> >>>>>>>>>> have to support 'usage_count' such as clk_enable/disable().
> >>>>>>>>>
> >>>>>>>>> This would require supporting negative usage count values, since a OPP
> >>>>>>>>> should not be enabled if e.g. thermal enables it but the throttler
> >>>>>>>>> disabled it or viceversa.
> >>>>>>>>>
> >>>>>>>>> Theoretically there could also be conflicts, like one driver disabling
> >>>>>>>>> the higher OPPs and another the lower ones, with the outcome of all
> >>>>>>>>> OPPs being disabled, which would be a more drastic conflict resolution
> >>>>>>>>> than that of devfreq_verify_within_limits().
> >>>>>>>>>
> >>>>>>>>> Viresh, what do you think about an OPP usage count?
> >>>>>>>>
> >>>>>>>> Ping, can we try to reach a conclusion on this or at least keep the
> >>>>>>>> discussion going?
> >>>>>>>>
> >>>>>>>> Not that it matters, but my preferred solution continues to be
> >>>>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which
> >>>>>>>> could be adjusted if needed) and has proven to work in practice for
> >>>>>>>> 10+ years in a very similar sub-system.
> >>>>>>>
> >>>>>>> It is not true. Current cpufreq subsystem doesn't support external OPP
> >>>>>>> control to enable/disable the OPP entry. If some device driver
> >>>>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(),
> >>>>>>> the operation is not working. Because cpufreq considers the limit
> >>>>>>> through 'cpufreq_verify_with_limits()' only.
> >>>>>>
> >>>>>> Ok, we can probably agree that using cpufreq_verify_with_limits()
> >>>>>> exclusively seems to have worked well for cpufreq, and that in their
> >>>>>> overall purpose cpufreq and devfreq are similar subsystems.
> >>>>>>
> >>>>>> The current throttler series with devfreq_verify_within_limits() takes
> >>>>>> the enabled OPPs into account, the lowest and highest OPP are used as
> >>>>>> a starting point for the frequency adjustment and (in theory) the
> >>>>>> frequency range should only be narrowed by
> >>>>>> devfreq_verify_within_limits().
> >>>>>>
> >>>>>>> As I already commented[1], there is different between cpufreq and devfreq.
> >>>>>>> [1] https://lkml.org/lkml/2018/7/4/80
> >>>>>>>
> >>>>>>> Already, subsystem already used OPP interface in order to control
> >>>>>>> specific OPP entry. I don't want to provide two outside method
> >>>>>>> to control the frequency of devfreq driver. It might make the confusion.
> >>>>>>
> >>>>>> I understand your point, it would indeed be preferable to have a
> >>>>>> single method. However I'm not convinced that the OPP interface is
> >>>>>> a suitable solution, as I exposed earlier in this thread (quoted
> >>>>>> below).
> >>>>>>
> >>>>>> I would like you to at least consider the possibility of changing
> >>>>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
> >>>>>> Besides that it's not what is currently used, do you see any technical
> >>>>>> concerns that would make devfreq_verify_within_limits() an unsuitable
> >>>>>> or inferior solution?
> >>>>>
> >>>>> As we already discussed, devfreq_verify_within_limits() doesn't support
> >>>>> the multiple outside controllers (e.g., devfreq-cooling.c).
> >>>>
> >>>> That's incorrect, its purpose is precisely that.
> >>>>
> >>>> Are you suggesting that cpufreq with its use of
> >>>> cpufreq_verify_within_limits() (the inspiration for
> >>>> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c
> >>>> and other drivers when receiving a CPUFREQ_ADJUST event, essentially
> >>>> what I am proposing with DEVFREQ_ADJUST.
> >>>>
> >>>> Could you elaborate why this model wouldn't work for devfreq? "OPP
> >>>> interface is mandatory for devfreq" isn't really a technical argument,
> >>>> is it mandatory for any other reason than that it is the interface
> >>>> that is currently used?
> >>>>
> >>>>> After you are suggesting the throttler core, there are at least two
> >>>>> outside controllers (e.g., devfreq-cooling.c and throttler driver).
> >>>>> As I knew the problem about conflict, I cannot agree the temporary
> >>>>> method. OPP interface is mandatory for devfreq in order to control
> >>>>> the OPP (frequency/voltage). In this situation, we have to try to
> >>>>> find the method through OPP interface.
> >>>>
> >>>> What do you mean with "temporary method"?
> >>>>
> >>>> We can try to find a method through the OPP interface, but at this
> >>>> point I'm not convinced that it is technically necessary or even
> >>>> preferable.
> >>>>
> >>>> Another inconvenient of the OPP approach for both devfreq-cooling.c
> >>>> and the throttler is that they have to bother with disabling all OPPs
> >>>> above/below the max/min (they don't/shouldn't have to care), instead
> >>>> of just telling devfreq the max/min.
> >>>
> >>> And a more important one: both drivers now have to keep track which
> >>> OPPs they enabled/disabled previously, done are the days of a simple
> >>> dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is
> >>> possible and not very complex to implement, but is it really the
> >>> best/a good solution?
> >>
> >>
> >> As I replied them right before, Each outside driver has their own throttling
> >> policy to control OPP entries. They don't care the requirement of other
> >> driver and cannot know the requirement of other driver. devfreq core can only
> >> recognize them and then only consider enabled OPP entris without disabled OPP entries.
> >>
> >> For example1,
> >> | devfreq-cooling| throttler
> >> ---------------------------------------
> >> 500Mhz | disabled | disabled
> >> 400Mhz | disabled | disabled
> >> 300Mhz | | disabled
> >> 200Mhz | |
> >> 100Mhz | |
> >> => devfreq driver can use only 100/200Mhz
> >>
> >>
> >> For example2,
> >> | devfreq-cooling| throttler
> >> ---------------------------------------
> >> 500Mhz | disabled | disabled
> >> 400Mhz | disabled |
> >> 300Mhz | disabled |
> >> 200Mhz | |
> >> 100Mhz | |
> >> => devfreq driver can use only 100/200Mhz
> >>
> >>
> >> For example3,
> >> | devfreq-cooling| throttler
> >> ---------------------------------------
> >> 500Mhz | disabled | disabled
> >> 400Mhz | |
> >> 300Mhz | |
> >> 200Mhz | | disabled
> >> 100Mhz | | disabled
> >> => devfreq driver can use only 300/400Mhz
> >
> > These are all cases without conflicts, my concern is about this:
> >
> >> | devfreq-cooling| throttler
> >> ---------------------------------------
> >> 500Mhz | disabled |
> >> 400Mhz | disabled |
> >> 300Mhz | | disabled
> >> 200Mhz | | disabled
> >> 100Mhz | | disabled
> >> => devfreq driver can't use any frequency?
>
> There are no any enabled frequency. Because device driver
> (devfreq-cooling, throttler) disable all frequencies.
>
> Outside drivers(devfreq-cooling, throttler) can enable/disable
> specific OPP entries. As I already commented, each outside driver
> doesn't consider the policy of other device driver about OPP entries.

And wouldn't it be preferable to have an interface that tries to avoid
this situation in the first place and has a clear policy for conflict
resolution?

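To make the conflict case concrete, here is a minimal sketch, assuming an
OPP only counts as usable when no driver has disabled it (the model used in
the quoted tables). The bitmask representation and the names opp_mhz and
highest_usable_mhz() are made up for illustration; they are not part of the
OPP or devfreq API.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_OPPS 5

/* Hypothetical OPP table, lowest frequency first (values in MHz). */
static const unsigned int opp_mhz[NUM_OPPS] = { 100, 200, 300, 400, 500 };

/*
 * Each driver is modelled as a bitmask of the OPPs it disabled
 * (bit i set => opp_mhz[i] is disabled by that driver). An OPP is
 * usable only if no driver disabled it.
 *
 * Returns the highest usable frequency in MHz, or 0 if every OPP
 * ended up disabled.
 */
unsigned int highest_usable_mhz(const unsigned int *disabled, size_t num_drivers)
{
	unsigned int all_disabled = 0;
	size_t i;

	assert(num_drivers == 0 || disabled != NULL);

	/* Union of everything any driver disabled. */
	for (i = 0; i < num_drivers; i++)
		all_disabled |= disabled[i];

	/* Scan from the highest OPP downwards. */
	for (i = NUM_OPPS; i-- > 0; )
		if (!(all_disabled & (1u << i)))
			return opp_mhz[i];

	return 0;
}
```

With devfreq-cooling disabling 400/500 (mask 0x18) and the throttler
disabling 100/200/300 (mask 0x07), the union covers the whole table and
nothing is left for devfreq to pick.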
> OPP interface is independent on devfreq and just control OPP entries.
> After that, devfreq just consider the only enabled OPP entries.
>
> >
> > Actually my above comment wasn't about this case, but about the
> > added complexity in devfreq-cooling.c and the throttler:
> >
> > A bit simplified partition_enable_opps() currently does this:
> >
> > for_each_opp(opp) {
> > if (opp->freq <= max)
> > opp_enable(opp)
> > else
> > opp_disable(opp)
> > }
> >
> > With the OPP usage/disable count this doesn't work any longer. Now we
> > need to keep track of the enabled/disabled state of the OPP, something
> > like:
> >
> > dev_pm_opp_enable(opp) {
> > if (opp->freq <= max) {
> > if (opp->freq > prev_max)
> > opp_enable(opp)
> > } else {
> > if (opp->freq < prev_max)
> > opp_disable(opp)
> > }
> > }
> >
> > And duplicate the same in the throttler (and other possible
> > drivers). Obviously it can be done, but is there really any gain
> > from it?
> >
> > Instead they just could do:
> >
> > devfreq_verify_within_limits(policy/freq_pair, 0, max_freq)
> >
> > without being concerned about implementation details of devfreq.
> >
>
> I don't think so.

What are you referring to, the change that I claim will be needed
in partition_enable_opps() when OPPs have usage/disable counts? If so,
how do you prevent the function from enabling/disabling an OPP that
was already enabled/disabled in the previous iteration?

> dev_pm_opp_enable()/dev_pm_opp_disable() have to consider only one
> OPP entry without any other OPP entry.
I agree with this :)
> dev_pm_opp_enable()/dev_pm_opp_disable() can never know the other
> OPP entries. After some driver(devfreq-cooling.c and throttler)
> enable or disable specific OPP entries, the remaining OPP entry
> with enabled state will be considered on devfreq driver.

Having multiple drivers (or even a single one) enable and disable
OPPs independently and at a time of their choosing sounds like a
recipe for race conditions.

What happens if e.g. the devfreq core calls
dev_pm_opp_find_freq_ceil/floor() and, right after it returns, another
driver disables the OPP? devfreq uses the disabled OPP. Probably not a
big deal if disabling the OPP is only a semantic question, but I
imagine there can be worse scenarios. Currently the only user of
dev_pm_opp_disable() besides devfreq_cooling.c is imx6q-cpufreq.c, and
it is well behaved and only disables OPPs during probe().
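
The window described above can be replayed in a single-threaded toy model.
This is only a sketch: struct toy_opp and the toy_*() helpers are
hypothetical stand-ins, not the kernel's struct dev_pm_opp or the
dev_pm_opp_*() functions, and the sketch does not model any locking or
notifier handling that might close the window in practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an OPP entry (not the kernel's struct dev_pm_opp). */
struct toy_opp {
	unsigned long freq_hz;
	bool enabled;
};

/*
 * Return the lowest enabled OPP with a frequency >= freq_hz, or NULL
 * if there is none. The table is assumed to be sorted ascending.
 */
struct toy_opp *toy_find_freq_ceil(struct toy_opp *table, size_t n,
				   unsigned long freq_hz)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (table[i].enabled && table[i].freq_hz >= freq_hz)
			return &table[i];

	return NULL;
}

/* What a second driver might do at a time of its choosing. */
void toy_opp_disable(struct toy_opp *opp)
{
	assert(opp != NULL);
	opp->enabled = false;
}

/*
 * Replay the interleaving: lookup, then a disable by "another driver",
 * then use of the lookup result. Returns true if the caller ended up
 * holding an OPP that is disabled by now.
 */
bool toy_replay_stale_lookup(struct toy_opp *table, size_t n,
			     unsigned long freq_hz)
{
	struct toy_opp *chosen = toy_find_freq_ceil(table, n, freq_hz);

	if (!chosen)
		return false;

	toy_opp_disable(chosen);	/* the other driver runs here */

	return !chosen->enabled;	/* caller still holds the old result */
}
```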

I keep missing a clear answer to the question in which sense
manipulating the OPPs in devfreq_cooling.c is superior to narrowing
down the frequency range during DEVFREQ_ADJUST, which would avoid
potential races and allow conflicts to be resolved. Does it allow for
some functionality that couldn't be achieved otherwise, does it make the
code significantly less complex, is some integration with the OPP
subsystem needed that I'm overlooking, is it more efficient, ...?
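
For reference, a rough sketch of what "narrowing down during
DEVFREQ_ADJUST" amounts to, assuming devfreq_verify_within_limits() mirrors
cpufreq_verify_within_limits() and that every listener only caps the
maximum. struct toy_policy, struct toy_listener and toy_adjust() are
illustrative names, not code from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the proposed devfreq policy: an allowed range. */
struct toy_policy {
	unsigned long min;
	unsigned long max;
};

/* Assumed to mirror cpufreq_verify_within_limits(). */
void toy_verify_within_limits(struct toy_policy *p,
			      unsigned long min, unsigned long max)
{
	assert(p != NULL && min <= max);

	if (p->min < min)
		p->min = min;
	if (p->max < min)
		p->max = min;
	if (p->min > max)
		p->min = max;
	if (p->max > max)
		p->max = max;
	if (p->min > p->max)
		p->min = p->max;
}

/* One registered listener: a driver that only caps the maximum. */
struct toy_listener {
	unsigned long max_cap;
};

/*
 * Models one ADJUST pass: start from the OPP limits and let every
 * listener narrow the range in turn.
 */
struct toy_policy toy_adjust(unsigned long opp_min, unsigned long opp_max,
			     const struct toy_listener *listeners, size_t n)
{
	struct toy_policy p = { opp_min, opp_max };
	size_t i;

	for (i = 0; i < n; i++)
		toy_verify_within_limits(&p, 0, listeners[i].max_cap);

	return p;
}
```

As long as the listeners only lower the maximum, the order in which they
are called does not matter: the lowest cap wins.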

I'm not just insisting because I'm stubborn. I'd be happy to use any
interface that fits the bill, or to adjust one to fit the bill, but as
of now I mainly see drawbacks on the OPP side and haven't seen
convincing arguments that it is really needed in devfreq_cooling.c or
is a better solution.

Cheers

Matthias

Hi Matthias,

On 2018-08-07 09:23, Matthias Kaehlcke wrote:
> Hi Chanwoo,
>
> On Tue, Aug 07, 2018 at 07:31:16AM +0900, Chanwoo Choi wrote:
>> Hi Matthias,
>>
>> On 2018-08-07 04:21, Matthias Kaehlcke wrote:
>>> Hi Chanwoo,
>>>
>>> On Fri, Aug 03, 2018 at 09:14:46AM +0900, Chanwoo Choi wrote:
>>>> Hi Matthias,
>>>>
>>>> On 2018-08-03 08:48, Matthias Kaehlcke wrote:
>>>>> On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote:
>>>>>> Hi Chanwoo,
>>>>>>
>>>>>> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote:
>>>>>>> Hi Matthias,
>>>>>>>
>>>>>>> On 2018-08-02 02:08, Matthias Kaehlcke wrote:
>>>>>>>> Hi Chanwoo,
>>>>>>>>
>>>>>>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
>>>>>>>>> On 2018-08-01 04:39, Matthias Kaehlcke wrote:
>>>>>>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
>>>>>>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
>>>>>>>>>>>> Hi Matthias,
>>>>>>>>>>>>
>>>>>>>>>>>> On 2018-07-07 02:53, Matthias Kaehlcke wrote:
>>>>>>>>>>>>> Hi Chanwoo,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Firstly,
>>>>>>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> devfreq already used the OPP interface as default. It means that
>>>>>>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
>>>>>>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
>>>>>>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core
>>>>>>>>>>>>>> consider them.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
>>>>>>>>>>>>>> already support some interface to change the minimum/maximum frequency
>>>>>>>>>>>>>> of devfreq device.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()'
>>>>>>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
>>>>>>>>>>>>>> change the minimum/maximum frequency through OPP interface.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
>>>>>>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
>>>>>>>>>>>>>> other way to change the minimum/maximum frequency.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Using the OPP interface exclusively works as long as a
>>>>>>>>>>>>> enabling/disabling of OPPs is limited to a single driver
>>>>>>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
>>>>>>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of
>>>>>>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are
>>>>>>>>>>>>> existing mechanisms for conflict resolution that I overlooked.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
>>>>>>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if
>>>>>>>>>>>>> desired, however this seems beyond the scope of this series.
>>>>>>>>>>>>
>>>>>>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
>>>>>>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
>>>>>>>>>>>> happen.
>>>>>>>>>>>
>>>>>>>>>>> As long as drivers limit the max freq there is no conflict, the lowest
>>>>>>>>>>> max freq wins. I expect this to be the usual case, apparently it
>>>>>>>>>>> worked for cpufreq for 10+ years.
>>>>>>>>>>>
>>>>>>>>>>> However it is correct that there would be a conflict if a driver
>>>>>>>>>>> requests a min freq that is higher than the max freq requested by
>>>>>>>>>>> another. In this case devfreq_verify_within_limits() resolves the
>>>>>>>>>>> conflict by raising p->max to the min freq. Not sure if this is
>>>>>>>>>>> something that would ever occur in practice though.
>>>>>>>>>>>
>>>>>>>>>>> If we are really concerned about this case it would also be an option
>>>>>>>>>>> to limit the adjustment to the max frequency.
>>>>>>>>>>>
>>>>>>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface
>>>>>>>>>>>> have to support 'usage_count' such as clk_enable/disable().
>>>>>>>>>>>
>>>>>>>>>>> This would require supporting negative usage count values, since a OPP
>>>>>>>>>>> should not be enabled if e.g. thermal enables it but the throttler
>>>>>>>>>>> disabled it or viceversa.
>>>>>>>>>>>
>>>>>>>>>>> Theoretically there could also be conflicts, like one driver disabling
>>>>>>>>>>> the higher OPPs and another the lower ones, with the outcome of all
>>>>>>>>>>> OPPs being disabled, which would be a more drastic conflict resolution
>>>>>>>>>>> than that of devfreq_verify_within_limits().
>>>>>>>>>>>
>>>>>>>>>>> Viresh, what do you think about an OPP usage count?
>>>>>>>>>>
>>>>>>>>>> Ping, can we try to reach a conclusion on this or at least keep the
>>>>>>>>>> discussion going?
>>>>>>>>>>
>>>>>>>>>> Not that it matters, but my preferred solution continues to be
>>>>>>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which
>>>>>>>>>> could be adjusted if needed) and has proven to work in practice for
>>>>>>>>>> 10+ years in a very similar sub-system.
>>>>>>>>>
>>>>>>>>> It is not true. Current cpufreq subsystem doesn't support external OPP
>>>>>>>>> control to enable/disable the OPP entry. If some device driver
>>>>>>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(),
>>>>>>>>> the operation is not working. Because cpufreq considers the limit
>>>>>>>>> through 'cpufreq_verify_with_limits()' only.
>>>>>>>>
>>>>>>>> Ok, we can probably agree that using cpufreq_verify_with_limits()
>>>>>>>> exclusively seems to have worked well for cpufreq, and that in their
>>>>>>>> overall purpose cpufreq and devfreq are similar subsystems.
>>>>>>>>
>>>>>>>> The current throttler series with devfreq_verify_within_limits() takes
>>>>>>>> the enabled OPPs into account, the lowest and highest OPP are used as
>>>>>>>> a starting point for the frequency adjustment and (in theory) the
>>>>>>>> frequency range should only be narrowed by
>>>>>>>> devfreq_verify_within_limits().
>>>>>>>>
>>>>>>>>> As I already commented[1], there is different between cpufreq and devfreq.
>>>>>>>>> [1] https://lkml.org/lkml/2018/7/4/80
>>>>>>>>>
>>>>>>>>> Already, subsystem already used OPP interface in order to control
>>>>>>>>> specific OPP entry. I don't want to provide two outside method
>>>>>>>>> to control the frequency of devfreq driver. It might make the confusion.
>>>>>>>>
>>>>>>>> I understand your point, it would indeed be preferable to have a
>>>>>>>> single method. However I'm not convinced that the OPP interface is
>>>>>>>> a suitable solution, as I exposed earlier in this thread (quoted
>>>>>>>> below).
>>>>>>>>
>>>>>>>> I would like you to at least consider the possibility of changing
>>>>>>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
>>>>>>>> Besides that it's not what is currently used, do you see any technical
>>>>>>>> concerns that would make devfreq_verify_within_limits() an unsuitable
>>>>>>>> or inferior solution?
>>>>>>>
>>>>>>> As we already discussed, devfreq_verify_within_limits() doesn't support
>>>>>>> the multiple outside controllers (e.g., devfreq-cooling.c).
>>>>>>
>>>>>> That's incorrect, its purpose is precisely that.
>>>>>>
>>>>>> Are you suggesting that cpufreq with its use of
>>>>>> cpufreq_verify_within_limits() (the inspiration for
>>>>>> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c
>>>>>> and other drivers when receiving a CPUFREQ_ADJUST event, essentially
>>>>>> what I am proposing with DEVFREQ_ADJUST.
>>>>>>
>>>>>> Could you elaborate why this model wouldn't work for devfreq? "OPP
>>>>>> interface is mandatory for devfreq" isn't really a technical argument,
>>>>>> is it mandatory for any other reason than that it is the interface
>>>>>> that is currently used?
>>>>>>
>>>>>>> After you are suggesting the throttler core, there are at least two
>>>>>>> outside controllers (e.g., devfreq-cooling.c and throttler driver).
>>>>>>> As I knew the problem about conflict, I cannot agree the temporary
>>>>>>> method. OPP interface is mandatory for devfreq in order to control
>>>>>>> the OPP (frequency/voltage). In this situation, we have to try to
>>>>>>> find the method through OPP interface.
>>>>>>
>>>>>> What do you mean with "temporary method"?
>>>>>>
>>>>>> We can try to find a method through the OPP interface, but at this
>>>>>> point I'm not convinced that it is technically necessary or even
>>>>>> preferable.
>>>>>>
>>>>>> Another inconvenient of the OPP approach for both devfreq-cooling.c
>>>>>> and the throttler is that they have to bother with disabling all OPPs
>>>>>> above/below the max/min (they don't/shouldn't have to care), instead
>>>>>> of just telling devfreq the max/min.
>>>>>
>>>>> And a more important one: both drivers now have to keep track which
>>>>> OPPs they enabled/disabled previously, done are the days of a simple
>>>>> dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is
>>>>> possible and not very complex to implement, but is it really the
>>>>> best/a good solution?
>>>>
>>>>
>>>> As I replied them right before, Each outside driver has their own throttling
>>>> policy to control OPP entries. They don't care the requirement of other
>>>> driver and cannot know the requirement of other driver. devfreq core can only
>>>> recognize them and then only consider enabled OPP entris without disabled OPP entries.
>>>>
>>>> For example1,
>>>> | devfreq-cooling| throttler
>>>> ---------------------------------------
>>>> 500Mhz | disabled | disabled
>>>> 400Mhz | disabled | disabled
>>>> 300Mhz | | disabled
>>>> 200Mhz | |
>>>> 100Mhz | |
>>>> => devfreq driver can use only 100/200Mhz
>>>>
>>>>
>>>> For example2,
>>>> | devfreq-cooling| throttler
>>>> ---------------------------------------
>>>> 500Mhz | disabled | disabled
>>>> 400Mhz | disabled |
>>>> 300Mhz | disabled |
>>>> 200Mhz | |
>>>> 100Mhz | |
>>>> => devfreq driver can use only 100/200Mhz
>>>>
>>>>
>>>> For example3,
>>>> | devfreq-cooling| throttler
>>>> ---------------------------------------
>>>> 500Mhz | disabled | disabled
>>>> 400Mhz | |
>>>> 300Mhz | |
>>>> 200Mhz | | disabled
>>>> 100Mhz | | disabled
>>>> => devfreq driver can use only 300/400Mhz
>>>
>>> These are all cases without conflicts, my concern is about this:
>>>
>>>> | devfreq-cooling| throttler
>>>> ---------------------------------------
>>>> 500Mhz | disabled |
>>>> 400Mhz | disabled |
>>>> 300Mhz | | disabled
>>>> 200Mhz | | disabled
>>>> 100Mhz | | disabled
>>>> => devfreq driver can't use any frequency?
>>
>> There are no any enabled frequency. Because device driver
>> (devfreq-cooling, throttler) disable all frequencies.
>>
>> Outside drivers(devfreq-cooling, throttler) can enable/disable
>> specific OPP entries. As I already commented, each outside driver
>> doesn't consider the policy of other device driver about OPP entries.
>
> And wouldn't it be preferable to have an interface that tries to avoid
> this situation in the first place and has a clear policy for conflict
> resolution?
>
>> OPP interface is independent on devfreq and just control OPP entries.
>> After that, devfreq just consider the only enabled OPP entries.
>>
>>>
>>> Actually my above comment wasn't about this case, but about the
>>> added complexity in devfreq-cooling.c and the throttler:
>>>
>>> A bit simplified partition_enable_opps() currently does this:
>>>
>>> for_each_opp(opp) {
>>> if (opp->freq <= max)
>>> opp_enable(opp)
>>> else
>>> opp_disable(opp)
>>> }
>>>
>>> With the OPP usage/disable count this doesn't work any longer. Now we
>>> need to keep track of the enabled/disabled state of the OPP, something
>>> like:
>>>
>>> dev_pm_opp_enable(opp) {
>>> if (opp->freq <= max) {
>>> if (opp->freq > prev_max)
>>> opp_enable(opp)
>>> } else {
>>> if (opp->freq < prev_max)
>>> opp_disable(opp)
>>> }
>>> }
>>>
>>> And duplicate the same in the throttler (and other possible
>>> drivers). Obviously it can be done, but is there really any gain
>>> from it?
>>>
>>> Instead they just could do:
>>>
>>> devfreq_verify_within_limits(policy/freq_pair, 0, max_freq)

I have a new question about using devfreq_verify_within_limits().

For example, driver A has the following OPP table:
- 500MHz
- 400MHz
- 300MHz
- 200MHz
- 100MHz

Initially, driver A has the following values:
- policy->min is 100
- policy->max is 500

Driver B calls devfreq_verify_within_limits(200, 300):
- policy->min is 200
- policy->max is 300

Driver C calls devfreq_verify_within_limits(300, 400):
- policy->min is 300
- policy->max is 300

Driver D calls devfreq_verify_within_limits(400, 500):
- policy->min is 400
- policy->max is 400

As a result, it looks like the requirement of driver B has disappeared.
Is that the intention of devfreq_verify_within_limits()?
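
The sequence in this example can be replayed with a small sketch, assuming
devfreq_verify_within_limits() applies the same clamping rules as
cpufreq_verify_within_limits(). struct sketch_policy and the sketch_*()
functions are illustrative names only, not code from the series.

```c
#include <assert.h>

/* Illustrative policy: the currently allowed range in MHz. */
struct sketch_policy {
	unsigned int min;
	unsigned int max;
};

/* Assumed to follow the rules of cpufreq_verify_within_limits(). */
void sketch_verify_within_limits(struct sketch_policy *p,
				 unsigned int min, unsigned int max)
{
	assert(p != 0 && min <= max);

	if (p->min < min)
		p->min = min;
	if (p->max < min)
		p->max = min;	/* a higher min pulls max up with it */
	if (p->min > max)
		p->min = max;
	if (p->max > max)
		p->max = max;
	if (p->min > p->max)
		p->min = p->max;
}

/* Apply the requests of drivers B, C and D from the example, in order. */
struct sketch_policy sketch_replay_example(void)
{
	struct sketch_policy p = { 100, 500 };		/* driver A: OPP limits */

	sketch_verify_within_limits(&p, 200, 300);	/* driver B */
	sketch_verify_within_limits(&p, 300, 400);	/* driver C */
	sketch_verify_within_limits(&p, 400, 500);	/* driver D */

	return p;
}
```

Under these rules the range ends up as 400-400: driver D's minimum is
higher than the maximum left by B and C, so the maximum is raised to the
minimum. This is the conflict resolution mentioned earlier in the thread
("raising p->max to the min freq").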
>>>
>>> without being concerned about implementation details of devfreq.
>>>
>>
>> I don't think so.
>
> What are you referring to, the change that I claim that will be needed
> in partition_enable_opps() when OPPs have usage/disable counts? If so,
> how do you avoid that the function doesn't enable/disable an OPP that
> was already enabled/disabled in the previous iteration?

I was referring only to the changes to "dev_pm_opp_enable(opp)".

>
>> dev_pm_opp_enable()/dev_pm_opp_disable() have to consider only one
>> OPP entry without any other OPP entry.
>
> I agree with this :)
>
>> dev_pm_opp_enable()/dev_pm_opp_disable() can never know the other
>> OPP entries. After some driver(devfreq-cooling.c and throttler)
>> enable or disable specific OPP entries, the remaining OPP entry
>> with enabled state will be considered on devfreq driver.
>
> Having multiple drivers (or even a single one) enable and disable
> OPPs independently and at the time of their choosing sounds like a
> recipe for race conditions.
>
> What happens if e.g. the devfreq core calls
> dev_pm_opp_find_freq_ceil/floor() and right after returning another
> driver disables the OPP? devfreq uses the disabled OPP. Probably not a

devfreq doesn't use the disabled OPP.

For example,
1. devfreq_cooling.c disables/enables some OPPs and the OPP framework sends a notification about the change.
2. devfreq receives the notification (devfreq_notifier_call() is executed).
3. devfreq_notifier_call() looks up the new scaling_min_freq/scaling_max_freq.
4. devfreq_notifier_call() calls update_devfreq() in order to apply the OPP changes.
5. devfreq therefore considers only the enabled frequencies right after dev_pm_opp_disable/enable().
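
The steps above can be sketched roughly as follows. This is a simplified
model, not the devfreq implementation: struct flow_opp, struct flow_devfreq
and the flow_*() functions are hypothetical, the notification is delivered
synchronously, and the final clamp merely stands in for update_devfreq().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct flow_opp {
	unsigned long freq;
	bool enabled;
};

struct flow_devfreq {
	struct flow_opp *opps;		/* sorted ascending by freq */
	size_t num_opps;
	unsigned long scaling_min_freq;
	unsigned long scaling_max_freq;
	unsigned long cur_freq;
};

/* Steps 3-4: rescan the enabled OPPs, then re-clamp the current freq. */
void flow_notifier_call(struct flow_devfreq *df)
{
	bool found = false;
	size_t i;

	assert(df != NULL);

	for (i = 0; i < df->num_opps; i++) {
		if (!df->opps[i].enabled)
			continue;
		if (!found) {
			df->scaling_min_freq = df->opps[i].freq;
			found = true;
		}
		df->scaling_max_freq = df->opps[i].freq;
	}

	if (!found)
		return;	/* no enabled OPP: limits left untouched in this sketch */

	/* Stand-in for update_devfreq(): keep cur_freq inside the limits. */
	if (df->cur_freq > df->scaling_max_freq)
		df->cur_freq = df->scaling_max_freq;
	if (df->cur_freq < df->scaling_min_freq)
		df->cur_freq = df->scaling_min_freq;
}

/* Steps 1-2: an outside driver changes an OPP and devfreq is notified. */
void flow_opp_set_enabled(struct flow_devfreq *df, size_t idx, bool enabled)
{
	assert(df != NULL && idx < df->num_opps);

	df->opps[idx].enabled = enabled;
	flow_notifier_call(df);
}
```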
> big deal if disabling the OPP is only a semantic question, but I
> imagine there can be worse scenarios. Currently the only user of
> dev_pm_opp_disable() besides devfreq_cooling.c is imx6q-cpufreq.c, and
> it is well behaved and only disables OPPs during probe().

imx6q-cpufreq.c calls dev_pm_opp_disable() before calling
dev_pm_opp_init_cpufreq_table(). After cpufreq_register_driver() has
registered the driver, imx6q-cpufreq.c no longer uses
dev_pm_opp_disable/enable(). This means that dev_pm_opp_disable() in
imx6q-cpufreq.c doesn't affect cpufreq's frequency choice at runtime,
after the cpufreq driver has been registered.

On the other hand, devfreq_cooling.c uses dev_pm_opp_disable/enable() at
runtime, after the devfreq driver has been registered. It does affect
devfreq's frequency choice at runtime.

>
> I keep missing a clear answer to the question in which sense
> manipulating the OPPs in devfreq_cooling.c is superior over narrowing
> down the frequency during DEVFREQ_ADJUST, which would avoid potential
> races and allow to resolve conflicts. Does it allow for some

You mentioned the race conditions earlier. Actually, I don't see what the potential races are.

> functionality that couldn't be achieved otherwise, does it make the
> code significantly less complex, is some integration with the OPP
> subsystem needed that I'm overlooking, is it more efficient, ...?
>
> I'm not just insisting because I'm stubborn. I'd be happy to use any
> interface that fits the bill, or to adjust one to fit the bill, but as
> of now I mainly see drawbacks on the OPPs side and haven't seen
> convincing arguments that it is really needed in devreq_cooling.c or a
> better solution.

During our discussion we learned that OPP doesn't support every
operation perfectly. As of now, OPP is the standard framework for
controlling frequency/voltage pairs, and devfreq needs to use OPP for
those pairs. Even if OPP doesn't meet all of our requirements, I think
that devfreq should use the OPP interface after the OPP framework has
been updated, instead of adding another mechanism for outside drivers
(devfreq_cooling.c, throttler) to control the frequency.

Finally, I asked Viresh (the OPP maintainer). If dev_pm_opp_enable() and
dev_pm_opp_disable() don't support use by multiple device drivers, the
OPP interface cannot be used to support both devfreq_cooling.c and the
throttler. So we had better wait for the opinion of the OPP maintainer.

--
Best Regards,
Chanwoo Choi
Samsung Electronics
Hi Chanwoo,
On Tue, Aug 07, 2018 at 10:35:37AM +0900, Chanwoo Choi wrote:
> Hi Matthias,
>
> On 2018-08-07 09:23, Matthias Kaehlcke wrote:
> > Hi Chanwoo,
> >
> > On Tue, Aug 07, 2018 at 07:31:16AM +0900, Chanwoo Choi wrote:
> >> Hi Matthias,
> >>
> >> On 2018-08-07 04:21, Matthias Kaehlcke wrote:
> >>> Hi Chanwoo,
> >>>
> >>> On Fri, Aug 03, 2018 at 09:14:46AM +0900, Chanwoo Choi wrote:
> >>>> Hi Matthias,
> >>>>
> >>>> On 2018-08-03 08:48, Matthias Kaehlcke wrote:
> >>>>> On Thu, Aug 02, 2018 at 04:13:43PM -0700, Matthias Kaehlcke wrote:
> >>>>>> Hi Chanwoo,
> >>>>>>
> >>>>>> On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote:
> >>>>>>> Hi Matthias,
> >>>>>>>
> >>>>>>> On 2018-08-02 02:08, Matthias Kaehlcke wrote:
> >>>>>>>> Hi Chanwoo,
> >>>>>>>>
> >>>>>>>> On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
> >>>>>>>>> On 2018-08-01 04:39, Matthias Kaehlcke wrote:
> >>>>>>>>>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
> >>>>>>>>>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
> >>>>>>>>>>>> Hi Matthias,
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 2018-07-07 02:53, Matthias Kaehlcke wrote:
> >>>>>>>>>>>>> Hi Chanwoo,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Firstly,
> >>>>>>>>>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> devfreq already used the OPP interface as default. It means that
> >>>>>>>>>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
> >>>>>>>>>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
> >>>>>>>>>>>>>> drivers disable/enable the specific frequency, the devfreq core
> >>>>>>>>>>>>>> consider them.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
> >>>>>>>>>>>>>> already support some interface to change the minimum/maximum frequency
> >>>>>>>>>>>>>> of devfreq device.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()'
> >>>>>>>>>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
> >>>>>>>>>>>>>> change the minimum/maximum frequency through OPP interface.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
> >>>>>>>>>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
> >>>>>>>>>>>>>> other way to change the minimum/maximum frequency.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Using the OPP interface exclusively works as long as a
> >>>>>>>>>>>>> enabling/disabling of OPPs is limited to a single driver
> >>>>>>>>>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
> >>>>>>>>>>>>> involved you need a way to resolve conflicts, that's the purpose of
> >>>>>>>>>>>>> devfreq_verify_within_limits(). Please let me know if there are
> >>>>>>>>>>>>> existing mechanisms for conflict resolution that I overlooked.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
> >>>>>>>>>>>>> devfreq_verify_within_limits() instead of the OPP interface if
> >>>>>>>>>>>>> desired, however this seems beyond the scope of this series.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
> >>>>>>>>>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
> >>>>>>>>>>>> happen.
> >>>>>>>>>>>
> >>>>>>>>>>> As long as drivers limit the max freq there is no conflict, the lowest
> >>>>>>>>>>> max freq wins. I expect this to be the usual case, apparently it
> >>>>>>>>>>> worked for cpufreq for 10+ years.
> >>>>>>>>>>>
> >>>>>>>>>>> However it is correct that there would be a conflict if a driver
> >>>>>>>>>>> requests a min freq that is higher than the max freq requested by
> >>>>>>>>>>> another. In this case devfreq_verify_within_limits() resolves the
> >>>>>>>>>>> conflict by raising p->max to the min freq. Not sure if this is
> >>>>>>>>>>> something that would ever occur in practice though.
> >>>>>>>>>>>
> >>>>>>>>>>> If we are really concerned about this case it would also be an option
> >>>>>>>>>>> to limit the adjustment to the max frequency.
> >>>>>>>>>>>
> >>>>>>>>>>>> To resolve the conflict for multiple device driver, maybe OPP interface
> >>>>>>>>>>>> have to support 'usage_count' such as clk_enable/disable().
> >>>>>>>>>>>
> >>>>>>>>>>> This would require supporting negative usage count values, since a OPP
> >>>>>>>>>>> should not be enabled if e.g. thermal enables it but the throttler
> >>>>>>>>>>> disabled it or viceversa.
> >>>>>>>>>>>
> >>>>>>>>>>> Theoretically there could also be conflicts, like one driver disabling
> >>>>>>>>>>> the higher OPPs and another the lower ones, with the outcome of all
> >>>>>>>>>>> OPPs being disabled, which would be a more drastic conflict resolution
> >>>>>>>>>>> than that of devfreq_verify_within_limits().
> >>>>>>>>>>>
> >>>>>>>>>>> Viresh, what do you think about an OPP usage count?
> >>>>>>>>>>
> >>>>>>>>>> Ping, can we try to reach a conclusion on this or at least keep the
> >>>>>>>>>> discussion going?
> >>>>>>>>>>
> >>>>>>>>>> Not that it matters, but my preferred solution continues to be
> >>>>>>>>>> devfreq_verify_within_limits(). It solves conflicts in some way (which
> >>>>>>>>>> could be adjusted if needed) and has proven to work in practice for
> >>>>>>>>>> 10+ years in a very similar sub-system.
> >>>>>>>>>
> >>>>>>>>> It is not true. Current cpufreq subsystem doesn't support external OPP
> >>>>>>>>> control to enable/disable the OPP entry. If some device driver
> >>>>>>>>> controls the OPP entry of cpufreq driver with opp_disable/enable(),
> >>>>>>>>> the operation is not working. Because cpufreq considers the limit
> >>>>>>>>> through 'cpufreq_verify_within_limits()' only.
> >>>>>>>>
> >>>>>>>> Ok, we can probably agree that using cpufreq_verify_within_limits()
> >>>>>>>> exclusively seems to have worked well for cpufreq, and that in their
> >>>>>>>> overall purpose cpufreq and devfreq are similar subsystems.
> >>>>>>>>
> >>>>>>>> The current throttler series with devfreq_verify_within_limits() takes
> >>>>>>>> the enabled OPPs into account, the lowest and highest OPP are used as
> >>>>>>>> a starting point for the frequency adjustment and (in theory) the
> >>>>>>>> frequency range should only be narrowed by
> >>>>>>>> devfreq_verify_within_limits().
> >>>>>>>>
> >>>>>>>>> As I already commented[1], there is different between cpufreq and devfreq.
> >>>>>>>>> [1] https://lkml.org/lkml/2018/7/4/80
> >>>>>>>>>
> >>>>>>>>> Already, subsystem already used OPP interface in order to control
> >>>>>>>>> specific OPP entry. I don't want to provide two outside method
> >>>>>>>>> to control the frequency of devfreq driver. It might make the confusion.
> >>>>>>>>
> >>>>>>>> I understand your point, it would indeed be preferable to have a
> >>>>>>>> single method. However I'm not convinced that the OPP interface is
> >>>>>>>> a suitable solution, as I exposed earlier in this thread (quoted
> >>>>>>>> below).
> >>>>>>>>
> >>>>>>>> I would like you to at least consider the possibility of changing
> >>>>>>>> drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
> >>>>>>>> Besides that it's not what is currently used, do you see any technical
> >>>>>>>> concerns that would make devfreq_verify_within_limits() an unsuitable
> >>>>>>>> or inferior solution?
> >>>>>>>
> >>>>>>> As we already discussed, devfreq_verify_within_limits() doesn't support
> >>>>>>> the multiple outside controllers (e.g., devfreq-cooling.c).
> >>>>>>
> >>>>>> That's incorrect, its purpose is precisely that.
> >>>>>>
> >>>>>> Are you suggesting that cpufreq with its use of
> >>>>>> cpufreq_verify_within_limits() (the inspiration for
> >>>>>> devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c
> >>>>>> and other drivers when receiving a CPUFREQ_ADJUST event, essentially
> >>>>>> what I am proposing with DEVFREQ_ADJUST.
> >>>>>>
> >>>>>> Could you elaborate why this model wouldn't work for devfreq? "OPP
> >>>>>> interface is mandatory for devfreq" isn't really a technical argument,
> >>>>>> is it mandatory for any other reason than that it is the interface
> >>>>>> that is currently used?
> >>>>>>
> >>>>>>> After you are suggesting the throttler core, there are at least two
> >>>>>>> outside controllers (e.g., devfreq-cooling.c and throttler driver).
> >>>>>>> As I knew the problem about conflict, I cannot agree the temporary
> >>>>>>> method. OPP interface is mandatory for devfreq in order to control
> >>>>>>> the OPP (frequency/voltage). In this situation, we have to try to
> >>>>>>> find the method through OPP interface.
> >>>>>>
> >>>>>> What do you mean with "temporary method"?
> >>>>>>
> >>>>>> We can try to find a method through the OPP interface, but at this
> >>>>>> point I'm not convinced that it is technically necessary or even
> >>>>>> preferable.
> >>>>>>
> >>>>>> Another inconvenience of the OPP approach for both devfreq-cooling.c
> >>>>>> and the throttler is that they have to bother with disabling all OPPs
> >>>>>> above/below the max/min (they don't/shouldn't have to care), instead
> >>>>>> of just telling devfreq the max/min.
> >>>>>
> >>>>> And a more important one: both drivers now have to keep track of which
> >>>>> OPPs they enabled/disabled previously, gone are the days of a simple
> >>>>> dev_pm_opp_enable/disable() in devfreq_cooling. Certainly it is
> >>>>> possible and not very complex to implement, but is it really the
> >>>>> best/a good solution?
> >>>>
> >>>>
> >>>> As I replied them right before, Each outside driver has their own throttling
> >>>> policy to control OPP entries. They don't care the requirement of other
> >>>> driver and cannot know the requirement of other driver. devfreq core can only
> >>>> recognize them and then only consider enabled OPP entries without disabled OPP entries.
> >>>>
> >>>> For example1,
> >>>>        | devfreq-cooling | throttler
> >>>> ---------------------------------------
> >>>> 500Mhz | disabled        | disabled
> >>>> 400Mhz | disabled        | disabled
> >>>> 300Mhz |                 | disabled
> >>>> 200Mhz |                 |
> >>>> 100Mhz |                 |
> >>>> => devfreq driver can use only 100/200Mhz
> >>>>
> >>>>
> >>>> For example2,
> >>>>        | devfreq-cooling | throttler
> >>>> ---------------------------------------
> >>>> 500Mhz | disabled        | disabled
> >>>> 400Mhz | disabled        |
> >>>> 300Mhz | disabled        |
> >>>> 200Mhz |                 |
> >>>> 100Mhz |                 |
> >>>> => devfreq driver can use only 100/200Mhz
> >>>>
> >>>>
> >>>> For example3,
> >>>>        | devfreq-cooling | throttler
> >>>> ---------------------------------------
> >>>> 500Mhz | disabled        | disabled
> >>>> 400Mhz |                 |
> >>>> 300Mhz |                 |
> >>>> 200Mhz |                 | disabled
> >>>> 100Mhz |                 | disabled
> >>>> => devfreq driver can use only 300/400Mhz
> >>>
> >>> These are all cases without conflicts, my concern is about this:
> >>>
> >>>>        | devfreq-cooling | throttler
> >>>> ---------------------------------------
> >>>> 500Mhz | disabled        |
> >>>> 400Mhz | disabled        |
> >>>> 300Mhz |                 | disabled
> >>>> 200Mhz |                 | disabled
> >>>> 100Mhz |                 | disabled
> >>>> => devfreq driver can't use any frequency?
> >>
> >> There are no any enabled frequency. Because device driver
> >> (devfreq-cooling, throttler) disable all frequencies.
> >>
> >> Outside drivers(devfreq-cooling, throttler) can enable/disable
> >> specific OPP entries. As I already commented, each outside driver
> >> doesn't consider the policy of other device driver about OPP entries.
> >
> > And wouldn't it be preferable to have an interface that tries to avoid
> > this situation in the first place and has a clear policy for conflict
> > resolution?
> >
> >> OPP interface is independent on devfreq and just control OPP entries.
> >> After that, devfreq just consider the only enabled OPP entries.
> >>
> >>>
> >>> Actually my above comment wasn't about this case, but about the
> >>> added complexity in devfreq-cooling.c and the throttler:
> >>>
> >>> A bit simplified partition_enable_opps() currently does this:
> >>>
> >>>   for_each_opp(opp) {
> >>>     if (opp->freq <= max)
> >>>       opp_enable(opp)
> >>>     else
> >>>       opp_disable(opp)
> >>>   }
> >>>
> >>> With the OPP usage/disable count this doesn't work any longer. Now we
> >>> need to keep track of the enabled/disabled state of the OPP, something
> >>> like:
> >>>
> >>>   dev_pm_opp_enable(opp) {
> >>>     if (opp->freq <= max) {
> >>>       if (opp->freq > prev_max)
> >>>         opp_enable(opp)
> >>>     } else {
> >>>       if (opp->freq < prev_max)
> >>>         opp_disable(opp)
> >>>     }
> >>>   }
> >>>
> >>> And duplicate the same in the throttler (and other possible
> >>> drivers). Obviously it can be done, but is there really any gain
> >>> from it?
> >>>
> >>> Instead they just could do:
> >>>
> >>> devfreq_verify_within_limits(policy/freq_pair, 0, max_freq)
>
> I have a new question about using devfreq_verify_within_limits().
>
> For example,
> Driver A has following opp-table
> - 500Mhz
> - 400Mhz
> - 300Mhz
> - 200Mhz
> - 100Mhz
>
> Basically, driver A has following init value:
> - policy->min is 100
> - policy->max is 500
>
> Driver B, devfreq_verify_within_limits(200, 300)
> policy->min is 200
> policy->max is 300
> Driver C, devfreq_verify_within_limits(300, 400)
> policy->min is 300
> policy->max is 300
> Driver D, devfreq_verify_within_limits(400, 500)
> policy->min is 400
> policy->max is 400
>
> In result, it looks like the requirement of Driver B are disappeared.
> is it the intention of devfreq_verify_within_limits()?
The requirements are conflicting; neither
devfreq_verify_within_limits() nor any other mechanism can completely
resolve that. Any conflict resolution method will violate the
requirements of at least one client. I don't claim the current method
is necessarily the best, it's just one that takes a predictable
action. I'm open to other suggestions.
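
To make the outcome of the example above concrete, here is a minimal
userspace sketch of the clamping. It is modelled on
cpufreq_verify_within_limits(); struct freq_pair and
verify_within_limits() are hypothetical names for illustration only,
and the helper in the series may differ in detail.

```c
/* Hypothetical stand-in for the min/max pair of a devfreq policy. */
struct freq_pair {
	unsigned long min;
	unsigned long max;
};

/*
 * Clamp the pair into [min, max], following the logic of
 * cpufreq_verify_within_limits(). Assumption: the devfreq variant
 * behaves the same way.
 */
static void verify_within_limits(struct freq_pair *p,
				 unsigned long min, unsigned long max)
{
	if (p->min < min)
		p->min = min;
	if (p->max < min)
		p->max = min;	/* conflict: max is raised to the requested min */
	if (p->min > max)
		p->min = max;
	if (p->max > max)
		p->max = max;
	if (p->min > p->max)
		p->min = p->max;
}
```

Starting from min=100/max=500 and applying the calls of drivers B, C
and D in that order yields 200/300, then 300/300, then 400/400, which
matches the values listed in the example. The last caller with a
higher minimum wins, so the result depends on the notifier order.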
> >>> without being concerned about implementation details of devfreq.
> >>>
> >>
> >> I don't think so.
> >
> > What are you referring to, the change that I claim that will be needed
> > in partition_enable_opps() when OPPs have usage/disable counts? If so,
> > how do you avoid that the function doesn't enable/disable an OPP that
> > was already enabled/disabled in the previous iteration?
>
> Just about changes of "dev_pm_opp_enable(opp)".
>
> >
> >> dev_pm_opp_enable()/dev_pm_opp_disable() have to consider only one
> >> OPP entry without any other OPP entry.
> >
> > I agree with this :)
> >
> >> dev_pm_opp_enable()/dev_pm_opp_disable() can never know the other
> >> OPP entries. After some driver(devfreq-cooling.c and throttler)
> >> enable or disable specific OPP entries, the remaining OPP entry
> >> with enabled state will be considered on devfreq driver.
> >
> > Having multiple drivers (or even a single one) enable and disable
> > OPPs independently and at the time of their choosing sounds like a
> > recipe for race conditions.
> >
> > What happens if e.g. the devfreq core calls
> > dev_pm_opp_find_freq_ceil/floor() and right after returning another
> > driver disables the OPP? devfreq uses the disabled OPP. Probably not a
>
> devfreq doesn't use the disabled OPP.
>
> For example,
> 1. devfreq-cooling.c disable/enable some OPP and OPP send notification about OPP changes.
> 2. devfreq receives the notification (devfreq_notifier_call() is executed)
> 3. devfreq_notifier_call() try to find scaling_min_freq/scaling_max_freq
3.5 The throttler runs and disables scaling_max_freq
> 4. devfreq_notifier_call() executes update_devfreq() in order to apply the OPP changes.
> 5. devfreq can consider only enabled frequencies right after dev_pm_opp_disable/enable()
Depending on the locking requirements within the throttler it *might*
be possible to avoid this situation if the throttler held
devfreq->lock while manipulating the OPPs. Not a great option though,
since we'd be dealing with devfreq implementation details in the
throttler.

If I am not mistaken the same can happen nowadays with devfreq_cooling.c:

<thermal governor>
  mutex_lock(tz->lock)
  thermal_cdev_update
    mutex_lock(cdev->lock)
    devfreq_cooling_set_cur_state
      partition_enable_opps

The governor function can run at the same time as
devfreq_notifier_call() / update_devfreq() and unless I overlooked
something there is nothing preventing partition_enable_opps() from
disabling OPPs while the devfreq functions are executing.
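
The interleaving of steps 3, 3.5 and 4 can be replayed
deterministically in a small userspace model. This is only a sketch
under simplifying assumptions: struct opp, find_max_enabled(),
disable_above() and replay_race() are hypothetical names, not kernel
APIs, and the "race" is serialized into a fixed order instead of using
real threads.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified OPP entry; not the kernel's struct dev_pm_opp. */
struct opp {
	unsigned long freq;
	bool enabled;
};

/* Return the highest enabled frequency, or 0 if every OPP is disabled. */
static unsigned long find_max_enabled(const struct opp *t, size_t n)
{
	unsigned long max = 0;

	for (size_t i = 0; i < n; i++)
		if (t[i].enabled && t[i].freq > max)
			max = t[i].freq;
	return max;
}

/* Disable every OPP above 'limit', as a throttling client would. */
static void disable_above(struct opp *t, size_t n, unsigned long limit)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].freq > limit)
			t[i].enabled = false;
}

/*
 * Replay the steps in order: the lookup result is cached (step 3),
 * a second client disables that OPP (step 3.5), and the cached value
 * is what would be applied (step 4). Returns the stale frequency.
 */
static unsigned long replay_race(struct opp *t, size_t n,
				 unsigned long throttle_limit)
{
	unsigned long scaling_max = find_max_enabled(t, n);	/* step 3 */

	disable_above(t, n, throttle_limit);			/* step 3.5 */
	return scaling_max;					/* used in step 4 */
}
```

With 400/500 already disabled by cooling and a throttle limit of 200,
replay_race() returns 300 although a fresh find_max_enabled() would
now report 200. Without a lock held across lookup and use, the cached
value can be stale.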
> > big deal if disabling the OPP is only a semantic question, but I
> > imagine there can be worse scenarios. Currently the only user of
> > dev_pm_opp_disable() besides devfreq_cooling.c is imx6q-cpufreq.c, and
> > it is well behaved and only disables OPPs during probe().
>
> imx6q-cpufreq.c used the dev_pm_opp_disable() before calling
> dev_pm_opp_init_cpufreq_table(). After registered
> cpufreq_register_driver(), imx6q-cpufreq.c doesn't use the
> dev_pm_opp_disable/enable(). It means that dev_pm_opp_disable() of
> imx6q-cpufreq.c doesn't affect the frequency choice of cpufreq on
> the runtime after registered cpufreq driver.
>
> On the other hand, devfreq_cooling.c use dev_pm_opp_disable/enable()
> on the runtime after registering devfreq driver. It affect the
> frequency choice of devfreq on the runtime.
Exactly, that was my point. devfreq_cooling.c is currently the only
driver that uses the interface at runtime. And I have doubts that
proper synchronization is in place.
> > I keep missing a clear answer to the question in which sense
> > manipulating the OPPs in devfreq_cooling.c is superior over narrowing
> > down the frequency during DEVFREQ_ADJUST, which would avoid potential
> > races and allow to resolve conflicts. Does it allow for some
>
> You mentioned the race conditions earlier. Actually, I don't know
> the potential races.
Does my above analysis look correct to you, or did I miss some
mechanism that provides synchronization?
> > functionality that couldn't be achieved otherwise, does it make the
> > code significantly less complex, is some integration with the OPP
> > subsystem needed that I'm overlooking, is it more efficient, ...?
> >
> > I'm not just insisting because I'm stubborn. I'd be happy to use any
> > interface that fits the bill, or to adjust one to fit the bill, but as
> > of now I mainly see drawbacks on the OPPs side and haven't seen
> > convincing arguments that it is really needed in devfreq_cooling.c or a
> > better solution.
>
> During we discussed, we knew that OPP doesn't provide all operation
> perfectly. As of now, OPP is standard framework to control the pair
> of frequency/voltage. devfreq need to use OPP to the pair of
> frequency/voltage. Even if OPP doesn't provide all of our
> requirements, I think that devfreq should use OPP interface after
> updating OPP framework, instead of adding other functionality to
> control frequency in outside driver(devfreq-cooling.c, throttler).
I think it's great if devfreq uses the OPP interface internally, and
disables OPPs *after* having received the requirements from all
outside drivers and resolved potential conflicts.
For narrowing down the frequency range from outside drivers - even
with the extended OPP interface we are discussing - there are multiple
caveats and so far not a single *technical* argument in its favor
(and I'm really in favor of using standard interfaces when suitable
instead of implementing custom solutions, though in this case the
'custom solution' is a simple function call).
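
As a rough illustration of "resolve first, then touch the OPPs", the
following userspace sketch aggregates the maximum requested by each
outside driver and only afterwards derives the enabled state of every
OPP in one place. All names (struct opp, resolve_max(), apply_max())
are hypothetical, and treating a request of 0 as "no limit" is an
assumption made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified OPP entry; not the kernel's struct dev_pm_opp. */
struct opp {
	unsigned long freq;
	bool enabled;
};

/*
 * Resolve the max-frequency requests of all clients: the lowest
 * requested max wins. A request of 0 means "no limit" (assumption).
 */
static unsigned long resolve_max(const unsigned long *req, size_t n,
				 unsigned long hw_max)
{
	unsigned long max = hw_max;

	for (size_t i = 0; i < n; i++)
		if (req[i] && req[i] < max)
			max = req[i];
	return max;
}

/* Single place that enables/disables OPPs after conflicts are resolved. */
static void apply_max(struct opp *t, size_t n, unsigned long max)
{
	for (size_t i = 0; i < n; i++)
		t[i].enabled = t[i].freq <= max;
}
```

With this split the outside drivers only state a limit and never track
which OPPs they enabled or disabled earlier; that bookkeeping stays in
one place.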
> I asked to Viresh (OPP maintainer). If dev_pm_opp_enable() and
> dev_pm_opp_disable() don't support the usecase on multiple device
> drivers, opp interface could not be used in order to support both
> devfreq-cooling.c and throttler. So, We better to wait the opinion
> from OPP maintainer.
Yes, it would be interesting to know Viresh's opinion, maybe I'm
completely wrong and there is an elegant solution using the
OPPs. Hopefully this thread didn't get buried in his inbox, it might
be worth pinging him separately.
Thanks
Matthias