(Sorry for the spam. I am resending the series because I noted that
some of the email addresses were mistyped)
Currently main mechanism to implement scaling using devfreq is
polling and the device profile is free to set polling interval.
However, in many cases this approach is not optimal; We may
unnecessarily use CPU time to determine load on engine that
is idle most of time - or we may simply read load when we
already know that the device is busy.
In some cases a device itself has counters to track its activity
and possibility to raise interrupts when load goes above or below
certain threshold.
This series adds support for watermark events to devfreq and
introduces two example governors. The first patch adds two
callbacks to the device profile (for setting watermark) and
adds a new function call to devfreq that informs of crossed
watermark.
The added governors demonstrate usage of the new API. The first
governor (watermark simple) sets device to trigger low watermark
event when load goes below 10% and high watermark interrupt when
the load goes above 60%. Watermark active tries to keep load at
certain level and it actively sets thresholds based on the
frequency table in order get interrupts only when the load value
would affect to the current frequency in re-estimation.
Arto Merilainen (1):
PM / devfreq: Add watermark active governor
Shridhar Rasal (2):
PM / devfreq: Add watermark events
PM / devfreq: Add watermark simple governor
drivers/devfreq/Kconfig | 18 +++
drivers/devfreq/Makefile | 2 +
drivers/devfreq/devfreq.c | 19 +++
drivers/devfreq/governor.h | 1 +
drivers/devfreq/governor_wmark_active.c | 276 ++++++++++++++++++++++++++++++++
drivers/devfreq/governor_wmark_simple.c | 245 ++++++++++++++++++++++++++++
include/linux/devfreq.h | 26 +++
7 files changed, 587 insertions(+)
create mode 100644 drivers/devfreq/governor_wmark_active.c
create mode 100644 drivers/devfreq/governor_wmark_simple.c
--
1.8.1.5
From: Shridhar Rasal <[email protected]>
This patch adds support for watermark events. These events inform
the governor that the device load has gone below (low watermark) or
above (high watermark) certain load value.
Signed-off-by: Shridhar Rasal <[email protected]>
Signed-off-by: Arto Merilainen <[email protected]>
---
drivers/devfreq/devfreq.c | 19 +++++++++++++++++++
drivers/devfreq/governor.h | 1 +
include/linux/devfreq.h | 26 ++++++++++++++++++++++++++
3 files changed, 46 insertions(+)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 30b538d8cc90..8333d024132a 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -1050,6 +1050,25 @@ static struct attribute *devfreq_attrs[] = {
};
ATTRIBUTE_GROUPS(devfreq);
+/**
+ * devfreq_watermark_event() - Handles watermark events
+ * @devfreq: the devfreq instance to be updated
+ * @type: type of watermark event
+ */
+int devfreq_watermark_event(struct devfreq *devfreq, int type)
+{
+ if (!devfreq)
+ return -EINVAL;
+
+ if (!devfreq->governor)
+ return -EINVAL;
+
+ return devfreq->governor->event_handler(devfreq,
+ DEVFREQ_GOV_WMARK, &type);
+}
+EXPORT_SYMBOL(devfreq_watermark_event);
+
+
static int __init devfreq_init(void)
{
devfreq_class = class_create(THIS_MODULE, "devfreq");
diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h
index fad7d6321978..fb9875388f3f 100644
--- a/drivers/devfreq/governor.h
+++ b/drivers/devfreq/governor.h
@@ -24,6 +24,7 @@
#define DEVFREQ_GOV_INTERVAL 0x3
#define DEVFREQ_GOV_SUSPEND 0x4
#define DEVFREQ_GOV_RESUME 0x5
+#define DEVFREQ_GOV_WMARK 0x6
/* Caution: devfreq->lock must be locked before calling update_devfreq */
extern int update_devfreq(struct devfreq *devfreq);
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index f1863dcd83ea..b5bf6c4fe286 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -21,6 +21,12 @@
struct devfreq;
+enum watermark_type {
+ NO_WATERMARK_EVENT = 0,
+ HIGH_WATERMARK_EVENT = 1,
+ LOW_WATERMARK_EVENT = 2
+};
+
/**
* struct devfreq_dev_status - Data given from devfreq user device to
* governors. Represents the performance
@@ -68,6 +74,16 @@ struct devfreq_dev_status {
* status to devfreq, which is used by governors.
* @get_cur_freq: The device should provide the current frequency
* at which it is operating.
+ * @set_high_wmark: This is an optional callback to set high
+ * watermark for watermark event. The value is
+ * be scaled between 0 and 1000 where 1000 equals to
+ * 100% load. Setting this value to 1000 disables
+ * the event
+ * @set_low_wmark: This is an optional callback to set low
+ * watermark for watermark event. The value is
+ * be scaled between 0 and 1000 where 1000 equals to
+ * 100% load. Setting this value to 0 disables the
+ * event.
* @exit: An optional callback that is called when devfreq
* is removing the devfreq object due to error or
* from devfreq_remove_device() call. If the user
@@ -84,6 +100,8 @@ struct devfreq_dev_profile {
int (*get_dev_status)(struct device *dev,
struct devfreq_dev_status *stat);
int (*get_cur_freq)(struct device *dev, unsigned long *freq);
+ int (*set_high_wmark)(struct device *dev, unsigned int val);
+ int (*set_low_wmark)(struct device *dev, unsigned int val);
void (*exit)(struct device *dev);
unsigned int *freq_table;
@@ -191,6 +209,8 @@ extern void devm_devfreq_remove_device(struct device *dev,
/* Supposed to be called by PM_SLEEP/PM_RUNTIME callbacks */
extern int devfreq_suspend_device(struct devfreq *devfreq);
extern int devfreq_resume_device(struct devfreq *devfreq);
+extern int devfreq_watermark_event(struct devfreq *devfreq,
+ int type);
/* Helper functions for devfreq user device driver with OPP. */
extern struct dev_pm_opp *devfreq_recommended_opp(struct device *dev,
@@ -289,6 +309,12 @@ static inline void devm_devfreq_unregister_opp_notifier(struct device *dev,
struct devfreq *devfreq)
{
}
+
+static inline int devfreq_watermark_event(struct devfreq *devfreq,
+ int type)
+{
+ return 0;
+}
#endif /* CONFIG_PM_DEVFREQ */
#endif /* __LINUX_DEVFREQ_H__ */
--
1.8.1.5
From: Shridhar Rasal <[email protected]>
This patch adds a new devfreq governor, watermark simple. The
governor decides next frequency naively by selecting the previous
frequency in the frequency table if we received low watermark
- or the next frequency in case we received high watermark.
Signed-off-by: Shridhar Rasal <[email protected]>
Signed-off-by: Arto Merilainen <[email protected]>
---
drivers/devfreq/Kconfig | 7 +
drivers/devfreq/Makefile | 1 +
drivers/devfreq/governor_wmark_simple.c | 245 ++++++++++++++++++++++++++++++++
3 files changed, 253 insertions(+)
create mode 100644 drivers/devfreq/governor_wmark_simple.c
diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
index faf4e70c42e0..37710d999ffe 100644
--- a/drivers/devfreq/Kconfig
+++ b/drivers/devfreq/Kconfig
@@ -63,6 +63,13 @@ config DEVFREQ_GOV_USERSPACE
Otherwise, the governor does not change the frequnecy
given at the initialization.
+config DEVFREQ_GOV_WMARK_SIMPLE
+ tristate "Simple Watermark"
+ help
+ Sets the frequency based on monitor watermark events.
+ This governor returns the next frequency from frequency table
+ based on type watermark event.
+
comment "DEVFREQ Drivers"
config ARM_EXYNOS4_BUS_DEVFREQ
diff --git a/drivers/devfreq/Makefile b/drivers/devfreq/Makefile
index 16138c9e0d58..92024eeb73b6 100644
--- a/drivers/devfreq/Makefile
+++ b/drivers/devfreq/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_DEVFREQ_GOV_SIMPLE_ONDEMAND) += governor_simpleondemand.o
obj-$(CONFIG_DEVFREQ_GOV_PERFORMANCE) += governor_performance.o
obj-$(CONFIG_DEVFREQ_GOV_POWERSAVE) += governor_powersave.o
obj-$(CONFIG_DEVFREQ_GOV_USERSPACE) += governor_userspace.o
+obj-$(CONFIG_DEVFREQ_GOV_WMARK_SIMPLE) += governor_wmark_simple.o
# DEVFREQ Drivers
obj-$(CONFIG_ARM_EXYNOS4_BUS_DEVFREQ) += exynos/
diff --git a/drivers/devfreq/governor_wmark_simple.c b/drivers/devfreq/governor_wmark_simple.c
new file mode 100644
index 000000000000..bd14adcc84cb
--- /dev/null
+++ b/drivers/devfreq/governor_wmark_simple.c
@@ -0,0 +1,245 @@
+/*
+ * Copyright (c) 2014, NVIDIA CORPORATION. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/devfreq.h>
+#include <linux/debugfs.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/notifier.h>
+#include <linux/platform_device.h>
+
+#include "governor.h"
+
+struct wmark_gov_info {
+ /* probed from the devfreq */
+ unsigned int *freqlist;
+ int freq_count;
+
+ /* algorithm parameters */
+ unsigned int p_high_wmark;
+ unsigned int p_low_wmark;
+
+ /* dynamically changing data */
+ enum watermark_type event;
+ unsigned int last_request;
+
+ /* common data */
+ struct devfreq *df;
+ struct platform_device *pdev;
+ struct dentry *debugdir;
+};
+
+static unsigned long freqlist_up(struct wmark_gov_info *wmarkinfo,
+ unsigned long curr_freq)
+{
+ int i, pos;
+
+ for (i = 0; i < wmarkinfo->freq_count; i++)
+ if (wmarkinfo->freqlist[i] > curr_freq)
+ break;
+
+ pos = min(wmarkinfo->freq_count - 1, i);
+
+ return wmarkinfo->freqlist[pos];
+}
+
+static unsigned int freqlist_down(struct wmark_gov_info *wmarkinfo,
+ unsigned long curr_freq)
+{
+ int i, pos;
+
+ for (i = wmarkinfo->freq_count - 1; i >= 0; i--)
+ if (wmarkinfo->freqlist[i] < curr_freq)
+ break;
+
+ pos = max(0, i);
+ return wmarkinfo->freqlist[pos];
+}
+
+static int devfreq_watermark_target_freq(struct devfreq *df,
+ unsigned long *freq)
+{
+ struct wmark_gov_info *wmarkinfo = df->data;
+ struct devfreq_dev_status dev_stat;
+ int err;
+
+ err = df->profile->get_dev_status(df->dev.parent, &dev_stat);
+ if (err < 0)
+ return err;
+
+ switch (wmarkinfo->event) {
+ case HIGH_WATERMARK_EVENT:
+ *freq = freqlist_up(wmarkinfo, dev_stat.current_frequency);
+
+ /* always enable low watermark */
+ df->profile->set_low_wmark(df->dev.parent,
+ wmarkinfo->p_low_wmark);
+
+ /* disable high watermark if no change */
+ if (*freq == wmarkinfo->last_request)
+ df->profile->set_high_wmark(df->dev.parent, 1000);
+ break;
+ case LOW_WATERMARK_EVENT:
+ *freq = freqlist_down(wmarkinfo, dev_stat.current_frequency);
+
+ /* always enable high watermark */
+ df->profile->set_high_wmark(df->dev.parent,
+ wmarkinfo->p_high_wmark);
+
+ /* disable low watermark if no change */
+ if (*freq == wmarkinfo->last_request)
+ df->profile->set_low_wmark(df->dev.parent, 0);
+ break;
+ default:
+ break;
+ }
+
+ /* Mark that you handled event */
+ wmarkinfo->event = NO_WATERMARK_EVENT;
+ wmarkinfo->last_request = *freq;
+
+ return 0;
+}
+
+static void devfreq_watermark_debug_start(struct devfreq *df)
+{
+ struct wmark_gov_info *wmarkinfo = df->data;
+ char dirname[128];
+
+ snprintf(dirname, sizeof(dirname), "%s_scaling",
+ to_platform_device(df->dev.parent)->name);
+
+ if (!wmarkinfo)
+ return;
+
+ wmarkinfo->debugdir = debugfs_create_dir(dirname, NULL);
+ if (!wmarkinfo->debugdir)
+ return;
+
+ debugfs_create_u32("low_wmark", S_IRUGO | S_IWUSR,
+ wmarkinfo->debugdir, &wmarkinfo->p_low_wmark);
+ debugfs_create_u32("high_wmark", S_IRUGO | S_IWUSR,
+ wmarkinfo->debugdir, &wmarkinfo->p_high_wmark);
+}
+
+static void devfreq_watermark_debug_stop(struct devfreq *df)
+{
+ struct wmark_gov_info *wmarkinfo = df->data;
+
+ debugfs_remove_recursive(wmarkinfo->debugdir);
+}
+
+static int devfreq_watermark_start(struct devfreq *df)
+{
+ struct wmark_gov_info *wmarkinfo;
+ struct platform_device *pdev = to_platform_device(df->dev.parent);
+
+ if (!df->profile->freq_table) {
+ dev_err(&pdev->dev, "Frequency table missing\n");
+ return -EINVAL;
+ }
+
+ wmarkinfo = kzalloc(sizeof(struct wmark_gov_info), GFP_KERNEL);
+ if (!wmarkinfo)
+ return -ENOMEM;
+
+ df->data = (void *)wmarkinfo;
+ wmarkinfo->freqlist = df->profile->freq_table;
+ wmarkinfo->freq_count = df->profile->max_state;
+ wmarkinfo->event = NO_WATERMARK_EVENT;
+ wmarkinfo->df = df;
+ wmarkinfo->pdev = pdev;
+ wmarkinfo->p_low_wmark = 100;
+ wmarkinfo->p_high_wmark = 600;
+
+ devfreq_watermark_debug_start(df);
+
+ return 0;
+}
+
+static int devfreq_watermark_event_handler(struct devfreq *df,
+ unsigned int event,
+ void *wmark_type)
+{
+ int ret = 0;
+ struct wmark_gov_info *wmarkinfo = df->data;
+ enum watermark_type *type = wmark_type;
+
+ switch (event) {
+ case DEVFREQ_GOV_START:
+ devfreq_watermark_start(df);
+ wmarkinfo = df->data;
+ if (df->profile->set_low_wmark)
+ df->profile->set_low_wmark(df->dev.parent,
+ wmarkinfo->p_low_wmark);
+ if (df->profile->set_high_wmark)
+ df->profile->set_high_wmark(df->dev.parent,
+ wmarkinfo->p_high_wmark);
+ break;
+ case DEVFREQ_GOV_STOP:
+ devfreq_watermark_debug_stop(df);
+ break;
+ case DEVFREQ_GOV_SUSPEND:
+ devfreq_monitor_suspend(df);
+ break;
+
+ case DEVFREQ_GOV_RESUME:
+ if (df->profile->set_low_wmark)
+ df->profile->set_low_wmark(df->dev.parent,
+ wmarkinfo->p_low_wmark);
+ if (df->profile->set_high_wmark)
+ df->profile->set_high_wmark(df->dev.parent,
+ wmarkinfo->p_high_wmark);
+ devfreq_monitor_resume(df);
+ break;
+
+ case DEVFREQ_GOV_WMARK:
+ /* Set watermark interrupt type */
+ wmarkinfo->event = *type;
+
+ mutex_lock(&df->lock);
+ update_devfreq(df);
+ mutex_unlock(&df->lock);
+
+ break;
+
+ default:
+ break;
+ }
+
+ return ret;
+}
+
+static struct devfreq_governor devfreq_watermark = {
+ .name = "wmark_simple",
+ .get_target_freq = devfreq_watermark_target_freq,
+ .event_handler = devfreq_watermark_event_handler,
+};
+
+
+static int __init devfreq_watermark_init(void)
+{
+ return devfreq_add_governor(&devfreq_watermark);
+}
+
+static void __exit devfreq_watermark_exit(void)
+{
+ devfreq_remove_governor(&devfreq_watermark);
+}
+
+rootfs_initcall(devfreq_watermark_init);
+module_exit(devfreq_watermark_exit);
--
1.8.1.5
This patch adds a new watermark governor that actively alters the
watermark values based on the current frequency and target load.
The governor takes target load as a given property (80% by default)
and every time when the governor re-estimation is triggered, the
governor checks the current load and calculates the next frequency
that would push the device as close this target load as possible.
At this point also the watermark values are updated so that next
time the interrupt should come when the load change is significant
enough to cause frequency change to either next or previous
frequency.
Signed-off-by: Arto Merilainen <[email protected]>
---
drivers/devfreq/Kconfig | 11 ++
drivers/devfreq/Makefile | 1 +
drivers/devfreq/governor_wmark_active.c | 276 ++++++++++++++++++++++++++++++++
3 files changed, 288 insertions(+)
create mode 100644 drivers/devfreq/governor_wmark_active.c
diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
index 37710d999ffe..dc5b207e9c3b 100644
--- a/drivers/devfreq/Kconfig
+++ b/drivers/devfreq/Kconfig
@@ -70,6 +70,17 @@ config DEVFREQ_GOV_WMARK_SIMPLE
This governor returns the next frequency from frequency table
based on type watermark event.
+config DEVFREQ_GOV_WMARK_ACTIVE
+ tristate "Active Watermark"
+ help
+ Sets the frequency based on monitor watermark events.
+ This governor calculates relation between current load and
+ target load. The next frequency is calculated by multiplying
+ this relation with the current frequency.
+
+ The watermark values are updated so that the watermarks are
+ triggered when the above algorithm would change the frequency.
+
comment "DEVFREQ Drivers"
config ARM_EXYNOS4_BUS_DEVFREQ
diff --git a/drivers/devfreq/Makefile b/drivers/devfreq/Makefile
index 92024eeb73b6..91e07aef3b29 100644
--- a/drivers/devfreq/Makefile
+++ b/drivers/devfreq/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_DEVFREQ_GOV_PERFORMANCE) += governor_performance.o
obj-$(CONFIG_DEVFREQ_GOV_POWERSAVE) += governor_powersave.o
obj-$(CONFIG_DEVFREQ_GOV_USERSPACE) += governor_userspace.o
obj-$(CONFIG_DEVFREQ_GOV_WMARK_SIMPLE) += governor_wmark_simple.o
+obj-$(CONFIG_DEVFREQ_GOV_WMARK_ACTIVE) += governor_wmark_active.o
# DEVFREQ Drivers
obj-$(CONFIG_ARM_EXYNOS4_BUS_DEVFREQ) += exynos/
diff --git a/drivers/devfreq/governor_wmark_active.c b/drivers/devfreq/governor_wmark_active.c
new file mode 100644
index 000000000000..fe67d915d490
--- /dev/null
+++ b/drivers/devfreq/governor_wmark_active.c
@@ -0,0 +1,276 @@
+/*
+ * Copyright (c) 2014, NVIDIA CORPORATION. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/devfreq.h>
+#include <linux/debugfs.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/notifier.h>
+#include <linux/platform_device.h>
+
+#include "governor.h"
+
+struct wmark_gov_info {
+ /* probed from the devfreq */
+ unsigned int *freqlist;
+ int freq_count;
+
+ /* algorithm parameters */
+ unsigned int p_load_target;
+ unsigned int p_load_max;
+
+ /* common data */
+ struct devfreq *df;
+ struct platform_device *pdev;
+ struct dentry *debugdir;
+};
+
+static unsigned int freqlist_up(struct wmark_gov_info *wmarkinfo,
+ unsigned int curr_freq)
+{
+ int i, pos;
+
+ for (i = 0; i < wmarkinfo->freq_count; i++)
+ if (wmarkinfo->freqlist[i] > curr_freq)
+ break;
+
+ pos = min(wmarkinfo->freq_count - 1, i);
+
+ return wmarkinfo->freqlist[pos];
+}
+
+static unsigned int freqlist_down(struct wmark_gov_info *wmarkinfo,
+ unsigned int curr_freq)
+{
+ int i, pos;
+
+ for (i = wmarkinfo->freq_count - 1; i >= 0; i--)
+ if (wmarkinfo->freqlist[i] < curr_freq)
+ break;
+
+ pos = max(0, i);
+ return wmarkinfo->freqlist[pos];
+}
+
+static unsigned int freqlist_round(struct wmark_gov_info *wmarkinfo,
+ unsigned int freq)
+{
+ int i, pos;
+
+ for (i = 0; i < wmarkinfo->freq_count; i++)
+ if (wmarkinfo->freqlist[i] >= freq)
+ break;
+
+ pos = min(wmarkinfo->freq_count - 1, i);
+ return wmarkinfo->freqlist[pos];
+}
+
+static void update_watermarks(struct devfreq *df,
+ unsigned int current_frequency)
+{
+ struct wmark_gov_info *wmarkinfo = df->data;
+ unsigned int relation = 0, next_freq = 0;
+ unsigned int current_frequency_khz = current_frequency / 1000;
+
+ if (current_frequency == wmarkinfo->freqlist[0]) {
+ /* disable the low watermark if we are at lowest clock */
+ df->profile->set_low_wmark(df->dev.parent, 0);
+ } else {
+ /* calculate the low threshold; what is the load value
+ * at which we would go into lower frequency given the
+ * that we are running at the new frequency? */
+ next_freq = freqlist_down(wmarkinfo, current_frequency);
+ relation = ((next_freq / current_frequency_khz) *
+ wmarkinfo->p_load_target) / 1000;
+ df->profile->set_low_wmark(df->dev.parent, relation);
+ }
+
+ if (current_frequency ==
+ wmarkinfo->freqlist[wmarkinfo->freq_count - 1]) {
+ /* disable the high watermark if we are at highest clock */
+ df->profile->set_high_wmark(df->dev.parent, 1000);
+ } else {
+ /* calculate the high threshold; what is the load value
+ * at which we would go into highest frequency given the
+ * that we are running at the new frequency? */
+ next_freq = freqlist_up(wmarkinfo, current_frequency);
+ relation = ((next_freq / current_frequency_khz) *
+ wmarkinfo->p_load_target) / 1000;
+ relation = min(wmarkinfo->p_load_max, relation);
+ df->profile->set_high_wmark(df->dev.parent, relation);
+ }
+
+}
+
+static int devfreq_watermark_target_freq(struct devfreq *df,
+ unsigned long *freq)
+{
+ struct wmark_gov_info *wmarkinfo = df->data;
+ struct devfreq_dev_status dev_stat;
+ unsigned int load, relation, next_freq;
+ int err;
+
+ err = df->profile->get_dev_status(df->dev.parent, &dev_stat);
+ if (err < 0)
+ return err;
+
+ /* keep current frequency if we do not have proper data available */
+ if (!dev_stat.total_time) {
+ *freq = dev_stat.current_frequency;
+ return 0;
+ }
+
+ /* calculate first load and relation load/p_load_target */
+ load = (dev_stat.busy_time * 1000) / dev_stat.total_time;
+
+ /* if we cross load max... */
+ if (load >= wmarkinfo->p_load_max) {
+ /* we go directly to the highest frequency. depending
+ * on frequency table we might never go higher than
+ * the current frequency (i.e. load should be over 100%
+ * to make relation push to the next frequency). */
+ *freq = wmarkinfo->freqlist[wmarkinfo->freq_count - 1];
+ } else {
+ /* otherwise, based on relation between current load and
+ * load target we calculate the "ideal" frequency
+ * where we would be just at the target */
+ relation = (load * 1000) / wmarkinfo->p_load_target;
+ next_freq = relation * (dev_stat.current_frequency / 1000);
+
+ /* round this frequency */
+ *freq = freqlist_round(wmarkinfo, next_freq);
+ }
+
+ /* update watermarks to match with the new frequency */
+ update_watermarks(df, *freq);
+
+ return 0;
+}
+
+static void devfreq_watermark_debug_start(struct devfreq *df)
+{
+ struct wmark_gov_info *wmarkinfo = df->data;
+ char dirname[128];
+
+ snprintf(dirname, sizeof(dirname), "%s_scaling",
+ to_platform_device(df->dev.parent)->name);
+
+ if (!wmarkinfo)
+ return;
+
+ wmarkinfo->debugdir = debugfs_create_dir(dirname, NULL);
+ if (!wmarkinfo->debugdir)
+ return;
+
+ debugfs_create_u32("load_target", S_IRUGO | S_IWUSR,
+ wmarkinfo->debugdir, &wmarkinfo->p_load_target);
+ debugfs_create_u32("load_max", S_IRUGO | S_IWUSR,
+ wmarkinfo->debugdir, &wmarkinfo->p_load_max);
+}
+
+static void devfreq_watermark_debug_stop(struct devfreq *df)
+{
+ struct wmark_gov_info *wmarkinfo = df->data;
+
+ debugfs_remove_recursive(wmarkinfo->debugdir);
+}
+
+static int devfreq_watermark_start(struct devfreq *df)
+{
+ struct wmark_gov_info *wmarkinfo;
+ struct platform_device *pdev = to_platform_device(df->dev.parent);
+
+ if (!df->profile->freq_table) {
+ dev_err(&pdev->dev, "Frequency table missing\n");
+ return -EINVAL;
+ }
+
+ wmarkinfo = kzalloc(sizeof(struct wmark_gov_info), GFP_KERNEL);
+ if (!wmarkinfo)
+ return -ENOMEM;
+
+ df->data = (void *)wmarkinfo;
+ wmarkinfo->freqlist = df->profile->freq_table;
+ wmarkinfo->freq_count = df->profile->max_state;
+ wmarkinfo->p_load_target = 700;
+ wmarkinfo->p_load_max = 900;
+ wmarkinfo->df = df;
+ wmarkinfo->pdev = pdev;
+
+ devfreq_watermark_debug_start(df);
+
+ return 0;
+}
+
+static int devfreq_watermark_event_handler(struct devfreq *df,
+ unsigned int event,
+ void *wmark_type)
+{
+ int ret = 0;
+ struct wmark_gov_info *wmarkinfo = df->data;
+
+ switch (event) {
+ case DEVFREQ_GOV_START:
+ devfreq_watermark_start(df);
+ wmarkinfo = df->data;
+ update_watermarks(df, wmarkinfo->freqlist[0]);
+ break;
+ case DEVFREQ_GOV_STOP:
+ devfreq_watermark_debug_stop(df);
+ break;
+ case DEVFREQ_GOV_SUSPEND:
+ devfreq_monitor_suspend(df);
+ break;
+
+ case DEVFREQ_GOV_RESUME:
+ wmarkinfo = df->data;
+ update_watermarks(df, wmarkinfo->freqlist[0]);
+ devfreq_monitor_resume(df);
+ break;
+
+ case DEVFREQ_GOV_WMARK:
+ mutex_lock(&df->lock);
+ update_devfreq(df);
+ mutex_unlock(&df->lock);
+ break;
+
+ default:
+ break;
+ }
+
+ return ret;
+}
+
+static struct devfreq_governor devfreq_watermark_active = {
+ .name = "wmark_active",
+ .get_target_freq = devfreq_watermark_target_freq,
+ .event_handler = devfreq_watermark_event_handler,
+};
+
+
+static int __init devfreq_watermark_init(void)
+{
+ return devfreq_add_governor(&devfreq_watermark_active);
+}
+
+static void __exit devfreq_watermark_exit(void)
+{
+ devfreq_remove_governor(&devfreq_watermark_active);
+}
+
+rootfs_initcall(devfreq_watermark_init);
+module_exit(devfreq_watermark_exit);
--
1.8.1.5
> (Sorry for the spam. I am resending the series because I noted that
> some of the email addresses were mistyped)
>
> Currently main mechanism to implement scaling using devfreq is
> polling and the device profile is free to set polling interval.
> However, in many cases this approach is not optimal; We may
> unnecessarily use CPU time to determine load on engine that
> is idle most of time - or we may simply read load when we
> already know that the device is busy.
>
> In some cases a device itself has counters to track its activity
> and possibility to raise interrupts when load goes above or below
> certain threshold.
>
> This series adds support for watermark events to devfreq and
> introduces two example governors. The first patch adds two
> callbacks to the device profile (for setting watermark) and
> adds a new function call to devfreq that informs of crossed
> watermark.
>
> The added governors demonstrate usage of the new API. The first
> governor (watermark simple) sets device to trigger low watermark
> event when load goes below 10% and high watermark interrupt when
> the load goes above 60%. Watermark active tries to keep load at
> certain level and it actively sets thresholds based on the
> frequency table in order get interrupts only when the load value
> would affect to the current frequency in re-estimation.
Hi Arto,
Please let me start with somewhat naive high-level question:
What do you mean by watermark in this context?
Is it a product name of yours (interrupt-based PMU, I presume)?
Or does watermark have another semantics that I am not aware of?
Or do you really mean something like
http://en.wikipedia.org/wiki/Digital_watermarking
Other itching points include:
- devfreq_watermark_event() was declared but has never been used.
Who is supposed to call this function?
- Is enum watermark_type supposed to be used out of /drivers/devfreq/* ?
Otherwise, please move it inside /drivers/devfreq/governor.h
(I guess it is to be used inside corresponding governors only)
- Could you please watermark-specific interfaces (set_high/low_wmark) into
its own public header file? (e.g., /include/linux/devfreq/governor_wmark.h)
I think we can create another (governor_simpleondemand.h) in there as well
in order to have threshold values configured by device drivers.
Adding governor-specific configurations into devfreq.h seems not
appropriate especially when we expect that we may need many different
governors.
OR..
This seems that you can keep set_h/l_wmark functions exposed to
drivers/devfreq/* only. Therefore, having /drivers/devfreq/governor_wmark.h
should be sufficient as well, which should be more neat than the above.
The callbacks are to be defined by the devfreq drivers, aren't they?
- The event name, "DEVFREQ_GOV_WMARK", defined in governor.h, seems to be
more appropriate if it is named "DEVFREQ_GOV_INTERNAL" as we won't need
to define event names for each govornor's internal needs.
OR.. for more generality,
we may define a macro like:
#define DEVFREQ_GOV_INTERNAL(value) ((0x1 << 31) | (value))
#define GET_DEVFREQ_GOV_INTERNAL(event) ((event) & ~(0x1 << 31))
- In general, I would love to see governors with minimal intervention on
the framework core/main code, especially when it is not beneficial to
other governors. Unlike cpufreq, we may contain many different types of
devices in devfreq, which has the potential to accompany many different
governors.
Cheers,
MyungJoo.
>
> Arto Merilainen (1):
> PM / devfreq: Add watermark active governor
>
> Shridhar Rasal (2):
> PM / devfreq: Add watermark events
> PM / devfreq: Add watermark simple governor
>
> drivers/devfreq/Kconfig | 18 +++
> drivers/devfreq/Makefile | 2 +
> drivers/devfreq/devfreq.c | 19 +++
> drivers/devfreq/governor.h | 1 +
> drivers/devfreq/governor_wmark_active.c | 276 ++++++++++++++++++++++++++++++++
> drivers/devfreq/governor_wmark_simple.c | 245 ++++++++++++++++++++++++++++
> include/linux/devfreq.h | 26 +++
> 7 files changed, 587 insertions(+)
> create mode 100644 drivers/devfreq/governor_wmark_active.c
> create mode 100644 drivers/devfreq/governor_wmark_simple.c
>
> --
> 1.8.1.5
>
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?
On 8 December 2014 at 08:44, MyungJoo Ham <[email protected]> wrote:
>> (Sorry for the spam. I am resending the series because I noted that
>> some of the email addresses were mistyped)
>>
>> Currently main mechanism to implement scaling using devfreq is
>> polling and the device profile is free to set polling interval.
>> However, in many cases this approach is not optimal; We may
>> unnecessarily use CPU time to determine load on engine that
>> is idle most of time - or we may simply read load when we
>> already know that the device is busy.
>>
>> In some cases a device itself has counters to track its activity
>> and possibility to raise interrupts when load goes above or below
>> certain threshold.
>>
>> This series adds support for watermark events to devfreq and
>> introduces two example governors. The first patch adds two
>> callbacks to the device profile (for setting watermark) and
>> adds a new function call to devfreq that informs of crossed
>> watermark.
>>
>> The added governors demonstrate usage of the new API. The first
>> governor (watermark simple) sets device to trigger low watermark
>> event when load goes below 10% and high watermark interrupt when
>> the load goes above 60%. Watermark active tries to keep load at
>> certain level and it actively sets thresholds based on the
>> frequency table in order get interrupts only when the load value
>> would affect to the current frequency in re-estimation.
>
> Hi Arto,
>
>
> Please let me start with somewhat naive high-level question:
> What do you mean by watermark in this context?
> Is it a product name of yours (interrupt-based PMU, I presume)?
> Or does watermark have another semantics that I am not aware of?
> Or do you really mean something like
> http://en.wikipedia.org/wiki/Digital_watermarking
The watermark term is borrowed here from queues. The high and low
watermarks define the boundaries within the load has to be contained.
If the load is above or below the watermarks, a more suitable
frequency has to be found.
Regards,
Tomeu
> Other itching points include:
> - devfreq_watermark_event() was declared but has never been used.
> Who is supposed to call this function?
> - Is enum watermark_type supposed to be used out of /drivers/devfreq/* ?
> Otherwise, please move it inside /drivers/devfreq/governor.h
> (I guess it is to be used inside corresponding governors only)
> - Could you please watermark-specific interfaces (set_high/low_wmark) into
> its own public header file? (e.g., /include/linux/devfreq/governor_wmark.h)
> I think we can create another (governor_simpleondemand.h) in there as well
> in order to have threshold values configured by device drivers.
> Adding governor-specific configurations into devfreq.h seems not
> appropriate especially when we expect that we may need many different
> governors.
>
> OR..
>
> This seems that you can keep set_h/l_wmark functions exposed to
> drivers/devfreq/* only. Therefore, having /drivers/devfreq/governor_wmark.h
> should be sufficient as well, which should be more neat than the above.
> The callbacks are to be defined by the devfreq drivers, aren't they?
>
> - The event name, "DEVFREQ_GOV_WMARK", defined in governor.h, seems to be
> more appropriate if it is named "DEVFREQ_GOV_INTERNAL" as we won't need
> to define event names for each govornor's internal needs.
>
> OR.. for more generality,
> we may define a macro like:
>
> #define DEVFREQ_GOV_INTERNAL(value) ((0x1 << 31) | (value))
> #define GET_DEVFREQ_GOV_INTERNAL(event) ((event) & ~(0x1 << 31))
>
> - In general, I would love to see governors with minimal intervention on
> the framework core/main code, especially when it is not beneficial to
> other governors. Unlike cpufreq, we may contain many different types of
> devices in devfreq, which has the potential to accompany many different
> governors.
>
>
> Cheers,
> MyungJoo.
>
>>
>> Arto Merilainen (1):
>> PM / devfreq: Add watermark active governor
>>
>> Shridhar Rasal (2):
>> PM / devfreq: Add watermark events
>> PM / devfreq: Add watermark simple governor
>>
>> drivers/devfreq/Kconfig | 18 +++
>> drivers/devfreq/Makefile | 2 +
>> drivers/devfreq/devfreq.c | 19 +++
>> drivers/devfreq/governor.h | 1 +
>> drivers/devfreq/governor_wmark_active.c | 276 ++++++++++++++++++++++++++++++++
>> drivers/devfreq/governor_wmark_simple.c | 245 ++++++++++++++++++++++++++++
>> include/linux/devfreq.h | 26 +++
>> 7 files changed, 587 insertions(+)
>> create mode 100644 drivers/devfreq/governor_wmark_active.c
>> create mode 100644 drivers/devfreq/governor_wmark_simple.c
>>
>> --
>> 1.8.1.5
>>
Hi MyungJoo,
Thank for your answer, see my answers inline.
On 12/08/2014 09:44 AM, MyungJoo Ham wrote:
> Please let me start with somewhat naive high-level question:
> What do you mean by watermark in this context?
Sorry for poor choice of naming. "Load threshold interrupt" might be
more appropriate. The idea of the series here is to add an ability to
program hardware to give interrupts when the current load value goes
above (or below) certain threshold(s). I will revisit the naming.
> Other itching points include:
> - devfreq_watermark_event() was declared but has never been used.
> Who is supposed to call this function?
The device profile is responsible to call this function from an
interrupt handler. I will document this.
> - Is enum watermark_type supposed to be used out of /drivers/devfreq/* ?
Yes, the idea is to use the enumerations for informing the type of the
event. I.e. from interrupt handler we might call:
devfreq_watermark_event(devfreq, ACTMON_INTR_BELOW_WMARK);
> - Could you please watermark-specific interfaces (set_high/low_wmark) into
> its own public header file? (e.g., /include/linux/devfreq/governor_wmark.h)
> I think we can create another (governor_simpleondemand.h) in there as well
> in order to have threshold values configured by device drivers.
> Adding governor-specific configurations into devfreq.h seems not
> appropriate especially when we expect that we may need many different
> governors.
I am not convinced that creating two separate interfaces is better than
extending a single centralised interface. Let us first consider the
compatibility...:
First, in trivial case the device driver does not implement the
threshold feature. Obviously we cannot use threshold based governor for
scaling but all existing polling based governors will work as earlier.
Second, let us assume that we have a device that supports threshold
interrupts and the driver has implemented the required set_h/l_wmark
interrupts and calls devfreq_watermark_event() when we get an interrupt.
This device can therefore use wmark_simple and wmark_active governors.
In addition to the wmark governors, we may choose to use a polling based
governor. With current approach the polling governor simply ignores the
events and leaves the thresholds "as is".
In short, I would not be concerned on compatibility. From governor side
it should be ok to ignore an event. In addition, it is ok from governor
initialization to fail in case the device driver does not implement all
required features.
I am also afraid that initialisation makes two-communication-paths
approach tricky. In code you likely could have..:
devfreq_add_device(dev, &pdata->devfreq, "wmark_simple", &extra_fns);
.. but what happens if we wish to change the governor on the fly through
sysfs? In worst case this is targeting to a governor that requires its
own data field in initialisation.
>
> OR..
>
> This seems that you can keep set_h/l_wmark functions exposed to
> drivers/devfreq/* only. Therefore, having /drivers/devfreq/governor_wmark.h
> should be sufficient as well, which should be more neat than the above.
> The callbacks are to be defined by the devfreq drivers, aren't they?
set_h/l_wmark callbacks should be implemented by the device profiles
(the device drivers themselves) for setting the thresholds on hardware.
These are not devfreq/governor private functions but something that
should be implemented inside the device driver.
>
> - The event name, "DEVFREQ_GOV_WMARK", defined in governor.h, seems to be
> more appropriate if it is named "DEVFREQ_GOV_INTERNAL" as we won't need
> to define event names for each govornor's internal needs.
>
> OR.. for more generality,
> we may define a macro like:
>
> #define DEVFREQ_GOV_INTERNAL(value) ((0x1 << 31) | (value))
> #define GET_DEVFREQ_GOV_INTERNAL(event) ((event) & ~(0x1 << 31))
This should work.
>
> - In general, I would love to see governors with minimal intervention on
> the framework core/main code, especially when it is not beneficial to
> other governors. Unlike cpufreq, we may contain many different types of
> devices in devfreq, which has the potential to accompany many different
> governors.
>
I would like to see is that we could easily take advantage of hardware
features and not let framework limit us too much. :-)
For example, Tomeu has been working on ACTMON driver that measures
memory transfers (please refer to "Add support for Tegra Activity
Monitor" series). This hardware block is capable of generating
interrupts based on activity and hence by factoring the series properly,
we could use the same governor from the series in different places
easily. Without proper interfacing everyone ends up duplicating the code.
- Arto
>
> Cheers,
> MyungJoo.
>
>>
>> Arto Merilainen (1):
>> PM / devfreq: Add watermark active governor
>>
>> Shridhar Rasal (2):
>> PM / devfreq: Add watermark events
>> PM / devfreq: Add watermark simple governor
>>
>> drivers/devfreq/Kconfig | 18 +++
>> drivers/devfreq/Makefile | 2 +
>> drivers/devfreq/devfreq.c | 19 +++
>> drivers/devfreq/governor.h | 1 +
>> drivers/devfreq/governor_wmark_active.c | 276 ++++++++++++++++++++++++++++++++
>> drivers/devfreq/governor_wmark_simple.c | 245 ++++++++++++++++++++++++++++
>> include/linux/devfreq.h | 26 +++
>> 7 files changed, 587 insertions(+)
>> create mode 100644 drivers/devfreq/governor_wmark_active.c
>> create mode 100644 drivers/devfreq/governor_wmark_simple.c
>>
>> --
>> 1.8.1.5
>>
On Fri, Dec 5, 2014 at 10:41 PM, Arto Merilainen <[email protected]> wrote:
> From: Shridhar Rasal <[email protected]>
>
> This patch adds support for watermark events. These events inform
> the governor that the device load has gone below (low watermark) or
> above (high watermark) certain load value.
This is definitely a useful feature to have, as it reflects what
efficient hardware does (raise an interrupt when load goes above/under
a certain threshold). In particular Tomeu's ACTMON driver would
probably greatly benefit from it.
A few comments below:
>
> Signed-off-by: Shridhar Rasal <[email protected]>
> Signed-off-by: Arto Merilainen <[email protected]>
> ---
> drivers/devfreq/devfreq.c | 19 +++++++++++++++++++
> drivers/devfreq/governor.h | 1 +
> include/linux/devfreq.h | 26 ++++++++++++++++++++++++++
> 3 files changed, 46 insertions(+)
>
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> index 30b538d8cc90..8333d024132a 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -1050,6 +1050,25 @@ static struct attribute *devfreq_attrs[] = {
> };
> ATTRIBUTE_GROUPS(devfreq);
>
> +/**
> + * devfreq_watermark_event() - Handles watermark events
> + * @devfreq: the devfreq instance to be updated
> + * @type: type of watermark event
> + */
> +int devfreq_watermark_event(struct devfreq *devfreq, int type)
Here I suppose "type" is to be an enum watermark_type - therefore it
should probably be changed to that type instead of int.
> +{
> + if (!devfreq)
> + return -EINVAL;
> +
> + if (!devfreq->governor)
> + return -EINVAL;
Let's fold these two conditions into "if (!devfreq || !devfreq->governor)"
> +
> + return devfreq->governor->event_handler(devfreq,
> + DEVFREQ_GOV_WMARK, &type);
> +}
> +EXPORT_SYMBOL(devfreq_watermark_event);
> +
> +
Extra whiteline here.
> static int __init devfreq_init(void)
> {
> devfreq_class = class_create(THIS_MODULE, "devfreq");
> diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h
> index fad7d6321978..fb9875388f3f 100644
> --- a/drivers/devfreq/governor.h
> +++ b/drivers/devfreq/governor.h
> @@ -24,6 +24,7 @@
> #define DEVFREQ_GOV_INTERVAL 0x3
> #define DEVFREQ_GOV_SUSPEND 0x4
> #define DEVFREQ_GOV_RESUME 0x5
> +#define DEVFREQ_GOV_WMARK 0x6
>
> /* Caution: devfreq->lock must be locked before calling update_devfreq */
> extern int update_devfreq(struct devfreq *devfreq);
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index f1863dcd83ea..b5bf6c4fe286 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -21,6 +21,12 @@
>
> struct devfreq;
>
> +enum watermark_type {
Change the name to devfreq_watermark_event maybe? We know it's a type
already from the context, and we should use the devfreq_ prefix to
avoid name collisions.
> + NO_WATERMARK_EVENT = 0,
> + HIGH_WATERMARK_EVENT = 1,
> + LOW_WATERMARK_EVENT = 2
... and if the type is renamed to watermark_event, these could become
DEVFREQ_NO_WATERMARK, DEVFREQ_HIGH_WATERMARK, and
DEVFREQ_LOW_WATERMARK.
> +};
> +
> /**
> * struct devfreq_dev_status - Data given from devfreq user device to
> * governors. Represents the performance
> @@ -68,6 +74,16 @@ struct devfreq_dev_status {
> * status to devfreq, which is used by governors.
> * @get_cur_freq: The device should provide the current frequency
> * at which it is operating.
> + * @set_high_wmark: This is an optional callback to set high
> + * watermark for watermark event. The value is
> + * be scaled between 0 and 1000 where 1000 equals to
> + * 100% load. Setting this value to 1000 disables
> + * the event
> + * @set_low_wmark: This is an optional callback to set low
> + * watermark for watermark event. The value is
> + * be scaled between 0 and 1000 where 1000 equals to
> + * 100% load. Setting this value to 0 disables the
> + * event.
>From the comment it is not clear whether the 0..1000 range corresponds
to the load between two frequencies in the frequencies table, or
covers the whole frequency table. I understand it is the former, but
it would be nice to explicitly state it.
On Fri, Dec 5, 2014 at 10:41 PM, Arto Merilainen <[email protected]> wrote:
> From: Shridhar Rasal <[email protected]>
>
> This patch adds a new devfreq governor, watermark simple. The
> governor decides next frequency naively by selecting the previous
> frequency in the frequency table if we received low watermark
> - or the next frequency in case we received high watermark.
>
> Signed-off-by: Shridhar Rasal <[email protected]>
> Signed-off-by: Arto Merilainen <[email protected]>
> ---
> drivers/devfreq/Kconfig | 7 +
> drivers/devfreq/Makefile | 1 +
> drivers/devfreq/governor_wmark_simple.c | 245 ++++++++++++++++++++++++++++++++
> 3 files changed, 253 insertions(+)
> create mode 100644 drivers/devfreq/governor_wmark_simple.c
>
> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
> index faf4e70c42e0..37710d999ffe 100644
> --- a/drivers/devfreq/Kconfig
> +++ b/drivers/devfreq/Kconfig
> @@ -63,6 +63,13 @@ config DEVFREQ_GOV_USERSPACE
> Otherwise, the governor does not change the frequnecy
> given at the initialization.
>
> +config DEVFREQ_GOV_WMARK_SIMPLE
> + tristate "Simple Watermark"
> + help
> + Sets the frequency based on monitor watermark events.
> + This governor returns the next frequency from frequency table
> + based on type watermark event.
> +
> comment "DEVFREQ Drivers"
>
> config ARM_EXYNOS4_BUS_DEVFREQ
> diff --git a/drivers/devfreq/Makefile b/drivers/devfreq/Makefile
> index 16138c9e0d58..92024eeb73b6 100644
> --- a/drivers/devfreq/Makefile
> +++ b/drivers/devfreq/Makefile
> @@ -3,6 +3,7 @@ obj-$(CONFIG_DEVFREQ_GOV_SIMPLE_ONDEMAND) += governor_simpleondemand.o
> obj-$(CONFIG_DEVFREQ_GOV_PERFORMANCE) += governor_performance.o
> obj-$(CONFIG_DEVFREQ_GOV_POWERSAVE) += governor_powersave.o
> obj-$(CONFIG_DEVFREQ_GOV_USERSPACE) += governor_userspace.o
> +obj-$(CONFIG_DEVFREQ_GOV_WMARK_SIMPLE) += governor_wmark_simple.o
>
> # DEVFREQ Drivers
> obj-$(CONFIG_ARM_EXYNOS4_BUS_DEVFREQ) += exynos/
> diff --git a/drivers/devfreq/governor_wmark_simple.c b/drivers/devfreq/governor_wmark_simple.c
> new file mode 100644
> index 000000000000..bd14adcc84cb
> --- /dev/null
> +++ b/drivers/devfreq/governor_wmark_simple.c
> @@ -0,0 +1,245 @@
> +/*
> + * Copyright (c) 2014, NVIDIA CORPORATION. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/devfreq.h>
> +#include <linux/debugfs.h>
> +#include <linux/types.h>
> +#include <linux/slab.h>
> +#include <linux/device.h>
> +#include <linux/notifier.h>
> +#include <linux/platform_device.h>
> +
> +#include "governor.h"
> +
> +struct wmark_gov_info {
> + /* probed from the devfreq */
> + unsigned int *freqlist;
> + int freq_count;
> +
> + /* algorithm parameters */
> + unsigned int p_high_wmark;
> + unsigned int p_low_wmark;
> +
> + /* dynamically changing data */
> + enum watermark_type event;
I would rename this to last_event for clarity.
> + unsigned int last_request;
last_freq might be a more appropriate name here.
> +
> + /* common data */
> + struct devfreq *df;
Unless I missed something this member is only set once in
devfreq_watermark_start() and never touched afterwards.
> + struct platform_device *pdev;
Same with this one.
> + struct dentry *debugdir;
> +};
> +
> +static unsigned long freqlist_up(struct wmark_gov_info *wmarkinfo,
> + unsigned long curr_freq)
> +{
> + int i, pos;
> +
> + for (i = 0; i < wmarkinfo->freq_count; i++)
> + if (wmarkinfo->freqlist[i] > curr_freq)
> + break;
> +
> + pos = min(wmarkinfo->freq_count - 1, i);
> +
> + return wmarkinfo->freqlist[pos];
> +}
> +
> +static unsigned int freqlist_down(struct wmark_gov_info *wmarkinfo,
> + unsigned long curr_freq)
> +{
> + int i, pos;
> +
> + for (i = wmarkinfo->freq_count - 1; i >= 0; i--)
> + if (wmarkinfo->freqlist[i] < curr_freq)
> + break;
> +
> + pos = max(0, i);
> + return wmarkinfo->freqlist[pos];
> +}
> +
> +static int devfreq_watermark_target_freq(struct devfreq *df,
> + unsigned long *freq)
> +{
> + struct wmark_gov_info *wmarkinfo = df->data;
> + struct devfreq_dev_status dev_stat;
> + int err;
> +
> + err = df->profile->get_dev_status(df->dev.parent, &dev_stat);
> + if (err < 0)
> + return err;
> +
> + switch (wmarkinfo->event) {
> + case HIGH_WATERMARK_EVENT:
> + *freq = freqlist_up(wmarkinfo, dev_stat.current_frequency);
> +
> + /* always enable low watermark */
> + df->profile->set_low_wmark(df->dev.parent,
> + wmarkinfo->p_low_wmark);
> +
> + /* disable high watermark if no change */
> + if (*freq == wmarkinfo->last_request)
> + df->profile->set_high_wmark(df->dev.parent, 1000);
> + break;
> + case LOW_WATERMARK_EVENT:
> + *freq = freqlist_down(wmarkinfo, dev_stat.current_frequency);
> +
> + /* always enable high watermark */
> + df->profile->set_high_wmark(df->dev.parent,
> + wmarkinfo->p_high_wmark);
> +
> + /* disable low watermark if no change */
> + if (*freq == wmarkinfo->last_request)
> + df->profile->set_low_wmark(df->dev.parent, 0);
> + break;
> + default:
> + break;
> + }
> +
> + /* Mark that you handled event */
s/you/we
> + wmarkinfo->event = NO_WATERMARK_EVENT;
> + wmarkinfo->last_request = *freq;
> +
> + return 0;
> +}
> +
> +static void devfreq_watermark_debug_start(struct devfreq *df)
> +{
> + struct wmark_gov_info *wmarkinfo = df->data;
> + char dirname[128];
> +
> + snprintf(dirname, sizeof(dirname), "%s_scaling",
> + to_platform_device(df->dev.parent)->name);
> +
> + if (!wmarkinfo)
> + return;
> +
> + wmarkinfo->debugdir = debugfs_create_dir(dirname, NULL);
> + if (!wmarkinfo->debugdir)
> + return;
> +
> + debugfs_create_u32("low_wmark", S_IRUGO | S_IWUSR,
> + wmarkinfo->debugdir, &wmarkinfo->p_low_wmark);
> + debugfs_create_u32("high_wmark", S_IRUGO | S_IWUSR,
> + wmarkinfo->debugdir, &wmarkinfo->p_high_wmark);
Shouldn't we call set_low_wmark() and set_high_wmark() when these
values are changed so the hardware reacts to the new values?
> +}
> +
> +static void devfreq_watermark_debug_stop(struct devfreq *df)
> +{
> + struct wmark_gov_info *wmarkinfo = df->data;
> +
> + debugfs_remove_recursive(wmarkinfo->debugdir);
> +}
> +
> +static int devfreq_watermark_start(struct devfreq *df)
> +{
> + struct wmark_gov_info *wmarkinfo;
> + struct platform_device *pdev = to_platform_device(df->dev.parent);
> +
> + if (!df->profile->freq_table) {
> + dev_err(&pdev->dev, "Frequency table missing\n");
> + return -EINVAL;
This error is not propagated by devfreq_watermark_event_handler(). It
definitely should be.
Also you will want to fail if d->profile->max_state == 0, otherwise
funny things will happen when a watermark is reached.
> + }
> +
> + wmarkinfo = kzalloc(sizeof(struct wmark_gov_info), GFP_KERNEL);
This memory is never freed.
> + if (!wmarkinfo)
> + return -ENOMEM;
> +
> + df->data = (void *)wmarkinfo;
> + wmarkinfo->freqlist = df->profile->freq_table;
> + wmarkinfo->freq_count = df->profile->max_state;
Contrary to what its comment says, freq_table is not optional anymore
and only used for statistics if one intents to use a watermark
governor. It should probably be updated to reflect this.
> + wmarkinfo->event = NO_WATERMARK_EVENT;
> + wmarkinfo->df = df;
> + wmarkinfo->pdev = pdev;
> + wmarkinfo->p_low_wmark = 100;
> + wmarkinfo->p_high_wmark = 600;
> +
> + devfreq_watermark_debug_start(df);
> +
> + return 0;
> +}
> +
> +static int devfreq_watermark_event_handler(struct devfreq *df,
> + unsigned int event,
> + void *wmark_type)
> +{
> + int ret = 0;
> + struct wmark_gov_info *wmarkinfo = df->data;
> + enum watermark_type *type = wmark_type;
> +
> + switch (event) {
> + case DEVFREQ_GOV_START:
> + devfreq_watermark_start(df);
This function does not really start anything - it prepares df to be
used with this governor. Maybe call it devfreq_watermark_init()
instead? Then you could have a devfreq_watermark_fini() function
called when we stop the governor that frees the memory previously
allocated.
Also I think you want to call try_module_get(THIS_MODULE) here to
prevent the governor module from being removed it if is used by
someone.
> + wmarkinfo = df->data;
> + if (df->profile->set_low_wmark)
> + df->profile->set_low_wmark(df->dev.parent,
> + wmarkinfo->p_low_wmark);
> + if (df->profile->set_high_wmark)
> + df->profile->set_high_wmark(df->dev.parent,
> + wmarkinfo->p_high_wmark);
Isn't it an error if one (or both) of these functions is missing? For
we will never receive any event. Not sure about this.
> + break;
> + case DEVFREQ_GOV_STOP:
> + devfreq_watermark_debug_stop(df);
As said above, have a devfreq_watermark_fini() (or any other name)
here that frees the kzalloc'd memory and calls
devfreq_watermark_debug_stop(). Also you should probably call
module_put(THIS_MODULE).
> + break;
> + case DEVFREQ_GOV_SUSPEND:
> + devfreq_monitor_suspend(df);
Shouldn't we call set_low_wmark() and set_high_wmark() to prevent the
hardware from firing interrupts after devfreq_monitor_suspend() is
called?
On Tue, Dec 9, 2014 at 2:07 AM, Arto Merilainen <[email protected]> wrote:
> Hi MyungJoo,
>
> Thank for your answer, see my answers inline.
>
> On 12/08/2014 09:44 AM, MyungJoo Ham wrote:
>>
>> Please let me start with somewhat naive high-level question:
>> What do you mean by watermark in this context?
>
>
> Sorry for poor choice of naming. "Load threshold interrupt" might be more
> appropriate. The idea of the series here is to add an ability to program
> hardware to give interrupts when the current load value goes above (or
> below) certain threshold(s). I will revisit the naming.
I personally like the watermark naming and think it is a pretty good
analogy to what this series is doing, but your call.
>
>> Other itching points include:
>> - devfreq_watermark_event() was declared but has never been used.
>> Who is supposed to call this function?
>
>
> The device profile is responsible to call this function from an interrupt
> handler. I will document this.
>
>> - Is enum watermark_type supposed to be used out of /drivers/devfreq/* ?
>
>
> Yes, the idea is to use the enumerations for informing the type of the
> event. I.e. from interrupt handler we might call:
> devfreq_watermark_event(devfreq, ACTMON_INTR_BELOW_WMARK);
>
>> - Could you please watermark-specific interfaces (set_high/low_wmark) into
>> its own public header file? (e.g.,
>> /include/linux/devfreq/governor_wmark.h)
>> I think we can create another (governor_simpleondemand.h) in there as
>> well
>> in order to have threshold values configured by device drivers.
>> Adding governor-specific configurations into devfreq.h seems not
>> appropriate especially when we expect that we may need many different
>> governors.
Considering that watermarking is a pretty common technique for the
kind of work that devfreq is attempting to cover (implementing
event-driven frequency scaling vs. polling-driven as current devfreq
does), doesn't it make sense to turn it into a core devfreq feature? I
don't think devfreq can be that easily extended and doing so might
make things messier than they are.
>
>
> I am not convinced that creating two separate interfaces is better than
> extending a single centralised interface. Let us first consider the
> compatibility...:
>
> First, in trivial case the device driver does not implement the threshold
> feature. Obviously we cannot use threshold based governor for scaling but
> all existing polling based governors will work as earlier.
>
> Second, let us assume that we have a device that supports threshold
> interrupts and the driver has implemented the required set_h/l_wmark
> interrupts and calls devfreq_watermark_event() when we get an interrupt.
> This device can therefore use wmark_simple and wmark_active governors. In
> addition to the wmark governors, we may choose to use a polling based
> governor. With current approach the polling governor simply ignores the
> events and leaves the thresholds "as is".
>
> In short, I would not be concerned on compatibility. From governor side it
> should be ok to ignore an event. In addition, it is ok from governor
> initialization to fail in case the device driver does not implement all
> required features.
>
>
> I am also afraid that initialisation makes two-communication-paths approach
> tricky. In code you likely could have..:
> devfreq_add_device(dev, &pdata->devfreq, "wmark_simple", &extra_fns);
>
> .. but what happens if we wish to change the governor on the fly through
> sysfs? In worst case this is targeting to a governor that requires its own
> data field in initialisation.
>
>>
>> OR..
>>
>> This seems that you can keep set_h/l_wmark functions exposed to
>> drivers/devfreq/* only. Therefore, having
>> /drivers/devfreq/governor_wmark.h
>> should be sufficient as well, which should be more neat than the
>> above.
>> The callbacks are to be defined by the devfreq drivers, aren't they?
>
>
> set_h/l_wmark callbacks should be implemented by the device profiles (the
> device drivers themselves) for setting the thresholds on hardware. These are
> not devfreq/governor private functions but something that should be
> implemented inside the device driver.
>
>>
>> - The event name, "DEVFREQ_GOV_WMARK", defined in governor.h, seems to be
>> more appropriate if it is named "DEVFREQ_GOV_INTERNAL" as we won't
>> need
>> to define event names for each govornor's internal needs.
>>
>> OR.. for more generality,
>> we may define a macro like:
>>
>> #define DEVFREQ_GOV_INTERNAL(value) ((0x1 << 31) | (value))
>> #define GET_DEVFREQ_GOV_INTERNAL(event) ((event) & ~(0x1 << 31))
>
>
> This should work.
>
>>
>> - In general, I would love to see governors with minimal intervention on
>> the framework core/main code, especially when it is not beneficial to
>> other governors. Unlike cpufreq, we may contain many different types
>> of
>> devices in devfreq, which has the potential to accompany many
>> different
>> governors.
>>
>
> I would like to see is that we could easily take advantage of hardware
> features and not let framework limit us too much. :-)
>
> For example, Tomeu has been working on ACTMON driver that measures memory
> transfers (please refer to "Add support for Tegra Activity Monitor" series).
> This hardware block is capable of generating interrupts based on activity
> and hence by factoring the series properly, we could use the same governor
> from the series in different places easily. Without proper interfacing
> everyone ends up duplicating the code.
Indeed. Watermarking support would make the ACTMON driver simpler and
easier to integrate with all the governors that devfreq currently has.
Without core support for it, it could only interact properly with its
own, dedicated governor.
MyungJoo, the issue of having this feature in the core vs. having it
as a devfreq "extension" aside, do you agree with the core idea? An
informl ack for the idea would allow us to start leveraging this for
the ACTMON driver. With a concrete user it will make it easier for
everyone to understand how the pieces should fit together.
On 5 December 2014 at 14:41, Arto Merilainen <[email protected]> wrote:
> (Sorry for the spam. I am resending the series because I noted that
> some of the email addresses were mistyped)
>
> Currently main mechanism to implement scaling using devfreq is
> polling and the device profile is free to set polling interval.
> However, in many cases this approach is not optimal; We may
> unnecessarily use CPU time to determine load on engine that
> is idle most of time - or we may simply read load when we
> already know that the device is busy.
>
> In some cases a device itself has counters to track its activity
> and possibility to raise interrupts when load goes above or below
> certain threshold.
Hi Arto,
this is very interesting stuff, thanks for sending.
Regarding the question of whether the ACTMON driver should use this
generic governor, I think it comes down to whether we are able to
reach the best compromise between performance and power usage with it.
I see that the logic of the downstream ACTMON driver is much more
complex, as it takes into account the activity caused by the CPU
complex and gives it more weight than the total. It also sets
watermarks for the running average over the last 128 samples. The
frequency of the CPU is also taken into account.
Besides that, there's several factors and adjustments that I can only
think that have been determined through experimentation and
measurement and that in that case are very likely to be specific to a
given board and/or SoC.
It also takes into account consecutive watermark breaches and boosts
the frequency accordingly.
It would be great if you could ask within NVIDIA for the rationale
behind the way of calculating the target frequency downstream, as it
would allow us to better discern what can be made generic and what
belongs to the ACTMON driver itself.
It would be also good if we can find other SoCs that have similar
functionality, so we can make sure that what we end up adding to the
generic framework is of use to others.
Besides the above general points, two more specific comments:
* why not use the opp freq table from the DT (operating-points)
instead of freq_table?
* why having wmark_simple at all? And if we don't actually want it,
guess we can drop the event type arg?
Thanks,
Tomeu
> This series adds support for watermark events to devfreq and
> introduces two example governors. The first patch adds two
> callbacks to the device profile (for setting watermark) and
> adds a new function call to devfreq that informs of crossed
> watermark.
>
> The added governors demonstrate usage of the new API. The first
> governor (watermark simple) sets device to trigger low watermark
> event when load goes below 10% and high watermark interrupt when
> the load goes above 60%. Watermark active tries to keep load at
> certain level and it actively sets thresholds based on the
> frequency table in order get interrupts only when the load value
> would affect to the current frequency in re-estimation.
>
> Arto Merilainen (1):
> PM / devfreq: Add watermark active governor
>
> Shridhar Rasal (2):
> PM / devfreq: Add watermark events
> PM / devfreq: Add watermark simple governor
>
> drivers/devfreq/Kconfig | 18 +++
> drivers/devfreq/Makefile | 2 +
> drivers/devfreq/devfreq.c | 19 +++
> drivers/devfreq/governor.h | 1 +
> drivers/devfreq/governor_wmark_active.c | 276 ++++++++++++++++++++++++++++++++
> drivers/devfreq/governor_wmark_simple.c | 245 ++++++++++++++++++++++++++++
> include/linux/devfreq.h | 26 +++
> 7 files changed, 587 insertions(+)
> create mode 100644 drivers/devfreq/governor_wmark_active.c
> create mode 100644 drivers/devfreq/governor_wmark_simple.c
>
> --
> 1.8.1.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Alexandre Courbot:
>
> MyungJoo, the issue of having this feature in the core vs. having it
> as a devfreq "extension" aside, do you agree with the core idea? An
> informl ack for the idea would allow us to start leveraging this for
> the ACTMON driver. With a concrete user it will make it easier for
> everyone to understand how the pieces should fit together.
Yes, I love the idea of event/interrupt-driven DVFS and I agree with
the core idea. The remaining concern is that the current "implementation"
might be too invasive to the subsystem while it does not need to be as
a governor can always choose not to poll. And the other issues are
quite minor as well.
Actually, it appears that next-generations of some architectures
(I think I cannot publically say which and when) are going to support
this for general peripherals on the bus and the CPU soon.
Cheers,
MyungJoo
>
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?