2018-01-11 09:36:19

by Viresh Kumar

[permalink] [raw]
Subject: [PATCH V3] thermal: Add cooling device's statistics in sysfs

This extends the sysfs interface for thermal cooling devices and exposes
some pretty useful statistics. These statistics have proven to be quite
useful specially while doing benchmarks related to the task scheduler,
where we want to make sure that nothing has disrupted the test,
specially the cooling device which may have put constraints on the CPUs.
The information exposed here tells us to what extent the CPUs were
constrained by the thermal framework.

The read-only "total_trans" file shows the total number of cooling state
transitions the device has gone through since the time the cooling
device is registered or the time when statistics were reset last.

The read-only "time_in_state_ms" file shows the time spent by the device
in the respective cooling states.

The write-only "reset" file is used to reset the statistics.

This is how the directory structure looks like for a single cooling
device:

$ ls -R /sys/class/thermal/cooling_device0/
/sys/class/thermal/cooling_device0/:
cur_state max_state power stats subsystem type uevent

/sys/class/thermal/cooling_device0/power:
autosuspend_delay_ms runtime_active_time runtime_suspended_time
control runtime_status

/sys/class/thermal/cooling_device0/stats:
reset time_in_state_ms total_trans

This is tested on ARM 32-bit Hisilicon hikey620 board running Ubuntu and
ARM 64-bit Hisilicon hikey960 board running Android.

Signed-off-by: Viresh Kumar <[email protected]>
---
V2->V3:
- Total number of states is max_level + 1. The earlier version didn't
take that into account and so the stats for the highest state were
missing.

V1->V2:
- Move to sysfs from debugfs

drivers/thermal/thermal_core.c | 3 +-
drivers/thermal/thermal_core.h | 3 +
drivers/thermal/thermal_helpers.c | 5 +-
drivers/thermal/thermal_sysfs.c | 146 ++++++++++++++++++++++++++++++++++++++
include/linux/thermal.h | 1 +
5 files changed, 156 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 2b1b0ba393a4..2cc4ae57484e 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -972,8 +972,8 @@ __thermal_cooling_device_register(struct device_node *np,
cdev->ops = ops;
cdev->updated = false;
cdev->device.class = &thermal_class;
- thermal_cooling_device_setup_sysfs(cdev);
cdev->devdata = devdata;
+ thermal_cooling_device_setup_sysfs(cdev);
dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
result = device_register(&cdev->device);
if (result) {
@@ -1106,6 +1106,7 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)

ida_simple_remove(&thermal_cdev_ida, cdev->id);
device_unregister(&cdev->device);
+ thermal_cooling_device_remove_sysfs(cdev);
}
EXPORT_SYMBOL_GPL(thermal_cooling_device_unregister);

diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 27e3b1df7360..f6eb01e99816 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -73,6 +73,9 @@ int thermal_build_list_of_policies(char *buf);
int thermal_zone_create_device_groups(struct thermal_zone_device *, int);
void thermal_zone_destroy_device_groups(struct thermal_zone_device *);
void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *);
+void thermal_cooling_device_remove_sysfs(struct thermal_cooling_device *cdev);
+void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
+ unsigned long new_state);
/* used only at binding time */
ssize_t
thermal_cooling_device_trip_point_show(struct device *,
diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c
index 8cdf75adcce1..eb03d7e099bb 100644
--- a/drivers/thermal/thermal_helpers.c
+++ b/drivers/thermal/thermal_helpers.c
@@ -187,7 +187,10 @@ void thermal_cdev_update(struct thermal_cooling_device *cdev)
if (instance->target > target)
target = instance->target;
}
- cdev->ops->set_cur_state(cdev, target);
+
+ if (!cdev->ops->set_cur_state(cdev, target))
+ thermal_cooling_device_stats_update(cdev, target);
+
cdev->updated = true;
mutex_unlock(&cdev->lock);
trace_cdev_update(cdev, target);
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index ba81c9080f6e..88bb26d5d375 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -666,6 +666,121 @@ void thermal_zone_destroy_device_groups(struct thermal_zone_device *tz)
}

/* sys I/F for cooling device */
+struct cooling_dev_stats {
+ spinlock_t lock;
+ unsigned int total_trans;
+ unsigned long state;
+ unsigned long max_states;
+ ktime_t last_time;
+ ktime_t *time_in_state;
+};
+
+static void update_time_in_state(struct cooling_dev_stats *stats)
+{
+ ktime_t now = ktime_get(), delta;
+
+ delta = ktime_sub(now, stats->last_time);
+ stats->time_in_state[stats->state] =
+ ktime_add(stats->time_in_state[stats->state], delta);
+ stats->last_time = now;
+}
+
+void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
+ unsigned long new_state)
+{
+ struct cooling_dev_stats *stats = cdev->stats;
+
+ spin_lock(&stats->lock);
+
+ if (stats->state == new_state)
+ goto unlock;
+
+ update_time_in_state(stats);
+ stats->state = new_state;
+ stats->total_trans++;
+
+unlock:
+ spin_unlock(&stats->lock);
+}
+
+static ssize_t
+thermal_cooling_device_total_trans_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct thermal_cooling_device *cdev = to_cooling_device(dev);
+ struct cooling_dev_stats *stats = cdev->stats;
+ int ret;
+
+ spin_lock(&stats->lock);
+ ret = sprintf(buf, "%u\n", stats->total_trans);
+ spin_unlock(&stats->lock);
+
+ return ret;
+}
+
+static ssize_t
+thermal_cooling_device_time_in_state_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct thermal_cooling_device *cdev = to_cooling_device(dev);
+ struct cooling_dev_stats *stats = cdev->stats;
+ ssize_t len = 0;
+ int i;
+
+ spin_lock(&stats->lock);
+ update_time_in_state(stats);
+
+ for (i = 0; i < stats->max_states; i++) {
+ len += sprintf(buf + len, "state%u\t%llu\n", i,
+ ktime_to_ms(stats->time_in_state[i]));
+ }
+ spin_unlock(&stats->lock);
+
+ return len;
+}
+
+static ssize_t
+thermal_cooling_device_reset_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct thermal_cooling_device *cdev = to_cooling_device(dev);
+ struct cooling_dev_stats *stats = cdev->stats;
+ int i;
+
+ spin_lock(&stats->lock);
+
+ stats->total_trans = 0;
+ stats->last_time = ktime_get();
+
+ for (i = 0; i < stats->max_states; i++)
+ stats->time_in_state[i] = ktime_set(0, 0);
+
+ spin_unlock(&stats->lock);
+
+ return count;
+}
+
+static DEVICE_ATTR(total_trans, 0444, thermal_cooling_device_total_trans_show,
+ NULL);
+static DEVICE_ATTR(time_in_state_ms, 0444,
+ thermal_cooling_device_time_in_state_show, NULL);
+static DEVICE_ATTR(reset, 0200, NULL, thermal_cooling_device_reset_store);
+
+static struct attribute *cooling_device_stats_attrs[] = {
+ &dev_attr_total_trans.attr,
+ &dev_attr_time_in_state_ms.attr,
+ &dev_attr_reset.attr,
+ NULL
+};
+
+static const struct attribute_group cooling_device_stats_attr_group = {
+ .attrs = cooling_device_stats_attrs,
+ .name = "stats"
+};
+
static ssize_t
thermal_cooling_device_type_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -721,6 +836,7 @@ thermal_cooling_device_cur_state_store(struct device *dev,
result = cdev->ops->set_cur_state(cdev, state);
if (result)
return result;
+ thermal_cooling_device_stats_update(cdev, state);
return count;
}

@@ -745,14 +861,44 @@ static const struct attribute_group cooling_device_attr_group = {

static const struct attribute_group *cooling_device_attr_groups[] = {
&cooling_device_attr_group,
+ &cooling_device_stats_attr_group,
NULL,
};

void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *cdev)
{
+ struct cooling_dev_stats *stats;
+ unsigned long states;
+ int ret, size;
+
+ ret = cdev->ops->get_max_state(cdev, &states);
+ if (ret)
+ return;
+
+ states++; /* Total number of states is highest state + 1 */
+ size = sizeof(*stats);
+ size += sizeof(*stats->time_in_state) * states;
+
+ stats = kzalloc(size, GFP_KERNEL);
+ if (!stats)
+ return;
+
+ stats->time_in_state = (ktime_t *)(stats + 1);
+ cdev->stats = stats;
+ stats->last_time = ktime_get();
+ stats->max_states = states;
+ cdev->stats = stats;
+
+ spin_lock_init(&stats->lock);
cdev->device.groups = cooling_device_attr_groups;
}

+void thermal_cooling_device_remove_sysfs(struct thermal_cooling_device *cdev)
+{
+ kfree(cdev->stats);
+ cdev->stats = NULL;
+}
+
/* these helper will be used only at the time of bindig */
ssize_t
thermal_cooling_device_trip_point_show(struct device *dev,
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 8c5302374eaa..7834be668d80 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -148,6 +148,7 @@ struct thermal_cooling_device {
struct device device;
struct device_node *np;
void *devdata;
+ void *stats;
const struct thermal_cooling_device_ops *ops;
bool updated; /* true if the cooling device does not need update */
struct mutex lock; /* protect thermal_instances list */
--
2.15.0.194.g9af6a3dea062


2018-01-12 11:09:54

by Aishwarya Pant

[permalink] [raw]
Subject: Re: [PATCH V3] thermal: Add cooling device's statistics in sysfs

On Thu, Jan 11, 2018 at 03:06:09PM +0530, Viresh Kumar wrote:
> This extends the sysfs interface for thermal cooling devices and exposes
> some pretty useful statistics. These statistics have proven to be quite
> useful specially while doing benchmarks related to the task scheduler,
> where we want to make sure that nothing has disrupted the test,
> specially the cooling device which may have put constraints on the CPUs.
> The information exposed here tells us to what extent the CPUs were
> constrained by the thermal framework.
>
> The read-only "total_trans" file shows the total number of cooling state
> transitions the device has gone through since the time the cooling
> device is registered or the time when statistics were reset last.
>
> The read-only "time_in_state_ms" file shows the time spent by the device
> in the respective cooling states.
>
> The write-only "reset" file is used to reset the statistics.
>
> This is how the directory structure looks like for a single cooling
> device:
>
> $ ls -R /sys/class/thermal/cooling_device0/
> /sys/class/thermal/cooling_device0/:
> cur_state max_state power stats subsystem type uevent
>
> /sys/class/thermal/cooling_device0/power:
> autosuspend_delay_ms runtime_active_time runtime_suspended_time
> control runtime_status
>
> /sys/class/thermal/cooling_device0/stats:
> reset time_in_state_ms total_trans
>
> This is tested on ARM 32-bit Hisilicon hikey620 board running Ubuntu and
> ARM 64-bit Hisilicon hikey960 board running Android.
>
> Signed-off-by: Viresh Kumar <[email protected]>
> ---
> V2->V3:
> - Total number of states is max_level + 1. The earlier version didn't
> take that into account and so the stats for the highest state were
> missing.
>
> V1->V2:
> - Move to sysfs from debugfs
>
> drivers/thermal/thermal_core.c | 3 +-
> drivers/thermal/thermal_core.h | 3 +
> drivers/thermal/thermal_helpers.c | 5 +-
> drivers/thermal/thermal_sysfs.c | 146 ++++++++++++++++++++++++++++++++++++++
> include/linux/thermal.h | 1 +
> 5 files changed, 156 insertions(+), 2 deletions(-)

<snip>

> index 27e3b1df7360..f6eb01e99816 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
+
> +static DEVICE_ATTR(total_trans, 0444, thermal_cooling_device_total_trans_show,
> + NULL);
> +static DEVICE_ATTR(time_in_state_ms, 0444,
> + thermal_cooling_device_time_in_state_show, NULL);
> +static DEVICE_ATTR(reset, 0200, NULL, thermal_cooling_device_reset_store);
> +

Hi

I can see that you have added some files to the sysfs ABI. It would be good to
have these new interfaces documented in Documentation/ABI.

Aishwarya

<snip>

2018-01-12 17:25:48

by Eduardo Valentin

[permalink] [raw]
Subject: Re: [PATCH V3] thermal: Add cooling device's statistics in sysfs

On Fri, Jan 12, 2018 at 04:39:43PM +0530, Aishwarya Pant wrote:
> On Thu, Jan 11, 2018 at 03:06:09PM +0530, Viresh Kumar wrote:
> > This extends the sysfs interface for thermal cooling devices and exposes
> > some pretty useful statistics. These statistics have proven to be quite
> > useful specially while doing benchmarks related to the task scheduler,
> > where we want to make sure that nothing has disrupted the test,
> > specially the cooling device which may have put constraints on the CPUs.
> > The information exposed here tells us to what extent the CPUs were
> > constrained by the thermal framework.
> >
> > The read-only "total_trans" file shows the total number of cooling state
> > transitions the device has gone through since the time the cooling
> > device is registered or the time when statistics were reset last.
> >
> > The read-only "time_in_state_ms" file shows the time spent by the device
> > in the respective cooling states.
> >
> > The write-only "reset" file is used to reset the statistics.
> >
> > This is how the directory structure looks like for a single cooling
> > device:
> >
> > $ ls -R /sys/class/thermal/cooling_device0/
> > /sys/class/thermal/cooling_device0/:
> > cur_state max_state power stats subsystem type uevent
> >
> > /sys/class/thermal/cooling_device0/power:
> > autosuspend_delay_ms runtime_active_time runtime_suspended_time
> > control runtime_status
> >
> > /sys/class/thermal/cooling_device0/stats:
> > reset time_in_state_ms total_trans
> >
> > This is tested on ARM 32-bit Hisilicon hikey620 board running Ubuntu and
> > ARM 64-bit Hisilicon hikey960 board running Android.
> >
> > Signed-off-by: Viresh Kumar <[email protected]>
> > ---
> > V2->V3:
> > - Total number of states is max_level + 1. The earlier version didn't
> > take that into account and so the stats for the highest state were
> > missing.
> >
> > V1->V2:
> > - Move to sysfs from debugfs
> >
> > drivers/thermal/thermal_core.c | 3 +-
> > drivers/thermal/thermal_core.h | 3 +
> > drivers/thermal/thermal_helpers.c | 5 +-
> > drivers/thermal/thermal_sysfs.c | 146 ++++++++++++++++++++++++++++++++++++++
> > include/linux/thermal.h | 1 +
> > 5 files changed, 156 insertions(+), 2 deletions(-)
>
> <snip>
>
> > index 27e3b1df7360..f6eb01e99816 100644
> > --- a/drivers/thermal/thermal_core.h
> > +++ b/drivers/thermal/thermal_core.h
> +
> > +static DEVICE_ATTR(total_trans, 0444, thermal_cooling_device_total_trans_show,
> > + NULL);
> > +static DEVICE_ATTR(time_in_state_ms, 0444,
> > + thermal_cooling_device_time_in_state_show, NULL);
> > +static DEVICE_ATTR(reset, 0200, NULL, thermal_cooling_device_reset_store);
> > +
>
> Hi
>
> I can see that you have added some files to the sysfs ABI. It would be good to
> have these new interfaces documented in Documentation/ABI.

all thermal sysfs entries are documented here:


Documentation/thermal/sysfs-api.txt

Please update.

>
> Aishwarya
>
> <snip>

2018-01-12 17:46:11

by Eduardo Valentin

[permalink] [raw]
Subject: Re: [PATCH V3] thermal: Add cooling device's statistics in sysfs

Hello,

On Thu, Jan 11, 2018 at 03:06:09PM +0530, Viresh Kumar wrote:
> This extends the sysfs interface for thermal cooling devices and exposes
> some pretty useful statistics. These statistics have proven to be quite
> useful specially while doing benchmarks related to the task scheduler,
> where we want to make sure that nothing has disrupted the test,
> specially the cooling device which may have put constraints on the CPUs.
> The information exposed here tells us to what extent the CPUs were
> constrained by the thermal framework.
>
> The read-only "total_trans" file shows the total number of cooling state
> transitions the device has gone through since the time the cooling
> device is registered or the time when statistics were reset last.

It would be good to properly describe what total_trans means. Also, to
have this documented in the thermal sysfs file. Does it mean how many
times the cooling device has been set to a specific state? Or how many
times the state has been changed to that specific value?

>
> The read-only "time_in_state_ms" file shows the time spent by the device
> in the respective cooling states.

What about this file use the same unit as the cpufreq equivalent? IIRC,
the cpufreq node used clock_t.

>
> The write-only "reset" file is used to reset the statistics.
>

cool.

> This is how the directory structure looks like for a single cooling
> device:
>
> $ ls -R /sys/class/thermal/cooling_device0/
> /sys/class/thermal/cooling_device0/:
> cur_state max_state power stats subsystem type uevent
>
> /sys/class/thermal/cooling_device0/power:
> autosuspend_delay_ms runtime_active_time runtime_suspended_time
> control runtime_status
>
> /sys/class/thermal/cooling_device0/stats:
> reset time_in_state_ms total_trans
>

I guess the only missing node from the cpufreq design copy was the
transition table.

Also, I think the stats per thermal zone is also useful, sometimes
more than per cooling device, as I have been using in
the past, but that is just a comment, not really to block your patch.

> This is tested on ARM 32-bit Hisilicon hikey620 board running Ubuntu and
> ARM 64-bit Hisilicon hikey960 board running Android.
>
> Signed-off-by: Viresh Kumar <[email protected]>
> ---
> V2->V3:
> - Total number of states is max_level + 1. The earlier version didn't
> take that into account and so the stats for the highest state were
> missing.
>
> V1->V2:
> - Move to sysfs from debugfs
>
> drivers/thermal/thermal_core.c | 3 +-
> drivers/thermal/thermal_core.h | 3 +
> drivers/thermal/thermal_helpers.c | 5 +-
> drivers/thermal/thermal_sysfs.c | 146 ++++++++++++++++++++++++++++++++++++++
> include/linux/thermal.h | 1 +
> 5 files changed, 156 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 2b1b0ba393a4..2cc4ae57484e 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -972,8 +972,8 @@ __thermal_cooling_device_register(struct device_node *np,
> cdev->ops = ops;
> cdev->updated = false;
> cdev->device.class = &thermal_class;
> - thermal_cooling_device_setup_sysfs(cdev);
> cdev->devdata = devdata;
> + thermal_cooling_device_setup_sysfs(cdev);
> dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
> result = device_register(&cdev->device);
> if (result) {
> @@ -1106,6 +1106,7 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
>
> ida_simple_remove(&thermal_cdev_ida, cdev->id);
> device_unregister(&cdev->device);
> + thermal_cooling_device_remove_sysfs(cdev);
> }
> EXPORT_SYMBOL_GPL(thermal_cooling_device_unregister);
>
> diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
> index 27e3b1df7360..f6eb01e99816 100644
> --- a/drivers/thermal/thermal_core.h
> +++ b/drivers/thermal/thermal_core.h
> @@ -73,6 +73,9 @@ int thermal_build_list_of_policies(char *buf);
> int thermal_zone_create_device_groups(struct thermal_zone_device *, int);
> void thermal_zone_destroy_device_groups(struct thermal_zone_device *);
> void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *);
> +void thermal_cooling_device_remove_sysfs(struct thermal_cooling_device *cdev);
> +void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
> + unsigned long new_state);
> /* used only at binding time */
> ssize_t
> thermal_cooling_device_trip_point_show(struct device *,
> diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c
> index 8cdf75adcce1..eb03d7e099bb 100644
> --- a/drivers/thermal/thermal_helpers.c
> +++ b/drivers/thermal/thermal_helpers.c
> @@ -187,7 +187,10 @@ void thermal_cdev_update(struct thermal_cooling_device *cdev)
> if (instance->target > target)
> target = instance->target;
> }
> - cdev->ops->set_cur_state(cdev, target);
> +
> + if (!cdev->ops->set_cur_state(cdev, target))
> + thermal_cooling_device_stats_update(cdev, target);
> +

I might be wrong but, this will maybe double account for cases the same
cooling state is selected.

> cdev->updated = true;
> mutex_unlock(&cdev->lock);
> trace_cdev_update(cdev, target);
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index ba81c9080f6e..88bb26d5d375 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -666,6 +666,121 @@ void thermal_zone_destroy_device_groups(struct thermal_zone_device *tz)
> }
>
> /* sys I/F for cooling device */
> +struct cooling_dev_stats {
> + spinlock_t lock;
> + unsigned int total_trans;
> + unsigned long state;
> + unsigned long max_states;
> + ktime_t last_time;
> + ktime_t *time_in_state;
> +};
> +
> +static void update_time_in_state(struct cooling_dev_stats *stats)
> +{
> + ktime_t now = ktime_get(), delta;
> +
> + delta = ktime_sub(now, stats->last_time);
> + stats->time_in_state[stats->state] =
> + ktime_add(stats->time_in_state[stats->state], delta);
> + stats->last_time = now;
> +}
> +
> +void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
> + unsigned long new_state)
> +{
> + struct cooling_dev_stats *stats = cdev->stats;
> +
> + spin_lock(&stats->lock);
> +
> + if (stats->state == new_state)
> + goto unlock;
> +
> + update_time_in_state(stats);
> + stats->state = new_state;
> + stats->total_trans++;
> +
> +unlock:
> + spin_unlock(&stats->lock);
> +}
> +
> +static ssize_t
> +thermal_cooling_device_total_trans_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct thermal_cooling_device *cdev = to_cooling_device(dev);
> + struct cooling_dev_stats *stats = cdev->stats;
> + int ret;
> +
> + spin_lock(&stats->lock);
> + ret = sprintf(buf, "%u\n", stats->total_trans);
> + spin_unlock(&stats->lock);
> +
> + return ret;
> +}
> +
> +static ssize_t
> +thermal_cooling_device_time_in_state_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct thermal_cooling_device *cdev = to_cooling_device(dev);
> + struct cooling_dev_stats *stats = cdev->stats;
> + ssize_t len = 0;
> + int i;
> +
> + spin_lock(&stats->lock);
> + update_time_in_state(stats);
> +
> + for (i = 0; i < stats->max_states; i++) {
> + len += sprintf(buf + len, "state%u\t%llu\n", i,
> + ktime_to_ms(stats->time_in_state[i]));
> + }
> + spin_unlock(&stats->lock);
> +
> + return len;
> +}
> +
> +static ssize_t
> +thermal_cooling_device_reset_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct thermal_cooling_device *cdev = to_cooling_device(dev);
> + struct cooling_dev_stats *stats = cdev->stats;
> + int i;
> +
> + spin_lock(&stats->lock);
> +
> + stats->total_trans = 0;
> + stats->last_time = ktime_get();

So, this is ktime. Then again, might make sense to follow cpufreq unit.

Once again, this needs to go into the documentation, regardless of the
unit we end up with.

> +
> + for (i = 0; i < stats->max_states; i++)
> + stats->time_in_state[i] = ktime_set(0, 0);
> +
> + spin_unlock(&stats->lock);
> +
> + return count;
> +}
> +
> +static DEVICE_ATTR(total_trans, 0444, thermal_cooling_device_total_trans_show,
> + NULL);
> +static DEVICE_ATTR(time_in_state_ms, 0444,
> + thermal_cooling_device_time_in_state_show, NULL);
> +static DEVICE_ATTR(reset, 0200, NULL, thermal_cooling_device_reset_store);
> +
> +static struct attribute *cooling_device_stats_attrs[] = {
> + &dev_attr_total_trans.attr,
> + &dev_attr_time_in_state_ms.attr,
> + &dev_attr_reset.attr,
> + NULL
> +};
> +
> +static const struct attribute_group cooling_device_stats_attr_group = {
> + .attrs = cooling_device_stats_attrs,
> + .name = "stats"
> +};
> +
> static ssize_t
> thermal_cooling_device_type_show(struct device *dev,
> struct device_attribute *attr, char *buf)
> @@ -721,6 +836,7 @@ thermal_cooling_device_cur_state_store(struct device *dev,
> result = cdev->ops->set_cur_state(cdev, state);
> if (result)
> return result;
> + thermal_cooling_device_stats_update(cdev, state);
> return count;
> }
>
> @@ -745,14 +861,44 @@ static const struct attribute_group cooling_device_attr_group = {
>
> static const struct attribute_group *cooling_device_attr_groups[] = {
> &cooling_device_attr_group,
> + &cooling_device_stats_attr_group,
> NULL,
> };
>
> void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *cdev)
> {
> + struct cooling_dev_stats *stats;
> + unsigned long states;
> + int ret, size;
> +
> + ret = cdev->ops->get_max_state(cdev, &states);
> + if (ret)
> + return;
> +
> + states++; /* Total number of states is highest state + 1 */
> + size = sizeof(*stats);
> + size += sizeof(*stats->time_in_state) * states;
> +
> + stats = kzalloc(size, GFP_KERNEL);
> + if (!stats)
> + return;
> +
> + stats->time_in_state = (ktime_t *)(stats + 1);
> + cdev->stats = stats;
> + stats->last_time = ktime_get();
> + stats->max_states = states;
> + cdev->stats = stats;
> +
> + spin_lock_init(&stats->lock);

I would say, the cooling device stat initialization deserves its own
initialization function, don't you think?


Also, I see nothing about sysfs on the lines added to
thermal_cooling_device_setup_sysfs().

> cdev->device.groups = cooling_device_attr_groups;
> }
>
> +void thermal_cooling_device_remove_sysfs(struct thermal_cooling_device *cdev)
> +{
> + kfree(cdev->stats);
> + cdev->stats = NULL;
> +}
> +
> /* these helper will be used only at the time of bindig */
> ssize_t
> thermal_cooling_device_trip_point_show(struct device *dev,
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index 8c5302374eaa..7834be668d80 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -148,6 +148,7 @@ struct thermal_cooling_device {
> struct device device;
> struct device_node *np;
> void *devdata;
> + void *stats;
> const struct thermal_cooling_device_ops *ops;
> bool updated; /* true if the cooling device does not need update */
> struct mutex lock; /* protect thermal_instances list */
> --
> 2.15.0.194.g9af6a3dea062
>

2018-01-15 04:46:21

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3] thermal: Add cooling device's statistics in sysfs

On 12-01-18, 09:46, Eduardo Valentin wrote:
> On Thu, Jan 11, 2018 at 03:06:09PM +0530, Viresh Kumar wrote:
> > $ ls -R /sys/class/thermal/cooling_device0/
> > /sys/class/thermal/cooling_device0/:
> > cur_state max_state power stats subsystem type uevent
> >
> > /sys/class/thermal/cooling_device0/power:
> > autosuspend_delay_ms runtime_active_time runtime_suspended_time
> > control runtime_status
> >
> > /sys/class/thermal/cooling_device0/stats:
> > reset time_in_state_ms total_trans
> >
>
> I guess the only missing node from the cpufreq design copy was the
> transition table.

I don't think it would be very useful here and so didn't add it.

> Also, I think the stats per thermal zone is also useful, sometimes
> more than per cooling device, as I have been using in
> the past, but that is just a comment, not really to block your patch.

Sure, that can be added in a separate patch. What kind of stats are you
expecting there ?

> > diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c
> > index 8cdf75adcce1..eb03d7e099bb 100644
> > --- a/drivers/thermal/thermal_helpers.c
> > +++ b/drivers/thermal/thermal_helpers.c
> > @@ -187,7 +187,10 @@ void thermal_cdev_update(struct thermal_cooling_device *cdev)
> > if (instance->target > target)
> > target = instance->target;
> > }
> > - cdev->ops->set_cur_state(cdev, target);
> > +
> > + if (!cdev->ops->set_cur_state(cdev, target))
> > + thermal_cooling_device_stats_update(cdev, target);
> > +
>
> I might be wrong but, this will maybe double account for cases the same
> cooling state is selected.

No because I am exiting early for same sates in that routine.

> > void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *cdev)
> > {
> > + struct cooling_dev_stats *stats;
> > + unsigned long states;
> > + int ret, size;
> > +
> > + ret = cdev->ops->get_max_state(cdev, &states);
> > + if (ret)
> > + return;
> > +
> > + states++; /* Total number of states is highest state + 1 */
> > + size = sizeof(*stats);
> > + size += sizeof(*stats->time_in_state) * states;
> > +
> > + stats = kzalloc(size, GFP_KERNEL);
> > + if (!stats)
> > + return;
> > +
> > + stats->time_in_state = (ktime_t *)(stats + 1);
> > + cdev->stats = stats;
> > + stats->last_time = ktime_get();
> > + stats->max_states = states;
> > + cdev->stats = stats;
> > +
> > + spin_lock_init(&stats->lock);
>
> I would say, the cooling device stat initialization deserves its own
> initialization function, don't you think?

Hmm, I am not sure if we need it right away. This function is only for cooling
device's sysfs setup and initializing the "stats" structure kind of falls in the
setup category. Of course if we want this function to handle sysfs setup for
thermal zones as well, then yes we better split it up.

The other thing was with thermal_cooling_device_remove_sysfs() as that would
also need to call a separate routine to do the cleanup and that would be the
only thing it will end up doing. So it would more be like a dummy caller.

> Also, I see nothing about sysfs on the lines added to
> thermal_cooling_device_setup_sysfs().

Well, it is allocating structure to keep the values showed while reading sysfs
files, so it is kind of sysfs related to me :)

--
viresh

2018-01-15 16:32:22

by Eduardo Valentin

[permalink] [raw]
Subject: Re: [PATCH V3] thermal: Add cooling device's statistics in sysfs

On Mon, Jan 15, 2018 at 10:16:16AM +0530, Viresh Kumar wrote:
> On 12-01-18, 09:46, Eduardo Valentin wrote:
> > On Thu, Jan 11, 2018 at 03:06:09PM +0530, Viresh Kumar wrote:
> > > $ ls -R /sys/class/thermal/cooling_device0/
> > > /sys/class/thermal/cooling_device0/:
> > > cur_state max_state power stats subsystem type uevent
> > >
> > > /sys/class/thermal/cooling_device0/power:
> > > autosuspend_delay_ms runtime_active_time runtime_suspended_time
> > > control runtime_status
> > >
> > > /sys/class/thermal/cooling_device0/stats:
> > > reset time_in_state_ms total_trans
> > >
> >
> > I guess the only missing node from the cpufreq design copy was the
> > transition table.
>
> I don't think it would be very useful here and so didn't add it.
>

It is actually quite useful, on both, thermal zones and cooling devices,
to detect misbehaving policies. Typically the transitions recorded
by a well behaving policy is very well defined, typically very smooth
from one state to the nearby states. When you see jumps, which is very
clear on a transition table, that is hint for misbehaving scenarios.

Or, when jumps are expected, they are typically rare situation in which
the system engineer wants to be aware/ account for frequency of
occurrence.

> > Also, I think the stats per thermal zone is also useful, sometimes
> > more than per cooling device, as I have been using in
> > the past, but that is just a comment, not really to block your patch.
>
> Sure, that can be added in a separate patch. What kind of stats are you
> expecting there ?

Same set you added for cooling devices should also be reflected on
thermal zones, but instead of cooling state, you want to do the account
on trips, at least for the context of this patch set.

>
> > > diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c
> > > index 8cdf75adcce1..eb03d7e099bb 100644
> > > --- a/drivers/thermal/thermal_helpers.c
> > > +++ b/drivers/thermal/thermal_helpers.c
> > > @@ -187,7 +187,10 @@ void thermal_cdev_update(struct thermal_cooling_device *cdev)
> > > if (instance->target > target)
> > > target = instance->target;
> > > }
> > > - cdev->ops->set_cur_state(cdev, target);
> > > +
> > > + if (!cdev->ops->set_cur_state(cdev, target))
> > > + thermal_cooling_device_stats_update(cdev, target);
> > > +
> >
> > I might be wrong but, this will maybe double account for cases the same
> > cooling state is selected.
>
> No because I am exiting early for same sates in that routine.

Did I miss that hunk?

>
> > > void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *cdev)
> > > {
> > > + struct cooling_dev_stats *stats;
> > > + unsigned long states;
> > > + int ret, size;
> > > +
> > > + ret = cdev->ops->get_max_state(cdev, &states);
> > > + if (ret)
> > > + return;
> > > +
> > > + states++; /* Total number of states is highest state + 1 */
> > > + size = sizeof(*stats);
> > > + size += sizeof(*stats->time_in_state) * states;
> > > +
> > > + stats = kzalloc(size, GFP_KERNEL);
> > > + if (!stats)
> > > + return;
> > > +
> > > + stats->time_in_state = (ktime_t *)(stats + 1);
> > > + cdev->stats = stats;
> > > + stats->last_time = ktime_get();
> > > + stats->max_states = states;
> > > + cdev->stats = stats;
> > > +
> > > + spin_lock_init(&stats->lock);
> >
> > I would say, the cooling device stat initialization deserves its own
> > initialization function, don't you think?
>
> Hmm, I am not sure if we need it right away. This function is only for cooling
> device's sysfs setup and initializing the "stats" structure kind of falls in the
> setup category. Of course if we want this function to handle sysfs setup for
> thermal zones as well, then yes we better split it up.
>
> The other thing was with thermal_cooling_device_remove_sysfs() as that would
> also need to call a separate routine to do the cleanup and that would be the
> only thing it will end up doing. So it would more be like a dummy caller.
>
> > Also, I see nothing about sysfs on the lines added to
> > thermal_cooling_device_setup_sysfs().
>
> Well, it is allocating structure to keep the values showed while reading sysfs
> files, so it is kind of sysfs related to me :)

Ideally these stats should be behind a config entry, which can be used
also to remove the code out of the kernel that does not want it in.
So, yes, I still think they deserve their own setup/destroy pair, and
a config entry to put the code behind it.

>
> --
> viresh

2018-01-16 08:44:24

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3] thermal: Add cooling device's statistics in sysfs

On 15-01-18, 08:32, Eduardo Valentin wrote:
> On Mon, Jan 15, 2018 at 10:16:16AM +0530, Viresh Kumar wrote:
> > No because I am exiting early for same sates in that routine.
>
> Did I miss that hunk?

Yes. It is something like this.

+ if (stats->state == new_state)
+ goto unlock;
+

--
viresh

2018-01-16 10:00:51

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3] thermal: Add cooling device's statistics in sysfs

On 15-01-18, 08:32, Eduardo Valentin wrote:
> Same set you added for cooling devices should also be reflected on
> thermal zones, but instead of cooling state, you want to do the account
> on trips, at least for the context of this patch set.

I never dived deep into the thermal core earlier and never had to implement any
SoC side of it as well, so consider me a newbie here :)

I tried to look into the core on what I can do to get some stats out for the
zones, but I am a bit confused.

handle_thermal_trip() looked to be the function where I could update stats but
then I realized that this is artificially called for every possible trip in most
of the cases from thermal_zone_device_update() and so this can't be the real
place.

Does it mean that this needs to be updated somehow from the thermal governor's,
->throttle() callback? I am not sure if doing it from every governor code is
what we want to do. And then the trip point of cooling devices can be different
within the same thermal zone.

And so I posted the first patch again without thermal zone stuff in it as that
would take more time.

--
viresh