2022-04-18 03:54:59

by Junwen Wu

[permalink] [raw]
Subject: [PATCH v1] thermal/core: change mm alloc method to avoid kernel warning

Very high cooling device max state value makes cooling device stats
buffer allocation fails,like below.Using kzvalloc instead of kzalloc
can avoid this issue.

[ 7.392644]WARNING: CPU: 7 PID: 1747 at mm/page_alloc.c:5090 __alloc_pages_nodemask+0x1c0/0x3dc
[ 7.392989]Call trace:
[ 7.392992]__alloc_pages_nodemask+0x1c0/0x3dc
[ 7.392995]kmalloc_order+0x54/0x358
[ 7.392997]kmalloc_order_trace+0x34/0x1bc
[ 7.393001]__kmalloc+0x5cc/0x9c8
[ 7.393005]thermal_cooling_device_setup_sysfs+0x90/0x218
[ 7.393008]__thermal_cooling_device_register+0x160/0x7a4
[ 7.393012]thermal_of_cooling_device_register+0x14/0x24
[ 7.393140]backlight_cdev_register+0x88/0x100 [msm_drm]

Signed-off-by: Junwen Wu <[email protected]>
---
drivers/thermal/thermal_sysfs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index f154bada2906..361e0d0c241b 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -829,7 +829,7 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
var += sizeof(*stats->time_in_state) * states;
var += sizeof(*stats->trans_table) * states * states;

- stats = kzalloc(var, GFP_KERNEL);
+ stats = kvzalloc(var, GFP_KERNEL);
if (!stats)
return;

@@ -848,7 +848,7 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)

static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
{
- kfree(cdev->stats);
+ kvfree(cdev->stats);
cdev->stats = NULL;
}

--
2.25.1


2022-04-20 21:31:44

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v1] thermal/core: change mm alloc method to avoid kernel warning

On 19/04/2022 15:54, Zhang Rui wrote:
> CC Viresh.
>
> On Tue, 2022-04-19 at 11:14 +0200, Daniel Lezcano wrote:
>> On 19/04/2022 10:48, Zhang Rui wrote:
>>> On Sun, 2022-04-17 at 12:56 +0000, Junwen Wu wrote:
>>>> Very high cooling device max state value makes cooling device
>>>> stats
>>>> buffer allocation fails,like below.Using kzvalloc instead of
>>>> kzalloc
>>>> can avoid this issue.
>>>
>>> When a cooling device has big max_state, this patch can get ride of
>>> the
>>> warning here, but still we end up with the read failure of the
>>> trans_table in sysfs because it is larger than PAGE_SIZE.
>>>
>>> $ cat /sys/class/thermal/cooling_device8/stats/trans_table
>>> cat: /sys/class/thermal/cooling_device8/stats/trans_table: File too
>>> large
>>>
>>> IMO, unless we can fix both places, I'd suggest we skip allocating
>>> and
>>> creating the broken trans_table attr. Like a prototype patch below
>>
>> Why not create a thermal debugfs with real useful information and
>> get
>> rid of this broken code ?
>
> The idea looks good to me.

What about doing a percentile approach of the state indexes changes
instead of a raw matrix full of zeros ? So we show the most significant
transitions, perhaps something like:

99%: 7->6 6->7
98%: 6->5 5->6
95%: 5->4 4->5
90%: 7->5 5->7
80%: 6->4 4->6
70%: 7->1 7->2
50%: ... ...

total: 123456 124573


And another statistics file containing some timings information like the
total duration in mitigation, and the duration in the most significant
states above?

--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2022-04-21 07:14:34

by Zhang, Rui

[permalink] [raw]
Subject: Re: [PATCH v1] thermal/core: change mm alloc method to avoid kernel warning

CC Viresh.

On Tue, 2022-04-19 at 11:14 +0200, Daniel Lezcano wrote:
> On 19/04/2022 10:48, Zhang Rui wrote:
> > On Sun, 2022-04-17 at 12:56 +0000, Junwen Wu wrote:
> > > Very high cooling device max state value makes cooling device
> > > stats
> > > buffer allocation fails,like below.Using kzvalloc instead of
> > > kzalloc
> > > can avoid this issue.
> >
> > When a cooling device has big max_state, this patch can get ride of
> > the
> > warning here, but still we end up with the read failure of the
> > trans_table in sysfs because it is larger than PAGE_SIZE.
> >
> > $ cat /sys/class/thermal/cooling_device8/stats/trans_table
> > cat: /sys/class/thermal/cooling_device8/stats/trans_table: File too
> > large
> >
> > IMO, unless we can fix both places, I'd suggest we skip allocating
> > and
> > creating the broken trans_table attr. Like a prototype patch below
>
> Why not create a thermal debugfs with real useful information and
> get
> rid of this broken code ?

The idea looks good to me.

thanks,
rui

2022-04-21 20:26:48

by Daniel Lezcano

[permalink] [raw]
Subject: Re: [PATCH v1] thermal/core: change mm alloc method to avoid kernel warning

On 19/04/2022 10:48, Zhang Rui wrote:
> On Sun, 2022-04-17 at 12:56 +0000, Junwen Wu wrote:
>> Very high cooling device max state value makes cooling device stats
>> buffer allocation fails,like below.Using kzvalloc instead of kzalloc
>> can avoid this issue.
>
> When a cooling device has big max_state, this patch can get ride of the
> warning here, but still we end up with the read failure of the
> trans_table in sysfs because it is larger than PAGE_SIZE.
>
> $ cat /sys/class/thermal/cooling_device8/stats/trans_table
> cat: /sys/class/thermal/cooling_device8/stats/trans_table: File too
> large
>
> IMO, unless we can fix both places, I'd suggest we skip allocating and
> creating the broken trans_table attr. Like a prototype patch below

Why not create a thermal debugfs with real useful information and get
rid of this broken code ?

I've some prototype code I can respin to RFC



--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

2022-04-22 19:45:14

by Zhang, Rui

[permalink] [raw]
Subject: Re: [PATCH v1] thermal/core: change mm alloc method to avoid kernel warning

On Sun, 2022-04-17 at 12:56 +0000, Junwen Wu wrote:
> Very high cooling device max state value makes cooling device stats
> buffer allocation fails,like below.Using kzvalloc instead of kzalloc
> can avoid this issue.

When a cooling device has big max_state, this patch can get ride of the
warning here, but still we end up with the read failure of the
trans_table in sysfs because it is larger than PAGE_SIZE.

$ cat /sys/class/thermal/cooling_device8/stats/trans_table
cat: /sys/class/thermal/cooling_device8/stats/trans_table: File too
large

IMO, unless we can fix both places, I'd suggest we skip allocating and
creating the broken trans_table attr. Like a prototype patch below

thanks,
rui

From 2a504596d06a91d6f01d25eee71ebcdeff164e59 Mon Sep 17 00:00:00 2001
From: Zhang Rui <[email protected]>
Date: Tue, 19 Apr 2022 16:40:04 +0800
Subject: [PATCH] thermal: thermal_stats: skip broken trans_table

Signed-off-by: Zhang Rui <[email protected]>
---
drivers/thermal/thermal_sysfs.c | 40 ++++++++++++++++++++++++++-------
1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index f154bada2906..d917489f89bc 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -686,7 +686,8 @@ void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
goto unlock;

update_time_in_state(stats);
- stats->trans_table[stats->state * stats->max_states + new_state]++;
+ if (stats->trans_table)
+ stats->trans_table[stats->state * stats->max_states + new_state]++;
stats->state = new_state;
stats->total_trans++;

@@ -741,8 +742,9 @@ reset_store(struct device *dev, struct device_attribute *attr, const char *buf,

stats->total_trans = 0;
stats->last_time = ktime_get();
- memset(stats->trans_table, 0,
- states * states * sizeof(*stats->trans_table));
+ if (stats->trans_table)
+ memset(stats->trans_table, 0,
+ states * states * sizeof(*stats->trans_table));

for (i = 0; i < stats->max_states; i++)
stats->time_in_state[i] = ktime_set(0, 0);
@@ -805,7 +807,6 @@ static struct attribute *cooling_device_stats_attrs[] = {
&dev_attr_total_trans.attr,
&dev_attr_time_in_state_ms.attr,
&dev_attr_reset.attr,
- &dev_attr_trans_table.attr,
NULL
};

@@ -814,11 +815,25 @@ static const struct attribute_group cooling_device_stats_attr_group = {
.name = "stats"
};

+static struct attribute *cooling_device_stats_ext_attrs[] = {
+ &dev_attr_total_trans.attr,
+ &dev_attr_time_in_state_ms.attr,
+ &dev_attr_reset.attr,
+ &dev_attr_trans_table.attr,
+ NULL
+};
+
+static const struct attribute_group cooling_device_stats_ext_attr_group = {
+ .attrs = cooling_device_stats_ext_attrs,
+ .name = "stats"
+};
+
static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
{
struct cooling_dev_stats *stats;
unsigned long states;
- int var;
+ int var, size;
+ bool ext = false;

if (cdev->ops->get_max_state(cdev, &states))
return;
@@ -827,14 +842,20 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)

var = sizeof(*stats);
var += sizeof(*stats->time_in_state) * states;
- var += sizeof(*stats->trans_table) * states * states;
+ size = sizeof(*stats->trans_table) * states * states;
+
+ if (var + size < PAGE_SIZE) {
+ ext = true;
+ var += size;
+ }

stats = kzalloc(var, GFP_KERNEL);
if (!stats)
return;

stats->time_in_state = (ktime_t *)(stats + 1);
- stats->trans_table = (unsigned int *)(stats->time_in_state + states);
+ if (ext)
+ stats->trans_table = (unsigned int *)(stats->time_in_state + states);
cdev->stats = stats;
stats->last_time = ktime_get();
stats->max_states = states;
@@ -843,7 +864,10 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)

/* Fill the empty slot left in cooling_device_attr_groups */
var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
- cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
+ if (ext)
+ cooling_device_attr_groups[var] = &cooling_device_stats_ext_attr_group;
+ else
+ cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
}

static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
--
2.17.1

>
> [ 7.392644]WARNING: CPU: 7 PID: 1747 at mm/page_alloc.c:5090
> __alloc_pages_nodemask+0x1c0/0x3dc
> [ 7.392989]Call trace:
> [ 7.392992]__alloc_pages_nodemask+0x1c0/0x3dc
> [ 7.392995]kmalloc_order+0x54/0x358
> [ 7.392997]kmalloc_order_trace+0x34/0x1bc
> [ 7.393001]__kmalloc+0x5cc/0x9c8
> [ 7.393005]thermal_cooling_device_setup_sysfs+0x90/0x218
> [ 7.393008]__thermal_cooling_device_register+0x160/0x7a4
> [ 7.393012]thermal_of_cooling_device_register+0x14/0x24
> [ 7.393140]backlight_cdev_register+0x88/0x100 [msm_drm]
>
>
> Signed-off-by: Junwen Wu <[email protected]>
> ---
> drivers/thermal/thermal_sysfs.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thermal/thermal_sysfs.c
> b/drivers/thermal/thermal_sysfs.c
> index f154bada2906..361e0d0c241b 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -829,7 +829,7 @@ static void cooling_device_stats_setup(struct
> thermal_cooling_device *cdev)
> var += sizeof(*stats->time_in_state) * states;
> var += sizeof(*stats->trans_table) * states * states;
>
> - stats = kzalloc(var, GFP_KERNEL);
> + stats = kvzalloc(var, GFP_KERNEL);
> if (!stats)
> return;
>
> @@ -848,7 +848,7 @@ static void cooling_device_stats_setup(struct
> thermal_cooling_device *cdev)
>
> static void cooling_device_stats_destroy(struct
> thermal_cooling_device *cdev)
> {
> - kfree(cdev->stats);
> + kvfree(cdev->stats);
> cdev->stats = NULL;
> }
>

2022-05-09 06:47:14

by Junwen Wu

[permalink] [raw]
Subject: Re: [PATCH v1] thermal/core: change mm alloc method to avoid kernel warning

From: Daniel Lezcano <[email protected]>

On 19/04/2022 15:54, Zhang Rui wrote:
> CC Viresh.
>
> On Tue, 2022-04-19 at 11:14 +0200, Daniel Lezcano wrote:
>> On 19/04/2022 10:48, Zhang Rui wrote:
>>> large
>>>
>>> IMO, unless we can fix both places, I'd suggest we skip allocating
>>> and
>>> creating the broken trans_table attr. Like a prototype patch below
>>
>> Why not create a thermal debugfs with real useful information and
>> get
>> rid of this broken code ?
>
> The idea looks good to me.

>What about doing a percentile approach of the state indexes changes
>instead of a raw matrix full of zeros ? So we show the most significant
>transitions, perhaps something like:
>
>99%: 7->6 6->7
>98%: 6->5 5->6
>95%: 5->4 4->5
>90%: 7->5 5->7
>80%: 6->4 4->6
>70%: 7->1 7->2
>50%: ... ...

>total: 123456 124573


>And another statistics file containing some timings information like the
>total duration in mitigation, and the duration in the most significant
>states above?

Viresh, Zhang Rui, Daniel,sorry for the delay indeed ,the trans_table is always full of zero,
I introduce 'show_state' node(tunnable by user,default set as max_states/2) ,thus only show show_state'th trans count
to the max trans count change stats. in this way trans_table_show's buffer always less than PAGE_SIZE
I create a patch v2
like this:
/sys/class/thermal/cooling_device0/stats # cat trans_table
From : Index_change
state 0: ->1( 1) ->2( 2) ->7( 1)
state 1: ->0( 1) ->2( 1)
state 2: ->0( 2) ->1( 1)

here is the patch:
From 64a7fefd008cb890a4a9ea4efd0dd388ac536ad5 Mon Sep 17 00:00:00 2001
From: Junwen Wu <[email protected]>
Date: Sun, 8 May 2022 14:50:14 +0000
Subject: [PATCH v2] thermal/core: Make trans_table tunnable to avoid some
needless zero output

Very high cooling device max state value make trans_table node prompt File too large.
we introduce show_state node, tunnable by user,thus trans_table only show show_state'th
trans count to the max trans count, in this way trans_table_show's buffer is
always less than PAGE_SIZE and shows the important changes.

Signed-off-by: Junwen Wu <[email protected]>
---
V1 -> V2: avoid some needless zero output
drivers/thermal/thermal_sysfs.c | 136 +++++++++++++++++++++++---------
1 file changed, 99 insertions(+), 37 deletions(-)

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index f154bada2906..1496088a1638 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -656,6 +656,7 @@ struct cooling_dev_stats {
spinlock_t lock;
unsigned int total_trans;
unsigned long state;
+ unsigned long show_state;
unsigned long max_states;
ktime_t last_time;
ktime_t *time_in_state;
@@ -752,60 +753,119 @@ reset_store(struct device *dev, struct device_attribute *attr, const char *buf,
return count;
}

-static ssize_t trans_table_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+static ssize_t
+show_state_store(struct device *dev, struct device_attribute *attr, const char *buf,
+ size_t count)
{
- struct thermal_cooling_device *cdev = to_cooling_device(dev);
- struct cooling_dev_stats *stats = cdev->stats;
- ssize_t len = 0;
- int i, j;
+ struct thermal_cooling_device *cdev = to_cooling_device(dev);
+ struct cooling_dev_stats *stats = cdev->stats;
+ unsigned long state;
+ ssize_t ret;

- len += snprintf(buf + len, PAGE_SIZE - len, " From : To\n");
- len += snprintf(buf + len, PAGE_SIZE - len, " : ");
- for (i = 0; i < stats->max_states; i++) {
- if (len >= PAGE_SIZE)
- break;
- len += snprintf(buf + len, PAGE_SIZE - len, "state%2u ", i);
- }
- if (len >= PAGE_SIZE)
- return PAGE_SIZE;
+ spin_lock(&stats->lock);

- len += snprintf(buf + len, PAGE_SIZE - len, "\n");
+ ret = kstrtoul(buf, 10, &state);
+ if (ret || (state > stats->max_states))
+ goto unlock;

- for (i = 0; i < stats->max_states; i++) {
- if (len >= PAGE_SIZE)
- break;
+ stats->show_state = state;
+unlock:
+ spin_unlock(&stats->lock);
+ return count;
+}

- len += snprintf(buf + len, PAGE_SIZE - len, "state%2u:", i);
+static ssize_t
+show_state_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct thermal_cooling_device *cdev = to_cooling_device(dev);
+ struct cooling_dev_stats *stats = cdev->stats;
+
+ return sprintf(buf, "%lu\n", stats->show_state);
+}
+
+static int find_show_state( int *nums, int numsSize, int k, unsigned int *max_value)
+{
+ int i, min = INT_MAX, max = 0;
+ for( i = 0; i < numsSize; ++i )
+ {
+ min = nums[i] < min ? nums[i] : min;
+ max = nums[i] > max ? nums[i] : max;
+ }
+ int l = min, r = max, mid, cnt = 0;
+ while( l < r )
+ {
+ mid = r - (r - l) / 2;
+ for( i = 0; i < numsSize; ++i )
+ {
+ if( nums[i] >= mid )
+ ++cnt;
+ }
+ if( cnt < k )
+ {
+ r = mid - 1;
+ cnt = 0;
+ }
+ else
+ {
+ l = mid;
+ cnt = 0;
+ }
+ }
+ *max_value = max;
+ return l;
+}

- for (j = 0; j < stats->max_states; j++) {
- if (len >= PAGE_SIZE)
- break;
- len += snprintf(buf + len, PAGE_SIZE - len, "%8u ",
- stats->trans_table[i * stats->max_states + j]);
- }
- if (len >= PAGE_SIZE)
- break;
- len += snprintf(buf + len, PAGE_SIZE - len, "\n");
- }

- if (len >= PAGE_SIZE) {
- pr_warn_once("Thermal transition table exceeds PAGE_SIZE. Disabling\n");
- return -EFBIG;
- }
- return len;
+
+static ssize_t trans_table_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct thermal_cooling_device *cdev = to_cooling_device(dev);
+ struct cooling_dev_stats *stats = cdev->stats;
+ ssize_t len = 0;
+ int i, j;
+ unsigned int show_state_value = 0;
+ unsigned int max_state_value = 0;
+
+ len += snprintf(buf + len, PAGE_SIZE - len, " From : Index_change\n");
+ for (i = 0; i < stats->max_states; i++) {
+ show_state_value = find_show_state(&stats->trans_table[i * stats->max_states], stats->max_states, stats->show_state, &max_state_value);
+ if (max_state_value) {
+ len += snprintf(buf + len, PAGE_SIZE - len, "state%2u:", i);
+ }
+ else {
+ continue;
+ }
+
+ for (j = 0; j < stats->max_states; j++) {
+ if (stats->trans_table[i * stats->max_states + j] && (show_state_value <= stats->trans_table[i * stats->max_states + j])) {
+ len += snprintf(buf + len, PAGE_SIZE - len, " ->%u(%u)",j, stats->trans_table[i * stats->max_states + j]);
+ }
+ }
+ if (len >= PAGE_SIZE)
+ break;
+ len += snprintf(buf + len, PAGE_SIZE - len, "\n");
+ }
+
+ if (len >= PAGE_SIZE) {
+ pr_warn_once("Thermal transition table exceeds PAGE_SIZE. Disabling\n");
+ return -EFBIG;
+ }
+ return len;
}

static DEVICE_ATTR_RO(total_trans);
static DEVICE_ATTR_RO(time_in_state_ms);
static DEVICE_ATTR_WO(reset);
static DEVICE_ATTR_RO(trans_table);
+static DEVICE_ATTR_RW(show_state);

static struct attribute *cooling_device_stats_attrs[] = {
&dev_attr_total_trans.attr,
&dev_attr_time_in_state_ms.attr,
&dev_attr_reset.attr,
&dev_attr_trans_table.attr,
+ &dev_attr_show_state.attr,
NULL
};

@@ -829,7 +889,7 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
var += sizeof(*stats->time_in_state) * states;
var += sizeof(*stats->trans_table) * states * states;

- stats = kzalloc(var, GFP_KERNEL);
+ stats = kvzalloc(var, GFP_KERNEL);
if (!stats)
return;

@@ -838,6 +898,8 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
cdev->stats = stats;
stats->last_time = ktime_get();
stats->max_states = states;
+ /* default set show_state = max_states/2 */
+ stats->show_state = states / 2;

spin_lock_init(&stats->lock);

@@ -848,7 +910,7 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)

static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
{
- kfree(cdev->stats);
+ kvfree(cdev->stats);
cdev->stats = NULL;
}

--
2.25.1