2022-07-22 08:47:42

by Di Shen

[permalink] [raw]
Subject: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized

There's a space allocated for cooling_device_stats_attr_group
within cooling_device_attr_groups. This space is shared by all
cooling devices.

If the stats structure of one cooling device successfully
creates stats sysfs. After that, another cooling device fails
to get max_states in cooling_device_stats_setup(). It can
return directly without initializing the stats structure, but
the cooling_device_stats_attr_group is still the attribute
group of the last cooling device.

At this time, read or write stats sysfs nodes can cause kernel
crash. Like the following, kernel crashed when
'cat time_in_state_ms'.

[<5baac8d4>] panic+0x1b4/0x3c8
[<9d287b0f>] arm_notify_die+0x0/0x78
[<094fc22c>] __do_kernel_fault+0x94/0xa4
[<3b4b69a4>] do_page_fault+0xd4/0x364
[<23793e7a>] do_translation_fault+0x38/0xc0
[<6e5cc52a>] do_DataAbort+0x4c/0xd0
[<a28c16b8>] __dabt_svc+0x5c/0xa0
[<747516ae>] _raw_spin_lock+0x20/0x60
[<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
[<cb78325e>] dev_attr_show+0x38/0x64
[<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
[<c0a843ab>] seq_read+0x244/0x620
[<b316b374>] vfs_read+0xd8/0x218
[<3aebf5fa>] sys_read+0x80/0xe4
[<7cf100f5>] ret_fast_syscall+0x0/0x28
[<08cbe22f>] 0xbe8c1198

stats sysfs:
phone:/sys/class/thermal/cooling_device2/stats # ls
reset time_in_state_ms total_trans trans_table

The same as cat total_trans, trans_table, and echo reset.

To avoid kernel crash, this patch set clears the
cooling_device_attr_groups before stats structure is initialized.

Signed-off-by: Di Shen <[email protected]>
---
drivers/thermal/thermal_sysfs.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index 1c4aac8464a7..e3fae63fa0f7 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -817,6 +817,9 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
unsigned long states;
int var;

+ var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
+ cooling_device_attr_groups[var] = NULL;
+
if (cdev->ops->get_max_state(cdev, &states))
return;

--
2.17.1


2022-07-22 17:49:18

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized

On Fri, Jul 22, 2022 at 10:44 AM Di Shen <[email protected]> wrote:
>
> There's a space allocated for cooling_device_stats_attr_group
> within cooling_device_attr_groups. This space is shared by all
> cooling devices.

That's correct.

> If the stats structure of one cooling device successfully
> creates stats sysfs. After that, another cooling device fails
> to get max_states in cooling_device_stats_setup(). It can
> return directly without initializing the stats structure, but
> the cooling_device_stats_attr_group is still the attribute
> group of the last cooling device.

I cannot parse the above, sorry.

For example, how can a "stats structure of one cooling device" create
anything? As a data structure, it is a passive entity, so it doesn't
carry out any actions.

I think (but I am not sure) that you are referring to the error code
path in which the ->get_max_state() callback fails for a cooling
device after cooling_device_stats_setup() has completed successfully
for another one.

> At this time, read or write stats sysfs nodes can cause kernel
> crash. Like the following, kernel crashed when
> 'cat time_in_state_ms'.
>
> [<5baac8d4>] panic+0x1b4/0x3c8
> [<9d287b0f>] arm_notify_die+0x0/0x78
> [<094fc22c>] __do_kernel_fault+0x94/0xa4
> [<3b4b69a4>] do_page_fault+0xd4/0x364
> [<23793e7a>] do_translation_fault+0x38/0xc0
> [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> [<a28c16b8>] __dabt_svc+0x5c/0xa0
> [<747516ae>] _raw_spin_lock+0x20/0x60
> [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> [<cb78325e>] dev_attr_show+0x38/0x64
> [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> [<c0a843ab>] seq_read+0x244/0x620
> [<b316b374>] vfs_read+0xd8/0x218
> [<3aebf5fa>] sys_read+0x80/0xe4
> [<7cf100f5>] ret_fast_syscall+0x0/0x28
> [<08cbe22f>] 0xbe8c1198
>
> stats sysfs:
> phone:/sys/class/thermal/cooling_device2/stats # ls
> reset time_in_state_ms total_trans trans_table
>
> The same as cat total_trans, trans_table, and echo reset.
>
> To avoid kernel crash, this patch set clears the
> cooling_device_attr_groups before stats structure is initialized.
>
> Signed-off-by: Di Shen <[email protected]>
> ---
> drivers/thermal/thermal_sysfs.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index 1c4aac8464a7..e3fae63fa0f7 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -817,6 +817,9 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
> unsigned long states;
> int var;
>
> + var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
> + cooling_device_attr_groups[var] = NULL;
> +
> if (cdev->ops->get_max_state(cdev, &states))
> return;
>
> --

2022-07-22 19:00:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized

On Friday, July 22, 2022 7:18:42 PM CEST Rafael J. Wysocki wrote:
> On Fri, Jul 22, 2022 at 10:44 AM Di Shen <[email protected]> wrote:
> >
> > There's a space allocated for cooling_device_stats_attr_group
> > within cooling_device_attr_groups. This space is shared by all
> > cooling devices.
>
> That's correct.
>
> > If the stats structure of one cooling device successfully
> > creates stats sysfs. After that, another cooling device fails
> > to get max_states in cooling_device_stats_setup(). It can
> > return directly without initializing the stats structure, but
> > the cooling_device_stats_attr_group is still the attribute
> > group of the last cooling device.
>
> I cannot parse the above, sorry.
>
> For example, how can a "stats structure of one cooling device" create
> anything? As a data structure, it is a passive entity, so it doesn't
> carry out any actions.
>
> I think (but I am not sure) that you are referring to the error code
> path in which the ->get_max_state() callback fails for a cooling
> device after cooling_device_stats_setup() has completed successfully
> for another one.
>
> > At this time, read or write stats sysfs nodes can cause kernel
> > crash. Like the following, kernel crashed when
> > 'cat time_in_state_ms'.
> >
> > [<5baac8d4>] panic+0x1b4/0x3c8
> > [<9d287b0f>] arm_notify_die+0x0/0x78
> > [<094fc22c>] __do_kernel_fault+0x94/0xa4
> > [<3b4b69a4>] do_page_fault+0xd4/0x364
> > [<23793e7a>] do_translation_fault+0x38/0xc0
> > [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> > [<a28c16b8>] __dabt_svc+0x5c/0xa0
> > [<747516ae>] _raw_spin_lock+0x20/0x60
> > [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> > [<cb78325e>] dev_attr_show+0x38/0x64
> > [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> > [<c0a843ab>] seq_read+0x244/0x620
> > [<b316b374>] vfs_read+0xd8/0x218
> > [<3aebf5fa>] sys_read+0x80/0xe4
> > [<7cf100f5>] ret_fast_syscall+0x0/0x28
> > [<08cbe22f>] 0xbe8c1198
> >
> > stats sysfs:
> > phone:/sys/class/thermal/cooling_device2/stats # ls
> > reset time_in_state_ms total_trans trans_table
> >
> > The same as cat total_trans, trans_table, and echo reset.

So does the (untested) patch below work too?

---
drivers/thermal/thermal_sysfs.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/thermal/thermal_sysfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_sysfs.c
+++ linux-pm/drivers/thermal/thermal_sysfs.c
@@ -813,12 +813,13 @@ static const struct attribute_group cool

static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
{
+ const struct attribute_group *stats_attr_group = NULL;
struct cooling_dev_stats *stats;
unsigned long states;
int var;

if (cdev->ops->get_max_state(cdev, &states))
- return;
+ goto out;

states++; /* Total number of states is highest state + 1 */

@@ -828,7 +829,7 @@ static void cooling_device_stats_setup(s

stats = kzalloc(var, GFP_KERNEL);
if (!stats)
- return;
+ goto out;

stats->time_in_state = (ktime_t *)(stats + 1);
stats->trans_table = (unsigned int *)(stats->time_in_state + states);
@@ -838,9 +839,12 @@ static void cooling_device_stats_setup(s

spin_lock_init(&stats->lock);

+ stats_attr_group = &cooling_device_stats_attr_group;
+
+out:
/* Fill the empty slot left in cooling_device_attr_groups */
var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
- cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
+ cooling_device_attr_groups[var] = stats_attr_group;
}

static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)



2022-07-26 08:29:21

by Di Shen

[permalink] [raw]
Subject: Re: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized

On Sat, Jul 23, 2022 at 1:18 AM Rafael J. Wysocki <[email protected]> wrote:
>
> On Fri, Jul 22, 2022 at 10:44 AM Di Shen <[email protected]> wrote:
> >
> > There's a space allocated for cooling_device_stats_attr_group
> > within cooling_device_attr_groups. This space is shared by all
> > cooling devices.
>
> That's correct.
>
> > If the stats structure of one cooling device successfully
> > creates stats sysfs. After that, another cooling device fails
> > to get max_states in cooling_device_stats_setup(). It can
> > return directly without initializing the stats structure, but
> > the cooling_device_stats_attr_group is still the attribute
> > group of the last cooling device.
>
> I cannot parse the above, sorry.
>
> For example, how can a "stats structure of one cooling device" create
> anything? As a data structure, it is a passive entity, so it doesn't
> carry out any actions.
>

Sorry, I didn't describe it properly. I mean 'if it has been called back
cooling_device_stats_setup() successfully for a cooling device'.

> I think (but I am not sure) that you are referring to the error code
> path in which the ->get_max_state() callback fails for a cooling
> device after cooling_device_stats_setup() has completed successfully
> for another one.
>

That's it. As you say, ->get_max_state() callback fails for a cooling
device after cooling_device_stats_setup() has completed successfully
for another one.

> > At this time, read or write stats sysfs nodes can cause kernel
> > crash. Like the following, kernel crashed when
> > 'cat time_in_state_ms'.
> >
> > [<5baac8d4>] panic+0x1b4/0x3c8
> > [<9d287b0f>] arm_notify_die+0x0/0x78
> > [<094fc22c>] __do_kernel_fault+0x94/0xa4
> > [<3b4b69a4>] do_page_fault+0xd4/0x364
> > [<23793e7a>] do_translation_fault+0x38/0xc0
> > [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> > [<a28c16b8>] __dabt_svc+0x5c/0xa0
> > [<747516ae>] _raw_spin_lock+0x20/0x60
> > [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> > [<cb78325e>] dev_attr_show+0x38/0x64
> > [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> > [<c0a843ab>] seq_read+0x244/0x620
> > [<b316b374>] vfs_read+0xd8/0x218
> > [<3aebf5fa>] sys_read+0x80/0xe4
> > [<7cf100f5>] ret_fast_syscall+0x0/0x28
> > [<08cbe22f>] 0xbe8c1198
> >
> > stats sysfs:
> > phone:/sys/class/thermal/cooling_device2/stats # ls
> > reset time_in_state_ms total_trans trans_table
> >
> > The same as cat total_trans, trans_table, and echo reset.
> >
> > To avoid kernel crash, this patch set clears the
> > cooling_device_attr_groups before stats structure is initialized.
> >
> > Signed-off-by: Di Shen <[email protected]>
> > ---
> > drivers/thermal/thermal_sysfs.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> > index 1c4aac8464a7..e3fae63fa0f7 100644
> > --- a/drivers/thermal/thermal_sysfs.c
> > +++ b/drivers/thermal/thermal_sysfs.c
> > @@ -817,6 +817,9 @@ static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
> > unsigned long states;
> > int var;
> >
> > + var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
> > + cooling_device_attr_groups[var] = NULL;
> > +
> > if (cdev->ops->get_max_state(cdev, &states))
> > return;
> >
> > --

--
Best regards,
Di



Di

2022-07-26 08:30:03

by Di Shen

[permalink] [raw]
Subject: Re: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized

On Sat, Jul 23, 2022 at 2:42 AM Rafael J. Wysocki <[email protected]> wrote:
>
> On Friday, July 22, 2022 7:18:42 PM CEST Rafael J. Wysocki wrote:
> > On Fri, Jul 22, 2022 at 10:44 AM Di Shen <[email protected]> wrote:
> > >
> > > There's a space allocated for cooling_device_stats_attr_group
> > > within cooling_device_attr_groups. This space is shared by all
> > > cooling devices.
> >
> > That's correct.
> >
> > > If the stats structure of one cooling device successfully
> > > creates stats sysfs. After that, another cooling device fails
> > > to get max_states in cooling_device_stats_setup(). It can
> > > return directly without initializing the stats structure, but
> > > the cooling_device_stats_attr_group is still the attribute
> > > group of the last cooling device.
> >
> > I cannot parse the above, sorry.
> >
> > For example, how can a "stats structure of one cooling device" create
> > anything? As a data structure, it is a passive entity, so it doesn't
> > carry out any actions.
> >
> > I think (but I am not sure) that you are referring to the error code
> > path in which the ->get_max_state() callback fails for a cooling
> > device after cooling_device_stats_setup() has completed successfully
> > for another one.
> >
> > > At this time, read or write stats sysfs nodes can cause kernel
> > > crash. Like the following, kernel crashed when
> > > 'cat time_in_state_ms'.
> > >
> > > [<5baac8d4>] panic+0x1b4/0x3c8
> > > [<9d287b0f>] arm_notify_die+0x0/0x78
> > > [<094fc22c>] __do_kernel_fault+0x94/0xa4
> > > [<3b4b69a4>] do_page_fault+0xd4/0x364
> > > [<23793e7a>] do_translation_fault+0x38/0xc0
> > > [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> > > [<a28c16b8>] __dabt_svc+0x5c/0xa0
> > > [<747516ae>] _raw_spin_lock+0x20/0x60
> > > [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> > > [<cb78325e>] dev_attr_show+0x38/0x64
> > > [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> > > [<c0a843ab>] seq_read+0x244/0x620
> > > [<b316b374>] vfs_read+0xd8/0x218
> > > [<3aebf5fa>] sys_read+0x80/0xe4
> > > [<7cf100f5>] ret_fast_syscall+0x0/0x28
> > > [<08cbe22f>] 0xbe8c1198
> > >
> > > stats sysfs:
> > > phone:/sys/class/thermal/cooling_device2/stats # ls
> > > reset time_in_state_ms total_trans trans_table
> > >
> > > The same as cat total_trans, trans_table, and echo reset.
>
> So does the (untested) patch below work too?
>

Yes, I agree with you. I will test it on our platform and give
a reply later. Thanks.

> ---
> drivers/thermal/thermal_sysfs.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_sysfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_sysfs.c
> +++ linux-pm/drivers/thermal/thermal_sysfs.c
> @@ -813,12 +813,13 @@ static const struct attribute_group cool
>
> static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
> {
> + const struct attribute_group *stats_attr_group = NULL;
> struct cooling_dev_stats *stats;
> unsigned long states;
> int var;
>
> if (cdev->ops->get_max_state(cdev, &states))
> - return;
> + goto out;
>
> states++; /* Total number of states is highest state + 1 */
>
> @@ -828,7 +829,7 @@ static void cooling_device_stats_setup(s
>
> stats = kzalloc(var, GFP_KERNEL);
> if (!stats)
> - return;
> + goto out;
>
> stats->time_in_state = (ktime_t *)(stats + 1);
> stats->trans_table = (unsigned int *)(stats->time_in_state + states);
> @@ -838,9 +839,12 @@ static void cooling_device_stats_setup(s
>
> spin_lock_init(&stats->lock);
>
> + stats_attr_group = &cooling_device_stats_attr_group;
> +
> +out:
> /* Fill the empty slot left in cooling_device_attr_groups */
> var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
> - cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
> + cooling_device_attr_groups[var] = stats_attr_group;
> }
>
> static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
>
>
>
--
Best regards,

2022-07-27 08:19:58

by Di Shen

[permalink] [raw]
Subject: Re: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized

Hi Rafael,
I have tested this patch on our platform, and it works. Later, I will
send the patch v3.
Thanks!
--
Di

On Tue, Jul 26, 2022 at 3:39 PM Di Shen <[email protected]> wrote:
>
> On Sat, Jul 23, 2022 at 2:42 AM Rafael J. Wysocki <[email protected]> wrote:
> >
> > On Friday, July 22, 2022 7:18:42 PM CEST Rafael J. Wysocki wrote:
> > > On Fri, Jul 22, 2022 at 10:44 AM Di Shen <[email protected]> wrote:
> > > >
> > > > There's a space allocated for cooling_device_stats_attr_group
> > > > within cooling_device_attr_groups. This space is shared by all
> > > > cooling devices.
> > >
> > > That's correct.
> > >
> > > > If the stats structure of one cooling device successfully
> > > > creates stats sysfs. After that, another cooling device fails
> > > > to get max_states in cooling_device_stats_setup(). It can
> > > > return directly without initializing the stats structure, but
> > > > the cooling_device_stats_attr_group is still the attribute
> > > > group of the last cooling device.
> > >
> > > I cannot parse the above, sorry.
> > >
> > > For example, how can a "stats structure of one cooling device" create
> > > anything? As a data structure, it is a passive entity, so it doesn't
> > > carry out any actions.
> > >
> > > I think (but I am not sure) that you are referring to the error code
> > > path in which the ->get_max_state() callback fails for a cooling
> > > device after cooling_device_stats_setup() has completed successfully
> > > for another one.
> > >
> > > > At this time, read or write stats sysfs nodes can cause kernel
> > > > crash. Like the following, kernel crashed when
> > > > 'cat time_in_state_ms'.
> > > >
> > > > [<5baac8d4>] panic+0x1b4/0x3c8
> > > > [<9d287b0f>] arm_notify_die+0x0/0x78
> > > > [<094fc22c>] __do_kernel_fault+0x94/0xa4
> > > > [<3b4b69a4>] do_page_fault+0xd4/0x364
> > > > [<23793e7a>] do_translation_fault+0x38/0xc0
> > > > [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> > > > [<a28c16b8>] __dabt_svc+0x5c/0xa0
> > > > [<747516ae>] _raw_spin_lock+0x20/0x60
> > > > [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> > > > [<cb78325e>] dev_attr_show+0x38/0x64
> > > > [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> > > > [<c0a843ab>] seq_read+0x244/0x620
> > > > [<b316b374>] vfs_read+0xd8/0x218
> > > > [<3aebf5fa>] sys_read+0x80/0xe4
> > > > [<7cf100f5>] ret_fast_syscall+0x0/0x28
> > > > [<08cbe22f>] 0xbe8c1198
> > > >
> > > > stats sysfs:
> > > > phone:/sys/class/thermal/cooling_device2/stats # ls
> > > > reset time_in_state_ms total_trans trans_table
> > > >
> > > > The same as cat total_trans, trans_table, and echo reset.
> >
> > So does the (untested) patch below work too?
> >
>
> Yes, I agree with you. I will test it on our platform and give
> a reply later. Thanks.
>
> > ---
> > drivers/thermal/thermal_sysfs.c | 10 +++++++---
> > 1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > Index: linux-pm/drivers/thermal/thermal_sysfs.c
> > ===================================================================
> > --- linux-pm.orig/drivers/thermal/thermal_sysfs.c
> > +++ linux-pm/drivers/thermal/thermal_sysfs.c
> > @@ -813,12 +813,13 @@ static const struct attribute_group cool
> >
> > static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
> > {
> > + const struct attribute_group *stats_attr_group = NULL;
> > struct cooling_dev_stats *stats;
> > unsigned long states;
> > int var;
> >
> > if (cdev->ops->get_max_state(cdev, &states))
> > - return;
> > + goto out;
> >
> > states++; /* Total number of states is highest state + 1 */
> >
> > @@ -828,7 +829,7 @@ static void cooling_device_stats_setup(s
> >
> > stats = kzalloc(var, GFP_KERNEL);
> > if (!stats)
> > - return;
> > + goto out;
> >
> > stats->time_in_state = (ktime_t *)(stats + 1);
> > stats->trans_table = (unsigned int *)(stats->time_in_state + states);
> > @@ -838,9 +839,12 @@ static void cooling_device_stats_setup(s
> >
> > spin_lock_init(&stats->lock);
> >
> > + stats_attr_group = &cooling_device_stats_attr_group;
> > +
> > +out:
> > /* Fill the empty slot left in cooling_device_attr_groups */
> > var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
> > - cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
> > + cooling_device_attr_groups[var] = stats_attr_group;
> > }
> >
> > static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
> >
> >
> >
> --
> Best regards,

2022-07-27 15:11:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized

On Wed, Jul 27, 2022 at 10:17 AM Di Shen <[email protected]> wrote:
>
> Hi Rafael,
> I have tested this patch on our platform, and it works. Later, I will
> send the patch v3.

Well, no need. I'll use the one that you've just tested.

Thanks!


> On Tue, Jul 26, 2022 at 3:39 PM Di Shen <[email protected]> wrote:
> >
> > On Sat, Jul 23, 2022 at 2:42 AM Rafael J. Wysocki <[email protected]> wrote:
> > >
> > > On Friday, July 22, 2022 7:18:42 PM CEST Rafael J. Wysocki wrote:
> > > > On Fri, Jul 22, 2022 at 10:44 AM Di Shen <[email protected]> wrote:
> > > > >
> > > > > There's a space allocated for cooling_device_stats_attr_group
> > > > > within cooling_device_attr_groups. This space is shared by all
> > > > > cooling devices.
> > > >
> > > > That's correct.
> > > >
> > > > > If the stats structure of one cooling device successfully
> > > > > creates stats sysfs. After that, another cooling device fails
> > > > > to get max_states in cooling_device_stats_setup(). It can
> > > > > return directly without initializing the stats structure, but
> > > > > the cooling_device_stats_attr_group is still the attribute
> > > > > group of the last cooling device.
> > > >
> > > > I cannot parse the above, sorry.
> > > >
> > > > For example, how can a "stats structure of one cooling device" create
> > > > anything? As a data structure, it is a passive entity, so it doesn't
> > > > carry out any actions.
> > > >
> > > > I think (but I am not sure) that you are referring to the error code
> > > > path in which the ->get_max_state() callback fails for a cooling
> > > > device after cooling_device_stats_setup() has completed successfully
> > > > for another one.
> > > >
> > > > > At this time, read or write stats sysfs nodes can cause kernel
> > > > > crash. Like the following, kernel crashed when
> > > > > 'cat time_in_state_ms'.
> > > > >
> > > > > [<5baac8d4>] panic+0x1b4/0x3c8
> > > > > [<9d287b0f>] arm_notify_die+0x0/0x78
> > > > > [<094fc22c>] __do_kernel_fault+0x94/0xa4
> > > > > [<3b4b69a4>] do_page_fault+0xd4/0x364
> > > > > [<23793e7a>] do_translation_fault+0x38/0xc0
> > > > > [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> > > > > [<a28c16b8>] __dabt_svc+0x5c/0xa0
> > > > > [<747516ae>] _raw_spin_lock+0x20/0x60
> > > > > [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> > > > > [<cb78325e>] dev_attr_show+0x38/0x64
> > > > > [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> > > > > [<c0a843ab>] seq_read+0x244/0x620
> > > > > [<b316b374>] vfs_read+0xd8/0x218
> > > > > [<3aebf5fa>] sys_read+0x80/0xe4
> > > > > [<7cf100f5>] ret_fast_syscall+0x0/0x28
> > > > > [<08cbe22f>] 0xbe8c1198
> > > > >
> > > > > stats sysfs:
> > > > > phone:/sys/class/thermal/cooling_device2/stats # ls
> > > > > reset time_in_state_ms total_trans trans_table
> > > > >
> > > > > The same as cat total_trans, trans_table, and echo reset.
> > >
> > > So does the (untested) patch below work too?
> > >
> >
> > Yes, I agree with you. I will test it on our platform and give
> > a reply later. Thanks.
> >
> > > ---
> > > drivers/thermal/thermal_sysfs.c | 10 +++++++---
> > > 1 file changed, 7 insertions(+), 3 deletions(-)
> > >
> > > Index: linux-pm/drivers/thermal/thermal_sysfs.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/thermal/thermal_sysfs.c
> > > +++ linux-pm/drivers/thermal/thermal_sysfs.c
> > > @@ -813,12 +813,13 @@ static const struct attribute_group cool
> > >
> > > static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
> > > {
> > > + const struct attribute_group *stats_attr_group = NULL;
> > > struct cooling_dev_stats *stats;
> > > unsigned long states;
> > > int var;
> > >
> > > if (cdev->ops->get_max_state(cdev, &states))
> > > - return;
> > > + goto out;
> > >
> > > states++; /* Total number of states is highest state + 1 */
> > >
> > > @@ -828,7 +829,7 @@ static void cooling_device_stats_setup(s
> > >
> > > stats = kzalloc(var, GFP_KERNEL);
> > > if (!stats)
> > > - return;
> > > + goto out;
> > >
> > > stats->time_in_state = (ktime_t *)(stats + 1);
> > > stats->trans_table = (unsigned int *)(stats->time_in_state + states);
> > > @@ -838,9 +839,12 @@ static void cooling_device_stats_setup(s
> > >
> > > spin_lock_init(&stats->lock);
> > >
> > > + stats_attr_group = &cooling_device_stats_attr_group;
> > > +
> > > +out:
> > > /* Fill the empty slot left in cooling_device_attr_groups */
> > > var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
> > > - cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
> > > + cooling_device_attr_groups[var] = stats_attr_group;
> > > }
> > >
> > > static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
> > >
> > >
> > >
> > --
> > Best regards,

2022-07-28 01:57:36

by Di Shen

[permalink] [raw]
Subject: Re: [PATCH V2 1/1] thermal/sysfs: Clear cooling_device_stats_attr_group before initialized

Ok, thanks.

On Wed, Jul 27, 2022 at 10:20 PM Rafael J. Wysocki <[email protected]> wrote:
>
> On Wed, Jul 27, 2022 at 10:17 AM Di Shen <[email protected]> wrote:
> >
> > Hi Rafael,
> > I have tested this patch on our platform, and it works. Later, I will
> > send the patch v3.
>
> Well, no need. I'll use the one that you've just tested.
>
> Thanks!
>
>
> > On Tue, Jul 26, 2022 at 3:39 PM Di Shen <[email protected]> wrote:
> > >
> > > On Sat, Jul 23, 2022 at 2:42 AM Rafael J. Wysocki <[email protected]> wrote:
> > > >
> > > > On Friday, July 22, 2022 7:18:42 PM CEST Rafael J. Wysocki wrote:
> > > > > On Fri, Jul 22, 2022 at 10:44 AM Di Shen <[email protected]> wrote:
> > > > > >
> > > > > > There's a space allocated for cooling_device_stats_attr_group
> > > > > > within cooling_device_attr_groups. This space is shared by all
> > > > > > cooling devices.
> > > > >
> > > > > That's correct.
> > > > >
> > > > > > If the stats structure of one cooling device successfully
> > > > > > creates stats sysfs. After that, another cooling device fails
> > > > > > to get max_states in cooling_device_stats_setup(). It can
> > > > > > return directly without initializing the stats structure, but
> > > > > > the cooling_device_stats_attr_group is still the attribute
> > > > > > group of the last cooling device.
> > > > >
> > > > > I cannot parse the above, sorry.
> > > > >
> > > > > For example, how can a "stats structure of one cooling device" create
> > > > > anything? As a data structure, it is a passive entity, so it doesn't
> > > > > carry out any actions.
> > > > >
> > > > > I think (but I am not sure) that you are referring to the error code
> > > > > path in which the ->get_max_state() callback fails for a cooling
> > > > > device after cooling_device_stats_setup() has completed successfully
> > > > > for another one.
> > > > >
> > > > > > At this time, read or write stats sysfs nodes can cause kernel
> > > > > > crash. Like the following, kernel crashed when
> > > > > > 'cat time_in_state_ms'.
> > > > > >
> > > > > > [<5baac8d4>] panic+0x1b4/0x3c8
> > > > > > [<9d287b0f>] arm_notify_die+0x0/0x78
> > > > > > [<094fc22c>] __do_kernel_fault+0x94/0xa4
> > > > > > [<3b4b69a4>] do_page_fault+0xd4/0x364
> > > > > > [<23793e7a>] do_translation_fault+0x38/0xc0
> > > > > > [<6e5cc52a>] do_DataAbort+0x4c/0xd0
> > > > > > [<a28c16b8>] __dabt_svc+0x5c/0xa0
> > > > > > [<747516ae>] _raw_spin_lock+0x20/0x60
> > > > > > [<9a9e4cd4>] time_in_state_ms_show+0x28/0x148
> > > > > > [<cb78325e>] dev_attr_show+0x38/0x64
> > > > > > [<aea3e364>] sysfs_kf_seq_show+0x8c/0xf0
> > > > > > [<c0a843ab>] seq_read+0x244/0x620
> > > > > > [<b316b374>] vfs_read+0xd8/0x218
> > > > > > [<3aebf5fa>] sys_read+0x80/0xe4
> > > > > > [<7cf100f5>] ret_fast_syscall+0x0/0x28
> > > > > > [<08cbe22f>] 0xbe8c1198
> > > > > >
> > > > > > stats sysfs:
> > > > > > phone:/sys/class/thermal/cooling_device2/stats # ls
> > > > > > reset time_in_state_ms total_trans trans_table
> > > > > >
> > > > > > The same as cat total_trans, trans_table, and echo reset.
> > > >
> > > > So does the (untested) patch below work too?
> > > >
> > >
> > > Yes, I agree with you. I will test it on our platform and give
> > > a reply later. Thanks.
> > >
> > > > ---
> > > > drivers/thermal/thermal_sysfs.c | 10 +++++++---
> > > > 1 file changed, 7 insertions(+), 3 deletions(-)
> > > >
> > > > Index: linux-pm/drivers/thermal/thermal_sysfs.c
> > > > ===================================================================
> > > > --- linux-pm.orig/drivers/thermal/thermal_sysfs.c
> > > > +++ linux-pm/drivers/thermal/thermal_sysfs.c
> > > > @@ -813,12 +813,13 @@ static const struct attribute_group cool
> > > >
> > > > static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
> > > > {
> > > > + const struct attribute_group *stats_attr_group = NULL;
> > > > struct cooling_dev_stats *stats;
> > > > unsigned long states;
> > > > int var;
> > > >
> > > > if (cdev->ops->get_max_state(cdev, &states))
> > > > - return;
> > > > + goto out;
> > > >
> > > > states++; /* Total number of states is highest state + 1 */
> > > >
> > > > @@ -828,7 +829,7 @@ static void cooling_device_stats_setup(s
> > > >
> > > > stats = kzalloc(var, GFP_KERNEL);
> > > > if (!stats)
> > > > - return;
> > > > + goto out;
> > > >
> > > > stats->time_in_state = (ktime_t *)(stats + 1);
> > > > stats->trans_table = (unsigned int *)(stats->time_in_state + states);
> > > > @@ -838,9 +839,12 @@ static void cooling_device_stats_setup(s
> > > >
> > > > spin_lock_init(&stats->lock);
> > > >
> > > > + stats_attr_group = &cooling_device_stats_attr_group;
> > > > +
> > > > +out:
> > > > /* Fill the empty slot left in cooling_device_attr_groups */
> > > > var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
> > > > - cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
> > > > + cooling_device_attr_groups[var] = stats_attr_group;
> > > > }
> > > >
> > > > static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
> > > >
> > > >
> > > >
> > > --
> > > Best regards,