2023-12-19 08:28:00

by Chen-Yu Tsai

Subject: [PATCH] thermal/core: Correctly free tz->tzp in thermal zone registration error path

After commit 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal
zone parameters structure"), the core now copies the thermal zone
parameters structure, and frees it if an error happens during thermal
zone device registration, or upon unregistration of the device.

In the error path, if device_register() was called, put_device() releases
`tz`, which is then set to NULL before kfree(tz->tzp) runs, causing a NULL
pointer dereference crash.

In my case, the error path was entered from the sbs power supply driver,
which, through the power supply core, registers a thermal zone *without
trip points* for the battery temperature sensor. This, combined with
setting the default thermal governor to "power allocator", which
*requires* trip_max, causes the thermal zone registration to fail.

The error path must handle two cases: one where device_register() has
not been called yet and the kobject is not reference counted, and one
where it has been. The original commit tried to cover the first case,
but fails for the second. Fix this by calling kfree(tz->tzp) before
put_device() to cover the second case, and by checking that `tz` is
still valid before the later kfree(tz->tzp), so that the second case no
longer crashes.
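
The two cases can be modelled in a small userspace program. This is a
sketch under stated assumptions, not the kernel code: `struct zone`,
zone_register(), zone_put(), zone_release() and the zalloc()/zfree()
allocation counter are hypothetical stand-ins for the thermal zone
device, thermal_zone_device_register_with_trips(), put_device(), the
device release callback and kzalloc()/kfree(). The ID allocation step is
left out. The labels mirror the patched error path, and the counter lets
you check that neither case leaks or dereferences NULL.

```c
#include <errno.h>
#include <stdlib.h>

static int live_allocs; /* outstanding allocations, used to detect leaks */

static void *zalloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

/* Like kfree(), passing NULL is a no-op. */
static void zfree(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* Hypothetical stand-ins for the thermal zone and its copied parameters. */
struct zone_params {
	int sustainable_power;
};

struct zone {
	struct zone_params *tzp;
	int refcount; /* models the struct device reference count */
};

/* Models the device release callback: frees tz, but not tz->tzp. */
static void zone_release(struct zone *tz)
{
	zfree(tz);
}

/* Models put_device(): dropping the last reference calls the release. */
static void zone_put(struct zone *tz)
{
	if (--tz->refcount == 0)
		zone_release(tz);
}

enum fail_stage { FAIL_NONE, FAIL_BEFORE_REGISTER, FAIL_AFTER_REGISTER };

/* Returns 0 and stores the zone in *out, or a negative errno. */
static int zone_register(enum fail_stage stage, struct zone **out)
{
	struct zone *tz;
	int result;

	*out = NULL;
	tz = zalloc(sizeof(*tz));
	if (!tz)
		return -ENOMEM;

	tz->tzp = zalloc(sizeof(*tz->tzp));
	if (!tz->tzp) {
		result = -ENOMEM;
		goto free_tz;
	}

	if (stage == FAIL_BEFORE_REGISTER) {
		result = -EINVAL;
		goto free_tzp; /* case 1: not yet reference counted */
	}

	tz->refcount = 1; /* models device_register() */

	if (stage == FAIL_AFTER_REGISTER) {
		result = -EINVAL; /* e.g. the governor refuses the zone */
		goto release_device; /* case 2: reference counted */
	}

	*out = tz;
	return 0;

release_device:
	/* Free tz->tzp before tz goes away. */
	zfree(tz->tzp);
	zone_put(tz);
	tz = NULL;
free_tzp:
	/* tz is only non-NULL here if we jumped straight to this label. */
	if (tz)
		zfree(tz->tzp);
free_tz:
	zfree(tz);
	return result;
}
```

Without the zfree(tz->tzp) under release_device and the `if (tz)` check,
case 2 would reach `tz->tzp` with `tz == NULL`, which is the crash
described above.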

Fixes: 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure")
Signed-off-by: Chen-Yu Tsai <[email protected]>
---
This includes only the minimal changes needed to fix the crash. I suppose
some other things in the thermal core could also be reworked:
- Don't use "power allocator" for thermal zones without trip points
- Move some of the thermal zone cleanup code into the release function
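
As a rough illustration of the second idea, here is a hypothetical
userspace model in which the release callback owns all of the zone
cleanup, including `tzp`, so the error path after registration only has
to drop its reference. All names (`struct zone`, zone_register(),
zone_put(), zone_release(), zalloc()/zfree()) are made up for this
sketch; it is not a proposed patch to thermal_core.c.

```c
#include <errno.h>
#include <stdlib.h>

static int live_allocs; /* outstanding allocations, used to detect leaks */

static void *zalloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

/* Like kfree(), passing NULL is a no-op. */
static void zfree(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* Hypothetical models of the thermal zone and its parameters. */
struct zone_params {
	int sustainable_power;
};

struct zone {
	struct zone_params *tzp;
	int refcount; /* models the struct device reference count */
};

/* Release callback that owns all zone cleanup, including tz->tzp. */
static void zone_release(struct zone *tz)
{
	zfree(tz->tzp);
	zfree(tz);
}

/* Models put_device(): dropping the last reference calls the release. */
static void zone_put(struct zone *tz)
{
	if (--tz->refcount == 0)
		zone_release(tz);
}

/* Returns 0 and stores the zone in *out, or a negative errno. */
static int zone_register(int fail_after_register, struct zone **out)
{
	struct zone *tz;
	int result;

	*out = NULL;
	tz = zalloc(sizeof(*tz));
	if (!tz)
		return -ENOMEM;

	tz->tzp = zalloc(sizeof(*tz->tzp));
	if (!tz->tzp) {
		result = -ENOMEM;
		goto free_tz; /* not yet reference counted: free directly */
	}

	tz->refcount = 1; /* models device_register() */

	if (fail_after_register) {
		/* Once refcounted, dropping the reference frees everything. */
		zone_put(tz);
		return -EINVAL;
	}

	*out = tz;
	return 0;

free_tz:
	zfree(tz);
	return result;
}
```

The trade-off is that every path after registration must go through
zone_put(), while the paths before it keep freeing memory directly.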

drivers/thermal/thermal_core.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 2415dc50c31d..e47826d82062 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1392,12 +1392,16 @@ thermal_zone_device_register_with_trips(const char *type, struct thermal_trip *t
unregister:
device_del(&tz->device);
release_device:
+ /* Free tz->tzp before tz goes away. */
+ kfree(tz->tzp);
put_device(&tz->device);
tz = NULL;
remove_id:
ida_free(&thermal_tz_ida, id);
free_tzp:
- kfree(tz->tzp);
+ /* If we arrived here before device_register() was called. */
+ if (tz)
+ kfree(tz->tzp);
free_tz:
kfree(tz);
return ERR_PTR(result);
--
2.43.0.472.g3155946c3a-goog



2023-12-19 15:28:29

by Rafael J. Wysocki

Subject: Re: [PATCH] thermal/core: Correctly free tz->tzp in thermal zone registration error path

On Tue, Dec 19, 2023 at 9:27 AM Chen-Yu Tsai <[email protected]> wrote:
> [...]

Can you please test linux-next from today? The issue addressed by
your patch should be fixed there.

2024-01-09 03:46:07

by Chen-Yu Tsai

Subject: Re: [PATCH] thermal/core: Correctly free tz->tzp in thermal zone registration error path

On Tue, Dec 19, 2023 at 11:28 PM Rafael J. Wysocki <[email protected]> wrote:
>
> On Tue, Dec 19, 2023 at 9:27 AM Chen-Yu Tsai <[email protected]> wrote:
> > [...]
>
> Can you please test linux-next from today? The issue addressed by
> your patch should be fixed there.

Sorry for the very late reply. Yes, it does. Thanks.

ChenYu