2020-03-30 14:28:14

by Takashi Iwai

[permalink] [raw]
Subject: [PATCH] thermal: Add a sanity check for invalid state at stats update

The thermal sysfs handler keeps the statistics table with the fixed
size that was determined from the initial max_states() call, and the
table entry is updated at each sysfs cur_state write call. And, when
the driver's set_cur_state() ops accepts the value given from
user-space, the thermal sysfs core blindly applies it to the
statistics table entry, which may overflow and cause an Oops.
Although it's rather a bug in the driver's ops implementations, we
shouldn't crash but rather give a proper warning instead.

This patch adds a sanity check for avoiding such an OOB access and
warns with a stack trace to show the suspicious device in question.

Signed-off-by: Takashi Iwai <[email protected]>
---

We've hit some crash by stress tests, and this patch at least works
around the crash itself. While the actual bug fix of the buggy driver
is still being investigated, I submit the hardening in the core side
at first.

drivers/thermal/thermal_sysfs.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index aa99edb4dff7..a23c4e701d63 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -772,6 +772,11 @@ void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,

spin_lock(&stats->lock);

+ if (dev_WARN_ONCE(&cdev->device, new_state >= stats->max_states,
+ "new state %ld exceeds max_state %ld",
+ new_state, stats->max_states))
+ goto unlock;
+
if (stats->state == new_state)
goto unlock;

--
2.16.4


2020-04-07 13:32:28

by Amit Kucheria

[permalink] [raw]
Subject: Re: [PATCH] thermal: Add a sanity check for invalid state at stats update

On Mon, Mar 30, 2020 at 7:39 PM Takashi Iwai <[email protected]> wrote:
>
> The thermal sysfs handler keeps the statistics table with the fixed
> size that was determined from the initial max_states() call, and the
> table entry is updated at each sysfs cur_state write call. And, when
> the driver's set_cur_state() ops accepts the value given from
> user-space, the thermal sysfs core blindly applies it to the
> statistics table entry, which may overflow and cause an Oops.
> Although it's rather a bug in the driver's ops implementations, we
> shouldn't crash but rather give a proper warning instead.
>
> This patch adds a sanity check for avoiding such an OOB access and
> warns with a stack trace to show the suspicious device in question.

Hi Takashi,

Instead of this warning, I think we should reject such input when
writing to cur_state.

See attached patch. If you think this OK, I'll submit it.

Regards,
Amit

> Signed-off-by: Takashi Iwai <[email protected]>
> ---
>
> We've hit some crash by stress tests, and this patch at least works
> around the crash itself. While the actual bug fix of the buggy driver
> is still being investigated, I submit the hardening in the core side
> at first.
>
> drivers/thermal/thermal_sysfs.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index aa99edb4dff7..a23c4e701d63 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -772,6 +772,11 @@ void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
>
> spin_lock(&stats->lock);
>
> + if (dev_WARN_ONCE(&cdev->device, new_state >= stats->max_states,
> + "new state %ld exceeds max_state %ld",
> + new_state, stats->max_states))
> + goto unlock;
> +
> if (stats->state == new_state)
> goto unlock;
>
> --
> 2.16.4
>


Attachments:
0001-thermal-Reject-invalid-cur_state-input-from-userspac.patch (1.77 kB)

2020-04-07 14:15:50

by Takashi Iwai

[permalink] [raw]
Subject: Re: [PATCH] thermal: Add a sanity check for invalid state at stats update

On Tue, 07 Apr 2020 15:30:51 +0200,
Amit Kucheria wrote:
>
> On Mon, Mar 30, 2020 at 7:39 PM Takashi Iwai <[email protected]> wrote:
> >
> > The thermal sysfs handler keeps the statistics table with the fixed
> > size that was determined from the initial max_states() call, and the
> > table entry is updated at each sysfs cur_state write call. And, when
> > the driver's set_cur_state() ops accepts the value given from
> > user-space, the thermal sysfs core blindly applies it to the
> > statistics table entry, which may overflow and cause an Oops.
> > Although it's rather a bug in the driver's ops implementations, we
> > shouldn't crash but rather give a proper warning instead.
> >
> > This patch adds a sanity check for avoiding such an OOB access and
> > warns with a stack trace to show the suspicious device in question.
>
> Hi Takashi,
>
> Instead of this warning, I think we should reject such input when
> writing to cur_state.
>
> See attached patch. If you think this OK, I'll submit it.

Actually the input value itself is correct, the problem is rather
about the max_states that may vary depending on other driver. So IMO,
we don't want to refuse the input completely.

Please see the thread:
https://lore.kernel.org/linux-acpi/[email protected]/


thanks,

Takashi


>
> Regards,
> Amit
>
> > Signed-off-by: Takashi Iwai <[email protected]>
> > ---
> >
> > We've hit some crash by stress tests, and this patch at least works
> > around the crash itself. While the actual bug fix of the buggy driver
> > is still being investigated, I submit the hardening in the core side
> > at first.
> >
> > drivers/thermal/thermal_sysfs.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> > index aa99edb4dff7..a23c4e701d63 100644
> > --- a/drivers/thermal/thermal_sysfs.c
> > +++ b/drivers/thermal/thermal_sysfs.c
> > @@ -772,6 +772,11 @@ void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
> >
> > spin_lock(&stats->lock);
> >
> > + if (dev_WARN_ONCE(&cdev->device, new_state >= stats->max_states,
> > + "new state %ld exceeds max_state %ld",
> > + new_state, stats->max_states))
> > + goto unlock;
> > +
> > if (stats->state == new_state)
> > goto unlock;
> >
> > --
> > 2.16.4
> >
> From 54266260d483ab4476510dd4461a1cafc611e17d Mon Sep 17 00:00:00 2001
> Message-Id: <54266260d483ab4476510dd4461a1cafc611e17d.1586266224.git.amit.kucheria@linaro.org>
> From: Amit Kucheria <[email protected]>
> Date: Tue, 7 Apr 2020 18:48:14 +0530
> Subject: [PATCH] thermal: Reject invalid cur_state input from userspace
>
> We don't check if the cur_state value input in sysfs is greater than the
> maximum cooling state that the cooling device supports. This can cause
> access to unallocated memory in case THERMAL_STATISTICS in enabled and
> could also crash cooling devices that don't check for an invalid state in
> their set_cur_state() callback.
>
> Return an error if the state being requested in greater than the maximum
> cooling state the device supports.
>
> Reported-by: Takashi Iwai <[email protected]>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> drivers/thermal/thermal_sysfs.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
> index 7e1d11bdd258..8033e5a9386a 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -703,7 +703,7 @@ cur_state_store(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t count)
> {
> struct thermal_cooling_device *cdev = to_cooling_device(dev);
> - unsigned long state;
> + unsigned long state, max_state;
> int result;
>
> if (sscanf(buf, "%ld\n", &state) != 1)
> @@ -712,6 +712,13 @@ cur_state_store(struct device *dev, struct device_attribute *attr,
> if ((long)state < 0)
> return -EINVAL;
>
> + result = cdev->ops->get_max_state(cdev, &max_state);
> + if (result)
> + return result;
> +
> + if (state >= max_state)
> + return -EINVAL;
> +
> mutex_lock(&cdev->lock);
>
> result = cdev->ops->set_cur_state(cdev, state);
> --
> 2.20.1
>

2020-04-07 14:19:57

by Zhang, Rui

[permalink] [raw]
Subject: RE: [PATCH] thermal: Add a sanity check for invalid state at stats update

Right, I have a V2 patch series, which will be post soon.

Thanks,
rui

-----Original Message-----
From: [email protected] <[email protected]> On Behalf Of Takashi Iwai
Sent: Tuesday, April 07, 2020 10:13 PM
To: Amit Kucheria <[email protected]>
Cc: Takashi Iwai <[email protected]>; Zhang, Rui <[email protected]>; Daniel Lezcano <[email protected]>; Linux PM list <[email protected]>; LKML <[email protected]>
Subject: Re: [PATCH] thermal: Add a sanity check for invalid state at stats update
Importance: High

On Tue, 07 Apr 2020 15:30:51 +0200,
Amit Kucheria wrote:
>
> On Mon, Mar 30, 2020 at 7:39 PM Takashi Iwai <[email protected]> wrote:
> >
> > The thermal sysfs handler keeps the statistics table with the fixed
> > size that was determined from the initial max_states() call, and the
> > table entry is updated at each sysfs cur_state write call. And,
> > when the driver's set_cur_state() ops accepts the value given from
> > user-space, the thermal sysfs core blindly applies it to the
> > statistics table entry, which may overflow and cause an Oops.
> > Although it's rather a bug in the driver's ops implementations, we
> > shouldn't crash but rather give a proper warning instead.
> >
> > This patch adds a sanity check for avoiding such an OOB access and
> > warns with a stack trace to show the suspicious device in question.
>
> Hi Takashi,
>
> Instead of this warning, I think we should reject such input when
> writing to cur_state.
>
> See attached patch. If you think this OK, I'll submit it.

Actually the input value itself is correct, the problem is rather about the max_states that may vary depending on other driver. So IMO, we don't want to refuse the input completely.

Please see the thread:
https://lore.kernel.org/linux-acpi/[email protected]/


thanks,

Takashi


>
> Regards,
> Amit
>
> > Signed-off-by: Takashi Iwai <[email protected]>
> > ---
> >
> > We've hit some crash by stress tests, and this patch at least works
> > around the crash itself. While the actual bug fix of the buggy
> > driver is still being investigated, I submit the hardening in the
> > core side at first.
> >
> > drivers/thermal/thermal_sysfs.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/thermal/thermal_sysfs.c
> > b/drivers/thermal/thermal_sysfs.c index aa99edb4dff7..a23c4e701d63
> > 100644
> > --- a/drivers/thermal/thermal_sysfs.c
> > +++ b/drivers/thermal/thermal_sysfs.c
> > @@ -772,6 +772,11 @@ void thermal_cooling_device_stats_update(struct
> > thermal_cooling_device *cdev,
> >
> > spin_lock(&stats->lock);
> >
> > + if (dev_WARN_ONCE(&cdev->device, new_state >= stats->max_states,
> > + "new state %ld exceeds max_state %ld",
> > + new_state, stats->max_states))
> > + goto unlock;
> > +
> > if (stats->state == new_state)
> > goto unlock;
> >
> > --
> > 2.16.4
> >
> From 54266260d483ab4476510dd4461a1cafc611e17d Mon Sep 17 00:00:00 2001
> Message-Id:
> <54266260d483ab4476510dd4461a1cafc611e17d.1586266224.git.amit.kucheria
> @linaro.org>
> From: Amit Kucheria <[email protected]>
> Date: Tue, 7 Apr 2020 18:48:14 +0530
> Subject: [PATCH] thermal: Reject invalid cur_state input from
> userspace
>
> We don't check if the cur_state value input in sysfs is greater than
> the maximum cooling state that the cooling device supports. This can
> cause access to unallocated memory in case THERMAL_STATISTICS in
> enabled and could also crash cooling devices that don't check for an
> invalid state in their set_cur_state() callback.
>
> Return an error if the state being requested in greater than the
> maximum cooling state the device supports.
>
> Reported-by: Takashi Iwai <[email protected]>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> drivers/thermal/thermal_sysfs.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/thermal_sysfs.c
> b/drivers/thermal/thermal_sysfs.c index 7e1d11bdd258..8033e5a9386a
> 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -703,7 +703,7 @@ cur_state_store(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t count)
> {
> struct thermal_cooling_device *cdev = to_cooling_device(dev);
> - unsigned long state;
> + unsigned long state, max_state;
> int result;
>
> if (sscanf(buf, "%ld\n", &state) != 1) @@ -712,6 +712,13 @@
> cur_state_store(struct device *dev, struct device_attribute *attr,
> if ((long)state < 0)
> return -EINVAL;
>
> + result = cdev->ops->get_max_state(cdev, &max_state);
> + if (result)
> + return result;
> +
> + if (state >= max_state)
> + return -EINVAL;
> +
> mutex_lock(&cdev->lock);
>
> result = cdev->ops->set_cur_state(cdev, state);
> --
> 2.20.1
>

2020-04-07 15:40:21

by Zhang, Rui

[permalink] [raw]
Subject: RE: [PATCH] thermal: Add a sanity check for invalid state at stats update

Too late for me to post it out by this midnight, will post it out tomorrow.

Thanks,
rui

-----Original Message-----
From: [email protected] <[email protected]> On Behalf Of Zhang, Rui
Sent: Tuesday, April 07, 2020 10:17 PM
To: Takashi Iwai <[email protected]>; Amit Kucheria <[email protected]>
Cc: Daniel Lezcano <[email protected]>; Linux PM list <[email protected]>; LKML <[email protected]>
Subject: RE: [PATCH] thermal: Add a sanity check for invalid state at stats update

Right, I have a V2 patch series, which will be post soon.

Thanks,
rui

-----Original Message-----
From: [email protected] <[email protected]> On Behalf Of Takashi Iwai
Sent: Tuesday, April 07, 2020 10:13 PM
To: Amit Kucheria <[email protected]>
Cc: Takashi Iwai <[email protected]>; Zhang, Rui <[email protected]>; Daniel Lezcano <[email protected]>; Linux PM list <[email protected]>; LKML <[email protected]>
Subject: Re: [PATCH] thermal: Add a sanity check for invalid state at stats update
Importance: High

On Tue, 07 Apr 2020 15:30:51 +0200,
Amit Kucheria wrote:
>
> On Mon, Mar 30, 2020 at 7:39 PM Takashi Iwai <[email protected]> wrote:
> >
> > The thermal sysfs handler keeps the statistics table with the fixed
> > size that was determined from the initial max_states() call, and the
> > table entry is updated at each sysfs cur_state write call. And,
> > when the driver's set_cur_state() ops accepts the value given from
> > user-space, the thermal sysfs core blindly applies it to the
> > statistics table entry, which may overflow and cause an Oops.
> > Although it's rather a bug in the driver's ops implementations, we
> > shouldn't crash but rather give a proper warning instead.
> >
> > This patch adds a sanity check for avoiding such an OOB access and
> > warns with a stack trace to show the suspicious device in question.
>
> Hi Takashi,
>
> Instead of this warning, I think we should reject such input when
> writing to cur_state.
>
> See attached patch. If you think this OK, I'll submit it.

Actually the input value itself is correct, the problem is rather about the max_states that may vary depending on other driver. So IMO, we don't want to refuse the input completely.

Please see the thread:
https://lore.kernel.org/linux-acpi/[email protected]/


thanks,

Takashi


>
> Regards,
> Amit
>
> > Signed-off-by: Takashi Iwai <[email protected]>
> > ---
> >
> > We've hit some crash by stress tests, and this patch at least works
> > around the crash itself. While the actual bug fix of the buggy
> > driver is still being investigated, I submit the hardening in the
> > core side at first.
> >
> > drivers/thermal/thermal_sysfs.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/thermal/thermal_sysfs.c
> > b/drivers/thermal/thermal_sysfs.c index aa99edb4dff7..a23c4e701d63
> > 100644
> > --- a/drivers/thermal/thermal_sysfs.c
> > +++ b/drivers/thermal/thermal_sysfs.c
> > @@ -772,6 +772,11 @@ void thermal_cooling_device_stats_update(struct
> > thermal_cooling_device *cdev,
> >
> > spin_lock(&stats->lock);
> >
> > + if (dev_WARN_ONCE(&cdev->device, new_state >= stats->max_states,
> > + "new state %ld exceeds max_state %ld",
> > + new_state, stats->max_states))
> > + goto unlock;
> > +
> > if (stats->state == new_state)
> > goto unlock;
> >
> > --
> > 2.16.4
> >
> From 54266260d483ab4476510dd4461a1cafc611e17d Mon Sep 17 00:00:00 2001
> Message-Id:
> <54266260d483ab4476510dd4461a1cafc611e17d.1586266224.git.amit.kucheria
> @linaro.org>
> From: Amit Kucheria <[email protected]>
> Date: Tue, 7 Apr 2020 18:48:14 +0530
> Subject: [PATCH] thermal: Reject invalid cur_state input from
> userspace
>
> We don't check if the cur_state value input in sysfs is greater than
> the maximum cooling state that the cooling device supports. This can
> cause access to unallocated memory in case THERMAL_STATISTICS in
> enabled and could also crash cooling devices that don't check for an
> invalid state in their set_cur_state() callback.
>
> Return an error if the state being requested in greater than the
> maximum cooling state the device supports.
>
> Reported-by: Takashi Iwai <[email protected]>
> Signed-off-by: Amit Kucheria <[email protected]>
> ---
> drivers/thermal/thermal_sysfs.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/thermal_sysfs.c
> b/drivers/thermal/thermal_sysfs.c index 7e1d11bdd258..8033e5a9386a
> 100644
> --- a/drivers/thermal/thermal_sysfs.c
> +++ b/drivers/thermal/thermal_sysfs.c
> @@ -703,7 +703,7 @@ cur_state_store(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t count)
> {
> struct thermal_cooling_device *cdev = to_cooling_device(dev);
> - unsigned long state;
> + unsigned long state, max_state;
> int result;
>
> if (sscanf(buf, "%ld\n", &state) != 1) @@ -712,6 +712,13 @@
> cur_state_store(struct device *dev, struct device_attribute *attr,
> if ((long)state < 0)
> return -EINVAL;
>
> + result = cdev->ops->get_max_state(cdev, &max_state);
> + if (result)
> + return result;
> +
> + if (state >= max_state)
> + return -EINVAL;
> +
> mutex_lock(&cdev->lock);
>
> result = cdev->ops->set_cur_state(cdev, state);
> --
> 2.20.1
>