2020-04-06 16:24:01

by Holger Hoffstätte

[permalink] [raw]
Subject: hwmon: drivetemp: bogus values after wake up from suspend


I've been giving the drivetemp hwmon driver a try and am very happy
with it; works right away and - much to my surprise - doesn't wake up
HDDs that have gone to sleep. Nice!

I did notice one tiny thing though: after waking up from suspend, my SSD
(Samsung 850 Pro) reports a few initial bogus values - suspiciously -128°,
which is definitely not the temperature in my office. While this is more
a cosmetic problem, it cramps my monitoring setup and leads to wrong graphs.
Can't have that!

So I looked into the source and found that the values are (understandably)
passed on unfiltered/uncapped. Since it's unlikely any active device has
operating temperature below-zero, I figured the laziest way is to cap the
value to positive:

diff -rup a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
--- a/drivers/hwmon/drivetemp.c 2020-04-02 08:02:32.000000000 +0200
+++ b/drivers/hwmon/drivetemp.c 2020-04-06 18:13:04.892554087 +0200
@@ -147,7 +147,7 @@ static LIST_HEAD(drivetemp_devlist);
#define INVALID_TEMP 0x80

#define temp_is_valid(temp) ((temp) != INVALID_TEMP)
-#define temp_from_sct(temp) (((s8)(temp)) * 1000)
+#define temp_from_sct(temp) (max(0, ((s8)(temp)) * 1000))

static inline bool ata_id_smart_supported(u16 *id)
{

The assumption is of course *theoretically* wrong since some
equipment might indeed operate in negative C°. One way might be
to use the device's "low" operating point first, but then that
might not be available and we'd be back to capping to 0.
I'm open to other suggestions. :)

thanks,
Holger


2020-04-06 19:19:03

by Guenter Roeck

[permalink] [raw]
Subject: Re: hwmon: drivetemp: bogus values after wake up from suspend

On Mon, Apr 06, 2020 at 06:23:01PM +0200, Holger Hoffst?tte wrote:
>
> I've been giving the drivetemp hwmon driver a try and am very happy
> with it; works right away and - much to my surprise - doesn't wake up
> HDDs that have gone to sleep. Nice!
>
> I did notice one tiny thing though: after waking up from suspend, my SSD
> (Samsung 850 Pro) reports a few initial bogus values - suspiciously -128?,
> which is definitely not the temperature in my office. While this is more
> a cosmetic problem, it cramps my monitoring setup and leads to wrong graphs.
> Can't have that!
>
> So I looked into the source and found that the values are (understandably)
> passed on unfiltered/uncapped. Since it's unlikely any active device has
> operating temperature below-zero, I figured the laziest way is to cap the
> value to positive:
>
> diff -rup a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
> --- a/drivers/hwmon/drivetemp.c 2020-04-02 08:02:32.000000000 +0200
> +++ b/drivers/hwmon/drivetemp.c 2020-04-06 18:13:04.892554087 +0200
> @@ -147,7 +147,7 @@ static LIST_HEAD(drivetemp_devlist);
> #define INVALID_TEMP 0x80
> #define temp_is_valid(temp) ((temp) != INVALID_TEMP)
> -#define temp_from_sct(temp) (((s8)(temp)) * 1000)
> +#define temp_from_sct(temp) (max(0, ((s8)(temp)) * 1000))
> static inline bool ata_id_smart_supported(u16 *id)
> {
>
> The assumption is of course *theoretically* wrong since some
> equipment might indeed operate in negative C?. One way might be
> to use the device's "low" operating point first, but then that
> might not be available and we'd be back to capping to 0.
> I'm open to other suggestions. :)
>

I think 0 is't much better than -128, unless your office is somewhere
in the Arctic. I'll have to loook up the spec, but I think -128 may mean
"no data". Maybe we can return something like -ENODATA in that case.

Guenter

2020-04-07 01:42:52

by Guenter Roeck

[permalink] [raw]
Subject: Re: hwmon: drivetemp: bogus values after wake up from suspend

On 4/6/20 9:23 AM, Holger Hoffstätte wrote:
>
> I've been giving the drivetemp hwmon driver a try and am very happy
> with it; works right away and - much to my surprise - doesn't wake up
> HDDs that have gone to sleep. Nice!
>
> I did notice one tiny thing though: after waking up from suspend, my SSD
> (Samsung 850 Pro) reports a few initial bogus values - suspiciously -128°,
> which is definitely not the temperature in my office. While this is more
> a cosmetic problem, it cramps my monitoring setup and leads to wrong graphs.
> Can't have that!
>
> So I looked into the source and found that the values are (understandably)
> passed on unfiltered/uncapped. Since it's unlikely any active device has
> operating temperature below-zero, I figured the laziest way is to cap the
> value to positive:
>
> diff -rup a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
> --- a/drivers/hwmon/drivetemp.c    2020-04-02 08:02:32.000000000 +0200
> +++ b/drivers/hwmon/drivetemp.c    2020-04-06 18:13:04.892554087 +0200
> @@ -147,7 +147,7 @@ static LIST_HEAD(drivetemp_devlist);
>  #define INVALID_TEMP        0x80
>  
>  #define temp_is_valid(temp)    ((temp) != INVALID_TEMP)
> -#define temp_from_sct(temp)    (((s8)(temp)) * 1000)
> +#define temp_from_sct(temp)    (max(0, ((s8)(temp)) * 1000))
>  
>  static inline bool ata_id_smart_supported(u16 *id)
>  {
>
> The assumption is of course *theoretically* wrong since some
> equipment might indeed operate in negative C°. One way might be
> to use the device's "low" operating point first, but then that
> might not be available and we'd be back to capping to 0.
> I'm open to other suggestions. :)
>

Looking into the code, 0x80 or -128 indeed reflects an invalid temperature.
Any chance you can apply the following to see if it helps ?

diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
index 370d0c74eb01..c27239eb28cf 100644
--- a/drivers/hwmon/drivetemp.c
+++ b/drivers/hwmon/drivetemp.c
@@ -264,6 +264,8 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val)
return err;
switch (attr) {
case hwmon_temp_input:
+ if (!temp_is_valid(buf[SCT_STATUS_TEMP]))
+ return -ENODATA;
*val = temp_from_sct(buf[SCT_STATUS_TEMP]);
break;
case hwmon_temp_lowest:

I am not sure what the best error code would be - suggestions welcome.

Thanks,
Guenter

2020-04-08 04:01:44

by Holger Hoffstätte

[permalink] [raw]
Subject: Re: hwmon: drivetemp: bogus values after wake up from suspend

On 4/7/20 3:41 AM, Guenter Roeck wrote:
> On 4/6/20 9:23 AM, Holger Hoffstätte wrote:
>>
>> I've been giving the drivetemp hwmon driver a try and am very happy
>> with it; works right away and - much to my surprise - doesn't wake up
>> HDDs that have gone to sleep. Nice!
>>
>> I did notice one tiny thing though: after waking up from suspend, my SSD
>> (Samsung 850 Pro) reports a few initial bogus values - suspiciously -128°,
>> which is definitely not the temperature in my office. While this is more
>> a cosmetic problem, it cramps my monitoring setup and leads to wrong graphs.
>> Can't have that!
>>
>> So I looked into the source and found that the values are (understandably)
>> passed on unfiltered/uncapped. Since it's unlikely any active device has
>> operating temperature below-zero, I figured the laziest way is to cap the
>> value to positive:
>>
>> diff -rup a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
>> --- a/drivers/hwmon/drivetemp.c    2020-04-02 08:02:32.000000000 +0200
>> +++ b/drivers/hwmon/drivetemp.c    2020-04-06 18:13:04.892554087 +0200
>> @@ -147,7 +147,7 @@ static LIST_HEAD(drivetemp_devlist);
>>  #define INVALID_TEMP        0x80
>>
>>  #define temp_is_valid(temp)    ((temp) != INVALID_TEMP)
>> -#define temp_from_sct(temp)    (((s8)(temp)) * 1000)
>> +#define temp_from_sct(temp)    (max(0, ((s8)(temp)) * 1000))
>>
>>  static inline bool ata_id_smart_supported(u16 *id)
>>  {
>>
>> The assumption is of course *theoretically* wrong since some
>> equipment might indeed operate in negative C°. One way might be
>> to use the device's "low" operating point first, but then that
>> might not be available and we'd be back to capping to 0.
>> I'm open to other suggestions. :)
>>
>
> Looking into the code, 0x80 or -128 indeed reflects an invalid temperature.

Excellent, that's of course much better than just capping to 0.

> Any chance you can apply the following to see if it helps ?
>
> diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
> index 370d0c74eb01..c27239eb28cf 100644
> --- a/drivers/hwmon/drivetemp.c
> +++ b/drivers/hwmon/drivetemp.c
> @@ -264,6 +264,8 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val)
> return err;
> switch (attr) {
> case hwmon_temp_input:
> + if (!temp_is_valid(buf[SCT_STATUS_TEMP]))
> + return -ENODATA;
> *val = temp_from_sct(buf[SCT_STATUS_TEMP]);
> break;
> case hwmon_temp_lowest:
>
> I am not sure what the best error code would be - suggestions welcome.

Gave it a try and had to wait overnight for things to cool down
(just suspending for an hour wouldn't do it). Right after wakeup sensors
now shows "N/A" as expected, and no illegal values in drivetemp or my
monitoring; missing values are perfectly fine.
After a few minutes correct values show up and all is good.

In case you submit this as official patch feel free to add my
Reported-by/Tested-by. Thanks for looking into it!

cheers
Holger