The JC42 compatible thermal sensor on Kingston KSM32ES8/16ME DIMMs
(using Micron E-Die) is an ST Microelectronics STTS2004 (manufacturer
0x104a, device 0x2201). It does not keep the previously programmed
minimum, maximum and critical temperatures after system suspend and
resume (which is a shutdown / startup cycle for the JC42 temperature
sensor). This results in an alarm on system resume because the hardware
default for these values is 0°C (so any environment temperature greater
than 0°C will trigger the alarm).
Example before system suspend:
jc42-i2c-0-1a
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1: +34.8°C (low = +0.0°C)
(high = +85.0°C, hyst = +85.0°C)
(crit = +95.0°C, hyst = +95.0°C)
Example after system resume (without this change):
jc42-i2c-0-1a
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1: +34.8°C (low = +0.0°C) ALARM (HIGH, CRIT)
(high = +0.0°C, hyst = +0.0°C)
(crit = +0.0°C, hyst = +0.0°C)
Apply the previously read or previously programmed temperature limits on
system resume. This fixes the alarm due to the hardware defaults of 0°C
because the previously applied limits (from a userspace setting) are
re-applied on system resume.
Fixes: 175c490c9e7f ("hwmon: (jc42) Add support for STTS2004 and AT30TSE004")
Signed-off-by: Martin Blumenstingl <[email protected]>
---
This is my first change to jc42. I tried to be defensive with applying
the previous values by only configuring them if they are known valid. I
only have this one set of JC42 compatible sensors but I can adapt the
code here based on your suggestions.
drivers/hwmon/jc42.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/hwmon/jc42.c b/drivers/hwmon/jc42.c
index 30888feaf589..f98b28ff10ad 100644
--- a/drivers/hwmon/jc42.c
+++ b/drivers/hwmon/jc42.c
@@ -558,6 +558,19 @@ static int jc42_resume(struct device *dev)
data->config &= ~JC42_CFG_SHUTDOWN;
i2c_smbus_write_word_swapped(data->client, JC42_REG_CONFIG,
data->config);
+
+ if (data->valid || data->temp[t_min])
+ i2c_smbus_write_word_swapped(data->client, temp_regs[t_min],
+ data->temp[t_min]);
+
+ if (data->valid || data->temp[t_max])
+ i2c_smbus_write_word_swapped(data->client, temp_regs[t_max],
+ data->temp[t_max]);
+
+ if (data->valid || data->temp[t_crit])
+ i2c_smbus_write_word_swapped(data->client, temp_regs[t_crit],
+ data->temp[t_crit]);
+
return 0;
}
--
2.38.1
Hi Guenter,
Thank you for the quick feedback!
On Wed, Oct 19, 2022 at 11:51 PM Guenter Roeck <[email protected]> wrote:
[...]
> > + if (data->valid || data->temp[t_min])
>
> This contradicts "with applying the previous values by only configuring
> them if they are known valid". It explicitly applies the values if they
> are marked as not valid, and it also applies the values if they are 0
> (I don't really see the value of doing that).
>
> Sorry, I don't understand the logic. Did you mean to use "&&" instead
> of "||" ?
My understanding is that that:
1) data->valid = true is only set in jc42_update_device() (which is
only called when reading back the values from the registers)
2) jc42_write() can write values without setting data->value = true
In other words: if jc42_read() is never called but jc42_write() is
then we still have some setting to apply while data->valid is false.
Whether that's possible in reality is something that I'm not sure
about.
If your suggestion is to simplify this to use data->valid only then I
can do that.
It would be great if you could also comment on whether
jc42_update_device() should be called from jc42_suspend() to give the
driver the chance to at least read the data once (and set data->valid)
if this has not happened before.
Best regards,
Martin
On 10/19/22 14:41, Martin Blumenstingl wrote:
> The JC42 compatible thermal sensor on Kingston KSM32ES8/16ME DIMMs
> (using Micron E-Die) is an ST Microelectronics STTS2004 (manufacturer
> 0x104a, device 0x2201). It does not keep the previously programmed
> minimum, maximum and critical temperatures after system suspend and
> resume (which is a shutdown / startup cycle for the JC42 temperature
> sensor). This results in an alarm on system resume because the hardware
> default for these values is 0°C (so any environment temperature greater
> than 0°C will trigger the alarm).
>
> Example before system suspend:
> jc42-i2c-0-1a
> Adapter: SMBus PIIX4 adapter port 0 at 0b00
> temp1: +34.8°C (low = +0.0°C)
> (high = +85.0°C, hyst = +85.0°C)
> (crit = +95.0°C, hyst = +95.0°C)
>
> Example after system resume (without this change):
> jc42-i2c-0-1a
> Adapter: SMBus PIIX4 adapter port 0 at 0b00
> temp1: +34.8°C (low = +0.0°C) ALARM (HIGH, CRIT)
> (high = +0.0°C, hyst = +0.0°C)
> (crit = +0.0°C, hyst = +0.0°C)
>
> Apply the previously read or previously programmed temperature limits on
> system resume. This fixes the alarm due to the hardware defaults of 0°C
> because the previously applied limits (from a userspace setting) are
> re-applied on system resume.
>
> Fixes: 175c490c9e7f ("hwmon: (jc42) Add support for STTS2004 and AT30TSE004")
> Signed-off-by: Martin Blumenstingl <[email protected]>
> ---
> This is my first change to jc42. I tried to be defensive with applying
> the previous values by only configuring them if they are known valid. I
> only have this one set of JC42 compatible sensors but I can adapt the
> code here based on your suggestions.
>
>
> drivers/hwmon/jc42.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/drivers/hwmon/jc42.c b/drivers/hwmon/jc42.c
> index 30888feaf589..f98b28ff10ad 100644
> --- a/drivers/hwmon/jc42.c
> +++ b/drivers/hwmon/jc42.c
> @@ -558,6 +558,19 @@ static int jc42_resume(struct device *dev)
> data->config &= ~JC42_CFG_SHUTDOWN;
> i2c_smbus_write_word_swapped(data->client, JC42_REG_CONFIG,
> data->config);
> +
> + if (data->valid || data->temp[t_min])
This contradicts "with applying the previous values by only configuring
them if they are known valid". It explicitly applies the values if they
are marked as not valid, and it also applies the values if they are 0
(I don't really see the value of doing that).
Sorry, I don't understand the logic. Did you mean to use "&&" instead
of "||" ?
Thanks,
Guenter
> + i2c_smbus_write_word_swapped(data->client, temp_regs[t_min],
> + data->temp[t_min]);
> +
> + if (data->valid || data->temp[t_max])
> + i2c_smbus_write_word_swapped(data->client, temp_regs[t_max],
> + data->temp[t_max]);
> +
> + if (data->valid || data->temp[t_crit])
> + i2c_smbus_write_word_swapped(data->client, temp_regs[t_crit],
> + data->temp[t_crit]);
> +
> return 0;
> }
>
On 10/19/22 15:03, Martin Blumenstingl wrote:
> Hi Guenter,
>
> Thank you for the quick feedback!
>
> On Wed, Oct 19, 2022 at 11:51 PM Guenter Roeck <[email protected]> wrote:
> [...]
>>> + if (data->valid || data->temp[t_min])
>>
>> This contradicts "with applying the previous values by only configuring
>> them if they are known valid". It explicitly applies the values if they
>> are marked as not valid, and it also applies the values if they are 0
>> (I don't really see the value of doing that).
>>
>> Sorry, I don't understand the logic. Did you mean to use "&&" instead
>> of "||" ?
> My understanding is that that:
> 1) data->valid = true is only set in jc42_update_device() (which is
> only called when reading back the values from the registers)
> 2) jc42_write() can write values without setting data->value = true
> In other words: if jc42_read() is never called but jc42_write() is
> then we still have some setting to apply while data->valid is false.
> Whether that's possible in reality is something that I'm not sure
> about.
>
The above only means that the code is not optimized for the problem
you are trying to solve.
- Calling jc42_update_device() would solve the valid problem.
- Calling jc42_update_device() from the write function would
solve it as well.
Relying on a previous call to the write function is most definitely
wrong.
Also, for optimization, jc42_update_device() should really read the limit
registers only once. The best solution would probably be to convert the
driver to use regmap and let regmap handle the caching.
Guenter
> If your suggestion is to simplify this to use data->valid only then I
> can do that.
> It would be great if you could also comment on whether
> jc42_update_device() should be called from jc42_suspend() to give the
> driver the chance to at least read the data once (and set data->valid)
> if this has not happened before.
>
>
> Best regards,
> Martin