2024-04-06 00:58:22

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH 0/3] drivers: thermal/hwmon: intel: Use model-specific bitmasks for temperature registers

Hi,

We have been treating the register MSR_TEMPERATURE_TARGET as a
architectural. It is model-specific.

The registers IA32_[PACKAGE]_THERM_STATUS are architectural. However, they
have become model-specific: in recent processors the temperature readout
occupies bits [23:16] whereas the Intel Software Developer's manual
specifies that it uses [22:16].

Using incorrect bitmasks leads to incorrect temperature readings and
writes to reserved register bits. For instance, temperatures below ~-27C
(depending on the value of TjMax) would be read incorrectly if only the
bits [22:16] of IA32_THERM_[PACKAGE]_STATUS are used.

Update the intel_tcc library to use model-specific bitmasks. Also update
the hwmon/coretemp and intel_tcc_cooling drivers drivers to use the model
checking utilities of intel_tcc.

Updating hwmon/coretemp to use the intel_tcc library required to add a
weak reverse dependency on CONFIG_INTEL_TCC. The less attractive
alternative would be to duplicate the model checking functionality of
intel_tcc in hwmon/coretemp.

I have tested these patches on Alder Lake, Meteor Lake, and Grand Ridge
systems by looking at the temp*_input sysfs files of hwmon.

These patches apply cleanly on top of the `testing` branches of the
linux-pm and hwmon repositories.

Thanks and BR,
Ricardo

Ricardo Neri (3):
thermal: intel: intel_tcc: Add model checks for temperature registers
thermal: intel: intel_tcc_cooling: Use a model-specific bitmask for
TCC offset
hwmon: (coretemp) Use a model-specific bitmask to read registers

drivers/hwmon/Kconfig | 1 +
drivers/hwmon/coretemp.c | 6 +-
drivers/thermal/intel/intel_tcc.c | 177 +++++++++++++++++++++-
drivers/thermal/intel/intel_tcc_cooling.c | 2 +-
include/linux/intel_tcc.h | 8 +
5 files changed, 187 insertions(+), 7 deletions(-)

--
2.34.1



2024-04-06 00:58:37

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH 2/3] thermal: intel: intel_tcc_cooling: Use a model-specific bitmask for TCC offset

The TCC offset field in the register MSR_TEMPERATURE_TARGET is not
architectural. The TCC library provides a model-specific bitmask. Use it to
determine the maximum TCC offset.

Suggested-by: Zhang Rui <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
Cc: Daniel Lezcano <[email protected]>
Cc: Lukasz Luba <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected] # v6.7+
---
drivers/thermal/intel/intel_tcc_cooling.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/intel/intel_tcc_cooling.c b/drivers/thermal/intel/intel_tcc_cooling.c
index 6c392147e6d1..308946853cdd 100644
--- a/drivers/thermal/intel/intel_tcc_cooling.c
+++ b/drivers/thermal/intel/intel_tcc_cooling.c
@@ -20,7 +20,7 @@ static struct thermal_cooling_device *tcc_cdev;
static int tcc_get_max_state(struct thermal_cooling_device *cdev, unsigned long
*state)
{
- *state = 0x3f;
+ *state = get_tcc_offset_mask();
return 0;
}

--
2.34.1


2024-04-06 00:58:53

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH 3/3] hwmon: (coretemp) Use a model-specific bitmask to read registers

The Intel Software Development manual defines states the temperature
digital readout as the bits [22:16] of the IA32_[PACKAGE]_THERM_STATUS
registers. In recent processor, however, the range is [23:16]. Use a
model-specific bitmask to extract the temperature readout correctly.

Instead of re-implementing model checks, extract the correct bitmask
using the intel_tcc library. Add an 'imply' weak reverse dependency on
CONFIG_INTEL_TCC. This captures the dependency and lets user to unselect
them if they are so inclined. In such case, the bitmask used for the
digital readout is [22:16] as specified in the Intel Software Developer's
manual.

Signed-off-by: Ricardo Neri <[email protected]>
---
Cc: Daniel Lezcano <[email protected]>
Cc: Lukasz Luba <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected] # v6.7+
---
drivers/hwmon/Kconfig | 1 +
drivers/hwmon/coretemp.c | 6 +++++-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 83945397b6eb..11d72b3009bf 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -847,6 +847,7 @@ config SENSORS_I5500
config SENSORS_CORETEMP
tristate "Intel Core/Core2/Atom temperature sensor"
depends on X86
+ imply INTEL_TCC
help
If you say yes here you get support for the temperature
sensor inside your CPU. Most of the family 6 CPUs
diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 616bd1a5b864..5632e1b1dfb1 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -17,6 +17,7 @@
#include <linux/sysfs.h>
#include <linux/hwmon-sysfs.h>
#include <linux/err.h>
+#include <linux/intel_tcc.h>
#include <linux/mutex.h>
#include <linux/list.h>
#include <linux/platform_device.h>
@@ -404,6 +405,8 @@ static ssize_t show_temp(struct device *dev,
tjmax = get_tjmax(tdata, dev);
/* Check whether the time interval has elapsed */
if (time_after(jiffies, tdata->last_updated + HZ)) {
+ u32 mask = intel_tcc_get_temp_mask(is_pkg_temp_data(tdata));
+
rdmsr_on_cpu(tdata->cpu, tdata->status_reg, &eax, &edx);
/*
* Ignore the valid bit. In all observed cases the register
@@ -411,7 +414,7 @@ static ssize_t show_temp(struct device *dev,
* Return it instead of reporting an error which doesn't
* really help at all.
*/
- tdata->temp = tjmax - ((eax >> 16) & 0x7f) * 1000;
+ tdata->temp = tjmax - ((eax >> 16) & mask) * 1000;
tdata->last_updated = jiffies;
}

@@ -838,4 +841,5 @@ module_exit(coretemp_exit)

MODULE_AUTHOR("Rudolf Marek <[email protected]>");
MODULE_DESCRIPTION("Intel Core temperature monitor");
+MODULE_IMPORT_NS(INTEL_TCC);
MODULE_LICENSE("GPL");
--
2.34.1


2024-04-06 00:58:54

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH 1/3] thermal: intel: intel_tcc: Add model checks for temperature registers

The register MSR_TEMPERATURE_TARGET is not architectural. Its fields may be
defined differently for each processor model. TCC_OFFSET is an example of
such case.

Despite being specified as architectural, the registers IA32_[PACKAGE]_
THERM_STATUS have become model-specific: in recent processors, the
digital temperature readout uses bits [23:16] whereas the Intel Software
Developer's manual specifies bits [22:16].

Create an array of processor models and their bitmasks for TCC_OFFSET and
the digital temperature readout fields. Do not include recent processors.
Instead, use the bitmasks of these recent processors as default.

Use these model-specific bitmasks when reading TCC_OFFSET or the
temperature sensors.

Initialize a model-specific data structure during subsys_initcall() to
have it ready when thermal drivers are loaded.

Expose the new interfaces get_tcc_offset_mask() and
intel_tcc_get_temp_mask(). The hwmon/coretemp and the intel_tcc_cooling
drivers need to use model-specific bitmasks. Include stubs that reflect
minimum support for !CONFIG_INTEL_TCC.

Signed-off-by: Ricardo Neri <[email protected]>
---
Cc: Daniel Lezcano <[email protected]>
Cc: Lukasz Luba <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected] # v6.7+
---
drivers/thermal/intel/intel_tcc.c | 177 +++++++++++++++++++++++++++++-
include/linux/intel_tcc.h | 8 ++
2 files changed, 180 insertions(+), 5 deletions(-)

diff --git a/drivers/thermal/intel/intel_tcc.c b/drivers/thermal/intel/intel_tcc.c
index 5e8b7f34b395..1f595f174ab0 100644
--- a/drivers/thermal/intel/intel_tcc.c
+++ b/drivers/thermal/intel/intel_tcc.c
@@ -6,8 +6,170 @@

#include <linux/errno.h>
#include <linux/intel_tcc.h>
+#include <asm/cpu_device_id.h>
+#include <asm/intel-family.h>
#include <asm/msr.h>

+/**
+ * struct temp_masks - Bitmasks for temperature readings
+ * @tcc_offset: TCC offset in MSR_TEMPERATURE_TARGET
+ * @digital_readout: Digital readout in MSR_IA32_THERM_STATUS
+ * @pkg_digital_readout: Digital readout in MSR_IA32_PACKAGE_THERM_STATUS
+ *
+ * Bitmasks to extract the fields of the MSR_TEMPERATURE and IA32_[PACKAGE]_
+ * THERM_STATUS registers for different processor models.
+ *
+ * The bitmask of TjMax is not included in this structure. It is always 0xff.
+ */
+struct temp_masks {
+ u32 tcc_offset;
+ u32 digital_readout;
+ u32 pkg_digital_readout;
+};
+
+#define TCC_FAM6_MODEL_TEMP_MASKS(model, _tcc_offset, _digital_readout, \
+ _pkg_digital_readout) \
+ static const struct temp_masks temp_##model __initconst = { \
+ .tcc_offset = _tcc_offset, \
+ .digital_readout = _digital_readout, \
+ .pkg_digital_readout = _pkg_digital_readout \
+ }
+
+TCC_FAM6_MODEL_TEMP_MASKS(nehalem, 0, 0x7f, 0x7f);
+TCC_FAM6_MODEL_TEMP_MASKS(haswell_x, 0xf, 0x7f, 0x7f);
+TCC_FAM6_MODEL_TEMP_MASKS(broadwell, 0x3f, 0x7f, 0x7f);
+TCC_FAM6_MODEL_TEMP_MASKS(goldmont, 0x7f, 0x7f, 0x7f);
+TCC_FAM6_MODEL_TEMP_MASKS(tigerlake, 0x3f, 0xff, 0xff);
+TCC_FAM6_MODEL_TEMP_MASKS(sapphirerapids, 0x3f, 0x7f, 0xff);
+
+/* Use these masks for processors not included in @tcc_cpu_ids. */
+static struct temp_masks intel_tcc_temp_masks __ro_after_init = {
+ .tcc_offset = 0x7f,
+ .digital_readout = 0xff,
+ .pkg_digital_readout = 0xff,
+};
+
+static const struct x86_cpu_id intel_tcc_cpu_ids[] __initconst = {
+ X86_MATCH_INTEL_FAM6_MODEL(CORE_YONAH, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(CORE2_MEROM, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(CORE2_MEROM_L, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(CORE2_PENRYN, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(CORE2_DUNNINGTON, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(NEHALEM, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(NEHALEM_G, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(NEHALEM_EP, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(NEHALEM_EX, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(WESTMERE, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(WESTMERE_EP, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(WESTMERE_EX, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(SANDYBRIDGE, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(SANDYBRIDGE_X, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(IVYBRIDGE, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(IVYBRIDGE_X, &temp_haswell_x),
+ X86_MATCH_INTEL_FAM6_MODEL(HASWELL, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(HASWELL_X, &temp_haswell_x),
+ X86_MATCH_INTEL_FAM6_MODEL(HASWELL_L, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(HASWELL_G, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(BROADWELL, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_G, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X, &temp_haswell_x),
+ X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_D, &temp_haswell_x),
+ X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_L, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, &temp_haswell_x),
+ X86_MATCH_INTEL_FAM6_MODEL(KABYLAKE_L, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(KABYLAKE, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(COMETLAKE_L, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(CANNONLAKE_L, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ICELAKE, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_L, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_NNPI, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ROCKETLAKE, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE_L, &temp_tigerlake),
+ X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE, &temp_tigerlake),
+ X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &temp_sapphirerapids),
+ X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, &temp_sapphirerapids),
+ X86_MATCH_INTEL_FAM6_MODEL(LAKEFIELD, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, &temp_tigerlake),
+ X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, &temp_tigerlake),
+ X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE, &temp_tigerlake),
+ X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_P, &temp_tigerlake),
+ X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_S, &temp_tigerlake),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_BONNELL, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_BONNELL_MID, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_SALTWELL, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_SALTWELL_MID, &temp_nehalem),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_SILVERMONT, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_SILVERMONT_D, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_SILVERMONT_MID, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_AIRMONT, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_AIRMONT_MID, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_AIRMONT_NP, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, &temp_goldmont),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT_D, &temp_goldmont),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT_PLUS, &temp_goldmont),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_D, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_L, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_GRACEMONT, &temp_tigerlake),
+ X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL, &temp_broadwell),
+ X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM, &temp_broadwell),
+ {}
+};
+
+static int __init intel_tcc_init(void)
+{
+ const struct x86_cpu_id *id;
+
+ id = x86_match_cpu(intel_tcc_cpu_ids);
+ if (id)
+ memcpy(&intel_tcc_temp_masks, (const void *)id->driver_data,
+ sizeof(intel_tcc_temp_masks));
+
+ return 0;
+}
+/*
+ * Use subsys_initcall to ensure temperature bitmasks are initialized before
+ * the drivers that use this library.
+ */
+subsys_initcall(intel_tcc_init);
+
+/**
+ * get_tcc_offset_mask() - Returns the model-specific bitmask for TCC offset
+ *
+ * Get the model-specific bitmask to extract TCC_OFFSET from the MSR_TEMPERATURE_
+ * TARGET register. If the mask is 0, it means the processor does not support TCC offset.
+ *
+ * Return: The model-specific bitmask for TCC offset.
+ */
+u32 get_tcc_offset_mask(void)
+{
+ return intel_tcc_temp_masks.tcc_offset;
+}
+EXPORT_SYMBOL_NS(get_tcc_offset_mask, INTEL_TCC);
+
+/**
+ * intel_tcc_get_temp_mask() - Returns the model-specific bitmask for temperature
+ *
+ * @pkg: true: Package Thermal Sensor. false: Core Thermal Sensor.
+ *
+ * Get the model-specific bitmask to extract the temperature reading from the
+ * MSR_IA32_[PACKAGE]_THERM_STATUS register.
+ *
+ * Callers must check if the thermal status registers are supported.
+ *
+ * Return: The model-specific bitmask for temperature reading
+ */
+u32 intel_tcc_get_temp_mask(bool pkg)
+{
+ return pkg ? intel_tcc_temp_masks.pkg_digital_readout :
+ intel_tcc_temp_masks.digital_readout;
+}
+EXPORT_SYMBOL_NS(intel_tcc_get_temp_mask, INTEL_TCC);
+
/**
* intel_tcc_get_tjmax() - returns the default TCC activation Temperature
* @cpu: cpu that the MSR should be run on, nagative value means any cpu.
@@ -56,7 +218,7 @@ int intel_tcc_get_offset(int cpu)
if (err)
return err;

- return (low >> 24) & 0x3f;
+ return (low >> 24) & intel_tcc_temp_masks.tcc_offset;
}
EXPORT_SYMBOL_NS_GPL(intel_tcc_get_offset, INTEL_TCC);

@@ -76,7 +238,10 @@ int intel_tcc_set_offset(int cpu, int offset)
u32 low, high;
int err;

- if (offset < 0 || offset > 0x3f)
+ if (!intel_tcc_temp_masks.tcc_offset)
+ return -ENODEV;
+
+ if (offset < 0 || offset > intel_tcc_temp_masks.tcc_offset)
return -EINVAL;

if (cpu < 0)
@@ -90,7 +255,7 @@ int intel_tcc_set_offset(int cpu, int offset)
if (low & BIT(31))
return -EPERM;

- low &= ~(0x3f << 24);
+ low &= ~(intel_tcc_temp_masks.tcc_offset << 24);
low |= offset << 24;

if (cpu < 0)
@@ -113,8 +278,8 @@ EXPORT_SYMBOL_NS_GPL(intel_tcc_set_offset, INTEL_TCC);
*/
int intel_tcc_get_temp(int cpu, int *temp, bool pkg)
{
- u32 low, high;
u32 msr = pkg ? MSR_IA32_PACKAGE_THERM_STATUS : MSR_IA32_THERM_STATUS;
+ u32 low, high, mask;
int tjmax, err;

tjmax = intel_tcc_get_tjmax(cpu);
@@ -132,7 +297,9 @@ int intel_tcc_get_temp(int cpu, int *temp, bool pkg)
if (!(low & BIT(31)))
return -ENODATA;

- *temp = tjmax - ((low >> 16) & 0x7f);
+ mask = intel_tcc_get_temp_mask(pkg);
+
+ *temp = tjmax - ((low >> 16) & mask);

return 0;
}
diff --git a/include/linux/intel_tcc.h b/include/linux/intel_tcc.h
index 8ff8eabb4a98..e281cf06aeab 100644
--- a/include/linux/intel_tcc.h
+++ b/include/linux/intel_tcc.h
@@ -14,5 +14,13 @@ int intel_tcc_get_tjmax(int cpu);
int intel_tcc_get_offset(int cpu);
int intel_tcc_set_offset(int cpu, int offset);
int intel_tcc_get_temp(int cpu, int *temp, bool pkg);
+#ifdef CONFIG_INTEL_TCC
+u32 get_tcc_offset_mask(void);
+u32 intel_tcc_get_temp_mask(bool pkg);
+#else
+static inline u32 get_tcc_offset_mask(void) { return 0; }
+/* Use the architectural bitmask of the temperature readout. No model checks. */
+static inline u32 intel_tcc_get_temp_mask(bool pkg) { return 0x7f; }
+#endif

#endif /* __INTEL_TCC_H__ */
--
2.34.1


2024-04-06 13:17:40

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 3/3] hwmon: (coretemp) Use a model-specific bitmask to read registers

On Fri, Apr 05, 2024 at 06:04:16PM -0700, Ricardo Neri wrote:
> The Intel Software Development manual defines states the temperature
> digital readout as the bits [22:16] of the IA32_[PACKAGE]_THERM_STATUS
> registers. In recent processor, however, the range is [23:16]. Use a
> model-specific bitmask to extract the temperature readout correctly.
>
> Instead of re-implementing model checks, extract the correct bitmask
> using the intel_tcc library. Add an 'imply' weak reverse dependency on
> CONFIG_INTEL_TCC. This captures the dependency and lets user to unselect
> them if they are so inclined. In such case, the bitmask used for the
> digital readout is [22:16] as specified in the Intel Software Developer's
> manual.
>
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
> Cc: Daniel Lezcano <[email protected]>
> Cc: Lukasz Luba <[email protected]>
> Cc: Srinivas Pandruvada <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected] # v6.7+
> ---
> drivers/hwmon/Kconfig | 1 +
> drivers/hwmon/coretemp.c | 6 +++++-
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index 83945397b6eb..11d72b3009bf 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -847,6 +847,7 @@ config SENSORS_I5500
> config SENSORS_CORETEMP
> tristate "Intel Core/Core2/Atom temperature sensor"
> depends on X86
> + imply INTEL_TCC

I do not think it is appropriate to make a hardware monitoring driver
depend on the thermal subsystem.

NAK in the current form.

Guenter

2024-04-07 08:13:45

by Zhang, Rui

[permalink] [raw]
Subject: Re: [PATCH 1/3] thermal: intel: intel_tcc: Add model checks for temperature registers

> +
> +#define TCC_FAM6_MODEL_TEMP_MASKS

Future non FAM6 processors can still use this macro, right?
So I'd prefer to remove FAM6_MODEL in the macro name.

[...]
>
> +
> +/**
> + * get_tcc_offset_mask() - Returns the model-specific bitmask for
> TCC offset
> + *
> + * Get the model-specific bitmask to extract TCC_OFFSET from the
> MSR_TEMPERATURE_
> + * TARGET register. If the mask is 0, it means the processor does
> not support TCC offset.
> + *
> + * Return: The model-specific bitmask for TCC offset.
> + */
> +u32 get_tcc_offset_mask(void)
> +{
> +       return intel_tcc_temp_masks.tcc_offset;
> +}
> +EXPORT_SYMBOL_NS(get_tcc_offset_mask, INTEL_TCC);

the name is not consistent with the other intel_tcc APIs.

how about intel_tcc_get_offset_mask()?

[...]

> diff --git a/include/linux/intel_tcc.h b/include/linux/intel_tcc.h
> index 8ff8eabb4a98..e281cf06aeab 100644
> --- a/include/linux/intel_tcc.h
> +++ b/include/linux/intel_tcc.h
> @@ -14,5 +14,13 @@ int intel_tcc_get_tjmax(int cpu);
>  int intel_tcc_get_offset(int cpu);
>  int intel_tcc_set_offset(int cpu, int offset);
>  int intel_tcc_get_temp(int cpu, int *temp, bool pkg);
> +#ifdef CONFIG_INTEL_TCC
> +u32 get_tcc_offset_mask(void);
> +u32 intel_tcc_get_temp_mask(bool pkg);
> +#else
> +static inline u32 get_tcc_offset_mask(void) { return 0; }
> +/* Use the architectural bitmask of the temperature readout. No
> model checks. */
> +static inline u32 intel_tcc_get_temp_mask(bool pkg) { return 0x7f; }
> +#endif

for intel_tcc_get_temp_mask()
1. with CONFIG_INTEL_TCC
a) for a platform in the model list, return the hardcoded value
b) for a platform not in the model list, return 0xff
2. without CONFIG_INTEL_TCC, return 0x7f

This is a bit confusing. IMO, at least we should leave a comment about
this difference.

thanks,
rui

2024-04-07 08:24:54

by Zhang, Rui

[permalink] [raw]
Subject: Re: [PATCH 3/3] hwmon: (coretemp) Use a model-specific bitmask to read registers

On Fri, 2024-04-05 at 18:04 -0700, Ricardo Neri wrote:
> The Intel Software Development manual defines states the temperature

I failed to parse this, is the above "states" redundant?

[...]

> digital readout as the bits [22:16] of the
> IA32_[PACKAGE]_THERM_STATUS
> registers. In recent processor, however, the range is [23:16]. Use a
> model-specific bitmask to extract the temperature readout correctly.
>
> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> index 616bd1a5b864..5632e1b1dfb1 100644
> --- a/drivers/hwmon/coretemp.c
> +++ b/drivers/hwmon/coretemp.c
> @@ -17,6 +17,7 @@
>  #include <linux/sysfs.h>
>  #include <linux/hwmon-sysfs.h>
>  #include <linux/err.h>
> +#include <linux/intel_tcc.h>
>  #include <linux/mutex.h>
>  #include <linux/list.h>
>  #include <linux/platform_device.h>
> @@ -404,6 +405,8 @@ static ssize_t show_temp(struct device *dev,
>         tjmax = get_tjmax(tdata, dev);
>         /* Check whether the time interval has elapsed */
>         if (time_after(jiffies, tdata->last_updated + HZ)) {
> +               u32 mask =
> intel_tcc_get_temp_mask(is_pkg_temp_data(tdata));
> +
>                 rdmsr_on_cpu(tdata->cpu, tdata->status_reg, &eax,
> &edx);
>                 /*
>                  * Ignore the valid bit. In all observed cases the
> register
> @@ -411,7 +414,7 @@ static ssize_t show_temp(struct device *dev,
>                  * Return it instead of reporting an error which
> doesn't
>                  * really help at all.
>                  */
> -               tdata->temp = tjmax - ((eax >> 16) & 0x7f) * 1000;
> +               tdata->temp = tjmax - ((eax >> 16) & mask) * 1000;
>                 tdata->last_updated = jiffies;
>         }
>
Besides this one, we can also convert to use intel_tcc_get_tjmax() in
get_tjmax().

thanks,
rui

2024-04-07 08:40:14

by Zhang, Rui

[permalink] [raw]
Subject: Re: [PATCH 3/3] hwmon: (coretemp) Use a model-specific bitmask to read registers

On Sat, 2024-04-06 at 06:17 -0700, Guenter Roeck wrote:
> On Fri, Apr 05, 2024 at 06:04:16PM -0700, Ricardo Neri wrote:
> > The Intel Software Development manual defines states the
> > temperature
> > digital readout as the bits [22:16] of the
> > IA32_[PACKAGE]_THERM_STATUS
> > registers. In recent processor, however, the range is [23:16]. Use
> > a
> > model-specific bitmask to extract the temperature readout
> > correctly.
> >
> > Instead of re-implementing model checks, extract the correct
> > bitmask
> > using the intel_tcc library. Add an 'imply' weak reverse dependency
> > on
> > CONFIG_INTEL_TCC. This captures the dependency and lets user to
> > unselect
> > them if they are so inclined. In such case, the bitmask used for
> > the
> > digital readout is [22:16] as specified in the Intel Software
> > Developer's
> > manual.
> >
> > Signed-off-by: Ricardo Neri <[email protected]>
> > ---
> > Cc: Daniel Lezcano <[email protected]>
> > Cc: Lukasz Luba <[email protected]>
> > Cc: Srinivas Pandruvada <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected] # v6.7+
> > ---
> >  drivers/hwmon/Kconfig    | 1 +
> >  drivers/hwmon/coretemp.c | 6 +++++-
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> > index 83945397b6eb..11d72b3009bf 100644
> > --- a/drivers/hwmon/Kconfig
> > +++ b/drivers/hwmon/Kconfig
> > @@ -847,6 +847,7 @@ config SENSORS_I5500
> >  config SENSORS_CORETEMP
> >         tristate "Intel Core/Core2/Atom temperature sensor"
> >         depends on X86
> > +       imply INTEL_TCC
>
> I do not think it is appropriate to make a hardware monitoring driver
> depend on the thermal subsystem.
>
> NAK in the current form.
>
Hi, Guenter,

Thanks for reviewing.

We've seen a couple of hwmon drivers depends on or imply THERMAL.
That's why we think this is an applicable solution.
Using the intel_tcc APIs can effectively reduce the future maintenance
burden because we don't need to duplicate the model list in multiple
places.

or do you have any suggestions?

thanks,
rui

2024-04-08 11:40:35

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 3/3] hwmon: (coretemp) Use a model-specific bitmask to read registers

On Sun, Apr 07, 2024 at 08:39:51AM +0000, Zhang, Rui wrote:
> > I do not think it is appropriate to make a hardware monitoring driver
> > depend on the thermal subsystem.
> >
> > NAK in the current form.
> >
> Hi, Guenter,
>
> Thanks for reviewing.
>
> We've seen a couple of hwmon drivers depends on or imply THERMAL.
> That's why we think this is an applicable solution.

So far this was - unless someone sneaked something by - for drivers
which registered thermal zones, not for calling code which resides
inside thermal subsystem.

> Using the intel_tcc APIs can effectively reduce the future maintenance
> burden because we don't need to duplicate the model list in multiple
> places.
>
> or do you have any suggestions?

The exported code should reside outside the thermal subsystem.

Also, as implemented, if INTEL_TCC=n, the returned temperature mask value
is 0x7f, and the offset mask is 0. So the alternative would be to just use
those values unconditionally since apparently that is sufficient.

Thanks,
Guenter

2024-04-08 15:33:53

by Ricardo Neri

[permalink] [raw]
Subject: Re: [PATCH 1/3] thermal: intel: intel_tcc: Add model checks for temperature registers

On Sun, Apr 07, 2024 at 08:13:28AM +0000, Zhang, Rui wrote:
> > +
> > +#define TCC_FAM6_MODEL_TEMP_MASKS

Thank your your review, Rui!

>
> Future non FAM6 processors can still use this macro, right?
> So I'd prefer to remove FAM6_MODEL in the macro name.

Yes, it is true, FAM6_MODEL it is restrictive and also not needed here.
I will update accodingly.

> [...]
> >
> > +
> > +/**
> > + * get_tcc_offset_mask() - Returns the model-specific bitmask for
> > TCC offset
> > + *
> > + * Get the model-specific bitmask to extract TCC_OFFSET from the
> > MSR_TEMPERATURE_
> > + * TARGET register. If the mask is 0, it means the processor does
> > not support TCC offset.
> > + *
> > + * Return: The model-specific bitmask for TCC offset.
> > + */
> > +u32 get_tcc_offset_mask(void)
> > +{
> > +???????return intel_tcc_temp_masks.tcc_offset;
> > +}
> > +EXPORT_SYMBOL_NS(get_tcc_offset_mask, INTEL_TCC);
>
> the name is not consistent with the other intel_tcc APIs.
>
> how about intel_tcc_get_offset_mask()?

Sure. I can make this change.

>
> [...]
>
> > diff --git a/include/linux/intel_tcc.h b/include/linux/intel_tcc.h
> > index 8ff8eabb4a98..e281cf06aeab 100644
> > --- a/include/linux/intel_tcc.h
> > +++ b/include/linux/intel_tcc.h
> > @@ -14,5 +14,13 @@ int intel_tcc_get_tjmax(int cpu);
> > ?int intel_tcc_get_offset(int cpu);
> > ?int intel_tcc_set_offset(int cpu, int offset);
> > ?int intel_tcc_get_temp(int cpu, int *temp, bool pkg);
> > +#ifdef CONFIG_INTEL_TCC
> > +u32 get_tcc_offset_mask(void);
> > +u32 intel_tcc_get_temp_mask(bool pkg);
> > +#else
> > +static inline u32 get_tcc_offset_mask(void) { return 0; }
> > +/* Use the architectural bitmask of the temperature readout. No
> > model checks. */
> > +static inline u32 intel_tcc_get_temp_mask(bool pkg) { return 0x7f; }
> > +#endif
>
> for intel_tcc_get_temp_mask()
> 1. with CONFIG_INTEL_TCC
> a) for a platform in the model list, return the hardcoded value
> b) for a platform not in the model list, return 0xff
> 2. without CONFIG_INTEL_TCC, return 0x7f
>
> This is a bit confusing. IMO, at least we should leave a comment about
> this difference.

If we don't do model checks, I think we should rely on what is architectural
as per the SDM. Hence the 0x7f value.

Perhaps I can expand the comment in this hunk to detail what we do when we
do model checks.


2024-04-15 01:13:43

by Ricardo Neri

[permalink] [raw]
Subject: Re: [PATCH 3/3] hwmon: (coretemp) Use a model-specific bitmask to read registers

On Sun, Apr 07, 2024 at 08:24:40AM +0000, Zhang, Rui wrote:
> On Fri, 2024-04-05 at 18:04 -0700, Ricardo Neri wrote:
> > The Intel Software Development manual defines states the temperature
>
> I failed to parse this, is the above "states" redundant?

Sorry Rui! I missed this repy.

Ah, the commit message is wrong. I will do s/defines//

>
> [...]
>
> > digital readout as the bits [22:16] of the
> > IA32_[PACKAGE]_THERM_STATUS
> > registers. In recent processor, however, the range is [23:16]. Use a
> > model-specific bitmask to extract the temperature readout correctly.
> >
> > diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> > index 616bd1a5b864..5632e1b1dfb1 100644
> > --- a/drivers/hwmon/coretemp.c
> > +++ b/drivers/hwmon/coretemp.c
> > @@ -17,6 +17,7 @@
> > ?#include <linux/sysfs.h>
> > ?#include <linux/hwmon-sysfs.h>
> > ?#include <linux/err.h>
> > +#include <linux/intel_tcc.h>
> > ?#include <linux/mutex.h>
> > ?#include <linux/list.h>
> > ?#include <linux/platform_device.h>
> > @@ -404,6 +405,8 @@ static ssize_t show_temp(struct device *dev,
> > ????????tjmax = get_tjmax(tdata, dev);
> > ????????/* Check whether the time interval has elapsed */
> > ????????if (time_after(jiffies, tdata->last_updated + HZ)) {
> > +???????????????u32 mask =
> > intel_tcc_get_temp_mask(is_pkg_temp_data(tdata));
> > +
> > ????????????????rdmsr_on_cpu(tdata->cpu, tdata->status_reg, &eax,
> > &edx);
> > ????????????????/*
> > ???????????????? * Ignore the valid bit. In all observed cases the
> > register
> > @@ -411,7 +414,7 @@ static ssize_t show_temp(struct device *dev,
> > ???????????????? * Return it instead of reporting an error which
> > doesn't
> > ???????????????? * really help at all.
> > ???????????????? */
> > -???????????????tdata->temp = tjmax - ((eax >> 16) & 0x7f) * 1000;
> > +???????????????tdata->temp = tjmax - ((eax >> 16) & mask) * 1000;
> > ????????????????tdata->last_updated = jiffies;
> > ????????}
> >
> Besides this one, we can also convert to use intel_tcc_get_tjmax() in
> get_tjmax().

I thought about this, but realized that the bitmask of TjMax is always
[23:16]; no need for a model check. If anything, intel_tcc_get_tjmax()
would remove some duplicated code. But coretemp.c would need to depend
on INTEL_TCC, which seems to be a non-starter.

2024-04-15 12:42:23

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 3/3] hwmon: (coretemp) Use a model-specific bitmask to read registers

On Sun, Apr 14, 2024 at 06:19:46PM -0700, Ricardo Neri wrote:
> On Sun, Apr 07, 2024 at 08:24:40AM +0000, Zhang, Rui wrote:
> > On Fri, 2024-04-05 at 18:04 -0700, Ricardo Neri wrote:
> > > The Intel Software Development manual defines states the temperature
> >
> > I failed to parse this, is the above "states" redundant?
>
> Sorry Rui! I missed this repy.
>
> Ah, the commit message is wrong. I will do s/defines//
>
> >
> > [...]
> >
> > > digital readout as the bits [22:16] of the
> > > IA32_[PACKAGE]_THERM_STATUS
> > > registers. In recent processor, however, the range is [23:16]. Use a
> > > model-specific bitmask to extract the temperature readout correctly.
> > >
> > > diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> > > index 616bd1a5b864..5632e1b1dfb1 100644
> > > --- a/drivers/hwmon/coretemp.c
> > > +++ b/drivers/hwmon/coretemp.c
> > > @@ -17,6 +17,7 @@
> > > ?#include <linux/sysfs.h>
> > > ?#include <linux/hwmon-sysfs.h>
> > > ?#include <linux/err.h>
> > > +#include <linux/intel_tcc.h>
> > > ?#include <linux/mutex.h>
> > > ?#include <linux/list.h>
> > > ?#include <linux/platform_device.h>
> > > @@ -404,6 +405,8 @@ static ssize_t show_temp(struct device *dev,
> > > ????????tjmax = get_tjmax(tdata, dev);
> > > ????????/* Check whether the time interval has elapsed */
> > > ????????if (time_after(jiffies, tdata->last_updated + HZ)) {
> > > +???????????????u32 mask =
> > > intel_tcc_get_temp_mask(is_pkg_temp_data(tdata));
> > > +
> > > ????????????????rdmsr_on_cpu(tdata->cpu, tdata->status_reg, &eax,
> > > &edx);
> > > ????????????????/*
> > > ???????????????? * Ignore the valid bit. In all observed cases the
> > > register
> > > @@ -411,7 +414,7 @@ static ssize_t show_temp(struct device *dev,
> > > ???????????????? * Return it instead of reporting an error which
> > > doesn't
> > > ???????????????? * really help at all.
> > > ???????????????? */
> > > -???????????????tdata->temp = tjmax - ((eax >> 16) & 0x7f) * 1000;
> > > +???????????????tdata->temp = tjmax - ((eax >> 16) & mask) * 1000;
> > > ????????????????tdata->last_updated = jiffies;
> > > ????????}
> > >
> > Besides this one, we can also convert to use intel_tcc_get_tjmax() in
> > get_tjmax().
>
> I thought about this, but realized that the bitmask of TjMax is always
> [23:16]; no need for a model check. If anything, intel_tcc_get_tjmax()
> would remove some duplicated code. But coretemp.c would need to depend
> on INTEL_TCC, which seems to be a non-starter.
>

Calling intel_tcc_get_temp_mask() in practice already introduces that
dependency because it returns a fixed mask if INTEL_TCC is not enabled.
If that doesn't matter, the dynamic mask is unnecessary to start with.

Guenter