2020-01-22 16:09:30

by Guenter Roeck

Subject: [PATCH v4 0/6] hwmon: k10temp driver improvements

This patch series implements various improvements for the k10temp driver.

Patch 1/6 introduces the use of bit operations.

Patch 2/6 converts the driver to use the devm_hwmon_device_register_with_info
API. This not only simplifies the code and reduces its size, it also
makes the code easier to maintain and enhance.

Patch 3/6 adds support for reporting Core Complex Die (CCD) temperatures
on Zen2 (Ryzen and Threadripper) CPUs (note that reporting is incomplete
for Threadripper CPUs - it is known that additional temperature sensors
exist, but the register locations are unknown).

Patch 4/6 adds support for reporting core and SoC current and voltage
information on Ryzen CPUs (note: voltage and current measurements for
Threadripper and EPYC CPUs are known to exist, but register locations
are unknown, and values are therefore not reported at this time).

Patch 5/6 removes the maximum temperature from Tdie for Ryzen CPUs.
It is inaccurate, misleading, and it just doesn't make sense to report
wrong information.

Patch 6/6 adds debugfs files to provide raw thermal and SVI register
dumps. This may help in the future to identify additional sensors and/or
to fix problems.

With all patches in place, output on Ryzen 3900X CPUs looks as follows
(with the system under load).

k10temp-pci-00c3
Adapter: PCI adapter
Vcore: +1.39 V
Vsoc: +1.18 V
Tdie: +79.9°C
Tctl: +79.9°C
Tccd1: +61.8°C
Tccd2: +76.5°C
Icore: +46.00 A
Isoc: +12.00 A

The voltage and current information is limited to Ryzen CPUs. Voltage
and current reporting on Threadripper and EPYC CPUs is different, and the
reported information is either incomplete or wrong. Exclude it for the time
being; it can always be added if/when more information becomes available.
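
The Ryzen-only restriction described above is decided in patch 4/6 from the CPU model string: is_threadripper() and is_epyc() run strstr() on boot_cpu_data.x86_model_id. Below is a minimal userspace sketch of the same check. The helper name show_current_for() and the sample model strings in the usage note are illustrative assumptions, not taken from the driver.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Userspace model of the check behind data->show_current in patch 4/6.
 * In the driver the string comes from boot_cpu_data.x86_model_id; here it
 * is passed in so the logic can be tried outside the kernel.
 */
static bool show_current_for(const char *model_id)
{
	/* Voltage/current channels are hidden on Threadripper and EPYC. */
	return !strstr(model_id, "Threadripper") && !strstr(model_id, "EPYC");
}
```

With this sketch, a string such as "AMD Ryzen 9 3900X 12-Core Processor" enables the channels, while any string containing "Threadripper" or "EPYC" disables them.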

Tested with the following Ryzen CPUs:
1300X A user with this CPU in the system reported somewhat unexpected
values for Vcore; it is not clear why that is the case. Overall
this does not warrant holding up the series.
1600
1800X
2200G
2400G
2700
2700X
2950X
3600X
3800X
3900X
3950X
3970X
EPYC 7302
EPYC 7742

Many thanks to everyone who helped to test this series.

---
v4: Normalize current calculations to show 1A / LSB for core current and
0.25A / LSB for SoC current. The reported current values are board
specific and need to be scaled using the configuration file.
Clarified that the maximum temperature of 70 degrees C (which is no
longer displayed) was associated with Tctl and not with Tdie.
Added debugfs support.

v3: Added more Tested-by: tags
Added detection for 3970X, and report Tccd1 for this CPU.

v2: Added Tested-by: tags as received.
Don't display voltage and current information for Threadripper and EPYC.
Stop displaying the fixed (and wrong) maximum temperature of 70 degrees C
for Tdie on model 17h/18h CPUs.


2020-01-22 16:09:32

by Guenter Roeck

Subject: [PATCH v4 1/6] hwmon: (k10temp) Use bitops

Using bitops makes bit masks and shifts easier to read.
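
As a quick sanity check of the conversion, the sketch below rebuilds the converted constants in userspace and verifies at compile time that each one equals the literal it replaces. The BIT() and GENMASK() definitions here are simplified 32-bit stand-ins written for this example, not the kernel's own macros from linux/bitops.h; pkg_type() is a hypothetical helper.

```c
#include <stdint.h>

/* Simplified 32-bit stand-ins for the kernel's BIT() and GENMASK(). */
#define BIT(n)		(UINT32_C(1) << (n))
#define GENMASK(h, l)	((UINT32_MAX << (l)) & (UINT32_MAX >> (31 - (h))))

/* Constants as spelled after the patch. */
#define CPUID_PKGTYPE_MASK		GENMASK(31, 28)
#define DDR3_MODE			BIT(8)
#define HTC_ENABLE			BIT(0)
#define NB_CAP_HTC			BIT(10)
#define CUR_TEMP_RANGE_SEL_MASK		BIT(19)

/* Each new spelling must expand to the hex literal it replaced. */
_Static_assert(CPUID_PKGTYPE_MASK == 0xf0000000u, "package type mask");
_Static_assert(DDR3_MODE == 0x00000100u, "DDR3 mode bit");
_Static_assert(HTC_ENABLE == 0x00000001u, "HTC enable bit");
_Static_assert(NB_CAP_HTC == 0x00000400u, "HTC capability bit");
_Static_assert(CUR_TEMP_RANGE_SEL_MASK == 0x80000u, "range select bit");

/* Hypothetical helper: extract the package type field from CPUID ebx. */
static uint32_t pkg_type(uint32_t ebx)
{
	return ebx & CPUID_PKGTYPE_MASK;
}
```

The bit positions are now visible in the source (bits 31..28, bit 8, and so on) instead of having to be decoded from hex.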

Tested-by: Brad Campbell <[email protected]>
Tested-by: Bernhard Gebetsberger <[email protected]>
Tested-by: Holger Kiehl <[email protected]>
Tested-by: Michael Larabel <[email protected]>
Tested-by: Jonathan McDowell <[email protected]>
Tested-by: Ken Moffat <[email protected]>
Tested-by: Darren Salt <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
---
drivers/hwmon/k10temp.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 5c1dddde193c..8807d7da68db 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -5,6 +5,7 @@
* Copyright (c) 2009 Clemens Ladisch <[email protected]>
*/

+#include <linux/bitops.h>
#include <linux/err.h>
#include <linux/hwmon.h>
#include <linux/hwmon-sysfs.h>
@@ -31,22 +32,22 @@ static DEFINE_MUTEX(nb_smu_ind_mutex);
#endif

/* CPUID function 0x80000001, ebx */
-#define CPUID_PKGTYPE_MASK 0xf0000000
+#define CPUID_PKGTYPE_MASK GENMASK(31, 28)
#define CPUID_PKGTYPE_F 0x00000000
#define CPUID_PKGTYPE_AM2R2_AM3 0x10000000

/* DRAM controller (PCI function 2) */
#define REG_DCT0_CONFIG_HIGH 0x094
-#define DDR3_MODE 0x00000100
+#define DDR3_MODE BIT(8)

/* miscellaneous (PCI function 3) */
#define REG_HARDWARE_THERMAL_CONTROL 0x64
-#define HTC_ENABLE 0x00000001
+#define HTC_ENABLE BIT(0)

#define REG_REPORTED_TEMPERATURE 0xa4

#define REG_NORTHBRIDGE_CAPABILITIES 0xe8
-#define NB_CAP_HTC 0x00000400
+#define NB_CAP_HTC BIT(10)

/*
* For F15h M60h and M70h, REG_HARDWARE_THERMAL_CONTROL
@@ -60,6 +61,9 @@ static DEFINE_MUTEX(nb_smu_ind_mutex);
/* F17h M01h Access througn SMN */
#define F17H_M01H_REPORTED_TEMP_CTRL_OFFSET 0x00059800

+#define CUR_TEMP_SHIFT 21
+#define CUR_TEMP_RANGE_SEL_MASK BIT(19)
+
struct k10temp_data {
struct pci_dev *pdev;
void (*read_htcreg)(struct pci_dev *pdev, u32 *regval);
@@ -129,7 +133,7 @@ static unsigned int get_raw_temp(struct k10temp_data *data)
u32 regval;

data->read_tempreg(data->pdev, &regval);
- temp = (regval >> 21) * 125;
+ temp = (regval >> CUR_TEMP_SHIFT) * 125;
if (regval & data->temp_adjust_mask)
temp -= 49000;
return temp;
@@ -312,7 +316,7 @@ static int k10temp_probe(struct pci_dev *pdev,
data->read_htcreg = read_htcreg_nb_f15;
data->read_tempreg = read_tempreg_nb_f15;
} else if (boot_cpu_data.x86 == 0x17 || boot_cpu_data.x86 == 0x18) {
- data->temp_adjust_mask = 0x80000;
+ data->temp_adjust_mask = CUR_TEMP_RANGE_SEL_MASK;
data->read_tempreg = read_tempreg_nb_f17;
data->show_tdie = true;
} else {
--
2.17.1

2020-01-22 16:09:44

by Guenter Roeck

Subject: [PATCH v4 6/6] hwmon: (k10temp) Add debugfs support

Show thermal and SVI registers for Family 17h CPUs.
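
The dump layout is four 32-bit registers per line, each line prefixed with the address of its first register; the svi file covers 32 registers and the thm file 256. The following userspace sketch reproduces that layout into a buffer. The reg_reader callback stands in for amd_smn_read(), and echo_reader(), emit() and format_regs() are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Callback standing in for amd_smn_read(); hypothetical, for illustration. */
typedef uint32_t (*reg_reader)(uint32_t addr);

/* Demo reader: returns the address itself so the output is predictable. */
static uint32_t echo_reader(uint32_t addr)
{
	return addr;
}

/* Append one formatted value at buf[pos]; does nothing once buf is full. */
static size_t emit(char *buf, size_t len, size_t pos, const char *fmt,
		   uint32_t v)
{
	int n;

	if (pos >= len)
		return pos;
	n = snprintf(buf + pos, len - pos, fmt, (unsigned int)v);
	return n < 0 ? pos : pos + (size_t)n;
}

/* Same layout as k10temp_smn_regs_show(): address, then four values. */
static size_t format_regs(char *buf, size_t len, uint32_t addr, int count,
			  reg_reader read_reg)
{
	size_t pos = 0;
	int i;

	if (len)
		buf[0] = '\0';
	for (i = 0; i < count; i++) {
		uint32_t a = addr + (uint32_t)i * 4;

		if (!(i & 3))		/* start of a row: print address */
			pos = emit(buf, len, pos, "0x%06x: ", a);
		pos = emit(buf, len, pos, "%08x ", read_reg(a));
		if ((i & 3) == 3 && pos + 1 < len) {	/* end of a row */
			buf[pos++] = '\n';
			buf[pos] = '\0';
		}
	}
	return pos;
}
```

Called with the SVI base address 0x0005A000, a count of 4 and echo_reader, the buffer holds a single row starting with "0x05a000: ". A count that is not a multiple of four leaves the last row without a trailing newline, as in the patch.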

Signed-off-by: Guenter Roeck <[email protected]>
---
drivers/hwmon/k10temp.c | 78 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 77 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 4a470b5195ee..5e3f43594084 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -26,6 +26,7 @@
*/

#include <linux/bitops.h>
+#include <linux/debugfs.h>
#include <linux/err.h>
#include <linux/hwmon.h>
#include <linux/init.h>
@@ -442,6 +443,76 @@ static bool has_erratum_319(struct pci_dev *pdev)
(boot_cpu_data.x86_model == 4 && boot_cpu_data.x86_stepping <= 2);
}

+#ifdef CONFIG_DEBUG_FS
+
+static void k10temp_smn_regs_show(struct seq_file *s, struct pci_dev *pdev,
+ u32 addr, int count)
+{
+ u32 reg;
+ int i;
+
+ for (i = 0; i < count; i++) {
+ if (!(i & 3))
+ seq_printf(s, "0x%06x: ", addr + i * 4);
+ amd_smn_read(amd_pci_dev_to_node_id(pdev), addr + i * 4, &reg);
+ seq_printf(s, "%08x ", reg);
+ if ((i & 3) == 3)
+ seq_puts(s, "\n");
+ }
+}
+
+static int svi_show(struct seq_file *s, void *unused)
+{
+ struct k10temp_data *data = s->private;
+
+ k10temp_smn_regs_show(s, data->pdev, F17H_M01H_SVI, 32);
+ return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(svi);
+
+static int thm_show(struct seq_file *s, void *unused)
+{
+ struct k10temp_data *data = s->private;
+
+ k10temp_smn_regs_show(s, data->pdev,
+ F17H_M01H_REPORTED_TEMP_CTRL_OFFSET, 256);
+ return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(thm);
+
+static void k10temp_debugfs_cleanup(void *ddir)
+{
+ debugfs_remove_recursive(ddir);
+}
+
+static void k10temp_init_debugfs(struct k10temp_data *data)
+{
+ struct dentry *debugfs;
+ char name[32];
+
+ /* Only show debugfs data for Family 17h/18h CPUs */
+ if (!data->show_tdie)
+ return;
+
+ scnprintf(name, sizeof(name), "k10temp-%s", pci_name(data->pdev));
+
+ debugfs = debugfs_create_dir(name, NULL);
+ if (debugfs) {
+ debugfs_create_file("svi", 0444, debugfs, data, &svi_fops);
+ debugfs_create_file("thm", 0444, debugfs, data, &thm_fops);
+ devm_add_action_or_reset(&data->pdev->dev,
+ k10temp_debugfs_cleanup, debugfs);
+ }
+}
+
+#else
+
+static void k10temp_init_debugfs(struct k10temp_data *data)
+{
+}
+
+#endif
+
static const struct hwmon_channel_info *k10temp_info[] = {
HWMON_CHANNEL_INFO(temp,
HWMON_T_INPUT | HWMON_T_MAX |
@@ -553,7 +624,12 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
hwmon_dev = devm_hwmon_device_register_with_info(dev, "k10temp", data,
&k10temp_chip_info,
NULL);
- return PTR_ERR_OR_ZERO(hwmon_dev);
+ if (IS_ERR(hwmon_dev))
+ return PTR_ERR(hwmon_dev);
+
+ k10temp_init_debugfs(data);
+
+ return 0;
}

static const struct pci_device_id k10temp_id_table[] = {
--
2.17.1

2020-01-22 16:09:53

by Guenter Roeck

Subject: [PATCH v4 4/6] hwmon: (k10temp) Show core and SoC current and voltages on Ryzen CPUs

Ryzen CPUs report core and SoC voltages and currents. Add support
for it to the k10temp driver.

For the time being, only report voltages and currents for Ryzen
CPUs. Threadripper and EPYC appear to use a different mechanism.
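
For reference, the conversions used by this patch can be tried in userspace. Each SVI telemetry register carries a voltage code in bits 23..16 and a current code in bits 7..0; voltage is 1.55 V minus 6.25 mV per step, and current is the code multiplied by the normalized factor. The sketch below mirrors that arithmetic. The function names and the raw register value in the usage note are made up for illustration, and div_round_closest() is a simplified stand-in for the kernel's DIV_ROUND_CLOSEST() that handles non-negative operands only.

```c
#include <stdint.h>

#define CFACTOR_ICORE	1000000	/* 1A / LSB */
#define CFACTOR_ISOC	250000	/* 0.25A / LSB */

/* Rounding division; simplified stand-in, non-negative operands only. */
static long div_round_closest(long x, long d)
{
	return (x + d / 2) / d;
}

/* Voltage in millivolts from a raw SVI telemetry register value. */
static long svi_to_millivolts(uint32_t regval)
{
	long vid = (regval >> 16) & 0xff;

	/* Assumes vid <= 248 so that the numerator stays non-negative. */
	return div_round_closest(155000 - vid * 625, 100);
}

/* Current in milliamps; cfactor is CFACTOR_ICORE or CFACTOR_ISOC. */
static long svi_to_milliamps(uint32_t regval, long cfactor)
{
	return div_round_closest(cfactor * (long)(regval & 0xff), 1000);
}
```

With the made-up raw value 0x001a002e (voltage code 26, current code 46), this yields 1388 mV and, with CFACTOR_ICORE, 46000 mA. A current code of 48 with CFACTOR_ISOC yields 12000 mA. Note that the patch maps channel 0 to telemetry plane 0 on Zen/Zen+ and to plane 1 on Zen2.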

Tested-by: Brad Campbell <[email protected]>
Tested-by: Bernhard Gebetsberger <[email protected]>
Tested-by: Holger Kiehl <[email protected]>
Tested-by: Michael Larabel <[email protected]>
Tested-by: Jonathan McDowell <[email protected]>
Tested-by: Ken Moffat <[email protected]>
Tested-by: Darren Salt <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
---
drivers/hwmon/k10temp.c | 134 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 131 insertions(+), 3 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 0af096b061fa..b961e12c6f58 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -11,6 +11,18 @@
* convert raw register values is from https://github.com/ocerman/zenpower.
* The information is not confirmed from chip datasheets, but experiments
* suggest that it provides reasonable temperature values.
+ * - Register addresses to read chip voltage and current are also from
+ * https://github.com/ocerman/zenpower, and not confirmed from chip
+ * datasheets. Current calibration is board specific and not typically
+ * shared by board vendors. For this reason, current values are
+ * normalized to report 1A/LSB for core current and 0.25A/LSB for SoC
+ * current. Reported values can be adjusted using the sensors configuration
+ * file.
+ * - It is unknown if the mechanism to read CCD1/CCD2 temperature as well as
+ * current and voltage information works on higher-end Ryzen CPUs.
+ * Information reported by Windows tools suggests that additional sensors
+ * (both temperature and voltage/current) are supported, but their register
+ * location is currently unknown.
*/

#include <linux/bitops.h>
@@ -70,9 +82,16 @@ static DEFINE_MUTEX(nb_smu_ind_mutex);
#define F17H_M70H_CCD1_TEMP 0x00059954
#define F17H_M70H_CCD2_TEMP 0x00059958

+#define F17H_M01H_SVI 0x0005A000
+#define F17H_M01H_SVI_TEL_PLANE0 (F17H_M01H_SVI + 0xc)
+#define F17H_M01H_SVI_TEL_PLANE1 (F17H_M01H_SVI + 0x10)
+
#define CUR_TEMP_SHIFT 21
#define CUR_TEMP_RANGE_SEL_MASK BIT(19)

+#define CFACTOR_ICORE 1000000 /* 1A / LSB */
+#define CFACTOR_ISOC 250000 /* 0.25A / LSB */
+
struct k10temp_data {
struct pci_dev *pdev;
void (*read_htcreg)(struct pci_dev *pdev, u32 *regval);
@@ -82,6 +101,9 @@ struct k10temp_data {
bool show_tdie;
bool show_tccd1;
bool show_tccd2;
+ u32 svi_addr[2];
+ bool show_current;
+ int cfactor[2];
};

struct tctl_offset {
@@ -99,6 +121,16 @@ static const struct tctl_offset tctl_offset_table[] = {
{ 0x17, "AMD Ryzen Threadripper 29", 27000 }, /* 29{20,50,70,90}[W]X */
};

+static bool is_threadripper(void)
+{
+ return strstr(boot_cpu_data.x86_model_id, "Threadripper");
+}
+
+static bool is_epyc(void)
+{
+ return strstr(boot_cpu_data.x86_model_id, "EPYC");
+}
+
static void read_htcreg_pci(struct pci_dev *pdev, u32 *regval)
{
pci_read_config_dword(pdev, REG_HARDWARE_THERMAL_CONTROL, regval);
@@ -157,16 +189,76 @@ const char *k10temp_temp_label[] = {
"Tccd2",
};

+const char *k10temp_in_label[] = {
+ "Vcore",
+ "Vsoc",
+};
+
+const char *k10temp_curr_label[] = {
+ "Icore",
+ "Isoc",
+};
+
static int k10temp_read_labels(struct device *dev,
enum hwmon_sensor_types type,
u32 attr, int channel, const char **str)
{
- *str = k10temp_temp_label[channel];
+ switch (type) {
+ case hwmon_temp:
+ *str = k10temp_temp_label[channel];
+ break;
+ case hwmon_in:
+ *str = k10temp_in_label[channel];
+ break;
+ case hwmon_curr:
+ *str = k10temp_curr_label[channel];
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
return 0;
}

-static int k10temp_read(struct device *dev, enum hwmon_sensor_types type,
- u32 attr, int channel, long *val)
+static int k10temp_read_curr(struct device *dev, u32 attr, int channel,
+ long *val)
+{
+ struct k10temp_data *data = dev_get_drvdata(dev);
+ u32 regval;
+
+ switch (attr) {
+ case hwmon_curr_input:
+ amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
+ data->svi_addr[channel], &regval);
+ *val = DIV_ROUND_CLOSEST(data->cfactor[channel] *
+ (regval & 0xff),
+ 1000);
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+ return 0;
+}
+
+static int k10temp_read_in(struct device *dev, u32 attr, int channel, long *val)
+{
+ struct k10temp_data *data = dev_get_drvdata(dev);
+ u32 regval;
+
+ switch (attr) {
+ case hwmon_in_input:
+ amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
+ data->svi_addr[channel], &regval);
+ regval = (regval >> 16) & 0xff;
+ *val = DIV_ROUND_CLOSEST(155000 - regval * 625, 100);
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+ return 0;
+}
+
+static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
+ long *val)
{
struct k10temp_data *data = dev_get_drvdata(dev);
u32 regval;
@@ -216,6 +308,21 @@ static int k10temp_read(struct device *dev, enum hwmon_sensor_types type,
return 0;
}

+static int k10temp_read(struct device *dev, enum hwmon_sensor_types type,
+ u32 attr, int channel, long *val)
+{
+ switch (type) {
+ case hwmon_temp:
+ return k10temp_read_temp(dev, attr, channel, val);
+ case hwmon_in:
+ return k10temp_read_in(dev, attr, channel, val);
+ case hwmon_curr:
+ return k10temp_read_curr(dev, attr, channel, val);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
static umode_t k10temp_is_visible(const void *_data,
enum hwmon_sensor_types type,
u32 attr, int channel)
@@ -290,6 +397,11 @@ static umode_t k10temp_is_visible(const void *_data,
return 0;
}
break;
+ case hwmon_in:
+ case hwmon_curr:
+ if (!data->show_current)
+ return 0;
+ break;
default:
return 0;
}
@@ -338,6 +450,12 @@ static const struct hwmon_channel_info *k10temp_info[] = {
HWMON_T_INPUT | HWMON_T_LABEL,
HWMON_T_INPUT | HWMON_T_LABEL,
HWMON_T_INPUT | HWMON_T_LABEL),
+ HWMON_CHANNEL_INFO(in,
+ HWMON_I_INPUT | HWMON_I_LABEL,
+ HWMON_I_INPUT | HWMON_I_LABEL),
+ HWMON_CHANNEL_INFO(curr,
+ HWMON_C_INPUT | HWMON_C_LABEL,
+ HWMON_C_INPUT | HWMON_C_LABEL),
NULL
};

@@ -393,9 +511,19 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
case 0x8: /* Zen+ */
case 0x11: /* Zen APU */
case 0x18: /* Zen+ APU */
+ data->show_current = !is_threadripper() && !is_epyc();
+ data->svi_addr[0] = F17H_M01H_SVI_TEL_PLANE0;
+ data->svi_addr[1] = F17H_M01H_SVI_TEL_PLANE1;
+ data->cfactor[0] = CFACTOR_ICORE;
+ data->cfactor[1] = CFACTOR_ISOC;
break;
case 0x31: /* Zen2 Threadripper */
case 0x71: /* Zen2 */
+ data->show_current = !is_threadripper() && !is_epyc();
+ data->cfactor[0] = CFACTOR_ICORE;
+ data->cfactor[1] = CFACTOR_ISOC;
+ data->svi_addr[0] = F17H_M01H_SVI_TEL_PLANE1;
+ data->svi_addr[1] = F17H_M01H_SVI_TEL_PLANE0;
amd_smn_read(amd_pci_dev_to_node_id(pdev),
F17H_M70H_CCD1_TEMP, &regval);
if (regval & 0xfff)
--
2.17.1

2020-01-22 16:09:54

by Guenter Roeck

Subject: [PATCH v4 5/6] hwmon: (k10temp) Don't show temperature limits on Ryzen (Zen) CPUs

The maximum Tdie or Tctl is not published for Ryzen CPUs. What is
known, however, is that the traditional value of 70 degrees C is no
longer correct. On top of that, the limit applies to Tctl, not to Tdie.
Displaying it in either context is meaningless, confusing, and wrong.
Stop doing it.

Tested-by: Brad Campbell <[email protected]>
Tested-by: Holger Kiehl <[email protected]>
Tested-by: Michael Larabel <[email protected]>
Tested-by: Jonathan McDowell <[email protected]>
Tested-by: Ken Moffat <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
---
drivers/hwmon/k10temp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index b961e12c6f58..4a470b5195ee 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -355,7 +355,7 @@ static umode_t k10temp_is_visible(const void *_data,
}
break;
case hwmon_temp_max:
- if (channel)
+ if (channel || data->show_tdie)
return 0;
break;
case hwmon_temp_crit:
--
2.17.1

2020-01-22 16:10:13

by Guenter Roeck

Subject: [PATCH v4 2/6] hwmon: (k10temp) Convert to use devm_hwmon_device_register_with_info

Convert driver to use devm_hwmon_device_register_with_info to simplify
the code and to reduce its size.

Old size (x86_64):
text data bss dec hex filename
8247 4488 64 12799 31ff drivers/hwmon/k10temp.o
New size:
text data bss dec hex filename
6778 2792 64 9634 25a2 drivers/hwmon/k10temp.o
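
Besides the API change, the conversion switches get_raw_temp() from unsigned int to long, so the Tdie value can be computed by plain subtraction and then clamped at zero. A minimal userspace sketch of that read path follows. The function names raw_temp_mdeg() and tdie_mdeg() are hypothetical; the constants and arithmetic follow the diff below.

```c
#include <stdint.h>

#define CUR_TEMP_SHIFT			21
#define CUR_TEMP_RANGE_SEL_MASK		(UINT32_C(1) << 19)

/*
 * Raw temperature in millidegrees C: the field above bit 21 counts in
 * steps of 0.125 degrees C. If the range-select bit is set, the reading
 * is shifted down by 49 degrees C.
 */
static long raw_temp_mdeg(uint32_t regval, uint32_t adjust_mask)
{
	long temp = (long)(regval >> CUR_TEMP_SHIFT) * 125;

	if (regval & adjust_mask)
		temp -= 49000;
	return temp;
}

/* Tdie is Tctl minus a per-model offset, never reported below zero. */
static long tdie_mdeg(long tctl, long offset)
{
	long val = tctl - offset;

	return val < 0 ? 0 : val;
}
```

For example, a field value of 639 without range select and a field value of 1031 with range select both give 79875 millidegrees. With the 27000 offset listed for Threadripper 29xx parts, that corresponds to a Tdie of 52875.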

Tested-by: Brad Campbell <[email protected]>
Tested-by: Bernhard Gebetsberger <[email protected]>
Tested-by: Holger Kiehl <[email protected]>
Tested-by: Michael Larabel <[email protected]>
Tested-by: Jonathan McDowell <[email protected]>
Tested-by: Ken Moffat <[email protected]>
Tested-by: Darren Salt <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
---
drivers/hwmon/k10temp.c | 213 +++++++++++++++++++++-------------------
1 file changed, 112 insertions(+), 101 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 8807d7da68db..c45f6498a59b 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -1,14 +1,15 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
- * k10temp.c - AMD Family 10h/11h/12h/14h/15h/16h processor hardware monitoring
+ * k10temp.c - AMD Family 10h/11h/12h/14h/15h/16h/17h
+ * processor hardware monitoring
*
* Copyright (c) 2009 Clemens Ladisch <[email protected]>
+ * Copyright (c) 2020 Guenter Roeck <[email protected]>
*/

#include <linux/bitops.h>
#include <linux/err.h>
#include <linux/hwmon.h>
-#include <linux/hwmon-sysfs.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/pci.h>
@@ -127,10 +128,10 @@ static void read_tempreg_nb_f17(struct pci_dev *pdev, u32 *regval)
F17H_M01H_REPORTED_TEMP_CTRL_OFFSET, regval);
}

-static unsigned int get_raw_temp(struct k10temp_data *data)
+static long get_raw_temp(struct k10temp_data *data)
{
- unsigned int temp;
u32 regval;
+ long temp;

data->read_tempreg(data->pdev, &regval);
temp = (regval >> CUR_TEMP_SHIFT) * 125;
@@ -139,118 +140,108 @@ static unsigned int get_raw_temp(struct k10temp_data *data)
return temp;
}

-static ssize_t temp1_input_show(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct k10temp_data *data = dev_get_drvdata(dev);
- unsigned int temp = get_raw_temp(data);
-
- if (temp > data->temp_offset)
- temp -= data->temp_offset;
- else
- temp = 0;
-
- return sprintf(buf, "%u\n", temp);
-}
-
-static ssize_t temp2_input_show(struct device *dev,
- struct device_attribute *devattr, char *buf)
-{
- struct k10temp_data *data = dev_get_drvdata(dev);
- unsigned int temp = get_raw_temp(data);
-
- return sprintf(buf, "%u\n", temp);
-}
-
-static ssize_t temp_label_show(struct device *dev,
- struct device_attribute *devattr, char *buf)
-{
- struct sensor_device_attribute *attr = to_sensor_dev_attr(devattr);
-
- return sprintf(buf, "%s\n", attr->index ? "Tctl" : "Tdie");
-}
+const char *k10temp_temp_label[] = {
+ "Tdie",
+ "Tctl",
+};

-static ssize_t temp1_max_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+static int k10temp_read_labels(struct device *dev,
+ enum hwmon_sensor_types type,
+ u32 attr, int channel, const char **str)
{
- return sprintf(buf, "%d\n", 70 * 1000);
+ *str = k10temp_temp_label[channel];
+ return 0;
}

-static ssize_t temp_crit_show(struct device *dev,
- struct device_attribute *devattr, char *buf)
+static int k10temp_read(struct device *dev, enum hwmon_sensor_types type,
+ u32 attr, int channel, long *val)
{
- struct sensor_device_attribute *attr = to_sensor_dev_attr(devattr);
struct k10temp_data *data = dev_get_drvdata(dev);
- int show_hyst = attr->index;
u32 regval;
- int value;

- data->read_htcreg(data->pdev, &regval);
- value = ((regval >> 16) & 0x7f) * 500 + 52000;
- if (show_hyst)
- value -= ((regval >> 24) & 0xf) * 500;
- return sprintf(buf, "%d\n", value);
+ switch (attr) {
+ case hwmon_temp_input:
+ switch (channel) {
+ case 0: /* Tdie */
+ *val = get_raw_temp(data) - data->temp_offset;
+ if (*val < 0)
+ *val = 0;
+ break;
+ case 1: /* Tctl */
+ *val = get_raw_temp(data);
+ if (*val < 0)
+ *val = 0;
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+ break;
+ case hwmon_temp_max:
+ *val = 70 * 1000;
+ break;
+ case hwmon_temp_crit:
+ data->read_htcreg(data->pdev, &regval);
+ *val = ((regval >> 16) & 0x7f) * 500 + 52000;
+ break;
+ case hwmon_temp_crit_hyst:
+ data->read_htcreg(data->pdev, &regval);
+ *val = (((regval >> 16) & 0x7f)
+ - ((regval >> 24) & 0xf)) * 500 + 52000;
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+ return 0;
}

-static DEVICE_ATTR_RO(temp1_input);
-static DEVICE_ATTR_RO(temp1_max);
-static SENSOR_DEVICE_ATTR_RO(temp1_crit, temp_crit, 0);
-static SENSOR_DEVICE_ATTR_RO(temp1_crit_hyst, temp_crit, 1);
-
-static SENSOR_DEVICE_ATTR_RO(temp1_label, temp_label, 0);
-static DEVICE_ATTR_RO(temp2_input);
-static SENSOR_DEVICE_ATTR_RO(temp2_label, temp_label, 1);
-
-static umode_t k10temp_is_visible(struct kobject *kobj,
- struct attribute *attr, int index)
+static umode_t k10temp_is_visible(const void *_data,
+ enum hwmon_sensor_types type,
+ u32 attr, int channel)
{
- struct device *dev = container_of(kobj, struct device, kobj);
- struct k10temp_data *data = dev_get_drvdata(dev);
+ const struct k10temp_data *data = _data;
struct pci_dev *pdev = data->pdev;
u32 reg;

- switch (index) {
- case 0 ... 1: /* temp1_input, temp1_max */
- default:
- break;
- case 2 ... 3: /* temp1_crit, temp1_crit_hyst */
- if (!data->read_htcreg)
- return 0;
-
- pci_read_config_dword(pdev, REG_NORTHBRIDGE_CAPABILITIES,
- &reg);
- if (!(reg & NB_CAP_HTC))
- return 0;
-
- data->read_htcreg(data->pdev, &reg);
- if (!(reg & HTC_ENABLE))
- return 0;
- break;
- case 4 ... 6: /* temp1_label, temp2_input, temp2_label */
- if (!data->show_tdie)
+ switch (type) {
+ case hwmon_temp:
+ switch (attr) {
+ case hwmon_temp_input:
+ if (channel && !data->show_tdie)
+ return 0;
+ break;
+ case hwmon_temp_max:
+ if (channel)
+ return 0;
+ break;
+ case hwmon_temp_crit:
+ case hwmon_temp_crit_hyst:
+ if (channel || !data->read_htcreg)
+ return 0;
+
+ pci_read_config_dword(pdev,
+ REG_NORTHBRIDGE_CAPABILITIES,
+ &reg);
+ if (!(reg & NB_CAP_HTC))
+ return 0;
+
+ data->read_htcreg(data->pdev, &reg);
+ if (!(reg & HTC_ENABLE))
+ return 0;
+ break;
+ case hwmon_temp_label:
+ if (!data->show_tdie)
+ return 0;
+ break;
+ default:
return 0;
+ }
break;
+ default:
+ return 0;
}
- return attr->mode;
+ return 0444;
}

-static struct attribute *k10temp_attrs[] = {
- &dev_attr_temp1_input.attr,
- &dev_attr_temp1_max.attr,
- &sensor_dev_attr_temp1_crit.dev_attr.attr,
- &sensor_dev_attr_temp1_crit_hyst.dev_attr.attr,
- &sensor_dev_attr_temp1_label.dev_attr.attr,
- &dev_attr_temp2_input.attr,
- &sensor_dev_attr_temp2_label.dev_attr.attr,
- NULL
-};
-
-static const struct attribute_group k10temp_group = {
- .attrs = k10temp_attrs,
- .is_visible = k10temp_is_visible,
-};
-__ATTRIBUTE_GROUPS(k10temp);
-
static bool has_erratum_319(struct pci_dev *pdev)
{
u32 pkg_type, reg_dram_cfg;
@@ -285,8 +276,27 @@ static bool has_erratum_319(struct pci_dev *pdev)
(boot_cpu_data.x86_model == 4 && boot_cpu_data.x86_stepping <= 2);
}

-static int k10temp_probe(struct pci_dev *pdev,
- const struct pci_device_id *id)
+static const struct hwmon_channel_info *k10temp_info[] = {
+ HWMON_CHANNEL_INFO(temp,
+ HWMON_T_INPUT | HWMON_T_MAX |
+ HWMON_T_CRIT | HWMON_T_CRIT_HYST |
+ HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL),
+ NULL
+};
+
+static const struct hwmon_ops k10temp_hwmon_ops = {
+ .is_visible = k10temp_is_visible,
+ .read = k10temp_read,
+ .read_string = k10temp_read_labels,
+};
+
+static const struct hwmon_chip_info k10temp_chip_info = {
+ .ops = &k10temp_hwmon_ops,
+ .info = k10temp_info,
+};
+
+static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
int unreliable = has_erratum_319(pdev);
struct device *dev = &pdev->dev;
@@ -334,8 +344,9 @@ static int k10temp_probe(struct pci_dev *pdev,
}
}

- hwmon_dev = devm_hwmon_device_register_with_groups(dev, "k10temp", data,
- k10temp_groups);
+ hwmon_dev = devm_hwmon_device_register_with_info(dev, "k10temp", data,
+ &k10temp_chip_info,
+ NULL);
return PTR_ERR_OR_ZERO(hwmon_dev);
}

--
2.17.1

2020-01-22 16:10:26

by Guenter Roeck

Subject: [PATCH v4 3/6] hwmon: (k10temp) Report temperatures per CPU die

Zen2 supports reporting temperatures per CPU die (called Core Complex Dies,
or CCD, by AMD). Add support for it to the k10temp driver.
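
The CCD conversion is small enough to try in userspace: the low 12 bits of the register count in steps of 0.125 degrees C with a fixed offset of 305 degrees C, and a sensor is treated as present only if those bits are non-zero. In the sketch below the function names are hypothetical and the raw value used in the usage note was chosen to reproduce the 76.5 degrees C figure from the cover letter; it is not a captured register value.

```c
#include <stdbool.h>
#include <stdint.h>

/* CCD temperature in millidegrees C from a raw register value. */
static long ccd_temp_mdeg(uint32_t regval)
{
	return (long)(regval & 0xfff) * 125 - 305000;
}

/* Probe-time check: a sensor whose low 12 bits read zero is hidden. */
static bool ccd_present(uint32_t regval)
{
	return (regval & 0xfff) != 0;
}
```

A raw value of 0xbec (3052 decimal) converts to 76500 millidegrees. Bits above the low 12 are ignored by both helpers.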

Tested-by: Brad Campbell <[email protected]>
Tested-by: Bernhard Gebetsberger <[email protected]>
Tested-by: Holger Kiehl <[email protected]>
Tested-by: Michael Larabel <[email protected]>
Tested-by: Jonathan McDowell <[email protected]>
Tested-by: Ken Moffat <[email protected]>
Tested-by: Darren Salt <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
---
drivers/hwmon/k10temp.c | 80 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index c45f6498a59b..0af096b061fa 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -5,6 +5,12 @@
*
* Copyright (c) 2009 Clemens Ladisch <[email protected]>
* Copyright (c) 2020 Guenter Roeck <[email protected]>
+ *
+ * Implementation notes:
+ * - CCD1 and CCD2 register address information as well as the calculation to
+ * convert raw register values is from https://github.com/ocerman/zenpower.
+ * The information is not confirmed from chip datasheets, but experiments
+ * suggest that it provides reasonable temperature values.
*/

#include <linux/bitops.h>
@@ -61,6 +67,8 @@ static DEFINE_MUTEX(nb_smu_ind_mutex);

/* F17h M01h Access througn SMN */
#define F17H_M01H_REPORTED_TEMP_CTRL_OFFSET 0x00059800
+#define F17H_M70H_CCD1_TEMP 0x00059954
+#define F17H_M70H_CCD2_TEMP 0x00059958

#define CUR_TEMP_SHIFT 21
#define CUR_TEMP_RANGE_SEL_MASK BIT(19)
@@ -72,6 +80,8 @@ struct k10temp_data {
int temp_offset;
u32 temp_adjust_mask;
bool show_tdie;
+ bool show_tccd1;
+ bool show_tccd2;
};

struct tctl_offset {
@@ -143,6 +153,8 @@ static long get_raw_temp(struct k10temp_data *data)
const char *k10temp_temp_label[] = {
"Tdie",
"Tctl",
+ "Tccd1",
+ "Tccd2",
};

static int k10temp_read_labels(struct device *dev,
@@ -172,6 +184,16 @@ static int k10temp_read(struct device *dev, enum hwmon_sensor_types type,
if (*val < 0)
*val = 0;
break;
+ case 2: /* Tccd1 */
+ amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
+ F17H_M70H_CCD1_TEMP, &regval);
+ *val = (regval & 0xfff) * 125 - 305000;
+ break;
+ case 3: /* Tccd2 */
+ amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
+ F17H_M70H_CCD2_TEMP, &regval);
+ *val = (regval & 0xfff) * 125 - 305000;
+ break;
default:
return -EOPNOTSUPP;
}
@@ -206,8 +228,24 @@ static umode_t k10temp_is_visible(const void *_data,
case hwmon_temp:
switch (attr) {
case hwmon_temp_input:
- if (channel && !data->show_tdie)
+ switch (channel) {
+ case 0: /* Tdie, or Tctl if we don't show it */
+ break;
+ case 1: /* Tctl */
+ if (!data->show_tdie)
+ return 0;
+ break;
+ case 2: /* Tccd1 */
+ if (!data->show_tccd1)
+ return 0;
+ break;
+ case 3: /* Tccd2 */
+ if (!data->show_tccd2)
+ return 0;
+ break;
+ default:
return 0;
+ }
break;
case hwmon_temp_max:
if (channel)
@@ -229,8 +267,24 @@ static umode_t k10temp_is_visible(const void *_data,
return 0;
break;
case hwmon_temp_label:
+ /* No labels if we don't show the die temperature */
if (!data->show_tdie)
return 0;
+ switch (channel) {
+ case 0: /* Tdie */
+ case 1: /* Tctl */
+ break;
+ case 2: /* Tccd1 */
+ if (!data->show_tccd1)
+ return 0;
+ break;
+ case 3: /* Tccd2 */
+ if (!data->show_tccd2)
+ return 0;
+ break;
+ default:
+ return 0;
+ }
break;
default:
return 0;
@@ -281,6 +335,8 @@ static const struct hwmon_channel_info *k10temp_info[] = {
HWMON_T_INPUT | HWMON_T_MAX |
HWMON_T_CRIT | HWMON_T_CRIT_HYST |
HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
+ HWMON_T_INPUT | HWMON_T_LABEL,
HWMON_T_INPUT | HWMON_T_LABEL),
NULL
};
@@ -326,9 +382,31 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
data->read_htcreg = read_htcreg_nb_f15;
data->read_tempreg = read_tempreg_nb_f15;
} else if (boot_cpu_data.x86 == 0x17 || boot_cpu_data.x86 == 0x18) {
+ u32 regval;
+
data->temp_adjust_mask = CUR_TEMP_RANGE_SEL_MASK;
data->read_tempreg = read_tempreg_nb_f17;
data->show_tdie = true;
+
+ switch (boot_cpu_data.x86_model) {
+ case 0x1: /* Zen */
+ case 0x8: /* Zen+ */
+ case 0x11: /* Zen APU */
+ case 0x18: /* Zen+ APU */
+ break;
+ case 0x31: /* Zen2 Threadripper */
+ case 0x71: /* Zen2 */
+ amd_smn_read(amd_pci_dev_to_node_id(pdev),
+ F17H_M70H_CCD1_TEMP, &regval);
+ if (regval & 0xfff)
+ data->show_tccd1 = true;
+
+ amd_smn_read(amd_pci_dev_to_node_id(pdev),
+ F17H_M70H_CCD2_TEMP, &regval);
+ if (regval & 0xfff)
+ data->show_tccd2 = true;
+ break;
+ }
} else {
data->read_htcreg = read_htcreg_pci;
data->read_tempreg = read_tempreg_pci;
--
2.17.1

2020-01-22 19:06:44

by Sebastian Reichel

Subject: Re: [PATCH v4 0/6] hwmon: k10temp driver improvements

Hi,

The series is

Tested-by: Sebastian Reichel <[email protected]>

on 3800X.

idle:

k10temp-pci-00c3
Adapter: PCI adapter
Vcore: 919.00 mV
Vsoc: 1.01 V
Tdie: +41.1°C
Tctl: +41.1°C
Tccd1: +39.8°C
Icore: 0.00 A
Isoc: 4.50 A

with load:

k10temp-pci-00c3
Adapter: PCI adapter
Vcore: 1.29 V
Vsoc: 1.01 V
Tdie: +80.4°C
Tctl: +80.4°C
Tccd1: +78.5°C
Icore: 61.00 A
Isoc: 6.50 A

The debugfs register dumps are also working.

-- Sebastian

On Wed, Jan 22, 2020 at 08:07:54AM -0800, Guenter Roeck wrote:
> This patch series implements various improvements for the k10temp driver.
>
> Patch 1/6 introduces the use of bit operations.
>
> Patch 2/6 converts the driver to use the devm_hwmon_device_register_with_info
> API. This not only simplifies the code and reduces its size, it also
> makes the code easier to maintain and enhance.
>
> Patch 3/6 adds support for reporting Core Complex Die (CCD) temperatures
> on Zen2 (Ryzen and Threadripper) CPUs (note that reporting is incomplete
> for Threadripper CPUs - it is known that additional temperature sensors
> exist, but the register locations are unknown).
>
> Patch 4/6 adds support for reporting core and SoC current and voltage
> information on Ryzen CPUs (note: voltage and current measurements for
> Threadripper and EPYC CPUs are known to exist, but register locations
> are unknown, and values are therefore not reported at this time).
>
> Patch 5/6 removes the maximum temperature from Tdie for Ryzen CPUs.
> It is inaccurate, misleading, and it just doesn't make sense to report
> wrong information.
>
> Patch 6/6 adds debugfs files to provide raw thermal and SVI register
> dumps. This may help in the future to identify additional sensors and/or
> to fix problems.
>
> With all patches in place, output on Ryzen 3900X CPUs looks as follows
> (with the system under load).
>
> k10temp-pci-00c3
> Adapter: PCI adapter
> Vcore: +1.39 V
> Vsoc: +1.18 V
> Tdie: +79.9?C
> Tctl: +79.9?C
> Tccd1: +61.8?C
> Tccd2: +76.5?C
> Icore: +46.00 A
> Isoc: +12.00 A
>
> The voltage and current information is limited to Ryzen CPUs. Voltage
> and current reporting on Threadripper and EPYC CPUs is different, and the
> reported information is either incomplete or wrong. Exclude it for the time
> being; it can always be added if/when more information becomes available.
>
> Tested with the following Ryzen CPUs:
> 1300X A user with this CPU in the system reported somewhat unexpected
> values for Vcore; it isn't entirely if at all clear why that is
> the case. Overall this does not warrant holding up the series.
> 1600
> 1800X
> 2200G
> 2400G
> 2700
> 2700X
> 2950X
> 3600X
> 3800X
> 3900X
> 3950X
> 3970X
> EPYC 7302
> EPYC 7742
>
> Many thanks to everyone who helped to test this series.
>
> ---
> v4: Normalize current calculations to show 1A / LSB for core current and
> 0.25A / LSB for SoC current. The reported current values are board
> specific and need to be scaled using the configuration file.
> Clarified that the maximum temperature of 70 degrees C (which is no
> longer displayed) was associated to Tctl and not to Tdie.
> Added debugfs support.
>
> v3: Added more Tested-by: tags
> Added detection for 3970X, and report Tccd1 for this CPU.
>
> v2: Added Tested-by: tags as received.
> Don't display voltage and current information for Threadripper and EPYC.
> Stop displaying the fixed (and wrong) maximum temperature of 70 degrees C
> for Tdie on model 17h/18h CPUs.
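
The v4 changelog note above about current normalization can be sketched
as follows. This is only an illustration, not the driver code: the two
per-LSB factors come from the changelog, while the function names, the
board_scale parameter and the raw counts (61 and 26, back-computed from
the 3800X load readings rather than read from a register dump) are
assumptions made for the example.

```python
# Normalized scale stated in the v4 changelog, in amperes per LSB.
AMPS_PER_LSB = {"core": 1.0, "soc": 0.25}


def normalized_current(raw_lsb, rail):
    """Return the normalized current in amperes for a raw LSB count."""
    return raw_lsb * AMPS_PER_LSB[rail]


def board_current(raw_lsb, rail, board_scale=1.0):
    """Apply a hypothetical board-specific factor on top of the normalized value.

    The changelog says the reported values are board specific and need to be
    scaled through the configuration file; board_scale stands in for that.
    """
    return normalized_current(raw_lsb, rail) * board_scale


# Raw counts assumed for illustration only.
print(normalized_current(61, "core"))  # 61 LSB at 1 A / LSB
print(normalized_current(26, "soc"))   # 26 LSB at 0.25 A / LSB
```

With these assumed counts the sketch gives 61.0 A and 6.5 A, the same
figures as the Icore and Isoc lines in the load output above. The
board-specific factor is left at 1.0 because its real value depends on
the board.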



2020-01-22 19:38:48

by Guenter Roeck

Subject: Re: [PATCH v4 0/6] hwmon: k10temp driver improvements

On Wed, Jan 22, 2020 at 08:05:08PM +0100, Sebastian Reichel wrote:
> Hi,
>
> The series is
>
> Tested-by: Sebastian Reichel <[email protected]>
>
Thanks again!

Guenter

> on 3800X.
>
> idle:
>
> k10temp-pci-00c3
> Adapter: PCI adapter
> Vcore: 919.00 mV
> Vsoc: 1.01 V
> Tdie: +41.1°C
> Tctl: +41.1°C
> Tccd1: +39.8°C
> Icore: 0.00 A
> Isoc: 4.50 A
>
> with load:
>
> k10temp-pci-00c3
> Adapter: PCI adapter
> Vcore: 1.29 V
> Vsoc: 1.01 V
> Tdie: +80.4°C
> Tctl: +80.4°C
> Tccd1: +78.5°C
> Icore: 61.00 A
> Isoc: 6.50 A
>
> debugfs register dumps are also working.
>
> -- Sebastian
>
> On Wed, Jan 22, 2020 at 08:07:54AM -0800, Guenter Roeck wrote:
> > This patch series implements various improvements for the k10temp driver.
> >
> > Patch 1/6 introduces the use of bit operations.
> >
> > Patch 2/6 converts the driver to use the devm_hwmon_device_register_with_info
> > API. This not only simplifies the code and reduces its size, it also
> > makes the code easier to maintain and enhance.
> >
> > Patch 3/6 adds support for reporting Core Complex Die (CCD) temperatures
> > on Zen2 (Ryzen and Threadripper) CPUs (note that reporting is incomplete
> > for Threadripper CPUs - it is known that additional temperature sensors
> > exist, but the register locations are unknown).
> >
> > Patch 4/6 adds support for reporting core and SoC current and voltage
> > information on Ryzen CPUs (note: voltage and current measurements for
> > Threadripper and EPYC CPUs are known to exist, but register locations
> > are unknown, and values are therefore not reported at this time).
> >
> > Patch 5/6 removes the maximum temperature from Tdie for Ryzen CPUs.
> > It is inaccurate, misleading, and it just doesn't make sense to report
> > wrong information.
> >
> > Patch 6/6 adds debugfs files to provide raw thermal and SVI register
> > dumps. This may help in the future to identify additional sensors and/or
> > to fix problems.
> >
> > With all patches in place, output on Ryzen 3900X CPUs looks as follows
> > (with the system under load).
> >
> > k10temp-pci-00c3
> > Adapter: PCI adapter
> > Vcore: +1.39 V
> > Vsoc: +1.18 V
> > Tdie: +79.9°C
> > Tctl: +79.9°C
> > Tccd1: +61.8°C
> > Tccd2: +76.5°C
> > Icore: +46.00 A
> > Isoc: +12.00 A
> >
> > The voltage and current information is limited to Ryzen CPUs. Voltage
> > and current reporting on Threadripper and EPYC CPUs is different, and the
> > reported information is either incomplete or wrong. Exclude it for the time
> > being; it can always be added if/when more information becomes available.
> >
> > Tested with the following Ryzen CPUs:
> > 1300X A user with this CPU in the system reported somewhat unexpected
> > values for Vcore; it isn't entirely if at all clear why that is
> > the case. Overall this does not warrant holding up the series.
> > 1600
> > 1800X
> > 2200G
> > 2400G
> > 2700
> > 2700X
> > 2950X
> > 3600X
> > 3800X
> > 3900X
> > 3950X
> > 3970X
> > EPYC 7302
> > EPYC 7742
> >
> > Many thanks to everyone who helped to test this series.
> >
> > ---
> > v4: Normalize current calculations to show 1A / LSB for core current and
> > 0.25A / LSB for SoC current. The reported current values are board
> > specific and need to be scaled using the configuration file.
> > Clarified that the maximum temperature of 70 degrees C (which is no
> > longer displayed) was associated to Tctl and not to Tdie.
> > Added debugfs support.
> >
> > v3: Added more Tested-by: tags
> > Added detection for 3970X, and report Tccd1 for this CPU.
> >
> > v2: Added Tested-by: tags as received.
> > Don't display voltage and current information for Threadripper and EPYC.
> > Stop displaying the fixed (and wrong) maximum temperature of 70 degrees C
> > for Tdie on model 17h/18h CPUs.


2020-01-24 00:12:19

by Ken Moffat

Subject: Re: [PATCH v4 6/6] hwmon: (k10temp) Add debugfs support

Hi Guenter,

you asked elsewhere for debugfs files from machines with embedded
graphics. I've pasted diffs below (idle, load) from my 3400G ('Picasso' APU).

On Wed, 22 Jan 2020 at 16:08, Guenter Roeck <[email protected]> wrote:
>
> Show thermal and SVI registers for Family 17h CPUs.
>
> Signed-off-by: Guenter Roeck <[email protected]>
> ---
> drivers/hwmon/k10temp.c | 78 ++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 77 insertions(+), 1 deletion(-)
>
[snipping here for brevity]

--- svi-idle 2020-01-23 23:27:36.576177896 +0000
+++ svi-load 2020-01-23 23:33:05.342392957 +0000
@@ -1,8 +1,8 @@
-0x05a000: 0000000e 0000000e 00000002 01710000
-0x05a010: 014a0010 00000000 0000000e 00000000
-0x05a020: 00000000 00000000 00000080 005f0000
+0x05a000: 0000000e 0000000e 00000002 011f002e
+0x05a010: 014a0017 00000000 0000000e 00000000
+0x05a020: 00000000 00000000 00000080 001a0000
0x05a030: 00000000 00000000 00000021 00000000
-0x05a040: 00000000 00000000 00000000 5f000000
+0x05a040: 00000000 00000000 00000000 1a000000
0x05a050: 68000000 48000000 00000000 0000030a
0x05a060: 00000007 00000000 80000002 80000002
0x05a070: 80000041 00000001 00000008 00000000

--- thm-idle 2020-01-23 23:27:51.969229368 +0000
+++ thm-load 2020-01-23 23:33:19.779445923 +0000
@@ -1,15 +1,15 @@
-0x059800: 24200fef 00ff1001 00002921 000f4240
+0x059800: 3f800fef 00ff1001 00002921 000f4240
0x059810: 800000f9 00000000 00000000 00000000
0x059820: 00000000 00000000 00000000 0fff0078
-0x059830: 00000000 0029ccdf 0029acde 002a2ce2
-0x059840: 002a4ce3 002a0ce1 002a0ce1 002a6ce4
-0x059850: 0029ece0 0029ece0 002a0ce1 002a0ce1
-0x059860: 0029acde 002a8ce5 0029ece0 0029acde
-0x059870: 00298cdd 0029ece0 002a8ce5 002a4ce3
-0x059880: 0029ccdf 002a8ce5 0029acde 00296cdc
-0x059890: 002a4ce3 00296cdc 0029ece0 0029acde
-0x0598a0: 00294cdb 0029ece0 00294cdb 00298cdd
-0x0598b0: 0029acde 00000000 00002100 ffffffff
+0x059830: 00000000 0030cd17 002e8d05 002f4d0b
+0x059840: 00338d2c 0032cd26 00314d1b 0034cd36
+0x059850: 002d8cfd 002e2d02 00300d11 002eed08
+0x059860: 002dccff 002fcd0f 002d4cfb 002e0d01
+0x059870: 002ded00 002f2d0a 00346d33 00344d32
+0x059880: 002f8d0d 00346d33 002f4d0b 0030cd17
+0x059890: 00344d32 00302d12 0031ed20 00386d53
+0x0598a0: 00392d59 0036ad45 0036ed47 0034ad35
+0x0598b0: 0034ad35 00000000 00002100 ffffffff
0x0598c0: 00000000 00000000 00000000 00000000
0x0598d0: 00000000 00000000 00000000 00000000
0x0598e0: 00000000 00000000 00000000 00000000
@@ -20,15 +20,15 @@
0x059930: 00000000 00000000 00000000 00000000
0x059940: 00000000 00000000 00000000 00000000
0x059950: 00000000 00000000 00000000 00000000
-0x059960: 00000000 08400001 00004623 00000039
+0x059960: 00000000 08400001 00008241 00000045
0x059970: c0800005 30c8680e 00024068 00000000
0x059980: 00000000 00000000 00000000 00000000
0x059990: 00000000 00000000 00000000 00000000
0x0599a0: 00000000 00000000 00000000 00000000
0x0599b0: 00000000 00000000 00000000 00000000
-0x0599c0: 00000060 000002a8 0000000c 00000294
-0x0599d0: 0000001b 00000000 00000000 000002a8
-0x0599e0: 0000000c 00000000 00000000 00000001
+0x0599c0: 00000060 00000392 0000001b 000002d4
+0x0599d0: 0000000d 00000000 00000000 00000392
+0x0599e0: 0000001b 00000000 00000000 00000001
0x0599f0: 00000000 00010003 00000000 00000000
0x059a00: 00000000 00000000 00000000 00000000
0x059a10: 0000000e 00000000 00000003 00000000

and the accompanying human-readable sensor output
(these were not all taken at the exact same moment)

--- k10-idle 2020-01-23 23:25:32.020740997 +0000
+++ k10-load 2020-01-23 23:33:01.305378146 +0000
@@ -1,15 +1,15 @@
k10temp-pci-00c3
Adapter: PCI adapter
-Vcore: +0.96 V
-Vsoc: +1.09 V
-Tdie: +36.9°C
-Tctl: +36.9°C
-Icore: +2.00 A
-Isoc: +5.75 A
+Vcore: +1.34 V
+Vsoc: +1.08 V
+Tdie: +62.5°C
+Tctl: +62.5°C
+Icore: +56.00 A
+Isoc: +6.75 A

amdgpu-pci-0900
Adapter: PCI adapter
vddgfx: N/A
vddnb: N/A
-edge: +36.0°C (crit = +80.0°C, hyst = +0.0°C)
+edge: +62.0°C (crit = +80.0°C, hyst = +0.0°C)

Hope this is not a waste of your time.
Would you like similar for the 2500u ?

ĸen
--
I live in a city. I know sparrows from starlings. After that
everything is a duck as far as I'm concerned. -- Monstrous Regiment

2020-01-24 05:00:29

by Guenter Roeck

Subject: Re: [PATCH v4 6/6] hwmon: (k10temp) Add debugfs support

Hi Ken,

On 1/23/20 4:01 PM, Ken Moffat wrote:
> Hi Guenter,
>

Thanks a lot for the additional information. The following
is interesting.

> -0x059960: 00000000 08400001 00004623 00000039
> +0x059960: 00000000 08400001 00008241 00000045

The last two blocks are also temperatures. In the AMD thermal code,
we find definitions for CG_MULT_THERMAL_STATUS and
CG_THERMAL_RANGE. The first consists of 2 x 9 bits (0x23
and 0x41 above for idle and under load), the second is just
a value. On Zen2, the address for those values is 20 higher
(0x05997c instead of 0x059968), but the numbers are pretty
much the same. The AMD thermal code reads those values for
some graphics chips and displays them directly in degrees C.
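
As a rough illustration of the "2 x 9 bit" layout described above, the
following sketch splits the two quoted register values into a low and
a high 9-bit field. The function name and the field order (low field
in bits 0-8, high field in bits 9-17) are assumptions for the example,
and reading the results as degrees C is only the tentative
interpretation discussed here, not something confirmed.

```python
FIELD_BITS = 9
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0x1ff, nine low bits set


def split_fields(regval):
    """Split a 32-bit register value into two adjacent 9-bit fields.

    Returns (low, high): bits 0-8 and bits 9-17 (assumed layout).
    """
    low = regval & FIELD_MASK
    high = (regval >> FIELD_BITS) & FIELD_MASK
    return low, high


# Third word of the 0x059960 row from the idle and load dumps quoted above.
for label, regval in (("idle", 0x00004623), ("load", 0x00008241)):
    low, high = split_fields(regval)
    print(f"{label}: low={low} high={high}")
# prints:
# idle: low=35 high=35
# load: low=65 high=65
```

Both fields carry the same number in each dump, and the values (35 and
65) are in the same range as the Tdie readings reported for that
machine when idle and under load.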

I am just not sure what exactly it represents. I see those
temperatures on 3900X as well. Actually, it looks like all
chips report them, including server chips, so it is not the
graphics temperature. But it is definitely worth keeping an eye
on it; maybe someone can figure out what it is.

> Hope this is not a waste of your time.

No, it is definitely worth it. It will give me data to work with
in the future.

> Would you like similar for the 2500u ?
>
Yes, that would be great.

Thanks,
Guenter

2020-01-25 17:42:12

by Holger Kiehl

Subject: Re: [PATCH v4 0/6] hwmon: k10temp driver improvements

On Wed, 22 Jan 2020, Guenter Roeck wrote:

> This patch series implements various improvements for the k10temp driver.
>
> Patch 1/6 introduces the use of bit operations.
>
> Patch 2/6 converts the driver to use the devm_hwmon_device_register_with_info
> API. This not only simplifies the code and reduces its size, it also
> makes the code easier to maintain and enhance.
>
> Patch 3/6 adds support for reporting Core Complex Die (CCD) temperatures
> on Zen2 (Ryzen and Threadripper) CPUs (note that reporting is incomplete
> for Threadripper CPUs - it is known that additional temperature sensors
> exist, but the register locations are unknown).
>
> Patch 4/6 adds support for reporting core and SoC current and voltage
> information on Ryzen CPUs (note: voltage and current measurements for
> Threadripper and EPYC CPUs are known to exist, but register locations
> are unknown, and values are therefore not reported at this time).
>
> Patch 5/6 removes the maximum temperature from Tdie for Ryzen CPUs.
> It is inaccurate, misleading, and it just doesn't make sense to report
> wrong information.
>
> Patch 6/6 adds debugfs files to provide raw thermal and SVI register
> dumps. This may help in the future to identify additional sensors and/or
> to fix problems.
>
> With all patches in place, output on Ryzen 3900X CPUs looks as follows
> (with the system under load).
>
> k10temp-pci-00c3
> Adapter: PCI adapter
> Vcore: +1.39 V
> Vsoc: +1.18 V
> Tdie: +79.9°C
> Tctl: +79.9°C
> Tccd1: +61.8°C
> Tccd2: +76.5°C
> Icore: +46.00 A
> Isoc: +12.00 A
>
> The voltage and current information is limited to Ryzen CPUs. Voltage
> and current reporting on Threadripper and EPYC CPUs is different, and the
> reported information is either incomplete or wrong. Exclude it for the time
> being; it can always be added if/when more information becomes available.
>
> Tested with the following Ryzen CPUs:
> 1300X A user with this CPU in the system reported somewhat unexpected
> values for Vcore; it isn't entirely if at all clear why that is
> the case. Overall this does not warrant holding up the series.
> 1600
> 1800X
> 2200G
> 2400G
> 2700
> 2700X
> 2950X
> 3600X
> 3800X
> 3900X
> 3950X
> 3970X
> EPYC 7302
> EPYC 7742
>
Below is some more testing on two Deskmini A300 systems, one with a 2400G
and the other with a 3400G. Both have the same BIOS and board.

Regards,
Holger


Deskmini A300 Ryzen 5 2400G

Idle:
=====
cat /sys/kernel/debug/k10temp-0000\:00\:18.3/thm
0x059800: 2c800fef 00ff1001 00002921 000f4240
0x059810: 800000f9 00000000 00000000 00000000
0x059820: 00000000 00000000 00000000 0fff0078
0x059830: 00000000 002ead28 002ead28 002e6d26
0x059840: 002e8d27 002ded23 002e8d27 002f0d2b
0x059850: 002d6d1f 002dcd22 002dcd22 002ecd29
0x059860: 002d6d1f 002e8d27 002ded23 002e4d25
0x059870: 002e6d26 002e4d25 002e4d25 002ead28
0x059880: 002e0d24 002ead28 002d4d1e 002d8d20
0x059890: 002dcd22 002e6d26 002e6d26 002d8d20
0x0598a0: 002dcd22 002dcd22 002ded23 002d4d1e
0x0598b0: 002e4d25 00000000 00002100 ffffffff
0x0598c0: 00000000 00000000 00000000 00000000
0x0598d0: 00000000 00000000 00000000 00000000
0x0598e0: 00000000 00000000 00000000 00000000
0x0598f0: 00000000 00000000 00000000 00000000
0x059900: 00000000 00000000 00000000 00000000
0x059910: 00000000 00000000 00000000 00000000
0x059920: 00000000 00000000 00000000 00000000
0x059930: 00000000 00000000 00000000 00000000
0x059940: 00000000 00000000 00000000 00000000
0x059950: 00000000 00000000 00000000 00000000
0x059960: 00000000 08400001 0000582c 0000004e
0x059970: c0800005 30c8680e 00024068 00000000
0x059980: 00000000 00000000 00000000 00000000
0x059990: 00000000 00000000 00000000 00000000
0x0599a0: 00000000 00000000 00000000 00000000
0x0599b0: 00000000 00000000 00000000 00000000
0x0599c0: 00000060 000002f0 00000006 000002d4
0x0599d0: 00000015 00000000 00000000 000002f0
0x0599e0: 00000006 00000000 00000000 00000001
0x0599f0: 00000000 00010003 00000000 00000000
0x059a00: 00000000 00000000 00000000 00000000
0x059a10: 0000000e 00000000 00000003 00000000
0x059a20: 001f001a 00050003 00000000 00000000
0x059a30: 00df0010 00000000 00000000 00000000
0x059a40: 00000000 00000000 00000007 000000fe
0x059a50: 00000000 00000000 00000000 00000000
0x059a60: 00000000 00130082 0000063f 12110201
0x059a70: 0003005a 00001303 00000000 028a4f5c
0x059a80: 08036927 0021e548 00000000 7fffffff
0x059a90: 00000000 00000043 c00001c0 000000f9
0x059aa0: 00000000 00000000 00000000 00000000
0x059ab0: 00000000 00000000 00000000 00000000
0x059ac0: 00000000 00000000 00000000 00000000
0x059ad0: 00000000 00000000 00000000 00000000
0x059ae0: 00000000 00000000 00000000 00000000
0x059af0: 00000000 00000000 00000000 00000000
0x059b00: 00000000 00000000 00000000 00000000
0x059b10: 00000000 00000000 00000000 00000000
0x059b20: 00000000 00000000 00000000 00000000
0x059b30: 00000000 00000000 00000000 00000000
0x059b40: 00000000 00000000 00000000 00000000
0x059b50: 00000000 00000000 00000000 00000000
0x059b60: 00000000 00000000 00000000 00000000
0x059b70: 00000000 00000000 00000000 00000000
0x059b80: 00000000 00000000 00000000 00000000
0x059b90: 00000000 00000000 00000000 00000000
0x059ba0: 00000000 00000000 00000000 00000000
0x059bb0: 00000000 00000000 00000000 00000000
0x059bc0: 00000000 00000000 00000000 00000000
0x059bd0: 00000000 00000000 00000000 00000000
0x059be0: 00000000 00000000 00000000 00000000
0x059bf0: 00000000 00000000 00000000 00000000
cat /sys/kernel/debug/k10temp-0000\:00\:18.3/svi
0x05a000: 0000000e 0000002e 00000002 017d0013
0x05a010: 01490011 00000000 0000000e 00000000
0x05a020: 00000000 80000000 00000000 007a0000
0x05a030: 00000000 00000000 00000021 00000000
0x05a040: 00000000 00000000 00000000 7a000000
0x05a050: 68000000 48000000 00000000 0000030a
0x05a060: 00000007 00000000 80000002 80000002
0x05a070: 80000041 00000001 00000008 00000000

k10temp-pci-00c3
Adapter: PCI adapter
Vcore: +0.77 V
Vsoc: +1.09 V
Tdie: +44.5°C
Tctl: +44.5°C
Icore: +21.00 A
Isoc: +4.00 A

nvme-pci-0100
Adapter: PCI adapter
Composite: +41.9°C (low = -273.1°C, high = +80.8°C)
(crit = +80.8°C)
Sensor 1: +41.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +44.9°C (low = -273.1°C, high = +65261.8°C)

nct6793-isa-0290
Adapter: ISA adapter
in0: +0.42 V (min = +0.00 V, max = +1.74 V)
in1: +1.85 V (min = +0.00 V, max = +0.00 V) ALARM
in2: +3.41 V (min = +0.00 V, max = +0.00 V) ALARM
in3: +3.41 V (min = +0.00 V, max = +0.00 V) ALARM
in4: +0.26 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +0.67 V (min = +0.00 V, max = +0.00 V) ALARM
in7: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in8: +3.26 V (min = +0.00 V, max = +0.00 V) ALARM
in9: +1.84 V (min = +0.00 V, max = +0.00 V) ALARM
in10: +0.19 V (min = +0.00 V, max = +0.00 V) ALARM
in11: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in12: +1.85 V (min = +0.00 V, max = +0.00 V) ALARM
in13: +1.72 V (min = +0.00 V, max = +0.00 V) ALARM
in14: +0.20 V (min = +0.00 V, max = +0.00 V) ALARM
fan1: 0 RPM (min = 0 RPM)
fan2: 320 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
SYSTIN: +113.0°C (high = +0.0°C, hyst = +0.0°C) sensor = thermistor
CPUTIN: +59.5°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
AUXTIN0: +45.0°C (high = +0.0°C, hyst = +0.0°C) ALARM sensor = thermistor
AUXTIN1: +106.0°C sensor = thermistor
AUXTIN2: +105.0°C sensor = thermistor
AUXTIN3: +102.0°C sensor = thermistor
SMBUSMASTER 0: +44.5°C
PCH_CHIP_CPU_MAX_TEMP: +0.0°C
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
intrusion0: OK
intrusion1: ALARM
beep_enable: disabled

amdgpu-pci-0300
Adapter: PCI adapter
vddgfx: N/A
vddnb: N/A
edge: +44.0°C (crit = +80.0°C, hyst = +0.0°C)


Load:
=====
cat /sys/kernel/debug/k10temp-0000\:00\:18.3/thm
0x059800: 4b400fef 00ff1001 00002921 000f4240
0x059810: 800000f9 00000000 00000000 00000000
0x059820: 00000000 00000000 00000000 0fff0078
0x059830: 00000000 0034ad58 00308d37 0031ed42
0x059840: 0035ed62 00350d5b 00344d55 00378d6f
0x059850: 00308d37 0030cd39 00328d47 0031cd41
0x059860: 0030ad38 00318d3f 002fcd31 0030cd39
0x059870: 00312d3c 00322d44 00376d6e 00376d6e
0x059880: 00322d44 00378d6f 00324d45 00342d54
0x059890: 0037ad70 00336d4e 00350d5b 003dcda1
0x0598a0: 003dcda1 003acd89 003e2da4 00366d66
0x0598b0: 0035cd61 00000000 00002100 ffffffff
0x0598c0: 00000000 00000000 00000000 00000000
0x0598d0: 00000000 00000000 00000000 00000000
0x0598e0: 00000000 00000000 00000000 00000000
0x0598f0: 00000000 00000000 00000000 00000000
0x059900: 00000000 00000000 00000000 00000000
0x059910: 00000000 00000000 00000000 00000000
0x059920: 00000000 00000000 00000000 00000000
0x059930: 00000000 00000000 00000000 00000000
0x059940: 00000000 00000000 00000000 00000000
0x059950: 00000000 00000000 00000000 00000000
0x059960: 00000000 08400001 0000944a 0000004f
0x059970: c0800005 30c8680e 00024068 00000000
0x059980: 00000000 00000000 00000000 00000000
0x059990: 00000000 00000000 00000000 00000000
0x0599a0: 00000000 00000000 00000000 00000000
0x0599b0: 00000000 00000000 00000000 00000000
0x0599c0: 00000060 000003e2 0000001d 000002fc
0x0599d0: 0000000d 00000000 00000000 000003e2
0x0599e0: 0000001d 00000000 00000000 00000001
0x0599f0: 00000000 00010003 00000000 00000000
0x059a00: 00000000 00000000 00000000 00000000
0x059a10: 0000000e 00000000 00000003 00000000
0x059a20: 001f001a 00050003 00000000 00000000
0x059a30: 00df0001 00000000 00000000 00000000
0x059a40: 00000000 00000000 00000007 000000fe
0x059a50: 00000000 00000000 00000000 00000000
0x059a60: 00000000 00130082 0000063f 12110201
0x059a70: 0003005a 00001303 00000000 028a4f5c
0x059a80: 08036927 0021e548 00000000 7fffffff
0x059a90: 00000000 00000043 c00001c0 000000f9
0x059aa0: 00000000 00000000 00000000 00000000
0x059ab0: 00000000 00000000 00000000 00000000
0x059ac0: 00000000 00000000 00000000 00000000
0x059ad0: 00000000 00000000 00000000 00000000
0x059ae0: 00000000 00000000 00000000 00000000
0x059af0: 00000000 00000000 00000000 00000000
0x059b00: 00000000 00000000 00000000 00000000
0x059b10: 00000000 00000000 00000000 00000000
0x059b20: 00000000 00000000 00000000 00000000
0x059b30: 00000000 00000000 00000000 00000000
0x059b40: 00000000 00000000 00000000 00000000
0x059b50: 00000000 00000000 00000000 00000000
0x059b60: 00000000 00000000 00000000 00000000
0x059b70: 00000000 00000000 00000000 00000000
0x059b80: 00000000 00000000 00000000 00000000
0x059b90: 00000000 00000000 00000000 00000000
0x059ba0: 00000000 00000000 00000000 00000000
0x059bb0: 00000000 00000000 00000000 00000000
0x059bc0: 00000000 00000000 00000000 00000000
0x059bd0: 00000000 00000000 00000000 00000000
0x059be0: 00000000 00000000 00000000 00000000
0x059bf0: 00000000 00000000 00000000 00000000
cat /sys/kernel/debug/k10temp-0000\:00\:18.3/svi
0x05a000: 0000000e 0000002e 00000002 01220067
0x05a010: 01490013 00000000 0000000e 00000000
0x05a020: 00000000 00000000 00000000 00150000
0x05a030: 00000000 00000000 00000021 00000000
0x05a040: 00000000 00000000 00000000 15000000
0x05a050: 68000000 48000000 00000000 0000030a
0x05a060: 00000007 00000000 80000002 80000002
0x05a070: 80000041 00000001 00000008 00000000

k10temp-pci-00c3
Adapter: PCI adapter
Vcore: +1.32 V
Vsoc: +1.10 V
Tdie: +74.8°C
Tctl: +74.8°C
Icore: +99.00 A
Isoc: +5.00 A

nvme-pci-0100
Adapter: PCI adapter
Composite: +41.9°C (low = -273.1°C, high = +80.8°C)
(crit = +80.8°C)
Sensor 1: +41.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +44.9°C (low = -273.1°C, high = +65261.8°C)

nct6793-isa-0290
Adapter: ISA adapter
in0: +0.68 V (min = +0.00 V, max = +1.74 V)
in1: +1.85 V (min = +0.00 V, max = +0.00 V) ALARM
in2: +3.41 V (min = +0.00 V, max = +0.00 V) ALARM
in3: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in4: +0.26 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +0.67 V (min = +0.00 V, max = +0.00 V) ALARM
in7: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in8: +3.26 V (min = +0.00 V, max = +0.00 V) ALARM
in9: +1.84 V (min = +0.00 V, max = +0.00 V) ALARM
in10: +0.19 V (min = +0.00 V, max = +0.00 V) ALARM
in11: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in12: +1.85 V (min = +0.00 V, max = +0.00 V) ALARM
in13: +1.72 V (min = +0.00 V, max = +0.00 V) ALARM
in14: +0.20 V (min = +0.00 V, max = +0.00 V) ALARM
fan1: 0 RPM (min = 0 RPM)
fan2: 1934 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
SYSTIN: +113.0°C (high = +0.0°C, hyst = +0.0°C) sensor = thermistor
CPUTIN: +63.5°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
AUXTIN0: +45.0°C (high = +0.0°C, hyst = +0.0°C) ALARM sensor = thermistor
AUXTIN1: +106.0°C sensor = thermistor
AUXTIN2: +105.0°C sensor = thermistor
AUXTIN3: +103.0°C sensor = thermistor
SMBUSMASTER 0: +74.5°C
PCH_CHIP_CPU_MAX_TEMP: +0.0°C
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
intrusion0: OK
intrusion1: ALARM
beep_enable: disabled

amdgpu-pci-0300
Adapter: PCI adapter
vddgfx: N/A
vddnb: N/A
edge: +74.0°C (crit = +80.0°C, hyst = +0.0°C)

-------------------------------------------------------------------------------
Deskmini A300 Ryzen 5 3400G

Idle:
=====
cat /sys/kernel/debug/k10temp-0000\:00\:18.3/thm
0x059800: 25800fef 00ff1001 00002921 000f4240
0x059810: 800000f9 00000000 00000000 00000000
0x059820: 00000000 00000000 00000000 0fff0078
0x059830: 00000000 002aed0d 002aad0b 002a4d08
0x059840: 002a6d09 002acd0c 002a6d09 002aad0b
0x059850: 002a6d09 002a6d09 002a6d09 002a8d0a
0x059860: 002aad0b 002a8d0a 002a4d08 002a2d07
0x059870: 002a8d0a 002a6d09 002ac000 002b2000
0x059880: 002ac000 002a6000 002ac000 002ac000
0x059890: 002b0000 002aa000 002ac000 002aa000
0x0598a0: 002a8000 002a6000 002a2000 002a6000
0x0598b0: 002a4000 00000000 00002100 ffffffff
0x0598c0: 00000000 00000000 00000000 00000000
0x0598d0: 00000000 00000000 00000000 00000000
0x0598e0: 00000000 00000000 00000000 00000000
0x0598f0: 00000000 00000000 00000000 00000000
0x059900: 00000000 00000000 00000000 00000000
0x059910: 00000000 00000000 00000000 00000000
0x059920: 00000000 00000000 00000000 00000000
0x059930: 00000000 00000000 00000000 00000000
0x059940: 00000000 00000000 00000000 00000000
0x059950: 00000000 00000000 00000000 00000000
0x059960: 00000000 08400001 00004a25 00000047
0x059970: c0800005 30c8680e 00024068 00000000
0x059980: 00000000 00000000 00000000 00000000
0x059990: 00000000 00000000 00000000 00000000
0x0599a0: 00000000 00000000 00000000 00000000
0x0599b0: 00000000 00000000 00000000 00000000
0x0599c0: 00000060 000002b4 00000012 0000029c
0x0599d0: 00000007 00000000 00000000 000002b4
0x0599e0: 00000012 00000000 00000000 00000001
0x0599f0: 00000000 00010003 00000000 00000000
0x059a00: 00000000 00000000 00000000 00000000
0x059a10: 0000000e 00000000 00000003 00000000
0x059a20: 001f001a 00050003 00000000 00000000
0x059a30: 000b0010 00000000 00000000 00000000
0x059a40: 00000000 00000000 00000007 000000fe
0x059a50: 00000000 00000000 00000000 00000000
0x059a60: 00000000 00130082 0000063f 12110201
0x059a70: 0003005a 00001303 00000000 028a4f5c
0x059a80: 08036927 0021e548 00000000 7fffffff
0x059a90: 00000000 00000043 c00001c0 800000f9
0x059aa0: 00000000 00000000 00000000 00000000
0x059ab0: 00000000 00000000 00000000 00000000
0x059ac0: 00000000 00000000 00000000 00000000
0x059ad0: 00000000 00000000 00000000 00000000
0x059ae0: 00000000 00000000 00000000 00000000
0x059af0: 00000000 00000000 00000000 00000000
0x059b00: 00000000 00000000 00000000 00000000
0x059b10: 00000000 00000000 00000000 00000000
0x059b20: 00000000 00000000 00000000 00000000
0x059b30: 00000000 00000000 00000000 00000000
0x059b40: 00000000 00000000 00000000 00000000
0x059b50: 00000000 00000000 00000000 00000000
0x059b60: 00000000 00000000 00000000 00000000
0x059b70: 00000000 00000000 00000000 00000000
0x059b80: 00000000 00000000 00000000 00000000
0x059b90: 00000000 00000000 00000000 00000000
0x059ba0: 00000000 00000000 00000000 00000000
0x059bb0: 00000000 00000000 00000000 00000000
0x059bc0: 00000000 00000000 00000000 00000000
0x059bd0: 00000000 00000000 00000000 00000000
0x059be0: 00000000 00000000 00000000 00000000
0x059bf0: 00000000 00000000 00000000 00000000
cat /sys/kernel/debug/k10temp-0000\:00\:18.3/svi
0x05a000: 0000000e 0000002e 00000002 018b0013
0x05a010: 017d000e 00000000 0000000e 00000000
0x05a020: 00000000 80000000 00000000 00890000
0x05a030: 00000000 00000000 00000021 00000000
0x05a040: 00000000 00000000 00000000 89000000
0x05a050: 68000000 7c000000 00000000 0000030a
0x05a060: 00000007 00000000 80000002 80000002
0x05a070: 80000041 00000001 00000008 00000000

nct6793-isa-0290
Adapter: ISA adapter
in0: +2.04 V (min = +0.00 V, max = +1.74 V) ALARM
in1: +1.85 V (min = +0.00 V, max = +0.00 V) ALARM
in2: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in3: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in4: +0.26 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +0.75 V (min = +0.00 V, max = +0.00 V) ALARM
in7: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in8: +3.26 V (min = +0.00 V, max = +0.00 V) ALARM
in9: +1.85 V (min = +0.00 V, max = +0.00 V) ALARM
in10: +0.18 V (min = +0.00 V, max = +0.00 V) ALARM
in11: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in12: +1.85 V (min = +0.00 V, max = +0.00 V) ALARM
in13: +1.71 V (min = +0.00 V, max = +0.00 V) ALARM
in14: +0.19 V (min = +0.00 V, max = +0.00 V) ALARM
fan1: 0 RPM (min = 0 RPM)
fan2: 316 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
SYSTIN: +115.0°C (high = +0.0°C, hyst = +0.0°C) sensor = thermistor
CPUTIN: +46.5°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
AUXTIN0: +40.0°C (high = +0.0°C, hyst = +0.0°C) ALARM sensor = thermistor
AUXTIN1: +108.0°C sensor = thermistor
AUXTIN2: +107.0°C sensor = thermistor
AUXTIN3: +106.0°C sensor = thermistor
SMBUSMASTER 0: +42.5°C
PCH_CHIP_CPU_MAX_TEMP: +0.0°C
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
intrusion0: OK
intrusion1: ALARM
beep_enable: disabled

amdgpu-pci-0400
Adapter: PCI adapter
vddgfx: N/A
vddnb: N/A
edge: +42.0°C (crit = +80.0°C, hyst = +0.0°C)

nvme-pci-0200
Adapter: PCI adapter
Composite: +35.9°C (low = -273.1°C, high = +80.8°C)
(crit = +80.8°C)
Sensor 1: +35.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +37.9°C (low = -273.1°C, high = +65261.8°C)

k10temp-pci-00c3
Adapter: PCI adapter
Vcore: +0.69 V
Vsoc: +0.78 V
Tdie: +42.1°C
Tctl: +42.1°C
Icore: +24.00 A
Isoc: +4.00 A

nvme-pci-0100
Adapter: PCI adapter
Composite: +36.9°C (low = -273.1°C, high = +80.8°C)
(crit = +80.8°C)
Sensor 1: +36.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +39.9°C (low = -273.1°C, high = +65261.8°C)


Load:
=====
cat /sys/kernel/debug/k10temp-0000\:00\:18.3/thm
0x059800: 46e00fef 00ff1001 00002921 000f4240
0x059810: 800000f9 00000000 00000000 00000000
0x059820: 00000000 00000000 00000000 0fff0078
0x059830: 00000000 0033ad52 0030cd3b 00312d3e
0x059840: 00350d5d 00346d58 0033ad52 0035cd63
0x059850: 002f6d30 00300d35 00322d46 0030ed3c
0x059860: 00300d35 0031ed44 002f8d31 00302d36
0x059870: 00300d35 0031cd43 0035cd63 00366d68
0x059880: 00320d45 00360d65 00320d45 00336d50
0x059890: 00364d67 00328d49 00342d56 003b2d8e
0x0598a0: 003c0d95 003aad8a 003aed8c 0035ad62
0x0598b0: 00350d5d 00000000 00002100 ffffffff
0x0598c0: 00000000 00000000 00000000 00000000
0x0598d0: 00000000 00000000 00000000 00000000
0x0598e0: 00000000 00000000 00000000 00000000
0x0598f0: 00000000 00000000 00000000 00000000
0x059900: 00000000 00000000 00000000 00000000
0x059910: 00000000 00000000 00000000 00000000
0x059920: 00000000 00000000 00000000 00000000
0x059930: 00000000 00000000 00000000 00000000
0x059940: 00000000 00000000 00000000 00000000
0x059950: 00000000 00000000 00000000 00000000
0x059960: 00000000 08400001 00008e47 00000049
0x059970: c0800005 30c8680e 00024068 00000000
0x059980: 00000000 00000000 00000000 00000000
0x059990: 00000000 00000000 00000000 00000000
0x0599a0: 00000000 00000000 00000000 00000000
0x0599b0: 00000000 00000000 00000000 00000000
0x0599c0: 00000060 000003c0 0000001b 000002f6
0x0599d0: 00000007 00000000 00000000 000003c0
0x0599e0: 0000001b 00000000 00000000 00000001
0x0599f0: 00000000 00010003 00000000 00000000
0x059a00: 00000000 00000000 00000000 00000000
0x059a10: 0000000e 00000000 00000003 00000000
0x059a20: 001f001a 00050003 00000000 00000000
0x059a30: 000b0010 00000000 00000000 00000000
0x059a40: 00000000 00000000 00000007 000000fe
0x059a50: 00000000 00000000 00000000 00000000
0x059a60: 00000000 00130082 0000063f 12110201
0x059a70: 0003005a 00001303 00000000 028a4f5c
0x059a80: 08036927 0021e548 00000000 7fffffff
0x059a90: 00000000 00000043 c00001c0 000000f9
0x059aa0: 00000000 00000000 00000000 00000000
0x059ab0: 00000000 00000000 00000000 00000000
0x059ac0: 00000000 00000000 00000000 00000000
0x059ad0: 00000000 00000000 00000000 00000000
0x059ae0: 00000000 00000000 00000000 00000000
0x059af0: 00000000 00000000 00000000 00000000
0x059b00: 00000000 00000000 00000000 00000000
0x059b10: 00000000 00000000 00000000 00000000
0x059b20: 00000000 00000000 00000000 00000000
0x059b30: 00000000 00000000 00000000 00000000
0x059b40: 00000000 00000000 00000000 00000000
0x059b50: 00000000 00000000 00000000 00000000
0x059b60: 00000000 00000000 00000000 00000000
0x059b70: 00000000 00000000 00000000 00000000
0x059b80: 00000000 00000000 00000000 00000000
0x059b90: 00000000 00000000 00000000 00000000
0x059ba0: 00000000 00000000 00000000 00000000
0x059bb0: 00000000 00000000 00000000 00000000
0x059bc0: 00000000 00000000 00000000 00000000
0x059bd0: 00000000 00000000 00000000 00000000
0x059be0: 00000000 00000000 00000000 00000000
0x059bf0: 00000000 00000000 00000000 00000000
cat /sys/kernel/debug/k10temp-0000\:00\:18.3/svi
0x05a000: 0000000e 0000002e 00000002 012e0078
0x05a010: 01490019 00000000 0000000e 00000000
0x05a020: 00000000 00000000 00000000 00210000
0x05a030: 00000000 00000000 00000021 00000000
0x05a040: 00000000 00000000 00000000 21000000
0x05a050: 68000000 48000000 00000000 0000030a
0x05a060: 00000007 00000000 00000002 80000002
0x05a070: 80000041 00000001 00000008 00000000
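
For anyone who wants to cross-check the svi dump against the reported Vcore/Vsoc values, here is a minimal Python sketch. The register offsets (+0x0c for the core plane, +0x10 for the SoC plane), the bit fields, and the scaling factors are assumptions based on my reading of the k10temp driver; they are not stated in this message. The helper names are made up for illustration.

```python
import re

# First two rows of the svi dump above, copied verbatim.
SVI_DUMP = """\
0x05a000: 0000000e 0000002e 00000002 012e0078
0x05a010: 01490019 00000000 0000000e 00000000
"""

# Assumed layout (not stated in this message).
SVI_BASE = 0x5A000
PLANE0 = SVI_BASE + 0x0C   # assumed: core plane telemetry
PLANE1 = SVI_BASE + 0x10   # assumed: SoC plane telemetry

# One dump row: an address, a colon, then one or more 32-bit hex words.
LINE_RE = re.compile(r"^(0x[0-9a-fA-F]+):((?:\s+[0-9a-fA-F]{8})+)\s*$")


def parse_dump(text):
    """Return {address: 32-bit value} for every word in a dump."""
    regs = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip shell commands, blank lines, etc.
        base = int(m.group(1), 16)
        for i, word in enumerate(m.group(2).split()):
            regs[base + 4 * i] = int(word, 16)
    return regs


def millivolts(regval):
    """Assumed: bits 23:16 hold a VID code, 6.25 mV per step down from 1.55 V."""
    vid = (regval >> 16) & 0xFF
    # Integer round-to-nearest of (155000 - vid * 625) / 100.
    return (155000 - vid * 625 + 50) // 100


def milliamps(regval, ma_per_lsb):
    """Assumed: bits 7:0 hold the current, scaled by a per-plane factor."""
    return (regval & 0xFF) * ma_per_lsb


regs = parse_dump(SVI_DUMP)
print("Vcore:", millivolts(regs[PLANE0]), "mV")
print("Vsoc: ", millivolts(regs[PLANE1]), "mV")
print("Icore:", milliamps(regs[PLANE0], 1000), "mA")  # assumed 1 A per LSB
print("Isoc: ", milliamps(regs[PLANE1], 250), "mA")   # assumed 0.25 A per LSB
```

Under those assumptions the dump decodes to 1263 mV and 1094 mV, which agrees with the Vcore +1.26 V and Vsoc +1.09 V shown in the sensors output below. The currents decode to 120 A and 6.25 A, close to the reported Icore and Isoc; the dump and the sensors output were presumably not sampled at the same instant.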

nct6793-isa-0290
Adapter: ISA adapter
in0: +0.66 V (min = +0.00 V, max = +1.74 V)
in1: +1.85 V (min = +0.00 V, max = +0.00 V) ALARM
in2: +3.41 V (min = +0.00 V, max = +0.00 V) ALARM
in3: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in4: +0.25 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +0.75 V (min = +0.00 V, max = +0.00 V) ALARM
in7: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in8: +3.26 V (min = +0.00 V, max = +0.00 V) ALARM
in9: +1.85 V (min = +0.00 V, max = +0.00 V) ALARM
in10: +0.18 V (min = +0.00 V, max = +0.00 V) ALARM
in11: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in12: +1.85 V (min = +0.00 V, max = +0.00 V) ALARM
in13: +1.72 V (min = +0.00 V, max = +0.00 V) ALARM
in14: +0.19 V (min = +0.00 V, max = +0.00 V) ALARM
fan1: 0 RPM (min = 0 RPM)
fan2: 1797 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
SYSTIN: +115.0°C (high = +0.0°C, hyst = +0.0°C) sensor = thermistor
CPUTIN: +57.0°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
AUXTIN0: +39.0°C (high = +0.0°C, hyst = +0.0°C) ALARM sensor = thermistor
AUXTIN1: +108.0°C sensor = thermistor
AUXTIN2: +107.0°C sensor = thermistor
AUXTIN3: +106.0°C sensor = thermistor
SMBUSMASTER 0: +71.0°C
PCH_CHIP_CPU_MAX_TEMP: +0.0°C
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
intrusion0: OK
intrusion1: ALARM
beep_enable: disabled

amdgpu-pci-0400
Adapter: PCI adapter
vddgfx: N/A
vddnb: N/A
edge: +71.0°C (crit = +80.0°C, hyst = +0.0°C)

nvme-pci-0200
Adapter: PCI adapter
Composite: +35.9°C (low = -273.1°C, high = +80.8°C)
(crit = +80.8°C)
Sensor 1: +35.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +37.9°C (low = -273.1°C, high = +65261.8°C)

k10temp-pci-00c3
Adapter: PCI adapter
Vcore: +1.26 V
Vsoc: +1.09 V
Tdie: +71.0°C
Tctl: +71.0°C
Icore: +119.00 A
Isoc: +6.50 A

nvme-pci-0100
Adapter: PCI adapter
Composite: +36.9°C (low = -273.1°C, high = +80.8°C)
(crit = +80.8°C)
Sensor 1: +36.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +39.9°C (low = -273.1°C, high = +65261.8°C)