2024-06-06 16:17:06

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v4 0/8] Enhance AMD SMN Error Checking

Hi all,

This set implements more robust error checking for AMD System Management
Network (SMN) accesses.

Patches 1-3:
- Pre-patches in AMD64 EDAC and K10Temp modules.
- Required in order to avoid build warnings with the
introduction of the __must_check attribute in patch 4.

Patch 4:
- Introduces __must_check attribute for SMN access functions.

Patches 5-6:
- Optional cleanup patches in k10temp.
- Not required for the SMN access issue, but I thought they may
be good to do.

Patches 7-8:
- Minor changes in k10temp.
- Fix W=2 warnings found during build testing.

Thanks,
Yazen

Changes in v4:
- Rebased on tip/x86/urgent.
- Reword commit message for patch 4.
- Dropped stable tags.
- Included additional tags from Guenter.
- Link to v3: https://lore.kernel.org/r/[email protected]

Changes in v3:
- Added tags from Guenter and Mario.
- Removed unused variable in patch 2.
- Added patches 7 and 8 to fix extra warnings in k10temp.
- Link to v2: https://lore.kernel.org/r/[email protected]

Changes in v2:
- Address return code comments from Guenter.
- Link to v1: https://lore.kernel.org/r/[email protected]

To: Guenter Roeck <[email protected]>
To: [email protected]
To: Yazen Ghannam <[email protected]>
Cc: Mario Limonciello <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Yazen Ghannam (8):
EDAC/amd64: Remove unused register accesses
EDAC/amd64: Check return value of amd_smn_read()
hwmon: (k10temp) Check return value of amd_smn_read()
x86/amd_nb: Enhance SMN access error checking
hwmon: (k10temp) Define helper function to read CCD temp
hwmon: (k10temp) Reduce k10temp_get_ccd_support() parameters
hwmon: (k10temp) Remove unused HAVE_TDIE() macro
hwmon: (k10temp) Rename _data variable

arch/x86/include/asm/amd_nb.h | 4 +--
arch/x86/kernel/amd_nb.c | 39 ++++++++++++++++++++----
drivers/edac/amd64_edac.c | 69 ++++++++++++++++++++++++-------------------
drivers/edac/amd64_edac.h | 4 ---
drivers/hwmon/k10temp.c | 62 +++++++++++++++++++++++++-------------
5 files changed, 115 insertions(+), 63 deletions(-)

base_commit: c625dabbf1c4a8e77e4734014f2fde7aa9071a1f
change-id: 20240329-fix-smn-bad-read-2ee9ddcf3989



2024-06-06 16:17:07

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v4 1/8] EDAC/amd64: Remove unused register accesses

A number of UMC registers are read only for the purpose of debug
printing. They are not used in any calculations. Nor do they have any
specific debug value.

Remove these register accesses.

Signed-off-by: Yazen Ghannam <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
---
drivers/edac/amd64_edac.c | 18 +-----------------
drivers/edac/amd64_edac.h | 4 ----
2 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 1f3520d76861..4cedfb3b4cb6 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -20,7 +20,6 @@ static inline u32 get_umc_reg(struct amd64_pvt *pvt, u32 reg)
return reg;

switch (reg) {
- case UMCCH_ADDR_CFG: return UMCCH_ADDR_CFG_DDR5;
case UMCCH_ADDR_MASK_SEC: return UMCCH_ADDR_MASK_SEC_DDR5;
case UMCCH_DIMM_CFG: return UMCCH_DIMM_CFG_DDR5;
}
@@ -1339,22 +1338,15 @@ static void umc_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
static void umc_dump_misc_regs(struct amd64_pvt *pvt)
{
struct amd64_umc *umc;
- u32 i, tmp, umc_base;
+ u32 i;

for_each_umc(i) {
- umc_base = get_umc_base(i);
umc = &pvt->umc[i];

edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
-
- amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ECC_BAD_SYMBOL, &tmp);
- edac_dbg(1, "UMC%d ECC bad symbol: 0x%x\n", i, tmp);
-
- amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_UMC_CAP, &tmp);
- edac_dbg(1, "UMC%d UMC cap: 0x%x\n", i, tmp);
edac_dbg(1, "UMC%d UMC cap high: 0x%x\n", i, umc->umc_cap_hi);

edac_dbg(1, "UMC%d ECC capable: %s, ChipKill ECC capable: %s\n",
@@ -1367,14 +1359,6 @@ static void umc_dump_misc_regs(struct amd64_pvt *pvt)
edac_dbg(1, "UMC%d x16 DIMMs present: %s\n",
i, (umc->dimm_cfg & BIT(7)) ? "yes" : "no");

- if (umc->dram_type == MEM_LRDDR4 || umc->dram_type == MEM_LRDDR5) {
- amd_smn_read(pvt->mc_node_id,
- umc_base + get_umc_reg(pvt, UMCCH_ADDR_CFG),
- &tmp);
- edac_dbg(1, "UMC%d LRDIMM %dx rank multiply\n",
- i, 1 << ((tmp >> 4) & 0x3));
- }
-
umc_debug_display_dimm_sizes(pvt, i);
}
}
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index b879b12971e7..17228d07de4c 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -256,15 +256,11 @@
#define UMCCH_ADDR_MASK 0x20
#define UMCCH_ADDR_MASK_SEC 0x28
#define UMCCH_ADDR_MASK_SEC_DDR5 0x30
-#define UMCCH_ADDR_CFG 0x30
-#define UMCCH_ADDR_CFG_DDR5 0x40
#define UMCCH_DIMM_CFG 0x80
#define UMCCH_DIMM_CFG_DDR5 0x90
#define UMCCH_UMC_CFG 0x100
#define UMCCH_SDP_CTRL 0x104
#define UMCCH_ECC_CTRL 0x14C
-#define UMCCH_ECC_BAD_SYMBOL 0xD90
-#define UMCCH_UMC_CAP 0xDF0
#define UMCCH_UMC_CAP_HI 0xDF4

/* UMC CH bitfields */

--
2.34.1


2024-06-06 16:17:28

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v4 2/8] EDAC/amd64: Check return value of amd_smn_read()

Check the return value of amd_smn_read() before saving a value. This
ensures invalid values aren't saved. The struct umc instance is
initialized to 0 during memory allocation. Therefore, a bad read will
keep the value as 0 providing the expected Read-as-Zero behavior.

Furthermore, the __must_check attribute will be added to amd_smn_read().
Therefore, this change is required to avoid compile-time warnings.

Signed-off-by: Yazen Ghannam <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
---
drivers/edac/amd64_edac.c | 51 ++++++++++++++++++++++++++++++++++-------------
1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 4cedfb3b4cb6..e958ade6ff24 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1436,6 +1436,7 @@ static void umc_read_base_mask(struct amd64_pvt *pvt)
u32 *base, *base_sec;
u32 *mask, *mask_sec;
int cs, umc;
+ u32 tmp;

for_each_umc(umc) {
umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR;
@@ -1448,13 +1449,17 @@ static void umc_read_base_mask(struct amd64_pvt *pvt)
base_reg = umc_base_reg + (cs * 4);
base_reg_sec = umc_base_reg_sec + (cs * 4);

- if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
+ if (!amd_smn_read(pvt->mc_node_id, base_reg, &tmp)) {
+ *base = tmp;
edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
umc, cs, *base, base_reg);
+ }

- if (!amd_smn_read(pvt->mc_node_id, base_reg_sec, base_sec))
+ if (!amd_smn_read(pvt->mc_node_id, base_reg_sec, &tmp)) {
+ *base_sec = tmp;
edac_dbg(0, " DCSB_SEC%d[%d]=0x%08x reg: 0x%x\n",
umc, cs, *base_sec, base_reg_sec);
+ }
}

umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
@@ -1467,13 +1472,17 @@ static void umc_read_base_mask(struct amd64_pvt *pvt)
mask_reg = umc_mask_reg + (cs * 4);
mask_reg_sec = umc_mask_reg_sec + (cs * 4);

- if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
+ if (!amd_smn_read(pvt->mc_node_id, mask_reg, &tmp)) {
+ *mask = tmp;
edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
umc, cs, *mask, mask_reg);
+ }

- if (!amd_smn_read(pvt->mc_node_id, mask_reg_sec, mask_sec))
+ if (!amd_smn_read(pvt->mc_node_id, mask_reg_sec, &tmp)) {
+ *mask_sec = tmp;
edac_dbg(0, " DCSM_SEC%d[%d]=0x%08x reg: 0x%x\n",
umc, cs, *mask_sec, mask_reg_sec);
+ }
}
}
}
@@ -2892,7 +2901,7 @@ static void umc_read_mc_regs(struct amd64_pvt *pvt)
{
u8 nid = pvt->mc_node_id;
struct amd64_umc *umc;
- u32 i, umc_base;
+ u32 i, tmp, umc_base;

/* Read registers from each UMC */
for_each_umc(i) {
@@ -2900,11 +2909,20 @@ static void umc_read_mc_regs(struct amd64_pvt *pvt)
umc_base = get_umc_base(i);
umc = &pvt->umc[i];

- amd_smn_read(nid, umc_base + get_umc_reg(pvt, UMCCH_DIMM_CFG), &umc->dimm_cfg);
- amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
- amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
- amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
- amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
+ if (!amd_smn_read(nid, umc_base + get_umc_reg(pvt, UMCCH_DIMM_CFG), &tmp))
+ umc->dimm_cfg = tmp;
+
+ if (!amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &tmp))
+ umc->umc_cfg = tmp;
+
+ if (!amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &tmp))
+ umc->sdp_ctrl = tmp;
+
+ if (!amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &tmp))
+ umc->ecc_ctrl = tmp;
+
+ if (!amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &tmp))
+ umc->umc_cap_hi = tmp;
}
}

@@ -3633,16 +3651,21 @@ static void gpu_read_mc_regs(struct amd64_pvt *pvt)
{
u8 nid = pvt->mc_node_id;
struct amd64_umc *umc;
- u32 i, umc_base;
+ u32 i, tmp, umc_base;

/* Read registers from each UMC */
for_each_umc(i) {
umc_base = gpu_get_umc_base(pvt, i, 0);
umc = &pvt->umc[i];

- amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
- amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
- amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
+ if (!amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &tmp))
+ umc->umc_cfg = tmp;
+
+ if (!amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &tmp))
+ umc->sdp_ctrl = tmp;
+
+ if (!amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &tmp))
+ umc->ecc_ctrl = tmp;
}
}


--
2.34.1


2024-06-06 16:17:29

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v4 3/8] hwmon: (k10temp) Check return value of amd_smn_read()

Check the return value of amd_smn_read() before saving a value. This
ensures invalid values aren't saved or used.

There are three cases here with slightly different behavior.

1) read_tempreg_nb_zen():
This is a function pointer which does not include a return code.
In this case, set the register value to 0 on failure. This
enforces Read-as-Zero behavior.

2) k10temp_read_temp():
This function does have return codes, so return the error code
from the failed register read. Continued operation is not
necessary, since there is no valid data from the register.
Furthermore, if the register value was set to 0, then the
following operation would underflow.

3) k10temp_get_ccd_support():
This function reads the same register from multiple CCD
instances in a loop. And a bitmask is formed if a specific bit
is set in each register instance. The loop should continue on a
failed register read, skipping the bit check.

Furthermore, the __must_check attribute will be added to amd_smn_read().
Therefore, this change is required to avoid compile-time warnings.

Signed-off-by: Yazen Ghannam <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
---
drivers/hwmon/k10temp.c | 36 +++++++++++++++++++++++++++---------
1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 8092312c0a87..6cad35e7f182 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -153,8 +153,9 @@ static void read_tempreg_nb_f15(struct pci_dev *pdev, u32 *regval)

static void read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
{
- amd_smn_read(amd_pci_dev_to_node_id(pdev),
- ZEN_REPORTED_TEMP_CTRL_BASE, regval);
+ if (amd_smn_read(amd_pci_dev_to_node_id(pdev),
+ ZEN_REPORTED_TEMP_CTRL_BASE, regval))
+ *regval = 0;
}

static long get_raw_temp(struct k10temp_data *data)
@@ -205,6 +206,7 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
long *val)
{
struct k10temp_data *data = dev_get_drvdata(dev);
+ int ret = -EOPNOTSUPP;
u32 regval;

switch (attr) {
@@ -221,13 +223,17 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
*val = 0;
break;
case 2 ... 13: /* Tccd{1-12} */
- amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
- ZEN_CCD_TEMP(data->ccd_offset, channel - 2),
- &regval);
+ ret = amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
+ ZEN_CCD_TEMP(data->ccd_offset, channel - 2),
+ &regval);
+
+ if (ret)
+ return ret;
+
*val = (regval & ZEN_CCD_TEMP_MASK) * 125 - 49000;
break;
default:
- return -EOPNOTSUPP;
+ return ret;
}
break;
case hwmon_temp_max:
@@ -243,7 +249,7 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
- ((regval >> 24) & 0xf)) * 500 + 52000;
break;
default:
- return -EOPNOTSUPP;
+ return ret;
}
return 0;
}
@@ -381,8 +387,20 @@ static void k10temp_get_ccd_support(struct pci_dev *pdev,
int i;

for (i = 0; i < limit; i++) {
- amd_smn_read(amd_pci_dev_to_node_id(pdev),
- ZEN_CCD_TEMP(data->ccd_offset, i), &regval);
+ /*
+ * Ignore inaccessible CCDs.
+ *
+ * Some systems will return a register value of 0, and the TEMP_VALID
+ * bit check below will naturally fail.
+ *
+ * Other systems will return a PCI_ERROR_RESPONSE (0xFFFFFFFF) for
+ * the register value. And this will incorrectly pass the TEMP_VALID
+ * bit check.
+ */
+ if (amd_smn_read(amd_pci_dev_to_node_id(pdev),
+ ZEN_CCD_TEMP(data->ccd_offset, i), &regval))
+ continue;
+
if (regval & ZEN_CCD_TEMP_VALID)
data->show_temp |= BIT(TCCD_BIT(i));
}

--
2.34.1


2024-06-06 16:18:03

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v4 4/8] x86/amd_nb: Enhance SMN access error checking

AMD Zen-based systems use a System Management Network (SMN) that
provides access to implementation-specific registers.

SMN accesses are done indirectly through an index/data pair in PCI
config space. The accesses can fail for a variety of reasons.

Include code comments to describe some possible scenarios.

Require error checking for callers of amd_smn_read() and amd_smn_write().
This is needed because many error conditions cannot be checked by these
functions.

Also, drop the extern keyword as it's not needed. And remove a warning
that will not be triggered in many cases.

Signed-off-by: Yazen Ghannam <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
---
arch/x86/include/asm/amd_nb.h | 4 ++--
arch/x86/kernel/amd_nb.c | 39 ++++++++++++++++++++++++++++++++++-----
2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 5c37944c8a5e..6f3b6aef47ba 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -21,8 +21,8 @@ extern int amd_numa_init(void);
extern int amd_get_subcaches(int);
extern int amd_set_subcaches(int, unsigned long);

-extern int amd_smn_read(u16 node, u32 address, u32 *value);
-extern int amd_smn_write(u16 node, u32 address, u32 value);
+int __must_check amd_smn_read(u16 node, u32 address, u32 *value);
+int __must_check amd_smn_write(u16 node, u32 address, u32 value);

struct amd_l3_cache {
unsigned indices;
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 027a8c7a2c9e..0616a7727898 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -180,6 +180,38 @@ static struct pci_dev *next_northbridge(struct pci_dev *dev,
return dev;
}

+/*
+ * SMN accesses may fail in ways that are difficult to detect here in the called
+ * functions smn_read() and smn_write(). Therefore, callers of these functions
+ * must do their own checking based on what behavior they expect.
+ *
+ * For SMN reads, the returned SMN value may be zero if the register is
+ * Read-as-Zero . Or it may be a "PCI Error Response", e.g. all 0xFFs. The "PCI
+ * Error Response" can be checked here, and a proper error code can be returned.
+ * But the Read-as-Zero response cannot be verified here. A value of 0 may be
+ * correct in some cases, so callers must check that this correct is for the
+ * register/fields they need.
+ *
+ * For SMN writes, success can be determined through a "write and read back"
+ * procedure. However, this is not robust when done here.
+ *
+ * Possible issues:
+ * 1) Bits that are "Write-1-to-Clear". In this case, the read value should
+ * *not* match the write value.
+ * 2) Bits that are "Read-as-Zero"/"Writes-Ignored". This information cannot be
+ * known here.
+ * 3) Bits that are "Reserved / Set to 1". Ditto above.
+ *
+ * Callers of amd_smn_write() should do the "write and read back" check themselves,
+ * if needed.
+ *
+ * For #1, they can see if their target bits got cleared.
+ *
+ * For #2 and #3, they can check if their target bits got set as intended.
+ *
+ * This matches what is done for rdmsr/wrmsr. As long as there's no #GP, then
+ * the operation is considered a success, and the caller does their own checking.
+ */
static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write)
{
struct pci_dev *root;
@@ -202,9 +234,6 @@ static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write)

err = (write ? pci_write_config_dword(root, 0x64, *value)
: pci_read_config_dword(root, 0x64, value));
- if (err)
- pr_warn("Error %s SMN address 0x%x.\n",
- (write ? "writing to" : "reading from"), address);

out_unlock:
mutex_unlock(&smn_mutex);
@@ -213,7 +242,7 @@ static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write)
return err;
}

-int amd_smn_read(u16 node, u32 address, u32 *value)
+int __must_check amd_smn_read(u16 node, u32 address, u32 *value)
{
int err = __amd_smn_rw(node, address, value, false);

@@ -226,7 +255,7 @@ int amd_smn_read(u16 node, u32 address, u32 *value)
}
EXPORT_SYMBOL_GPL(amd_smn_read);

-int amd_smn_write(u16 node, u32 address, u32 value)
+int __must_check amd_smn_write(u16 node, u32 address, u32 value)
{
return __amd_smn_rw(node, address, &value, true);
}

--
2.34.1


2024-06-06 16:18:20

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v4 5/8] hwmon: (k10temp) Define helper function to read CCD temp

The CCD temperature register is read in two places. These reads are done
using an AMD SMN access, and a number of parameters are needed for the
operation.

Move the SMN access and parameter gathering into a helper function in
order to simply the code flow. This also has a benefit of centralizing
the hardware register access in a single place in case fixes or special
decoding is required.

Signed-off-by: Yazen Ghannam <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
---
drivers/hwmon/k10temp.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 6cad35e7f182..315c52de6e54 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -158,6 +158,13 @@ static void read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
*regval = 0;
}

+static int read_ccd_temp_reg(struct k10temp_data *data, int ccd, u32 *regval)
+{
+ u16 node_id = amd_pci_dev_to_node_id(data->pdev);
+
+ return amd_smn_read(node_id, ZEN_CCD_TEMP(data->ccd_offset, ccd), regval);
+}
+
static long get_raw_temp(struct k10temp_data *data)
{
u32 regval;
@@ -223,9 +230,7 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
*val = 0;
break;
case 2 ... 13: /* Tccd{1-12} */
- ret = amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
- ZEN_CCD_TEMP(data->ccd_offset, channel - 2),
- &regval);
+ ret = read_ccd_temp_reg(data, channel - 2, &regval);

if (ret)
return ret;
@@ -397,8 +402,7 @@ static void k10temp_get_ccd_support(struct pci_dev *pdev,
* the register value. And this will incorrectly pass the TEMP_VALID
* bit check.
*/
- if (amd_smn_read(amd_pci_dev_to_node_id(pdev),
- ZEN_CCD_TEMP(data->ccd_offset, i), &regval))
+ if (read_ccd_temp_reg(data, i, &regval))
continue;

if (regval & ZEN_CCD_TEMP_VALID)

--
2.34.1


2024-06-06 16:18:35

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v4 7/8] hwmon: (k10temp) Remove unused HAVE_TDIE() macro

...to address the following warning:

drivers/hwmon/k10temp.c:104:9:
warning: macro is not used [-Wunused-macros]

No functional change is intended.

Signed-off-by: Yazen Ghannam <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
---
drivers/hwmon/k10temp.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 6deb272c7cef..a2d203304533 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -101,7 +101,6 @@ struct k10temp_data {
#define TCCD_BIT(x) ((x) + 2)

#define HAVE_TEMP(d, channel) ((d)->show_temp & BIT(channel))
-#define HAVE_TDIE(d) HAVE_TEMP(d, TDIE_BIT)

struct tctl_offset {
u8 model;

--
2.34.1


2024-06-06 16:21:21

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v4 6/8] hwmon: (k10temp) Reduce k10temp_get_ccd_support() parameters

Currently, k10temp_get_ccd_support() takes as input "pdev" and "data".
However, "pdev" is already included in "data". Furthermore, the "pdev"
parameter is no longer used in k10temp_get_ccd_support(), since its use
was moved into read_ccd_temp_reg().

Drop the "pdev" input parameter as it is no longer needed.

No functional change is intended.

Signed-off-by: Yazen Ghannam <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
---
drivers/hwmon/k10temp.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 315c52de6e54..6deb272c7cef 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -385,8 +385,7 @@ static const struct hwmon_chip_info k10temp_chip_info = {
.info = k10temp_info,
};

-static void k10temp_get_ccd_support(struct pci_dev *pdev,
- struct k10temp_data *data, int limit)
+static void k10temp_get_ccd_support(struct k10temp_data *data, int limit)
{
u32 regval;
int i;
@@ -456,18 +455,18 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
case 0x11: /* Zen APU */
case 0x18: /* Zen+ APU */
data->ccd_offset = 0x154;
- k10temp_get_ccd_support(pdev, data, 4);
+ k10temp_get_ccd_support(data, 4);
break;
case 0x31: /* Zen2 Threadripper */
case 0x60: /* Renoir */
case 0x68: /* Lucienne */
case 0x71: /* Zen2 */
data->ccd_offset = 0x154;
- k10temp_get_ccd_support(pdev, data, 8);
+ k10temp_get_ccd_support(data, 8);
break;
case 0xa0 ... 0xaf:
data->ccd_offset = 0x300;
- k10temp_get_ccd_support(pdev, data, 8);
+ k10temp_get_ccd_support(data, 8);
break;
}
} else if (boot_cpu_data.x86 == 0x19) {
@@ -481,21 +480,21 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
case 0x21: /* Zen3 Ryzen Desktop */
case 0x50 ... 0x5f: /* Green Sardine */
data->ccd_offset = 0x154;
- k10temp_get_ccd_support(pdev, data, 8);
+ k10temp_get_ccd_support(data, 8);
break;
case 0x40 ... 0x4f: /* Yellow Carp */
data->ccd_offset = 0x300;
- k10temp_get_ccd_support(pdev, data, 8);
+ k10temp_get_ccd_support(data, 8);
break;
case 0x60 ... 0x6f:
case 0x70 ... 0x7f:
data->ccd_offset = 0x308;
- k10temp_get_ccd_support(pdev, data, 8);
+ k10temp_get_ccd_support(data, 8);
break;
case 0x10 ... 0x1f:
case 0xa0 ... 0xaf:
data->ccd_offset = 0x300;
- k10temp_get_ccd_support(pdev, data, 12);
+ k10temp_get_ccd_support(data, 12);
break;
}
} else if (boot_cpu_data.x86 == 0x1a) {

--
2.34.1


2024-06-06 16:21:30

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v4 8/8] hwmon: (k10temp) Rename _data variable

...to address the following warning:

drivers/hwmon/k10temp.c:273:47:
warning: declaration shadows a variable in the global scope [-Wshadow]
static umode_t k10temp_is_visible(const void *_data,
^
include/asm-generic/sections.h:36:13:
note: previous declaration is here
extern char _data[], _sdata[], _edata[];

No functional change is intended.

Signed-off-by: Yazen Ghannam <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
---
drivers/hwmon/k10temp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index a2d203304533..543526bac042 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -269,11 +269,11 @@ static int k10temp_read(struct device *dev, enum hwmon_sensor_types type,
}
}

-static umode_t k10temp_is_visible(const void *_data,
+static umode_t k10temp_is_visible(const void *drvdata,
enum hwmon_sensor_types type,
u32 attr, int channel)
{
- const struct k10temp_data *data = _data;
+ const struct k10temp_data *data = drvdata;
struct pci_dev *pdev = data->pdev;
u32 reg;


--
2.34.1


Subject: [tip: x86/misc] hwmon: (k10temp) Rename _data variable

The following commit has been merged into the x86/misc branch of tip:

Commit-ID: efdf761a83cd390523223f58793755f5d60a4489
Gitweb: https://git.kernel.org/tip/efdf761a83cd390523223f58793755f5d60a4489
Author: Yazen Ghannam <[email protected]>
AuthorDate: Thu, 06 Jun 2024 11:13:01 -05:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Wed, 12 Jun 2024 11:40:31 +02:00

hwmon: (k10temp) Rename _data variable

...to address the following warning:

drivers/hwmon/k10temp.c:273:47:
warning: declaration shadows a variable in the global scope [-Wshadow]
static umode_t k10temp_is_visible(const void *_data,
^
No functional change is intended.

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/hwmon/k10temp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index a2d2033..543526b 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -269,11 +269,11 @@ static int k10temp_read(struct device *dev, enum hwmon_sensor_types type,
}
}

-static umode_t k10temp_is_visible(const void *_data,
+static umode_t k10temp_is_visible(const void *drvdata,
enum hwmon_sensor_types type,
u32 attr, int channel)
{
- const struct k10temp_data *data = _data;
+ const struct k10temp_data *data = drvdata;
struct pci_dev *pdev = data->pdev;
u32 reg;


Subject: [tip: x86/misc] EDAC/amd64: Remove unused register accesses

The following commit has been merged into the x86/misc branch of tip:

Commit-ID: f97a8b9170a0524c5432791cd5852bf797934af4
Gitweb: https://git.kernel.org/tip/f97a8b9170a0524c5432791cd5852bf797934af4
Author: Yazen Ghannam <[email protected]>
AuthorDate: Thu, 06 Jun 2024 11:12:54 -05:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Wed, 12 Jun 2024 11:33:45 +02:00

EDAC/amd64: Remove unused register accesses

A number of UMC registers are read only for the purpose of debug printing. They
are not used in any calculations. Nor do they have any specific debug value.

Remove them.

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/edac/amd64_edac.c | 18 +-----------------
drivers/edac/amd64_edac.h | 4 ----
2 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index a17f3c0..dfc4fb8 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -20,7 +20,6 @@ static inline u32 get_umc_reg(struct amd64_pvt *pvt, u32 reg)
return reg;

switch (reg) {
- case UMCCH_ADDR_CFG: return UMCCH_ADDR_CFG_DDR5;
case UMCCH_ADDR_MASK_SEC: return UMCCH_ADDR_MASK_SEC_DDR5;
case UMCCH_DIMM_CFG: return UMCCH_DIMM_CFG_DDR5;
}
@@ -1341,22 +1340,15 @@ static void umc_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
static void umc_dump_misc_regs(struct amd64_pvt *pvt)
{
struct amd64_umc *umc;
- u32 i, tmp, umc_base;
+ u32 i;

for_each_umc(i) {
- umc_base = get_umc_base(i);
umc = &pvt->umc[i];

edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
-
- amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ECC_BAD_SYMBOL, &tmp);
- edac_dbg(1, "UMC%d ECC bad symbol: 0x%x\n", i, tmp);
-
- amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_UMC_CAP, &tmp);
- edac_dbg(1, "UMC%d UMC cap: 0x%x\n", i, tmp);
edac_dbg(1, "UMC%d UMC cap high: 0x%x\n", i, umc->umc_cap_hi);

edac_dbg(1, "UMC%d ECC capable: %s, ChipKill ECC capable: %s\n",
@@ -1369,14 +1361,6 @@ static void umc_dump_misc_regs(struct amd64_pvt *pvt)
edac_dbg(1, "UMC%d x16 DIMMs present: %s\n",
i, (umc->dimm_cfg & BIT(7)) ? "yes" : "no");

- if (umc->dram_type == MEM_LRDDR4 || umc->dram_type == MEM_LRDDR5) {
- amd_smn_read(pvt->mc_node_id,
- umc_base + get_umc_reg(pvt, UMCCH_ADDR_CFG),
- &tmp);
- edac_dbg(1, "UMC%d LRDIMM %dx rank multiply\n",
- i, 1 << ((tmp >> 4) & 0x3));
- }
-
umc_debug_display_dimm_sizes(pvt, i);
}
}
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index b879b12..17228d0 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -256,15 +256,11 @@
#define UMCCH_ADDR_MASK 0x20
#define UMCCH_ADDR_MASK_SEC 0x28
#define UMCCH_ADDR_MASK_SEC_DDR5 0x30
-#define UMCCH_ADDR_CFG 0x30
-#define UMCCH_ADDR_CFG_DDR5 0x40
#define UMCCH_DIMM_CFG 0x80
#define UMCCH_DIMM_CFG_DDR5 0x90
#define UMCCH_UMC_CFG 0x100
#define UMCCH_SDP_CTRL 0x104
#define UMCCH_ECC_CTRL 0x14C
-#define UMCCH_ECC_BAD_SYMBOL 0xD90
-#define UMCCH_UMC_CAP 0xDF0
#define UMCCH_UMC_CAP_HI 0xDF4

/* UMC CH bitfields */

Subject: [tip: x86/misc] hwmon: (k10temp) Check return value of amd_smn_read()

The following commit has been merged into the x86/misc branch of tip:

Commit-ID: c2d79cc5455c891de6c93e1e0c73d806e299c54f
Gitweb: https://git.kernel.org/tip/c2d79cc5455c891de6c93e1e0c73d806e299c54f
Author: Yazen Ghannam <[email protected]>
AuthorDate: Thu, 06 Jun 2024 11:12:56 -05:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Wed, 12 Jun 2024 11:33:46 +02:00

hwmon: (k10temp) Check return value of amd_smn_read()

Check the return value of amd_smn_read() before saving a value. This
ensures invalid values aren't saved or used.

There are three cases here with slightly different behavior:

1) read_tempreg_nb_zen():
This is a function pointer which does not include a return code.
In this case, set the register value to 0 on failure. This
enforces Read-as-Zero behavior.

2) k10temp_read_temp():
This function does have return codes, so return the error code
from the failed register read. Continued operation is not
necessary, since there is no valid data from the register.
Furthermore, if the register value was set to 0, then the
following operation would underflow.

3) k10temp_get_ccd_support():
This function reads the same register from multiple CCD
instances in a loop. And a bitmask is formed if a specific bit
is set in each register instance. The loop should continue on a
failed register read, skipping the bit check.

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/hwmon/k10temp.c | 36 +++++++++++++++++++++++++++---------
1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 8092312..6cad35e 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -153,8 +153,9 @@ static void read_tempreg_nb_f15(struct pci_dev *pdev, u32 *regval)

static void read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
{
- amd_smn_read(amd_pci_dev_to_node_id(pdev),
- ZEN_REPORTED_TEMP_CTRL_BASE, regval);
+ if (amd_smn_read(amd_pci_dev_to_node_id(pdev),
+ ZEN_REPORTED_TEMP_CTRL_BASE, regval))
+ *regval = 0;
}

static long get_raw_temp(struct k10temp_data *data)
@@ -205,6 +206,7 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
long *val)
{
struct k10temp_data *data = dev_get_drvdata(dev);
+ int ret = -EOPNOTSUPP;
u32 regval;

switch (attr) {
@@ -221,13 +223,17 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
*val = 0;
break;
case 2 ... 13: /* Tccd{1-12} */
- amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
- ZEN_CCD_TEMP(data->ccd_offset, channel - 2),
- &regval);
+ ret = amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
+ ZEN_CCD_TEMP(data->ccd_offset, channel - 2),
+ &regval);
+
+ if (ret)
+ return ret;
+
*val = (regval & ZEN_CCD_TEMP_MASK) * 125 - 49000;
break;
default:
- return -EOPNOTSUPP;
+ return ret;
}
break;
case hwmon_temp_max:
@@ -243,7 +249,7 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
- ((regval >> 24) & 0xf)) * 500 + 52000;
break;
default:
- return -EOPNOTSUPP;
+ return ret;
}
return 0;
}
@@ -381,8 +387,20 @@ static void k10temp_get_ccd_support(struct pci_dev *pdev,
int i;

for (i = 0; i < limit; i++) {
- amd_smn_read(amd_pci_dev_to_node_id(pdev),
- ZEN_CCD_TEMP(data->ccd_offset, i), &regval);
+ /*
+ * Ignore inaccessible CCDs.
+ *
+ * Some systems will return a register value of 0, and the TEMP_VALID
+ * bit check below will naturally fail.
+ *
+ * Other systems will return a PCI_ERROR_RESPONSE (0xFFFFFFFF) for
+ * the register value. And this will incorrectly pass the TEMP_VALID
+ * bit check.
+ */
+ if (amd_smn_read(amd_pci_dev_to_node_id(pdev),
+ ZEN_CCD_TEMP(data->ccd_offset, i), &regval))
+ continue;
+
if (regval & ZEN_CCD_TEMP_VALID)
data->show_temp |= BIT(TCCD_BIT(i));
}

Subject: [tip: x86/misc] hwmon: (k10temp) Define a helper function to read CCD temperature

The following commit has been merged into the x86/misc branch of tip:

Commit-ID: cc66126fd317a70d8612a8356ad512a1539abd75
Gitweb: https://git.kernel.org/tip/cc66126fd317a70d8612a8356ad512a1539abd75
Author: Yazen Ghannam <[email protected]>
AuthorDate: Thu, 06 Jun 2024 11:12:58 -05:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Wed, 12 Jun 2024 11:39:29 +02:00

hwmon: (k10temp) Define a helper function to read CCD temperature

The CCD temperature register is read in two places. These reads are done
using an AMD SMN access, and a number of parameters are needed for the
operation.

Move the SMN access and parameter gathering into a helper function in order to
simplify the code flow. This also has a benefit of centralizing the hardware
register access in a single place in case fixes or special decoding is required.

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/hwmon/k10temp.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 6cad35e..315c52d 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -158,6 +158,13 @@ static void read_tempreg_nb_zen(struct pci_dev *pdev, u32 *regval)
*regval = 0;
}

+static int read_ccd_temp_reg(struct k10temp_data *data, int ccd, u32 *regval)
+{
+ u16 node_id = amd_pci_dev_to_node_id(data->pdev);
+
+ return amd_smn_read(node_id, ZEN_CCD_TEMP(data->ccd_offset, ccd), regval);
+}
+
static long get_raw_temp(struct k10temp_data *data)
{
u32 regval;
@@ -223,9 +230,7 @@ static int k10temp_read_temp(struct device *dev, u32 attr, int channel,
*val = 0;
break;
case 2 ... 13: /* Tccd{1-12} */
- ret = amd_smn_read(amd_pci_dev_to_node_id(data->pdev),
- ZEN_CCD_TEMP(data->ccd_offset, channel - 2),
- &regval);
+ ret = read_ccd_temp_reg(data, channel - 2, &regval);

if (ret)
return ret;
@@ -397,8 +402,7 @@ static void k10temp_get_ccd_support(struct pci_dev *pdev,
* the register value. And this will incorrectly pass the TEMP_VALID
* bit check.
*/
- if (amd_smn_read(amd_pci_dev_to_node_id(pdev),
- ZEN_CCD_TEMP(data->ccd_offset, i), &regval))
+ if (read_ccd_temp_reg(data, i, &regval))
continue;

if (regval & ZEN_CCD_TEMP_VALID)

Subject: [tip: x86/misc] x86/amd_nb: Enhance SMN access error checking

The following commit has been merged into the x86/misc branch of tip:

Commit-ID: dc5243921be1b6a0b4259dbcec3dc95016ad8427
Gitweb: https://git.kernel.org/tip/dc5243921be1b6a0b4259dbcec3dc95016ad8427
Author: Yazen Ghannam <[email protected]>
AuthorDate: Thu, 06 Jun 2024 11:12:57 -05:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Wed, 12 Jun 2024 11:38:58 +02:00

x86/amd_nb: Enhance SMN access error checking

AMD Zen-based systems use a System Management Network (SMN) that
provides access to implementation-specific registers.

SMN accesses are done indirectly through an index/data pair in PCI
config space. The accesses can fail for a variety of reasons.

Include code comments to describe some possible scenarios.

Require error checking for callers of amd_smn_read() and amd_smn_write().
This is needed because many error conditions cannot be checked by these
functions.

[ bp: Touchup comment. ]

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/amd_nb.h | 4 +--
arch/x86/kernel/amd_nb.c | 44 ++++++++++++++++++++++++++++++----
2 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 5c37944..6f3b6ae 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -21,8 +21,8 @@ extern int amd_numa_init(void);
extern int amd_get_subcaches(int);
extern int amd_set_subcaches(int, unsigned long);

-extern int amd_smn_read(u16 node, u32 address, u32 *value);
-extern int amd_smn_write(u16 node, u32 address, u32 value);
+int __must_check amd_smn_read(u16 node, u32 address, u32 *value);
+int __must_check amd_smn_write(u16 node, u32 address, u32 value);

struct amd_l3_cache {
unsigned indices;
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 027a8c7..059e5c1 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -180,6 +180,43 @@ static struct pci_dev *next_northbridge(struct pci_dev *dev,
return dev;
}

+/*
+ * SMN accesses may fail in ways that are difficult to detect here in the called
+ * functions amd_smn_read() and amd_smn_write(). Therefore, callers must do
+ * their own checking based on what behavior they expect.
+ *
+ * For SMN reads, the returned value may be zero if the register is Read-as-Zero.
+ * Or it may be a "PCI Error Response", e.g. all 0xFFs. The "PCI Error Response"
+ * can be checked here, and a proper error code can be returned.
+ *
+ * But the Read-as-Zero response cannot be verified here. A value of 0 may be
+ * correct in some cases, so callers must check that this correct is for the
+ * register/fields they need.
+ *
+ * For SMN writes, success can be determined through a "write and read back"
+ * However, this is not robust when done here.
+ *
+ * Possible issues:
+ *
+ * 1) Bits that are "Write-1-to-Clear". In this case, the read value should
+ * *not* match the write value.
+ *
+ * 2) Bits that are "Read-as-Zero"/"Writes-Ignored". This information cannot be
+ * known here.
+ *
+ * 3) Bits that are "Reserved / Set to 1". Ditto above.
+ *
+ * Callers of amd_smn_write() should do the "write and read back" check
+ * themselves, if needed.
+ *
+ * For #1, they can see if their target bits got cleared.
+ *
+ * For #2 and #3, they can check if their target bits got set as intended.
+ *
+ * This matches what is done for RDMSR/WRMSR. As long as there's no #GP, then
+ * the operation is considered a success, and the caller does their own
+ * checking.
+ */
static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write)
{
struct pci_dev *root;
@@ -202,9 +239,6 @@ static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write)

err = (write ? pci_write_config_dword(root, 0x64, *value)
: pci_read_config_dword(root, 0x64, value));
- if (err)
- pr_warn("Error %s SMN address 0x%x.\n",
- (write ? "writing to" : "reading from"), address);

out_unlock:
mutex_unlock(&smn_mutex);
@@ -213,7 +247,7 @@ out:
return err;
}

-int amd_smn_read(u16 node, u32 address, u32 *value)
+int __must_check amd_smn_read(u16 node, u32 address, u32 *value)
{
int err = __amd_smn_rw(node, address, value, false);

@@ -226,7 +260,7 @@ int amd_smn_read(u16 node, u32 address, u32 *value)
}
EXPORT_SYMBOL_GPL(amd_smn_read);

-int amd_smn_write(u16 node, u32 address, u32 value)
+int __must_check amd_smn_write(u16 node, u32 address, u32 value)
{
return __amd_smn_rw(node, address, &value, true);
}

Subject: [tip: x86/misc] EDAC/amd64: Check return value of amd_smn_read()

The following commit has been merged into the x86/misc branch of tip:

Commit-ID: 5ac6293047cf5de6daca662347c19347e856c2a5
Gitweb: https://git.kernel.org/tip/5ac6293047cf5de6daca662347c19347e856c2a5
Author: Yazen Ghannam <[email protected]>
AuthorDate: Thu, 06 Jun 2024 11:12:55 -05:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Wed, 12 Jun 2024 11:33:45 +02:00

EDAC/amd64: Check return value of amd_smn_read()

Check the return value of amd_smn_read() before saving a value. This
ensures invalid values aren't saved. The struct umc instance is
initialized to 0 during memory allocation. Therefore, a bad read will
keep the value as 0 providing the expected Read-as-Zero behavior.

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/edac/amd64_edac.c | 51 +++++++++++++++++++++++++++-----------
1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index dfc4fb8..ddfbdb6 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1438,6 +1438,7 @@ static void umc_read_base_mask(struct amd64_pvt *pvt)
u32 *base, *base_sec;
u32 *mask, *mask_sec;
int cs, umc;
+ u32 tmp;

for_each_umc(umc) {
umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR;
@@ -1450,13 +1451,17 @@ static void umc_read_base_mask(struct amd64_pvt *pvt)
base_reg = umc_base_reg + (cs * 4);
base_reg_sec = umc_base_reg_sec + (cs * 4);

- if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
+ if (!amd_smn_read(pvt->mc_node_id, base_reg, &tmp)) {
+ *base = tmp;
edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
umc, cs, *base, base_reg);
+ }

- if (!amd_smn_read(pvt->mc_node_id, base_reg_sec, base_sec))
+ if (!amd_smn_read(pvt->mc_node_id, base_reg_sec, &tmp)) {
+ *base_sec = tmp;
edac_dbg(0, " DCSB_SEC%d[%d]=0x%08x reg: 0x%x\n",
umc, cs, *base_sec, base_reg_sec);
+ }
}

umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
@@ -1469,13 +1474,17 @@ static void umc_read_base_mask(struct amd64_pvt *pvt)
mask_reg = umc_mask_reg + (cs * 4);
mask_reg_sec = umc_mask_reg_sec + (cs * 4);

- if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
+ if (!amd_smn_read(pvt->mc_node_id, mask_reg, &tmp)) {
+ *mask = tmp;
edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
umc, cs, *mask, mask_reg);
+ }

- if (!amd_smn_read(pvt->mc_node_id, mask_reg_sec, mask_sec))
+ if (!amd_smn_read(pvt->mc_node_id, mask_reg_sec, &tmp)) {
+ *mask_sec = tmp;
edac_dbg(0, " DCSM_SEC%d[%d]=0x%08x reg: 0x%x\n",
umc, cs, *mask_sec, mask_reg_sec);
+ }
}
}
}
@@ -2894,7 +2903,7 @@ static void umc_read_mc_regs(struct amd64_pvt *pvt)
{
u8 nid = pvt->mc_node_id;
struct amd64_umc *umc;
- u32 i, umc_base;
+ u32 i, tmp, umc_base;

/* Read registers from each UMC */
for_each_umc(i) {
@@ -2902,11 +2911,20 @@ static void umc_read_mc_regs(struct amd64_pvt *pvt)
umc_base = get_umc_base(i);
umc = &pvt->umc[i];

- amd_smn_read(nid, umc_base + get_umc_reg(pvt, UMCCH_DIMM_CFG), &umc->dimm_cfg);
- amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
- amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
- amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
- amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
+ if (!amd_smn_read(nid, umc_base + get_umc_reg(pvt, UMCCH_DIMM_CFG), &tmp))
+ umc->dimm_cfg = tmp;
+
+ if (!amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &tmp))
+ umc->umc_cfg = tmp;
+
+ if (!amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &tmp))
+ umc->sdp_ctrl = tmp;
+
+ if (!amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &tmp))
+ umc->ecc_ctrl = tmp;
+
+ if (!amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &tmp))
+ umc->umc_cap_hi = tmp;
}
}

@@ -3635,16 +3653,21 @@ static void gpu_read_mc_regs(struct amd64_pvt *pvt)
{
u8 nid = pvt->mc_node_id;
struct amd64_umc *umc;
- u32 i, umc_base;
+ u32 i, tmp, umc_base;

/* Read registers from each UMC */
for_each_umc(i) {
umc_base = gpu_get_umc_base(pvt, i, 0);
umc = &pvt->umc[i];

- amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
- amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
- amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
+ if (!amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &tmp))
+ umc->umc_cfg = tmp;
+
+ if (!amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &tmp))
+ umc->sdp_ctrl = tmp;
+
+ if (!amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &tmp))
+ umc->ecc_ctrl = tmp;
}
}


Subject: [tip: x86/misc] hwmon: (k10temp) Reduce k10temp_get_ccd_support() parameters

The following commit has been merged into the x86/misc branch of tip:

Commit-ID: a8bc4165d237f4a6bddbab55d2b6592b87341f0a
Gitweb: https://git.kernel.org/tip/a8bc4165d237f4a6bddbab55d2b6592b87341f0a
Author: Yazen Ghannam <[email protected]>
AuthorDate: Thu, 06 Jun 2024 11:12:59 -05:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Wed, 12 Jun 2024 11:39:49 +02:00

hwmon: (k10temp) Reduce k10temp_get_ccd_support() parameters

Currently, k10temp_get_ccd_support() takes as input "pdev" and "data". However,
"pdev" is already included in "data". Furthermore, the "pdev" parameter is no
longer used in k10temp_get_ccd_support(), since its use was moved into
read_ccd_temp_reg().

Drop the "pdev" input parameter as it is no longer needed.

No functional change is intended.

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/hwmon/k10temp.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 315c52d..6deb272 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -385,8 +385,7 @@ static const struct hwmon_chip_info k10temp_chip_info = {
.info = k10temp_info,
};

-static void k10temp_get_ccd_support(struct pci_dev *pdev,
- struct k10temp_data *data, int limit)
+static void k10temp_get_ccd_support(struct k10temp_data *data, int limit)
{
u32 regval;
int i;
@@ -456,18 +455,18 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
case 0x11: /* Zen APU */
case 0x18: /* Zen+ APU */
data->ccd_offset = 0x154;
- k10temp_get_ccd_support(pdev, data, 4);
+ k10temp_get_ccd_support(data, 4);
break;
case 0x31: /* Zen2 Threadripper */
case 0x60: /* Renoir */
case 0x68: /* Lucienne */
case 0x71: /* Zen2 */
data->ccd_offset = 0x154;
- k10temp_get_ccd_support(pdev, data, 8);
+ k10temp_get_ccd_support(data, 8);
break;
case 0xa0 ... 0xaf:
data->ccd_offset = 0x300;
- k10temp_get_ccd_support(pdev, data, 8);
+ k10temp_get_ccd_support(data, 8);
break;
}
} else if (boot_cpu_data.x86 == 0x19) {
@@ -481,21 +480,21 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
case 0x21: /* Zen3 Ryzen Desktop */
case 0x50 ... 0x5f: /* Green Sardine */
data->ccd_offset = 0x154;
- k10temp_get_ccd_support(pdev, data, 8);
+ k10temp_get_ccd_support(data, 8);
break;
case 0x40 ... 0x4f: /* Yellow Carp */
data->ccd_offset = 0x300;
- k10temp_get_ccd_support(pdev, data, 8);
+ k10temp_get_ccd_support(data, 8);
break;
case 0x60 ... 0x6f:
case 0x70 ... 0x7f:
data->ccd_offset = 0x308;
- k10temp_get_ccd_support(pdev, data, 8);
+ k10temp_get_ccd_support(data, 8);
break;
case 0x10 ... 0x1f:
case 0xa0 ... 0xaf:
data->ccd_offset = 0x300;
- k10temp_get_ccd_support(pdev, data, 12);
+ k10temp_get_ccd_support(data, 12);
break;
}
} else if (boot_cpu_data.x86 == 0x1a) {

Subject: [tip: x86/misc] hwmon: (k10temp) Remove unused HAVE_TDIE() macro

The following commit has been merged into the x86/misc branch of tip:

Commit-ID: 0e097f2b5928651de3b4f0100401c9e71fa73dba
Gitweb: https://git.kernel.org/tip/0e097f2b5928651de3b4f0100401c9e71fa73dba
Author: Yazen Ghannam <[email protected]>
AuthorDate: Thu, 06 Jun 2024 11:13:00 -05:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Wed, 12 Jun 2024 11:40:17 +02:00

hwmon: (k10temp) Remove unused HAVE_TDIE() macro

...to address the following warning:

drivers/hwmon/k10temp.c:104:9:
warning: macro is not used [-Wunused-macros]

No functional change is intended.

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/hwmon/k10temp.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 6deb272..a2d2033 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -101,7 +101,6 @@ struct k10temp_data {
#define TCCD_BIT(x) ((x) + 2)

#define HAVE_TEMP(d, channel) ((d)->show_temp & BIT(channel))
-#define HAVE_TDIE(d) HAVE_TEMP(d, TDIE_BIT)

struct tctl_offset {
u8 model;