Subject: [PATCH v6 0/5] x86/edac/amd64: Add heterogeneous node support

On newer heterogeneous systems with AMD CPUs the data fabrics of GPUs
can be connected directly via custom links.

This series of patchset does the following
1. amd_nb.c:
a. Add support for northbridges on Aldebaran GPU nodes
b. export AMD node map details to be used by edac and mce modules

2. mce_amd module:
a. Identify the node ID where the error occurred and map the node
id to linux enumerated node id.

3. amd64_edac module
a. Add new family op routines
b. Enumerate UMCs and HBMs on the GPU nodes
c. Move fam_type structure into amd64_pvt struct

This patchset is rebased on top of
"
commit 07416cadfdfa38283b840e700427ae3782c76f6b
Author: Yazen Ghannam <[email protected]>
Date: Tue Oct 5 15:44:19 2021 +0000

EDAC/amd64: Handle three rank interleaving mode
"

Muralidhara M K (3):
x86/amd_nb: Add support for northbridges on Aldebaran
EDAC/amd64: Extend family ops functions
EDAC/amd64: Move struct fam_type into amd64_pvt structure

Naveen Krishna Chatradhi (2):
EDAC/mce_amd: Extract node id from MCA_IPID
EDAC/amd64: Enumerate memory on Aldebaran GPU nodes

arch/x86/include/asm/amd_nb.h | 9 +
arch/x86/kernel/amd_nb.c | 146 ++++++--
drivers/edac/amd64_edac.c | 654 ++++++++++++++++++++++++----------
drivers/edac/amd64_edac.h | 39 +-
drivers/edac/mce_amd.c | 24 +-
include/linux/pci_ids.h | 1 +
6 files changed, 663 insertions(+), 210 deletions(-)

--
2.25.1


Subject: [PATCH v6 2/5] EDAC/mce_amd: Extract node id from MCA_IPID

On SMCA banks of the GPU nodes, the node id information is
available in register MCA_IPID[47:44](InstanceIdHi).

Convert the hardware node ID to a value used by Linux
where GPU nodes are sequencially after the CPU nodes.

Co-developed-by: Muralidhara M K <[email protected]>
Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
Changes since v5:
None

Changes since v4:
Add reviewed by Yazen

Changes since v3:
1. Use APIs from amd_nb to identify the gpu_node_start_id and cpu_node_count.
Which is required to map the hardware node id to node id enumerated by Linux.

Changes since v2:
1. Modified subject and commit message
2. Added Reviewed by Yazen Ghannam

Changes since v1:
1. Modified the commit message
2. rearranged the conditions before calling decode_dram_ecc()

drivers/edac/mce_amd.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 67dbf4c31271..af6caa76adc7 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -2,6 +2,7 @@
#include <linux/module.h>
#include <linux/slab.h>

+#include <asm/amd_nb.h>
#include <asm/cpu.h>

#include "mce_amd.h"
@@ -1072,8 +1073,27 @@ static void decode_smca_error(struct mce *m)
if (xec < smca_mce_descs[bank_type].num_descs)
pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);

- if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
- decode_dram_ecc(topology_die_id(m->extcpu), m);
+ if (xec == 0 && decode_dram_ecc) {
+ int node_id = 0;
+
+ if (bank_type == SMCA_UMC) {
+ node_id = topology_die_id(m->extcpu);
+ } else if (bank_type == SMCA_UMC_V2) {
+ /*
+ * SMCA_UMC_V2 exists on GPU nodes, extract the node id
+ * from register MCA_IPID[47:44](InstanceIdHi).
+ * The InstanceIdHi field represents the instance ID of the GPU.
+ * Which needs to be mapped to a value used by Linux,
+ * where GPU nodes are simply numerically after the CPU nodes.
+ */
+ node_id = ((m->ipid >> 44) & 0xF) -
+ amd_gpu_node_start_id() + amd_cpu_node_count();
+ } else {
+ return;
+ }
+
+ decode_dram_ecc(node_id, m);
+ }
}

static inline void amd_decode_err_code(u16 ec)
--
2.25.1

Subject: [PATCH v6 3/5] EDAC/amd64: Extend family ops functions

From: Muralidhara M K <[email protected]>

Create new family operation routines and define them respectively.
This would simplify adding support for future platforms.

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
Changes since v5:
split read_mc_regs for per family ops
Adjusted and Called dump_misc_regs for family ops

Changes since v4:
1. Modified k8_prep_chip_selects for ext_model checks
2. Add read_dct_base_mask to ops
3. Renamed find_umc_channel and addressed minor comments

Changes since v3:
1. Defined new family operation routines

Changs since v2:
1. new patch

drivers/edac/amd64_edac.c | 302 +++++++++++++++++++++++---------------
drivers/edac/amd64_edac.h | 10 +-
2 files changed, 188 insertions(+), 124 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 4fce75013674..1029fe84ba2e 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1204,10 +1204,7 @@ static void __dump_misc_regs(struct amd64_pvt *pvt)
/* Display and decode various NB registers for debug purposes. */
static void dump_misc_regs(struct amd64_pvt *pvt)
{
- if (pvt->umc)
- __dump_misc_regs_df(pvt);
- else
- __dump_misc_regs(pvt);
+ pvt->ops->get_misc_regs(pvt);

edac_dbg(1, " DramHoleValid: %s\n", dhar_valid(pvt) ? "yes" : "no");

@@ -1217,28 +1214,39 @@ static void dump_misc_regs(struct amd64_pvt *pvt)
/*
* See BKDG, F2x[1,0][5C:40], F2[1,0][6C:60]
*/
-static void prep_chip_selects(struct amd64_pvt *pvt)
+static void k8_prep_chip_selects(struct amd64_pvt *pvt)
{
if (pvt->fam == 0xf && pvt->ext_model < K8_REV_F) {
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 8;
- } else if (pvt->fam == 0x15 && pvt->model == 0x30) {
- pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
- pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
- } else if (pvt->fam >= 0x17) {
- int umc;
-
- for_each_umc(umc) {
- pvt->csels[umc].b_cnt = 4;
- pvt->csels[umc].m_cnt = 2;
- }
-
- } else {
+ } else if (pvt->fam == 0xf && pvt->ext_model >= K8_REV_F) {
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
}
}

+static void f15m30_prep_chip_selects(struct amd64_pvt *pvt)
+{
+ pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
+ pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
+}
+
+static void default_prep_chip_selects(struct amd64_pvt *pvt)
+{
+ pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
+ pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
+}
+
+static void f17_prep_chip_selects(struct amd64_pvt *pvt)
+{
+ int umc;
+
+ for_each_umc(umc) {
+ pvt->csels[umc].b_cnt = 4;
+ pvt->csels[umc].m_cnt = 2;
+ }
+}
+
static void read_umc_base_mask(struct amd64_pvt *pvt)
{
u32 umc_base_reg, umc_base_reg_sec;
@@ -1297,11 +1305,6 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)
{
int cs;

- prep_chip_selects(pvt);
-
- if (pvt->umc)
- return read_umc_base_mask(pvt);
-
for_each_chip_select(cs, 0, pvt) {
int reg0 = DCSB0 + (cs * 4);
int reg1 = DCSB1 + (cs * 4);
@@ -2512,143 +2515,181 @@ static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
}
}

+/* Prototypes for family specific ops routines */
+static int init_csrows(struct mem_ctl_info *mci);
+static int init_csrows_df(struct mem_ctl_info *mci);
+static void read_mc_regs(struct amd64_pvt *pvt);
+static void __read_mc_regs_df(struct amd64_pvt *pvt);
+static void update_umc_err_info(struct mce *m, struct err_info *err);
+
+static const struct low_ops k8_ops = {
+ .early_channel_count = k8_early_channel_count,
+ .map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow,
+ .dbam_to_cs = k8_dbam_to_chip_select,
+ .prep_chip_select = k8_prep_chip_selects,
+ .get_base_mask = read_dct_base_mask,
+ .get_misc_regs = __dump_misc_regs,
+ .get_mc_regs = read_mc_regs,
+ .populate_csrows = init_csrows,
+};
+
+static const struct low_ops f10_ops = {
+ .early_channel_count = f1x_early_channel_count,
+ .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
+ .dbam_to_cs = f10_dbam_to_chip_select,
+ .prep_chip_select = default_prep_chip_selects,
+ .get_base_mask = read_dct_base_mask,
+ .get_misc_regs = __dump_misc_regs,
+ .get_mc_regs = read_mc_regs,
+ .populate_csrows = init_csrows,
+};
+
+static const struct low_ops f15_ops = {
+ .early_channel_count = f1x_early_channel_count,
+ .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
+ .dbam_to_cs = f15_dbam_to_chip_select,
+ .prep_chip_select = default_prep_chip_selects,
+ .get_base_mask = read_dct_base_mask,
+ .get_misc_regs = __dump_misc_regs,
+ .get_mc_regs = read_mc_regs,
+ .populate_csrows = init_csrows,
+};
+
+static const struct low_ops f15m30_ops = {
+ .early_channel_count = f1x_early_channel_count,
+ .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
+ .dbam_to_cs = f16_dbam_to_chip_select,
+ .prep_chip_select = f15m30_prep_chip_selects,
+ .get_base_mask = read_dct_base_mask,
+ .get_misc_regs = __dump_misc_regs,
+ .get_mc_regs = read_mc_regs,
+ .populate_csrows = init_csrows,
+};
+
+static const struct low_ops f15m60_ops = {
+ .early_channel_count = f1x_early_channel_count,
+ .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
+ .dbam_to_cs = f15_m60h_dbam_to_chip_select,
+ .prep_chip_select = default_prep_chip_selects,
+ .get_base_mask = read_dct_base_mask,
+ .get_misc_regs = __dump_misc_regs,
+ .get_mc_regs = read_mc_regs,
+ .populate_csrows = init_csrows,
+};
+
+static const struct low_ops f16_ops = {
+ .early_channel_count = f1x_early_channel_count,
+ .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
+ .dbam_to_cs = f16_dbam_to_chip_select,
+ .prep_chip_select = default_prep_chip_selects,
+ .get_base_mask = read_dct_base_mask,
+ .get_misc_regs = __dump_misc_regs,
+ .get_mc_regs = read_mc_regs,
+ .populate_csrows = init_csrows,
+};
+
+static const struct low_ops f17_ops = {
+ .early_channel_count = f17_early_channel_count,
+ .dbam_to_cs = f17_addr_mask_to_cs_size,
+ .prep_chip_select = f17_prep_chip_selects,
+ .get_base_mask = read_umc_base_mask,
+ .get_misc_regs = __dump_misc_regs_df,
+ .get_mc_regs = __read_mc_regs_df,
+ .populate_csrows = init_csrows_df,
+ .get_umc_err_info = update_umc_err_info,
+};
+
static struct amd64_family_type family_types[] = {
[K8_CPUS] = {
.ctl_name = "K8",
.f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP,
.f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL,
.max_mcs = 2,
- .ops = {
- .early_channel_count = k8_early_channel_count,
- .map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow,
- .dbam_to_cs = k8_dbam_to_chip_select,
- }
+ .ops = k8_ops,
},
[F10_CPUS] = {
.ctl_name = "F10h",
.f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP,
.f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f1x_early_channel_count,
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f10_dbam_to_chip_select,
- }
+ .ops = f10_ops,
},
[F15_CPUS] = {
.ctl_name = "F15h",
.f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f1x_early_channel_count,
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f15_dbam_to_chip_select,
- }
+ .ops = f15_ops,
},
[F15_M30H_CPUS] = {
.ctl_name = "F15h_M30h",
.f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f1x_early_channel_count,
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f16_dbam_to_chip_select,
- }
+ .ops = f15m30_ops,
},
[F15_M60H_CPUS] = {
.ctl_name = "F15h_M60h",
.f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f1x_early_channel_count,
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f15_m60h_dbam_to_chip_select,
- }
+ .ops = f15m60_ops,
},
[F16_CPUS] = {
.ctl_name = "F16h",
.f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f1x_early_channel_count,
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f16_dbam_to_chip_select,
- }
+ .ops = f16_ops,
},
[F16_M30H_CPUS] = {
.ctl_name = "F16h_M30h",
.f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f1x_early_channel_count,
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f16_dbam_to_chip_select,
- }
+ .ops = f16_ops,
},
[F17_CPUS] = {
.ctl_name = "F17h",
.f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
+ .ops = f17_ops,
},
[F17_M10H_CPUS] = {
.ctl_name = "F17h_M10h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
+ .ops = f17_ops,
},
[F17_M30H_CPUS] = {
.ctl_name = "F17h_M30h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6,
.max_mcs = 8,
- .ops = {
- .early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
+ .ops = f17_ops,
},
[F17_M60H_CPUS] = {
.ctl_name = "F17h_M60h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F6,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
+ .ops = f17_ops,
},
[F17_M70H_CPUS] = {
.ctl_name = "F17h_M70h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
+ .ops = f17_ops,
},
[F19_CPUS] = {
.ctl_name = "F19h",
.f0_id = PCI_DEVICE_ID_AMD_19H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_19H_DF_F6,
.max_mcs = 8,
- .ops = {
- .early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
+ .ops = f17_ops,
},
};

@@ -2899,10 +2940,13 @@ static inline void decode_bus_error(int node_id, struct mce *m)
* Currently, we can derive the channel number by looking at the 6th nibble in
* the instance_id. For example, instance_id=0xYXXXXX where Y is the channel
* number.
+ *
+ * csrow can be derived from the lower 3 bits of MCA_SYND value.
*/
-static int find_umc_channel(struct mce *m)
+static void update_umc_err_info(struct mce *m, struct err_info *err)
{
- return (m->ipid & GENMASK(31, 0)) >> 20;
+ err->channel = (m->ipid & GENMASK(31, 0)) >> 20;
+ err->csrow = m->synd & 0x7;
}

static void decode_umc_error(int node_id, struct mce *m)
@@ -2924,8 +2968,6 @@ static void decode_umc_error(int node_id, struct mce *m)
if (m->status & MCI_STATUS_DEFERRED)
ecc_type = 3;

- err.channel = find_umc_channel(m);
-
if (!(m->status & MCI_STATUS_SYNDV)) {
err.err_code = ERR_SYND;
goto log_error;
@@ -2940,7 +2982,7 @@ static void decode_umc_error(int node_id, struct mce *m)
err.err_code = ERR_CHANNEL;
}

- err.csrow = m->synd & 0x7;
+ pvt->ops->get_umc_err_info(m, &err);

if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
err.err_code = ERR_NORM_ADDR;
@@ -3058,6 +3100,27 @@ static void determine_ecc_sym_sz(struct amd64_pvt *pvt)
}
}

+static void read_top_mem_registers(struct amd64_pvt *pvt)
+{
+ u64 msr_val;
+
+ /*
+ * Retrieve TOP_MEM and TOP_MEM2; no masking off of reserved bits since
+ * those are Read-As-Zero.
+ */
+ rdmsrl(MSR_K8_TOP_MEM1, pvt->top_mem);
+ edac_dbg(0, " TOP_MEM: 0x%016llx\n", pvt->top_mem);
+
+ /* Check first whether TOP_MEM2 is enabled: */
+ rdmsrl(MSR_AMD64_SYSCFG, msr_val);
+ if (msr_val & BIT(21)) {
+ rdmsrl(MSR_K8_TOP_MEM2, pvt->top_mem2);
+ edac_dbg(0, " TOP_MEM2: 0x%016llx\n", pvt->top_mem2);
+ } else {
+ edac_dbg(0, " TOP_MEM2 disabled\n");
+ }
+}
+
/*
* Retrieve the hardware registers of the memory controller.
*/
@@ -3067,6 +3130,8 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
struct amd64_umc *umc;
u32 i, umc_base;

+ read_top_mem_registers(pvt);
+
/* Read registers from each UMC */
for_each_umc(i) {

@@ -3079,6 +3144,8 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
}
+
+ amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar);
}

/*
@@ -3088,30 +3155,8 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
static void read_mc_regs(struct amd64_pvt *pvt)
{
unsigned int range;
- u64 msr_val;
-
- /*
- * Retrieve TOP_MEM and TOP_MEM2; no masking off of reserved bits since
- * those are Read-As-Zero.
- */
- rdmsrl(MSR_K8_TOP_MEM1, pvt->top_mem);
- edac_dbg(0, " TOP_MEM: 0x%016llx\n", pvt->top_mem);
-
- /* Check first whether TOP_MEM2 is enabled: */
- rdmsrl(MSR_AMD64_SYSCFG, msr_val);
- if (msr_val & BIT(21)) {
- rdmsrl(MSR_K8_TOP_MEM2, pvt->top_mem2);
- edac_dbg(0, " TOP_MEM2: 0x%016llx\n", pvt->top_mem2);
- } else {
- edac_dbg(0, " TOP_MEM2 disabled\n");
- }

- if (pvt->umc) {
- __read_mc_regs_df(pvt);
- amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar);
-
- goto skip;
- }
+ read_top_mem_registers(pvt);

amd64_read_pci_cfg(pvt->F3, NBCAP, &pvt->nbcap);

@@ -3152,14 +3197,6 @@ static void read_mc_regs(struct amd64_pvt *pvt)
amd64_read_dct_pci_cfg(pvt, 1, DCLR0, &pvt->dclr1);
amd64_read_dct_pci_cfg(pvt, 1, DCHR0, &pvt->dchr1);
}
-
-skip:
- read_dct_base_mask(pvt);
-
- determine_memory_type(pvt);
- edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
-
- determine_ecc_sym_sz(pvt);
}

/*
@@ -3277,9 +3314,6 @@ static int init_csrows(struct mem_ctl_info *mci)
int nr_pages = 0;
u32 val;

- if (pvt->umc)
- return init_csrows_df(mci);
-
amd64_read_pci_cfg(pvt->F3, NBCFG, &val);

pvt->nbcfg = val;
@@ -3703,6 +3737,21 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
return NULL;
}

+ /* ops required for all the families */
+ if (!pvt->ops->early_channel_count || !pvt->ops->dbam_to_cs ||
+ !pvt->ops->prep_chip_select || !pvt->ops->get_base_mask ||
+ !pvt->ops->get_misc_regs || !pvt->ops->get_mc_regs ||
+ !pvt->ops->populate_csrows) {
+ edac_dbg(1, "Common helper routines not defined.\n");
+ return NULL;
+ }
+
+ /* ops required for families 17h and later */
+ if (pvt->fam >= 0x17 && !pvt->ops->get_umc_err_info) {
+ edac_dbg(1, "Platform specific helper routines not defined.\n");
+ return NULL;
+ }
+
return fam_type;
}

@@ -3735,7 +3784,16 @@ static int hw_info_get(struct amd64_pvt *pvt)
if (ret)
return ret;

- read_mc_regs(pvt);
+ pvt->ops->get_mc_regs(pvt);
+
+ pvt->ops->prep_chip_select(pvt);
+
+ pvt->ops->get_base_mask(pvt);
+
+ determine_memory_type(pvt);
+ edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
+
+ determine_ecc_sym_sz(pvt);

return 0;
}
@@ -3786,7 +3844,7 @@ static int init_one_instance(struct amd64_pvt *pvt)

setup_mci_misc_attrs(mci);

- if (init_csrows(mci))
+ if (pvt->ops->populate_csrows(mci))
mci->edac_cap = EDAC_FLAG_NONE;

ret = -ENODEV;
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 85aa820bc165..881ff6322bc9 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -467,11 +467,17 @@ struct ecc_settings {
* functions and per device encoding/decoding logic.
*/
struct low_ops {
- int (*early_channel_count) (struct amd64_pvt *pvt);
+ int (*early_channel_count) (struct amd64_pvt *pvt);
void (*map_sysaddr_to_csrow) (struct mem_ctl_info *mci, u64 sys_addr,
struct err_info *);
- int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
+ int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
unsigned cs_mode, int cs_mask_nr);
+ void (*prep_chip_select) (struct amd64_pvt *pvt);
+ void (*get_base_mask) (struct amd64_pvt *pvt);
+ void (*get_misc_regs) (struct amd64_pvt *pvt);
+ void (*get_mc_regs) (struct amd64_pvt *pvt);
+ int (*populate_csrows) (struct mem_ctl_info *mci);
+ void (*get_umc_err_info) (struct mce *m, struct err_info *err);
};

struct amd64_family_type {
--
2.25.1

Subject: [PATCH v6 4/5] EDAC/amd64: Move struct fam_type into amd64_pvt structure

From: Muralidhara M K <[email protected]>

On heterogeneous systems, the GPU nodes are probed after the CPU
nodes and will overwrites the family type set by CPU nodes.

Moving struct fam_type to struct amd64_pvt, instead of using fam_type
as a global variable.

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
Changes since v5:
Added reviewed by Yazen

Changes since v4:
New patch, created based on a comment.

drivers/edac/amd64_edac.c | 58 +++++++++++++++++++--------------------
drivers/edac/amd64_edac.h | 2 ++
2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 1029fe84ba2e..7953ffe9d547 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -13,8 +13,6 @@ module_param(ecc_enable_override, int, 0644);

static struct msr __percpu *msrs;

-static struct amd64_family_type *fam_type;
-
/* Per-node stuff */
static struct ecc_settings **ecc_stngs;

@@ -448,7 +446,7 @@ static void get_cs_base_and_mask(struct amd64_pvt *pvt, int csrow, u8 dct,
for (i = 0; i < pvt->csels[dct].m_cnt; i++)

#define for_each_umc(i) \
- for (i = 0; i < fam_type->max_mcs; i++)
+ for (i = 0; i < pvt->fam_type->max_mcs; i++)

/*
* @input_addr is an InputAddr associated with the node given by mci. Return the
@@ -3635,7 +3633,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)

mci->edac_cap = determine_edac_cap(pvt);
mci->mod_name = EDAC_MOD_STR;
- mci->ctl_name = fam_type->ctl_name;
+ mci->ctl_name = pvt->fam_type->ctl_name;
mci->dev_name = pci_name(pvt->F3);
mci->ctl_page_to_phys = NULL;

@@ -3656,64 +3654,64 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)

switch (pvt->fam) {
case 0xf:
- fam_type = &family_types[K8_CPUS];
+ pvt->fam_type = &family_types[K8_CPUS];
pvt->ops = &family_types[K8_CPUS].ops;
break;

case 0x10:
- fam_type = &family_types[F10_CPUS];
+ pvt->fam_type = &family_types[F10_CPUS];
pvt->ops = &family_types[F10_CPUS].ops;
break;

case 0x15:
if (pvt->model == 0x30) {
- fam_type = &family_types[F15_M30H_CPUS];
+ pvt->fam_type = &family_types[F15_M30H_CPUS];
pvt->ops = &family_types[F15_M30H_CPUS].ops;
break;
} else if (pvt->model == 0x60) {
- fam_type = &family_types[F15_M60H_CPUS];
+ pvt->fam_type = &family_types[F15_M60H_CPUS];
pvt->ops = &family_types[F15_M60H_CPUS].ops;
break;
/* Richland is only client */
} else if (pvt->model == 0x13) {
return NULL;
} else {
- fam_type = &family_types[F15_CPUS];
+ pvt->fam_type = &family_types[F15_CPUS];
pvt->ops = &family_types[F15_CPUS].ops;
}
break;

case 0x16:
if (pvt->model == 0x30) {
- fam_type = &family_types[F16_M30H_CPUS];
+ pvt->fam_type = &family_types[F16_M30H_CPUS];
pvt->ops = &family_types[F16_M30H_CPUS].ops;
break;
}
- fam_type = &family_types[F16_CPUS];
+ pvt->fam_type = &family_types[F16_CPUS];
pvt->ops = &family_types[F16_CPUS].ops;
break;

case 0x17:
if (pvt->model >= 0x10 && pvt->model <= 0x2f) {
- fam_type = &family_types[F17_M10H_CPUS];
+ pvt->fam_type = &family_types[F17_M10H_CPUS];
pvt->ops = &family_types[F17_M10H_CPUS].ops;
break;
} else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
- fam_type = &family_types[F17_M30H_CPUS];
+ pvt->fam_type = &family_types[F17_M30H_CPUS];
pvt->ops = &family_types[F17_M30H_CPUS].ops;
break;
} else if (pvt->model >= 0x60 && pvt->model <= 0x6f) {
- fam_type = &family_types[F17_M60H_CPUS];
+ pvt->fam_type = &family_types[F17_M60H_CPUS];
pvt->ops = &family_types[F17_M60H_CPUS].ops;
break;
} else if (pvt->model >= 0x70 && pvt->model <= 0x7f) {
- fam_type = &family_types[F17_M70H_CPUS];
+ pvt->fam_type = &family_types[F17_M70H_CPUS];
pvt->ops = &family_types[F17_M70H_CPUS].ops;
break;
}
fallthrough;
case 0x18:
- fam_type = &family_types[F17_CPUS];
+ pvt->fam_type = &family_types[F17_CPUS];
pvt->ops = &family_types[F17_CPUS].ops;

if (pvt->fam == 0x18)
@@ -3722,12 +3720,12 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)

case 0x19:
if (pvt->model >= 0x20 && pvt->model <= 0x2f) {
- fam_type = &family_types[F17_M70H_CPUS];
+ pvt->fam_type = &family_types[F17_M70H_CPUS];
pvt->ops = &family_types[F17_M70H_CPUS].ops;
- fam_type->ctl_name = "F19h_M20h";
+ pvt->fam_type->ctl_name = "F19h_M20h";
break;
}
- fam_type = &family_types[F19_CPUS];
+ pvt->fam_type = &family_types[F19_CPUS];
pvt->ops = &family_types[F19_CPUS].ops;
family_types[F19_CPUS].ctl_name = "F19h";
break;
@@ -3752,7 +3750,7 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
return NULL;
}

- return fam_type;
+ return pvt->fam_type;
}

static const struct attribute_group *amd64_edac_attr_groups[] = {
@@ -3769,15 +3767,15 @@ static int hw_info_get(struct amd64_pvt *pvt)
int ret;

if (pvt->fam >= 0x17) {
- pvt->umc = kcalloc(fam_type->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL);
+ pvt->umc = kcalloc(pvt->fam_type->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL);
if (!pvt->umc)
return -ENOMEM;

- pci_id1 = fam_type->f0_id;
- pci_id2 = fam_type->f6_id;
+ pci_id1 = pvt->fam_type->f0_id;
+ pci_id2 = pvt->fam_type->f6_id;
} else {
- pci_id1 = fam_type->f1_id;
- pci_id2 = fam_type->f2_id;
+ pci_id1 = pvt->fam_type->f1_id;
+ pci_id2 = pvt->fam_type->f2_id;
}

ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
@@ -3832,7 +3830,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
* only one channel. Also, this simplifies handling later for the price
* of a couple of KBs tops.
*/
- layers[1].size = fam_type->max_mcs;
+ layers[1].size = pvt->fam_type->max_mcs;
layers[1].is_virt_csrow = false;

mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
@@ -3862,7 +3860,7 @@ static bool instance_has_memory(struct amd64_pvt *pvt)
bool cs_enabled = false;
int cs = 0, dct = 0;

- for (dct = 0; dct < fam_type->max_mcs; dct++) {
+ for (dct = 0; dct < pvt->fam_type->max_mcs; dct++) {
for_each_chip_select(cs, dct, pvt)
cs_enabled |= csrow_enabled(cs, dct, pvt);
}
@@ -3892,8 +3890,8 @@ static int probe_one_instance(unsigned int nid)
pvt->F3 = F3;

ret = -ENODEV;
- fam_type = per_family_init(pvt);
- if (!fam_type)
+ pvt->fam_type = per_family_init(pvt);
+ if (!pvt->fam_type)
goto err_enable;

ret = hw_info_get(pvt);
@@ -3932,7 +3930,7 @@ static int probe_one_instance(unsigned int nid)
goto err_enable;
}

- amd64_info("%s %sdetected (node %d).\n", fam_type->ctl_name,
+ amd64_info("%s %sdetected (node %d).\n", pvt->fam_type->ctl_name,
(pvt->fam == 0xf ?
(pvt->ext_model >= K8_REV_F ? "revF or later "
: "revE or earlier ")
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 881ff6322bc9..82d9f64aa150 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -389,6 +389,8 @@ struct amd64_pvt {
enum mem_type dram_type;

struct amd64_umc *umc; /* UMC registers */
+
+ struct amd64_family_type *fam_type;
};

enum err_codes {
--
2.25.1

Subject: [PATCH v6 5/5] EDAC/amd64: Enumerate memory on Aldebaran GPU nodes

On newer heterogeneous systems with AMD CPUs, the data fabrics of the GPUs
are connected directly via custom links.

One such system, where Aldebaran GPU nodes are connected to the
Family 19h, model 30h family of CPU nodes, the Aldebaran GPUs can report
memory errors via SMCA banks.

Aldebaran GPU support was added to DRM framework
https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html

The GPU nodes comes with HBM2 memory in-built, ECC support is enabled by
default and the UMCs on GPU node are different from the UMCs on CPU nodes.

GPU specific ops routines are defined to extend the amd64_edac
module to enumerate HBM memory leveraging the existing edac and the
amd64 specific data structures.

Note: The UMC Phys on GPU nodes are enumerated as csrows and the UMC
channels connected to HBM banks are enumerated as ranks.

Cc: Yazen Ghannam <[email protected]>
Co-developed-by: Muralidhara M K <[email protected]>
Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
Changes since v5:
Removed else condition in per_family_init for 19h family

Changes since v4:
Split "f17_addr_mask_to_cs_size" instead as did in 3rd patch earlier

Changes since v3:
1. Bifurcated the GPU code from v2

Changes since v2:
1. Restored line deletions and handled minor comments
2. Modified commit message and some of the function comments
3. variable df_inst_id is introduced instead of umc_num

Changes since v1:
1. Modifed the commit message
2. Change the edac_cap
3. kept sizes of both cpu and noncpu together
4. return success if the !F3 condition true and remove unnecessary validation

drivers/edac/amd64_edac.c | 298 +++++++++++++++++++++++++++++++++-----
drivers/edac/amd64_edac.h | 27 ++++
2 files changed, 292 insertions(+), 33 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 7953ffe9d547..b404fa5b03ce 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1121,6 +1121,20 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
}
}

+static void debug_display_dimm_sizes_gpu(struct amd64_pvt *pvt, u8 ctrl)
+{
+ int size, cs = 0, cs_mode;
+
+ edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
+
+ cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+
+ for_each_chip_select(cs, ctrl, pvt) {
+ size = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs);
+ amd64_info(EDAC_MC ": %d: %5dMB\n", cs, size);
+ }
+}
+
static void __dump_misc_regs_df(struct amd64_pvt *pvt)
{
struct amd64_umc *umc;
@@ -1165,6 +1179,27 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
pvt->dhar, dhar_base(pvt));
}

+static void __dump_misc_regs_gpu(struct amd64_pvt *pvt)
+{
+ struct amd64_umc *umc;
+ u32 i, umc_base;
+
+ for_each_umc(i) {
+ umc_base = get_umc_base(i);
+ umc = &pvt->umc[i];
+
+ edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
+ edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
+ edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
+ edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i);
+
+ debug_display_dimm_sizes_gpu(pvt, i);
+ }
+
+ edac_dbg(1, "F0x104 (DRAM Hole Address): 0x%08x, base: 0x%08x\n",
+ pvt->dhar, dhar_base(pvt));
+}
+
/* Display and decode various NB registers for debug purposes. */
static void __dump_misc_regs(struct amd64_pvt *pvt)
{
@@ -1245,6 +1280,43 @@ static void f17_prep_chip_selects(struct amd64_pvt *pvt)
}
}

+static void gpu_prep_chip_selects(struct amd64_pvt *pvt)
+{
+ int umc;
+
+ for_each_umc(umc) {
+ pvt->csels[umc].b_cnt = 8;
+ pvt->csels[umc].m_cnt = 8;
+ }
+}
+
+static void read_umc_base_mask_gpu(struct amd64_pvt *pvt)
+{
+ u32 base_reg, mask_reg;
+ u32 *base, *mask;
+ int umc, cs;
+
+ for_each_umc(umc) {
+ for_each_chip_select(cs, umc, pvt) {
+ base_reg = get_umc_base_gpu(umc, cs) + UMCCH_BASE_ADDR;
+ base = &pvt->csels[umc].csbases[cs];
+
+ if (!amd_smn_read(pvt->mc_node_id, base_reg, base)) {
+ edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
+ umc, cs, *base, base_reg);
+ }
+
+ mask_reg = get_umc_base_gpu(umc, cs) + UMCCH_ADDR_MASK;
+ mask = &pvt->csels[umc].csmasks[cs];
+
+ if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask)) {
+ edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
+ umc, cs, *mask, mask_reg);
+ }
+ }
+ }
+}
+
static void read_umc_base_mask(struct amd64_pvt *pvt)
{
u32 umc_base_reg, umc_base_reg_sec;
@@ -1743,6 +1815,19 @@ static int f17_early_channel_count(struct amd64_pvt *pvt)
return channels;
}

+static int gpu_early_channel_count(struct amd64_pvt *pvt)
+{
+ int i, channels = 0;
+
+ /* The memory channels in case of GPUs are fully populated */
+ for_each_umc(i)
+ channels += pvt->csels[i].b_cnt;
+
+ amd64_info("MCT channel count: %d\n", channels);
+
+ return channels;
+}
+
static int ddr3_cs_size(unsigned i, bool dct_width)
{
unsigned shift = 0;
@@ -1870,11 +1955,46 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct,
return ddr3_cs_size(cs_mode, false);
}

+static int __addr_mask_to_cs_size(u32 addr_mask_orig, unsigned int cs_mode,
+ int csrow_nr, int dimm)
+{
+ u32 msb, weight, num_zero_bits;
+ u32 addr_mask_deinterleaved;
+ int size = 0;
+
+ /*
+ * The number of zero bits in the mask is equal to the number of bits
+ * in a full mask minus the number of bits in the current mask.
+ *
+ * The MSB is the number of bits in the full mask because BIT[0] is
+ * always 0.
+ *
+ * In the special 3 Rank interleaving case, a single bit is flipped
+ * without swapping with the most significant bit. This can be handled
+ * by keeping the MSB where it is and ignoring the single zero bit.
+ */
+ msb = fls(addr_mask_orig) - 1;
+ weight = hweight_long(addr_mask_orig);
+ num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE);
+
+ /* Take the number of zero bits off from the top of the mask. */
+ addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1);
+
+ edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
+ edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig);
+ edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
+
+ /* Register [31:1] = Address [39:9]. Size is in kBs here. */
+ size = (addr_mask_deinterleaved >> 2) + 1;
+
+ /* Return size in MBs. */
+ return size >> 10;
+}
+
static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
unsigned int cs_mode, int csrow_nr)
{
- u32 addr_mask_orig, addr_mask_deinterleaved;
- u32 msb, weight, num_zero_bits;
+ u32 addr_mask_orig;
int dimm, size = 0;

/* No Chip Selects are enabled. */
@@ -1902,33 +2022,15 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
else
addr_mask_orig = pvt->csels[umc].csmasks[dimm];

- /*
- * The number of zero bits in the mask is equal to the number of bits
- * in a full mask minus the number of bits in the current mask.
- *
- * The MSB is the number of bits in the full mask because BIT[0] is
- * always 0.
- *
- * In the special 3 Rank interleaving case, a single bit is flipped
- * without swapping with the most significant bit. This can be handled
- * by keeping the MSB where it is and ignoring the single zero bit.
- */
- msb = fls(addr_mask_orig) - 1;
- weight = hweight_long(addr_mask_orig);
- num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE);
-
- /* Take the number of zero bits off from the top of the mask. */
- addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1);
-
- edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
- edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig);
- edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
+ return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, dimm);
+}

- /* Register [31:1] = Address [39:9]. Size is in kBs here. */
- size = (addr_mask_deinterleaved >> 2) + 1;
+static int gpu_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+ unsigned int cs_mode, int csrow_nr)
+{
+ u32 addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];

- /* Return size in MBs. */
- return size >> 10;
+ return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, csrow_nr >> 1);
}

static void read_dram_ctl_register(struct amd64_pvt *pvt)
@@ -2516,9 +2618,12 @@ static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
/* Prototypes for family specific ops routines */
static int init_csrows(struct mem_ctl_info *mci);
static int init_csrows_df(struct mem_ctl_info *mci);
+static int init_csrows_gpu(struct mem_ctl_info *mci);
static void read_mc_regs(struct amd64_pvt *pvt);
static void __read_mc_regs_df(struct amd64_pvt *pvt);
+static void __read_mc_regs_gpu(struct amd64_pvt *pvt);
static void update_umc_err_info(struct mce *m, struct err_info *err);
+static void update_umc_err_info_gpu(struct mce *m, struct err_info *err);

static const struct low_ops k8_ops = {
.early_channel_count = k8_early_channel_count,
@@ -2597,6 +2702,17 @@ static const struct low_ops f17_ops = {
.get_umc_err_info = update_umc_err_info,
};

+static const struct low_ops gpu_ops = {
+ .early_channel_count = gpu_early_channel_count,
+ .dbam_to_cs = gpu_addr_mask_to_cs_size,
+ .prep_chip_select = gpu_prep_chip_selects,
+ .get_base_mask = read_umc_base_mask_gpu,
+ .get_misc_regs = __dump_misc_regs_gpu,
+ .get_mc_regs = __read_mc_regs_gpu,
+ .populate_csrows = init_csrows_gpu,
+ .get_umc_err_info = update_umc_err_info_gpu,
+};
+
static struct amd64_family_type family_types[] = {
[K8_CPUS] = {
.ctl_name = "K8",
@@ -2689,6 +2805,14 @@ static struct amd64_family_type family_types[] = {
.max_mcs = 8,
.ops = f17_ops,
},
+ [ALDEBARAN_GPUS] = {
+ .ctl_name = "ALDEBARAN",
+ .f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0,
+ .f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6,
+ .max_mcs = 4,
+ .ops = gpu_ops,
+ },
+
};

/*
@@ -2947,12 +3071,38 @@ static void update_umc_err_info(struct mce *m, struct err_info *err)
err->csrow = m->synd & 0x7;
}

+/*
+ * The CPUs have one channel per UMC, So UMC number is equivalent to a
+ * channel number. The GPUs have 8 channels per UMC, so the UMC number no
+ * longer works as a channel number.
+ * The channel number within a GPU UMC is given in MCA_IPID[15:12].
+ * However, the IDs are split such that two UMC values go to one UMC, and
+ * the channel numbers are split in two groups of four.
+ *
+ * Refer comment on get_umc_base_gpu() from amd64_edac.h
+ *
+ * For example,
+ * UMC0 CH[3:0] = 0x0005[3:0]000
+ * UMC0 CH[7:4] = 0x0015[3:0]000
+ * UMC1 CH[3:0] = 0x0025[3:0]000
+ * UMC1 CH[7:4] = 0x0035[3:0]000
+ */
+static void update_umc_err_info_gpu(struct mce *m, struct err_info *err)
+{
+ u8 ch = (m->ipid & GENMASK(31, 0)) >> 20;
+ u8 phy = ((m->ipid >> 12) & 0xf);
+
+ err->channel = ch % 2 ? phy + 4 : phy;
+ err->csrow = phy;
+}
+
static void decode_umc_error(int node_id, struct mce *m)
{
u8 ecc_type = (m->status >> 45) & 0x3;
struct mem_ctl_info *mci;
struct amd64_pvt *pvt;
struct err_info err;
+ u8 df_inst_id;
u64 sys_addr;

mci = edac_mc_find(node_id);
@@ -2982,7 +3132,17 @@ static void decode_umc_error(int node_id, struct mce *m)

pvt->ops->get_umc_err_info(m, &err);

- if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
+ /*
+ * GPU node has #phys[X] which has #channels[Y] each.
+ * On GPUs, df_inst_id = [X] * num_ch_per_phy + [Y].
+ * On CPUs, "Channel"="UMC Number"="DF Instance ID".
+ */
+ if (pvt->is_gpu)
+ df_inst_id = (err.csrow * pvt->channel_count / mci->nr_csrows) + err.channel;
+ else
+ df_inst_id = err.channel;
+
+ if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, df_inst_id, &sys_addr)) {
err.err_code = ERR_NORM_ADDR;
goto log_error;
}
@@ -3146,6 +3306,25 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar);
}

+static void __read_mc_regs_gpu(struct amd64_pvt *pvt)
+{
+ u8 nid = pvt->mc_node_id;
+ struct amd64_umc *umc;
+ u32 i, umc_base;
+
+ /* Read registers from each UMC */
+ for_each_umc(i) {
+ umc_base = get_umc_base_gpu(i, 0);
+ umc = &pvt->umc[i];
+
+ amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
+ amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
+ amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
+ }
+
+ amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar);
+}
+
/*
* Retrieve the hardware registers of the memory controller (this includes the
* 'Address Map' and 'Misc' device regs)
@@ -3241,7 +3420,10 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig)
csrow_nr >>= 1;
cs_mode = DBAM_DIMM(csrow_nr, dbam);
} else {
- cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt);
+ if (pvt->is_gpu)
+ cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+ else
+ cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt);
}

nr_pages = pvt->ops->dbam_to_cs(pvt, dct, cs_mode, csrow_nr);
@@ -3298,6 +3480,35 @@ static int init_csrows_df(struct mem_ctl_info *mci)
return empty;
}

+static int init_csrows_gpu(struct mem_ctl_info *mci)
+{
+ struct amd64_pvt *pvt = mci->pvt_info;
+ struct dimm_info *dimm;
+ int empty = 1;
+ u8 umc, cs;
+
+ for_each_umc(umc) {
+ for_each_chip_select(cs, umc, pvt) {
+ if (!csrow_enabled(cs, umc, pvt))
+ continue;
+
+ empty = 0;
+ dimm = mci->csrows[umc]->channels[cs]->dimm;
+
+ edac_dbg(1, "MC node: %d, csrow: %d\n",
+ pvt->mc_node_id, cs);
+
+ dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
+ dimm->mtype = MEM_HBM2;
+ dimm->edac_mode = EDAC_SECDED;
+ dimm->dtype = DEV_X16;
+ dimm->grain = 64;
+ }
+ }
+
+ return empty;
+}
+
/*
* Initialize the array of csrow attribute instances, based on the values
* from pci config hardware registers.
@@ -3539,6 +3750,10 @@ static bool ecc_enabled(struct amd64_pvt *pvt)
u8 ecc_en = 0, i;
u32 value;

+ /* ECC is enabled by default on GPU nodes */
+ if (pvt->is_gpu)
+ return true;
+
if (boot_cpu_data.x86 >= 0x17) {
u8 umc_en_mask = 0, ecc_en_mask = 0;
struct amd64_umc *umc;
@@ -3622,7 +3837,10 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
mci->edac_ctl_cap = EDAC_FLAG_NONE;

if (pvt->umc) {
- f17h_determine_edac_ctl_cap(mci, pvt);
+ if (pvt->is_gpu)
+ mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
+ else
+ f17h_determine_edac_ctl_cap(mci, pvt);
} else {
if (pvt->nbcap & NBCAP_SECDED)
mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
@@ -3724,6 +3942,17 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
pvt->ops = &family_types[F17_M70H_CPUS].ops;
pvt->fam_type->ctl_name = "F19h_M20h";
break;
+ } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
+ if (pvt->mc_node_id >= amd_cpu_node_count()) {
+ pvt->fam_type = &family_types[ALDEBARAN_GPUS];
+ pvt->ops = &family_types[ALDEBARAN_GPUS].ops;
+ pvt->is_gpu = true;
+ } else {
+ pvt->fam_type = &family_types[F19_CPUS];
+ pvt->ops = &family_types[F19_CPUS].ops;
+ family_types[F19_CPUS].ctl_name = "F19h_M30h";
+ }
+ break;
}
pvt->fam_type = &family_types[F19_CPUS];
pvt->ops = &family_types[F19_CPUS].ops;
@@ -3791,7 +4020,9 @@ static int hw_info_get(struct amd64_pvt *pvt)
determine_memory_type(pvt);
edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);

- determine_ecc_sym_sz(pvt);
+ /* ECC symbol size is not available on GPU nodes */
+ if (!pvt->is_gpu)
+ determine_ecc_sym_sz(pvt);

return 0;
}
@@ -3819,9 +4050,10 @@ static int init_one_instance(struct amd64_pvt *pvt)
if (pvt->channel_count < 0)
return ret;

+ /* Define layers for CPU and GPU nodes */
ret = -ENOMEM;
layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
- layers[0].size = pvt->csels[0].b_cnt;
+ layers[0].size = pvt->is_gpu ? pvt->fam_type->max_mcs : pvt->csels[0].b_cnt;
layers[0].is_virt_csrow = true;
layers[1].type = EDAC_MC_LAYER_CHANNEL;

@@ -3830,7 +4062,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
* only one channel. Also, this simplifies handling later for the price
* of a couple of KBs tops.
*/
- layers[1].size = pvt->fam_type->max_mcs;
+ layers[1].size = pvt->is_gpu ? pvt->csels[0].b_cnt : pvt->fam_type->max_mcs;
layers[1].is_virt_csrow = false;

mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 82d9f64aa150..da2f6c79cccc 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -126,6 +126,8 @@
#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
#define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650
#define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14d0
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14d6

/*
* Function 1 - Address Map
@@ -298,6 +300,7 @@ enum amd_families {
F17_M60H_CPUS,
F17_M70H_CPUS,
F19_CPUS,
+ ALDEBARAN_GPUS,
NUM_FAMILIES,
};

@@ -391,6 +394,8 @@ struct amd64_pvt {
struct amd64_umc *umc; /* UMC registers */

struct amd64_family_type *fam_type;
+
+ bool is_gpu;
};

enum err_codes {
@@ -412,6 +417,28 @@ struct err_info {
u32 offset;
};

+static inline u32 get_umc_base_gpu(u8 umc, u8 channel)
+{
+ /*
+ * On CPUs, there is one channel per UMC, so UMC numbering equals
+ * channel numbering. On GPUs, there are eight channels per UMC,
+ * so the channel numbering is different from UMC numbering.
+ *
+ * On CPU nodes channels are selected in 6th nibble
+ * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000;
+ *
+ * On GPU nodes channels are selected in 3rd nibble
+ * HBM chX[3:0]= [Y ]5X[3:0]000;
+ * HBM chX[7:4]= [Y+1]5X[3:0]000
+ */
+ umc *= 2;
+
+ if (channel >= 4)
+ umc++;
+
+ return 0x50000 + (umc << 20) + ((channel % 4) << 12);
+}
+
static inline u32 get_umc_base(u8 channel)
{
/* chY: 0xY50000 */
--
2.25.1

2021-11-08 19:34:07

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v6 2/5] EDAC/mce_amd: Extract node id from MCA_IPID

On Thu, Oct 28, 2021 at 06:31:03PM +0530, Naveen Krishna Chatradhi wrote:
> On SMCA banks of the GPU nodes, the node id information is
> available in register MCA_IPID[47:44](InstanceIdHi).
>
> Convert the hardware node ID to a value used by Linux
> where GPU nodes are sequencially after the CPU nodes.

"sequentially"

Use a spellchecker when writing commit messages pls.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-11-10 17:46:05

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v6 3/5] EDAC/amd64: Extend family ops functions

On Thu, Oct 28, 2021 at 06:31:04PM +0530, Naveen Krishna Chatradhi wrote:
> @@ -1217,28 +1214,39 @@ static void dump_misc_regs(struct amd64_pvt *pvt)
> /*
> * See BKDG, F2x[1,0][5C:40], F2[1,0][6C:60]
> */
> -static void prep_chip_selects(struct amd64_pvt *pvt)
> +static void k8_prep_chip_selects(struct amd64_pvt *pvt)
> {
> if (pvt->fam == 0xf && pvt->ext_model < K8_REV_F) {
> pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
> pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 8;
> - } else if (pvt->fam == 0x15 && pvt->model == 0x30) {
> - pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
> - pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
> - } else if (pvt->fam >= 0x17) {
> - int umc;
> -
> - for_each_umc(umc) {
> - pvt->csels[umc].b_cnt = 4;
> - pvt->csels[umc].m_cnt = 2;
> - }
> -
> - } else {
> + } else if (pvt->fam == 0xf && pvt->ext_model >= K8_REV_F) {
> pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
> pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
> }

Why is this function looking at the family if it is called a k8_...
function which will be set only on K8?

> }
>
> +static void f15m30_prep_chip_selects(struct amd64_pvt *pvt)
> +{
> + pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
> + pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
> +}
> +
> +static void default_prep_chip_selects(struct amd64_pvt *pvt)
> +{
> + pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
> + pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
> +}
> +
> +static void f17_prep_chip_selects(struct amd64_pvt *pvt)
> +{
> + int umc;
> +
> + for_each_umc(umc) {
> + pvt->csels[umc].b_cnt = 4;
> + pvt->csels[umc].m_cnt = 2;
> + }
> +}
> +
> static void read_umc_base_mask(struct amd64_pvt *pvt)
> {
> u32 umc_base_reg, umc_base_reg_sec;
> @@ -1297,11 +1305,6 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)
> {
> int cs;
>
> - prep_chip_selects(pvt);
> -
> - if (pvt->umc)
> - return read_umc_base_mask(pvt);
> -
> for_each_chip_select(cs, 0, pvt) {
> int reg0 = DCSB0 + (cs * 4);
> int reg1 = DCSB1 + (cs * 4);
> @@ -2512,143 +2515,181 @@ static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
> }
> }
>
> +/* Prototypes for family specific ops routines */
> +static int init_csrows(struct mem_ctl_info *mci);
> +static int init_csrows_df(struct mem_ctl_info *mci);
> +static void read_mc_regs(struct amd64_pvt *pvt);
> +static void __read_mc_regs_df(struct amd64_pvt *pvt);
> +static void update_umc_err_info(struct mce *m, struct err_info *err);

Prototypes belong in headers.

> +
> +static const struct low_ops k8_ops = {

So if you're going to do this, you can go a step further and get rid of
all those static definitions which are completely unused except those of
the current family we're loaded on.

IOW, you should make all family-specific assignments dynamic and get rid
of family_types and those ops. Here's an example I did for K8:

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 1029fe84ba2e..5f1686f22947 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3656,8 +3656,18 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)

switch (pvt->fam) {
case 0xf:
- fam_type = &family_types[K8_CPUS];
- pvt->ops = &family_types[K8_CPUS].ops;
+ fam_type->ctl_name = "K8";
+ fam_type->f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP;
+ fam_type->f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL;
+ fam_type->max_mcs = 2;
+ pvt->ops->early_channel_count = k8_early_channel_count;
+ pvt->ops->map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow;
+ pvt->ops->dbam_to_cs = k8_dbam_to_chip_select;
+ pvt->ops->prep_chip_select = k8_prep_chip_selects;
+ pvt->ops->get_base_mask = read_dct_base_mask;
+ pvt->ops->get_misc_regs = __dump_misc_regs;
+ pvt->ops->get_mc_regs = read_mc_regs;
+ pvt->ops->populate_csrows = init_csrows;
break;

case 0x10:

After that, pvt and fam_type have been properly set up.

> @@ -3735,7 +3784,16 @@ static int hw_info_get(struct amd64_pvt *pvt)
> if (ret)
> return ret;
>
> - read_mc_regs(pvt);
> + pvt->ops->get_mc_regs(pvt);
> +
> + pvt->ops->prep_chip_select(pvt);
> +
> + pvt->ops->get_base_mask(pvt);
> +
> + determine_memory_type(pvt);
> + edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);

Move that dbg call at the end of determine_memory_type().

> +
> + determine_ecc_sym_sz(pvt);
>
> return 0;
> }
> @@ -3786,7 +3844,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
>
> setup_mci_misc_attrs(mci);
>
> - if (init_csrows(mci))
> + if (pvt->ops->populate_csrows(mci))
> mci->edac_cap = EDAC_FLAG_NONE;
>
> ret = -ENODEV;
> diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
> index 85aa820bc165..881ff6322bc9 100644
> --- a/drivers/edac/amd64_edac.h
> +++ b/drivers/edac/amd64_edac.h
> @@ -467,11 +467,17 @@ struct ecc_settings {
> * functions and per device encoding/decoding logic.
> */
> struct low_ops {
> - int (*early_channel_count) (struct amd64_pvt *pvt);
> + int (*early_channel_count) (struct amd64_pvt *pvt);
> void (*map_sysaddr_to_csrow) (struct mem_ctl_info *mci, u64 sys_addr,
> struct err_info *);
> - int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
> + int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
> unsigned cs_mode, int cs_mask_nr);
> + void (*prep_chip_select) (struct amd64_pvt *pvt);

That name should be "prep_chip_selects" - plural.

> + void (*get_base_mask) (struct amd64_pvt *pvt);
> + void (*get_misc_regs) (struct amd64_pvt *pvt);
> + void (*get_mc_regs) (struct amd64_pvt *pvt);
> + int (*populate_csrows) (struct mem_ctl_info *mci);
> + void (*get_umc_err_info) (struct mce *m, struct err_info *err);

WARNING: Unnecessary space before function pointer arguments
#652: FILE: drivers/edac/amd64_edac.h:470:
+ int (*early_channel_count) (struct amd64_pvt *pvt);

WARNING: Unnecessary space before function pointer arguments
#656: FILE: drivers/edac/amd64_edac.h:473:
+ int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,

WARNING: Unnecessary space before function pointer arguments
#658: FILE: drivers/edac/amd64_edac.h:475:
+ void (*prep_chip_select) (struct amd64_pvt *pvt);

WARNING: Unnecessary space before function pointer arguments
#659: FILE: drivers/edac/amd64_edac.h:476:
+ void (*get_base_mask) (struct amd64_pvt *pvt);

WARNING: Unnecessary space before function pointer arguments
#660: FILE: drivers/edac/amd64_edac.h:477:
+ void (*get_misc_regs) (struct amd64_pvt *pvt);

WARNING: Unnecessary space before function pointer arguments
#661: FILE: drivers/edac/amd64_edac.h:478:
+ void (*get_mc_regs) (struct amd64_pvt *pvt);

WARNING: Unnecessary space before function pointer arguments
#662: FILE: drivers/edac/amd64_edac.h:479:
+ int (*populate_csrows) (struct mem_ctl_info *mci);

WARNING: Unnecessary space before function pointer arguments
#663: FILE: drivers/edac/amd64_edac.h:480:
+ void (*get_umc_err_info) (struct mce *m, struct err_info *err);

total: 0 errors, 8 warnings, 507 lines checked


Please integrate scripts/checkpatch.pl into your patch creation
workflow. Some of the warnings/errors *actually* make sense.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-11-11 12:39:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v6 4/5] EDAC/amd64: Move struct fam_type into amd64_pvt structure

On Thu, Oct 28, 2021 at 06:31:05PM +0530, Naveen Krishna Chatradhi wrote:
> diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
> index 881ff6322bc9..82d9f64aa150 100644
> --- a/drivers/edac/amd64_edac.h
> +++ b/drivers/edac/amd64_edac.h
> @@ -389,6 +389,8 @@ struct amd64_pvt {
> enum mem_type dram_type;
>
> struct amd64_umc *umc; /* UMC registers */
> +
> + struct amd64_family_type *fam_type;

With my suggestion to the previous patch to assign ops and per-family
members dynamically, you can just as well get rid of struct
amd64_family_type and put its members directly in the pvt structure.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-11-11 13:12:43

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v6 5/5] EDAC/amd64: Enumerate memory on Aldebaran GPU nodes

On Thu, Oct 28, 2021 at 06:31:06PM +0530, Naveen Krishna Chatradhi wrote:
> On newer heterogeneous systems with AMD CPUs, the data fabrics of the GPUs
> are connected directly via custom links.
>
> One such system, where Aldebaran GPU nodes are connected to the
> Family 19h, model 30h family of CPU nodes, the Aldebaran GPUs can report
> memory errors via SMCA banks.
>
> Aldebaran GPU support was added to DRM framework
> https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html
>
> The GPU nodes comes with HBM2 memory in-built, ECC support is enabled by
> default and the UMCs on GPU node are different from the UMCs on CPU nodes.
>
> GPU specific ops routines are defined to extend the amd64_edac
> module to enumerate HBM memory leveraging the existing edac and the
> amd64 specific data structures.
>
> Note: The UMC Phys on GPU nodes are enumerated as csrows and the UMC
> channels connected to HBM banks are enumerated as ranks.

For all your future commit messages, from
Documentation/process/submitting-patches.rst:

"Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour."

Also, do not talk about what your patch does - that should hopefully be
visible in the diff itself. Rather, talk about *why* you're doing what
you're doing.

>
> Cc: Yazen Ghannam <[email protected]>
> Co-developed-by: Muralidhara M K <[email protected]>
> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> Link: https://lkml.kernel.org/r/[email protected]
> ---
> Changes since v5:
> Removed else condition in per_family_init for 19h family
>
> Changes since v4:
> Split "f17_addr_mask_to_cs_size" instead as did in 3rd patch earlier
>
> Changes since v3:
> 1. Bifurcated the GPU code from v2
>
> Changes since v2:
> 1. Restored line deletions and handled minor comments
> 2. Modified commit message and some of the function comments
> 3. variable df_inst_id is introduced instead of umc_num
>
> Changes since v1:
> 1. Modifed the commit message
> 2. Change the edac_cap
> 3. kept sizes of both cpu and noncpu together
> 4. return success if the !F3 condition true and remove unnecessary validation
>
> drivers/edac/amd64_edac.c | 298 +++++++++++++++++++++++++++++++++-----
> drivers/edac/amd64_edac.h | 27 ++++
> 2 files changed, 292 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 7953ffe9d547..b404fa5b03ce 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -1121,6 +1121,20 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
> }
> }
>
> +static void debug_display_dimm_sizes_gpu(struct amd64_pvt *pvt, u8 ctrl)
> +{
> + int size, cs = 0, cs_mode;
> +
> + edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
> +
> + cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
> +
> + for_each_chip_select(cs, ctrl, pvt) {
> + size = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs);
> + amd64_info(EDAC_MC ": %d: %5dMB\n", cs, size);
> + }
> +}
> +
> static void __dump_misc_regs_df(struct amd64_pvt *pvt)
> {
> struct amd64_umc *umc;
> @@ -1165,6 +1179,27 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
> pvt->dhar, dhar_base(pvt));
> }
>
> +static void __dump_misc_regs_gpu(struct amd64_pvt *pvt)

The function pointer this gets assigned to is called *get*_misc_regs.
But this function is is doing *dump*.

When I see __dump_misc_regs_gpu() I'd expect this function to be called
by a higher level dump_misc_regs() as the "__" denotes layering usually.

There is, in fact, dump_misc_regs() but that one calls
->get_misc_regs().

So this all needs fixing - right now I see a mess.

Also, there's <function_name>_gpu and gpu_<function_name> with the "gpu"
being either prefix or suffix. You need to fix the current ones to be
only prefixes - in a pre-patch - and then add yours as prefixes only too.

And in talking about pre-patches - this one is doing a bunch of things
at once and needs splitting. You do the preparatory work like carving
out common functionality and other refactoring in pre-patches, and then
you add the new functionality ontop.

Also, I don't like ->is_gpu one bit, it being sprinkled like that around
the code. This says that the per-family attrs and ops assignment is
lacking.


> @@ -2982,7 +3132,17 @@ static void decode_umc_error(int node_id, struct mce *m)
>
> pvt->ops->get_umc_err_info(m, &err);
>
> - if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
> + /*
> + * GPU node has #phys[X] which has #channels[Y] each.
> + * On GPUs, df_inst_id = [X] * num_ch_per_phy + [Y].
> + * On CPUs, "Channel"="UMC Number"="DF Instance ID".
> + */

This comment doesn't look like it is destined for human beings to read
but maybe to be parsed by programs?

> + if (pvt->is_gpu)
> + df_inst_id = (err.csrow * pvt->channel_count / mci->nr_csrows) + err.channel;

I'm not sure we want to log ECCs from the GPUs: what is going to happen
to them, how does the further error recovery look like?

Also, EDAC sysfs structure currently has only DIMMs, below's from my
box.

I don't see how that structure fits the GPUs so here's the $10^6
question: why does EDAC need to know about GPUs?

What is the strategy here?

Your 0th message talks about the "what" but what gets added is not
important - the "why" is.

$ grep -r . /sys/devices/system/edac/mc/ 2>/dev/null
/sys/devices/system/edac/mc/power/runtime_active_time:0
/sys/devices/system/edac/mc/power/runtime_status:unsupported
/sys/devices/system/edac/mc/power/runtime_suspended_time:0
/sys/devices/system/edac/mc/power/control:auto
/sys/devices/system/edac/mc/mc0/ce_count:0
/sys/devices/system/edac/mc/mc0/rank7/dimm_ue_count:0
/sys/devices/system/edac/mc/mc0/rank7/dimm_mem_type:Unbuffered-DDR4
/sys/devices/system/edac/mc/mc0/rank7/power/runtime_active_time:0
/sys/devices/system/edac/mc/mc0/rank7/power/runtime_status:unsupported
/sys/devices/system/edac/mc/mc0/rank7/power/runtime_suspended_time:0
/sys/devices/system/edac/mc/mc0/rank7/power/control:auto
/sys/devices/system/edac/mc/mc0/rank7/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank7/size:8192
/sys/devices/system/edac/mc/mc0/rank7/dimm_ce_count:0
/sys/devices/system/edac/mc/mc0/rank7/dimm_label:mc#0csrow#3channel#1
/sys/devices/system/edac/mc/mc0/rank7/dimm_location:csrow 3 channel 1
/sys/devices/system/edac/mc/mc0/rank7/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/topmem:0x00000000e0000000
/sys/devices/system/edac/mc/mc0/mc_name:F17h
/sys/devices/system/edac/mc/mc0/rank5/dimm_ue_count:0
/sys/devices/system/edac/mc/mc0/rank5/dimm_mem_type:Unbuffered-DDR4
/sys/devices/system/edac/mc/mc0/rank5/power/runtime_active_time:0
/sys/devices/system/edac/mc/mc0/rank5/power/runtime_status:unsupported
/sys/devices/system/edac/mc/mc0/rank5/power/runtime_suspended_time:0
/sys/devices/system/edac/mc/mc0/rank5/power/control:auto
/sys/devices/system/edac/mc/mc0/rank5/dimm_dev_type:Unknown
/sys/devices/system/edac/mc/mc0/rank5/size:8192
/sys/devices/system/edac/mc/mc0/rank5/dimm_ce_count:0
/sys/devices/system/edac/mc/mc0/rank5/dimm_label:mc#0csrow#2channel#1
/sys/devices/system/edac/mc/mc0/rank5/dimm_location:csrow 2 channel 1
/sys/devices/system/edac/mc/mc0/rank5/dimm_edac_mode:SECDED
/sys/devices/system/edac/mc/mc0/dram_hole:0 0 0
/sys/devices/system/edac/mc/mc0/ce_noinfo_count:0
...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: Re: [PATCH v6 3/5] EDAC/amd64: Extend family ops functions

Hi Boris

On 11/10/2021 11:15 PM, Borislav Petkov wrote:
> [CAUTION: External Email]
>
> On Thu, Oct 28, 2021 at 06:31:04PM +0530, Naveen Krishna Chatradhi wrote:
>> @@ -1217,28 +1214,39 @@ static void dump_misc_regs(struct amd64_pvt *pvt)
>> /*
>> * See BKDG, F2x[1,0][5C:40], F2[1,0][6C:60]
>> */
>> -static void prep_chip_selects(struct amd64_pvt *pvt)
>> +static void k8_prep_chip_selects(struct amd64_pvt *pvt)
>> {
>> if (pvt->fam == 0xf && pvt->ext_model < K8_REV_F) {
>> pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
>> pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 8;
>> - } else if (pvt->fam == 0x15 && pvt->model == 0x30) {
>> - pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
>> - pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
>> - } else if (pvt->fam >= 0x17) {
>> - int umc;
>> -
>> - for_each_umc(umc) {
>> - pvt->csels[umc].b_cnt = 4;
>> - pvt->csels[umc].m_cnt = 2;
>> - }
>> -
>> - } else {
>> + } else if (pvt->fam == 0xf && pvt->ext_model >= K8_REV_F) {
>> pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
>> pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
>> }
> Why is this function looking at the family if it is called a k8_...
> function which will be set only on K8?
Will modify.
>
>> }
>>
>> +static void f15m30_prep_chip_selects(struct amd64_pvt *pvt)
>> +{
>> + pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
>> + pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
>> +}
>> +
>> +static void default_prep_chip_selects(struct amd64_pvt *pvt)
>> +{
>> + pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
>> + pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
>> +}
>> +
>> +static void f17_prep_chip_selects(struct amd64_pvt *pvt)
>> +{
>> + int umc;
>> +
>> + for_each_umc(umc) {
>> + pvt->csels[umc].b_cnt = 4;
>> + pvt->csels[umc].m_cnt = 2;
>> + }
>> +}
>> +
>> static void read_umc_base_mask(struct amd64_pvt *pvt)
>> {
>> u32 umc_base_reg, umc_base_reg_sec;
>> @@ -1297,11 +1305,6 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)
>> {
>> int cs;
>>
>> - prep_chip_selects(pvt);
>> -
>> - if (pvt->umc)
>> - return read_umc_base_mask(pvt);
>> -
>> for_each_chip_select(cs, 0, pvt) {
>> int reg0 = DCSB0 + (cs * 4);
>> int reg1 = DCSB1 + (cs * 4);
>> @@ -2512,143 +2515,181 @@ static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
>> }
>> }
>>
>> +/* Prototypes for family specific ops routines */
>> +static int init_csrows(struct mem_ctl_info *mci);
>> +static int init_csrows_df(struct mem_ctl_info *mci);
>> +static void read_mc_regs(struct amd64_pvt *pvt);
>> +static void __read_mc_regs_df(struct amd64_pvt *pvt);
>> +static void update_umc_err_info(struct mce *m, struct err_info *err);
> Prototypes belong in headers.
Sure, this wont be necessary based on your other comments.
>
>> +
>> +static const struct low_ops k8_ops = {
> So if you're going to do this, you can go a step further and get rid of
> all those static definitions which are completely unused except those of
> the current family we're loaded on.
>
> IOW, you should make all family-specific assignments dynamic and get rid
> of family_types and those ops. Here's an example I did for K8:
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 1029fe84ba2e..5f1686f22947 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -3656,8 +3656,18 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
>
> switch (pvt->fam) {
> case 0xf:
> - fam_type = &family_types[K8_CPUS];
> - pvt->ops = &family_types[K8_CPUS].ops;
> + fam_type->ctl_name = "K8";
> + fam_type->f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP;
> + fam_type->f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL;
> + fam_type->max_mcs = 2;
> + pvt->ops->early_channel_count = k8_early_channel_count;
> + pvt->ops->map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow;
> + pvt->ops->dbam_to_cs = k8_dbam_to_chip_select;
> + pvt->ops->prep_chip_select = k8_prep_chip_selects;
> + pvt->ops->get_base_mask = read_dct_base_mask;
> + pvt->ops->get_misc_regs = __dump_misc_regs;
> + pvt->ops->get_mc_regs = read_mc_regs;
> + pvt->ops->populate_csrows = init_csrows;
> break;
>
> case 0x10:
>
> After that, pvt and fam_type have been properly set up.

Agreed, will create family_XXh_init() under per_family_init()'s switch.

At present, we are handling the family_type in 4/5 of this set. Will
club them.

>
>> @@ -3735,7 +3784,16 @@ static int hw_info_get(struct amd64_pvt *pvt)
>> if (ret)
>> return ret;
>>
>> - read_mc_regs(pvt);
>> + pvt->ops->get_mc_regs(pvt);
>> +
>> + pvt->ops->prep_chip_select(pvt);
>> +
>> + pvt->ops->get_base_mask(pvt);
>> +
>> + determine_memory_type(pvt);
>> + edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
> Move that dbg call at the end of determine_memory_type().
Sure
>
>> +
>> + determine_ecc_sym_sz(pvt);
>>
>> return 0;
>> }
>> @@ -3786,7 +3844,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
>>
>> setup_mci_misc_attrs(mci);
>>
>> - if (init_csrows(mci))
>> + if (pvt->ops->populate_csrows(mci))
>> mci->edac_cap = EDAC_FLAG_NONE;
>>
>> ret = -ENODEV;
>> diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
>> index 85aa820bc165..881ff6322bc9 100644
>> --- a/drivers/edac/amd64_edac.h
>> +++ b/drivers/edac/amd64_edac.h
>> @@ -467,11 +467,17 @@ struct ecc_settings {
>> * functions and per device encoding/decoding logic.
>> */
>> struct low_ops {
>> - int (*early_channel_count) (struct amd64_pvt *pvt);
>> + int (*early_channel_count) (struct amd64_pvt *pvt);
>> void (*map_sysaddr_to_csrow) (struct mem_ctl_info *mci, u64 sys_addr,
>> struct err_info *);
>> - int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
>> + int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
>> unsigned cs_mode, int cs_mask_nr);
>> + void (*prep_chip_select) (struct amd64_pvt *pvt);
> That name should be "prep_chip_selects" - plural.
Got it.
>
>> + void (*get_base_mask) (struct amd64_pvt *pvt);
>> + void (*get_misc_regs) (struct amd64_pvt *pvt);
>> + void (*get_mc_regs) (struct amd64_pvt *pvt);
>> + int (*populate_csrows) (struct mem_ctl_info *mci);
>> + void (*get_umc_err_info) (struct mce *m, struct err_info *err);
> WARNING: Unnecessary space before function pointer arguments
> #652: FILE: drivers/edac/amd64_edac.h:470:
> + int (*early_channel_count) (struct amd64_pvt *pvt);
>
> WARNING: Unnecessary space before function pointer arguments
> #656: FILE: drivers/edac/amd64_edac.h:473:
> + int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
>
> WARNING: Unnecessary space before function pointer arguments
> #658: FILE: drivers/edac/amd64_edac.h:475:
> + void (*prep_chip_select) (struct amd64_pvt *pvt);
>
> WARNING: Unnecessary space before function pointer arguments
> #659: FILE: drivers/edac/amd64_edac.h:476:
> + void (*get_base_mask) (struct amd64_pvt *pvt);
>
> WARNING: Unnecessary space before function pointer arguments
> #660: FILE: drivers/edac/amd64_edac.h:477:
> + void (*get_misc_regs) (struct amd64_pvt *pvt);
>
> WARNING: Unnecessary space before function pointer arguments
> #661: FILE: drivers/edac/amd64_edac.h:478:
> + void (*get_mc_regs) (struct amd64_pvt *pvt);
>
> WARNING: Unnecessary space before function pointer arguments
> #662: FILE: drivers/edac/amd64_edac.h:479:
> + int (*populate_csrows) (struct mem_ctl_info *mci);
>
> WARNING: Unnecessary space before function pointer arguments
> #663: FILE: drivers/edac/amd64_edac.h:480:
> + void (*get_umc_err_info) (struct mce *m, struct err_info *err);
>
> total: 0 errors, 8 warnings, 507 lines checked
>
>
> Please integrate scripts/checkpatch.pl into your patch creation
> workflow. Some of the warnings/errors *actually* make sense.
I have noticed these warnings. As there were previous definitions, i
continued with similar indentation. Will address all of them.
>
> --
> Regards/Gruss,
> Boris.
Thanks for the review.
>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&amp;data=04%7C01%7CNaveenKrishna.Chatradhi%40amd.com%7Ca162e192bf9a44cf719308d9a471f62b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637721631674685370%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=eLKOAi8ljBkPLL04EVk4q1jTDQ1PzaNBv2Wltll%2FU24%3D&amp;reserved=0

Subject: Re: [PATCH v6 4/5] EDAC/amd64: Move struct fam_type into amd64_pvt structure

On 11/11/2021 6:09 PM, Borislav Petkov wrote:
> [CAUTION: External Email]
>
> On Thu, Oct 28, 2021 at 06:31:05PM +0530, Naveen Krishna Chatradhi wrote:
>> diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
>> index 881ff6322bc9..82d9f64aa150 100644
>> --- a/drivers/edac/amd64_edac.h
>> +++ b/drivers/edac/amd64_edac.h
>> @@ -389,6 +389,8 @@ struct amd64_pvt {
>> enum mem_type dram_type;
>>
>> struct amd64_umc *umc; /* UMC registers */
>> +
>> + struct amd64_family_type *fam_type;
> With my suggestion to the previous patch to assign ops and per-family
> members dynamically, you can just as well get rid of struct
> amd64_family_type and put its members directly in the pvt structure.
Sure, will do. Thanks.
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&amp;data=04%7C01%7CNaveenKrishna.Chatradhi%40amd.com%7C9b4ca167dcf14dd0939708d9a5105785%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637722311903173760%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&amp;sdata=lTO5cXSqnkx8Mb6yqP4WnNO1zBTOoqKTt6nWfUFHylI%3D&amp;reserved=0

2021-11-11 18:06:08

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v6 3/5] EDAC/amd64: Extend family ops functions

On Thu, Nov 11, 2021 at 09:53:32PM +0530, Chatradhi, Naveen Krishna wrote:
> Agreed, will create family_XXh_init() under per_family_init()'s switch.

No, you need to simply assign the per-family stuff in that one big
switch-case - no additional family_XXh_init() functions.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-11-12 20:59:39

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH v6 3/5] EDAC/amd64: Extend family ops functions

On Thu, Nov 11, 2021 at 09:53:32PM +0530, Chatradhi, Naveen Krishna wrote:

...
> > > struct low_ops {
> > > - int (*early_channel_count) (struct amd64_pvt *pvt);
> > > + int (*early_channel_count) (struct amd64_pvt *pvt);
> > > void (*map_sysaddr_to_csrow) (struct mem_ctl_info *mci, u64 sys_addr,
> > > struct err_info *);
> > > - int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
> > > + int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
> > > unsigned cs_mode, int cs_mask_nr);
> > > + void (*prep_chip_select) (struct amd64_pvt *pvt);
> > That name should be "prep_chip_selects" - plural.
> Got it.
> >
> > > + void (*get_base_mask) (struct amd64_pvt *pvt);
> > > + void (*get_misc_regs) (struct amd64_pvt *pvt);
> > > + void (*get_mc_regs) (struct amd64_pvt *pvt);
> > > + int (*populate_csrows) (struct mem_ctl_info *mci);
> > > + void (*get_umc_err_info) (struct mce *m, struct err_info *err);
> > WARNING: Unnecessary space before function pointer arguments
> > #652: FILE: drivers/edac/amd64_edac.h:470:
> > + int (*early_channel_count) (struct amd64_pvt *pvt);
> >
> > WARNING: Unnecessary space before function pointer arguments
> > #656: FILE: drivers/edac/amd64_edac.h:473:
> > + int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
> >
> > WARNING: Unnecessary space before function pointer arguments
> > #658: FILE: drivers/edac/amd64_edac.h:475:
> > + void (*prep_chip_select) (struct amd64_pvt *pvt);
> >
> > WARNING: Unnecessary space before function pointer arguments
> > #659: FILE: drivers/edac/amd64_edac.h:476:
> > + void (*get_base_mask) (struct amd64_pvt *pvt);
> >
> > WARNING: Unnecessary space before function pointer arguments
> > #660: FILE: drivers/edac/amd64_edac.h:477:
> > + void (*get_misc_regs) (struct amd64_pvt *pvt);
> >
> > WARNING: Unnecessary space before function pointer arguments
> > #661: FILE: drivers/edac/amd64_edac.h:478:
> > + void (*get_mc_regs) (struct amd64_pvt *pvt);
> >
> > WARNING: Unnecessary space before function pointer arguments
> > #662: FILE: drivers/edac/amd64_edac.h:479:
> > + int (*populate_csrows) (struct mem_ctl_info *mci);
> >
> > WARNING: Unnecessary space before function pointer arguments
> > #663: FILE: drivers/edac/amd64_edac.h:480:
> > + void (*get_umc_err_info) (struct mce *m, struct err_info *err);
> >
> > total: 0 errors, 8 warnings, 507 lines checked
> >
> >
> > Please integrate scripts/checkpatch.pl into your patch creation
> > workflow. Some of the warnings/errors *actually* make sense.
> I have noticed these warnings. As there were previous definitions, i
> continued with similar indentation. Will address all of them.
> >

I've seen the same warning in some of my patches, but I've ignored it for
readability. I'll need to make changes there too. :/

Thanks,
Yazen



2021-11-13 11:58:43

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v6 3/5] EDAC/amd64: Extend family ops functions

On Fri, Nov 12, 2021 at 08:59:21PM +0000, Yazen Ghannam wrote:
> I've seen the same warning in some of my patches, but I've ignored it for
> readability. I'll need to make changes there too. :/

If readability is of concern, you can fixup the issues you see in a
pre-patch - I mean, you're touching the code so might as well fix them
while at it.

What I really don't want to have is patches which simply fix random
checkpatch issues and other whitespace crap because that becomes an
insane mess with new people not really concentrating on fixing actual
bugs but doing whitespace wankery only.

And having such patches always gets in the way when one does git
archeology to figure out why something was done the way it is.

But it's fine to fix those if you're going to touch the code anyway.

HTH.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: Re: [PATCH v6 5/5] EDAC/amd64: Enumerate memory on Aldebaran GPU nodes

Hi Boris

On 11/11/2021 6:42 PM, Borislav Petkov wrote:
> [CAUTION: External Email]
>
> On Thu, Oct 28, 2021 at 06:31:06PM +0530, Naveen Krishna Chatradhi wrote:
>> On newer heterogeneous systems with AMD CPUs, the data fabrics of the GPUs
>> are connected directly via custom links.
>>
>> One such system, where Aldebaran GPU nodes are connected to the
>> Family 19h, model 30h family of CPU nodes, the Aldebaran GPUs can report
>> memory errors via SMCA banks.
>>
>> Aldebaran GPU support was added to DRM framework
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Farchives%2Famd-gfx%2F2021-February%2F059694.html&amp;data=04%7C01%7CNaveenKrishna.Chatradhi%40amd.com%7Ca6aa66df219641c880d008d9a514e5bf%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637722331481685552%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=Pnx%2FQxgmXS2MfWzu8lVEs22As3yJSWoAsjOvp%2FFOJYw%3D&amp;reserved=0
>>
>> The GPU nodes comes with HBM2 memory in-built, ECC support is enabled by
>> default and the UMCs on GPU node are different from the UMCs on CPU nodes.
>>
>> GPU specific ops routines are defined to extend the amd64_edac
>> module to enumerate HBM memory leveraging the existing edac and the
>> amd64 specific data structures.
>>
>> Note: The UMC Phys on GPU nodes are enumerated as csrows and the UMC
>> channels connected to HBM banks are enumerated as ranks.
> For all your future commit messages, from
> Documentation/process/submitting-patches.rst:
>
> "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
> instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
> to do frotz", as if you are giving orders to the codebase to change
> its behaviour."
>
> Also, do not talk about what your patch does - that should hopefully be
> visible in the diff itself. Rather, talk about *why* you're doing what
> you're doing.
Sure.
>
>> Cc: Yazen Ghannam <[email protected]>
>> Co-developed-by: Muralidhara M K <[email protected]>
>> Signed-off-by: Muralidhara M K <[email protected]>
>> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.kernel.org%2Fr%2F20210823185437.94417-4-nchatrad%40amd.com&amp;data=04%7C01%7CNaveenKrishna.Chatradhi%40amd.com%7Ca6aa66df219641c880d008d9a514e5bf%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637722331481685552%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=aD3jFWS4RrPrIRF5mFoggBs%2BYqUseU3adJFqlrbD2D8%3D&amp;reserved=0
>> ---
>> Changes since v5:
>> Removed else condition in per_family_init for 19h family
>>
>> Changes since v4:
>> Split "f17_addr_mask_to_cs_size" instead as did in 3rd patch earlier
>>
>> Changes since v3:
>> 1. Bifurcated the GPU code from v2
>>
>> Changes since v2:
>> 1. Restored line deletions and handled minor comments
>> 2. Modified commit message and some of the function comments
>> 3. variable df_inst_id is introduced instead of umc_num
>>
>> Changes since v1:
>> 1. Modifed the commit message
>> 2. Change the edac_cap
>> 3. kept sizes of both cpu and noncpu together
>> 4. return success if the !F3 condition true and remove unnecessary validation
>>
>> drivers/edac/amd64_edac.c | 298 +++++++++++++++++++++++++++++++++-----
>> drivers/edac/amd64_edac.h | 27 ++++
>> 2 files changed, 292 insertions(+), 33 deletions(-)
>>
>> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
>> index 7953ffe9d547..b404fa5b03ce 100644
>> --- a/drivers/edac/amd64_edac.c
>> +++ b/drivers/edac/amd64_edac.c
>> @@ -1121,6 +1121,20 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
>> }
>> }
>>
>> +static void debug_display_dimm_sizes_gpu(struct amd64_pvt *pvt, u8 ctrl)
>> +{
>> + int size, cs = 0, cs_mode;
>> +
>> + edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
>> +
>> + cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
>> +
>> + for_each_chip_select(cs, ctrl, pvt) {
>> + size = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs);
>> + amd64_info(EDAC_MC ": %d: %5dMB\n", cs, size);
>> + }
>> +}
>> +
>> static void __dump_misc_regs_df(struct amd64_pvt *pvt)
>> {
>> struct amd64_umc *umc;
>> @@ -1165,6 +1179,27 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
>> pvt->dhar, dhar_base(pvt));
>> }
>>
>> +static void __dump_misc_regs_gpu(struct amd64_pvt *pvt)
> The function pointer this gets assigned to is called *get*_misc_regs.
> But this function is is doing *dump*.
>
> When I see __dump_misc_regs_gpu() I'd expect this function to be called
> by a higher level dump_misc_regs() as the "__" denotes layering usually.
>
> There is, in fact, dump_misc_regs() but that one calls
> ->get_misc_regs().
>
> So this all needs fixing - right now I see a mess.
>
> Also, there's <function_name>_gpu and gpu_<function_name> with the "gpu"
> being either prefix or suffix. You need to fix the current ones to be
> only prefixes - in a pre-patch - and then add yours as prefixes only too.
Will modify the names to use prefixes
>
> And in talking about pre-patches - this one is doing a bunch of things
> at once and needs splitting. You do the preparatory work like carving
> out common functionality and other refactoring in pre-patches, and then
> you add the new functionality ontop.
Sure, i will split this.
>
> Also, I don't like ->is_gpu one bit, it being sprinkled like that around
> the code. This says that the per-family attrs and ops assignment is
> lacking.
Yes, adding few more ops routines should help get rid of this if conditions.
>
>
>> @@ -2982,7 +3132,17 @@ static void decode_umc_error(int node_id, struct mce *m)
>>
>> pvt->ops->get_umc_err_info(m, &err);
>>
>> - if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
>> + /*
>> + * GPU node has #phys[X] which has #channels[Y] each.
>> + * On GPUs, df_inst_id = [X] * num_ch_per_phy + [Y].
>> + * On CPUs, "Channel"="UMC Number"="DF Instance ID".
>> + */
> This comment doesn't look like it is destined for human beings to read
> but maybe to be parsed by programs?

The present system supports the following setup

A GPU node includes four Unified Memory Controllers (UMC). Each UMC
contains eight UMCCH instances.

Each UMCCH controls one 128-bit HBM2e (2GB) channel. The UMC interfaces
to the system via the Data Fabric (DF) Coherent Slave.


For a generalized function, i will modify the comment as

"A GPU node has 'X' number of UMC PHYs and 'Y' number of UMCCH instances
each. This results in 'X*Y' number of

instances in the Data Fabric. Therefore the Data Fabric ID of an
instance can be found with the following formula:

df_inst_id = 'X' * number of channels per PHY + 'Y'

>
>> + if (pvt->is_gpu)
>> + df_inst_id = (err.csrow * pvt->channel_count / mci->nr_csrows) + err.channel;
> I'm not sure we want to log ECCs from the GPUs: what is going to happen
> to them, how does the further error recovery look like?

AMD GPU has support for memory error reporting via the SMCA register
banks, functionality is similar to the CPU dies.

The customer/end users of this product want to count memory errors and
they want to do this in the OS using similar

mechanisms as they are used to on the CPU side.

>
> Also, EDAC sysfs structure currently has only DIMMs, below's from my
> box.
>
> I don't see how that structure fits the GPUs so here's the $10^6
> question: why does EDAC need to know about GPUs?

The errors are not specific to GPUs, the errors are originating from
HBM2e memory chips on the GPU.

As a first step, I'm trying to leverage the existing EDAC interfaces to
report the HBM errors.

Page retirement and storing the bad pages info on a persistent storage
can be the next steps.

>
> What is the strategy here?
>
> Your 0th message talks about the "what" but what gets added is not
> important - the "why" is.
Sure, i will update with the why.
>
> $ grep -r . /sys/devices/system/edac/mc/ 2>/dev/null
> /sys/devices/system/edac/mc/power/runtime_active_time:0
> /sys/devices/system/edac/mc/power/runtime_status:unsupported
> /sys/devices/system/edac/mc/power/runtime_suspended_time:0
> /sys/devices/system/edac/mc/power/control:auto
> /sys/devices/system/edac/mc/mc0/ce_count:0
> /sys/devices/system/edac/mc/mc0/rank7/dimm_ue_count:0
> /sys/devices/system/edac/mc/mc0/rank7/dimm_mem_type:Unbuffered-DDR4
> /sys/devices/system/edac/mc/mc0/rank7/power/runtime_active_time:0
> /sys/devices/system/edac/mc/mc0/rank7/power/runtime_status:unsupported
> /sys/devices/system/edac/mc/mc0/rank7/power/runtime_suspended_time:0
> /sys/devices/system/edac/mc/mc0/rank7/power/control:auto
> /sys/devices/system/edac/mc/mc0/rank7/dimm_dev_type:Unknown
> /sys/devices/system/edac/mc/mc0/rank7/size:8192
> /sys/devices/system/edac/mc/mc0/rank7/dimm_ce_count:0
> /sys/devices/system/edac/mc/mc0/rank7/dimm_label:mc#0csrow#3channel#1
> /sys/devices/system/edac/mc/mc0/rank7/dimm_location:csrow 3 channel 1
> /sys/devices/system/edac/mc/mc0/rank7/dimm_edac_mode:SECDED
> /sys/devices/system/edac/mc/mc0/topmem:0x00000000e0000000
> /sys/devices/system/edac/mc/mc0/mc_name:F17h
> /sys/devices/system/edac/mc/mc0/rank5/dimm_ue_count:0
> /sys/devices/system/edac/mc/mc0/rank5/dimm_mem_type:Unbuffered-DDR4
> /sys/devices/system/edac/mc/mc0/rank5/power/runtime_active_time:0
> /sys/devices/system/edac/mc/mc0/rank5/power/runtime_status:unsupported
> /sys/devices/system/edac/mc/mc0/rank5/power/runtime_suspended_time:0
> /sys/devices/system/edac/mc/mc0/rank5/power/control:auto
> /sys/devices/system/edac/mc/mc0/rank5/dimm_dev_type:Unknown
> /sys/devices/system/edac/mc/mc0/rank5/size:8192
> /sys/devices/system/edac/mc/mc0/rank5/dimm_ce_count:0
> /sys/devices/system/edac/mc/mc0/rank5/dimm_label:mc#0csrow#2channel#1
> /sys/devices/system/edac/mc/mc0/rank5/dimm_location:csrow 2 channel 1
> /sys/devices/system/edac/mc/mc0/rank5/dimm_edac_mode:SECDED
> /sys/devices/system/edac/mc/mc0/dram_hole:0 0 0
> /sys/devices/system/edac/mc/mc0/ce_noinfo_count:0
> ...
>
> --
> Regards/Gruss,
> Boris.

Regards,

Naveen

>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeople.kernel.org%2Ftglx%2Fnotes-about-netiquette&amp;data=04%7C01%7CNaveenKrishna.Chatradhi%40amd.com%7Ca6aa66df219641c880d008d9a514e5bf%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637722331481685552%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=fw%2B%2F8rCVRhhZ0xp8XLUW5rvoDUnfNQzwJleHVoPmDkw%3D&amp;reserved=0

2021-11-15 16:06:13

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v6 5/5] EDAC/amd64: Enumerate memory on Aldebaran GPU nodes

On Mon, Nov 15, 2021 at 08:54:55PM +0530, Chatradhi, Naveen Krishna wrote:
> The errors are not specific to GPUs, the errors are originating from HBM2e
> memory chips on the GPU.
>
> As a first step, I'm trying to leverage the existing EDAC interfaces to
> report the HBM errors.

Report them how? How do the HBM chips fit in the EDAC sysfs hierarchy?
Does it even work with the current hierarchy or does EDAC need more
major restructuring?

You can send me an example from sysfs on such a system, privately is
fine too.

> Page retirement and storing the bad pages info on a persistent storage can
> be the next steps.

If you're thinking about plugging this into memory_failure(), then this
has nothing to do with EDAC.

All EDAC can give you is error count numbers in sysfs.

So I'd like to see where this is going first, and whether it is even
worth it adding it to EDAC.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette