Subject: [PATCH 0/7] x86/edac/amd64: Add support for noncpu nodes

On newer heterogeneous systems from AMD with GPU nodes connected via
xGMI links to the CPUs, the GPU dies are interfaced with HBM2 memory.

This patchset applies on top of the following series by Yazen Ghannam
AMD MCA Address Translation Updates
[https://patchwork.kernel.org/project/linux-edac/list/?series=505989]

This patchset does the following:
1. Add support for northbridges on Aldebaran.
 * x86/amd_nb: Add Aldebaran device to PCI IDs
 * x86/amd_nb: Add support for northbridges on Aldebaran
2. Add the HBM2 memory type in EDAC.
 * EDAC/mc: Add new HBM2 memory type
3. Modify the amd64_edac module to:
 a. Handle the UMCs on the noncpu nodes.
 * EDAC/mce_amd: extract node id from InstanceHi in IPID
 b. Enumerate HBM memory on the noncpu nodes.
 * EDAC/amd64: Enumerate memory on noncpu nodes
 c. Add address translation support for Data Fabric version 3.5.
 * EDAC/amd64: Add address translation support for DF3.5
 * EDAC/amd64: Add fixed UMC to CS mapping


Aldebaran has 2 dies (enumerated as MCx, x = 8 ~ 15)
Each die has 4 UMCs (enumerated as csrowx, x = 0 ~ 3)
Each die has 2 root ports, with 4 misc ports per root.
Each UMC manages 8 UMC channels, each connected to 2GB of HBM2 memory.
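
For reference, a back-of-the-envelope sketch of the per-die HBM2
capacity implied by the topology above (the 2GB/channel figure comes
from this cover letter; the program itself is purely illustrative):

	#include <stdio.h>

	int main(void)
	{
		unsigned int umcs_per_die     = 4; /* enumerated as csrows */
		unsigned int channels_per_umc = 8; /* enumerated as ranks  */
		unsigned int gb_per_channel   = 2; /* HBM2 per UMC channel */

		/* 4 * 8 * 2 = 64GB of HBM2 per die */
		printf("HBM2 per die: %u GB\n",
		       umcs_per_die * channels_per_umc * gb_per_channel);
		return 0;
	}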

Muralidhara M K (3):
x86/amd_nb: Add Aldebaran device to PCI IDs
x86/amd_nb: Add support for northbridges on Aldebaran
EDAC/amd64: Add address translation support for DF3.5

Naveen Krishna Chatradhi (3):
EDAC/mc: Add new HBM2 memory type
EDAC/mce_amd: extract node id from InstanceHi in IPID
EDAC/amd64: Enumerate memory on noncpu nodes

Yazen Ghannam (1):
EDAC/amd64: Add fixed UMC to CS mapping

arch/x86/include/asm/amd_nb.h | 6 +
arch/x86/kernel/amd_nb.c | 62 +++-
drivers/edac/amd64_edac.c | 546 +++++++++++++++++++++++++++++-----
drivers/edac/amd64_edac.h | 27 ++
drivers/edac/edac_mc.c | 1 +
drivers/edac/mce_amd.c | 15 +-
include/linux/edac.h | 3 +
include/linux/pci_ids.h | 1 +
8 files changed, 582 insertions(+), 79 deletions(-)

--
2.25.1


Subject: [PATCH 1/7] x86/amd_nb: Add Aldebaran device to PCI IDs

From: Muralidhara M K <[email protected]>

Add Aldebaran device to the PCI ID database. Since this device has a
configurable PCIe endpoint, it could be used with different drivers.

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
---
include/linux/pci_ids.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 4bac1831de80..d9aae90dfce9 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -554,6 +554,7 @@
#define PCI_DEVICE_ID_AMD_17H_M30H_DF_F3 0x1493
#define PCI_DEVICE_ID_AMD_17H_M60H_DF_F3 0x144b
#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F3 0x1443
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3 0x14d3
#define PCI_DEVICE_ID_AMD_19H_DF_F3 0x1653
#define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d
#define PCI_DEVICE_ID_AMD_CNB17H_F3 0x1703
--
2.25.1

Subject: [PATCH 4/7] EDAC/mce_amd: extract node id from InstanceHi in IPID

On AMD systems with SMCA banks on NONCPU nodes, the node id information
is available in the InstanceHI[47:44] of the IPID register.
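
For illustration, a minimal stand-alone sketch of the extraction (the
sample IPID value below is made up):

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t ipid = 0x0000960012345678ULL; /* made-up sample */
		unsigned int node_id = (ipid >> 44) & 0xF; /* bits [47:44] */

		printf("node id: %u\n", node_id); /* prints 9 for this sample */
		return 0;
	}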

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
---
drivers/edac/mce_amd.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 27d56920b469..364dfb6e359d 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1049,6 +1049,7 @@ static void decode_smca_error(struct mce *m)
enum smca_bank_types bank_type;
const char *ip_name;
u8 xec = XEC(m->status, xec_mask);
+ u32 node_id = 0;

if (m->bank >= ARRAY_SIZE(smca_banks))
return;
@@ -1072,8 +1073,18 @@ static void decode_smca_error(struct mce *m)
if (xec < smca_mce_descs[bank_type].num_descs)
pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);

- if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
- decode_dram_ecc(topology_die_id(m->extcpu), m);
+ /*
+ * SMCA_UMC_V2 is used on the noncpu nodes, extract the node id
+ * from the InstanceHI[47:44] of the IPID register.
+ */
+ if (bank_type == SMCA_UMC_V2 && xec == 0)
+ node_id = ((m->ipid >> 44) & 0xF);
+
+ if (bank_type == SMCA_UMC && xec == 0)
+ node_id = topology_die_id(m->extcpu);
+
+ if (decode_dram_ecc)
+ decode_dram_ecc(node_id, m);
}

static inline void amd_decode_err_code(u16 ec)
--
2.25.1

Subject: [PATCH 6/7] EDAC/amd64: Add address translation support for DF3.5

From: Muralidhara M K <[email protected]>

Add support for address translation on Data Fabric version 3.5.

Add new data fabric ops and interleaving modes. Also, adjust how the
DRAM address maps are found early in the translation for certain cases.

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Co-developed-by: Yazen Ghannam <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
---
drivers/edac/amd64_edac.c | 213 +++++++++++++++++++++++++++++++++++++-
1 file changed, 209 insertions(+), 4 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 8fe0a5e3c8f2..a4197061ac2a 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -996,6 +996,7 @@ static int sys_addr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr)
/*
* Glossary of acronyms used in address translation for Zen-based systems
*
+ * CCM = Cache Coherent Master
* COD = Cluster-on-Die
* CS = Coherent Slave
* DF = Data Fabric
@@ -1064,6 +1065,7 @@ static int amd_df_indirect_read(u16 node, struct df_reg reg, u8 instance_id, u32
enum df_reg_names {
/* Function 0 */
FAB_BLK_INST_CNT,
+ FAB_BLK_INST_INFO_0,
FAB_BLK_INST_INFO_3,
DRAM_HOLE_CTL,
DRAM_BASE_ADDR,
@@ -1074,11 +1076,16 @@ enum df_reg_names {
/* Function 1 */
SYS_FAB_ID_MASK,
SYS_FAB_ID_MASK_1,
+ SYSFABIDMASK0_DF3POINT5,
+ SYSFABIDMASK1_DF3POINT5,
+ SYSFABIDMASK2_DF3POINT5,
};

static struct df_reg df_regs[] = {
/* D18F0x40 (FabricBlockInstanceCount) */
[FAB_BLK_INST_CNT] = {0, 0x40},
+ /* D18F0x44 (FabricBlockInstanceInformation0) */
+ [FAB_BLK_INST_INFO_0] = {0, 0x44},
/* D18F0x50 (FabricBlockInstanceInformation3_CS) */
[FAB_BLK_INST_INFO_3] = {0, 0x50},
/* D18F0x104 (DramHoleControl) */
@@ -1095,6 +1102,12 @@ static struct df_reg df_regs[] = {
[SYS_FAB_ID_MASK] = {1, 0x208},
/* D18F1x20C (SystemFabricIdMask1) */
[SYS_FAB_ID_MASK_1] = {1, 0x20C},
+ /* D18F1x150 (SystemFabricIdMask0) */
+ [SYSFABIDMASK0_DF3POINT5] = {1, 0x150},
+ /* D18F1x154 (SystemFabricIdMask1) */
+ [SYSFABIDMASK1_DF3POINT5] = {1, 0x154},
+ /* D18F1x158 (SystemFabricIdMask2) */
+ [SYSFABIDMASK2_DF3POINT5] = {1, 0x158},
};

/* These are mapped 1:1 to the hardware values. Special cases are set at > 0x20. */
@@ -1103,9 +1116,14 @@ enum intlv_modes {
NOHASH_2CH = 0x01,
NOHASH_4CH = 0x03,
NOHASH_8CH = 0x05,
+ NOHASH_16CH = 0x07,
+ NOHASH_32CH = 0x08,
HASH_COD4_2CH = 0x0C,
HASH_COD2_4CH = 0x0D,
HASH_COD1_8CH = 0x0E,
+ HASH_8CH = 0x1C,
+ HASH_16CH = 0x1D,
+ HASH_32CH = 0x1E,
DF2_HASH_2CH = 0x21,
};

@@ -1118,6 +1136,7 @@ struct addr_ctx {
u32 reg_limit_addr;
u32 reg_fab_id_mask0;
u32 reg_fab_id_mask1;
+ u32 reg_fab_id_mask2;
u16 cs_fabric_id;
u16 die_id_mask;
u16 socket_id_mask;
@@ -1447,6 +1466,128 @@ struct data_fabric_ops df3_ops = {
.get_component_id_mask = &get_component_id_mask_df3,
};

+static int dehash_addr_df35(struct addr_ctx *ctx)
+{
+ u8 hashed_bit, intlv_ctl_64k, intlv_ctl_2M, intlv_ctl_1G;
+ u8 num_intlv_bits = ctx->intlv_num_chan;
+ u32 tmp, i;
+
+ if (amd_df_indirect_read(0, df_regs[DF_GLOBAL_CTL], DF_BROADCAST, &tmp))
+ return -EINVAL;
+
+ intlv_ctl_64k = !!((tmp >> 20) & 0x1);
+ intlv_ctl_2M = !!((tmp >> 21) & 0x1);
+ intlv_ctl_1G = !!((tmp >> 22) & 0x1);
+
+ /*
+ * CSSelect[0] = XOR of addr{8, 16, 21, 30};
+ * CSSelect[1] = XOR of addr{9, 17, 22, 31};
+ * CSSelect[2] = XOR of addr{10, 18, 23, 32};
+ * CSSelect[3] = XOR of addr{11, 19, 24, 33}; - 16 and 32 channel only
+ * CSSelect[4] = XOR of addr{12, 20, 25, 34}; - 32 channel only
+ */
+ for (i = 0; i < num_intlv_bits; i++) {
+ hashed_bit = ((ctx->ret_addr >> (8 + i)) ^
+ ((ctx->ret_addr >> (16 + i)) & intlv_ctl_64k) ^
+ ((ctx->ret_addr >> (21 + i)) & intlv_ctl_2M) ^
+ ((ctx->ret_addr >> (30 + i)) & intlv_ctl_1G));
+
+ hashed_bit &= BIT(0);
+ if (hashed_bit != ((ctx->ret_addr >> (8 + i)) & BIT(0)))
+ ctx->ret_addr ^= BIT(8 + i);
+ }
+
+ return 0;
+}
+
+static int get_intlv_mode_df35(struct addr_ctx *ctx)
+{
+ ctx->intlv_mode = (ctx->reg_base_addr >> 2) & 0x1F;
+
+ if (ctx->intlv_mode == HASH_COD4_2CH ||
+ ctx->intlv_mode == HASH_COD2_4CH ||
+ ctx->intlv_mode == HASH_COD1_8CH) {
+ ctx->make_space_for_cs_id = &make_space_for_cs_id_cod_hash;
+ ctx->insert_cs_id = &insert_cs_id_cod_hash;
+ ctx->dehash_addr = &dehash_addr_df3;
+ } else {
+ ctx->make_space_for_cs_id = &make_space_for_cs_id_simple;
+ ctx->insert_cs_id = &insert_cs_id_simple;
+
+ if (ctx->intlv_mode == HASH_8CH ||
+ ctx->intlv_mode == HASH_16CH ||
+ ctx->intlv_mode == HASH_32CH)
+ ctx->dehash_addr = &dehash_addr_df35;
+ }
+
+ return 0;
+}
+
+static void get_intlv_num_dies_df35(struct addr_ctx *ctx)
+{
+ ctx->intlv_num_dies = (ctx->reg_base_addr >> 7) & 0x1;
+}
+
+static u8 get_die_id_shift_df35(struct addr_ctx *ctx)
+{
+ return ctx->node_id_shift;
+}
+
+static u8 get_socket_id_shift_df35(struct addr_ctx *ctx)
+{
+ return (ctx->reg_fab_id_mask1 >> 8) & 0xF;
+}
+
+static int get_masks_df35(struct addr_ctx *ctx)
+{
+ if (amd_df_indirect_read(0, df_regs[SYSFABIDMASK1_DF3POINT5],
+ DF_BROADCAST, &ctx->reg_fab_id_mask1))
+ return -EINVAL;
+
+ if (amd_df_indirect_read(0, df_regs[SYSFABIDMASK2_DF3POINT5],
+ DF_BROADCAST, &ctx->reg_fab_id_mask2))
+ return -EINVAL;
+
+ ctx->node_id_shift = ctx->reg_fab_id_mask1 & 0xF;
+
+ ctx->die_id_mask = ctx->reg_fab_id_mask2 & 0xFFFF;
+
+ ctx->socket_id_mask = (ctx->reg_fab_id_mask2 >> 16) & 0xFFFF;
+
+ return 0;
+}
+
+static u16 get_dst_fabric_id_df35(struct addr_ctx *ctx)
+{
+ return ctx->reg_limit_addr & 0xFFF;
+}
+
+static int get_cs_fabric_id_df35(struct addr_ctx *ctx)
+{
+ ctx->cs_fabric_id = ctx->inst_id | (ctx->nid << ctx->node_id_shift);
+
+ return 0;
+}
+
+static u16 get_component_id_mask_df35(struct addr_ctx *ctx)
+{
+ return ctx->reg_fab_id_mask0 & 0xFFFF;
+}
+
+struct data_fabric_ops df3point5_ops = {
+ .get_hi_addr_offset = &get_hi_addr_offset_df3,
+ .get_intlv_mode = &get_intlv_mode_df35,
+ .get_intlv_addr_sel = &get_intlv_addr_sel_df3,
+ .get_intlv_num_dies = &get_intlv_num_dies_df35,
+ .get_intlv_num_sockets = &get_intlv_num_sockets_df3,
+ .get_masks = &get_masks_df35,
+ .get_die_id_shift = &get_die_id_shift_df35,
+ .get_socket_id_shift = &get_socket_id_shift_df35,
+ .get_dst_fabric_id = &get_dst_fabric_id_df35,
+ .get_cs_fabric_id = &get_cs_fabric_id_df35,
+ .get_component_id_mask = &get_component_id_mask_df35,
+};
+
struct data_fabric_ops *df_ops;

static int set_df_ops(struct addr_ctx *ctx)
@@ -1458,6 +1599,16 @@ static int set_df_ops(struct addr_ctx *ctx)

ctx->num_blk_instances = tmp & 0xFF;

+ if (amd_df_indirect_read(0, df_regs[SYSFABIDMASK0_DF3POINT5],
+ DF_BROADCAST, &ctx->reg_fab_id_mask0))
+ return -EINVAL;
+
+ if ((ctx->reg_fab_id_mask0 & 0xFF) != 0) {
+ ctx->late_hole_remove = true;
+ df_ops = &df3point5_ops;
+ return 0;
+ }
+
if (amd_df_indirect_read(0, df_regs[SYS_FAB_ID_MASK],
DF_BROADCAST, &ctx->reg_fab_id_mask0))
return -EINVAL;
@@ -1558,8 +1709,17 @@ static void get_intlv_num_chan(struct addr_ctx *ctx)
break;
case NOHASH_8CH:
case HASH_COD1_8CH:
+ case HASH_8CH:
ctx->intlv_num_chan = 3;
break;
+ case NOHASH_16CH:
+ case HASH_16CH:
+ ctx->intlv_num_chan = 4;
+ break;
+ case NOHASH_32CH:
+ case HASH_32CH:
+ ctx->intlv_num_chan = 5;
+ break;
default:
/* Valid interleaving modes where checked earlier. */
break;
@@ -1665,6 +1825,43 @@ static int addr_over_limit(struct addr_ctx *ctx)
return 0;
}

+static int find_ccm_instance_id(struct addr_ctx *ctx)
+{
+ u32 temp;
+
+ for (ctx->inst_id = 0; ctx->inst_id < ctx->num_blk_instances; ctx->inst_id++) {
+ if (amd_df_indirect_read(0, df_regs[FAB_BLK_INST_INFO_0], ctx->inst_id, &temp))
+ return -EINVAL;
+
+ if (temp == 0)
+ continue;
+
+ if ((temp & 0xF) == 0)
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+#define DF_NUM_DRAM_MAPS_AVAILABLE 16
+static int find_map_reg_by_dstfabricid(struct addr_ctx *ctx)
+{
+ u16 node_id_mask = (ctx->reg_fab_id_mask0 >> 16) & 0xFFFF;
+ u16 dst_fabric_id;
+
+ for (ctx->map_num = 0; ctx->map_num < DF_NUM_DRAM_MAPS_AVAILABLE ; ctx->map_num++) {
+ if (get_dram_addr_map(ctx))
+ continue;
+
+ dst_fabric_id = df_ops->get_dst_fabric_id(ctx);
+
+ if ((dst_fabric_id & node_id_mask) == (ctx->cs_fabric_id & node_id_mask))
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
static int umc_normaddr_to_sysaddr(u64 *addr, u16 nid, u8 umc)
{
struct addr_ctx ctx;
@@ -1686,11 +1883,19 @@ static int umc_normaddr_to_sysaddr(u64 *addr, u16 nid, u8 umc)
if (df_ops->get_cs_fabric_id(&ctx))
return -EINVAL;

- if (remove_dram_offset(&ctx))
- return -EINVAL;
+ if (ctx.nid >= NONCPU_NODE_INDEX) {
+ if (find_ccm_instance_id(&ctx))
+ return -EINVAL;

- if (get_dram_addr_map(&ctx))
- return -EINVAL;
+ if (find_map_reg_by_dstfabricid(&ctx))
+ return -EINVAL;
+ } else {
+ if (remove_dram_offset(&ctx))
+ return -EINVAL;
+
+ if (get_dram_addr_map(&ctx))
+ return -EINVAL;
+ }

if (df_ops->get_intlv_mode(&ctx))
return -EINVAL;
--
2.25.1
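
As a sanity check of dehash_addr_df35() above, here is a stand-alone
user-space sketch of the same XOR reduction (the address and the
DF_GLOBAL_CTL interleave-control bits are made up):

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t addr = 0x123456789ULL; /* made-up normalized address */
		/* Pretend DF_GLOBAL_CTL bits 20-22 are all set. */
		uint64_t intlv_64k = 1, intlv_2m = 1, intlv_1g = 1;
		int i;

		for (i = 0; i < 3; i++) { /* 8-channel hash: 3 interleave bits */
			uint8_t bit = ((addr >> (8 + i)) ^
				       ((addr >> (16 + i)) & intlv_64k) ^
				       ((addr >> (21 + i)) & intlv_2m) ^
				       ((addr >> (30 + i)) & intlv_1g)) & 1;

			if (bit != ((addr >> (8 + i)) & 1))
				addr ^= 1ULL << (8 + i);
		}

		printf("dehashed address: 0x%llx\n", (unsigned long long)addr);
		return 0;
	}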

Subject: [PATCH 7/7] EDAC/amd64: Add fixed UMC to CS mapping

From: Yazen Ghannam <[email protected]>

Add a fixed UMC to CS mapping for Aldebaran.

Aldebaran has 2 dies, which are enumerated alternately:
* die0s are enumerated as nodes 8, 10, 12 and 14
* die1s are enumerated as nodes 9, 11, 13 and 15

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
---
drivers/edac/amd64_edac.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index a4197061ac2a..3416699fa7f6 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1562,8 +1562,41 @@ static u16 get_dst_fabric_id_df35(struct addr_ctx *ctx)
return ctx->reg_limit_addr & 0xFFF;
}

+/* UMC to CS mapping for Aldebaran die[0]s */
+u8 umc_to_cs_mapping_aldebaran_die0[] = { 28, 20, 24, 16, 12, 4, 8, 0,
+ 6, 30, 2, 26, 22, 14, 18, 10,
+ 19, 11, 15, 7, 3, 27, 31, 23,
+ 9, 1, 5, 29, 25, 17, 21, 13};
+
+/* UMC to CS mapping for Aldebaran die[1]s */
+u8 umc_to_cs_mapping_aldebaran_die1[] = { 19, 11, 15, 7, 3, 27, 31, 23,
+ 9, 1, 5, 29, 25, 17, 21, 13,
+ 28, 20, 24, 16, 12, 4, 8, 0,
+ 6, 30, 2, 26, 22, 14, 18, 10};
+
+int get_umc_to_cs_mapping(struct addr_ctx *ctx)
+{
+ if (ctx->inst_id >= sizeof(umc_to_cs_mapping_aldebaran_die0))
+ return -EINVAL;
+
+ /*
+ * Aldebaran has 2 dies, which are enumerated alternately:
+ * die0s are enumerated as nodes 8, 10, 12 and 14
+ * die1s are enumerated as nodes 9, 11, 13 and 15
+ */
+ if (ctx->nid % 2)
+ ctx->inst_id = umc_to_cs_mapping_aldebaran_die1[ctx->inst_id];
+ else
+ ctx->inst_id = umc_to_cs_mapping_aldebaran_die0[ctx->inst_id];
+
+ return 0;
+}
+
static int get_cs_fabric_id_df35(struct addr_ctx *ctx)
{
+ if (ctx->nid >= NONCPU_NODE_INDEX && get_umc_to_cs_mapping(ctx))
+ return -EINVAL;
+
ctx->cs_fabric_id = ctx->inst_id | (ctx->nid << ctx->node_id_shift);

return 0;
--
2.25.1
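
For example, with the mapping tables above, a hypothetical error on
node 9 (a die1) with hardware instance id 3 resolves as:

	u16 nid = 9;    /* die1: odd node ids */
	u8 inst_id = 3; /* hypothetical hardware instance id */

	inst_id = (nid % 2) ? umc_to_cs_mapping_aldebaran_die1[inst_id] /* = 7 */
			    : umc_to_cs_mapping_aldebaran_die0[inst_id];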

Subject: [PATCH 2/7] x86/amd_nb: Add support for northbridges on Aldebaran

From: Muralidhara M K <[email protected]>

On newer heterogeneous systems from AMD, there is a possibility of
having GPU nodes along with CPU nodes with the MCA banks. The GPU
nodes (noncpu nodes) starts enumerating from northbridge index 8.

Aldebaran GPUs have 2 root ports, with 4 misc port for each root.
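
As a rough sketch of the resulting amd_nb index layout (the node
counts below are hypothetical; only NONCPU_NODE_INDEX = 8 comes from
this patch):

	/* Hypothetical system: 1 CPU node, 2 GPU (Aldebaran) nodes. */
	u16 misc_count = 1, misc_count_noncpu = 2;
	u16 num = misc_count_noncpu ? NONCPU_NODE_INDEX + misc_count_noncpu
				    : misc_count;
	/*
	 * num = 10 entries:
	 *   index 0    -> CPU node 0
	 *   index 1..7 -> allocated but unused place holders
	 *   index 8..9 -> GPU nodes
	 */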

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
---
arch/x86/include/asm/amd_nb.h | 6 ++++
arch/x86/kernel/amd_nb.c | 62 ++++++++++++++++++++++++++++++++---
2 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 00d1a400b7a1..e71581cf00e3 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -79,6 +79,12 @@ struct amd_northbridge_info {

#ifdef CONFIG_AMD_NB

+/*
+ * On Newer heterogeneous systems from AMD with CPU and GPU nodes connected
+ * via xGMI links, the NON CPU Nodes are enumerated from index 8
+ */
+#define NONCPU_NODE_INDEX 8
+
u16 amd_nb_num(void);
bool amd_nb_has_feature(unsigned int feature);
struct amd_northbridge *node_to_amd_nb(int node);
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 5884dfa619ff..489003e850dd 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -26,6 +26,8 @@
#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
#define PCI_DEVICE_ID_AMD_19H_DF_F4 0x1654
#define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT 0x14bb
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4

/* Protect the PCI config register pairs used for SMN. */
static DEFINE_MUTEX(smn_mutex);
@@ -94,6 +96,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = {
{}
};

+static const struct pci_device_id amd_noncpu_root_ids[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
+ {}
+};
+
+static const struct pci_device_id amd_noncpu_nb_misc_ids[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
+ {}
+};
+
+static const struct pci_device_id amd_noncpu_nb_link_ids[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
+ {}
+};
+
const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
{ 0x00, 0x18, 0x20 },
{ 0xff, 0x00, 0x20 },
@@ -182,11 +199,16 @@ int amd_cache_northbridges(void)
const struct pci_device_id *misc_ids = amd_nb_misc_ids;
const struct pci_device_id *link_ids = amd_nb_link_ids;
const struct pci_device_id *root_ids = amd_root_ids;
+
+ const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids;
+ const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids;
+ const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids;
+
struct pci_dev *root, *misc, *link;
struct amd_northbridge *nb;
u16 roots_per_misc = 0;
- u16 misc_count = 0;
- u16 root_count = 0;
+ u16 misc_count = 0, misc_count_noncpu = 0;
+ u16 root_count = 0, root_count_noncpu = 0;
u16 i, j;

if (amd_northbridges.num)
@@ -205,10 +227,16 @@ int amd_cache_northbridges(void)
if (!misc_count)
return -ENODEV;

+ while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL)
+ misc_count_noncpu++;
+
root = NULL;
while ((root = next_northbridge(root, root_ids)) != NULL)
root_count++;

+ while ((root = next_northbridge(root, noncpu_root_ids)) != NULL)
+ root_count_noncpu++;
+
if (root_count) {
roots_per_misc = root_count / misc_count;

@@ -222,15 +250,27 @@ int amd_cache_northbridges(void)
}
}

- nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
+ /*
+ * The valid amd_northbridges are in between (0 ~ misc_count) and
+ * (NONCPU_NODE_INDEX ~ NONCPU_NODE_INDEX + misc_count_noncpu)
+ */
+ if (misc_count_noncpu)
+ /*
+ * There are NONCPU Nodes with pci root ports starting at index 8
+ * allocate few extra cells for simplicity in handling the indexes
+ */
+ amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu;
+ else
+ amd_northbridges.num = misc_count;
+
+ nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
if (!nb)
return -ENOMEM;

amd_northbridges.nb = nb;
- amd_northbridges.num = misc_count;

link = misc = root = NULL;
- for (i = 0; i < amd_northbridges.num; i++) {
+ for (i = 0; i < misc_count; i++) {
node_to_amd_nb(i)->root = root =
next_northbridge(root, root_ids);
node_to_amd_nb(i)->misc = misc =
@@ -251,6 +291,18 @@ int amd_cache_northbridges(void)
root = next_northbridge(root, root_ids);
}

+ link = misc = root = NULL;
+ if (misc_count_noncpu) {
+ for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) {
+ node_to_amd_nb(i)->root = root =
+ next_northbridge(root, noncpu_root_ids);
+ node_to_amd_nb(i)->misc = misc =
+ next_northbridge(misc, noncpu_misc_ids);
+ node_to_amd_nb(i)->link = link =
+ next_northbridge(link, noncpu_link_ids);
+ }
+ }
+
if (amd_gart_present())
amd_northbridges.flags |= AMD_NB_GART;

--
2.25.1

Subject: [PATCH 3/7] EDAC/mc: Add new HBM2 memory type

Add a new entry to 'enum mem_type' and a new string to 'edac_mem_types[]'
for the new HBM2 (High Bandwidth Memory Gen 2) memory type.
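
A minimal usage sketch for a driver adopting the new type (the
fragment below is hypothetical; it only mirrors how existing mem_type
values are consumed):

	/* Hypothetical driver fragment reporting HBM2 memory. */
	mci->mtype_cap = MEM_FLAG_HBM2;
	dimm->mtype = MEM_HBM2; /* shown in sysfs as "High-bandwidth-memory-Gen2" */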

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
---
drivers/edac/edac_mc.c | 1 +
include/linux/edac.h | 3 +++
2 files changed, 4 insertions(+)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index f6d462d0be2d..2c5975674723 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -166,6 +166,7 @@ const char * const edac_mem_types[] = {
[MEM_DDR5] = "Unbuffered-DDR5",
[MEM_NVDIMM] = "Non-volatile-RAM",
[MEM_WIO2] = "Wide-IO-2",
+ [MEM_HBM2] = "High-bandwidth-memory-Gen2",
};
EXPORT_SYMBOL_GPL(edac_mem_types);

diff --git a/include/linux/edac.h b/include/linux/edac.h
index 76d3562d3006..4207d06996a4 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -184,6 +184,7 @@ static inline char *mc_event_error_type(const unsigned int err_type)
* @MEM_DDR5: Unbuffered DDR5 RAM
* @MEM_NVDIMM: Non-volatile RAM
* @MEM_WIO2: Wide I/O 2.
+ * @MEM_HBM2: High bandwidth Memory Gen 2.
*/
enum mem_type {
MEM_EMPTY = 0,
@@ -212,6 +213,7 @@ enum mem_type {
MEM_DDR5,
MEM_NVDIMM,
MEM_WIO2,
+ MEM_HBM2,
};

#define MEM_FLAG_EMPTY BIT(MEM_EMPTY)
@@ -239,6 +241,7 @@ enum mem_type {
#define MEM_FLAG_DDR5 BIT(MEM_DDR5)
#define MEM_FLAG_NVDIMM BIT(MEM_NVDIMM)
#define MEM_FLAG_WIO2 BIT(MEM_WIO2)
+#define MEM_FLAG_HBM2 BIT(MEM_HBM2)

/**
* enum edac_type - Error Detection and Correction capabilities and mode
--
2.25.1

Subject: [PATCH 5/7] EDAC/amd64: Enumerate memory on noncpu nodes

On newer heterogeneous systems from AMD with GPU nodes connected via
xGMI links to the CPUs, the GPU dies are interfaced with HBM2 memory.

This patch modifies the amd64_edac module to handle the HBM memory
enumeration leveraging the existing edac and the amd64 specific data
structures.

The UMC Phys on GPU nodes are enumerated as csrows
The UMC channels connected to HBMs are enumerated as ranks
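
In other words, for a GPU node the EDAC layers end up dimensioned as
in the sketch below (based on this patch's populate_layers(); the
sizes are Aldebaran's 4 UMCs x 8 channels):

	/* GPU node: csrow layer = UMC PHYs, channel layer = HBM channels. */
	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
	layers[0].size = 4;	/* fam_type->max_mcs: UMC PHYs as csrows */
	layers[0].is_virt_csrow = true;
	layers[1].type = EDAC_MC_LAYER_CHANNEL;
	layers[1].size = 8;	/* pvt->csels[0].b_cnt: HBM channels as ranks */
	layers[1].is_virt_csrow = false;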

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
---
drivers/edac/amd64_edac.c | 300 +++++++++++++++++++++++++++++---------
drivers/edac/amd64_edac.h | 27 ++++
2 files changed, 259 insertions(+), 68 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 25c6362e414b..8fe0a5e3c8f2 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1741,6 +1741,9 @@ static unsigned long determine_edac_cap(struct amd64_pvt *pvt)

if (umc_en_mask == dimm_ecc_en_mask)
edac_cap = EDAC_FLAG_SECDED;
+
+ if (pvt->is_noncpu)
+ edac_cap = EDAC_FLAG_EC;
} else {
bit = (pvt->fam > 0xf || pvt->ext_model >= K8_REV_F)
? 19
@@ -1799,6 +1802,9 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
{
int cs_mode = 0;

+ if (pvt->is_noncpu)
+ return CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+
if (csrow_enabled(2 * dimm, ctrl, pvt))
cs_mode |= CS_EVEN_PRIMARY;

@@ -1818,6 +1824,15 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)

edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);

+ if (pvt->is_noncpu) {
+ cs_mode = f17_get_cs_mode(cs0, ctrl, pvt);
+ for_each_chip_select(cs0, ctrl, pvt) {
+ size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
+ amd64_info(EDAC_MC ": %d: %5dMB\n", cs0, size0);
+ }
+ return;
+ }
+
for (dimm = 0; dimm < 2; dimm++) {
cs0 = dimm * 2;
cs1 = dimm * 2 + 1;
@@ -1833,43 +1848,53 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
}
}

-static void __dump_misc_regs_df(struct amd64_pvt *pvt)
+static void dump_umcch_regs(struct amd64_pvt *pvt, int i)
{
- struct amd64_umc *umc;
- u32 i, tmp, umc_base;
-
- for_each_umc(i) {
- umc_base = get_umc_base(i);
- umc = &pvt->umc[i];
+ struct amd64_umc *umc = &pvt->umc[i];
+ u32 tmp, umc_base;

- edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
+ if (pvt->is_noncpu) {
edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
+ edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i);
+ return;
+ }

- amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ECC_BAD_SYMBOL, &tmp);
- edac_dbg(1, "UMC%d ECC bad symbol: 0x%x\n", i, tmp);
-
- amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_UMC_CAP, &tmp);
- edac_dbg(1, "UMC%d UMC cap: 0x%x\n", i, tmp);
- edac_dbg(1, "UMC%d UMC cap high: 0x%x\n", i, umc->umc_cap_hi);
-
- edac_dbg(1, "UMC%d ECC capable: %s, ChipKill ECC capable: %s\n",
- i, (umc->umc_cap_hi & BIT(30)) ? "yes" : "no",
- (umc->umc_cap_hi & BIT(31)) ? "yes" : "no");
- edac_dbg(1, "UMC%d All DIMMs support ECC: %s\n",
- i, (umc->umc_cfg & BIT(12)) ? "yes" : "no");
- edac_dbg(1, "UMC%d x4 DIMMs present: %s\n",
- i, (umc->dimm_cfg & BIT(6)) ? "yes" : "no");
- edac_dbg(1, "UMC%d x16 DIMMs present: %s\n",
- i, (umc->dimm_cfg & BIT(7)) ? "yes" : "no");
-
- if (pvt->dram_type == MEM_LRDDR4) {
- amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ADDR_CFG, &tmp);
- edac_dbg(1, "UMC%d LRDIMM %dx rank multiply\n",
- i, 1 << ((tmp >> 4) & 0x3));
- }
+ umc_base = get_umc_base(i);
+
+ edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
+
+ amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ECC_BAD_SYMBOL, &tmp);
+ edac_dbg(1, "UMC%d ECC bad symbol: 0x%x\n", i, tmp);
+
+ amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_UMC_CAP, &tmp);
+ edac_dbg(1, "UMC%d UMC cap: 0x%x\n", i, tmp);
+ edac_dbg(1, "UMC%d UMC cap high: 0x%x\n", i, umc->umc_cap_hi);

+ edac_dbg(1, "UMC%d ECC capable: %s, ChipKill ECC capable: %s\n",
+ i, (umc->umc_cap_hi & BIT(30)) ? "yes" : "no",
+ (umc->umc_cap_hi & BIT(31)) ? "yes" : "no");
+ edac_dbg(1, "UMC%d All DIMMs support ECC: %s\n",
+ i, (umc->umc_cfg & BIT(12)) ? "yes" : "no");
+ edac_dbg(1, "UMC%d x4 DIMMs present: %s\n",
+ i, (umc->dimm_cfg & BIT(6)) ? "yes" : "no");
+ edac_dbg(1, "UMC%d x16 DIMMs present: %s\n",
+ i, (umc->dimm_cfg & BIT(7)) ? "yes" : "no");
+
+ if (pvt->dram_type == MEM_LRDDR4) {
+ amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ADDR_CFG, &tmp);
+ edac_dbg(1, "UMC%d LRDIMM %dx rank multiply\n",
+ i, 1 << ((tmp >> 4) & 0x3));
+ }
+}
+
+static void __dump_misc_regs_df(struct amd64_pvt *pvt)
+{
+ int i;
+
+ for_each_umc(i) {
+ dump_umcch_regs(pvt, i);
debug_display_dimm_sizes_df(pvt, i);
}

@@ -1937,10 +1962,14 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
} else if (pvt->fam >= 0x17) {
int umc;
-
for_each_umc(umc) {
- pvt->csels[umc].b_cnt = 4;
- pvt->csels[umc].m_cnt = 2;
+ if (pvt->is_noncpu) {
+ pvt->csels[umc].b_cnt = 8;
+ pvt->csels[umc].m_cnt = 8;
+ } else {
+ pvt->csels[umc].b_cnt = 4;
+ pvt->csels[umc].m_cnt = 2;
+ }
}

} else {
@@ -1949,6 +1978,31 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
}
}

+static void read_noncpu_umc_base_mask(struct amd64_pvt *pvt)
+{
+ u32 base_reg, mask_reg;
+ u32 *base, *mask;
+ int umc, cs;
+
+ for_each_umc(umc) {
+ for_each_chip_select(cs, umc, pvt) {
+ base_reg = get_noncpu_umc_base(umc, cs) + UMCCH_BASE_ADDR;
+ base = &pvt->csels[umc].csbases[cs];
+
+ if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
+ edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
+ umc, cs, *base, base_reg);
+
+ mask_reg = get_noncpu_umc_base(umc, cs) + UMCCH_ADDR_MASK;
+ mask = &pvt->csels[umc].csmasks[cs];
+
+ if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
+ edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
+ umc, cs, *mask, mask_reg);
+ }
+ }
+}
+
static void read_umc_base_mask(struct amd64_pvt *pvt)
{
u32 umc_base_reg, umc_base_reg_sec;
@@ -2009,8 +2063,12 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)

prep_chip_selects(pvt);

- if (pvt->umc)
- return read_umc_base_mask(pvt);
+ if (pvt->umc) {
+ if (pvt->is_noncpu)
+ return read_noncpu_umc_base_mask(pvt);
+ else
+ return read_umc_base_mask(pvt);
+ }

for_each_chip_select(cs, 0, pvt) {
int reg0 = DCSB0 + (cs * 4);
@@ -2056,6 +2114,10 @@ static void determine_memory_type(struct amd64_pvt *pvt)
u32 dram_ctrl, dcsm;

if (pvt->umc) {
+ if (pvt->is_noncpu) {
+ pvt->dram_type = MEM_HBM2;
+ return;
+ }
if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
pvt->dram_type = MEM_LRDDR4;
else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
@@ -2445,7 +2507,10 @@ static int f17_early_channel_count(struct amd64_pvt *pvt)

/* SDP Control bit 31 (SdpInit) is clear for unused UMC channels */
for_each_umc(i)
- channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
+ if (pvt->is_noncpu)
+ channels += pvt->csels[i].b_cnt;
+ else
+ channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);

amd64_info("MCT channel count: %d\n", channels);

@@ -2586,6 +2651,12 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
u32 msb, weight, num_zero_bits;
int dimm, size = 0;

+ if (pvt->is_noncpu) {
+ addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
+ /* The memory channels in case of GPUs are fully populated */
+ goto skip_noncpu;
+ }
+
/* No Chip Selects are enabled. */
if (!cs_mode)
return size;
@@ -2611,6 +2682,7 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
else
addr_mask_orig = pvt->csels[umc].csmasks[dimm];

+ skip_noncpu:
/*
* The number of zero bits in the mask is equal to the number of bits
* in a full mask minus the number of bits in the current mask.
@@ -3356,6 +3428,16 @@ static struct amd64_family_type family_types[] = {
.dbam_to_cs = f17_addr_mask_to_cs_size,
}
},
+ [ALDEBARAN_GPUS] = {
+ .ctl_name = "ALDEBARAN",
+ .f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0,
+ .f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6,
+ .max_mcs = 4,
+ .ops = {
+ .early_channel_count = f17_early_channel_count,
+ .dbam_to_cs = f17_addr_mask_to_cs_size,
+ }
+ },
};

/*
@@ -3611,6 +3693,19 @@ static int find_umc_channel(struct mce *m)
return (m->ipid & GENMASK(31, 0)) >> 20;
}

+/*
+ * On the noncpu nodes, the HBM channel within a UMC is determined
+ * from bits [15:12] of the MCA_IPID register, as follows.
+ */
+static int find_umc_channel_noncpu(struct mce *m)
+{
+ u8 umc, ch;
+
+ umc = find_umc_channel(m);
+ ch = ((m->ipid >> 12) & 0xf);
+ return umc % 2 ? (ch + 4) : ch;
+}
+
static void decode_umc_error(int node_id, struct mce *m)
{
u8 ecc_type = (m->status >> 45) & 0x3;
@@ -3618,6 +3713,7 @@ static void decode_umc_error(int node_id, struct mce *m)
struct amd64_pvt *pvt;
struct err_info err;
u64 sys_addr = m->addr;
+ u8 umc_num;

mci = edac_mc_find(node_id);
if (!mci)
@@ -3630,7 +3726,16 @@ static void decode_umc_error(int node_id, struct mce *m)
if (m->status & MCI_STATUS_DEFERRED)
ecc_type = 3;

- err.channel = find_umc_channel(m);
+ if (pvt->is_noncpu) {
+ /* The UMC channel is reported as the csrow on the noncpu nodes */
+ err.csrow = find_umc_channel(m) / 2;
+ err.channel = find_umc_channel_noncpu(m);
+ umc_num = err.csrow * 8 + err.channel;
+ } else {
+ err.channel = find_umc_channel(m);
+ err.csrow = m->synd & 0x7;
+ umc_num = err.channel;
+ }

if (!(m->status & MCI_STATUS_SYNDV)) {
err.err_code = ERR_SYND;
@@ -3646,9 +3751,7 @@ static void decode_umc_error(int node_id, struct mce *m)
err.err_code = ERR_CHANNEL;
}

- err.csrow = m->synd & 0x7;
-
- if (umc_normaddr_to_sysaddr(&sys_addr, pvt->mc_node_id, err.channel)) {
+ if (umc_normaddr_to_sysaddr(&sys_addr, pvt->mc_node_id, umc_num)) {
err.err_code = ERR_NORM_ADDR;
goto log_error;
}
@@ -3775,15 +3878,20 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)

/* Read registers from each UMC */
for_each_umc(i) {
+ if (pvt->is_noncpu)
+ umc_base = get_noncpu_umc_base(i, 0);
+ else
+ umc_base = get_umc_base(i);

- umc_base = get_umc_base(i);
umc = &pvt->umc[i];
-
- amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
- amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
+
+ if (!pvt->is_noncpu) {
+ amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
+ amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
+ }
}
}

@@ -3865,7 +3973,9 @@ static void read_mc_regs(struct amd64_pvt *pvt)
determine_memory_type(pvt);
edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);

- determine_ecc_sym_sz(pvt);
+ /* ECC symbol size is not available on NONCPU nodes */
+ if (!pvt->is_noncpu)
+ determine_ecc_sym_sz(pvt);
}

/*
@@ -3953,15 +4063,21 @@ static int init_csrows_df(struct mem_ctl_info *mci)
continue;

empty = 0;
- dimm = mci->csrows[cs]->channels[umc]->dimm;
+ if (pvt->is_noncpu) {
+ dimm = mci->csrows[umc]->channels[cs]->dimm;
+ dimm->edac_mode = EDAC_SECDED;
+ dimm->dtype = DEV_X16;
+ } else {
+ dimm = mci->csrows[cs]->channels[umc]->dimm;
+ dimm->edac_mode = edac_mode;
+ dimm->dtype = dev_type;
+ }

edac_dbg(1, "MC node: %d, csrow: %d\n",
pvt->mc_node_id, cs);

dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
dimm->mtype = pvt->dram_type;
- dimm->edac_mode = edac_mode;
- dimm->dtype = dev_type;
dimm->grain = 64;
}
}
@@ -4226,7 +4342,9 @@ static bool ecc_enabled(struct amd64_pvt *pvt)

umc_en_mask |= BIT(i);

- if (umc->umc_cap_hi & UMC_ECC_ENABLED)
+ /* ECC is enabled by default on NONCPU nodes */
+ if (pvt->is_noncpu ||
+ (umc->umc_cap_hi & UMC_ECC_ENABLED))
ecc_en_mask |= BIT(i);
}

@@ -4262,6 +4380,11 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
{
u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;

+ if (pvt->is_noncpu) {
+ mci->edac_ctl_cap |= EDAC_SECDED;
+ return;
+ }
+
for_each_umc(i) {
if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
@@ -4292,7 +4415,11 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
{
struct amd64_pvt *pvt = mci->pvt_info;

- mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
+ if (pvt->is_noncpu)
+ mci->mtype_cap = MEM_FLAG_HBM2;
+ else
+ mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
+
mci->edac_ctl_cap = EDAC_FLAG_NONE;

if (pvt->umc) {
@@ -4397,11 +4524,25 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
fam_type = &family_types[F17_M70H_CPUS];
pvt->ops = &family_types[F17_M70H_CPUS].ops;
fam_type->ctl_name = "F19h_M20h";
- break;
+ } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
+ if (pvt->is_noncpu) {
+ int tmp = 0;
+
+ fam_type = &family_types[ALDEBARAN_GPUS];
+ pvt->ops = &family_types[ALDEBARAN_GPUS].ops;
+ tmp = pvt->mc_node_id - NONCPU_NODE_INDEX;
+ sprintf(pvt->buf, "Aldebaran#%ddie#%d", tmp / 2, tmp % 2);
+ fam_type->ctl_name = pvt->buf;
+ } else {
+ fam_type = &family_types[F19_CPUS];
+ pvt->ops = &family_types[F19_CPUS].ops;
+ fam_type->ctl_name = "F19h_M30h";
+ }
+ } else {
+ fam_type = &family_types[F19_CPUS];
+ pvt->ops = &family_types[F19_CPUS].ops;
+ family_types[F19_CPUS].ctl_name = "F19h";
}
- fam_type = &family_types[F19_CPUS];
- pvt->ops = &family_types[F19_CPUS].ops;
- family_types[F19_CPUS].ctl_name = "F19h";
break;

default:
@@ -4454,6 +4595,30 @@ static void hw_info_put(struct amd64_pvt *pvt)
kfree(pvt->umc);
}

+static void populate_layers(struct amd64_pvt *pvt, struct edac_mc_layer *layers)
+{
+ if (pvt->is_noncpu) {
+ layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+ layers[0].size = fam_type->max_mcs;
+ layers[0].is_virt_csrow = true;
+ layers[1].type = EDAC_MC_LAYER_CHANNEL;
+ layers[1].size = pvt->csels[0].b_cnt;
+ layers[1].is_virt_csrow = false;
+ } else {
+ layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+ layers[0].size = pvt->csels[0].b_cnt;
+ layers[0].is_virt_csrow = true;
+ layers[1].type = EDAC_MC_LAYER_CHANNEL;
+ /*
+ * Always allocate two channels since we can have setups with
+ * DIMMs on only one channel. Also, this simplifies handling
+ * later for the price of a couple of KBs tops.
+ */
+ layers[1].size = fam_type->max_mcs;
+ layers[1].is_virt_csrow = false;
+ }
+}
+
static int init_one_instance(struct amd64_pvt *pvt)
{
struct mem_ctl_info *mci = NULL;
@@ -4469,19 +4634,8 @@ static int init_one_instance(struct amd64_pvt *pvt)
if (pvt->channel_count < 0)
return ret;

- ret = -ENOMEM;
- layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
- layers[0].size = pvt->csels[0].b_cnt;
- layers[0].is_virt_csrow = true;
- layers[1].type = EDAC_MC_LAYER_CHANNEL;
-
- /*
- * Always allocate two channels since we can have setups with DIMMs on
- * only one channel. Also, this simplifies handling later for the price
- * of a couple of KBs tops.
- */
- layers[1].size = fam_type->max_mcs;
- layers[1].is_virt_csrow = false;
+ /* Define layers for CPU and NONCPU nodes */
+ populate_layers(pvt, layers);

mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
if (!mci)
@@ -4525,6 +4679,9 @@ static int probe_one_instance(unsigned int nid)
struct ecc_settings *s;
int ret;

+ if (!F3)
+ return -EINVAL;
+
ret = -ENOMEM;
s = kzalloc(sizeof(struct ecc_settings), GFP_KERNEL);
if (!s)
@@ -4536,6 +4693,9 @@ static int probe_one_instance(unsigned int nid)
if (!pvt)
goto err_settings;

+ if (nid >= NONCPU_NODE_INDEX)
+ pvt->is_noncpu = true;
+
pvt->mc_node_id = nid;
pvt->F3 = F3;

@@ -4609,6 +4769,10 @@ static void remove_one_instance(unsigned int nid)
struct mem_ctl_info *mci;
struct amd64_pvt *pvt;

+ /* Nothing to remove for the space holder entries */
+ if (!F3)
+ return;
+
/* Remove from EDAC CORE tracking list */
mci = edac_mc_del_mc(&F3->dev);
if (!mci)
@@ -4682,7 +4846,7 @@ static int __init amd64_edac_init(void)

for (i = 0; i < amd_nb_num(); i++) {
err = probe_one_instance(i);
- if (err) {
+ if (err && (err != -EINVAL)) {
/* unwind properly */
while (--i >= 0)
remove_one_instance(i);
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 85aa820bc165..6d5f7b3afc83 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -126,6 +126,8 @@
#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
#define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650
#define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14D0
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14D6

/*
* Function 1 - Address Map
@@ -298,6 +300,7 @@ enum amd_families {
F17_M60H_CPUS,
F17_M70H_CPUS,
F19_CPUS,
+ ALDEBARAN_GPUS,
NUM_FAMILIES,
};

@@ -389,6 +392,9 @@ struct amd64_pvt {
enum mem_type dram_type;

struct amd64_umc *umc; /* UMC registers */
+ char buf[20];
+
+ u8 is_noncpu;
};

enum err_codes {
@@ -410,6 +416,27 @@ struct err_info {
u32 offset;
};

+static inline u32 get_noncpu_umc_base(u8 umc, u8 channel)
+{
+ /*
+ * On the NONCPU nodes, base address is calculated based on
+ * UMC channel and the HBM channel.
+ *
+ * UMC channels are selected in 6th nibble
+ * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000;
+ *
+ * HBM channels are selected in 3rd nibble
+ * HBM chX[3:0]= [Y ]5X[3:0]000;
+ * HBM chX[7:4]= [Y+1]5X[3:0]000
+ */
+ umc *= 2;
+
+ if (channel / 4)
+ umc++;
+
+ return 0x50000 + (umc << 20) + ((channel % 4) << 12);
+}
+
static inline u32 get_umc_base(u8 channel)
{
/* chY: 0xY50000 */
--
2.25.1
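
A worked example of get_noncpu_umc_base() above, with arbitrarily
chosen inputs:

	/*
	 * umc = 2, channel = 5:
	 *   umc *= 2         -> umc = 4
	 *   channel / 4 == 1 -> umc = 5
	 *   base = 0x50000 + (5 << 20) + ((5 % 4) << 12) = 0x551000
	 */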

2021-07-19 20:28:38

by Yazen Ghannam

Subject: Re: [PATCH 1/7] x86/amd_nb: Add Aldebaran device to PCI IDs

On Wed, Jun 30, 2021 at 08:58:22PM +0530, Naveen Krishna Chatradhi wrote:
> From: Muralidhara M K <[email protected]>
>
> Add Aldebaran device to the PCI ID database. Since this device has a
> configurable PCIe endpoint, it could be used with different drivers.
>
> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> ---
> include/linux/pci_ids.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index 4bac1831de80..d9aae90dfce9 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -554,6 +554,7 @@
> #define PCI_DEVICE_ID_AMD_17H_M30H_DF_F3 0x1493
> #define PCI_DEVICE_ID_AMD_17H_M60H_DF_F3 0x144b
> #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F3 0x1443
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3 0x14d3
> #define PCI_DEVICE_ID_AMD_19H_DF_F3 0x1653
> #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d
> #define PCI_DEVICE_ID_AMD_CNB17H_F3 0x1703
> --

The PCI ID looks right.

But I think this patch can be part of the next patch where this value is
first used.

Thanks,
Yazen

2021-07-19 20:56:46

by Yazen Ghannam

Subject: Re: [PATCH 2/7] x86/amd_nb: Add support for northbridges on Aldebaran

On Wed, Jun 30, 2021 at 08:58:23PM +0530, Naveen Krishna Chatradhi wrote:
> From: Muralidhara M K <[email protected]>
>
> On newer heterogeneous systems from AMD, there is a possibility of
> having GPU nodes along with CPU nodes with the MCA banks. The GPU
> nodes (noncpu nodes) starts enumerating from northbridge index 8.
>

"there is a possibility of having GPU nodes along with CPU nodes with
the MCA banks" doesn't read clearly to me. It could be more explicit.
For example, "On newer systems...the CPUs manages MCA errors reported
from the GPUs. Enumerate the GPU nodes with the AMD NB framework to
support EDAC, etc." or something like this.

Also, "northbridge index" isn't a hardware thing rather it's an internal
Linux value. I think you are referring to the "AMD Node ID" value from
CPUID. The GPUs don't have CPUID, so the "AMD Node ID" value can't be
directly read like for CPUs. But the current hardware implementation is
such that the GPU nodes are enumerated in sequential order based on the
PCI hierarchy, and the first GPU node is assumed to have an "AMD Node
ID" value of 8 (the second GPU node has 9, etc.). With this
implemenation detail, the Data Fabric on the GPU nodes can be accessed
the same way as the Data Fabric on CPU nodes.

> Aldebaran GPUs have 2 root ports, with 4 misc port for each root.
>

I don't fully understand this sentence. There are 2 "Nodes"/Data Fabrics
per GPU package, but what do "4 misc port for each root" mean? In any
case, is this relevant to this patch?

Also, there should be an imperative in the commit message, i.e. "Add
...".

> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> ---
> arch/x86/include/asm/amd_nb.h | 6 ++++
> arch/x86/kernel/amd_nb.c | 62 ++++++++++++++++++++++++++++++++---
> 2 files changed, 63 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
> index 00d1a400b7a1..e71581cf00e3 100644
> --- a/arch/x86/include/asm/amd_nb.h
> +++ b/arch/x86/include/asm/amd_nb.h
> @@ -79,6 +79,12 @@ struct amd_northbridge_info {
>
> #ifdef CONFIG_AMD_NB
>
> +/*
> + * On Newer heterogeneous systems from AMD with CPU and GPU nodes connected
> + * via xGMI links, the NON CPU Nodes are enumerated from index 8
> + */
> +#define NONCPU_NODE_INDEX 8

"Newer" doesn't need to be capatilized. And there should be a period at
the end of the sentence.

I don't think "xGMI links" would mean much to most folks. I think the
implication here is that the CPUs and GPUs are connected directly
together (or rather their Data Fabrics are connected) like is done with
2 socket CPU systems and also within a socket for Multi-chip Module
(MCM) CPUs like Naples.

> +
> u16 amd_nb_num(void);
> bool amd_nb_has_feature(unsigned int feature);
> struct amd_northbridge *node_to_amd_nb(int node);
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index 5884dfa619ff..489003e850dd 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -26,6 +26,8 @@
> #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
> #define PCI_DEVICE_ID_AMD_19H_DF_F4 0x1654
> #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT 0x14bb
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4
>

These PCI IDs look correct.

> /* Protect the PCI config register pairs used for SMN. */
> static DEFINE_MUTEX(smn_mutex);
> @@ -94,6 +96,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = {
> {}
> };
>
> +static const struct pci_device_id amd_noncpu_root_ids[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
> + {}
> +};
> +
> +static const struct pci_device_id amd_noncpu_nb_misc_ids[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
> + {}
> +};
> +
> +static const struct pci_device_id amd_noncpu_nb_link_ids[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
> + {}
> +};
> +

I think separating the CPU and non-CPU IDs is a good idea.

> const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
> { 0x00, 0x18, 0x20 },
> { 0xff, 0x00, 0x20 },
> @@ -182,11 +199,16 @@ int amd_cache_northbridges(void)
> const struct pci_device_id *misc_ids = amd_nb_misc_ids;
> const struct pci_device_id *link_ids = amd_nb_link_ids;
> const struct pci_device_id *root_ids = amd_root_ids;
> +
> + const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids;
> + const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids;
> + const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids;
> +
> struct pci_dev *root, *misc, *link;
> struct amd_northbridge *nb;
> u16 roots_per_misc = 0;
> - u16 misc_count = 0;
> - u16 root_count = 0;
> + u16 misc_count = 0, misc_count_noncpu = 0;
> + u16 root_count = 0, root_count_noncpu = 0;
> u16 i, j;
>
> if (amd_northbridges.num)
> @@ -205,10 +227,16 @@ int amd_cache_northbridges(void)
> if (!misc_count)
> return -ENODEV;
>
> + while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL)
> + misc_count_noncpu++;
> +
> root = NULL;
> while ((root = next_northbridge(root, root_ids)) != NULL)
> root_count++;
>
> + while ((root = next_northbridge(root, noncpu_root_ids)) != NULL)
> + root_count_noncpu++;
> +
> if (root_count) {
> roots_per_misc = root_count / misc_count;
>
> @@ -222,15 +250,27 @@ int amd_cache_northbridges(void)
> }
> }
>
> - nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
> + /*
> + * The valid amd_northbridges are in between (0 ~ misc_count) and
> + * (NONCPU_NODE_INDEX ~ NONCPU_NODE_INDEX + misc_count_noncpu)
> + */

This comment isn't clear to me. Is it even necessary?

> + if (misc_count_noncpu)
> + /*
> + * There are NONCPU Nodes with pci root ports starting at index 8
> + * allocate few extra cells for simplicity in handling the indexes
> + */

I think this comment can be more explicit. The first non-CPU Node ID
starts at 8 even if there are fewer than 8 CPU nodes. To maintain the
AMD Node ID to Linux amd_nb indexing scheme, allocate the number of GPU
nodes plus 8. Some allocated amd_northbridge structures will go unused
when the number of CPU nodes is less than 8, but this tradeoff is to
keep things relatively simple.

> + amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu;
> + else
> + amd_northbridges.num = misc_count;

The if-else statements should have {}s even though there's only a single
line of code in each. This is just to make it easier to read multiple
lines. Or the second code comment can be merged with the first outside
the if-else.

> +
> + nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
> if (!nb)
> return -ENOMEM;
>
> amd_northbridges.nb = nb;
> - amd_northbridges.num = misc_count;
>
> link = misc = root = NULL;
> - for (i = 0; i < amd_northbridges.num; i++) {
> + for (i = 0; i < misc_count; i++) {
> node_to_amd_nb(i)->root = root =
> next_northbridge(root, root_ids);
> node_to_amd_nb(i)->misc = misc =
> @@ -251,6 +291,18 @@ int amd_cache_northbridges(void)
> root = next_northbridge(root, root_ids);
> }
>
> + link = misc = root = NULL;

This line can go inside the if statement below.

I'm not sure it's totally necessary since the GPU devices should be
listed after the CPU devices. But I guess better safe than sorry in case
that implementation detail doesn't hold in the future. If you keep it,
then I think you should do the same above when finding the counts.

> + if (misc_count_noncpu) {
> + for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) {
> + node_to_amd_nb(i)->root = root =
> + next_northbridge(root, noncpu_root_ids);
> + node_to_amd_nb(i)->misc = misc =
> + next_northbridge(misc, noncpu_misc_ids);
> + node_to_amd_nb(i)->link = link =
> + next_northbridge(link, noncpu_link_ids);
> + }
> + }
> +
> if (amd_gart_present())
> amd_northbridges.flags |= AMD_NB_GART;
>
> --

Thanks,
Yazen

2021-07-19 21:44:52

by Yazen Ghannam

Subject: Re: [PATCH 3/7] EDAC/mc: Add new HBM2 memory type

On Wed, Jun 30, 2021 at 08:58:24PM +0530, Naveen Krishna Chatradhi wrote:
> Add a new entry to 'enum mem_type' and a new string to 'edac_mem_types[]'
> for HBM2 (High Bandwidth Memory Gen 2) new memory type.
>
> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> ---
> drivers/edac/edac_mc.c | 1 +
> include/linux/edac.h | 3 +++
> 2 files changed, 4 insertions(+)
>
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index f6d462d0be2d..2c5975674723 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -166,6 +166,7 @@ const char * const edac_mem_types[] = {
> [MEM_DDR5] = "Unbuffered-DDR5",
> [MEM_NVDIMM] = "Non-volatile-RAM",
> [MEM_WIO2] = "Wide-IO-2",
> + [MEM_HBM2] = "High-bandwidth-memory-Gen2",
> };
> EXPORT_SYMBOL_GPL(edac_mem_types);
>
> diff --git a/include/linux/edac.h b/include/linux/edac.h
> index 76d3562d3006..4207d06996a4 100644
> --- a/include/linux/edac.h
> +++ b/include/linux/edac.h
> @@ -184,6 +184,7 @@ static inline char *mc_event_error_type(const unsigned int err_type)
> * @MEM_DDR5: Unbuffered DDR5 RAM
> * @MEM_NVDIMM: Non-volatile RAM
> * @MEM_WIO2: Wide I/O 2.
> + * @MEM_HBM2: High bandwidth Memory Gen 2.
> */
> enum mem_type {
> MEM_EMPTY = 0,
> @@ -212,6 +213,7 @@ enum mem_type {
> MEM_DDR5,
> MEM_NVDIMM,
> MEM_WIO2,
> + MEM_HBM2,
> };
>
> #define MEM_FLAG_EMPTY BIT(MEM_EMPTY)
> @@ -239,6 +241,7 @@ enum mem_type {
> #define MEM_FLAG_DDR5 BIT(MEM_DDR5)
> #define MEM_FLAG_NVDIMM BIT(MEM_NVDIMM)
> #define MEM_FLAG_WIO2 BIT(MEM_WIO2)
> +#define MEM_FLAG_HBM2 BIT(MEM_HBM2)
>
> /**
> * enum edac_type - Error Detection and Correction capabilities and mode
> --

Looks okay to me.

Reviewed-by: Yazen Ghannam <[email protected]>

Tony,
The following commit added HBM support to some Intel EDAC code.

c945088384d0 EDAC/i10nm: Add support for high bandwidth memory

But it didn't include a new mem_type for HBM. Should it have?

I only see some edac_mem_types use in sysfs and some debug messages. So
I'm curious if users find this information useful.

Thanks,
Yazen

2021-07-19 21:45:06

by Luck, Tony

Subject: RE: [PATCH 3/7] EDAC/mc: Add new HBM2 memory type

> The following commit added HBM support to some Intel EDAC code.
>
> c945088384d0 EDAC/i10nm: Add support for high bandwidth memory
>
> But it didn't include a new mem_type for HBM. Should it have?
>
> I only see some edac_mem_types use in sysfs and some debug messages. So
> I'm curious if users find this information useful.

Yazen,

That commit makes the normal vs. HBM error visible in the DIMM label (by
prefixing the "MC" for memory controller with "HB".

+ if (imc->hbm_mc)
+ snprintf(dimm->label, sizeof(dimm->label), "CPU_SrcID#%u_HBMC#%u_Chan#%u",
+ imc->src_id, imc->lmc, chan);
+ else
+ snprintf(dimm->label, sizeof(dimm->label), "CPU_SrcID#%u_MC#%u_Chan#%u_DIMM#%u",
+ imc->src_id, imc->lmc, chan, dimmno);

Perhaps we should also set the "type" of the DIMMs. Qiuxu: opinion?

-Tony

2021-07-20 12:16:42

by Qiuxu Zhuo

Subject: RE: [PATCH 3/7] EDAC/mc: Add new HBM2 memory type

> From: Luck, Tony <[email protected]>
> ...
> That commit makes the normal vs. HBM error visible in the DIMM label (by
> prefixing the "MC" for memory controller with "HB".
>
> + if (imc->hbm_mc)
> + snprintf(dimm->label, sizeof(dimm->label),
> "CPU_SrcID#%u_HBMC#%u_Chan#%u",
> + imc->src_id, imc->lmc, chan);
> + else
> + snprintf(dimm->label, sizeof(dimm->label),
> "CPU_SrcID#%u_MC#%u_Chan#%u_DIMM#%u",
> + imc->src_id, imc->lmc, chan, dimmno);
>
> Perhaps we should also set the "type" of the DIMMs. Qiuxu: opinion?

Yes, we should. I'll make a patch for it.

Thanks!
-Qiuxu

2021-07-20 16:37:11

by Luck, Tony

Subject: [PATCH] EDAC/skx_common: Set the memory type correctly for HBM memory

From: Qiuxu Zhuo <[email protected]>

Set the memory type to MEM_HBM2 if it's managed by the HBM2
memory controller.

Signed-off-by: Qiuxu Zhuo <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
drivers/edac/skx_common.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/skx_common.c b/drivers/edac/skx_common.c
index 5e83f59bef8a..f9120e36bf3a 100644
--- a/drivers/edac/skx_common.c
+++ b/drivers/edac/skx_common.c
@@ -345,7 +345,10 @@ int skx_get_dimm_info(u32 mtr, u32 mcmtr, u32 amap, struct dimm_info *dimm,
rows = numrow(mtr);
cols = imc->hbm_mc ? 6 : numcol(mtr);

- if (cfg->support_ddr5 && ((amap & 0x8) || imc->hbm_mc)) {
+ if (imc->hbm_mc) {
+ banks = 32;
+ mtype = MEM_HBM2;
+ } else if (cfg->support_ddr5 && (amap & 0x8)) {
banks = 32;
mtype = MEM_DDR5;
} else {
--
2.29.2

2021-07-20 16:42:07

by Luck, Tony

Subject: RE: [PATCH 3/7] EDAC/mc: Add new HBM2 memory type

> Looks okay to me.
>
> Reviewed-by: Yazen Ghannam <[email protected]>

Applied. Thanks.

-Tony

2021-07-29 16:34:49

by Yazen Ghannam

Subject: Re: [PATCH 4/7] EDAC/mce_amd: extract node id from InstanceHi in IPID

On Wed, Jun 30, 2021 at 08:58:25PM +0530, Naveen Krishna Chatradhi wrote:
> On AMD systems with SMCA banks on NONCPU nodes, the node id information
> is available in the InstanceHI[47:44] of the IPID register.

That doesn't read well to me. I saw this as saying "bits 47:44 of the
InstanceHi register". Also, the name of the field is "InstanceIdHi" in
the documentation.

I think it'd be more clear to say "available in MCA_IPID[47:44]
(InstanceIdHi)" or something similar.

>
> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> ---
> drivers/edac/mce_amd.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> index 27d56920b469..364dfb6e359d 100644
> --- a/drivers/edac/mce_amd.c
> +++ b/drivers/edac/mce_amd.c
> @@ -1049,6 +1049,7 @@ static void decode_smca_error(struct mce *m)
> enum smca_bank_types bank_type;
> const char *ip_name;
> u8 xec = XEC(m->status, xec_mask);
> + u32 node_id = 0;

Why u32? Why not u16 to match topology_die_id() or int to match
decode_dram_ecc()?

>
> if (m->bank >= ARRAY_SIZE(smca_banks))
> return;
> @@ -1072,8 +1073,18 @@ static void decode_smca_error(struct mce *m)
> if (xec < smca_mce_descs[bank_type].num_descs)
> pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
>
> - if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
> - decode_dram_ecc(topology_die_id(m->extcpu), m);
> + /*
> + * SMCA_UMC_V2 is used on the noncpu nodes, extract the node id
> + * from the InstanceHI[47:44] of the IPID register.
> + */
> + if (bank_type == SMCA_UMC_V2 && xec == 0)
> + node_id = ((m->ipid >> 44) & 0xF);
> +
> + if (bank_type == SMCA_UMC && xec == 0)
> + node_id = topology_die_id(m->extcpu);
> +
> + if (decode_dram_ecc)
> + decode_dram_ecc(node_id, m);

If decode_dram_ecc() is set, then this will call it on every MCA error
that comes in. Rather we only want to call it on DRAM ECC errors.

You could do something like this:

if (decode_dram_ecc && xec == 0) {
u32 node_id = 0;

if (bank_type == SMCA_UMC)
node_id = XXX;
else if (bank_type == SMCA_UMC_V2)
node_id = YYY;
else
return;

decode_dram_ecc(node_id, m);
}

This is just an example. Maybe you can save an indentation level by
negating those conditions and returning early, etc.

Thanks,
Yazen

2021-07-29 17:57:00

by Yazen Ghannam

Subject: Re: [PATCH 5/7] EDAC/amd64: Enumerate memory on noncpu nodes

On Wed, Jun 30, 2021 at 08:58:26PM +0530, Naveen Krishna Chatradhi wrote:
> On newer heterogeneous systems from AMD with GPU nodes connected via
> xGMI links to the CPUs, the GPU dies are interfaced with HBM2 memory.
>
> This patch modifies the amd64_edac module to handle the HBM memory
> enumeration leveraging the existing edac and the amd64 specific data
> structures.
>
> The UMC Phys on GPU nodes are enumerated as csrows
> The UMC channels connected to HBMs are enumerated as ranks
>

Please make sure there is some imperative statement in the commit
message. And watch out for grammar, punctuation, etc.

> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> ---
> drivers/edac/amd64_edac.c | 300 +++++++++++++++++++++++++++++---------
> drivers/edac/amd64_edac.h | 27 ++++
> 2 files changed, 259 insertions(+), 68 deletions(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 25c6362e414b..8fe0a5e3c8f2 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -1741,6 +1741,9 @@ static unsigned long determine_edac_cap(struct amd64_pvt *pvt)
>
> if (umc_en_mask == dimm_ecc_en_mask)
> edac_cap = EDAC_FLAG_SECDED;
> +
> + if (pvt->is_noncpu)
> + edac_cap = EDAC_FLAG_EC;

This flag means "Error Checking - no correction". Is that appropriate
for these devices?

> } else {
> bit = (pvt->fam > 0xf || pvt->ext_model >= K8_REV_F)
> ? 19
> @@ -1799,6 +1802,9 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
> {
> int cs_mode = 0;
>
> + if (pvt->is_noncpu)
> + return CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
> +

Why do this function call if the values are hard-coded? I think you can
just set them below.

> if (csrow_enabled(2 * dimm, ctrl, pvt))
> cs_mode |= CS_EVEN_PRIMARY;
>
> @@ -1818,6 +1824,15 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
>
> edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
>
> + if (pvt->is_noncpu) {
> + cs_mode = f17_get_cs_mode(cs0, ctrl, pvt);
> + for_each_chip_select(cs0, ctrl, pvt) {
> + size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
> + amd64_info(EDAC_MC ": %d: %5dMB\n", cs0, size0);

So this puts each chip select size on a new line rather than grouping by
twos. Is there a logical or physical reason for the difference?

> + }
> + return;
> + }
> +
> for (dimm = 0; dimm < 2; dimm++) {
> cs0 = dimm * 2;
> cs1 = dimm * 2 + 1;
> @@ -1833,43 +1848,53 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
> }
> }
>
> -static void __dump_misc_regs_df(struct amd64_pvt *pvt)
> +static void dump_umcch_regs(struct amd64_pvt *pvt, int i)
> {
> - struct amd64_umc *umc;
> - u32 i, tmp, umc_base;
> -
> - for_each_umc(i) {
> - umc_base = get_umc_base(i);
> - umc = &pvt->umc[i];
> + struct amd64_umc *umc = &pvt->umc[i];
> + u32 tmp, umc_base;
>
> - edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
> + if (pvt->is_noncpu) {
> edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
> edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
> edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
> + edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i);
> + return;
> + }
>
> - amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ECC_BAD_SYMBOL, &tmp);
> - edac_dbg(1, "UMC%d ECC bad symbol: 0x%x\n", i, tmp);
> -
> - amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_UMC_CAP, &tmp);
> - edac_dbg(1, "UMC%d UMC cap: 0x%x\n", i, tmp);
> - edac_dbg(1, "UMC%d UMC cap high: 0x%x\n", i, umc->umc_cap_hi);
> -
> - edac_dbg(1, "UMC%d ECC capable: %s, ChipKill ECC capable: %s\n",
> - i, (umc->umc_cap_hi & BIT(30)) ? "yes" : "no",
> - (umc->umc_cap_hi & BIT(31)) ? "yes" : "no");
> - edac_dbg(1, "UMC%d All DIMMs support ECC: %s\n",
> - i, (umc->umc_cfg & BIT(12)) ? "yes" : "no");
> - edac_dbg(1, "UMC%d x4 DIMMs present: %s\n",
> - i, (umc->dimm_cfg & BIT(6)) ? "yes" : "no");
> - edac_dbg(1, "UMC%d x16 DIMMs present: %s\n",
> - i, (umc->dimm_cfg & BIT(7)) ? "yes" : "no");
> -
> - if (pvt->dram_type == MEM_LRDDR4) {
> - amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ADDR_CFG, &tmp);
> - edac_dbg(1, "UMC%d LRDIMM %dx rank multiply\n",
> - i, 1 << ((tmp >> 4) & 0x3));
> - }
> + umc_base = get_umc_base(i);
> +
> + edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
> +
> + amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ECC_BAD_SYMBOL, &tmp);
> + edac_dbg(1, "UMC%d ECC bad symbol: 0x%x\n", i, tmp);
> +
> + amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_UMC_CAP, &tmp);
> + edac_dbg(1, "UMC%d UMC cap: 0x%x\n", i, tmp);
> + edac_dbg(1, "UMC%d UMC cap high: 0x%x\n", i, umc->umc_cap_hi);
>
> + edac_dbg(1, "UMC%d ECC capable: %s, ChipKill ECC capable: %s\n",
> + i, (umc->umc_cap_hi & BIT(30)) ? "yes" : "no",
> + (umc->umc_cap_hi & BIT(31)) ? "yes" : "no");
> + edac_dbg(1, "UMC%d All DIMMs support ECC: %s\n",
> + i, (umc->umc_cfg & BIT(12)) ? "yes" : "no");
> + edac_dbg(1, "UMC%d x4 DIMMs present: %s\n",
> + i, (umc->dimm_cfg & BIT(6)) ? "yes" : "no");
> + edac_dbg(1, "UMC%d x16 DIMMs present: %s\n",
> + i, (umc->dimm_cfg & BIT(7)) ? "yes" : "no");
> +
> + if (pvt->dram_type == MEM_LRDDR4) {
> + amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ADDR_CFG, &tmp);
> + edac_dbg(1, "UMC%d LRDIMM %dx rank multiply\n",
> + i, 1 << ((tmp >> 4) & 0x3));
> + }
> +}
> +
> +static void __dump_misc_regs_df(struct amd64_pvt *pvt)
> +{
> + int i;
> +
> + for_each_umc(i) {
> + dump_umcch_regs(pvt, i);
> debug_display_dimm_sizes_df(pvt, i);
> }
>
> @@ -1937,10 +1962,14 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
> pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
> } else if (pvt->fam >= 0x17) {
> int umc;
> -
> for_each_umc(umc) {
> - pvt->csels[umc].b_cnt = 4;
> - pvt->csels[umc].m_cnt = 2;
> + if (pvt->is_noncpu) {
> + pvt->csels[umc].b_cnt = 8;
> + pvt->csels[umc].m_cnt = 8;
> + } else {
> + pvt->csels[umc].b_cnt = 4;
> + pvt->csels[umc].m_cnt = 2;
> + }
> }
>
> } else {
> @@ -1949,6 +1978,31 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
> }
> }
>
> +static void read_noncpu_umc_base_mask(struct amd64_pvt *pvt)
> +{
> + u32 base_reg, mask_reg;
> + u32 *base, *mask;
> + int umc, cs;
> +
> + for_each_umc(umc) {
> + for_each_chip_select(cs, umc, pvt) {
> + base_reg = get_noncpu_umc_base(umc, cs) + UMCCH_BASE_ADDR;
> + base = &pvt->csels[umc].csbases[cs];
> +
> + if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
> + edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
> + umc, cs, *base, base_reg);
> +
> + mask_reg = get_noncpu_umc_base(umc, cs) + UMCCH_ADDR_MASK;
> + mask = &pvt->csels[umc].csmasks[cs];
> +
> + if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
> + edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
> + umc, cs, *mask, mask_reg);
> + }
> + }
> +}
> +
> static void read_umc_base_mask(struct amd64_pvt *pvt)
> {
> u32 umc_base_reg, umc_base_reg_sec;
> @@ -2009,8 +2063,12 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)
>
> prep_chip_selects(pvt);
>
> - if (pvt->umc)
> - return read_umc_base_mask(pvt);
> + if (pvt->umc) {
> + if (pvt->is_noncpu)
> + return read_noncpu_umc_base_mask(pvt);
> + else
> + return read_umc_base_mask(pvt);
> + }
>
> for_each_chip_select(cs, 0, pvt) {
> int reg0 = DCSB0 + (cs * 4);
> @@ -2056,6 +2114,10 @@ static void determine_memory_type(struct amd64_pvt *pvt)
> u32 dram_ctrl, dcsm;
>
> if (pvt->umc) {
> + if (pvt->is_noncpu) {
> + pvt->dram_type = MEM_HBM2;
> + return;
> + }
> if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
> pvt->dram_type = MEM_LRDDR4;
> else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
> @@ -2445,7 +2507,10 @@ static int f17_early_channel_count(struct amd64_pvt *pvt)
>
> /* SDP Control bit 31 (SdpInit) is clear for unused UMC channels */
> for_each_umc(i)
> - channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
> + if (pvt->is_noncpu)
> + channels += pvt->csels[i].b_cnt;
> + else
> + channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
>
> amd64_info("MCT channel count: %d\n", channels);
>
> @@ -2586,6 +2651,12 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> u32 msb, weight, num_zero_bits;
> int dimm, size = 0;
>
> + if (pvt->is_noncpu) {
> + addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
> + /* The memory channels in case of GPUs are fully populated */
> + goto skip_noncpu;
> + }
> +
> /* No Chip Selects are enabled. */
> if (!cs_mode)
> return size;
> @@ -2611,6 +2682,7 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> else
> addr_mask_orig = pvt->csels[umc].csmasks[dimm];
>
> + skip_noncpu:
> /*
> * The number of zero bits in the mask is equal to the number of bits
> * in a full mask minus the number of bits in the current mask.
> @@ -3356,6 +3428,16 @@ static struct amd64_family_type family_types[] = {
> .dbam_to_cs = f17_addr_mask_to_cs_size,
> }
> },
> + [ALDEBARAN_GPUS] = {
> + .ctl_name = "ALDEBARAN",
> + .f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0,
> + .f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6,
> + .max_mcs = 4,
> + .ops = {
> + .early_channel_count = f17_early_channel_count,
> + .dbam_to_cs = f17_addr_mask_to_cs_size,
> + }
> + },
> };
>
> /*
> @@ -3611,6 +3693,19 @@ static int find_umc_channel(struct mce *m)
> return (m->ipid & GENMASK(31, 0)) >> 20;
> }
>
> +/*
> + * The HBM channel managed by the UMCCH of the noncpu node
> + * can be determined from bits [15:12] of the IPID as follows
> + */
> +static int find_umc_channel_noncpu(struct mce *m)
> +{
> + u8 umc, ch;
> +
> + umc = find_umc_channel(m);
> + ch = ((m->ipid >> 12) & 0xf);
> + return umc % 2 ? (ch + 4) : ch;
> +}
> +
> static void decode_umc_error(int node_id, struct mce *m)
> {
> u8 ecc_type = (m->status >> 45) & 0x3;
> @@ -3618,6 +3713,7 @@ static void decode_umc_error(int node_id, struct mce *m)
> struct amd64_pvt *pvt;
> struct err_info err;
> u64 sys_addr = m->addr;
> + u8 umc_num;
>
> mci = edac_mc_find(node_id);
> if (!mci)
> @@ -3630,7 +3726,16 @@ static void decode_umc_error(int node_id, struct mce *m)
> if (m->status & MCI_STATUS_DEFERRED)
> ecc_type = 3;
>
> - err.channel = find_umc_channel(m);
> + if (pvt->is_noncpu) {
> + err.csrow = find_umc_channel(m) / 2;
> + /* The UMC channel is reported as the csrow in case of the noncpu nodes */
> + err.channel = find_umc_channel_noncpu(m);
> + umc_num = err.csrow * 8 + err.channel;
> + } else {
> + err.channel = find_umc_channel(m);
> + err.csrow = m->synd & 0x7;
> + umc_num = err.channel;
> + }
>
> if (!(m->status & MCI_STATUS_SYNDV)) {
> err.err_code = ERR_SYND;
> @@ -3646,9 +3751,7 @@ static void decode_umc_error(int node_id, struct mce *m)
> err.err_code = ERR_CHANNEL;
> }
>
> - err.csrow = m->synd & 0x7;
> -
> - if (umc_normaddr_to_sysaddr(&sys_addr, pvt->mc_node_id, err.channel)) {
> + if (umc_normaddr_to_sysaddr(&sys_addr, pvt->mc_node_id, umc_num)) {
> err.err_code = ERR_NORM_ADDR;
> goto log_error;
> }
> @@ -3775,15 +3878,20 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
>
> /* Read registers from each UMC */
> for_each_umc(i) {
> + if (pvt->is_noncpu)
> + umc_base = get_noncpu_umc_base(i, 0);
> + else
> + umc_base = get_umc_base(i);
>
> - umc_base = get_umc_base(i);
> umc = &pvt->umc[i];
> -
> - amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
> amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
> amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
> amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
> - amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
> +
> + if (!pvt->is_noncpu) {
> + amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
> + amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
> + }
> }
> }
>
> @@ -3865,7 +3973,9 @@ static void read_mc_regs(struct amd64_pvt *pvt)
> determine_memory_type(pvt);
> edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
>
> - determine_ecc_sym_sz(pvt);
> + /* ECC symbol size is not available on NONCPU nodes */
> + if (!pvt->is_noncpu)
> + determine_ecc_sym_sz(pvt);
> }
>
> /*
> @@ -3953,15 +4063,21 @@ static int init_csrows_df(struct mem_ctl_info *mci)
> continue;
>
> empty = 0;
> - dimm = mci->csrows[cs]->channels[umc]->dimm;
> + if (pvt->is_noncpu) {
> + dimm = mci->csrows[umc]->channels[cs]->dimm;
> + dimm->edac_mode = EDAC_SECDED;
> + dimm->dtype = DEV_X16;
> + } else {
> + dimm->edac_mode = edac_mode;
> + dimm->dtype = dev_type;
> + dimm = mci->csrows[cs]->channels[umc]->dimm;

This last line should go before the other two.

> + }
>
> edac_dbg(1, "MC node: %d, csrow: %d\n",
> pvt->mc_node_id, cs);
>
> dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
> dimm->mtype = pvt->dram_type;
> - dimm->edac_mode = edac_mode;
> - dimm->dtype = dev_type;
> dimm->grain = 64;
> }
> }
> @@ -4226,7 +4342,9 @@ static bool ecc_enabled(struct amd64_pvt *pvt)
>
> umc_en_mask |= BIT(i);
>
> - if (umc->umc_cap_hi & UMC_ECC_ENABLED)
> + /* ECC is enabled by default on NONCPU nodes */
> + if (pvt->is_noncpu ||
> + (umc->umc_cap_hi & UMC_ECC_ENABLED))
> ecc_en_mask |= BIT(i);
> }
>
> @@ -4262,6 +4380,11 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
> {
> u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
>
> + if (pvt->is_noncpu) {
> + mci->edac_ctl_cap |= EDAC_SECDED;
> + return;
> + }
> +
> for_each_umc(i) {
> if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
> ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
> @@ -4292,7 +4415,11 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
> {
> struct amd64_pvt *pvt = mci->pvt_info;
>
> - mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
> + if (pvt->is_noncpu)
> + mci->mtype_cap = MEM_FLAG_HBM2;
> + else
> + mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
> +
> mci->edac_ctl_cap = EDAC_FLAG_NONE;
>
> if (pvt->umc) {
> @@ -4397,11 +4524,25 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
> fam_type = &family_types[F17_M70H_CPUS];
> pvt->ops = &family_types[F17_M70H_CPUS].ops;
> fam_type->ctl_name = "F19h_M20h";
> - break;
> + } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
> + if (pvt->is_noncpu) {
> + int tmp = 0;
> +
> + fam_type = &family_types[ALDEBARAN_GPUS];
> + pvt->ops = &family_types[ALDEBARAN_GPUS].ops;
> + tmp = pvt->mc_node_id - NONCPU_NODE_INDEX;

This can be set when you declare it.

> + sprintf(pvt->buf, "Aldebaran#%ddie#%d", tmp / 2, tmp % 2);
> + fam_type->ctl_name = pvt->buf;
> + } else {
> + fam_type = &family_types[F19_CPUS];
> + pvt->ops = &family_types[F19_CPUS].ops;
> + fam_type->ctl_name = "F19h_M30h";
> + }
> + } else {
> + fam_type = &family_types[F19_CPUS];
> + pvt->ops = &family_types[F19_CPUS].ops;
> + family_types[F19_CPUS].ctl_name = "F19h";
> }
> - fam_type = &family_types[F19_CPUS];
> - pvt->ops = &family_types[F19_CPUS].ops;
> - family_types[F19_CPUS].ctl_name = "F19h";
> break;
>
> default:
> @@ -4454,6 +4595,30 @@ static void hw_info_put(struct amd64_pvt *pvt)
> kfree(pvt->umc);
> }
>
> +static void populate_layers(struct amd64_pvt *pvt, struct edac_mc_layer *layers)
> +{
> + if (pvt->is_noncpu) {
> + layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
> + layers[0].size = fam_type->max_mcs;
> + layers[0].is_virt_csrow = true;
> + layers[1].type = EDAC_MC_LAYER_CHANNEL;
> + layers[1].size = pvt->csels[0].b_cnt;
> + layers[1].is_virt_csrow = false;

This looks mostly the same as below but the sizes are different. Can't
you keep all this together and just adjust the sizes?

> + } else {
> + layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
> + layers[0].size = pvt->csels[0].b_cnt;
> + layers[0].is_virt_csrow = true;
> + layers[1].type = EDAC_MC_LAYER_CHANNEL;
> + /*
> + * Always allocate two channels since we can have setups with
> + * DIMMs on only one channel. Also, this simplifies handling
> + * later for the price of a couple of KBs tops.
> + */
> + layers[1].size = fam_type->max_mcs;
> + layers[1].is_virt_csrow = false;
> + }
> +}
> +
> static int init_one_instance(struct amd64_pvt *pvt)
> {
> struct mem_ctl_info *mci = NULL;
> @@ -4469,19 +4634,8 @@ static int init_one_instance(struct amd64_pvt *pvt)
> if (pvt->channel_count < 0)
> return ret;
>
> - ret = -ENOMEM;
> - layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
> - layers[0].size = pvt->csels[0].b_cnt;
> - layers[0].is_virt_csrow = true;
> - layers[1].type = EDAC_MC_LAYER_CHANNEL;
> -
> - /*
> - * Always allocate two channels since we can have setups with DIMMs on
> - * only one channel. Also, this simplifies handling later for the price
> - * of a couple of KBs tops.
> - */
> - layers[1].size = fam_type->max_mcs;
> - layers[1].is_virt_csrow = false;
> + /* Define layers for CPU and NONCPU nodes */
> + populate_layers(pvt, layers);
>
> mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
> if (!mci)
> @@ -4525,6 +4679,9 @@ static int probe_one_instance(unsigned int nid)
> struct ecc_settings *s;
> int ret;
>
> + if (!F3)
> + return -EINVAL;
> +

Why is this needed?

> ret = -ENOMEM;
> s = kzalloc(sizeof(struct ecc_settings), GFP_KERNEL);
> if (!s)
> @@ -4536,6 +4693,9 @@ static int probe_one_instance(unsigned int nid)
> if (!pvt)
> goto err_settings;
>
> + if (nid >= NONCPU_NODE_INDEX)
> + pvt->is_noncpu = true;
> +
> pvt->mc_node_id = nid;
> pvt->F3 = F3;
>
> @@ -4609,6 +4769,10 @@ static void remove_one_instance(unsigned int nid)
> struct mem_ctl_info *mci;
> struct amd64_pvt *pvt;
>
> + /* Nothing to remove for the placeholder entries */
> + if (!F3)
> + return;
> +
> /* Remove from EDAC CORE tracking list */
> mci = edac_mc_del_mc(&F3->dev);
> if (!mci)
> @@ -4682,7 +4846,7 @@ static int __init amd64_edac_init(void)
>
> for (i = 0; i < amd_nb_num(); i++) {
> err = probe_one_instance(i);
> - if (err) {
> + if (err && (err != -EINVAL)) {

If the !F3 condition above is "okay", why not just return 0 (success)?

> /* unwind properly */
> while (--i >= 0)
> remove_one_instance(i);
> diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
> index 85aa820bc165..6d5f7b3afc83 100644
> --- a/drivers/edac/amd64_edac.h
> +++ b/drivers/edac/amd64_edac.h
> @@ -126,6 +126,8 @@
> #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
> #define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650
> #define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14D0
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14D6
>

These are correct.

> /*
> * Function 1 - Address Map
> @@ -298,6 +300,7 @@ enum amd_families {
> F17_M60H_CPUS,
> F17_M70H_CPUS,
> F19_CPUS,
> + ALDEBARAN_GPUS,
> NUM_FAMILIES,
> };
>
> @@ -389,6 +392,9 @@ struct amd64_pvt {
> enum mem_type dram_type;
>
> struct amd64_umc *umc; /* UMC registers */
> + char buf[20];
> +
> + u8 is_noncpu;

Can this be a "bool"?

> };
>
> enum err_codes {
> @@ -410,6 +416,27 @@ struct err_info {
> u32 offset;
> };
>
> +static inline u32 get_noncpu_umc_base(u8 umc, u8 channel)
> +{
> + /*
> + * On the NONCPU nodes, base address is calculated based on
> + * UMC channel and the HBM channel.
> + *
> + * UMC channels are selected in 6th nibble
> + * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000;
> + *
> + * HBM channels are selected in 3rd nibble
> + * HBM chX[3:0]= [Y ]5X[3:0]000;
> + * HBM chX[7:4]= [Y+1]5X[3:0]000
> + */
> + umc *= 2;
> +
> + if (channel / 4)

Can this be "if (channel >= 4)"?

> + umc++;
> +
> + return 0x50000 + (umc << 20) + ((channel % 4) << 12);
> +}
> +
> static inline u32 get_umc_base(u8 channel)
> {
> /* chY: 0xY50000 */
> --

There are a lot of changes in this patch. I think you should give the
highlights in the commit message. For example, you may want to say if
you introduced new functions, changed code flow, etc., and why this is
needed compared to existing systems. I think the code comments have some
details, but a summary in the commit message may help.

Thanks,
Yazen
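
To make the noncpu csrow/channel remapping discussed above concrete, here is a small standalone sketch of the decode_umc_error() arithmetic. The field positions follow find_umc_channel() and find_umc_channel_noncpu() from the patch; the IPID value is hypothetical, purely for illustration.

        #include <stdio.h>
        #include <stdint.h>

        /* Field extraction mirrors find_umc_channel()/find_umc_channel_noncpu(). */
        static int umc_phy(uint64_t ipid)
        {
                return (ipid & 0xffffffffULL) >> 20;    /* IPID[31:20] */
        }

        static int hbm_channel(uint64_t ipid)
        {
                int phy = umc_phy(ipid);
                int ch = (ipid >> 12) & 0xf;            /* IPID[15:12] */

                return phy % 2 ? ch + 4 : ch;           /* odd phy selects the upper four */
        }

        int main(void)
        {
                /* Hypothetical error: UMC phy 3, HBM channel select 2. */
                uint64_t ipid = (3ULL << 20) | (2ULL << 12);
                int csrow = umc_phy(ipid) / 2;          /* 1: phys are paired per csrow */
                int channel = hbm_channel(ipid);        /* 6 */
                int umc_num = csrow * 8 + channel;      /* 14: passed to address translation */

                printf("csrow=%d channel=%d umc_num=%d\n", csrow, channel, umc_num);
                return 0;
        }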

2021-07-29 18:00:55

by Yazen Ghannam

Subject: Re: [PATCH 6/7] EDAC/amd64: Add address translation support for DF3.5

On Wed, Jun 30, 2021 at 08:58:27PM +0530, Naveen Krishna Chatradhi wrote:
> From: Muralidhara M K <[email protected]>
>
> Add support for address translation on Data Fabric version 3.5.
>
> Add new data fabric ops and interleaving modes. Also, adjust how the
> DRAM address maps are found early in the translation for certain cases.
>
> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> Co-developed-by: Yazen Ghannam <[email protected]>
> Signed-off-by: Yazen Ghannam <[email protected]>
> ---

I think this patch and the following one may need to be reworked based
on the changes in the other "address translation" patchset. I can take
these two patches, rework them, and include them with next revision of
the other set. What do you think?

Thanks,
Yazen

Subject: RE: [PATCH 6/7] EDAC/amd64: Add address translation support for DF3.5

[Public]

Hi Yazen,

Regards,
Naveenk

-----Original Message-----
From: Ghannam, Yazen <[email protected]>
Sent: Thursday, July 29, 2021 11:30 PM
To: Chatradhi, Naveen Krishna <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [PATCH 6/7] EDAC/amd64: Add address translation support for DF3.5

On Wed, Jun 30, 2021 at 08:58:27PM +0530, Naveen Krishna Chatradhi wrote:
> From: Muralidhara M K <[email protected]>
>
> Add support for address translation on Data Fabric version 3.5.
>
> Add new data fabric ops and interleaving modes. Also, adjust how the
> DRAM address maps are found early in the translation for certain cases.
>
> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> Co-developed-by: Yazen Ghannam <[email protected]>
> Signed-off-by: Yazen Ghannam <[email protected]>
> ---

I think this patch and the following one may need to be reworked based on the changes in the other "address translation" patchset. I can take these two patches, rework them, and include them with next revision of the other set. What do you think?
[naveenk:] Sure thanks.

Thanks,
Yazen

Subject: [PATCH v2 0/3] x86/edac/amd64: Add support for noncpu nodes

From: Muralidhara M K <[email protected]>

On newer heterogeneous systems from AMD with GPU nodes connected via
xGMI links to the CPUs, the GPU dies are interfaced with HBM2 memory.

This patchset applies on top of the following series by Yazen Ghannam
AMD MCA Address Translation Updates
[https://patchwork.kernel.org/project/linux-edac/list/?series=505989]

This patchset does the following
1. Add support for northbridges on Aldebaran
* x86/amd_nb: Add support for northbridges on Aldebaran
2. Modifies the amd64_edac module to
a. Handle the UMCs on the noncpu nodes,
* EDAC/mce_amd: extract node id from InstanceHi in IPID
b. Enumerate HBM memory and add address translation
* EDAC/amd64: Enumerate memory on noncpu nodes

Muralidhara M K (1):
x86/amd_nb: Add support for northbridges on Aldebaran

Naveen Krishna Chatradhi (2):
EDAC/mce_amd: Extract node id from InstanceHi in IPID
EDAC/amd64: Enumerate memory on noncpu nodes

arch/x86/include/asm/amd_nb.h | 10 ++
arch/x86/kernel/amd_nb.c | 63 ++++++++++-
drivers/edac/amd64_edac.c | 202 +++++++++++++++++++++++++++++-----
drivers/edac/amd64_edac.h | 27 +++++
drivers/edac/mce_amd.c | 19 +++-
include/linux/pci_ids.h | 1 +
6 files changed, 288 insertions(+), 34 deletions(-)

--
2.25.1

Subject: [PATCH v2 2/3] EDAC/mce_amd: Extract node id from InstanceHi in IPID

On AMD systems with SMCA banks on NONCPU nodes, the node id
information is available in MCA_IPID[47:44] (InstanceIdHi).

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
---
Changes since v1:
1. Modified the commit message
2. Rearranged the conditions before calling decode_dram_ecc()

drivers/edac/mce_amd.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 27d56920b469..318b7fb715ff 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1072,8 +1072,23 @@ static void decode_smca_error(struct mce *m)
if (xec < smca_mce_descs[bank_type].num_descs)
pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);

- if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
- decode_dram_ecc(topology_die_id(m->extcpu), m);
+ if (xec == 0 && decode_dram_ecc) {
+ int node_id = 0;
+
+ if (bank_type == SMCA_UMC) {
+ node_id = topology_die_id(m->extcpu);
+ } else if (bank_type == SMCA_UMC_V2) {
+ /*
+ * SMCA_UMC_V2 is used on the noncpu nodes, extract
+ * the node id from MCA_IPID[47:44](InstanceIdHi)
+ */
+ node_id = ((m->ipid >> 44) & 0xF);
+ } else {
+ return;
+ }
+
+ decode_dram_ecc(node_id, m);
+ }
}

static inline void amd_decode_err_code(u16 ec)
--
2.25.1
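
As a quick sanity check of the MCA_IPID[47:44] extraction in this patch, a standalone sketch with a hypothetical register value (not taken from real hardware):

        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
                /* Hypothetical MCA_IPID with bits 47:44 = 0x9. */
                uint64_t ipid = 0x0000900000000000ULL;
                int node_id = (ipid >> 44) & 0xF;

                printf("node_id=%d\n", node_id);        /* 9: the second GPU node */
                return 0;
        }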

Subject: [PATCH v2 3/3] EDAC/amd64: Enumerate memory on noncpu nodes

On newer heterogeneous systems from AMD, the GPU nodes interfaced
with HBM2 memory are connected to the CPUs via custom links.

This patch modifies the amd64_edac module to handle the HBM memory
enumeration leveraging the existing edac and the amd64 specific data
structures.

This patch does the following for non-cpu nodes:
1. Define PCI IDs and ops for Aldebaran GPUs in the family_types array.
2. The UMC Phys on GPU nodes are enumerated as csrows and the UMC channels
connected to HBMs are enumerated as ranks.
3. Define a function to find the UMCv2 channel number
4. Define a function to calculate base address of the UMCv2 registers
5. Add debug information for UMCv2 channel registers.

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
---
Changes since v1:
1. Modified the commit message
2. Changed the edac_cap
3. Kept sizes of both cpu and noncpu together
4. Return success if the !F3 condition is true and remove unnecessary validation
5. Declared is_noncpu as bool
6. Modified the condition from channel/4 to channel>=4
7. Rearranged debug information for noncpu umcch registers

drivers/edac/amd64_edac.c | 202 +++++++++++++++++++++++++++++++++-----
drivers/edac/amd64_edac.h | 27 +++++
2 files changed, 202 insertions(+), 27 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index b03c33240238..2dd77a828394 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1979,6 +1979,9 @@ static unsigned long determine_edac_cap(struct amd64_pvt *pvt)

if (umc_en_mask == dimm_ecc_en_mask)
edac_cap = EDAC_FLAG_SECDED;
+
+ if (pvt->is_noncpu)
+ edac_cap = EDAC_FLAG_SECDED;
} else {
bit = (pvt->fam > 0xf || pvt->ext_model >= K8_REV_F)
? 19
@@ -2037,6 +2040,9 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
{
int cs_mode = 0;

+ if (pvt->is_noncpu)
+ return CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+
if (csrow_enabled(2 * dimm, ctrl, pvt))
cs_mode |= CS_EVEN_PRIMARY;

@@ -2056,6 +2062,15 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)

edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);

+ if (pvt->is_noncpu) {
+ cs_mode = f17_get_cs_mode(cs0, ctrl, pvt);
+ for_each_chip_select(cs0, ctrl, pvt) {
+ size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
+ amd64_info(EDAC_MC ": %d: %5dMB\n", cs0, size0);
+ }
+ return;
+ }
+
for (dimm = 0; dimm < 2; dimm++) {
cs0 = dimm * 2;
cs1 = dimm * 2 + 1;
@@ -2080,10 +2095,15 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
umc_base = get_umc_base(i);
umc = &pvt->umc[i];

- edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
+ if (!pvt->is_noncpu)
+ edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
+ if (pvt->is_noncpu) {
+ edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i);
+ goto dimm_size;
+ }

amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ECC_BAD_SYMBOL, &tmp);
edac_dbg(1, "UMC%d ECC bad symbol: 0x%x\n", i, tmp);
@@ -2108,6 +2128,7 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
i, 1 << ((tmp >> 4) & 0x3));
}

+ dimm_size:
debug_display_dimm_sizes_df(pvt, i);
}

@@ -2175,10 +2196,14 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
} else if (pvt->fam >= 0x17) {
int umc;
-
for_each_umc(umc) {
- pvt->csels[umc].b_cnt = 4;
- pvt->csels[umc].m_cnt = 2;
+ if (pvt->is_noncpu) {
+ pvt->csels[umc].b_cnt = 8;
+ pvt->csels[umc].m_cnt = 8;
+ } else {
+ pvt->csels[umc].b_cnt = 4;
+ pvt->csels[umc].m_cnt = 2;
+ }
}

} else {
@@ -2187,6 +2212,31 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
}
}

+static void read_noncpu_umc_base_mask(struct amd64_pvt *pvt)
+{
+ u32 base_reg, mask_reg;
+ u32 *base, *mask;
+ int umc, cs;
+
+ for_each_umc(umc) {
+ for_each_chip_select(cs, umc, pvt) {
+ base_reg = get_noncpu_umc_base(umc, cs) + UMCCH_BASE_ADDR;
+ base = &pvt->csels[umc].csbases[cs];
+
+ if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
+ edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
+ umc, cs, *base, base_reg);
+
+ mask_reg = get_noncpu_umc_base(umc, cs) + UMCCH_ADDR_MASK;
+ mask = &pvt->csels[umc].csmasks[cs];
+
+ if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
+ edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
+ umc, cs, *mask, mask_reg);
+ }
+ }
+}
+
static void read_umc_base_mask(struct amd64_pvt *pvt)
{
u32 umc_base_reg, umc_base_reg_sec;
@@ -2247,8 +2297,12 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)

prep_chip_selects(pvt);

- if (pvt->umc)
- return read_umc_base_mask(pvt);
+ if (pvt->umc) {
+ if (pvt->is_noncpu)
+ return read_noncpu_umc_base_mask(pvt);
+ else
+ return read_umc_base_mask(pvt);
+ }

for_each_chip_select(cs, 0, pvt) {
int reg0 = DCSB0 + (cs * 4);
@@ -2294,6 +2348,10 @@ static void determine_memory_type(struct amd64_pvt *pvt)
u32 dram_ctrl, dcsm;

if (pvt->umc) {
+ if (pvt->is_noncpu) {
+ pvt->dram_type = MEM_HBM2;
+ return;
+ }
if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
pvt->dram_type = MEM_LRDDR4;
else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
@@ -2683,7 +2741,10 @@ static int f17_early_channel_count(struct amd64_pvt *pvt)

/* SDP Control bit 31 (SdpInit) is clear for unused UMC channels */
for_each_umc(i)
- channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
+ if (pvt->is_noncpu)
+ channels += pvt->csels[i].b_cnt;
+ else
+ channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);

amd64_info("MCT channel count: %d\n", channels);

@@ -2824,6 +2885,12 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
u32 msb, weight, num_zero_bits;
int dimm, size = 0;

+ if (pvt->is_noncpu) {
+ addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
+ /* The memory channels in case of GPUs are fully populated */
+ goto skip_noncpu;
+ }
+
/* No Chip Selects are enabled. */
if (!cs_mode)
return size;
@@ -2849,6 +2916,7 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
else
addr_mask_orig = pvt->csels[umc].csmasks[dimm];

+ skip_noncpu:
/*
* The number of zero bits in the mask is equal to the number of bits
* in a full mask minus the number of bits in the current mask.
@@ -3594,6 +3662,16 @@ static struct amd64_family_type family_types[] = {
.dbam_to_cs = f17_addr_mask_to_cs_size,
}
},
+ [ALDEBARAN_GPUS] = {
+ .ctl_name = "ALDEBARAN",
+ .f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0,
+ .f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6,
+ .max_mcs = 4,
+ .ops = {
+ .early_channel_count = f17_early_channel_count,
+ .dbam_to_cs = f17_addr_mask_to_cs_size,
+ }
+ },
};

/*
@@ -3849,6 +3927,19 @@ static int find_umc_channel(struct mce *m)
return (m->ipid & GENMASK(31, 0)) >> 20;
}

+/*
+ * The HBM channel managed by the UMCCH of the noncpu node
+ * can be determined from bits [15:12] of the IPID as follows
+ */
+static int find_umc_channel_noncpu(struct mce *m)
+{
+ u8 umc, ch;
+
+ umc = find_umc_channel(m);
+ ch = ((m->ipid >> 12) & 0xf);
+ return umc % 2 ? (ch + 4) : ch;
+}
+
static void decode_umc_error(int node_id, struct mce *m)
{
u8 ecc_type = (m->status >> 45) & 0x3;
@@ -3856,6 +3947,7 @@ static void decode_umc_error(int node_id, struct mce *m)
struct amd64_pvt *pvt;
struct err_info err;
u64 sys_addr = m->addr;
+ u8 umc_num;

mci = edac_mc_find(node_id);
if (!mci)
@@ -3868,7 +3960,17 @@ static void decode_umc_error(int node_id, struct mce *m)
if (m->status & MCI_STATUS_DEFERRED)
ecc_type = 3;

- err.channel = find_umc_channel(m);
+ if (pvt->is_noncpu) {
+ /* The UMCPHY is reported as csrow in case of noncpu nodes */
+ err.csrow = find_umc_channel(m) / 2;
+ /* UMCCH is managing the HBM memory */
+ err.channel = find_umc_channel_noncpu(m);
+ umc_num = err.csrow * 8 + err.channel;
+ } else {
+ err.channel = find_umc_channel(m);
+ err.csrow = m->synd & 0x7;
+ umc_num = err.channel;
+ }

if (!(m->status & MCI_STATUS_SYNDV)) {
err.err_code = ERR_SYND;
@@ -3884,9 +3986,7 @@ static void decode_umc_error(int node_id, struct mce *m)
err.err_code = ERR_CHANNEL;
}

- err.csrow = m->synd & 0x7;
-
- if (umc_normaddr_to_sysaddr(&sys_addr, pvt->mc_node_id, err.channel)) {
+ if (umc_normaddr_to_sysaddr(&sys_addr, pvt->mc_node_id, umc_num)) {
err.err_code = ERR_NORM_ADDR;
goto log_error;
}
@@ -4013,15 +4113,20 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)

/* Read registers from each UMC */
for_each_umc(i) {
+ if (pvt->is_noncpu)
+ umc_base = get_noncpu_umc_base(i, 0);
+ else
+ umc_base = get_umc_base(i);

- umc_base = get_umc_base(i);
umc = &pvt->umc[i];
-
- amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
- amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
+
+ if (!pvt->is_noncpu) {
+ amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
+ amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
+ }
}
}

@@ -4103,7 +4208,9 @@ static void read_mc_regs(struct amd64_pvt *pvt)
determine_memory_type(pvt);
edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);

- determine_ecc_sym_sz(pvt);
+ /* ECC symbol size is not available on NONCPU nodes */
+ if (!pvt->is_noncpu)
+ determine_ecc_sym_sz(pvt);
}

/*
@@ -4191,15 +4298,21 @@ static int init_csrows_df(struct mem_ctl_info *mci)
continue;

empty = 0;
- dimm = mci->csrows[cs]->channels[umc]->dimm;
+ if (pvt->is_noncpu) {
+ dimm = mci->csrows[umc]->channels[cs]->dimm;
+ dimm->edac_mode = EDAC_SECDED;
+ dimm->dtype = DEV_X16;
+ } else {
+ dimm = mci->csrows[cs]->channels[umc]->dimm;
+ dimm->edac_mode = edac_mode;
+ dimm->dtype = dev_type;
+ }

edac_dbg(1, "MC node: %d, csrow: %d\n",
pvt->mc_node_id, cs);

dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
dimm->mtype = pvt->dram_type;
- dimm->edac_mode = edac_mode;
- dimm->dtype = dev_type;
dimm->grain = 64;
}
}
@@ -4464,7 +4577,9 @@ static bool ecc_enabled(struct amd64_pvt *pvt)

umc_en_mask |= BIT(i);

- if (umc->umc_cap_hi & UMC_ECC_ENABLED)
+ /* ECC is enabled by default on NONCPU nodes */
+ if (pvt->is_noncpu ||
+ (umc->umc_cap_hi & UMC_ECC_ENABLED))
ecc_en_mask |= BIT(i);
}

@@ -4500,6 +4615,11 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
{
u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;

+ if (pvt->is_noncpu) {
+ mci->edac_ctl_cap |= EDAC_SECDED;
+ return;
+ }
+
for_each_umc(i) {
if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
@@ -4530,7 +4650,11 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
{
struct amd64_pvt *pvt = mci->pvt_info;

- mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
+ if (pvt->is_noncpu)
+ mci->mtype_cap = MEM_FLAG_HBM2;
+ else
+ mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
+
mci->edac_ctl_cap = EDAC_FLAG_NONE;

if (pvt->umc) {
@@ -4635,11 +4759,24 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
fam_type = &family_types[F17_M70H_CPUS];
pvt->ops = &family_types[F17_M70H_CPUS].ops;
fam_type->ctl_name = "F19h_M20h";
- break;
+ } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
+ if (pvt->is_noncpu) {
+ int tmp = pvt->mc_node_id - NONCPU_NODE_INDEX;
+
+ fam_type = &family_types[ALDEBARAN_GPUS];
+ pvt->ops = &family_types[ALDEBARAN_GPUS].ops;
+ sprintf(pvt->buf, "Aldebaran#%ddie#%d", tmp / 2, tmp % 2);
+ fam_type->ctl_name = pvt->buf;
+ } else {
+ fam_type = &family_types[F19_CPUS];
+ pvt->ops = &family_types[F19_CPUS].ops;
+ fam_type->ctl_name = "F19h_M30h";
+ }
+ } else {
+ fam_type = &family_types[F19_CPUS];
+ pvt->ops = &family_types[F19_CPUS].ops;
+ family_types[F19_CPUS].ctl_name = "F19h";
}
- fam_type = &family_types[F19_CPUS];
- pvt->ops = &family_types[F19_CPUS].ops;
- family_types[F19_CPUS].ctl_name = "F19h";
break;

default:
@@ -4707,9 +4844,10 @@ static int init_one_instance(struct amd64_pvt *pvt)
if (pvt->channel_count < 0)
return ret;

+ /* Define layers for CPU and NONCPU nodes */
ret = -ENOMEM;
layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
- layers[0].size = pvt->csels[0].b_cnt;
+ layers[0].size = pvt->is_noncpu ? fam_type->max_mcs : pvt->csels[0].b_cnt;
layers[0].is_virt_csrow = true;
layers[1].type = EDAC_MC_LAYER_CHANNEL;

@@ -4718,7 +4856,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
* only one channel. Also, this simplifies handling later for the price
* of a couple of KBs tops.
*/
- layers[1].size = fam_type->max_mcs;
+ layers[1].size = pvt->is_noncpu ? pvt->csels[0].b_cnt : fam_type->max_mcs;
layers[1].is_virt_csrow = false;

mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
@@ -4763,6 +4901,9 @@ static int probe_one_instance(unsigned int nid)
struct ecc_settings *s;
int ret;

+ if (!F3)
+ return 0;
+
ret = -ENOMEM;
s = kzalloc(sizeof(struct ecc_settings), GFP_KERNEL);
if (!s)
@@ -4774,6 +4915,9 @@ static int probe_one_instance(unsigned int nid)
if (!pvt)
goto err_settings;

+ if (nid >= NONCPU_NODE_INDEX)
+ pvt->is_noncpu = true;
+
pvt->mc_node_id = nid;
pvt->F3 = F3;

@@ -4847,6 +4991,10 @@ static void remove_one_instance(unsigned int nid)
struct mem_ctl_info *mci;
struct amd64_pvt *pvt;

+ /* Nothing to remove for the placeholder entries */
+ if (!F3)
+ return;
+
/* Remove from EDAC CORE tracking list */
mci = edac_mc_del_mc(&F3->dev);
if (!mci)
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 85aa820bc165..c5532a6f0c34 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -126,6 +126,8 @@
#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
#define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650
#define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14D0
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14D6

/*
* Function 1 - Address Map
@@ -298,6 +300,7 @@ enum amd_families {
F17_M60H_CPUS,
F17_M70H_CPUS,
F19_CPUS,
+ ALDEBARAN_GPUS,
NUM_FAMILIES,
};

@@ -389,6 +392,9 @@ struct amd64_pvt {
enum mem_type dram_type;

struct amd64_umc *umc; /* UMC registers */
+ char buf[20];
+
+ bool is_noncpu;
};

enum err_codes {
@@ -410,6 +416,27 @@ struct err_info {
u32 offset;
};

+static inline u32 get_noncpu_umc_base(u8 umc, u8 channel)
+{
+ /*
+ * On the NONCPU nodes, base address is calculated based on
+ * UMC channel and the HBM channel.
+ *
+ * UMC channels are selected in 6th nibble
+ * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000;
+ *
+ * HBM channels are selected in 3rd nibble
+ * HBM chX[3:0]= [Y ]5X[3:0]000;
+ * HBM chX[7:4]= [Y+1]5X[3:0]000
+ */
+ umc *= 2;
+
+ if (channel >= 4)
+ umc++;
+
+ return 0x50000 + (umc << 20) + ((channel % 4) << 12);
+}
+
static inline u32 get_umc_base(u8 channel)
{
/* chY: 0xY50000 */
--
2.25.1
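
For illustration, a standalone sketch of the base-address arithmetic in get_noncpu_umc_base() above, with one worked value (the inputs are hypothetical):

        #include <stdio.h>
        #include <stdint.h>

        /* Same arithmetic as get_noncpu_umc_base() in the hunk above. */
        static uint32_t noncpu_umc_base(uint8_t umc, uint8_t channel)
        {
                umc *= 2;               /* each UMC phy spans two CPU-style UMC slots */
                if (channel >= 4)
                        umc++;          /* upper four HBM channels use the odd slot */

                return 0x50000 + ((uint32_t)umc << 20) + ((channel % 4) << 12);
        }

        int main(void)
        {
                /* UMC phy 1, HBM channel 5 -> slot 3, HBM nibble 1 -> 0x351000. */
                printf("0x%x\n", noncpu_umc_base(1, 5));
                return 0;
        }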

Subject: [PATCH v2 1/3] x86/amd_nb: Add support for northbridges on Aldebaran

From: Muralidhara M K <[email protected]>

On newer systems the CPUs manage MCA errors reported from the GPUs.
Enumerate the GPU nodes with the AMD NB framework to support EDAC.

This patch adds necessary code to manage the Aldebaran nodes along with
the CPU nodes.

The GPU nodes are enumerated in sequential order based on the
PCI hierarchy, and the first GPU node is assumed to have an "AMD Node
ID" value of 8 (the second GPU node has 9, etc.). Each Aldebaran GPU
package has 2 Data Fabrics, which are enumerated as 2 nodes.
With this implementation detail, the Data Fabric on the GPU nodes can be
accessed the same way as the Data Fabric on CPU nodes.

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
---
Changes since v1:
1. Modified the commit message and comments in the code
2. Squashed patch 1/7: "x86/amd_nb: Add Aldebaran device to PCI IDs"

arch/x86/include/asm/amd_nb.h | 10 ++++++
arch/x86/kernel/amd_nb.c | 63 ++++++++++++++++++++++++++++++++---
include/linux/pci_ids.h | 1 +
3 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 00d1a400b7a1..f15247422992 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -79,6 +79,16 @@ struct amd_northbridge_info {

#ifdef CONFIG_AMD_NB

+/*
+ * On newer heterogeneous systems the data fabrics of the CPUs and GPUs
+ * are connected directly via custom links, as is done with
+ * 2 socket CPU systems and also within a socket for Multi-chip Module
+ * (MCM) CPUs like Naples.
+ * The first GPU node (non-CPU) is assumed to have an "AMD Node ID" value
+ * of 8 (the second GPU node has 9, etc.).
+ */
+#define NONCPU_NODE_INDEX 8
+
u16 amd_nb_num(void);
bool amd_nb_has_feature(unsigned int feature);
struct amd_northbridge *node_to_amd_nb(int node);
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 5884dfa619ff..5597135a18b5 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -26,6 +26,8 @@
#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
#define PCI_DEVICE_ID_AMD_19H_DF_F4 0x1654
#define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT 0x14bb
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4

/* Protect the PCI config register pairs used for SMN. */
static DEFINE_MUTEX(smn_mutex);
@@ -94,6 +96,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = {
{}
};

+static const struct pci_device_id amd_noncpu_root_ids[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
+ {}
+};
+
+static const struct pci_device_id amd_noncpu_nb_misc_ids[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
+ {}
+};
+
+static const struct pci_device_id amd_noncpu_nb_link_ids[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
+ {}
+};
+
const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
{ 0x00, 0x18, 0x20 },
{ 0xff, 0x00, 0x20 },
@@ -182,11 +199,16 @@ int amd_cache_northbridges(void)
const struct pci_device_id *misc_ids = amd_nb_misc_ids;
const struct pci_device_id *link_ids = amd_nb_link_ids;
const struct pci_device_id *root_ids = amd_root_ids;
+
+ const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids;
+ const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids;
+ const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids;
+
struct pci_dev *root, *misc, *link;
struct amd_northbridge *nb;
u16 roots_per_misc = 0;
- u16 misc_count = 0;
- u16 root_count = 0;
+ u16 misc_count = 0, misc_count_noncpu = 0;
+ u16 root_count = 0, root_count_noncpu = 0;
u16 i, j;

if (amd_northbridges.num)
@@ -205,10 +227,16 @@ int amd_cache_northbridges(void)
if (!misc_count)
return -ENODEV;

+ while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL)
+ misc_count_noncpu++;
+
root = NULL;
while ((root = next_northbridge(root, root_ids)) != NULL)
root_count++;

+ while ((root = next_northbridge(root, noncpu_root_ids)) != NULL)
+ root_count_noncpu++;
+
if (root_count) {
roots_per_misc = root_count / misc_count;

@@ -222,15 +250,28 @@ int amd_cache_northbridges(void)
}
}

- nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
+ if (misc_count_noncpu) {
+ /*
+ * The first non-CPU Node ID starts at 8 even if there are fewer
+ * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb
+ * indexing scheme, allocate the number of GPU nodes plus 8.
+ * Some allocated amd_northbridge structures will go unused when
+ * the number of CPU nodes is less than 8, but this tradeoff is to
+ * keep things relatively simple.
+ */
+ amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu;
+ } else {
+ amd_northbridges.num = misc_count;
+ }
+
+ nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
if (!nb)
return -ENOMEM;

amd_northbridges.nb = nb;
- amd_northbridges.num = misc_count;

link = misc = root = NULL;
- for (i = 0; i < amd_northbridges.num; i++) {
+ for (i = 0; i < misc_count; i++) {
node_to_amd_nb(i)->root = root =
next_northbridge(root, root_ids);
node_to_amd_nb(i)->misc = misc =
@@ -251,6 +292,18 @@ int amd_cache_northbridges(void)
root = next_northbridge(root, root_ids);
}

+ if (misc_count_noncpu) {
+ link = misc = root = NULL;
+ for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) {
+ node_to_amd_nb(i)->root = root =
+ next_northbridge(root, noncpu_root_ids);
+ node_to_amd_nb(i)->misc = misc =
+ next_northbridge(misc, noncpu_misc_ids);
+ node_to_amd_nb(i)->link = link =
+ next_northbridge(link, noncpu_link_ids);
+ }
+ }
+
if (amd_gart_present())
amd_northbridges.flags |= AMD_NB_GART;

diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 4bac1831de80..d9aae90dfce9 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -554,6 +554,7 @@
#define PCI_DEVICE_ID_AMD_17H_M30H_DF_F3 0x1493
#define PCI_DEVICE_ID_AMD_17H_M60H_DF_F3 0x144b
#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F3 0x1443
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3 0x14d3
#define PCI_DEVICE_ID_AMD_19H_DF_F3 0x1653
#define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d
#define PCI_DEVICE_ID_AMD_CNB17H_F3 0x1703
--
2.25.1
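
To make the node indexing concrete, a sketch of the resulting amd_northbridge array layout, assuming a hypothetical system with 2 CPU nodes and 2 Aldebaran packages (2 data fabrics each):

        /*
         * Hypothetical layout: 2 CPU nodes plus 2 Aldebaran packages,
         * i.e. 4 GPU data fabrics:
         *
         *   amd_northbridges.num = NONCPU_NODE_INDEX + 4 = 12
         *
         *   index:  0     1     2..7       8     9     10    11
         *   node:   CPU0  CPU1  (unused)   GPU0  GPU1  GPU2  GPU3
         *
         * node_to_amd_nb(8) then resolves directly to the first GPU data
         * fabric with no index remapping, at the cost of six unused entries.
         */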

Subject: RE: [PATCH 1/7] x86/amd_nb: Add Aldebaran device to PCI IDs

[Public]

Hi Yazen

-----Original Message-----
From: Ghannam, Yazen <[email protected]>
Sent: Tuesday, July 20, 2021 12:59 AM
To: Chatradhi, Naveen Krishna <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [PATCH 1/7] x86/amd_nb: Add Aldebaran device to PCI IDs

On Wed, Jun 30, 2021 at 08:58:22PM +0530, Naveen Krishna Chatradhi wrote:
> From: Muralidhara M K <[email protected]>
>
> Add Aldebaran device to the PCI ID database. Since this device has a
> configurable PCIe endpoint, it could be used with different drivers.
>
> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> ---
> include/linux/pci_ids.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index 4bac1831de80..d9aae90dfce9 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -554,6 +554,7 @@
> #define PCI_DEVICE_ID_AMD_17H_M30H_DF_F3 0x1493
> #define PCI_DEVICE_ID_AMD_17H_M60H_DF_F3 0x144b
> #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F3 0x1443
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3 0x14d3
> #define PCI_DEVICE_ID_AMD_19H_DF_F3 0x1653
> #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d
> #define PCI_DEVICE_ID_AMD_CNB17H_F3 0x1703
> --

The PCI ID looks right.

But I think this patch can be part of the next patch where this value is first used.
[naveenk:] Squashed this change into the 2nd patch and submitted v2 https://patchwork.kernel.org/project/linux-edac/patch/[email protected]/

Thanks,
Yazen

Subject: RE: [PATCH 4/7] EDAC/mce_amd: extract node id from InstanceHi in IPID

[Public]

Hi Yazen

-----Original Message-----
From: Ghannam, Yazen <[email protected]>
Sent: Thursday, July 29, 2021 10:03 PM
To: Chatradhi, Naveen Krishna <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [PATCH 4/7] EDAC/mce_amd: extract node id from InstanceHi in IPID

On Wed, Jun 30, 2021 at 08:58:25PM +0530, Naveen Krishna Chatradhi wrote:
> On AMD systems with SMCA banks on NONCPU nodes, the node id
> information is available in the InstanceHI[47:44] of the IPID register.

This doesn't read well to me. I read this as saying "bits 47:44 of the InstanceHi register". Also, the name of the field is "InstanceIdHi" in the documentation.

I think it'd be more clear to say "available in MCA_IPID[47:44] (InstanceIdHi)" or something similar.
[naveenk:] Modified the commit message

>
> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> ---
> drivers/edac/mce_amd.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> index 27d56920b469..364dfb6e359d 100644
> --- a/drivers/edac/mce_amd.c
> +++ b/drivers/edac/mce_amd.c
> @@ -1049,6 +1049,7 @@ static void decode_smca_error(struct mce *m)
> enum smca_bank_types bank_type;
> const char *ip_name;
> u8 xec = XEC(m->status, xec_mask);
> + u32 node_id = 0;

Why u32? Why not u16 to match topology_die_id() or int to match decode_dram_ecc()?
[naveenk:] Done, used int.

>
> if (m->bank >= ARRAY_SIZE(smca_banks))
> return;
> @@ -1072,8 +1073,18 @@ static void decode_smca_error(struct mce *m)
> if (xec < smca_mce_descs[bank_type].num_descs)
> pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
>
> - if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
> - decode_dram_ecc(topology_die_id(m->extcpu), m);
> + /*
> + * SMCA_UMC_V2 is used on the noncpu nodes, extract the node id
> + * from the InstanceHI[47:44] of the IPID register.
> + */
> + if (bank_type == SMCA_UMC_V2 && xec == 0)
> + node_id = ((m->ipid >> 44) & 0xF);
> +
> + if (bank_type == SMCA_UMC && xec == 0)
> + node_id = topology_die_id(m->extcpu);
> +
> + if (decode_dram_ecc)
> + decode_dram_ecc(node_id, m);

If decode_dram_ecc() is set, then this will call it on every MCA error that comes in. Rather, we only want to call it on DRAM ECC errors.

You could do something like this:

if (decode_dram_ecc && xec == 0) {
u32 node_id = 0;

if (bank_type == SMCA_UMC)
node_id = XXX;
else if (bank_type == SMCA_UMC_V2)
node_id = YYY;
else
return;

decode_dram_ecc(node_id, m);
}

This is just an example. Maybe you can save an indentation level by negating those conditions and returning early, etc.
[naveenk:] modified the ladder.

Thanks,
Yazen
[naveenk:] Thank you.

Subject: RE: [PATCH 2/7] x86/amd_nb: Add support for northbridges on Aldebaran

[Public]

Hi Yazen

Regards,
Naveenk

-----Original Message-----
From: Ghannam, Yazen <[email protected]>
Sent: Tuesday, July 20, 2021 1:56 AM
To: Chatradhi, Naveen Krishna <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [PATCH 2/7] x86/amd_nb: Add support for northbridges on Aldebaran

On Wed, Jun 30, 2021 at 08:58:23PM +0530, Naveen Krishna Chatradhi wrote:
> From: Muralidhara M K <[email protected]>
>
> On newer heterogeneous systems from AMD, there is a possibility of
> having GPU nodes along with CPU nodes with the MCA banks. The GPU
> nodes (noncpu nodes) starts enumerating from northbridge index 8.
>

"there is a possibility of having GPU nodes along with CPU nodes with the MCA banks" doesn't read clearly to me. It could be more explicit.
For example, "On newer systems...the CPUs manages MCA errors reported from the GPUs. Enumerate the GPU nodes with the AMD NB framework to support EDAC, etc." or something like this.

Also, "northbridge index" isn't a hardware thing rather it's an internal Linux value. I think you are referring to the "AMD Node ID" value from CPUID. The GPUs don't have CPUID, so the "AMD Node ID" value can't be directly read like for CPUs. But the current hardware implementation is such that the GPU nodes are enumerated in sequential order based on the PCI hierarchy, and the first GPU node is assumed to have an "AMD Node ID" value of 8 (the second GPU node has 9, etc.). With this implemenation detail, the Data Fabric on the GPU nodes can be accessed the same way as the Data Fabric on CPU nodes.

> Aldebaran GPUs have 2 root ports, with 4 misc port for each root.
>

I don't fully understand this sentence. There are 2 "Nodes"/Data Fabrics per GPU package, but what do "4 misc port for each root" mean? In any case, is this relevant to this patch?

Also, there should be an imperative in the commit message, i.e. "Add ...".
[naveenk:] Modified the commit message

> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> ---
> arch/x86/include/asm/amd_nb.h | 6 ++++
> arch/x86/kernel/amd_nb.c | 62 ++++++++++++++++++++++++++++++++---
> 2 files changed, 63 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
> index 00d1a400b7a1..e71581cf00e3 100644
> --- a/arch/x86/include/asm/amd_nb.h
> +++ b/arch/x86/include/asm/amd_nb.h
> @@ -79,6 +79,12 @@ struct amd_northbridge_info {
>
> #ifdef CONFIG_AMD_NB
>
> +/*
> + * On Newer heterogeneous systems from AMD with CPU and GPU nodes connected
> + * via xGMI links, the NON CPU Nodes are enumerated from index 8
> + */
> +#define NONCPU_NODE_INDEX 8

"Newer" doesn't need to be capatilized. And there should be a period at the end of the sentence.

I don't think "xGMI links" would mean much to most folks. I think the implication here is that the CPUs and GPUs are connected directly together (or rather their Data Fabrics are connected) like is done with
2 socket CPU systems and also within a socket for Multi-chip Module
(MCM) CPUs like Naples.
[naveenk:] Modified the message

> +
> u16 amd_nb_num(void);
> bool amd_nb_has_feature(unsigned int feature);
> struct amd_northbridge *node_to_amd_nb(int node);
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index 5884dfa619ff..489003e850dd 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -26,6 +26,8 @@
> #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
> #define PCI_DEVICE_ID_AMD_19H_DF_F4 0x1654
> #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT 0x14bb
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4
>

These PCI IDs look correct.

> /* Protect the PCI config register pairs used for SMN. */
> static DEFINE_MUTEX(smn_mutex);
> @@ -94,6 +96,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = {
> {}
> };
>
> +static const struct pci_device_id amd_noncpu_root_ids[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
> + {}
> +};
> +
> +static const struct pci_device_id amd_noncpu_nb_misc_ids[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
> + {}
> +};
> +
> +static const struct pci_device_id amd_noncpu_nb_link_ids[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
> + {}
> +};
> +

I think separating the CPU and non-CPU IDs is a good idea.

> const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
> { 0x00, 0x18, 0x20 },
> { 0xff, 0x00, 0x20 },
> @@ -182,11 +199,16 @@ int amd_cache_northbridges(void)
> const struct pci_device_id *misc_ids = amd_nb_misc_ids;
> const struct pci_device_id *link_ids = amd_nb_link_ids;
> const struct pci_device_id *root_ids = amd_root_ids;
> +
> + const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids;
> + const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids;
> + const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids;
> +
> struct pci_dev *root, *misc, *link;
> struct amd_northbridge *nb;
> u16 roots_per_misc = 0;
> - u16 misc_count = 0;
> - u16 root_count = 0;
> + u16 misc_count = 0, misc_count_noncpu = 0;
> + u16 root_count = 0, root_count_noncpu = 0;
> u16 i, j;
>
> if (amd_northbridges.num)
> @@ -205,10 +227,16 @@ int amd_cache_northbridges(void)
> if (!misc_count)
> return -ENODEV;
>
> + while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL)
> + misc_count_noncpu++;
> +
> root = NULL;
> while ((root = next_northbridge(root, root_ids)) != NULL)
> root_count++;
>
> + while ((root = next_northbridge(root, noncpu_root_ids)) != NULL)
> + root_count_noncpu++;
> +
> if (root_count) {
> roots_per_misc = root_count / misc_count;
>
> @@ -222,15 +250,27 @@ int amd_cache_northbridges(void)
> }
> }
>
> - nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
> + /*
> + * The valid amd_northbridges are in between (0 ~ misc_count) and
> + * (NONCPU_NODE_INDEX ~ NONCPU_NODE_INDEX + misc_count_noncpu)
> + */

This comment isn't clear to me. Is it even necessary?
[naveenk:] moved the message

> + if (misc_count_noncpu)
> + /*
> + * There are NONCPU Nodes with pci root ports starting at index 8
> + * allocate few extra cells for simplicity in handling the indexes
> + */

I think this comment can be more explicit. The first non-CPU Node ID starts at 8 even if there are fewer than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb indexing scheme, allocate the number of GPU nodes plus 8. Some allocated amd_northbridge structures will go unused when the number of CPU nodes is less than 8, but this tradeoff is to keep things relatively simple.

> + amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu;
> + else
> + amd_northbridges.num = misc_count;

The if-else statements should have {}s even though there's only a single line of code in each. This is just to make it easier to read multiple lines. Or the second code comment can be merged with the first outside the if-else.
[naveenk:] Done

> +
> + nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
> if (!nb)
> return -ENOMEM;
>
> amd_northbridges.nb = nb;
> - amd_northbridges.num = misc_count;
>
> link = misc = root = NULL;
> - for (i = 0; i < amd_northbridges.num; i++) {
> + for (i = 0; i < misc_count; i++) {
> node_to_amd_nb(i)->root = root =
> next_northbridge(root, root_ids);
> node_to_amd_nb(i)->misc = misc =
> @@ -251,6 +291,18 @@ int amd_cache_northbridges(void)
> root = next_northbridge(root, root_ids);
> }
>
> + link = misc = root = NULL;

This line can go inside the if statement below.
[naveenk:] Done

I'm not sure it's totally necessary since the GPU devices should be listed after the CPU devices. But I guess better safe than sorry in case that implementation detail doesn't hold in the future. If you keep it, then I think you should do the same above when finding the counts.
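Something like this, roughly (a sketch of that suggestion, not the
posted patch -- reset the cursor before each scan so the counts don't
depend on the PCI device ordering):

	misc = NULL;
	while ((misc = next_northbridge(misc, misc_ids)) != NULL)
		misc_count++;

	misc = NULL;
	while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL)
		misc_count_noncpu++;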

> + if (misc_count_noncpu) {
> + for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) {
> + node_to_amd_nb(i)->root = root =
> + next_northbridge(root, noncpu_root_ids);
> + node_to_amd_nb(i)->misc = misc =
> + next_northbridge(misc, noncpu_misc_ids);
> + node_to_amd_nb(i)->link = link =
> + next_northbridge(link, noncpu_link_ids);
> + }
> + }
> +
> if (amd_gart_present())
> amd_northbridges.flags |= AMD_NB_GART;
>
> --

Thanks,
Yazen
[naveenk:] Thank you

Subject: RE: [PATCH 5/7] EDAC/amd64: Enumerate memory on noncpu nodes

Hi Yazen,

-----Original Message-----
From: Ghannam, Yazen <[email protected]>
Sent: Thursday, July 29, 2021 11:26 PM
To: Chatradhi, Naveen Krishna <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [PATCH 5/7] EDAC/amd64: Enumerate memory on noncpu nodes

On Wed, Jun 30, 2021 at 08:58:26PM +0530, Naveen Krishna Chatradhi wrote:
> On newer heterogeneous systems from AMD with GPU nodes connected via
> xGMI links to the CPUs, the GPU dies are interfaced with HBM2 memory.
>
> This patch modifies the amd64_edac module to handle the HBM memory
> enumeration leveraging the existing edac and the amd64 specific data
> structures.
>
> The UMC Phys on GPU nodes are enumerated as csrows The UMC channels
> connected to HBMs are enumerated as ranks
>

Please make sure there is some imperative statement in the commit message. And watch out for grammar, punctuation, etc.

> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> ---
> drivers/edac/amd64_edac.c | 300 +++++++++++++++++++++++++++++---------
> drivers/edac/amd64_edac.h | 27 ++++
> 2 files changed, 259 insertions(+), 68 deletions(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 25c6362e414b..8fe0a5e3c8f2 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -1741,6 +1741,9 @@ static unsigned long determine_edac_cap(struct amd64_pvt *pvt)
>
> if (umc_en_mask == dimm_ecc_en_mask)
> edac_cap = EDAC_FLAG_SECDED;
> +
> + if (pvt->is_noncpu)
> + edac_cap = EDAC_FLAG_EC;

This flag means "Error Checking - no correction". Is that appropriate for these devices?
[naveenk:] Used FLAG_SECDED which seems appropriate for our requirement.

> } else {
> bit = (pvt->fam > 0xf || pvt->ext_model >= K8_REV_F)
> ? 19
> @@ -1799,6 +1802,9 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
> {
> int cs_mode = 0;
>
> + if (pvt->is_noncpu)
> + return CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
> +

Why do this function call if the values are hard-coded? I think you can just set them below.
[naveenk:] The mode is required in 2 places; instead of hardcoding it in both, we set it once in f17_get_cs_mode().

> if (csrow_enabled(2 * dimm, ctrl, pvt))
> cs_mode |= CS_EVEN_PRIMARY;
>
> @@ -1818,6 +1824,15 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
>
> edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
>
> + if (pvt->is_noncpu) {
> + cs_mode = f17_get_cs_mode(cs0, ctrl, pvt);
> + for_each_chip_select(cs0, ctrl, pvt) {
> + size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
> + amd64_info(EDAC_MC ": %d: %5dMB\n", cs0, size0);

So this puts each chip select size on a new line rather than grouping by twos. Is there a logical or physical reason for the difference?
[naveenk:] The existing code prints the mode and size for cs0 and cs1. I did not find a reason to keep cs1 for noncpu nodes, hence the difference.

> + }
> + return;
> + }
> +
> for (dimm = 0; dimm < 2; dimm++) {
> cs0 = dimm * 2;
> cs1 = dimm * 2 + 1;
> @@ -1833,43 +1848,53 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
> }
> }
>
> -static void __dump_misc_regs_df(struct amd64_pvt *pvt)
> +static void dump_umcch_regs(struct amd64_pvt *pvt, int i)
> {
> - struct amd64_umc *umc;
> - u32 i, tmp, umc_base;
> -
> - for_each_umc(i) {
> - umc_base = get_umc_base(i);
> - umc = &pvt->umc[i];
> + struct amd64_umc *umc = &pvt->umc[i];
> + u32 tmp, umc_base;
>
> - edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
> + if (pvt->is_noncpu) {
> edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
> edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
> edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
> + edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i);
> + return;
> + }
>
> - amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ECC_BAD_SYMBOL, &tmp);
> - edac_dbg(1, "UMC%d ECC bad symbol: 0x%x\n", i, tmp);
> -
> - amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_UMC_CAP, &tmp);
> - edac_dbg(1, "UMC%d UMC cap: 0x%x\n", i, tmp);
> - edac_dbg(1, "UMC%d UMC cap high: 0x%x\n", i, umc->umc_cap_hi);
> -
> - edac_dbg(1, "UMC%d ECC capable: %s, ChipKill ECC capable: %s\n",
> - i, (umc->umc_cap_hi & BIT(30)) ? "yes" : "no",
> - (umc->umc_cap_hi & BIT(31)) ? "yes" : "no");
> - edac_dbg(1, "UMC%d All DIMMs support ECC: %s\n",
> - i, (umc->umc_cfg & BIT(12)) ? "yes" : "no");
> - edac_dbg(1, "UMC%d x4 DIMMs present: %s\n",
> - i, (umc->dimm_cfg & BIT(6)) ? "yes" : "no");
> - edac_dbg(1, "UMC%d x16 DIMMs present: %s\n",
> - i, (umc->dimm_cfg & BIT(7)) ? "yes" : "no");
> -
> - if (pvt->dram_type == MEM_LRDDR4) {
> - amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ADDR_CFG, &tmp);
> - edac_dbg(1, "UMC%d LRDIMM %dx rank multiply\n",
> - i, 1 << ((tmp >> 4) & 0x3));
> - }
> + umc_base = get_umc_base(i);
> +
> + edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
> +
> + amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ECC_BAD_SYMBOL, &tmp);
> + edac_dbg(1, "UMC%d ECC bad symbol: 0x%x\n", i, tmp);
> +
> + amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_UMC_CAP, &tmp);
> + edac_dbg(1, "UMC%d UMC cap: 0x%x\n", i, tmp);
> + edac_dbg(1, "UMC%d UMC cap high: 0x%x\n", i, umc->umc_cap_hi);
>
> + edac_dbg(1, "UMC%d ECC capable: %s, ChipKill ECC capable: %s\n",
> + i, (umc->umc_cap_hi & BIT(30)) ? "yes" : "no",
> + (umc->umc_cap_hi & BIT(31)) ? "yes" : "no");
> + edac_dbg(1, "UMC%d All DIMMs support ECC: %s\n",
> + i, (umc->umc_cfg & BIT(12)) ? "yes" : "no");
> + edac_dbg(1, "UMC%d x4 DIMMs present: %s\n",
> + i, (umc->dimm_cfg & BIT(6)) ? "yes" : "no");
> + edac_dbg(1, "UMC%d x16 DIMMs present: %s\n",
> + i, (umc->dimm_cfg & BIT(7)) ? "yes" : "no");
> +
> + if (pvt->dram_type == MEM_LRDDR4) {
> + amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ADDR_CFG, &tmp);
> + edac_dbg(1, "UMC%d LRDIMM %dx rank multiply\n",
> + i, 1 << ((tmp >> 4) & 0x3));
> + }
> +}
> +
> +static void __dump_misc_regs_df(struct amd64_pvt *pvt)
> +{
> + int i;
> +
> + for_each_umc(i) {
> + dump_umcch_regs(pvt, i);
> debug_display_dimm_sizes_df(pvt, i);
> }
>
> @@ -1937,10 +1962,14 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
> pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
> } else if (pvt->fam >= 0x17) {
> int umc;
> -
> for_each_umc(umc) {
> - pvt->csels[umc].b_cnt = 4;
> - pvt->csels[umc].m_cnt = 2;
> + if (pvt->is_noncpu) {
> + pvt->csels[umc].b_cnt = 8;
> + pvt->csels[umc].m_cnt = 8;
> + } else {
> + pvt->csels[umc].b_cnt = 4;
> + pvt->csels[umc].m_cnt = 2;
> + }
> }
>
> } else {
> @@ -1949,6 +1978,31 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
> }
> }
>
> +static void read_noncpu_umc_base_mask(struct amd64_pvt *pvt)
> +{
> + u32 base_reg, mask_reg;
> + u32 *base, *mask;
> + int umc, cs;
> +
> + for_each_umc(umc) {
> + for_each_chip_select(cs, umc, pvt) {
> + base_reg = get_noncpu_umc_base(umc, cs) + UMCCH_BASE_ADDR;
> + base = &pvt->csels[umc].csbases[cs];
> +
> + if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
> + edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
> + umc, cs, *base, base_reg);
> +
> + mask_reg = get_noncpu_umc_base(umc, cs) + UMCCH_ADDR_MASK;
> + mask = &pvt->csels[umc].csmasks[cs];
> +
> + if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
> + edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
> + umc, cs, *mask, mask_reg);
> + }
> + }
> +}
> +
> static void read_umc_base_mask(struct amd64_pvt *pvt)
> {
> u32 umc_base_reg, umc_base_reg_sec;
> @@ -2009,8 +2063,12 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)
>
> prep_chip_selects(pvt);
>
> - if (pvt->umc)
> - return read_umc_base_mask(pvt);
> + if (pvt->umc) {
> + if (pvt->is_noncpu)
> + return read_noncpu_umc_base_mask(pvt);
> + else
> + return read_umc_base_mask(pvt);
> + }
>
> for_each_chip_select(cs, 0, pvt) {
> int reg0 = DCSB0 + (cs * 4);
> @@ -2056,6 +2114,10 @@ static void determine_memory_type(struct amd64_pvt *pvt)
> u32 dram_ctrl, dcsm;
>
> if (pvt->umc) {
> + if (pvt->is_noncpu) {
> + pvt->dram_type = MEM_HBM2;
> + return;
> + }
> if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
> pvt->dram_type = MEM_LRDDR4;
> else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
> @@ -2445,7 +2507,10 @@ static int f17_early_channel_count(struct amd64_pvt *pvt)
>
> /* SDP Control bit 31 (SdpInit) is clear for unused UMC channels */
> for_each_umc(i)
> - channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
> + if (pvt->is_noncpu)
> + channels += pvt->csels[i].b_cnt;
> + else
> + channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
>
> amd64_info("MCT channel count: %d\n", channels);
>
> @@ -2586,6 +2651,12 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> u32 msb, weight, num_zero_bits;
> int dimm, size = 0;
>
> + if (pvt->is_noncpu) {
> + addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
> + /* The memory channels in case of GPUs are fully populated */
> + goto skip_noncpu;
> + }
> +
> /* No Chip Selects are enabled. */
> if (!cs_mode)
> return size;
> @@ -2611,6 +2682,7 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> else
> addr_mask_orig = pvt->csels[umc].csmasks[dimm];
>
> + skip_noncpu:
> /*
> * The number of zero bits in the mask is equal to the number of bits
> * in a full mask minus the number of bits in the current mask.
> @@ -3356,6 +3428,16 @@ static struct amd64_family_type family_types[] = {
> .dbam_to_cs = f17_addr_mask_to_cs_size,
> }
> },
> + [ALDEBARAN_GPUS] = {
> + .ctl_name = "ALDEBARAN",
> + .f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0,
> + .f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6,
> + .max_mcs = 4,
> + .ops = {
> + .early_channel_count = f17_early_channel_count,
> + .dbam_to_cs = f17_addr_mask_to_cs_size,
> + }
> + },
> };
>
> /*
> @@ -3611,6 +3693,19 @@ static int find_umc_channel(struct mce *m)
> return (m->ipid & GENMASK(31, 0)) >> 20;
> }
>
> +/*
> + * The HBM memory managed by the UMCCH of the noncpu node
> + * can be calculated based on the [15:12]bits of IPID as follows
> + */
> +static int find_umc_channel_noncpu(struct mce *m)
> +{
> + u8 umc, ch;
> +
> + umc = find_umc_channel(m);
> + ch = ((m->ipid >> 12) & 0xf);
> + return umc % 2 ? (ch + 4) : ch;
> +}
> +
> static void decode_umc_error(int node_id, struct mce *m)
> {
> u8 ecc_type = (m->status >> 45) & 0x3;
> @@ -3618,6 +3713,7 @@ static void decode_umc_error(int node_id, struct mce *m)
> struct amd64_pvt *pvt;
> struct err_info err;
> u64 sys_addr = m->addr;
> + u8 umc_num;
>
> mci = edac_mc_find(node_id);
> if (!mci)
> @@ -3630,7 +3726,16 @@ static void decode_umc_error(int node_id, struct mce *m)
> if (m->status & MCI_STATUS_DEFERRED)
> ecc_type = 3;
>
> - err.channel = find_umc_channel(m);
> + if (pvt->is_noncpu) {
> + err.csrow = find_umc_channel(m) / 2;
> + /* The UMC channel is reported as the csrow in case of the noncpu nodes */
> + err.channel = find_umc_channel_noncpu(m);
> + umc_num = err.csrow * 8 + err.channel;
> + } else {
> + err.channel = find_umc_channel(m);
> + err.csrow = m->synd & 0x7;
> + umc_num = err.channel;
> + }
>
> if (!(m->status & MCI_STATUS_SYNDV)) {
> err.err_code = ERR_SYND;
> @@ -3646,9 +3751,7 @@ static void decode_umc_error(int node_id, struct mce *m)
> err.err_code = ERR_CHANNEL;
> }
>
> - err.csrow = m->synd & 0x7;
> -
> - if (umc_normaddr_to_sysaddr(&sys_addr, pvt->mc_node_id, err.channel)) {
> + if (umc_normaddr_to_sysaddr(&sys_addr, pvt->mc_node_id, umc_num)) {
> err.err_code = ERR_NORM_ADDR;
> goto log_error;
> }
> @@ -3775,15 +3878,20 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
>
> /* Read registers from each UMC */
> for_each_umc(i) {
> + if (pvt->is_noncpu)
> + umc_base = get_noncpu_umc_base(i, 0);
> + else
> + umc_base = get_umc_base(i);
>
> - umc_base = get_umc_base(i);
> umc = &pvt->umc[i];
> -
> - amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
> amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
> amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
> amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
> - amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
> +
> + if (!pvt->is_noncpu) {
> + amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
> + amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
> + }
> }
> }
>
> @@ -3865,7 +3973,9 @@ static void read_mc_regs(struct amd64_pvt *pvt)
> determine_memory_type(pvt);
> edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
>
> - determine_ecc_sym_sz(pvt);
> + /* ECC symbol size is not available on NONCPU nodes */
> + if (!pvt->is_noncpu)
> + determine_ecc_sym_sz(pvt);
> }
>
> /*
> @@ -3953,15 +4063,21 @@ static int init_csrows_df(struct mem_ctl_info *mci)
> continue;
>
> empty = 0;
> - dimm = mci->csrows[cs]->channels[umc]->dimm;
> + if (pvt->is_noncpu) {
> + dimm = mci->csrows[umc]->channels[cs]->dimm;
> + dimm->edac_mode = EDAC_SECDED;
> + dimm->dtype = DEV_X16;
> + } else {
> + dimm->edac_mode = edac_mode;
> + dimm->dtype = dev_type;
> + dimm = mci->csrows[cs]->channels[umc]->dimm;

This last line should go before the other two.
[naveenk:] done

> + }
>
> edac_dbg(1, "MC node: %d, csrow: %d\n",
> pvt->mc_node_id, cs);
>
> dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
> dimm->mtype = pvt->dram_type;
> - dimm->edac_mode = edac_mode;
> - dimm->dtype = dev_type;
> dimm->grain = 64;
> }
> }
> @@ -4226,7 +4342,9 @@ static bool ecc_enabled(struct amd64_pvt *pvt)
>
> umc_en_mask |= BIT(i);
>
> - if (umc->umc_cap_hi & UMC_ECC_ENABLED)
> + /* ECC is enabled by default on NONCPU nodes */
> + if (pvt->is_noncpu ||
> + (umc->umc_cap_hi & UMC_ECC_ENABLED))
> ecc_en_mask |= BIT(i);
> }
>
> @@ -4262,6 +4380,11 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
> {
> u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
>
> + if (pvt->is_noncpu) {
> + mci->edac_ctl_cap |= EDAC_SECDED;
> + return;
> + }
> +
> for_each_umc(i) {
> if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
> ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
> @@ -4292,7 +4415,11 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
> {
> struct amd64_pvt *pvt = mci->pvt_info;
>
> - mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
> + if (pvt->is_noncpu)
> + mci->mtype_cap = MEM_FLAG_HBM2;
> + else
> + mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
> +
> mci->edac_ctl_cap = EDAC_FLAG_NONE;
>
> if (pvt->umc) {
> @@ -4397,11 +4524,25 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
> fam_type = &family_types[F17_M70H_CPUS];
> pvt->ops = &family_types[F17_M70H_CPUS].ops;
> fam_type->ctl_name = "F19h_M20h";
> - break;
> + } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
> + if (pvt->is_noncpu) {
> + int tmp = 0;
> +
> + fam_type = &family_types[ALDEBARAN_GPUS];
> + pvt->ops = &family_types[ALDEBARAN_GPUS].ops;
> + tmp = pvt->mc_node_id - NONCPU_NODE_INDEX;

This can be set when you declare it.
[naveenk:] done

> + sprintf(pvt->buf, "Aldebaran#%ddie#%d", tmp / 2, tmp % 2);
> + fam_type->ctl_name = pvt->buf;
> + } else {
> + fam_type = &family_types[F19_CPUS];
> + pvt->ops = &family_types[F19_CPUS].ops;
> + fam_type->ctl_name = "F19h_M30h";
> + }
> + } else {
> + fam_type = &family_types[F19_CPUS];
> + pvt->ops = &family_types[F19_CPUS].ops;
> + family_types[F19_CPUS].ctl_name = "F19h";
> }
> - fam_type = &family_types[F19_CPUS];
> - pvt->ops = &family_types[F19_CPUS].ops;
> - family_types[F19_CPUS].ctl_name = "F19h";
> break;
>
> default:
> @@ -4454,6 +4595,30 @@ static void hw_info_put(struct amd64_pvt *pvt)
> kfree(pvt->umc);
> }
>
> +static void populate_layers(struct amd64_pvt *pvt, struct edac_mc_layer *layers)
> +{
> + if (pvt->is_noncpu) {
> + layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
> + layers[0].size = fam_type->max_mcs;
> + layers[0].is_virt_csrow = true;
> + layers[1].type = EDAC_MC_LAYER_CHANNEL;
> + layers[1].size = pvt->csels[0].b_cnt;
> + layers[1].is_virt_csrow = false;

This looks mostly the same as below but the sizes are different. Can't you keep all this together and just adjust the sizes?
[naveenk:] done

> + } else {
> + layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
> + layers[0].size = pvt->csels[0].b_cnt;
> + layers[0].is_virt_csrow = true;
> + layers[1].type = EDAC_MC_LAYER_CHANNEL;
> + /*
> + * Always allocate two channels since we can have setups with
> + * DIMMs on only one channel. Also, this simplifies handling
> + * later for the price of a couple of KBs tops.
> + */
> + layers[1].size = fam_type->max_mcs;
> + layers[1].is_virt_csrow = false;
> + }
> +}
> +
> static int init_one_instance(struct amd64_pvt *pvt)
> {
> struct mem_ctl_info *mci = NULL;
> @@ -4469,19 +4634,8 @@ static int init_one_instance(struct amd64_pvt *pvt)
> if (pvt->channel_count < 0)
> return ret;
>
> - ret = -ENOMEM;
> - layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
> - layers[0].size = pvt->csels[0].b_cnt;
> - layers[0].is_virt_csrow = true;
> - layers[1].type = EDAC_MC_LAYER_CHANNEL;
> -
> - /*
> - * Always allocate two channels since we can have setups with DIMMs on
> - * only one channel. Also, this simplifies handling later for the price
> - * of a couple of KBs tops.
> - */
> - layers[1].size = fam_type->max_mcs;
> - layers[1].is_virt_csrow = false;
> + /* Define layers for CPU and NONCPU nodes */
> + populate_layers(pvt, layers);
>
> mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
> if (!mci)
> @@ -4525,6 +4679,9 @@ static int probe_one_instance(unsigned int nid)
> struct ecc_settings *s;
> int ret;
>
> + if (!F3)
> + return -EINVAL;
> +

Why is this needed?
[naveenk:] make sense, returning 0 instead.

> ret = -ENOMEM;
> s = kzalloc(sizeof(struct ecc_settings), GFP_KERNEL);
> if (!s)
> @@ -4536,6 +4693,9 @@ static int probe_one_instance(unsigned int nid)
> if (!pvt)
> goto err_settings;
>
> + if (nid >= NONCPU_NODE_INDEX)
> + pvt->is_noncpu = true;
> +
> pvt->mc_node_id = nid;
> pvt->F3 = F3;
>
> @@ -4609,6 +4769,10 @@ static void remove_one_instance(unsigned int nid)
> struct mem_ctl_info *mci;
> struct amd64_pvt *pvt;
>
> + /* Nothing to remove for the space holder entries */
> + if (!F3)
> + return;
> +
> /* Remove from EDAC CORE tracking list */
> mci = edac_mc_del_mc(&F3->dev);
> if (!mci)
> @@ -4682,7 +4846,7 @@ static int __init amd64_edac_init(void)
>
> for (i = 0; i < amd_nb_num(); i++) {
> err = probe_one_instance(i);
> - if (err) {
> + if (err && (err != -EINVAL)) {

If the !F3 condition above is "okay", why not just return 0 (success)?
[naveenk:] done

> /* unwind properly */
> while (--i >= 0)
> remove_one_instance(i);
> diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
> index 85aa820bc165..6d5f7b3afc83 100644
> --- a/drivers/edac/amd64_edac.h
> +++ b/drivers/edac/amd64_edac.h
> @@ -126,6 +126,8 @@
> #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
> #define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650
> #define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14D0
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14D6
>

These are correct.

> /*
> * Function 1 - Address Map
> @@ -298,6 +300,7 @@ enum amd_families {
> F17_M60H_CPUS,
> F17_M70H_CPUS,
> F19_CPUS,
> + ALDEBARAN_GPUS,
> NUM_FAMILIES,
> };
>
> @@ -389,6 +392,9 @@ struct amd64_pvt {
> enum mem_type dram_type;
>
> struct amd64_umc *umc; /* UMC registers */
> + char buf[20];
> +
> + u8 is_noncpu;

Can this be a "bool"?
[naveenk:] done

> };
>
> enum err_codes {
> @@ -410,6 +416,27 @@ struct err_info {
> u32 offset;
> };
>
> +static inline u32 get_noncpu_umc_base(u8 umc, u8 channel)
> +{
> + /*
> + * On the NONCPU nodes, base address is calculated based on
> + * UMC channel and the HBM channel.
> + *
> + * UMC channels are selected in 6th nibble
> + * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000;
> + *
> + * HBM channels are selected in 3rd nibble
> + * HBM chX[3:0]= [Y ]5X[3:0]000;
> + * HBM chX[7:4]= [Y+1]5X[3:0]000
> + */
> + umc *= 2;
> +
> + if (channel / 4)

Can this be "if (channel >= 4)"?
[naveenk:] done

> + umc++;
> +
> + return 0x50000 + (umc << 20) + ((channel % 4) << 12);
> +}
> +
> static inline u32 get_umc_base(u8 channel)
> {
> /* chY: 0xY50000 */
> --

There are a lot of changes in this patch. I think you should give the highlights in the commit message. For example, you may want to say if you introduced new functions, changed code flow, etc., and why this is needed compared to existing systems. I think the code comments have some details, but a summary in the commit message may help.
[naveenk:] Updated the commit message to reflect the changes.

Thanks,
Yazen
[naveenk:] Thank you

2021-08-10 16:35:12

by Borislav Petkov

Subject: Re: [PATCH 1/7] x86/amd_nb: Add Aldebaran device to PCI IDs

On Tue, Aug 10, 2021 at 12:45:17PM +0000, Chatradhi, Naveen Krishna wrote:
> But I think this patch can be part of the next patch
> where this value is first used.
> [naveenk:] Squashed this change into the 2nd patch and submitted v2
> https://patchwork.kernel.org/project/linux-edac/patch/20210806074350.1
> [email protected]/

Btw, I'd suggest you find someone at AMD to teach you to use a proper
mail client for replying to lkml messages which does proper quoting,
etc. Outlook and windoze simply isn't cut out for this type of
communication but rather for managerial blabla.

Alternatively, you can read this here:

https://www.kernel.org/doc/html/latest/process/email-clients.html

and try to set up something yourself.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-08-20 15:39:41

by Yazen Ghannam

Subject: Re: [PATCH v2 1/3] x86/amd_nb: Add support for northbridges on Aldebaran

On Fri, Aug 06, 2021 at 01:13:48PM +0530, Naveen Krishna Chatradhi wrote:
> From: Muralidhara M K <[email protected]>
>
> On newer systems the CPUs manage MCA errors reported from the GPUs.
> Enumerate the GPU nodes with the AMD NB framework to support EDAC.
>
> This patch adds necessary code to manage the Aldebaran nodes along with
> the CPU nodes.
>
> The GPU nodes are enumerated in sequential order based on the
> PCI hierarchy, and the first GPU node is assumed to have an "AMD Node
> ID" value of 8 (the second GPU node has 9, etc.). Each Aldebaran GPU
> package has 2 Data Fabrics, which are enumerated as 2 nodes.
> With this implementation detail, the Data Fabric on the GPU nodes can be
> accessed the same way as the Data Fabric on CPU nodes.
>
> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> ---
> Changes since v1:
> 1. Modified the commit message and comments in the code
> 2. Squashed patch 1/7: "x86/amd_nb: Add Aldebaran device to PCI IDs"

It's nice to have a link or links to previous patches here.

For example,
https://lkml.kernel.org/r/<Message-ID>

>
> arch/x86/include/asm/amd_nb.h | 10 ++++++
> arch/x86/kernel/amd_nb.c | 63 ++++++++++++++++++++++++++++++++---
> include/linux/pci_ids.h | 1 +
> 3 files changed, 69 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
> index 00d1a400b7a1..f15247422992 100644
> --- a/arch/x86/include/asm/amd_nb.h
> +++ b/arch/x86/include/asm/amd_nb.h
> @@ -79,6 +79,16 @@ struct amd_northbridge_info {
>
> #ifdef CONFIG_AMD_NB
>
> +/*
> + * On newer heterogeneous systems the data fabrics of the CPUs and GPUs
> + * are connected directly via custom links, as is done with
> + * 2 socket CPU systems and also within a socket for Multi-chip Module
> + * (MCM) CPUs like Naples.
> + * The first GPU node (non-CPU) is assumed to have an "AMD Node ID" value
> + * of 8 (the second GPU node has 9, etc.).
> + */
> +#define NONCPU_NODE_INDEX 8
> +
> u16 amd_nb_num(void);
> bool amd_nb_has_feature(unsigned int feature);
> struct amd_northbridge *node_to_amd_nb(int node);
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index 5884dfa619ff..5597135a18b5 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -26,6 +26,8 @@
> #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
> #define PCI_DEVICE_ID_AMD_19H_DF_F4 0x1654
> #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT 0x14bb
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4
>
> /* Protect the PCI config register pairs used for SMN. */
> static DEFINE_MUTEX(smn_mutex);
> @@ -94,6 +96,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = {
> {}
> };
>
> +static const struct pci_device_id amd_noncpu_root_ids[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
> + {}
> +};
> +
> +static const struct pci_device_id amd_noncpu_nb_misc_ids[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
> + {}
> +};
> +
> +static const struct pci_device_id amd_noncpu_nb_link_ids[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
> + {}
> +};
> +
> const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
> { 0x00, 0x18, 0x20 },
> { 0xff, 0x00, 0x20 },
> @@ -182,11 +199,16 @@ int amd_cache_northbridges(void)
> const struct pci_device_id *misc_ids = amd_nb_misc_ids;
> const struct pci_device_id *link_ids = amd_nb_link_ids;
> const struct pci_device_id *root_ids = amd_root_ids;
> +
> + const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids;
> + const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids;
> + const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids;
> +
> struct pci_dev *root, *misc, *link;
> struct amd_northbridge *nb;
> u16 roots_per_misc = 0;
> - u16 misc_count = 0;
> - u16 root_count = 0;
> + u16 misc_count = 0, misc_count_noncpu = 0;
> + u16 root_count = 0, root_count_noncpu = 0;
> u16 i, j;
>
> if (amd_northbridges.num)
> @@ -205,10 +227,16 @@ int amd_cache_northbridges(void)
> if (!misc_count)
> return -ENODEV;
>
> + while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL)
> + misc_count_noncpu++;
> +
> root = NULL;
> while ((root = next_northbridge(root, root_ids)) != NULL)
> root_count++;
>
> + while ((root = next_northbridge(root, noncpu_root_ids)) != NULL)
> + root_count_noncpu++;
> +
> if (root_count) {
> roots_per_misc = root_count / misc_count;
>
> @@ -222,15 +250,28 @@ int amd_cache_northbridges(void)
> }
> }
>
> - nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
> + if (misc_count_noncpu) {
> + /*
> + * The first non-CPU Node ID starts at 8 even if there are fewer
> + * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb
> + * indexing scheme, allocate the number of GPU nodes plus 8.
> + * Some allocated amd_northbridge structures will go unused when
> + * the number of CPU nodes is less than 8, but this tradeoff is to
> + * keep things relatively simple.
> + */
> + amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu;
> + } else {
> + amd_northbridges.num = misc_count;
> + }
> +
> + nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
> if (!nb)
> return -ENOMEM;
>
> amd_northbridges.nb = nb;
> - amd_northbridges.num = misc_count;
>
> link = misc = root = NULL;
> - for (i = 0; i < amd_northbridges.num; i++) {
> + for (i = 0; i < misc_count; i++) {
> node_to_amd_nb(i)->root = root =
> next_northbridge(root, root_ids);
> node_to_amd_nb(i)->misc = misc =
> @@ -251,6 +292,18 @@ int amd_cache_northbridges(void)
> root = next_northbridge(root, root_ids);
> }
>
> + if (misc_count_noncpu) {
> + link = misc = root = NULL;
> + for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) {
> + node_to_amd_nb(i)->root = root =
> + next_northbridge(root, noncpu_root_ids);
> + node_to_amd_nb(i)->misc = misc =
> + next_northbridge(misc, noncpu_misc_ids);
> + node_to_amd_nb(i)->link = link =
> + next_northbridge(link, noncpu_link_ids);
> + }
> + }
> +
> if (amd_gart_present())
> amd_northbridges.flags |= AMD_NB_GART;
>
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index 4bac1831de80..d9aae90dfce9 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -554,6 +554,7 @@
> #define PCI_DEVICE_ID_AMD_17H_M30H_DF_F3 0x1493
> #define PCI_DEVICE_ID_AMD_17H_M60H_DF_F3 0x144b
> #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F3 0x1443
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3 0x14d3
> #define PCI_DEVICE_ID_AMD_19H_DF_F3 0x1653
> #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d
> #define PCI_DEVICE_ID_AMD_CNB17H_F3 0x1703
> --

Reviewed-by: Yazen Ghannam <[email protected]>

Thanks,
Yazen

2021-08-20 15:48:25

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH v2 2/3] EDAC/mce_amd: Extract node id from InstanceHi in IPID

On Fri, Aug 06, 2021 at 01:13:49PM +0530, Naveen Krishna Chatradhi wrote:
> On AMD systems with SMCA banks on NONCPU nodes, the node id
> information is available in MCA_IPID[47:44](InstanceIdHi).
>

The bitfield name in the $SUBJECT is wrong.

Also, the commit message implies that this behavior applies to all MCA
banks on systems with NONCPU nodes. But rather it only applies to the
banks on the NONCPU nodes.
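
For reference, a minimal sketch of the extraction being described
(field position as stated in the commit message; the helper name is
made up here):

	static u8 smca_noncpu_node_id(struct mce *m)
	{
		/* InstanceIdHi: MCA_IPID[47:44], banks on non-CPU nodes only */
		return (m->ipid >> 44) & 0xf;
	}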

Thanks,
Yazen

2021-08-20 17:05:05

by Yazen Ghannam

Subject: Re: [PATCH v2 3/3] EDAC/amd64: Enumerate memory on noncpu nodes

On Fri, Aug 06, 2021 at 01:13:50PM +0530, Naveen Krishna Chatradhi wrote:
> On newer heterogeneous systems from AMD with GPU nodes interfaced
> with HBM2 memory are connected to the CPUs via custom links.
>

This sentence is not clear to me.

> This patch modifies the amd64_edac module to handle the HBM memory
> enumeration leveraging the existing edac and the amd64 specific data
> structures.
>
> This patch does the following for non-cpu nodes:
> 1. Define PCI IDs and ops for Aldeberarn GPUs in family_types array.
> 2. The UMC Phys on GPU nodes are enumerated as csrows and the UMC channels
> connected to HBMs are enumerated as ranks.
> 3. Define a function to find the UMCv2 channel number
> 4. Define a function to calculate base address of the UMCv2 registers
> 5. Add debug information for UMCv2 channel registers.
>

I don't think you need to say "This patch does..." and give a list. Just
write each point as a line in the commit message.

> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> ---
> Changes since v1:
> 1. Modifed the commit message
> 2. Change the edac_cap
> 3. kept sizes of both cpu and noncpu together
> 4. return success if the !F3 condition true and remove unnecessary validation
> 5. declared is_noncpu as bool
> 6. modified the condition from channel/4 to channel>=4
> 7. Rearranged debug information for noncpu umcch registers
>
> drivers/edac/amd64_edac.c | 202 +++++++++++++++++++++++++++++++++-----
> drivers/edac/amd64_edac.h | 27 +++++
> 2 files changed, 202 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index b03c33240238..2dd77a828394 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -1979,6 +1979,9 @@ static unsigned long determine_edac_cap(struct amd64_pvt *pvt)
>
> if (umc_en_mask == dimm_ecc_en_mask)
> edac_cap = EDAC_FLAG_SECDED;
> +
> + if (pvt->is_noncpu)
> + edac_cap = EDAC_FLAG_SECDED;
> } else {
> bit = (pvt->fam > 0xf || pvt->ext_model >= K8_REV_F)
> ? 19
> @@ -2037,6 +2040,9 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
> {
> int cs_mode = 0;
>
> + if (pvt->is_noncpu)
> + return CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
> +
> if (csrow_enabled(2 * dimm, ctrl, pvt))
> cs_mode |= CS_EVEN_PRIMARY;
>
> @@ -2056,6 +2062,15 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
>
> edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
>
> + if (pvt->is_noncpu) {
> + cs_mode = f17_get_cs_mode(cs0, ctrl, pvt);
> + for_each_chip_select(cs0, ctrl, pvt) {
> + size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
> + amd64_info(EDAC_MC ": %d: %5dMB\n", cs0, size0);
> + }
> + return;
> + }
> +
> for (dimm = 0; dimm < 2; dimm++) {
> cs0 = dimm * 2;
> cs1 = dimm * 2 + 1;
> @@ -2080,10 +2095,15 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
> umc_base = get_umc_base(i);
> umc = &pvt->umc[i];
>
> - edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
> + if (!pvt->is_noncpu)
> + edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
> edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
> edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
> edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
> + if (pvt->is_noncpu) {
> + edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i);
> + goto dimm_size;
> + }
>
> amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ECC_BAD_SYMBOL, &tmp);
> edac_dbg(1, "UMC%d ECC bad symbol: 0x%x\n", i, tmp);
> @@ -2108,6 +2128,7 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
> i, 1 << ((tmp >> 4) & 0x3));
> }
>
> + dimm_size:
> debug_display_dimm_sizes_df(pvt, i);
> }
>
> @@ -2175,10 +2196,14 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
> pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
> } else if (pvt->fam >= 0x17) {
> int umc;
> -

This looks like a stray change. Was it intentional?

> for_each_umc(umc) {
> - pvt->csels[umc].b_cnt = 4;
> - pvt->csels[umc].m_cnt = 2;
> + if (pvt->is_noncpu) {
> + pvt->csels[umc].b_cnt = 8;
> + pvt->csels[umc].m_cnt = 8;
> + } else {
> + pvt->csels[umc].b_cnt = 4;
> + pvt->csels[umc].m_cnt = 2;
> + }
> }
>
> } else {
> @@ -2187,6 +2212,31 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
> }
> }
>
> +static void read_noncpu_umc_base_mask(struct amd64_pvt *pvt)
> +{
> + u32 base_reg, mask_reg;
> + u32 *base, *mask;
> + int umc, cs;
> +
> + for_each_umc(umc) {
> + for_each_chip_select(cs, umc, pvt) {
> + base_reg = get_noncpu_umc_base(umc, cs) + UMCCH_BASE_ADDR;
> + base = &pvt->csels[umc].csbases[cs];
> +
> + if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
> + edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
> + umc, cs, *base, base_reg);
> +

There should be {} here since the code spans multiple lines.

> + mask_reg = get_noncpu_umc_base(umc, cs) + UMCCH_ADDR_MASK;
> + mask = &pvt->csels[umc].csmasks[cs];
> +
> + if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
> + edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
> + umc, cs, *mask, mask_reg);

Same as above.

> + }
> + }
> +}
> +
> static void read_umc_base_mask(struct amd64_pvt *pvt)
> {
> u32 umc_base_reg, umc_base_reg_sec;
> @@ -2247,8 +2297,12 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)
>
> prep_chip_selects(pvt);
>
> - if (pvt->umc)
> - return read_umc_base_mask(pvt);
> + if (pvt->umc) {
> + if (pvt->is_noncpu)
> + return read_noncpu_umc_base_mask(pvt);
> + else
> + return read_umc_base_mask(pvt);
> + }
>
> for_each_chip_select(cs, 0, pvt) {
> int reg0 = DCSB0 + (cs * 4);
> @@ -2294,6 +2348,10 @@ static void determine_memory_type(struct amd64_pvt *pvt)
> u32 dram_ctrl, dcsm;
>
> if (pvt->umc) {
> + if (pvt->is_noncpu) {
> + pvt->dram_type = MEM_HBM2;
> + return;
> + }

Needs a newline here.

> if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
> pvt->dram_type = MEM_LRDDR4;
> else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
> @@ -2683,7 +2741,10 @@ static int f17_early_channel_count(struct amd64_pvt *pvt)
>
> /* SDP Control bit 31 (SdpInit) is clear for unused UMC channels */
> for_each_umc(i)
> - channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
> + if (pvt->is_noncpu)
> + channels += pvt->csels[i].b_cnt;
> + else
> + channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
>
> amd64_info("MCT channel count: %d\n", channels);
>
> @@ -2824,6 +2885,12 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> u32 msb, weight, num_zero_bits;
> int dimm, size = 0;
>
> + if (pvt->is_noncpu) {
> + addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
> + /* The memory channels in case of GPUs are fully populated */
> + goto skip_noncpu;
> + }
> +
> /* No Chip Selects are enabled. */
> if (!cs_mode)
> return size;
> @@ -2849,6 +2916,7 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> else
> addr_mask_orig = pvt->csels[umc].csmasks[dimm];
>
> + skip_noncpu:
> /*
> * The number of zero bits in the mask is equal to the number of bits
> * in a full mask minus the number of bits in the current mask.
> @@ -3594,6 +3662,16 @@ static struct amd64_family_type family_types[] = {
> .dbam_to_cs = f17_addr_mask_to_cs_size,
> }
> },
> + [ALDEBARAN_GPUS] = {
> + .ctl_name = "ALDEBARAN",
> + .f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0,
> + .f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6,
> + .max_mcs = 4,
> + .ops = {
> + .early_channel_count = f17_early_channel_count,
> + .dbam_to_cs = f17_addr_mask_to_cs_size,
> + }
> + },
> };
>
> /*
> @@ -3849,6 +3927,19 @@ static int find_umc_channel(struct mce *m)
> return (m->ipid & GENMASK(31, 0)) >> 20;
> }
>
> +/*
> + * The HBM memory managed by the UMCCH of the noncpu node
> + * can be calculated based on the [15:12]bits of IPID as follows

This comment doesn't make sense to me.

Maybe it'll help to give some more context. The CPUs have one channel
per UMC, so a UMC number is equivalent to a channel number. The GPUs
have 8 channels per UMC, so the UMC number no longer works as a channel
number. The channel number within a GPU UMC is given in MCA_IPID[15:12].
However, the IDs are split such that two UMC values go to one UMC, and
the channel numbers are split in two groups of four.

For example,
UMC0 CH[3:0] = 0x0005[3:0]000
UMC0 CH[7:4] = 0x0015[3:0]000
UMC1 CH[3:0] = 0x0025[3:0]000
UMC1 CH[7:4] = 0x0035[3:0]000
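
To make that concrete, here is a sketch of the decode (it mirrors
find_umc_channel() for the raw UMC field; the helper name is made up):

	static int gpu_channel_from_ipid(u64 ipid)
	{
		int umc = (ipid >> 20) & 0xfff;	/* raw UMC field, as in find_umc_channel() */
		int ch  = (ipid >> 12) & 0xf;	/* channel within the UMC, MCA_IPID[15:12] */

		return (umc % 2) ? ch + 4 : ch;
	}

For MCA_IPID = 0x00153000: raw UMC = 1 (odd), ch = 3, so this is
physical UMC0 (1 / 2) and channel 3 + 4 = 7 -- the "UMC0 CH[7:4]"
case above.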

> + */
> +static int find_umc_channel_noncpu(struct mce *m)
> +{
> + u8 umc, ch;
> +
> + umc = find_umc_channel(m);
> + ch = ((m->ipid >> 12) & 0xf);

Each of these can be on a single line when declared above.

Also, please leave a newline before the return.

> + return umc % 2 ? (ch + 4) : ch;
> +}
> +
> static void decode_umc_error(int node_id, struct mce *m)
> {
> u8 ecc_type = (m->status >> 45) & 0x3;
> @@ -3856,6 +3947,7 @@ static void decode_umc_error(int node_id, struct mce *m)
> struct amd64_pvt *pvt;
> struct err_info err;
> u64 sys_addr = m->addr;
> + u8 umc_num;
>
> mci = edac_mc_find(node_id);
> if (!mci)
> @@ -3868,7 +3960,17 @@ static void decode_umc_error(int node_id, struct mce *m)
> if (m->status & MCI_STATUS_DEFERRED)
> ecc_type = 3;
>
> - err.channel = find_umc_channel(m);
> + if (pvt->is_noncpu) {
> + /* The UMCPHY is reported as csrow in case of noncpu nodes */
> + err.csrow = find_umc_channel(m) / 2;
> + /* UMCCH is managing the HBM memory */
> + err.channel = find_umc_channel_noncpu(m);

I don't think "UMCPHY" or "UMCCH" are clear here. Like above, more context
should help. The GPUs have one Chip Select per UMC, so the UMC number can
be used as the Chip Select number. However, the UMC number is split in
the ID value as mentioned above, so that's why it's necessary to divide
by 2.

> + umc_num = err.csrow * 8 + err.channel;

Now here "umc_num" is getting overloaded. The value in this line is not
"N" in UMCN, e.g. UMC0, UMC1, etc. It's an artificial value we construct
to have a global (within a GPU node) ID for each GPU memory channel.

This is used as the "DF Instance ID" input to the address translation
code. On CPUs this is "Channel"="UMC Number"="DF Instance ID". On GPUs,
this value is calculated according to the line above.

So I think "umc_num" should be renamed to something like "df_inst_id" to
be more explicit.
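
In other words, something like this (a sketch of the hunk above with
the suggested rename applied; "df_inst_id" is the proposed name, not
existing code):

	if (pvt->is_noncpu) {
		err.csrow = find_umc_channel(m) / 2;
		err.channel = find_umc_channel_noncpu(m);
		df_inst_id = err.csrow * 8 + err.channel;
	} else {
		err.channel = find_umc_channel(m);
		err.csrow = m->synd & 0x7;
		df_inst_id = err.channel;
	}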

> + } else {
> + err.channel = find_umc_channel(m);
> + err.csrow = m->synd & 0x7;
> + umc_num = err.channel;
> + }
>
> if (!(m->status & MCI_STATUS_SYNDV)) {
> err.err_code = ERR_SYND;
> @@ -3884,9 +3986,7 @@ static void decode_umc_error(int node_id, struct mce *m)
> err.err_code = ERR_CHANNEL;
> }
>
> - err.csrow = m->synd & 0x7;
> -
> - if (umc_normaddr_to_sysaddr(&sys_addr, pvt->mc_node_id, err.channel)) {
> + if (umc_normaddr_to_sysaddr(&sys_addr, pvt->mc_node_id, umc_num)) {
> err.err_code = ERR_NORM_ADDR;
> goto log_error;
> }
> @@ -4013,15 +4113,20 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
>
> /* Read registers from each UMC */
> for_each_umc(i) {
> + if (pvt->is_noncpu)
> + umc_base = get_noncpu_umc_base(i, 0);
> + else
> + umc_base = get_umc_base(i);
>
> - umc_base = get_umc_base(i);
> umc = &pvt->umc[i];
> -

Another spurious line deletion?

> - amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
> amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
> amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
> amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
> - amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
> +
> + if (!pvt->is_noncpu) {
> + amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
> + amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
> + }
> }
> }
>
> @@ -4103,7 +4208,9 @@ static void read_mc_regs(struct amd64_pvt *pvt)
> determine_memory_type(pvt);
> edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
>
> - determine_ecc_sym_sz(pvt);
> + /* ECC symbol size is not available on NONCPU nodes */
> + if (!pvt->is_noncpu)
> + determine_ecc_sym_sz(pvt);
> }
>
> /*
> @@ -4191,15 +4298,21 @@ static int init_csrows_df(struct mem_ctl_info *mci)
> continue;
>
> empty = 0;
> - dimm = mci->csrows[cs]->channels[umc]->dimm;
> + if (pvt->is_noncpu) {
> + dimm = mci->csrows[umc]->channels[cs]->dimm;
> + dimm->edac_mode = EDAC_SECDED;
> + dimm->dtype = DEV_X16;
> + } else {
> + dimm = mci->csrows[cs]->channels[umc]->dimm;
> + dimm->edac_mode = edac_mode;
> + dimm->dtype = dev_type;
> + }
>
> edac_dbg(1, "MC node: %d, csrow: %d\n",
> pvt->mc_node_id, cs);
>
> dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
> dimm->mtype = pvt->dram_type;
> - dimm->edac_mode = edac_mode;
> - dimm->dtype = dev_type;
> dimm->grain = 64;
> }
> }
> @@ -4464,7 +4577,9 @@ static bool ecc_enabled(struct amd64_pvt *pvt)
>
> umc_en_mask |= BIT(i);
>
> - if (umc->umc_cap_hi & UMC_ECC_ENABLED)
> + /* ECC is enabled by default on NONCPU nodes */
> + if (pvt->is_noncpu ||

Can you skip all the bitmask stuff and just say "ecc_en=true" if on a
non-CPU node?
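
For example (sketch only, assuming an early return in ecc_enabled()):

	/* ECC is enabled by default on the non-CPU nodes. */
	if (pvt->is_noncpu)
		return true;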

> + (umc->umc_cap_hi & UMC_ECC_ENABLED))
> ecc_en_mask |= BIT(i);
> }
>
> @@ -4500,6 +4615,11 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
> {
> u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
>
> + if (pvt->is_noncpu) {
> + mci->edac_ctl_cap |= EDAC_SECDED;
> + return;
> + }
> +
> for_each_umc(i) {
> if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
> ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
> @@ -4530,7 +4650,11 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
> {
> struct amd64_pvt *pvt = mci->pvt_info;
>
> - mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
> + if (pvt->is_noncpu)
> + mci->mtype_cap = MEM_FLAG_HBM2;
> + else
> + mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
> +
> mci->edac_ctl_cap = EDAC_FLAG_NONE;
>
> if (pvt->umc) {
> @@ -4635,11 +4759,24 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
> fam_type = &family_types[F17_M70H_CPUS];
> pvt->ops = &family_types[F17_M70H_CPUS].ops;
> fam_type->ctl_name = "F19h_M20h";
> - break;
> + } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
> + if (pvt->is_noncpu) {
> + int tmp = pvt->mc_node_id - NONCPU_NODE_INDEX;
> +
> + fam_type = &family_types[ALDEBARAN_GPUS];
> + pvt->ops = &family_types[ALDEBARAN_GPUS].ops;
> + sprintf(pvt->buf, "Aldebaran#%ddie#%d", tmp / 2, tmp % 2);
> + fam_type->ctl_name = pvt->buf;

I like the idea of giving unique names for each "MC". Maybe this can be
used on the CPU nodes too? I'll check this out. Thanks for the idea.

> + } else {
> + fam_type = &family_types[F19_CPUS];
> + pvt->ops = &family_types[F19_CPUS].ops;
> + fam_type->ctl_name = "F19h_M30h";
> + }
> + } else {
> + fam_type = &family_types[F19_CPUS];
> + pvt->ops = &family_types[F19_CPUS].ops;
> + family_types[F19_CPUS].ctl_name = "F19h";
> }
> - fam_type = &family_types[F19_CPUS];
> - pvt->ops = &family_types[F19_CPUS].ops;
> - family_types[F19_CPUS].ctl_name = "F19h";
> break;
>
> default:
> @@ -4707,9 +4844,10 @@ static int init_one_instance(struct amd64_pvt *pvt)
> if (pvt->channel_count < 0)
> return ret;
>
> + /* Define layers for CPU and NONCPU nodes */
> ret = -ENOMEM;
> layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
> - layers[0].size = pvt->csels[0].b_cnt;
> + layers[0].size = pvt->is_noncpu ? fam_type->max_mcs : pvt->csels[0].b_cnt;
> layers[0].is_virt_csrow = true;
> layers[1].type = EDAC_MC_LAYER_CHANNEL;
>
> @@ -4718,7 +4856,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
> * only one channel. Also, this simplifies handling later for the price
> * of a couple of KBs tops.
> */
> - layers[1].size = fam_type->max_mcs;
> + layers[1].size = pvt->is_noncpu ? pvt->csels[0].b_cnt : fam_type->max_mcs;
> layers[1].is_virt_csrow = false;
>
> mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
> @@ -4763,6 +4901,9 @@ static int probe_one_instance(unsigned int nid)
> struct ecc_settings *s;
> int ret;
>
> + if (!F3)
> + return 0;
> +
> ret = -ENOMEM;
> s = kzalloc(sizeof(struct ecc_settings), GFP_KERNEL);
> if (!s)
> @@ -4774,6 +4915,9 @@ static int probe_one_instance(unsigned int nid)
> if (!pvt)
> goto err_settings;
>
> + if (nid >= NONCPU_NODE_INDEX)
> + pvt->is_noncpu = true;
> +
> pvt->mc_node_id = nid;
> pvt->F3 = F3;
>
> @@ -4847,6 +4991,10 @@ static void remove_one_instance(unsigned int nid)
> struct mem_ctl_info *mci;
> struct amd64_pvt *pvt;
>
> + /* Nothing to remove for the space holder entries */
> + if (!F3)
> + return;
> +
> /* Remove from EDAC CORE tracking list */
> mci = edac_mc_del_mc(&F3->dev);
> if (!mci)
> diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
> index 85aa820bc165..c5532a6f0c34 100644
> --- a/drivers/edac/amd64_edac.h
> +++ b/drivers/edac/amd64_edac.h
> @@ -126,6 +126,8 @@
> #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
> #define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650
> #define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14D0
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14D6
>
> /*
> * Function 1 - Address Map
> @@ -298,6 +300,7 @@ enum amd_families {
> F17_M60H_CPUS,
> F17_M70H_CPUS,
> F19_CPUS,
> + ALDEBARAN_GPUS,
> NUM_FAMILIES,
> };
>
> @@ -389,6 +392,9 @@ struct amd64_pvt {
> enum mem_type dram_type;
>
> struct amd64_umc *umc; /* UMC registers */
> + char buf[20];
> +
> + bool is_noncpu;
> };
>
> enum err_codes {
> @@ -410,6 +416,27 @@ struct err_info {
> u32 offset;
> };
>
> +static inline u32 get_noncpu_umc_base(u8 umc, u8 channel)
> +{
> + /*
> + * On the NONCPU nodes, base address is calculated based on
> + * UMC channel and the HBM channel.
> + *
> + * UMC channels are selected in 6th nibble
> + * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000;
> + *
> + * HBM channels are selected in 3rd nibble
> + * HBM chX[3:0]= [Y ]5X[3:0]000;
> + * HBM chX[7:4]= [Y+1]5X[3:0]000

The wording in this comment is inaccurate. There is no "UMC channel" and
"HBM channel". There is only "channel". On CPUs, there is one channel
per UMC, so UMC numbering equals channel numbering. On GPUs, there are
eight channels per UMC, so the channel numbering is different from UMC
numbering.

The notes are good overall, so I think it'll help to reference this
comment in amd64_edac.c where appropriate.
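
As a concrete check of the formula (sketch):

	/* UMC1, channel 5: umc = 1 * 2 = 2; channel >= 4 -> umc = 3 */
	get_noncpu_umc_base(1, 5) == 0x50000 + (3 << 20) + (1 << 12)
				  == 0x351000;	/* the "[Y+1]5X[3:0]000" case */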

> + */
> + umc *= 2;
> +
> + if (channel >= 4)
> + umc++;
> +
> + return 0x50000 + (umc << 20) + ((channel % 4) << 12);
> +}
> +
> static inline u32 get_umc_base(u8 channel)
> {
> /* chY: 0xY50000 */
> --

Thanks,
Yazen

2021-08-20 17:08:37

by Yazen Ghannam

Subject: Re: [PATCH v2 0/3] x86/edac/amd64: Add support for noncpu nodes

On Fri, Aug 06, 2021 at 01:13:47PM +0530, Naveen Krishna Chatradhi wrote:
> From: Muralidhara M K <[email protected]>
>
> On newer heterogeneous systems from AMD with GPU nodes connected via
> xGMI links to the CPUs, the GPU dies are interfaced with HBM2 memory.
>
> This patchset applies on top of the following series by Yazen Ghannam
> AMD MCA Address Translation Updates
> [https://patchwork.kernel.org/project/linux-edac/list/?series=505989]
>

Hi Naveen,

As I was reworking the set referenced above, I got into a circular
dependency with your set here. Can you please rebase your set on the
latest upstream code? I can then base the next version of my set on
yours. I think the only change you may need to make is around the
address translation hunk in amd64_edac.c in Patch 3.

Also, can you please CC me on the next revision of your set?

Thanks,
Yazen

Subject: RE: [PATCH v2 3/3] EDAC/amd64: Enumerate memory on noncpu nodes

Hi Yazen,

Will address your comments and send the next version. Thank you.

Regards,
Naveenk

-----Original Message-----
From: Ghannam, Yazen <[email protected]>
Sent: Friday, August 20, 2021 10:32 PM
To: Chatradhi, Naveen Krishna <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; [email protected]; M K, Muralidhara <[email protected]>
Subject: Re: [PATCH v2 3/3] EDAC/amd64: Enumerate memory on noncpu nodes

On Fri, Aug 06, 2021 at 01:13:50PM +0530, Naveen Krishna Chatradhi wrote:
> On newer heterogeneous systems from AMD with GPU nodes interfaced with
> HBM2 memory are connected to the CPUs via custom links.
>

This sentence is not clear to me.

> This patch modifies the amd64_edac module to handle the HBM memory
> enumeration leveraging the existing edac and the amd64 specific data
> structures.
>
> This patch does the following for non-cpu nodes:
> 1. Define PCI IDs and ops for Aldeberarn GPUs in family_types array.
> 2. The UMC Phys on GPU nodes are enumerated as csrows and the UMC channels
> connected to HBMs are enumerated as ranks.
> 3. Define a function to find the UMCv2 channel number 4. Define a
> function to calculate base address of the UMCv2 registers 5. Add debug
> information for UMCv2 channel registers.
>

I don't think you need to say "This patch does..." and give a list. Just write each point as a line in the commit message.

> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> ---
> Changes since v1:
> 1. Modifed the commit message
> 2. Change the edac_cap
> 3. kept sizes of both cpu and noncpu together 4. return success if the
> !F3 condition true and remove unnecessary validation 5. declared
> is_noncpu as bool 6. modified the condition from channel/4 to
> channel>=4 7. Rearranged debug information for noncpu umcch registers
>
> drivers/edac/amd64_edac.c | 202
> +++++++++++++++++++++++++++++++++-----
> drivers/edac/amd64_edac.h | 27 +++++
> 2 files changed, 202 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index b03c33240238..2dd77a828394 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -1979,6 +1979,9 @@ static unsigned long determine_edac_cap(struct
> amd64_pvt *pvt)
>
> if (umc_en_mask == dimm_ecc_en_mask)
> edac_cap = EDAC_FLAG_SECDED;
> +
> + if (pvt->is_noncpu)
> + edac_cap = EDAC_FLAG_SECDED;
> } else {
> bit = (pvt->fam > 0xf || pvt->ext_model >= K8_REV_F)
> ? 19
> @@ -2037,6 +2040,9 @@ static int f17_get_cs_mode(int dimm, u8 ctrl,
> struct amd64_pvt *pvt) {
> int cs_mode = 0;
>
> + if (pvt->is_noncpu)
> + return CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
> +
> if (csrow_enabled(2 * dimm, ctrl, pvt))
> cs_mode |= CS_EVEN_PRIMARY;
>
> @@ -2056,6 +2062,15 @@ static void debug_display_dimm_sizes_df(struct
> amd64_pvt *pvt, u8 ctrl)
>
> edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
>
> + if (pvt->is_noncpu) {
> + cs_mode = f17_get_cs_mode(cs0, ctrl, pvt);
> + for_each_chip_select(cs0, ctrl, pvt) {
> + size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
> + amd64_info(EDAC_MC ": %d: %5dMB\n", cs0, size0);
> + }
> + return;
> + }
> +
> for (dimm = 0; dimm < 2; dimm++) {
> cs0 = dimm * 2;
> cs1 = dimm * 2 + 1;
> @@ -2080,10 +2095,15 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
> umc_base = get_umc_base(i);
> umc = &pvt->umc[i];
>
> - edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
> + if (!pvt->is_noncpu)
> + edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
> edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
> edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
> edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
> + if (pvt->is_noncpu) {
> + edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i);
> + goto dimm_size;
> + }
>
> amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ECC_BAD_SYMBOL, &tmp);
> edac_dbg(1, "UMC%d ECC bad symbol: 0x%x\n", i, tmp); @@ -2108,6
> +2128,7 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
> i, 1 << ((tmp >> 4) & 0x3));
> }
>
> + dimm_size:
> debug_display_dimm_sizes_df(pvt, i);
> }
>
> @@ -2175,10 +2196,14 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
> pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
> } else if (pvt->fam >= 0x17) {
> int umc;
> -

This looks like a stray change. Was it intentional?

> for_each_umc(umc) {
> - pvt->csels[umc].b_cnt = 4;
> - pvt->csels[umc].m_cnt = 2;
> + if (pvt->is_noncpu) {
> + pvt->csels[umc].b_cnt = 8;
> + pvt->csels[umc].m_cnt = 8;
> + } else {
> + pvt->csels[umc].b_cnt = 4;
> + pvt->csels[umc].m_cnt = 2;
> + }
> }
>
> } else {
> @@ -2187,6 +2212,31 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
> }
> }
>
> +static void read_noncpu_umc_base_mask(struct amd64_pvt *pvt) {
> + u32 base_reg, mask_reg;
> + u32 *base, *mask;
> + int umc, cs;
> +
> + for_each_umc(umc) {
> + for_each_chip_select(cs, umc, pvt) {
> + base_reg = get_noncpu_umc_base(umc, cs) + UMCCH_BASE_ADDR;
> + base = &pvt->csels[umc].csbases[cs];
> +
> + if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
> + edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
> + umc, cs, *base, base_reg);
> +

There should be {} here since the code spans multiple lines.

> + mask_reg = get_noncpu_umc_base(umc, cs) + UMCCH_ADDR_MASK;
> + mask = &pvt->csels[umc].csmasks[cs];
> +
> + if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
> + edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
> + umc, cs, *mask, mask_reg);

Same as above.

> + }
> + }
> +}
> +
> static void read_umc_base_mask(struct amd64_pvt *pvt) {
> u32 umc_base_reg, umc_base_reg_sec;
> @@ -2247,8 +2297,12 @@ static void read_dct_base_mask(struct amd64_pvt
> *pvt)
>
> prep_chip_selects(pvt);
>
> - if (pvt->umc)
> - return read_umc_base_mask(pvt);
> + if (pvt->umc) {
> + if (pvt->is_noncpu)
> + return read_noncpu_umc_base_mask(pvt);
> + else
> + return read_umc_base_mask(pvt);
> + }
>
> for_each_chip_select(cs, 0, pvt) {
> int reg0 = DCSB0 + (cs * 4);
> @@ -2294,6 +2348,10 @@ static void determine_memory_type(struct amd64_pvt *pvt)
> u32 dram_ctrl, dcsm;
>
> if (pvt->umc) {
> + if (pvt->is_noncpu) {
> + pvt->dram_type = MEM_HBM2;
> + return;
> + }

Needs a newline here.

> if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
> pvt->dram_type = MEM_LRDDR4;
> else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4)) @@
> -2683,7 +2741,10 @@ static int f17_early_channel_count(struct
> amd64_pvt *pvt)
>
> /* SDP Control bit 31 (SdpInit) is clear for unused UMC channels */
> for_each_umc(i)
> - channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
> + if (pvt->is_noncpu)
> + channels += pvt->csels[i].b_cnt;
> + else
> + channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
>
> amd64_info("MCT channel count: %d\n", channels);
>
> @@ -2824,6 +2885,12 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> u32 msb, weight, num_zero_bits;
> int dimm, size = 0;
>
> + if (pvt->is_noncpu) {
> + addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
> + /* The memory channels in case of GPUs are fully populated */
> + goto skip_noncpu;
> + }
> +
> /* No Chip Selects are enabled. */
> if (!cs_mode)
> return size;
> @@ -2849,6 +2916,7 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> else
> addr_mask_orig = pvt->csels[umc].csmasks[dimm];
>
> + skip_noncpu:
> /*
> * The number of zero bits in the mask is equal to the number of bits
> * in a full mask minus the number of bits in the current mask.
> @@ -3594,6 +3662,16 @@ static struct amd64_family_type family_types[] = {
> .dbam_to_cs = f17_addr_mask_to_cs_size,
> }
> },
> + [ALDEBARAN_GPUS] = {
> + .ctl_name = "ALDEBARAN",
> + .f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0,
> + .f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6,
> + .max_mcs = 4,
> + .ops = {
> + .early_channel_count = f17_early_channel_count,
> + .dbam_to_cs = f17_addr_mask_to_cs_size,
> + }
> + },
> };
>
> /*
> @@ -3849,6 +3927,19 @@ static int find_umc_channel(struct mce *m)
> return (m->ipid & GENMASK(31, 0)) >> 20;
> }
>
> +/*
> + * The HBM memory managed by the UMCCH of the noncpu node
> + * can be calculated based on the [15:12]bits of IPID as follows

This comment doesn't make sense to me.

Maybe it'll help to give some more context. The CPUs have one channel per UMC, so a UMC number is equivalent to a channel number. The GPUs have 8 channels per UMC, so the UMC number no longer works as a channel number. The channel number within a GPU UMC is given in MCA_IPID[15:12].
However, the IDs are split such that two UMC values in the IPID map to one physical UMC, and the channel numbers are split into two groups of four.

For example,
UMC0 CH[3:0] = 0x0005[3:0]000
UMC0 CH[7:4] = 0x0015[3:0]000
UMC1 CH[3:0] = 0x0025[3:0]000
UMC1 CH[7:4] = 0x0035[3:0]000

> + */
> +static int find_umc_channel_noncpu(struct mce *m)
> +{
> + u8 umc, ch;
> +
> + umc = find_umc_channel(m);
> + ch = ((m->ipid >> 12) & 0xf);

Each of these can be on a single line when declared above.

Also, please leave a newline before the return.

> + return umc % 2 ? (ch + 4) : ch;
> +}
> +
> static void decode_umc_error(int node_id, struct mce *m)
> {
> u8 ecc_type = (m->status >> 45) & 0x3;
> @@ -3856,6 +3947,7 @@ static void decode_umc_error(int node_id, struct mce *m)
> struct amd64_pvt *pvt;
> struct err_info err;
> u64 sys_addr = m->addr;
> + u8 umc_num;
>
> mci = edac_mc_find(node_id);
> if (!mci)
> @@ -3868,7 +3960,17 @@ static void decode_umc_error(int node_id, struct mce *m)
> if (m->status & MCI_STATUS_DEFERRED)
> ecc_type = 3;
>
> - err.channel = find_umc_channel(m);
> + if (pvt->is_noncpu) {
> + /* The UMCPHY is reported as csrow in case of noncpu nodes */
> + err.csrow = find_umc_channel(m) / 2;
> + /* UMCCH is managing the HBM memory */
> + err.channel = find_umc_channel_noncpu(m);

I don't think "UMCPHY" or "UMCCH" are clear here. Like above, more context should help. The GPUs have one Chip Select per UMC, so the UMC number can be used as the Chip Select number. However, the UMC number is split in the ID value as mentioned above, so that's why it's necessary to divide by 2.

> + umc_num = err.csrow * 8 + err.channel;

Now here "umc_num" is getting overloaded. The value in this line is not "N" in UMCN, e.g. UMC0, UMC1, etc. It's an artificial value we construct to have a global (within a GPU node) ID for each GPU memory channel.

This is used as the "DF Instance ID" input to the address translation code. On CPUs this is "Channel"="UMC Number"="DF Instance ID". On GPUs, this value is calculated according to the line above.

So I think "umc_num" should be renamed to something like "df_inst_id" to be more explicit.

> + } else {
> + err.channel = find_umc_channel(m);
> + err.csrow = m->synd & 0x7;
> + umc_num = err.channel;
> + }
>
> if (!(m->status & MCI_STATUS_SYNDV)) {
> err.err_code = ERR_SYND;
> @@ -3884,9 +3986,7 @@ static void decode_umc_error(int node_id, struct mce *m)
> err.err_code = ERR_CHANNEL;
> }
>
> - err.csrow = m->synd & 0x7;
> -
> - if (umc_normaddr_to_sysaddr(&sys_addr, pvt->mc_node_id, err.channel)) {
> + if (umc_normaddr_to_sysaddr(&sys_addr, pvt->mc_node_id, umc_num)) {
> err.err_code = ERR_NORM_ADDR;
> goto log_error;
> }
> @@ -4013,15 +4113,20 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
>
> /* Read registers from each UMC */
> for_each_umc(i) {
> + if (pvt->is_noncpu)
> + umc_base = get_noncpu_umc_base(i, 0);
> + else
> + umc_base = get_umc_base(i);
>
> - umc_base = get_umc_base(i);
> umc = &pvt->umc[i];
> -

Another spurious line deletion?

> - amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
> amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
> amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
> amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
> - amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
> +
> + if (!pvt->is_noncpu) {
> + amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
> + amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
> + }
> }
> }
>
> @@ -4103,7 +4208,9 @@ static void read_mc_regs(struct amd64_pvt *pvt)
> determine_memory_type(pvt);
> edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
>
> - determine_ecc_sym_sz(pvt);
> + /* ECC symbol size is not available on NONCPU nodes */
> + if (!pvt->is_noncpu)
> + determine_ecc_sym_sz(pvt);
> }
>
> /*
> @@ -4191,15 +4298,21 @@ static int init_csrows_df(struct mem_ctl_info *mci)
> continue;
>
> empty = 0;
> - dimm = mci->csrows[cs]->channels[umc]->dimm;
> + if (pvt->is_noncpu) {
> + dimm = mci->csrows[umc]->channels[cs]->dimm;
> + dimm->edac_mode = EDAC_SECDED;
> + dimm->dtype = DEV_X16;
> + } else {
> + dimm = mci->csrows[cs]->channels[umc]->dimm;
> + dimm->edac_mode = edac_mode;
> + dimm->dtype = dev_type;
> + }
>
> edac_dbg(1, "MC node: %d, csrow: %d\n",
> pvt->mc_node_id, cs);
>
> dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
> dimm->mtype = pvt->dram_type;
> - dimm->edac_mode = edac_mode;
> - dimm->dtype = dev_type;
> dimm->grain = 64;
> }
> }
> @@ -4464,7 +4577,9 @@ static bool ecc_enabled(struct amd64_pvt *pvt)
>
> umc_en_mask |= BIT(i);
>
> - if (umc->umc_cap_hi & UMC_ECC_ENABLED)
> + /* ECC is enabled by default on NONCPU nodes */
> + if (pvt->is_noncpu ||

Can you skip all the bitmask stuff and just say "ecc_en=true" if on a non-CPU node?
[naveenk:] ecc_en is assigned only when umc_en_mask is set, and it is true when umc_en_mask equals ecc_en_mask.
/* Check whether at least one UMC is enabled: */
if (umc_en_mask)
ecc_en = umc_en_mask == ecc_en_mask;
else
edac_dbg(0, "Node %d: No enabled UMCs.\n", nid);

Since umc_en_mask is getting set appropriately, handling just the ecc_en_mask is enough to get the right ecc_en.
Setting ecc_en directly didn't look nice to me. Let me know if you think otherwise.

> + (umc->umc_cap_hi & UMC_ECC_ENABLED))
> ecc_en_mask |= BIT(i);
> }
>
> @@ -4500,6 +4615,11 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
> {
> u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
>
> + if (pvt->is_noncpu) {
> + mci->edac_ctl_cap |= EDAC_SECDED;
> + return;
> + }
> +
> for_each_umc(i) {
> if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
> ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
> @@ -4530,7 +4650,11 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
> {
> struct amd64_pvt *pvt = mci->pvt_info;
>
> - mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
> + if (pvt->is_noncpu)
> + mci->mtype_cap = MEM_FLAG_HBM2;
> + else
> + mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
> +
> mci->edac_ctl_cap = EDAC_FLAG_NONE;
>
> if (pvt->umc) {
> @@ -4635,11 +4759,24 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
> fam_type = &family_types[F17_M70H_CPUS];
> pvt->ops = &family_types[F17_M70H_CPUS].ops;
> fam_type->ctl_name = "F19h_M20h";
> - break;
> + } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
> + if (pvt->is_noncpu) {
> + int tmp = pvt->mc_node_id - NONCPU_NODE_INDEX;
> +
> + fam_type = &family_types[ALDEBARAN_GPUS];
> + pvt->ops = &family_types[ALDEBARAN_GPUS].ops;
> + sprintf(pvt->buf, "Aldebaran#%ddie#%d", tmp / 2, tmp % 2);
> + fam_type->ctl_name = pvt->buf;

I like the idea of giving unique names for each "MC". Maybe this can be used on the CPU nodes too? I'll check this out. Thanks for the idea.

> + } else {
> + fam_type = &family_types[F19_CPUS];
> + pvt->ops = &family_types[F19_CPUS].ops;
> + fam_type->ctl_name = "F19h_M30h";
> + }
> + } else {
> + fam_type = &family_types[F19_CPUS];
> + pvt->ops = &family_types[F19_CPUS].ops;
> + family_types[F19_CPUS].ctl_name = "F19h";
> }
> - fam_type = &family_types[F19_CPUS];
> - pvt->ops = &family_types[F19_CPUS].ops;
> - family_types[F19_CPUS].ctl_name = "F19h";
> break;
>
> default:
> @@ -4707,9 +4844,10 @@ static int init_one_instance(struct amd64_pvt *pvt)
> if (pvt->channel_count < 0)
> return ret;
>
> + /* Define layers for CPU and NONCPU nodes */
> ret = -ENOMEM;
> layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
> - layers[0].size = pvt->csels[0].b_cnt;
> + layers[0].size = pvt->is_noncpu ? fam_type->max_mcs : pvt->csels[0].b_cnt;
> layers[0].is_virt_csrow = true;
> layers[1].type = EDAC_MC_LAYER_CHANNEL;
>
> @@ -4718,7 +4856,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
> * only one channel. Also, this simplifies handling later for the price
> * of a couple of KBs tops.
> */
> - layers[1].size = fam_type->max_mcs;
> + layers[1].size = pvt->is_noncpu ? pvt->csels[0].b_cnt : fam_type->max_mcs;
> layers[1].is_virt_csrow = false;
>
> mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
> @@ -4763,6 +4901,9 @@ static int probe_one_instance(unsigned int nid)
> struct ecc_settings *s;
> int ret;
>
> + if (!F3)
> + return 0;
> +
> ret = -ENOMEM;
> s = kzalloc(sizeof(struct ecc_settings), GFP_KERNEL);
> if (!s)
> @@ -4774,6 +4915,9 @@ static int probe_one_instance(unsigned int nid)
> if (!pvt)
> goto err_settings;
>
> + if (nid >= NONCPU_NODE_INDEX)
> + pvt->is_noncpu = true;
> +
> pvt->mc_node_id = nid;
> pvt->F3 = F3;
>
> @@ -4847,6 +4991,10 @@ static void remove_one_instance(unsigned int nid)
> struct mem_ctl_info *mci;
> struct amd64_pvt *pvt;
>
> + /* Nothing to remove for the space holder entries */
> + if (!F3)
> + return;
> +
> /* Remove from EDAC CORE tracking list */
> mci = edac_mc_del_mc(&F3->dev);
> if (!mci)
> diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
> index 85aa820bc165..c5532a6f0c34 100644
> --- a/drivers/edac/amd64_edac.h
> +++ b/drivers/edac/amd64_edac.h
> @@ -126,6 +126,8 @@
> #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
> #define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650
> #define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14D0
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14D6
>
> /*
> * Function 1 - Address Map
> @@ -298,6 +300,7 @@ enum amd_families {
> F17_M60H_CPUS,
> F17_M70H_CPUS,
> F19_CPUS,
> + ALDEBARAN_GPUS,
> NUM_FAMILIES,
> };
>
> @@ -389,6 +392,9 @@ struct amd64_pvt {
> enum mem_type dram_type;
>
> struct amd64_umc *umc; /* UMC registers */
> + char buf[20];
> +
> + bool is_noncpu;
> };
>
> enum err_codes {
> @@ -410,6 +416,27 @@ struct err_info {
> u32 offset;
> };
>
> +static inline u32 get_noncpu_umc_base(u8 umc, u8 channel)
> +{
> + /*
> + * On the NONCPU nodes, base address is calculated based on
> + * UMC channel and the HBM channel.
> + *
> + * UMC channels are selected in 6th nibble
> + * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000;
> + *
> + * HBM channels are selected in 3rd nibble
> + * HBM chX[3:0]= [Y ]5X[3:0]000;
> + * HBM chX[7:4]= [Y+1]5X[3:0]000

The wording in this comment is inaccurate. There is no "UMC channel" and "HBM channel". There is only "channel". On CPUs, there is one channel per UMC, so UMC numbering equals channel numbering. On GPUs, there are eight channels per UMC, so the channel numbering is different from UMC numbering.

The notes are good overall, so I think it'll help to reference this comment in amd64_edac.c where appropriate.
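
As a quick sanity check of the formula below (my arithmetic): channel 6
on UMC1 gives umc = 1 * 2 + 1 = 3 and channel % 4 = 2, i.e.
0x50000 + (3 << 20) + (2 << 12) = 0x352000, which matches the
"HBM chX[7:4]= [Y+1]5X[3:0]000" pattern quoted above.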

> + */
> + umc *= 2;
> +
> + if (channel >= 4)
> + umc++;
> +
> + return 0x50000 + (umc << 20) + ((channel % 4) << 12);
> +}
> +
> static inline u32 get_umc_base(u8 channel)
> {
> /* chY: 0xY50000 */
> --

Thanks,
Yazen

Subject: [PATCH v3 0/3] x86/edac/amd64: Add support for noncpu nodes

From: Muralidhara M K <[email protected]>

On newer heterogeneous systems the data fabrics of the CPUs and GPUs
are connected directly via custom links.

This patchset does not depend on the following series by Yazen Ghannam:
AMD MCA Address Translation Updates
[https://patchwork.kernel.org/project/linux-edac/list/?series=505989]

This patchset does the following
1. Add support for northbridges on Aldebaran
* x86/amd_nb: Add support for northbridges on Aldebaran
2. Modifies the amd64_edac module to
a. Handle the UMCs on the noncpu nodes,
* EDAC/mce_amd: Extract node id from MCA_IPID
b. Enumerate PCI IDs and HBM memory
* EDAC/amd64: Enumerate memory on noncpu nodes

Muralidhara M K (1):
x86/amd_nb: Add support for northbridges on Aldebaran

Naveen Krishna Chatradhi (2):
EDAC/mce_amd: Extract node id from MCA_IPID
EDAC/amd64: Enumerate memory on noncpu nodes

arch/x86/include/asm/amd_nb.h | 10 ++
arch/x86/kernel/amd_nb.c | 63 +++++++++-
drivers/edac/amd64_edac.c | 219 ++++++++++++++++++++++++++++++----
drivers/edac/amd64_edac.h | 28 +++++
drivers/edac/mce_amd.c | 19 ++-
include/linux/pci_ids.h | 1 +
6 files changed, 308 insertions(+), 32 deletions(-)

--
2.25.1

Subject: [PATCH v3 1/3] x86/amd_nb: Add support for northbridges on Aldebaran

From: Muralidhara M K <[email protected]>

On newer systems the CPUs manage MCA errors reported from the GPUs.
Enumerate the GPU nodes with the AMD NB framework to support EDAC.

This patch adds necessary code to manage the Aldebaran nodes along with
the CPU nodes.

The GPU nodes are enumerated in sequential order based on the
PCI hierarchy, and the first GPU node is assumed to have an "AMD Node
ID" value of 8 (the second GPU node has 9, etc.). Each Aldebaran GPU
package has 2 Data Fabrics, which are enumerated as 2 nodes.
With this implementation detail, the Data Fabric on the GPU nodes can be
accessed the same way as the Data Fabric on CPU nodes.

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
---
Changes since v2: Added Reviewed-by Yazen Ghannam

arch/x86/include/asm/amd_nb.h | 10 ++++++
arch/x86/kernel/amd_nb.c | 63 ++++++++++++++++++++++++++++++++---
include/linux/pci_ids.h | 1 +
3 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 455066a06f60..09905f6c7218 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -80,6 +80,16 @@ struct amd_northbridge_info {

#ifdef CONFIG_AMD_NB

+/*
+ * On newer heterogeneous systems the data fabrics of the CPUs and GPUs
+ * are connected directly via a custom links, like is done with
+ * 2 socket CPU systems and also within a socket for Multi-chip Module
+ * (MCM) CPUs like Naples.
+ * The first GPU node(non cpu) is assumed to have an "AMD Node ID" value
+ * of 8 (the second GPU node has 9, etc.).
+ */
+#define NONCPU_NODE_INDEX 8
+
u16 amd_nb_num(void);
bool amd_nb_has_feature(unsigned int feature);
struct amd_northbridge *node_to_amd_nb(int node);
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 23dda362dc0f..6ad5664a18aa 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -26,6 +26,8 @@
#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
#define PCI_DEVICE_ID_AMD_19H_DF_F4 0x1654
#define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT 0x14bb
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4

/* Protect the PCI config register pairs used for SMN and DF indirect access. */
static DEFINE_MUTEX(smn_mutex);
@@ -94,6 +96,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = {
{}
};

+static const struct pci_device_id amd_noncpu_root_ids[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
+ {}
+};
+
+static const struct pci_device_id amd_noncpu_nb_misc_ids[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
+ {}
+};
+
+static const struct pci_device_id amd_noncpu_nb_link_ids[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
+ {}
+};
+
const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
{ 0x00, 0x18, 0x20 },
{ 0xff, 0x00, 0x20 },
@@ -230,11 +247,16 @@ int amd_cache_northbridges(void)
const struct pci_device_id *misc_ids = amd_nb_misc_ids;
const struct pci_device_id *link_ids = amd_nb_link_ids;
const struct pci_device_id *root_ids = amd_root_ids;
+
+ const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids;
+ const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids;
+ const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids;
+
struct pci_dev *root, *misc, *link;
struct amd_northbridge *nb;
u16 roots_per_misc = 0;
- u16 misc_count = 0;
- u16 root_count = 0;
+ u16 misc_count = 0, misc_count_noncpu = 0;
+ u16 root_count = 0, root_count_noncpu = 0;
u16 i, j;

if (amd_northbridges.num)
@@ -253,10 +275,16 @@ int amd_cache_northbridges(void)
if (!misc_count)
return -ENODEV;

+ while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL)
+ misc_count_noncpu++;
+
root = NULL;
while ((root = next_northbridge(root, root_ids)) != NULL)
root_count++;

+ while ((root = next_northbridge(root, noncpu_root_ids)) != NULL)
+ root_count_noncpu++;
+
if (root_count) {
roots_per_misc = root_count / misc_count;

@@ -270,15 +298,28 @@ int amd_cache_northbridges(void)
}
}

- nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
+ if (misc_count_noncpu) {
+ /*
+ * The first non-CPU Node ID starts at 8 even if there are fewer
+ * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb
+ * indexing scheme, allocate the number of GPU nodes plus 8.
+ * Some allocated amd_northbridge structures will go unused when
+ * the number of CPU nodes is less than 8, but this tradeoff is to
+ * keep things relatively simple.
+ */
+ amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu;
+ } else {
+ amd_northbridges.num = misc_count;
+ }
+
+ nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
if (!nb)
return -ENOMEM;

amd_northbridges.nb = nb;
- amd_northbridges.num = misc_count;

link = misc = root = NULL;
- for (i = 0; i < amd_northbridges.num; i++) {
+ for (i = 0; i < misc_count; i++) {
node_to_amd_nb(i)->root = root =
next_northbridge(root, root_ids);
node_to_amd_nb(i)->misc = misc =
@@ -299,6 +340,18 @@ int amd_cache_northbridges(void)
root = next_northbridge(root, root_ids);
}

+ if (misc_count_noncpu) {
+ link = misc = root = NULL;
+ for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) {
+ node_to_amd_nb(i)->root = root =
+ next_northbridge(root, noncpu_root_ids);
+ node_to_amd_nb(i)->misc = misc =
+ next_northbridge(misc, noncpu_misc_ids);
+ node_to_amd_nb(i)->link = link =
+ next_northbridge(link, noncpu_link_ids);
+ }
+ }
+
if (amd_gart_present())
amd_northbridges.flags |= AMD_NB_GART;

diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 4bac1831de80..d9aae90dfce9 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -554,6 +554,7 @@
#define PCI_DEVICE_ID_AMD_17H_M30H_DF_F3 0x1493
#define PCI_DEVICE_ID_AMD_17H_M60H_DF_F3 0x144b
#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F3 0x1443
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3 0x14d3
#define PCI_DEVICE_ID_AMD_19H_DF_F3 0x1653
#define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d
#define PCI_DEVICE_ID_AMD_CNB17H_F3 0x1703
--
2.25.1

Subject: [PATCH v3 3/3] EDAC/amd64: Enumerate memory on noncpu nodes

On newer heterogeneous systems the data fabrics of the CPUs and GPUs
are connected directly via a custom links.

This patch modifies the amd64_edac module to handle the HBM memory
enumeration leveraging the existing edac and the amd64 specific data
structures.

Define PCI IDs and ops for Aldebaran GPUs in the family_types array.
The UMC Phys on GPU nodes are enumerated as csrows and the UMC channels
connected to HBMs are enumerated as ranks.
Define a function to find the UMCv2 channel number.
Define a function to calculate the base address of the UMCv2 registers.
ECC is enabled by default on HBMs.
Add debug information for UMCv2 channel registers.

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Cc: Yazen Ghannam <[email protected]>
---
Changes since v2:
1. Restored line deletions and handled minor comments
2. Modified commit message and some of the function comments
3. variable df_inst_id is introduced instead of umc_num

drivers/edac/amd64_edac.c | 219 +++++++++++++++++++++++++++++++++-----
drivers/edac/amd64_edac.h | 28 +++++
2 files changed, 222 insertions(+), 25 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index f0d8f60acee1..452556adc1f9 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1020,6 +1020,9 @@ static unsigned long determine_edac_cap(struct amd64_pvt *pvt)

if (umc_en_mask == dimm_ecc_en_mask)
edac_cap = EDAC_FLAG_SECDED;
+
+ if (pvt->is_noncpu)
+ edac_cap = EDAC_FLAG_SECDED;
} else {
bit = (pvt->fam > 0xf || pvt->ext_model >= K8_REV_F)
? 19
@@ -1078,6 +1081,9 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
{
int cs_mode = 0;

+ if (pvt->is_noncpu)
+ return CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+
if (csrow_enabled(2 * dimm, ctrl, pvt))
cs_mode |= CS_EVEN_PRIMARY;

@@ -1097,6 +1103,15 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)

edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);

+ if (pvt->is_noncpu) {
+ cs_mode = f17_get_cs_mode(cs0, ctrl, pvt);
+ for_each_chip_select(cs0, ctrl, pvt) {
+ size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
+ amd64_info(EDAC_MC ": %d: %5dMB\n", cs0, size0);
+ }
+ return;
+ }
+
for (dimm = 0; dimm < 2; dimm++) {
cs0 = dimm * 2;
cs1 = dimm * 2 + 1;
@@ -1121,10 +1136,15 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
umc_base = get_umc_base(i);
umc = &pvt->umc[i];

- edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
+ if (!pvt->is_noncpu)
+ edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
+ if (pvt->is_noncpu) {
+ edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i);
+ goto dimm_size;
+ }

amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ECC_BAD_SYMBOL, &tmp);
edac_dbg(1, "UMC%d ECC bad symbol: 0x%x\n", i, tmp);
@@ -1149,6 +1169,7 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
i, 1 << ((tmp >> 4) & 0x3));
}

+ dimm_size:
debug_display_dimm_sizes_df(pvt, i);
}

@@ -1218,8 +1239,13 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
int umc;

for_each_umc(umc) {
- pvt->csels[umc].b_cnt = 4;
- pvt->csels[umc].m_cnt = 2;
+ if (pvt->is_noncpu) {
+ pvt->csels[umc].b_cnt = 8;
+ pvt->csels[umc].m_cnt = 8;
+ } else {
+ pvt->csels[umc].b_cnt = 4;
+ pvt->csels[umc].m_cnt = 2;
+ }
}

} else {
@@ -1228,6 +1254,33 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
}
}

+static void read_noncpu_umc_base_mask(struct amd64_pvt *pvt)
+{
+ u32 base_reg, mask_reg;
+ u32 *base, *mask;
+ int umc, cs;
+
+ for_each_umc(umc) {
+ for_each_chip_select(cs, umc, pvt) {
+ base_reg = get_noncpu_umc_base(umc, cs) + UMCCH_BASE_ADDR;
+ base = &pvt->csels[umc].csbases[cs];
+
+ if (!amd_smn_read(pvt->mc_node_id, base_reg, base)) {
+ edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
+ umc, cs, *base, base_reg);
+ }
+
+ mask_reg = get_noncpu_umc_base(umc, cs) + UMCCH_ADDR_MASK;
+ mask = &pvt->csels[umc].csmasks[cs];
+
+ if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask)) {
+ edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
+ umc, cs, *mask, mask_reg);
+ }
+ }
+ }
+}
+
static void read_umc_base_mask(struct amd64_pvt *pvt)
{
u32 umc_base_reg, umc_base_reg_sec;
@@ -1288,8 +1341,12 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)

prep_chip_selects(pvt);

- if (pvt->umc)
- return read_umc_base_mask(pvt);
+ if (pvt->umc) {
+ if (pvt->is_noncpu)
+ return read_noncpu_umc_base_mask(pvt);
+ else
+ return read_umc_base_mask(pvt);
+ }

for_each_chip_select(cs, 0, pvt) {
int reg0 = DCSB0 + (cs * 4);
@@ -1335,6 +1392,11 @@ static void determine_memory_type(struct amd64_pvt *pvt)
u32 dram_ctrl, dcsm;

if (pvt->umc) {
+ if (pvt->is_noncpu) {
+ pvt->dram_type = MEM_HBM2;
+ return;
+ }
+
if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
pvt->dram_type = MEM_LRDDR4;
else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
@@ -1724,7 +1786,10 @@ static int f17_early_channel_count(struct amd64_pvt *pvt)

/* SDP Control bit 31 (SdpInit) is clear for unused UMC channels */
for_each_umc(i)
- channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
+ if (pvt->is_noncpu)
+ channels += pvt->csels[i].b_cnt;
+ else
+ channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);

amd64_info("MCT channel count: %d\n", channels);

@@ -1865,6 +1930,12 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
u32 msb, weight, num_zero_bits;
int dimm, size = 0;

+ if (pvt->is_noncpu) {
+ addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
+ /* The memory channels in case of GPUs are fully populated */
+ goto skip_noncpu;
+ }
+
/* No Chip Selects are enabled. */
if (!cs_mode)
return size;
@@ -1890,6 +1961,7 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
else
addr_mask_orig = pvt->csels[umc].csmasks[dimm];

+ skip_noncpu:
/*
* The number of zero bits in the mask is equal to the number of bits
* in a full mask minus the number of bits in the current mask.
@@ -2635,6 +2707,16 @@ static struct amd64_family_type family_types[] = {
.dbam_to_cs = f17_addr_mask_to_cs_size,
}
},
+ [ALDEBARAN_GPUS] = {
+ .ctl_name = "ALDEBARAN",
+ .f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0,
+ .f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6,
+ .max_mcs = 4,
+ .ops = {
+ .early_channel_count = f17_early_channel_count,
+ .dbam_to_cs = f17_addr_mask_to_cs_size,
+ }
+ },
};

/*
@@ -2890,6 +2972,30 @@ static int find_umc_channel(struct mce *m)
return (m->ipid & GENMASK(31, 0)) >> 20;
}

+/*
+ * The CPUs have one channel per UMC, so a UMC number is equivalent to a
+ * channel number. The NONCPUs have 8 channels per UMC, so the UMC number no
+ * longer works as a channel number.
+ * The channel number within a NONCPU UMC is given in MCA_IPID[15:12].
+ * However, the IDs are split such that two UMC values go to one UMC, and
+ * the channel numbers are split in two groups of four.
+ *
+ * Refer to the comment on get_noncpu_umc_base() in amd64_edac.h
+ *
+ * For example,
+ * UMC0 CH[3:0] = 0x0005[3:0]000
+ * UMC0 CH[7:4] = 0x0015[3:0]000
+ * UMC1 CH[3:0] = 0x0025[3:0]000
+ * UMC1 CH[7:4] = 0x0035[3:0]000
+ */
+static int find_umc_channel_noncpu(struct mce *m)
+{
+ u8 umc = find_umc_channel(m);
+ u8 ch = ((m->ipid >> 12) & 0xf);
+
+ return umc % 2 ? (ch + 4) : ch;
+}
+
static void decode_umc_error(int node_id, struct mce *m)
{
u8 ecc_type = (m->status >> 45) & 0x3;
@@ -2897,6 +3003,7 @@ static void decode_umc_error(int node_id, struct mce *m)
struct amd64_pvt *pvt;
struct err_info err;
u64 sys_addr;
+ u8 df_inst_id;

mci = edac_mc_find(node_id);
if (!mci)
@@ -2909,7 +3016,22 @@ static void decode_umc_error(int node_id, struct mce *m)
if (m->status & MCI_STATUS_DEFERRED)
ecc_type = 3;

- err.channel = find_umc_channel(m);
+ if (pvt->is_noncpu) {
+ /*
+ * The NONCPUs have one Chip Select per UMC, so the UMC number
+ * can be used as the Chip Select number. However, the UMC number
+ * is split in the ID value so it's necessary to divide by 2.
+ */
+ err.csrow = find_umc_channel(m) / 2;
+ err.channel = find_umc_channel_noncpu(m);
+ /* On NONCPUs, instance id is calculated as below. */
+ df_inst_id = err.csrow * 8 + err.channel;
+ } else {
+ /* On CPUs, "Channel"="UMC Number"="DF Instance ID". */
+ err.channel = find_umc_channel(m);
+ err.csrow = m->synd & 0x7;
+ df_inst_id = err.channel;
+ }

if (!(m->status & MCI_STATUS_SYNDV)) {
err.err_code = ERR_SYND;
@@ -2925,9 +3047,7 @@ static void decode_umc_error(int node_id, struct mce *m)
err.err_code = ERR_CHANNEL;
}

- err.csrow = m->synd & 0x7;
-
- if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
+ if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, df_inst_id, &sys_addr)) {
err.err_code = ERR_NORM_ADDR;
goto log_error;
}
@@ -3054,15 +3174,21 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)

/* Read registers from each UMC */
for_each_umc(i) {
+ if (pvt->is_noncpu)
+ umc_base = get_noncpu_umc_base(i, 0);
+ else
+ umc_base = get_umc_base(i);

- umc_base = get_umc_base(i);
umc = &pvt->umc[i];

- amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
- amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
+
+ if (!pvt->is_noncpu) {
+ amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
+ amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
+ }
}
}

@@ -3144,7 +3270,9 @@ static void read_mc_regs(struct amd64_pvt *pvt)
determine_memory_type(pvt);
edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);

- determine_ecc_sym_sz(pvt);
+ /* ECC symbol size is not available on NONCPU nodes */
+ if (!pvt->is_noncpu)
+ determine_ecc_sym_sz(pvt);
}

/*
@@ -3232,15 +3360,21 @@ static int init_csrows_df(struct mem_ctl_info *mci)
continue;

empty = 0;
- dimm = mci->csrows[cs]->channels[umc]->dimm;
+ if (pvt->is_noncpu) {
+ dimm = mci->csrows[umc]->channels[cs]->dimm;
+ dimm->edac_mode = EDAC_SECDED;
+ dimm->dtype = DEV_X16;
+ } else {
+ dimm = mci->csrows[cs]->channels[umc]->dimm;
+ dimm->edac_mode = edac_mode;
+ dimm->dtype = dev_type;
+ }

edac_dbg(1, "MC node: %d, csrow: %d\n",
pvt->mc_node_id, cs);

dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
dimm->mtype = pvt->dram_type;
- dimm->edac_mode = edac_mode;
- dimm->dtype = dev_type;
dimm->grain = 64;
}
}
@@ -3505,7 +3639,9 @@ static bool ecc_enabled(struct amd64_pvt *pvt)

umc_en_mask |= BIT(i);

- if (umc->umc_cap_hi & UMC_ECC_ENABLED)
+ /* ECC is enabled by default on NONCPU nodes */
+ if (pvt->is_noncpu ||
+ (umc->umc_cap_hi & UMC_ECC_ENABLED))
ecc_en_mask |= BIT(i);
}

@@ -3541,6 +3677,11 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
{
u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;

+ if (pvt->is_noncpu) {
+ mci->edac_ctl_cap |= EDAC_SECDED;
+ return;
+ }
+
for_each_umc(i) {
if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
@@ -3571,7 +3712,11 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
{
struct amd64_pvt *pvt = mci->pvt_info;

- mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
+ if (pvt->is_noncpu)
+ mci->mtype_cap = MEM_FLAG_HBM2;
+ else
+ mci->mtype_cap = MEM_FLAG_DDR2 | MEM_FLAG_RDDR2;
+
mci->edac_ctl_cap = EDAC_FLAG_NONE;

if (pvt->umc) {
@@ -3676,11 +3821,24 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
fam_type = &family_types[F17_M70H_CPUS];
pvt->ops = &family_types[F17_M70H_CPUS].ops;
fam_type->ctl_name = "F19h_M20h";
- break;
+ } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
+ if (pvt->is_noncpu) {
+ int tmp = pvt->mc_node_id - NONCPU_NODE_INDEX;
+
+ fam_type = &family_types[ALDEBARAN_GPUS];
+ pvt->ops = &family_types[ALDEBARAN_GPUS].ops;
+ sprintf(pvt->buf, "Aldebaran#%ddie#%d", tmp / 2, tmp % 2);
+ fam_type->ctl_name = pvt->buf;
+ } else {
+ fam_type = &family_types[F19_CPUS];
+ pvt->ops = &family_types[F19_CPUS].ops;
+ fam_type->ctl_name = "F19h_M30h";
+ }
+ } else {
+ fam_type = &family_types[F19_CPUS];
+ pvt->ops = &family_types[F19_CPUS].ops;
+ family_types[F19_CPUS].ctl_name = "F19h";
}
- fam_type = &family_types[F19_CPUS];
- pvt->ops = &family_types[F19_CPUS].ops;
- family_types[F19_CPUS].ctl_name = "F19h";
break;

default:
@@ -3748,9 +3906,10 @@ static int init_one_instance(struct amd64_pvt *pvt)
if (pvt->channel_count < 0)
return ret;

+ /* Define layers for CPU and NONCPU nodes */
ret = -ENOMEM;
layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
- layers[0].size = pvt->csels[0].b_cnt;
+ layers[0].size = pvt->is_noncpu ? fam_type->max_mcs : pvt->csels[0].b_cnt;
layers[0].is_virt_csrow = true;
layers[1].type = EDAC_MC_LAYER_CHANNEL;

@@ -3759,7 +3918,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
* only one channel. Also, this simplifies handling later for the price
* of a couple of KBs tops.
*/
- layers[1].size = fam_type->max_mcs;
+ layers[1].size = pvt->is_noncpu ? pvt->csels[0].b_cnt : fam_type->max_mcs;
layers[1].is_virt_csrow = false;

mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
@@ -3804,6 +3963,9 @@ static int probe_one_instance(unsigned int nid)
struct ecc_settings *s;
int ret;

+ if (!F3)
+ return 0;
+
ret = -ENOMEM;
s = kzalloc(sizeof(struct ecc_settings), GFP_KERNEL);
if (!s)
@@ -3815,6 +3977,9 @@ static int probe_one_instance(unsigned int nid)
if (!pvt)
goto err_settings;

+ if (nid >= NONCPU_NODE_INDEX)
+ pvt->is_noncpu = true;
+
pvt->mc_node_id = nid;
pvt->F3 = F3;

@@ -3888,6 +4053,10 @@ static void remove_one_instance(unsigned int nid)
struct mem_ctl_info *mci;
struct amd64_pvt *pvt;

+ /* Nothing to remove for the space holder entries */
+ if (!F3)
+ return;
+
/* Remove from EDAC CORE tracking list */
mci = edac_mc_del_mc(&F3->dev);
if (!mci)
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 85aa820bc165..0844f004c90b 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -126,6 +126,8 @@
#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
#define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650
#define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14D0
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14D6

/*
* Function 1 - Address Map
@@ -298,6 +300,7 @@ enum amd_families {
F17_M60H_CPUS,
F17_M70H_CPUS,
F19_CPUS,
+ ALDEBARAN_GPUS,
NUM_FAMILIES,
};

@@ -389,6 +392,9 @@ struct amd64_pvt {
enum mem_type dram_type;

struct amd64_umc *umc; /* UMC registers */
+ char buf[20];
+
+ bool is_noncpu;
};

enum err_codes {
@@ -410,6 +416,28 @@ struct err_info {
u32 offset;
};

+static inline u32 get_noncpu_umc_base(u8 umc, u8 channel)
+{
+ /*
+ * On CPUs, there is one channel per UMC, so UMC numbering equals
+ * channel numbering. On NONCPUs, there are eight channels per UMC,
+ * so the channel numbering is different from UMC numbering.
+ *
+ * On CPU nodes channels are selected in 6th nibble
+ * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000;
+ *
+ * On NONCPU nodes channels are selected in 3rd nibble
+ * HBM chX[3:0]= [Y ]5X[3:0]000;
+ * HBM chX[7:4]= [Y+1]5X[3:0]000
+ */
+ umc *= 2;
+
+ if (channel >= 4)
+ umc++;
+
+ return 0x50000 + (umc << 20) + ((channel % 4) << 12);
+}
+
static inline u32 get_umc_base(u8 channel)
{
/* chY: 0xY50000 */
--
2.25.1

Subject: [PATCH v3 2/3] EDAC/mce_amd: Extract node id from MCA_IPID

On SMCA banks of the NONCPU nodes, the node id information is
available in MCA_IPID[47:44](InstanceIdHi).

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Reviewed-by: Yazen Ghannam <[email protected]>
---
Changes since v2:
1. Modified subject and commit message
2. Added Reviewed by Yazen Ghannam

drivers/edac/mce_amd.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 27d56920b469..1398032ba25a 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1072,8 +1072,23 @@ static void decode_smca_error(struct mce *m)
if (xec < smca_mce_descs[bank_type].num_descs)
pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);

- if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
- decode_dram_ecc(topology_die_id(m->extcpu), m);
+ if (xec == 0 && decode_dram_ecc) {
+ int node_id = 0;
+
+ if (bank_type == SMCA_UMC) {
+ node_id = topology_die_id(m->extcpu);
+ } else if (bank_type == SMCA_UMC_V2) {
+ /*
+ * SMCA_UMC_V2 is used on the noncpu nodes, extract
+ * the node id from MCA_IPID[47:44](InstanceIdHi)
+ */
+ node_id = ((m->ipid >> 44) & 0xF);
+ } else {
+ return;
+ }
+
+ decode_dram_ecc(node_id, m);
+ }
}

static inline void amd_decode_err_code(u16 ec)
--
2.25.1

2021-08-25 10:43:19

by Borislav Petkov

Subject: Re: [PATCH v3 1/3] x86/amd_nb: Add support for northbridges on Aldebaran

On Tue, Aug 24, 2021 at 12:24:35AM +0530, Naveen Krishna Chatradhi wrote:
> From: Muralidhara M K <[email protected]>
>
> On newer systems the CPUs manage MCA errors reported from the GPUs.
> Enumerate the GPU nodes with the AMD NB framework to support EDAC.
>
> This patch adds necessary code to manage the Aldebaran nodes along with

Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.

Also, do

$ git grep 'This patch' Documentation/process

for more details.

Also, what are the "Aldebaran nodes"?

Something on a star which is 65 light years away?

> the CPU nodes.
>
> The GPU nodes are enumerated in sequential order based on the
> PCI hierarchy, and the first GPU node is assumed to have an "AMD Node
> ID" value of 8 (the second GPU node has 9, etc.).

What does that mean? The GPU nodes are simply numerically after the CPU
nodes or how am I to understand this nomenclature?

> Each Aldebaran GPU
> package has 2 Data Fabrics, which are enumerated as 2 nodes.
> With this implementation detail, the Data Fabric on the GPU nodes can be
> accessed the same way as the Data Fabric on CPU nodes.
>
> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> Reviewed-by: Yazen Ghannam <[email protected]>
> ---
> Changes since v2: Added Reviewed-by Yazen Ghannam
>
> arch/x86/include/asm/amd_nb.h | 10 ++++++
> arch/x86/kernel/amd_nb.c | 63 ++++++++++++++++++++++++++++++++---
> include/linux/pci_ids.h | 1 +
> 3 files changed, 69 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
> index 455066a06f60..09905f6c7218 100644
> --- a/arch/x86/include/asm/amd_nb.h
> +++ b/arch/x86/include/asm/amd_nb.h
> @@ -80,6 +80,16 @@ struct amd_northbridge_info {
>
> #ifdef CONFIG_AMD_NB
>
> +/*
> + * On newer heterogeneous systems the data fabrics of the CPUs and GPUs
> + * are connected directly via a custom links, like is done with

s/ a //

> + * 2 socket CPU systems and also within a socket for Multi-chip Module
> + * (MCM) CPUs like Naples.
> + * The first GPU node(non cpu) is assumed to have an "AMD Node ID" value

In all your text:

s/cpu/CPU/g

> + * of 8 (the second GPU node has 9, etc.).
> + */
> +#define NONCPU_NODE_INDEX 8

Why is this assumed? Can it instead be read from the hardware somewhere?
Or there simply won't be more than 8 CPU nodes anyway? Not at least in
the near future?

I'd prefer stuff to be read out directly from the hardware so that when
the hardware changes, the code just works instead of doing assumptions
which get invalidated later.

> +
> u16 amd_nb_num(void);
> bool amd_nb_has_feature(unsigned int feature);
> struct amd_northbridge *node_to_amd_nb(int node);
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index 23dda362dc0f..6ad5664a18aa 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -26,6 +26,8 @@
> #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
> #define PCI_DEVICE_ID_AMD_19H_DF_F4 0x1654
> #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT 0x14bb
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4

You see how those defines are aligned vertically, right?

If your new defines don't fit, you realign them all vertically too - you
don't just slap them there.

And if it wasn't clear above, that Aldebaran GPU chip name means
something only to AMD folks. If this is correct

https://en.wikipedia.org/wiki/Video_Core_Next

then that Aldebaran thing is VCN 2.6 although this is only the video
encoding stuff and GPU I guess is more than that.

IOW, what I'm trying to say is, just like we name the CPUs using their
families, you should name the GPUs nomenclature with GPU families (I
guess there's stuff like that) and not use the marketing crap.

If you need an example, here's how we did it for the Intel marketing
pile of bullsh*t:

arch/x86/include/asm/intel-family.h

Please provide a similar way of referring to the GPU chips.

> /* Protect the PCI config register pairs used for SMN and DF indirect access. */
> static DEFINE_MUTEX(smn_mutex);
> @@ -94,6 +96,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = {
> {}
> };
>
> +static const struct pci_device_id amd_noncpu_root_ids[] = {

Why is that "noncpu" thing everywhere? Is this thing going to be
anything else besides a GPU?

If not, you can simply call it

amd_gpu_root_ids

to mean *exactly* what they are. PCI IDs on the GPU.

> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
> + {}
> +};
> +
> +static const struct pci_device_id amd_noncpu_nb_misc_ids[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
> + {}
> +};
> +
> +static const struct pci_device_id amd_noncpu_nb_link_ids[] = {
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
> + {}
> +};
> +
> const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
> { 0x00, 0x18, 0x20 },
> { 0xff, 0x00, 0x20 },
> @@ -230,11 +247,16 @@ int amd_cache_northbridges(void)
> const struct pci_device_id *misc_ids = amd_nb_misc_ids;
> const struct pci_device_id *link_ids = amd_nb_link_ids;
> const struct pci_device_id *root_ids = amd_root_ids;
> +
> + const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids;
> + const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids;
> + const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids;
> +
> struct pci_dev *root, *misc, *link;
> struct amd_northbridge *nb;
> u16 roots_per_misc = 0;
> - u16 misc_count = 0;
> - u16 root_count = 0;
> + u16 misc_count = 0, misc_count_noncpu = 0;
> + u16 root_count = 0, root_count_noncpu = 0;
> u16 i, j;
>
> if (amd_northbridges.num)
> @@ -253,10 +275,16 @@ int amd_cache_northbridges(void)
> if (!misc_count)
> return -ENODEV;
>
> + while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL)
> + misc_count_noncpu++;
> +
> root = NULL;
> while ((root = next_northbridge(root, root_ids)) != NULL)
> root_count++;
>
> + while ((root = next_northbridge(root, noncpu_root_ids)) != NULL)
> + root_count_noncpu++;
> +
> if (root_count) {
> roots_per_misc = root_count / misc_count;
>
> @@ -270,15 +298,28 @@ int amd_cache_northbridges(void)
> }
> }
>
> - nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
> + if (misc_count_noncpu) {
> + /*
> + * The first non-CPU Node ID starts at 8 even if there are fewer
> + * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb
> + * indexing scheme, allocate the number of GPU nodes plus 8.
> + * Some allocated amd_northbridge structures will go unused when
> + * the number of CPU nodes is less than 8, but this tradeoff is to
> + * keep things relatively simple.

Why simple?

What's wrong with having

[node IDs][GPU node IDs]

i.e., the usual nodes come first and the GPU ones after it.

You enumerate everything properly here so you can control what goes
where. Which means, you don't need this NONCPU_NODE_INDEX non-sense at
all.

Hmmm?
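
Something like this, roughly - a sketch only, assuming nothing else
relies on the GPU nodes sitting at a fixed index of 8:

	/* GPU nodes follow the CPU nodes immediately, no gap: */
	amd_northbridges.num = misc_count + misc_count_noncpu;

	nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
	if (!nb)
		return -ENOMEM;

and then the GPU loop simply starts where the CPU nodes end:

	for (i = misc_count; i < amd_northbridges.num; i++) {
		node_to_amd_nb(i)->root = root =
			next_northbridge(root, noncpu_root_ids);
		node_to_amd_nb(i)->misc = misc =
			next_northbridge(misc, noncpu_misc_ids);
		node_to_amd_nb(i)->link = link =
			next_northbridge(link, noncpu_link_ids);
	}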

> + */
> + amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu;
> + } else {
> + amd_northbridges.num = misc_count;
> + }
> +
> + nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
> if (!nb)
> return -ENOMEM;
>
> amd_northbridges.nb = nb;
> - amd_northbridges.num = misc_count;
>
> link = misc = root = NULL;
> - for (i = 0; i < amd_northbridges.num; i++) {
> + for (i = 0; i < misc_count; i++) {
> node_to_amd_nb(i)->root = root =
> next_northbridge(root, root_ids);
> node_to_amd_nb(i)->misc = misc =
> @@ -299,6 +340,18 @@ int amd_cache_northbridges(void)
> root = next_northbridge(root, root_ids);
> }
>
> + if (misc_count_noncpu) {
> + link = misc = root = NULL;
> + for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) {

So this is not keeping things relatively simple - this is making you
jump to the GPU nodes to prepare them too which is making them special.

> + node_to_amd_nb(i)->root = root =
> + next_northbridge(root, noncpu_root_ids);
> + node_to_amd_nb(i)->misc = misc =
> + next_northbridge(misc, noncpu_misc_ids);
> + node_to_amd_nb(i)->link = link =
> + next_northbridge(link, noncpu_link_ids);

And seeing how you put those pointers in ->root, ->misc and ->link,
you can just as well drop those noncpu_*_ids and put the aldebaran
PCI IDs simply in amd_root_ids, amd_nb_misc_ids and amd_nb_link_ids
respectively.

Because to this code, the RAS functionality is no different than any
other CPU because, well, the interface is those PCI devices. So the
thing doesn't care if it is GPU or not.

So you don't need any of that separation between GPU and CPU nodes when
it comes to the RAS code.

Makes sense?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-08-27 10:25:46

by Borislav Petkov

Subject: Re: [PATCH v3 2/3] EDAC/mce_amd: Extract node id from MCA_IPID

On Tue, Aug 24, 2021 at 12:24:36AM +0530, Naveen Krishna Chatradhi wrote:
> On SMCA banks of the NONCPU nodes, the node id information is
> available in MCA_IPID[47:44](InstanceIdHi).
>
> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> Reviewed-by: Yazen Ghannam <[email protected]>
> ---
> Changes since v2:
> 1. Modified subject and commit message
> 2. Added Reviewed by Yazen Ghannam
>
> drivers/edac/mce_amd.c | 19 +++++++++++++++++--
> 1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> index 27d56920b469..1398032ba25a 100644
> --- a/drivers/edac/mce_amd.c
> +++ b/drivers/edac/mce_amd.c
> @@ -1072,8 +1072,23 @@ static void decode_smca_error(struct mce *m)
> if (xec < smca_mce_descs[bank_type].num_descs)
> pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
>
> - if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
> - decode_dram_ecc(topology_die_id(m->extcpu), m);
> + if (xec == 0 && decode_dram_ecc) {
> + int node_id = 0;
> +
> + if (bank_type == SMCA_UMC) {
> + node_id = topology_die_id(m->extcpu);
> + } else if (bank_type == SMCA_UMC_V2) {
> + /*
> + * SMCA_UMC_V2 is used on the noncpu nodes, extract

Above "NONCPU", here "noncpu", I don't like that "noncpu" nomenclature.
I wonder if we can do without it...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-08-27 11:33:06

by Borislav Petkov

Subject: Re: [PATCH v3 3/3] EDAC/amd64: Enumerate memory on noncpu nodes

On Tue, Aug 24, 2021 at 12:24:37AM +0530, Naveen Krishna Chatradhi wrote:
> On newer heterogeneous systems the data fabrics of the CPUs and GPUs
> are connected directly via a custom links.
>
> This patch modifies the amd64_edac module to handle the HBM memory
> enumeration leveraging the existing edac and the amd64 specific data
> structures.
>
> Define PCI IDs and ops for Aldebaran GPUs in the family_types array.
> The UMC Phys on GPU nodes are enumerated as csrows and the UMC channels
> connected to HBMs are enumerated as ranks.
> Define a function to find the UMCv2 channel number.
> Define a function to calculate the base address of the UMCv2 registers.
> ECC is enabled by default on HBMs.
> Add debug information for UMCv2 channel registers.

Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.

Also, do

$ git grep 'This patch' Documentation/process

for more details.

What is more, a commit message should not explain *what* the patch is
doing - that should be obvious from the diff itself. Rather, it should
concentrate more on the *why* it is doing it.

> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>

This SOB chain is wrong. It suggests Muralidhara is the author but his
From: like in patch 1 is missing here.

> Cc: Yazen Ghannam <[email protected]>
> ---
> Changes since v2:
> 1. Restored line deletions and handled minor comments
> 2. Modified commit message and some of the function comments
> 3. variable df_inst_id is introduced instead of umc_num
>
> drivers/edac/amd64_edac.c | 219 +++++++++++++++++++++++++++++++++-----
> drivers/edac/amd64_edac.h | 28 +++++
> 2 files changed, 222 insertions(+), 25 deletions(-)

...

> @@ -1097,6 +1103,15 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
>
> edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
>
> + if (pvt->is_noncpu) {
> + cs_mode = f17_get_cs_mode(cs0, ctrl, pvt);
> + for_each_chip_select(cs0, ctrl, pvt) {
> + size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
> + amd64_info(EDAC_MC ": %d: %5dMB\n", cs0, size0);
> + }
> + return;
> + }

No, define a separate debug_display_dimm_sizes_gpu() and put all the
GPU-specific dumping in there instead of sprinkling them around the code
like that.
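
Something along these lines, carved out of your hunk above (a sketch,
not tested):

static void debug_display_dimm_sizes_gpu(struct amd64_pvt *pvt, u8 ctrl)
{
	int cs_mode, size, cs;

	edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);

	/* The GPU chip selects are all enabled, see f17_get_cs_mode(): */
	cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY;

	for_each_chip_select(cs, ctrl, pvt) {
		size = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs);
		amd64_info(EDAC_MC ": %d: %5dMB\n", cs, size);
	}
}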

> +
> for (dimm = 0; dimm < 2; dimm++) {
> cs0 = dimm * 2;
> cs1 = dimm * 2 + 1;
> @@ -1121,10 +1136,15 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)

Ditto: __dump_misc_regs_gpu()

> umc_base = get_umc_base(i);
> umc = &pvt->umc[i];
>
> - edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
> + if (!pvt->is_noncpu)
> + edac_dbg(1, "UMC%d DIMM cfg: 0x%x\n", i, umc->dimm_cfg);
> edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
> edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
> edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
> + if (pvt->is_noncpu) {
> + edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i);
> + goto dimm_size;
> + }
>
> amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ECC_BAD_SYMBOL, &tmp);
> edac_dbg(1, "UMC%d ECC bad symbol: 0x%x\n", i, tmp);

...

> +static void read_noncpu_umc_base_mask(struct amd64_pvt *pvt)
> +{
> + u32 base_reg, mask_reg;
> + u32 *base, *mask;
> + int umc, cs;
> +
> + for_each_umc(umc) {
> + for_each_chip_select(cs, umc, pvt) {
> + base_reg = get_noncpu_umc_base(umc, cs) + UMCCH_BASE_ADDR;
> + base = &pvt->csels[umc].csbases[cs];
> +
> + if (!amd_smn_read(pvt->mc_node_id, base_reg, base)) {
> + edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
> + umc, cs, *base, base_reg);
> + }
> +
> + mask_reg = get_noncpu_umc_base(umc, cs) + UMCCH_ADDR_MASK;
> + mask = &pvt->csels[umc].csmasks[cs];
> +
> + if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask)) {
> + edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
> + umc, cs, *mask, mask_reg);
> + }
> + }
> + }
> +}

This code pretty-much duplicates what read_umc_base_mask() does - pls
add a common helper which is used by both CPU and GPU.
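
The SMN read plus the debug printout, for example, can be shared - a
rough sketch (helper name illustrative, not tested):

static void read_base_mask_pair(struct amd64_pvt *pvt, int umc, int cs,
				u32 base_reg, u32 mask_reg)
{
	u32 *base = &pvt->csels[umc].csbases[cs];
	u32 *mask = &pvt->csels[umc].csmasks[cs];

	if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
		edac_dbg(0, "  DCSB%d[%d]=0x%08x reg: 0x%x\n",
			 umc, cs, *base, base_reg);

	if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
		edac_dbg(0, "  DCSM%d[%d]=0x%08x reg: 0x%x\n",
			 umc, cs, *mask, mask_reg);
}

Then the CPU and the GPU paths only differ in how they compute the
register offsets per chip select.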

> +
> static void read_umc_base_mask(struct amd64_pvt *pvt)
> {
> u32 umc_base_reg, umc_base_reg_sec;
> @@ -1288,8 +1341,12 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)
>
> prep_chip_selects(pvt);
>
> - if (pvt->umc)
> - return read_umc_base_mask(pvt);
> + if (pvt->umc) {
> + if (pvt->is_noncpu)
> + return read_noncpu_umc_base_mask(pvt);
> + else
> + return read_umc_base_mask(pvt);
> + }
>
> for_each_chip_select(cs, 0, pvt) {
> int reg0 = DCSB0 + (cs * 4);
> @@ -1335,6 +1392,11 @@ static void determine_memory_type(struct amd64_pvt *pvt)
> u32 dram_ctrl, dcsm;
>
> if (pvt->umc) {
> + if (pvt->is_noncpu) {
> + pvt->dram_type = MEM_HBM2;
> + return;
> + }

I don't like this sprinkling of "if (pvt->is_noncpu)" everywhere,
at all. Please define a separate read_mc_regs_df() or so which
contains only the needed functionality which you can carve out from
read_mc_regs().

> +
> if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
> pvt->dram_type = MEM_LRDDR4;
> else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
> @@ -1724,7 +1786,10 @@ static int f17_early_channel_count(struct amd64_pvt *pvt)
>
> /* SDP Control bit 31 (SdpInit) is clear for unused UMC channels */
> for_each_umc(i)
> - channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
> + if (pvt->is_noncpu)
> + channels += pvt->csels[i].b_cnt;
> + else
> + channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
>
> amd64_info("MCT channel count: %d\n", channels);
>

No, a separate gpu_early_channel_count() is needed here. There's a
reason for those function pointers getting assigned depending on family.
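
Roughly (untested):

static int gpu_early_channel_count(struct amd64_pvt *pvt)
{
	int i, channels = 0;

	/* The memory channels on the GPU nodes are all populated. */
	for_each_umc(i)
		channels += pvt->csels[i].b_cnt;

	amd64_info("MCT channel count: %d\n", channels);

	return channels;
}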

> @@ -1865,6 +1930,12 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> u32 msb, weight, num_zero_bits;
> int dimm, size = 0;
>
> + if (pvt->is_noncpu) {
> + addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
> + /* The memory channels in case of GPUs are fully populated */
> + goto skip_noncpu;
> + }
> +

Ditto.

> /* No Chip Selects are enabled. */
> if (!cs_mode)
> return size;
> @@ -1890,6 +1961,7 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> else
> addr_mask_orig = pvt->csels[umc].csmasks[dimm];
>
> + skip_noncpu:
> /*
> * The number of zero bits in the mask is equal to the number of bits
> * in a full mask minus the number of bits in the current mask.
> @@ -2635,6 +2707,16 @@ static struct amd64_family_type family_types[] = {
> .dbam_to_cs = f17_addr_mask_to_cs_size,
> }
> },
> + [ALDEBARAN_GPUS] = {
> + .ctl_name = "ALDEBARAN",
> + .f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0,
> + .f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6,
> + .max_mcs = 4,
> + .ops = {
> + .early_channel_count = f17_early_channel_count,
> + .dbam_to_cs = f17_addr_mask_to_cs_size,
> + }
> + },

Here you define those GPU-specific function pointers which you then call.

> @@ -2890,6 +2972,30 @@ static int find_umc_channel(struct mce *m)
> return (m->ipid & GENMASK(31, 0)) >> 20;
> }
>
> +/*
> + * The CPUs have one channel per UMC, so a UMC number is equivalent to a
> + * channel number. The NONCPUs have 8 channels per UMC, so the UMC number no
> + * longer works as a channel number.
> + * The channel number within a NONCPU UMC is given in MCA_IPID[15:12].
> + * However, the IDs are split such that two UMC values go to one UMC, and
> + * the channel numbers are split in two groups of four.
> + *
> + * Refer to the comment on get_noncpu_umc_base() in amd64_edac.h
> + *
> + * For example,
> + * UMC0 CH[3:0] = 0x0005[3:0]000
> + * UMC0 CH[7:4] = 0x0015[3:0]000
> + * UMC1 CH[3:0] = 0x0025[3:0]000
> + * UMC1 CH[7:4] = 0x0035[3:0]000
> + */
> +static int find_umc_channel_noncpu(struct mce *m)
> +{
> + u8 umc = find_umc_channel(m);
> + u8 ch = ((m->ipid >> 12) & 0xf);
> +
> + return umc % 2 ? (ch + 4) : ch;
> +}
> +
> static void decode_umc_error(int node_id, struct mce *m)
> {
> u8 ecc_type = (m->status >> 45) & 0x3;
> @@ -2897,6 +3003,7 @@ static void decode_umc_error(int node_id, struct mce *m)
> struct amd64_pvt *pvt;
> struct err_info err;
> u64 sys_addr;
> + u8 df_inst_id;

You don't need that variable and can work with err.channel just fine.

> mci = edac_mc_find(node_id);
> if (!mci)
> @@ -2909,7 +3016,22 @@ static void decode_umc_error(int node_id, struct mce *m)
> if (m->status & MCI_STATUS_DEFERRED)
> ecc_type = 3;
>
> - err.channel = find_umc_channel(m);
> + if (pvt->is_noncpu) {
> + /*
> + * The NONCPUs have one Chip Select per UMC, so the UMC number
> + * can be used as the Chip Select number. However, the UMC number
> + * is split in the ID value so it's necessary to divide by 2.
> + */
> + err.csrow = find_umc_channel(m) / 2;
> + err.channel = find_umc_channel_noncpu(m);
> + /* On NONCPUs, instance id is calculated as below. */
> + df_inst_id = err.csrow * 8 + err.channel;

err.channel += err.csrow * 8;

tadaaa!

> + } else {
> + /* On CPUs, "Channel"="UMC Number"="DF Instance ID". */
> + err.channel = find_umc_channel(m);
> + err.csrow = m->synd & 0x7;
> + df_inst_id = err.channel;
> + }
>
> if (!(m->status & MCI_STATUS_SYNDV)) {
> err.err_code = ERR_SYND;
> @@ -2925,9 +3047,7 @@ static void decode_umc_error(int node_id, struct mce *m)
> err.err_code = ERR_CHANNEL;
> }
>
> - err.csrow = m->synd & 0x7;
> -
> - if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
> + if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, df_inst_id, &sys_addr)) {
> err.err_code = ERR_NORM_ADDR;
> goto log_error;
> }
> @@ -3054,15 +3174,21 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
>
> /* Read registers from each UMC */
> for_each_umc(i) {
> + if (pvt->is_noncpu)
> + umc_base = get_noncpu_umc_base(i, 0);
> + else
> + umc_base = get_umc_base(i);
>
> - umc_base = get_umc_base(i);
> umc = &pvt->umc[i];
>
> - amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
> amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
> amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
> amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
> - amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
> +
> + if (!pvt->is_noncpu) {
> + amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
> + amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi);
> + }
> }
> }
>
> @@ -3144,7 +3270,9 @@ static void read_mc_regs(struct amd64_pvt *pvt)
> determine_memory_type(pvt);
> edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
>
> - determine_ecc_sym_sz(pvt);
> + /* ECC symbol size is not available on NONCPU nodes */
> + if (!pvt->is_noncpu)
> + determine_ecc_sym_sz(pvt);
> }
>
> /*
> @@ -3232,15 +3360,21 @@ static int init_csrows_df(struct mem_ctl_info *mci)

No, separate function: init_csrows_gpu()

> continue;
>
> empty = 0;
> - dimm = mci->csrows[cs]->channels[umc]->dimm;
> + if (pvt->is_noncpu) {
> + dimm = mci->csrows[umc]->channels[cs]->dimm;
> + dimm->edac_mode = EDAC_SECDED;
> + dimm->dtype = DEV_X16;
> + } else {
> + dimm = mci->csrows[cs]->channels[umc]->dimm;
> + dimm->edac_mode = edac_mode;
> + dimm->dtype = dev_type;
> + }
>
> edac_dbg(1, "MC node: %d, csrow: %d\n",
> pvt->mc_node_id, cs);
>
> dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
> dimm->mtype = pvt->dram_type;
> - dimm->edac_mode = edac_mode;
> - dimm->dtype = dev_type;
> dimm->grain = 64;
> }
> }
> @@ -3505,7 +3639,9 @@ static bool ecc_enabled(struct amd64_pvt *pvt)
>
> umc_en_mask |= BIT(i);
>
> - if (umc->umc_cap_hi & UMC_ECC_ENABLED)
> + /* ECC is enabled by default on NONCPU nodes */
> + if (pvt->is_noncpu ||
> + (umc->umc_cap_hi & UMC_ECC_ENABLED))
> ecc_en_mask |= BIT(i);

Separate function pls.

I guess you get the idea - you simply define a separate function for
the family you're adding support for instead of sprinkling if (bla)
everywhere.

If functionality is duplicated, you define a common helper.

Feel free to ask if something's unclear.
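
For illustration, the pattern being asked for is one implementation per
family, hooked up once through the ops table instead of branching at every
call site. A minimal sketch (the v4 patches later in this thread end up
doing exactly this with a gpu_ops table):

	static int gpu_early_channel_count(struct amd64_pvt *pvt)
	{
		int i, channels = 0;

		/* The memory channels on GPU nodes are fully populated. */
		for_each_umc(i)
			channels += pvt->csels[i].b_cnt;

		return channels;
	}

	static const struct low_ops gpu_ops = {
		.early_channel_count	= gpu_early_channel_count,
		/* ...other GPU-specific callbacks... */
	};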

...

> @@ -3804,6 +3963,9 @@ static int probe_one_instance(unsigned int nid)
> struct ecc_settings *s;
> int ret;
>
> + if (!F3)
> + return 0;
> +
> ret = -ENOMEM;
> s = kzalloc(sizeof(struct ecc_settings), GFP_KERNEL);
> if (!s)
> @@ -3815,6 +3977,9 @@ static int probe_one_instance(unsigned int nid)
> if (!pvt)
> goto err_settings;
>
> + if (nid >= NONCPU_NODE_INDEX)
> + pvt->is_noncpu = true;

This is silly and error-prone. Proper detection should happen in
per_family_init() and there you should read out from the hardware
whether this is a GPU or a CPU node.

Then, you should put an enum type in amd64_family_type which has

{ FAM_TYPE_CPU, FAM_TYPE_GPU, ... }

etc and the places where you need to check whether it is CPU or a GPU,
test those types.
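
An illustrative sketch of that suggestion (names here are hypothetical,
not the final patch):

	enum amd_node_type {
		FAM_TYPE_CPU,
		FAM_TYPE_GPU,
	};

	/* sketched new member in struct amd64_family_type: */
	/*	enum amd_node_type node_type;	*/

	/* at a use site, instead of testing pvt->is_noncpu: */
	if (fam_type->node_type == FAM_TYPE_GPU)
		init_csrows_gpu(mci);
	else
		init_csrows_df(mci);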

> +
> pvt->mc_node_id = nid;
> pvt->F3 = F3;
>
> @@ -3888,6 +4053,10 @@ static void remove_one_instance(unsigned int nid)
> struct mem_ctl_info *mci;
> struct amd64_pvt *pvt;
>
> + /* Nothing to remove for the placeholder entries */
> + if (!F3)
> + return;
> +
> /* Remove from EDAC CORE tracking list */
> mci = edac_mc_del_mc(&F3->dev);
> if (!mci)
> diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
> index 85aa820bc165..0844f004c90b 100644
> --- a/drivers/edac/amd64_edac.h
> +++ b/drivers/edac/amd64_edac.h
> @@ -126,6 +126,8 @@
> #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
> #define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650
> #define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14D0
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14D6
>
> /*
> * Function 1 - Address Map
> @@ -298,6 +300,7 @@ enum amd_families {
> F17_M60H_CPUS,
> F17_M70H_CPUS,
> F19_CPUS,
> + ALDEBARAN_GPUS,
> NUM_FAMILIES,
> };
>
> @@ -389,6 +392,9 @@ struct amd64_pvt {
> enum mem_type dram_type;
>
> struct amd64_umc *umc; /* UMC registers */
> + char buf[20];

A 20 char buffer in every pvt structure just so that you can sprintf
into it when it is a GPU? Err, I don't think so.

You can do the same thing as with the CPUs - the same string for every
pvt instance.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

From: Yazen Ghannam
Date: 2021-09-01 19:03:19
Subject: Re: [PATCH v3 1/3] x86/amd_nb: Add support for northbridges on Aldebaran

On Wed, Aug 25, 2021 at 12:42:43PM +0200, Borislav Petkov wrote:
> On Tue, Aug 24, 2021 at 12:24:35AM +0530, Naveen Krishna Chatradhi wrote:
...
> >
> > The GPU nodes are enumerated in sequential order based on the
> > PCI hierarchy, and the first GPU node is assumed to have an "AMD Node
> > ID" value of 8 (the second GPU node has 9, etc.).
>
> What does that mean? The GPU nodes are simply numerically after the CPU
> nodes or how am I to understand this nomenclature?
>

Yes, the GPU nodes will be numerically after the CPU nodes. However, there
will be a gap in the "Node ID" values. For example, if there is one CPU node
and two GPU nodes, then the "Node ID" values will look like this:

CPU Node0 -> System Node ID 0
GPU Node0 -> System Node ID 8
GPU Node1 -> System Node ID 9

...
> > + * of 8 (the second GPU node has 9, etc.).
> > + */
> > +#define NONCPU_NODE_INDEX 8
>
> Why is this assumed? Can it instead be read from the hardware somewhere?
> Or there simply won't be more than 8 CPU nodes anyway? Not at least in
> the near future?
>

Yes, the intention is to leave a big enough gap for at least the foreseeable
future.

> I'd prefer stuff to be read out directly from the hardware so that when
> the hardware changes, the code just works instead of doing assumptions
> which get invalidated later.
>

So after going through the latest documentation and asking one of our
hardware folks, it looks like we have an option to read this value from one of
the Data Fabric registers. Hopefully, whatever solution we settle on will
stick for a while. The Data Fabric registers are not architectural, and
registers and fields have changed between model groups.

...
> > +static const struct pci_device_id amd_noncpu_root_ids[] = {
>
> Why is that "noncpu" thing everywhere? Is this thing going to be
> anything else besides a GPU?
>
> If not, you can simply call it
>
> amd_gpu_root_ids
>
> to mean *exactly* what they are. PCI IDs on the GPU.
>

These devices aren't officially GPUs, since they don't have graphics/video
capabilities. Can we come up with a new term for this class of devices? Maybe
accelerators or something?

In any case, GPU is still used throughout documentation and code, so it's fair
to just stick with "gpu".

...
> >
> > - nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
> > + if (misc_count_noncpu) {
> > + /*
> > + * The first non-CPU Node ID starts at 8 even if there are fewer
> > + * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb
> > + * indexing scheme, allocate the number of GPU nodes plus 8.
> > + * Some allocated amd_northbridge structures will go unused when
> > + * the number of CPU nodes is less than 8, but this tradeoff is to
> > + * keep things relatively simple.
>
> Why simple?
>
> What's wrong with having
>
> [node IDs][GPU node IDs]
>
> i.e., the usual nodes come first and the GPU ones after it.
>
> You enumerate everything properly here so you can control what goes
> where. Which means, you don't need this NONCPU_NODE_INDEX non-sense at
> all.
>
> Hmmm?
>

We use the Node ID to index into the amd_northbridge.nb array, e.g. in
node_to_amd_nb().

We can get the Node ID of a GPU node when processing an MCA error as in Patch
2 of this set. The hardware is going to give us a value of 8 or more.

So, for example, if we set up the "nb" array like this for 1 CPU and 2 GPUs:
[ID:Type] : [0: CPU], [8: GPU], [9: GPU]

Then I think we'll need some more processing at runtime to map, for example,
an error from GPU Node 9 to NB array Index 2, etc.

Or we can manage this at init time like this:
[0: CPU], [1: NULL], [2: NULL], [3: NULL], [4: NULL], [5: NULL], [6: NULL],
[7, NULL], [8: GPU], [9: GPU]

And at runtime, the code which does Node ID to NB entry just works. This
applies to node_to_amd_nb(), places where we loop over amd_nb_num(), etc.
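
A sketch of that second option, mirroring the hunk quoted earlier
(NONCPU_NODE_INDEX being the assumed fixed offset of 8):

	amd_northbridges.num = misc_count_noncpu ?
			       NONCPU_NODE_INDEX + misc_count_noncpu :
			       misc_count;

	nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge),
		     GFP_KERNEL);

	/* Entries [misc_count, NONCPU_NODE_INDEX) simply stay NULL. */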

What do you think?

Thanks,
Yazen

From: Yazen Ghannam
Date: 2021-09-01 20:21:11
Subject: Re: [PATCH v3 2/3] EDAC/mce_amd: Extract node id from MCA_IPID

On Fri, Aug 27, 2021 at 12:24:29PM +0200, Borislav Petkov wrote:
> On Tue, Aug 24, 2021 at 12:24:36AM +0530, Naveen Krishna Chatradhi wrote:
> > On SMCA banks of the NONCPU nodes, the node id information is
> > available in MCA_IPID[47:44](InstanceIdHi).
> >
> > Signed-off-by: Muralidhara M K <[email protected]>
> > Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> > Reviewed-by: Yazen Ghannam <[email protected]>
> > ---
> > Changes since v2:
> > 1. Modified subject and commit message
> > 2. Added Reviewed by Yazen Ghannam
> >
> > drivers/edac/mce_amd.c | 19 +++++++++++++++++--
> > 1 file changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> > index 27d56920b469..1398032ba25a 100644
> > --- a/drivers/edac/mce_amd.c
> > +++ b/drivers/edac/mce_amd.c
> > @@ -1072,8 +1072,23 @@ static void decode_smca_error(struct mce *m)
> > if (xec < smca_mce_descs[bank_type].num_descs)
> > pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
> >
> > - if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
> > - decode_dram_ecc(topology_die_id(m->extcpu), m);
> > + if (xec == 0 && decode_dram_ecc) {
> > + int node_id = 0;
> > +
> > + if (bank_type == SMCA_UMC) {
> > + node_id = topology_die_id(m->extcpu);
> > + } else if (bank_type == SMCA_UMC_V2) {
> > + /*
> > + * SMCA_UMC_V2 is used on the noncpu nodes, extract
>
> Above "NONCPU", here "noncpu", I don't like that "noncpu" nomenclature.
> I wonder if we can do without it...
>

Yeah, I think that's fair.

Thanks,
Yazen

From: Yazen Ghannam
Date: 2021-09-01 20:22:10
Subject: Re: [PATCH v3 3/3] EDAC/amd64: Enumerate memory on noncpu nodes

On Fri, Aug 27, 2021 at 01:30:57PM +0200, Borislav Petkov wrote:
> On Tue, Aug 24, 2021 at 12:24:37AM +0530, Naveen Krishna Chatradhi wrote:

...

> > @@ -1335,6 +1392,11 @@ static void determine_memory_type(struct amd64_pvt *pvt)
> > u32 dram_ctrl, dcsm;
> >
> > if (pvt->umc) {
> > + if (pvt->is_noncpu) {
> > + pvt->dram_type = MEM_HBM2;
> > + return;
> > + }
>
> I don't like this sprinkling of "if (pvt->is_noncpu)" everywhere,
> at all. Please define a separate read_mc_regs_df() or so which
> contains only the needed functionalty which you can carve out from
> read_mc_regs().
>

I like this idea.

> > +
> > if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
> > pvt->dram_type = MEM_LRDDR4;
> > else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
> > @@ -1724,7 +1786,10 @@ static int f17_early_channel_count(struct amd64_pvt *pvt)
> >
> > /* SDP Control bit 31 (SdpInit) is clear for unused UMC channels */
> > for_each_umc(i)
> > - channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
> > + if (pvt->is_noncpu)
> > + channels += pvt->csels[i].b_cnt;
> > + else
> > + channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
> >
> > amd64_info("MCT channel count: %d\n", channels);
> >
>
> No, a separate gpu_early_channel_count() is needed here. There's a
> reason for those function pointers getting assigned depending on family.
>

Good point.

...
> > +/*
> > + * The CPUs have one channel per UMC, so a UMC number is equivalent to a
> > + * channel number. The NONCPUs have 8 channels per UMC, so the UMC number no
> > + * longer works as a channel number.
> > + * The channel number within a NONCPU UMC is given in MCA_IPID[15:12].
> > + * However, the IDs are split such that two UMC values go to one UMC, and
> > + * the channel numbers are split in two groups of four.
> > + *
> > + * Refer comment on get_noncpu_umc_base() from amd64_edac.h
> > + *
> > + * For example,
> > + * UMC0 CH[3:0] = 0x0005[3:0]000
> > + * UMC0 CH[7:4] = 0x0015[3:0]000
> > + * UMC1 CH[3:0] = 0x0025[3:0]000
> > + * UMC1 CH[7:4] = 0x0035[3:0]000
> > + */
> > +static int find_umc_channel_noncpu(struct mce *m)
> > +{
> > + u8 umc = find_umc_channel(m);
> > + u8 ch = ((m->ipid >> 12) & 0xf);
> > +
> > + return umc % 2 ? (ch + 4) : ch;
> > +}
> > +
> > static void decode_umc_error(int node_id, struct mce *m)
> > {
> > u8 ecc_type = (m->status >> 45) & 0x3;
> > @@ -2897,6 +3003,7 @@ static void decode_umc_error(int node_id, struct mce *m)
> > struct amd64_pvt *pvt;
> > struct err_info err;
> > u64 sys_addr;
> > + u8 df_inst_id;
>
> You don't need that variable and can work with err.channel just fine.
>
> > mci = edac_mc_find(node_id);
> > if (!mci)
> > @@ -2909,7 +3016,22 @@ static void decode_umc_error(int node_id, struct mce *m)
> > if (m->status & MCI_STATUS_DEFERRED)
> > ecc_type = 3;
> >
> > - err.channel = find_umc_channel(m);
> > + if (pvt->is_noncpu) {
> > + /*
> > + * The NONCPUs have one Chip Select per UMC, so the UMC number
> > + * can be used as the Chip Select number. However, the UMC number
> > + * is split in the ID value so it's necessary to divide by 2.
> > + */
> > + err.csrow = find_umc_channel(m) / 2;
> > + err.channel = find_umc_channel_noncpu(m);
> > + /* On NONCPUs, instance id is calculated as below. */
> > + df_inst_id = err.csrow * 8 + err.channel;
>
> err.channel += err.csrow * 8;
>
> tadaaa!
>

err.channel still needs to be used in error_address_to_page_and_offset()
below. So changing it here messes up what's reported to EDAC.

...
> > @@ -3804,6 +3963,9 @@ static int probe_one_instance(unsigned int nid)
> > struct ecc_settings *s;
> > int ret;
> >
> > + if (!F3)
> > + return 0;
> > +
> > ret = -ENOMEM;
> > s = kzalloc(sizeof(struct ecc_settings), GFP_KERNEL);
> > if (!s)
> > @@ -3815,6 +3977,9 @@ static int probe_one_instance(unsigned int nid)
> > if (!pvt)
> > goto err_settings;
> >
> > + if (nid >= NONCPU_NODE_INDEX)
> > + pvt->is_noncpu = true;
>
> This is silly and error-prone. Proper detection should happen in
> per_family_init() and there you should read out from the hardware
> whether this is a GPU or a CPU node.
>
> Then, you should put an enum type in amd64_family_type which has
>
> { FAM_TYPE_CPU, FAM_TYPE_GPU, ... }
>
> etc and the places where you need to check whether it is CPU or a GPU,
> test those types.
>

This is a good idea. But we have a global *fam_type, so this should be moved
into struct amd64_pvt, if possible. Then each node can have its own fam_type.

..
> > @@ -389,6 +392,9 @@ struct amd64_pvt {
> > enum mem_type dram_type;
> >
> > struct amd64_umc *umc; /* UMC registers */
> > + char buf[20];
>
> A 20 char buffer in every pvt structure just so that you can sprintf
> into it when it is a GPU? Err, I don't think so.
>
> You can do the same thing as with the CPUs - the same string for every
> pvt instance.
>

Fair point. I like the idea of having unique names though. Is this possible
with the current EDAC framework? Or is it not worth it?

Thanks,
Yazen

From: Borislav Petkov
Date: 2021-09-02 17:34:03
Subject: Re: [PATCH v3 1/3] x86/amd_nb: Add support for northbridges on Aldebaran

On Wed, Sep 01, 2021 at 06:17:21PM +0000, Yazen Ghannam wrote:
> These devices aren't officially GPUs, since they don't have graphics/video
> capabilities. Can we come up with a new term for this class of devices? Maybe
> accelerators or something?
>
> In any case, GPU is still used throughout documentation and code, so it's fair
> to just stick with "gpu".

Hmm, yeah, everybody is talking about special-purpose processing units
now, i.e., accelerators or whatever they call them. I guess this is the
new fancy thing since sliced bread.

Well, what are those PCI IDs going to represent? Devices which have RAS
capabilities on them?

We have this nomenclature called "uncore" in the perf subsystem for
counters which are not part of the CPU core or whatever. But there we
use that term on AMD already so that might cause confusion.

But I guess the type of those devices doesn't matter for amd_nb.c,
right?

All that thing cares for is having an array of northbridges, each with
the respective PCI devices and that's it. So for amd_nb.c I think that
differentiation doesn't matter... but keep reading...

> We use the Node ID to index into the amd_northbridge.nb array, e.g. in
> node_to_amd_nb().
>
> We can get the Node ID of a GPU node when processing an MCA error as in Patch
> 2 of this set. The hardware is going to give us a value of 8 or more.
>
> So, for example, if we set up the "nb" array like this for 1 CPU and 2 GPUs:
> [ID:Type] : [0: CPU], [8: GPU], [9: GPU]
>
> Then I think we'll need some more processing at runtime to map, for example,
> an error from GPU Node 9 to NB array Index 2, etc.
>
> Or we can manage this at init time like this:
> [0: CPU], [1: NULL], [2: NULL], [3: NULL], [4: NULL], [5: NULL], [6: NULL],
> [7, NULL], [8: GPU], [9: GPU]
>
> And at runtime, the code which does Node ID to NB entry just works. This
> applies to node_to_amd_nb(), places where we loop over amd_nb_num(), etc.
>
> What do you think?

Ok, looking at patch 2, it does:

node_id = ((m->ipid >> 44) & 0xF);

So how ugly would it become if you do here:

node_id = ((m->ipid >> 44) & 0xF);
node_id -= accel_id_offset;

where that accel_id_offset is the thing you've read out from one of the
Data Fabric registers before?

This way, the gap between CPU IDs and accel IDs is gone and in the
software view, there is none.

Or are we reading other hardware registers which are aware of that gap
and we would have to remove it again to get the proper index? And if so,
and if it becomes real ugly, maybe we will have to bite the bullet and
do the gap in the array but that would be yucky...

Hmmm.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

From: Borislav Petkov
Date: 2021-09-08 19:07:42
Subject: Re: [PATCH v3 3/3] EDAC/amd64: Enumerate memory on noncpu nodes

On Wed, Sep 01, 2021 at 06:42:26PM +0000, Yazen Ghannam wrote:
> err.channel still needs to be used in error_address_to_page_and_offset()
> below.

I think you mean __log_ecc_error().

> This is a good idea. But we have a global *fam_type, so this should be moved
> into struct amd64_pvt, if possible. Then each node can have its own fam_type.

per_family_init() does assign stuff to pvt members so yes, we're saying
the same thing, practically.

> Fair point. I like the idea of having unique names though. Is this possible
> with the current EDAC framework? Or is it not worth it?

We don't have unique names for the CPU nodes:

[ 25.637486] EDAC MC0: Giving out device to module amd64_edac controller F17h_M30h: DEV 0000:00:18.3 (INTERRUPT)
[ 25.799554] EDAC MC1: Giving out device to module amd64_edac controller F17h_M30h: DEV 0000:00:19.3 (INTERRUPT)

why does it matter to have unique names for the accelerators?

If you wanna differentiate them, you can dump the PCI devs like above.

Just to make it clear - I'm not against it per-se - I'd just need a
stronger justification for doing this than just "I like the idea".

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

From: Yazen Ghannam
Date: 2021-09-14 00:50:22
Subject: Re: [PATCH v3 1/3] x86/amd_nb: Add support for northbridges on Aldebaran

On Thu, Sep 02, 2021 at 07:30:24PM +0200, Borislav Petkov wrote:
> On Wed, Sep 01, 2021 at 06:17:21PM +0000, Yazen Ghannam wrote:
> > These devices aren't officially GPUs, since they don't have graphics/video
> > capabilities. Can we come up with a new term for this class of devices? Maybe
> > accelerators or something?
> >
> > In any case, GPU is still used throughout documentation and code, so it's fair
> > to just stick with "gpu".
>
> Hmm, yeah, everybody is talking about special-purpose processing units
> now, i.e., accelerators or whatever they call them. I guess this is the
> new fancy thing since sliced bread.
>
> Well, what are those PCI IDs going to represent? Devices which have RAS
> capabilities on them?
>
> We have this nomenclature called "uncore" in the perf subsystem for
> counters which are not part of the CPU core or whatever. But there we
> use that term on AMD already so that might cause confusion.
>
> But I guess the type of those devices doesn't matter for amd_nb.c,
> right?
>
> All that thing cares for is having an array of northbridges, each with
> the respective PCI devices and that's it. So for amd_nb.c I think that
> differentiation doesn't matter... but keep reading...
>
> > We use the Node ID to index into the amd_northbridge.nb array, e.g. in
> > node_to_amd_nb().
> >
> > We can get the Node ID of a GPU node when processing an MCA error as in Patch
> > 2 of this set. The hardware is going to give us a value of 8 or more.
> >
> > So, for example, if we set up the "nb" array like this for 1 CPU and 2 GPUs:
> > [ID:Type] : [0: CPU], [8: GPU], [9: GPU]
> >
> > Then I think we'll need some more processing at runtime to map, for example,
> > an error from GPU Node 9 to NB array Index 2, etc.
> >
> > Or we can manage this at init time like this:
> > [0: CPU], [1: NULL], [2: NULL], [3: NULL], [4: NULL], [5: NULL], [6: NULL],
> > [7, NULL], [8: GPU], [9: GPU]
> >
> > And at runtime, the code which does Node ID to NB entry just works. This
> > applies to node_to_amd_nb(), places where we loop over amd_nb_num(), etc.
> >
> > What do you think?
>
> Ok, looking at patch 2, it does:
>
> node_id = ((m->ipid >> 44) & 0xF);
>
> So how ugly would it become if you do here:
>
> node_id = ((m->ipid >> 44) & 0xF);
> node_id -= accel_id_offset;
>
> where that accel_id_offset is the thing you've read out from one of the
> Data Fabric registers before?
>
> This way, the gap between CPU IDs and accel IDs is gone and in the
> software view, there is none.
>
> Or are we reading other hardware registers which are aware of that gap
> and we would have to remove it again to get the proper index? And if so,
> and if it becomes real ugly, maybe we will have to bite the bullet and
> do the gap in the array but that would be yucky...
>
> Hmmm.
>

I really like this idea. I've gone over the current and future code a few
times to make sure things are okay. As far as I can tell, this idea should
work most of the time, since the "node_id" value is mostly used to look up the
right devices in the nb array. But there is one case so far where the "real"
hardware node_id is needed during address translation. This case is in the new
code in review for Data Fabric v3.5, and it only applies to the GPU devices.

What do you think about having a couple of helper functions to go between the
hardware and Linux index IDs? Most cases will use "hardware -> Linux index",
and when needed there can be a "Linux index -> hardware".

I think we still need some piece of info to indicate a device is a GPU based
on its node_id. The AMD NB code doesn't need to know, but the address
translation code does. The AMD NB enumeration can be mostly generic. I think
it may be enough to save an "id offset" value and also a "first special index"
value. Then we can go back and forth between the appropriate values without
having to allocate a bunch of unused memory or hardcoding certain values.
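
A minimal sketch of such helpers, assuming the two values live in a small
map structure (the v4 series below introduces struct amd_node_map with
exactly these fields; the function names here are illustrative):

	struct amd_node_map {
		u16 gpu_node_start_id;	/* first hardware "AMD Node ID" on a GPU */
		u16 cpu_node_count;	/* CPU node count == first GPU array index */
	};

	/* hardware Node ID -> dense Linux index into the nb array */
	static u16 amd_node_to_index(struct amd_node_map *nmap, u16 hw_id)
	{
		if (hw_id < nmap->gpu_node_start_id)
			return hw_id;

		return nmap->cpu_node_count + (hw_id - nmap->gpu_node_start_id);
	}

	/* dense Linux index -> hardware Node ID, for address translation */
	static u16 amd_index_to_node(struct amd_node_map *nmap, u16 idx)
	{
		if (idx < nmap->cpu_node_count)
			return idx;

		return nmap->gpu_node_start_id + (idx - nmap->cpu_node_count);
	}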

Thanks for the idea!

-Yazen

From: Yazen Ghannam
Date: 2021-09-14 00:53:27
Subject: Re: [PATCH v3 3/3] EDAC/amd64: Enumerate memory on noncpu nodes

On Wed, Sep 08, 2021 at 08:41:46PM +0200, Borislav Petkov wrote:
> On Wed, Sep 01, 2021 at 06:42:26PM +0000, Yazen Ghannam wrote:
> > err.channel still needs to be used in error_address_to_page_and_offset()
> > below.
>
> I think you mean __log_ecc_error().
>

Yep, you're right.

> > This is a good idea. But we have a global *fam_type, so this should be moved
> > into struct amd64_pvt, if possible. Then each node can have its own fam_type.
>
> per_family_init() does assign stuff to pvt members so yes, we're saying
> the same thing, practically.
>
> > Fair point. I like the idea of having unique names though. Is this possible
> > with the current EDAC framework? Or is it not worth it?
>
> We don't have unique names for the CPU nodes:
>
> [ 25.637486] EDAC MC0: Giving out device to module amd64_edac controller F17h_M30h: DEV 0000:00:18.3 (INTERRUPT)
> [ 25.799554] EDAC MC1: Giving out device to module amd64_edac controller F17h_M30h: DEV 0000:00:19.3 (INTERRUPT)
>
> why does it matter to have unique names for the accelerators?
>
> If you wanna differentiate them, you can dump the PCI devs like above.
>
> Just to make it clear - I'm not against it per-se - I'd just need a
> stronger justification for doing this than just "I like the idea".
>

There isn't a strong reason at the moment. I think it may be one less hurdle
for users to go through when identifying a device. But, as you said, there are
other ways to differentiate devices.

Thanks,
Yazen

From: Borislav Petkov
Date: 2021-09-17 05:47:40
Subject: Re: [PATCH v3 1/3] x86/amd_nb: Add support for northbridges on Aldebaran

On Mon, Sep 13, 2021 at 06:07:30PM +0000, Yazen Ghannam wrote:
> I really like this idea. I've gone over the current and future code a few
> times to make sure things are okay. As far as I can tell, this idea should
> work most of the time, since the "node_id" value is mostly used to look up the
> right devices in the nb array. But there is one case so far where the "real"
> hardware node_id is needed during address translation.

Yap, I figured as much as this is kinda like the only place where you'd
care about the actual node id.

> This case is in the new code in review for Data Fabric v3.5, and it
> only applies to the GPU devices.
>
> What do you think about having a couple of helper functions to go between the
> hardware and Linux index IDs? Most cases will use "hardware -> Linux index",
> and when needed there can be a "Linux index -> hardware".

That's fine as long as it is properly documented what it does.

> I think we still need some piece of info to indicate a device is a GPU based
> on its node_id. The AMD NB code doesn't need to know, but the address
> translation code does. The AMD NB enumeration can be mostly generic. I think
> it may be enough to save an "id offset" value and also a "first special index"
> value. Then we can go back and forth between the appropriate values without
> having to allocate a bunch of unused memory or hardcoding certain values.

Well, since we're going to need this in the translation logic and that
is part of amd64_edac and there we said we'll move the family type up
into amd64_pvt so that you can have a family descriptor per node, then I
guess you're all set. :-)

> Thanks for the idea!

Sure, np.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: Re: [PATCH v3 1/3] x86/amd_nb: Add support for northbridges on Aldebaran

Hi Boris,

Apologies for the late reply. We were modifying the code based on the
conversation between yourself and Yazen.

I would like to answer some of your questions before submitting the next
version of the patch-set.

On 8/25/2021 4:12 PM, Borislav Petkov wrote:
> On Tue, Aug 24, 2021 at 12:24:35AM +0530, Naveen Krishna Chatradhi wrote:
>> From: Muralidhara M K <[email protected]>
>>
>> On newer systems the CPUs manage MCA errors reported from the GPUs.
>> Enumerate the GPU nodes with the AMD NB framework to support EDAC.
>>
>> This patch adds necessary code to manage the Aldebaran nodes along with
> Avoid having "This patch" or "This commit" in the commit message. It is
> tautologically useless.
>
> Also, do
>
> $ git grep 'This patch' Documentation/process
>
> for more details.
Sure thanks
>
> Also, what are the "Aldebaran nodes"?
>
> Something on a star which is 65 light years away?

Aldebaran is an AMD GPU name. The code, submitted as "[PATCH 000/159]
Aldebaran support"
<https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html>,
is part of the DRM framework.

>
>> the CPU nodes.
>>
>> The GPU nodes are enumerated in sequential order based on the
>> PCI hierarchy, and the first GPU node is assumed to have an "AMD Node
>> ID" value of 8 (the second GPU node has 9, etc.).
> What does that mean? The GPU nodes are simply numerically after the CPU
> nodes or how am I to understand this nomenclature?
>
>> Each Aldebaran GPU
>> package has 2 Data Fabrics, which are enumerated as 2 nodes.
>> With this implementation detail, the Data Fabric on the GPU nodes can be
>> accessed the same way as the Data Fabric on CPU nodes.
>>
>> Signed-off-by: Muralidhara M K <[email protected]>
>> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
>> Reviewed-by: Yazen Ghannam <[email protected]>
>> ---
>> Changes since v2: Added Reviewed-by Yazen Ghannam
>>
>> arch/x86/include/asm/amd_nb.h | 10 ++++++
>> arch/x86/kernel/amd_nb.c | 63 ++++++++++++++++++++++++++++++++---
>> include/linux/pci_ids.h | 1 +
>> 3 files changed, 69 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
>> index 455066a06f60..09905f6c7218 100644
>> --- a/arch/x86/include/asm/amd_nb.h
>> +++ b/arch/x86/include/asm/amd_nb.h
>> @@ -80,6 +80,16 @@ struct amd_northbridge_info {
>>
>> #ifdef CONFIG_AMD_NB
>>
>> +/*
>> + * On newer heterogeneous systems the data fabrics of the CPUs and GPUs
>> + * are connected directly via a custom links, like is done with
> s/ a //
>
>> + * 2 socket CPU systems and also within a socket for Multi-chip Module
>> + * (MCM) CPUs like Naples.
>> + * The first GPU node(non cpu) is assumed to have an "AMD Node ID" value
> In all your text:
>
> s/cpu/CPU/g
>
>> + * of 8 (the second GPU node has 9, etc.).
>> + */
>> +#define NONCPU_NODE_INDEX 8
> Why is this assumed? Can it instead be read from the hardware somewhere?
> Or there simply won't be more than 8 CPU nodes anyway? Not at least in
> the near future?
>
> I'd prefer stuff to be read out directly from the hardware so that when
> the hardware changes, the code just works instead of doing assumptions
> which get invalidated later.
>
>> +
>> u16 amd_nb_num(void);
>> bool amd_nb_has_feature(unsigned int feature);
>> struct amd_northbridge *node_to_amd_nb(int node);
>> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
>> index 23dda362dc0f..6ad5664a18aa 100644
>> --- a/arch/x86/kernel/amd_nb.c
>> +++ b/arch/x86/kernel/amd_nb.c
>> @@ -26,6 +26,8 @@
>> #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
>> #define PCI_DEVICE_ID_AMD_19H_DF_F4 0x1654
>> #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
>> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT 0x14bb
>> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4
> You see how those defines are aligned vertically, right?
>
> If your new defines don't fit, you realign them all vertically too - you
> don't just slap them there.
>
> And if it wasn't clear above, that Aldebaran GPU chip name means
> something only to AMD folks. If this is correct
>
> https://en.wikipedia.org/wiki/Video_Core_Next
>
> then that Aldebaran thing is VCN 2.6 although this is only the video
> encoding stuff and GPU I guess is more than that.
>
> IOW, what I'm trying to say is, just like we name the CPUs using their
> families, you should name the GPUs nomenclature with GPU families (I
> guess there's stuff like that) and not use the marketing crap.

The Aldebaran GPUs might be a later variant of gfx9 and are connected to the
CPU sockets via custom xGMI links.

I could not find any family number associated with the GPUs. The DRM
driver code uses the following define and does not expose the value to
other frameworks in Linux.

+#define CHIP_ALDEBARAN 25

in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm

> If you need an example, here's how we did it for the Intel marketing
> pile of bullsh*t:
>
> arch/x86/include/asm/intel-family.h
>
> Please provide a similar way of referring to the GPU chips.
>
>> /* Protect the PCI config register pairs used for SMN and DF indirect access. */
>> static DEFINE_MUTEX(smn_mutex);
>> @@ -94,6 +96,21 @@ static const struct pci_device_id hygon_nb_link_ids[] = {
>> {}
>> };
>>
>> +static const struct pci_device_id amd_noncpu_root_ids[] = {
> Why is that "noncpu" thing everywhere? Is this thing going to be
> anything else besides a GPU?
>
> If not, you can simply call it
>
> amd_gpu_root_ids
>
> to mean *exactly* what they are. PCI IDs on the GPU.
>
>> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
>> + {}
>> +};
>> +
>> +static const struct pci_device_id amd_noncpu_nb_misc_ids[] = {
>> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
>> + {}
>> +};
>> +
>> +static const struct pci_device_id amd_noncpu_nb_link_ids[] = {
>> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
>> + {}
>> +};
>> +
>> const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
>> { 0x00, 0x18, 0x20 },
>> { 0xff, 0x00, 0x20 },
>> @@ -230,11 +247,16 @@ int amd_cache_northbridges(void)
>> const struct pci_device_id *misc_ids = amd_nb_misc_ids;
>> const struct pci_device_id *link_ids = amd_nb_link_ids;
>> const struct pci_device_id *root_ids = amd_root_ids;
>> +
>> + const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids;
>> + const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids;
>> + const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids;
>> +
>> struct pci_dev *root, *misc, *link;
>> struct amd_northbridge *nb;
>> u16 roots_per_misc = 0;
>> - u16 misc_count = 0;
>> - u16 root_count = 0;
>> + u16 misc_count = 0, misc_count_noncpu = 0;
>> + u16 root_count = 0, root_count_noncpu = 0;
>> u16 i, j;
>>
>> if (amd_northbridges.num)
>> @@ -253,10 +275,16 @@ int amd_cache_northbridges(void)
>> if (!misc_count)
>> return -ENODEV;
>>
>> + while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL)
>> + misc_count_noncpu++;
>> +
>> root = NULL;
>> while ((root = next_northbridge(root, root_ids)) != NULL)
>> root_count++;
>>
>> + while ((root = next_northbridge(root, noncpu_root_ids)) != NULL)
>> + root_count_noncpu++;
>> +
>> if (root_count) {
>> roots_per_misc = root_count / misc_count;
>>
>> @@ -270,15 +298,28 @@ int amd_cache_northbridges(void)
>> }
>> }
>>
>> - nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
>> + if (misc_count_noncpu) {
>> + /*
>> + * The first non-CPU Node ID starts at 8 even if there are fewer
>> + * than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb
>> + * indexing scheme, allocate the number of GPU nodes plus 8.
>> + * Some allocated amd_northbridge structures will go unused when
>> + * the number of CPU nodes is less than 8, but this tradeoff is to
>> + * keep things relatively simple.
> Why simple?
>
> What's wrong with having
>
> [node IDs][GPU node IDs]
>
> i.e., the usual nodes come first and the GPU ones after it.
>
> You enumerate everything properly here so you can control what goes
> where. Which means, you don't need this NONCPU_NODE_INDEX non-sense at
> all.
>
> Hmmm?
>
>> + */
>> + amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu;
>> + } else {
>> + amd_northbridges.num = misc_count;
>> + }
>> +
>> + nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
>> if (!nb)
>> return -ENOMEM;
>>
>> amd_northbridges.nb = nb;
>> - amd_northbridges.num = misc_count;
>>
>> link = misc = root = NULL;
>> - for (i = 0; i < amd_northbridges.num; i++) {
>> + for (i = 0; i < misc_count; i++) {
>> node_to_amd_nb(i)->root = root =
>> next_northbridge(root, root_ids);
>> node_to_amd_nb(i)->misc = misc =
>> @@ -299,6 +340,18 @@ int amd_cache_northbridges(void)
>> root = next_northbridge(root, root_ids);
>> }
>>
>> + if (misc_count_noncpu) {
>> + link = misc = root = NULL;
>> + for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) {
> So this is not keeping things relatively simple - this is making you
> jump to the GPU nodes to prepare them too which is making them special.
>
>> + node_to_amd_nb(i)->root = root =
>> + next_northbridge(root, noncpu_root_ids);
>> + node_to_amd_nb(i)->misc = misc =
>> + next_northbridge(misc, noncpu_misc_ids);
>> + node_to_amd_nb(i)->link = link =
>> + next_northbridge(link, noncpu_link_ids);
> And seeing how you put those pointers in ->root, ->misc and ->link,
> you can just as well drop those noncpu_*_ids and put the aldebaran
> PCI IDs simply in amd_root_ids, amd_nb_misc_ids and amd_nb_link_ids
> respectively.
>
> Because to this code, the RAS functionality is no different than any
> other CPU because, well, the interface is those PCI devices. So the
> thing doesn't care if it is GPU or not.
>
> So you don't need any of that separation between GPU and CPU nodes when
> it comes to the RAS code.

The roots_per_misc count is different for the CPU nodes and GPU nodes.
We tried to address your comment without introducing pci_dev_id arrays
for the GPU roots, misc and link devices. But introducing GPU ID arrays
looks cleaner; let me submit the revised code and we can revisit this
point.

>
> Makes sense?
>
> --
> Regards/Gruss,
> Boris.

Regards,

Naveen

>
> https://people.kernel.org/tglx/notes-about-netiquette

From: Borislav Petkov
Date: 2021-10-11 18:12:21
Subject: Re: [PATCH v3 1/3] x86/amd_nb: Add support for northbridges on Aldebaran

On Mon, Oct 11, 2021 at 07:56:34PM +0530, Chatradhi, Naveen Krishna wrote:
> Aldebaran is an AMD GPU name. The code, submitted as "[PATCH 000/159]
> Aldebaran support"
> <https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html>,
> is part of the DRM framework.

A short explanation in your patchset would be very helpful so that a
reader can know what it is and search the net further, if more info is
needed.

> The Aldebaran GPUs might be a later variant of gfx9 and are connected to the
> CPU sockets via custom xGMI links.
>
> I could not find any family number associated with the GPUs. The DRM driver
> code uses the following define and does not expose the value to other
> frameworks in Linux.
>
> +#define CHIP_ALDEBARAN 25
>
> in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm

Aha, so Aldebaran is the chip name. And how are those PCI IDs named in
the documentation? Aldebaran data fabric PCI functions or so?

> The roots_per_misc count is different for the CPU nodes and GPU nodes. We
> tried to address your comment without introducing pci_dev_id arrays for the
> GPU roots, misc and link devices. But introducing GPU ID arrays looks
> cleaner; let me submit the revised code and we can revisit this point.

Ok, but as I said above, what those devices are means nothing to the
amd_nb code because that simply enumerates PCI IDs when those things
were simply northbridges.

If the GPU PCI IDs do not fit easily into the scheme then maybe the
scheme has become inadequate... we'll see...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: [PATCH v4 0/4] x86/edac/amd64: Add support for noncpu nodes

On newer heterogeneous systems with AMD CPUs, the data fabrics of GPUs
can be connected directly via custom links.

This patchset does the following
1. amd_nb.c:
a. Add support for northbridges on Aldebaran GPU nodes
b. export AMD node map details to be used by edac and mce modules

2. mce_amd module:
a. Identify the node ID where the error occurred and map the node ID
to the Linux enumerated node ID.

3. Modifies the amd64_edac module
a. Add new family op routines
b. Enumerate UMCs and HBMs on the GPU nodes

This patchset is rebased on top of
"
commit 07416cadfdfa38283b840e700427ae3782c76f6b
Author: Yazen Ghannam <[email protected]>
Date: Tue Oct 5 15:44:19 2021 +0000

EDAC/amd64: Handle three rank interleaving mode
"

Muralidhara M K (2):
x86/amd_nb: Add support for northbridges on Aldebaran
EDAC/amd64: Extend family ops functions

Naveen Krishna Chatradhi (2):
EDAC/mce_amd: Extract node id from MCA_IPID
EDAC/amd64: Enumerate memory on Aldebaran GPU nodes

arch/x86/include/asm/amd_nb.h | 9 +
arch/x86/kernel/amd_nb.c | 131 +++++++--
drivers/edac/amd64_edac.c | 517 +++++++++++++++++++++++++---------
drivers/edac/amd64_edac.h | 33 +++
drivers/edac/mce_amd.c | 24 +-
include/linux/pci_ids.h | 1 +
6 files changed, 564 insertions(+), 151 deletions(-)

--
2.25.1

Subject: [PATCH 4/4] EDAC/amd64: Enumerate memory on Aldebaran GPU nodes

On newer heterogeneous systems with AMD CPUs, the data fabrics of the GPUs
are connected directly via custom links.

On one such system, where Aldebaran GPU nodes are connected to
Family 19h Model 30h CPU nodes, the Aldebaran GPUs can report
memory errors via SMCA banks.

Aldebaran GPU support was added to DRM framework
https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html

The GPU nodes come with built-in HBM2 memory; ECC support is enabled by
default, and the UMCs on the GPU nodes differ from the UMCs on the CPU nodes.

GPU-specific ops routines are defined to extend the amd64_edac
module to enumerate HBM memory, leveraging the existing EDAC and
amd64-specific data structures.

Note: The UMC PHYs on GPU nodes are enumerated as csrows and the UMC
channels connected to HBM banks are enumerated as ranks.

Cc: Yazen Ghannam <[email protected]>
Co-developed-by: Muralidhara M K <[email protected]>
Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
Changes since v3:
1. Bifurcated the GPU code from v2

Changes since v2:
1. Restored line deletions and handled minor comments
2. Modified commit message and some of the function comments
3. variable df_inst_id is introduced instead of umc_num

Changes since v1:
1. Modifed the commit message
2. Change the edac_cap
3. kept sizes of both cpu and noncpu together
4. Return success if the !F3 condition is true and remove unnecessary validation

drivers/edac/amd64_edac.c | 233 +++++++++++++++++++++++++++++++++++++-
drivers/edac/amd64_edac.h | 27 +++++
2 files changed, 254 insertions(+), 6 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 131ed19f69dd..7173310660a3 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1123,6 +1123,20 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
}
}

+static void debug_display_dimm_sizes_gpu(struct amd64_pvt *pvt, u8 ctrl)
+{
+ int size, cs = 0, cs_mode;
+
+ edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
+
+ cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+
+ for_each_chip_select(cs, ctrl, pvt) {
+ size = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs);
+ amd64_info(EDAC_MC ": %d: %5dMB\n", cs, size);
+ }
+}
+
static void __dump_misc_regs_df(struct amd64_pvt *pvt)
{
struct amd64_umc *umc;
@@ -1167,6 +1181,27 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
pvt->dhar, dhar_base(pvt));
}

+static void __dump_misc_regs_gpu(struct amd64_pvt *pvt)
+{
+ struct amd64_umc *umc;
+ u32 i, umc_base;
+
+ for_each_umc(i) {
+ umc_base = get_umc_base(i);
+ umc = &pvt->umc[i];
+
+ edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
+ edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
+ edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
+ edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i);
+
+ debug_display_dimm_sizes_gpu(pvt, i);
+ }
+
+ edac_dbg(1, "F0x104 (DRAM Hole Address): 0x%08x, base: 0x%08x\n",
+ pvt->dhar, dhar_base(pvt));
+}
+
/* Display and decode various NB registers for debug purposes. */
static void __dump_misc_regs(struct amd64_pvt *pvt)
{
@@ -1242,6 +1277,43 @@ static void f17_prep_chip_selects(struct amd64_pvt *pvt)
}
}

+static void gpu_prep_chip_selects(struct amd64_pvt *pvt)
+{
+ int umc;
+
+ for_each_umc(umc) {
+ pvt->csels[umc].b_cnt = 8;
+ pvt->csels[umc].m_cnt = 8;
+ }
+}
+
+static void read_umc_base_mask_gpu(struct amd64_pvt *pvt)
+{
+ u32 base_reg, mask_reg;
+ u32 *base, *mask;
+ int umc, cs;
+
+ for_each_umc(umc) {
+ for_each_chip_select(cs, umc, pvt) {
+ base_reg = get_umc_base_gpu(umc, cs) + UMCCH_BASE_ADDR;
+ base = &pvt->csels[umc].csbases[cs];
+
+ if (!amd_smn_read(pvt->mc_node_id, base_reg, base)) {
+ edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
+ umc, cs, *base, base_reg);
+ }
+
+ mask_reg = get_umc_base_gpu(umc, cs) + UMCCH_ADDR_MASK;
+ mask = &pvt->csels[umc].csmasks[cs];
+
+ if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask)) {
+ edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
+ umc, cs, *mask, mask_reg);
+ }
+ }
+ }
+}
+
static void read_umc_base_mask(struct amd64_pvt *pvt)
{
u32 umc_base_reg, umc_base_reg_sec;
@@ -1745,6 +1817,19 @@ static int f17_early_channel_count(struct amd64_pvt *pvt)
return channels;
}

+static int gpu_early_channel_count(struct amd64_pvt *pvt)
+{
+ int i, channels = 0;
+
+ /* The memory channels in case of GPUs are fully populated */
+ for_each_umc(i)
+ channels += pvt->csels[i].b_cnt;
+
+ amd64_info("MCT channel count: %d\n", channels);
+
+ return channels;
+}
+
static int ddr3_cs_size(unsigned i, bool dct_width)
{
unsigned shift = 0;
@@ -1942,6 +2027,14 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, dimm);
}

+static int gpu_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+ unsigned int cs_mode, int csrow_nr)
+{
+ u32 addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
+
+ return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, csrow_nr >> 1);
+}
+
static void read_dram_ctl_register(struct amd64_pvt *pvt)
{

@@ -2527,8 +2620,11 @@ static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
/* Prototypes for family specific ops routines */
static int init_csrows(struct mem_ctl_info *mci);
static int init_csrows_df(struct mem_ctl_info *mci);
+static int init_csrows_gpu(struct mem_ctl_info *mci);
static void __read_mc_regs_df(struct amd64_pvt *pvt);
+static void __read_mc_regs_gpu(struct amd64_pvt *pvt);
static void find_umc_channel(struct mce *m, struct err_info *err);
+static void find_umc_channel_gpu(struct mce *m, struct err_info *err);

static const struct low_ops k8_ops = {
.early_channel_count = k8_early_channel_count,
@@ -2595,6 +2691,17 @@ static const struct low_ops f17_ops = {
.get_umc_err_info = find_umc_channel,
};

+static const struct low_ops gpu_ops = {
+ .early_channel_count = gpu_early_channel_count,
+ .dbam_to_cs = gpu_addr_mask_to_cs_size,
+ .prep_chip_select = gpu_prep_chip_selects,
+ .get_base_mask = read_umc_base_mask_gpu,
+ .display_misc_regs = __dump_misc_regs_gpu,
+ .get_mc_regs = __read_mc_regs_gpu,
+ .populate_csrows = init_csrows_gpu,
+ .get_umc_err_info = find_umc_channel_gpu,
+};
+
static struct amd64_family_type family_types[] = {
[K8_CPUS] = {
.ctl_name = "K8",
@@ -2687,6 +2794,14 @@ static struct amd64_family_type family_types[] = {
.max_mcs = 8,
.ops = f17_ops,
},
+ [ALDEBARAN_GPUS] = {
+ .ctl_name = "ALDEBARAN",
+ .f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0,
+ .f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6,
+ .max_mcs = 4,
+ .ops = gpu_ops,
+ },
+
};

/*
@@ -2943,12 +3058,38 @@ static void find_umc_channel(struct mce *m, struct err_info *err)
err->csrow = m->synd & 0x7;
}

+/*
+ * The CPUs have one channel per UMC, so a UMC number is equivalent to a
+ * channel number. The GPUs have 8 channels per UMC, so the UMC number no
+ * longer works as a channel number.
+ * The channel number within a GPU UMC is given in MCA_IPID[15:12].
+ * However, the IDs are split such that two UMC values go to one UMC, and
+ * the channel numbers are split in two groups of four.
+ *
+ * Refer comment on get_umc_base_gpu() from amd64_edac.h
+ *
+ * For example,
+ * UMC0 CH[3:0] = 0x0005[3:0]000
+ * UMC0 CH[7:4] = 0x0015[3:0]000
+ * UMC1 CH[3:0] = 0x0025[3:0]000
+ * UMC1 CH[7:4] = 0x0035[3:0]000
+ */
+static void find_umc_channel_gpu(struct mce *m, struct err_info *err)
+{
+ u8 ch = (m->ipid & GENMASK(31, 0)) >> 20;
+ u8 phy = ((m->ipid >> 12) & 0xf);
+
+ err->channel = ch % 2 ? phy + 4 : phy;
+ err->csrow = phy;
+}
+
static void decode_umc_error(int node_id, struct mce *m)
{
u8 ecc_type = (m->status >> 45) & 0x3;
struct mem_ctl_info *mci;
struct amd64_pvt *pvt;
struct err_info err;
+ u8 df_inst_id;
u64 sys_addr;

mci = edac_mc_find(node_id);
@@ -2978,7 +3119,17 @@ static void decode_umc_error(int node_id, struct mce *m)
err.err_code = ERR_CHANNEL;
}

- if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
+ /*
+ * GPU node has #phys[X] which has #channels[Y] each.
+ * On GPUs, df_inst_id = [X] * num_ch_per_phy + [Y].
+ * On CPUs, "Channel"="UMC Number"="DF Instance ID".
+ */
+ if (pvt->is_gpu)
+ df_inst_id = (err.csrow * pvt->channel_count / mci->nr_csrows) + err.channel;
+ else
+ df_inst_id = err.channel;
+
+ if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, df_inst_id, &sys_addr)) {
err.err_code = ERR_NORM_ADDR;
goto log_error;
}
@@ -3117,6 +3268,23 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
}
}

+static void __read_mc_regs_gpu(struct amd64_pvt *pvt)
+{
+ u8 nid = pvt->mc_node_id;
+ struct amd64_umc *umc;
+ u32 i, umc_base;
+
+ /* Read registers from each UMC */
+ for_each_umc(i) {
+ umc_base = get_umc_base_gpu(i, 0);
+ umc = &pvt->umc[i];
+
+ amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
+ amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
+ amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
+ }
+}
+
/*
* Retrieve the hardware registers of the memory controller (this includes the
* 'Address Map' and 'Misc' device regs)
@@ -3196,7 +3364,9 @@ static void read_mc_regs(struct amd64_pvt *pvt)
determine_memory_type(pvt);
edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);

- determine_ecc_sym_sz(pvt);
+ /* ECC symbol size is not available on GPU nodes */
+ if (!pvt->is_gpu)
+ determine_ecc_sym_sz(pvt);
}

/*
@@ -3243,7 +3413,10 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig)
csrow_nr >>= 1;
cs_mode = DBAM_DIMM(csrow_nr, dbam);
} else {
- cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt);
+ if (pvt->is_gpu)
+ cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+ else
+ cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt);
}

nr_pages = pvt->ops->dbam_to_cs(pvt, dct, cs_mode, csrow_nr);
@@ -3300,6 +3473,35 @@ static int init_csrows_df(struct mem_ctl_info *mci)
return empty;
}

+static int init_csrows_gpu(struct mem_ctl_info *mci)
+{
+ struct amd64_pvt *pvt = mci->pvt_info;
+ struct dimm_info *dimm;
+ int empty = 1;
+ u8 umc, cs;
+
+ for_each_umc(umc) {
+ for_each_chip_select(cs, umc, pvt) {
+ if (!csrow_enabled(cs, umc, pvt))
+ continue;
+
+ empty = 0;
+ dimm = mci->csrows[umc]->channels[cs]->dimm;
+
+ edac_dbg(1, "MC node: %d, csrow: %d\n",
+ pvt->mc_node_id, cs);
+
+ dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
+ dimm->mtype = MEM_HBM2;
+ dimm->edac_mode = EDAC_SECDED;
+ dimm->dtype = DEV_X16;
+ dimm->grain = 64;
+ }
+ }
+
+ return empty;
+}
+
/*
* Initialize the array of csrow attribute instances, based on the values
* from pci config hardware registers.
@@ -3541,6 +3743,10 @@ static bool ecc_enabled(struct amd64_pvt *pvt)
u8 ecc_en = 0, i;
u32 value;

+ /* ECC is enabled by default on GPU nodes */
+ if (pvt->is_gpu)
+ return true;
+
if (boot_cpu_data.x86 >= 0x17) {
u8 umc_en_mask = 0, ecc_en_mask = 0;
struct amd64_umc *umc;
@@ -3624,7 +3830,10 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
mci->edac_ctl_cap = EDAC_FLAG_NONE;

if (pvt->umc) {
- f17h_determine_edac_ctl_cap(mci, pvt);
+ if (pvt->is_gpu)
+ mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
+ else
+ f17h_determine_edac_ctl_cap(mci, pvt);
} else {
if (pvt->nbcap & NBCAP_SECDED)
mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
@@ -3726,6 +3935,17 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
pvt->ops = &family_types[F17_M70H_CPUS].ops;
fam_type->ctl_name = "F19h_M20h";
break;
+ } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
+ if (pvt->mc_node_id >= amd_cpu_node_count()) {
+ fam_type = &family_types[ALDEBARAN_GPUS];
+ pvt->ops = &family_types[ALDEBARAN_GPUS].ops;
+ pvt->is_gpu = true;
+ } else {
+ fam_type = &family_types[F19_CPUS];
+ pvt->ops = &family_types[F19_CPUS].ops;
+ fam_type->ctl_name = "F19h_M30h";
+ }
+ break;
}
fam_type = &family_types[F19_CPUS];
pvt->ops = &family_types[F19_CPUS].ops;
@@ -3808,9 +4028,10 @@ static int init_one_instance(struct amd64_pvt *pvt)
if (pvt->channel_count < 0)
return ret;

+ /* Define layers for CPU and GPU nodes */
ret = -ENOMEM;
layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
- layers[0].size = pvt->csels[0].b_cnt;
+ layers[0].size = pvt->is_gpu ? fam_type->max_mcs : pvt->csels[0].b_cnt;
layers[0].is_virt_csrow = true;
layers[1].type = EDAC_MC_LAYER_CHANNEL;

@@ -3819,7 +4040,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
* only one channel. Also, this simplifies handling later for the price
* of a couple of KBs tops.
*/
- layers[1].size = fam_type->max_mcs;
+ layers[1].size = pvt->is_gpu ? pvt->csels[0].b_cnt : fam_type->max_mcs;
layers[1].is_virt_csrow = false;

mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index ce21b3cf0825..2dbf6fe14a55 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -126,6 +126,8 @@
#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
#define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650
#define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14d0
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14d6

/*
* Function 1 - Address Map
@@ -298,6 +300,7 @@ enum amd_families {
F17_M60H_CPUS,
F17_M70H_CPUS,
F19_CPUS,
+ ALDEBARAN_GPUS,
NUM_FAMILIES,
};

@@ -389,6 +392,8 @@ struct amd64_pvt {
enum mem_type dram_type;

struct amd64_umc *umc; /* UMC registers */
+
+ bool is_gpu;
};

enum err_codes {
@@ -410,6 +415,28 @@ struct err_info {
u32 offset;
};

+static inline u32 get_umc_base_gpu(u8 umc, u8 channel)
+{
+ /*
+ * On CPUs, there is one channel per UMC, so UMC numbering equals
+ * channel numbering. On GPUs, there are eight channels per UMC,
+ * so the channel numbering is different from UMC numbering.
+ *
+ * On CPU nodes channels are selected in 6th nibble
+ * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000;
+ *
+ * On GPU nodes channels are selected in 3rd nibble
+ * HBM chX[3:0]= [Y ]5X[3:0]000;
+ * HBM chX[7:4]= [Y+1]5X[3:0]000
+ */
+ umc *= 2;
+
+ if (channel >= 4)
+ umc++;
+
+ return 0x50000 + (umc << 20) + ((channel % 4) << 12);
+}
+
static inline u32 get_umc_base(u8 channel)
{
/* chY: 0xY50000 */
--
2.25.1

Subject: [PATCH v4 1/4] x86/amd_nb: Add support for northbridges on Aldebaran

From: Muralidhara M K <[email protected]>

On newer systems the CPUs manage MCA errors reported from the GPUs.
Enumerate the GPU nodes with the AMD NB framework to support EDAC.

GPU nodes are enumerated in sequential order based on the PCI hierarchy,
and the first GPU node is assumed to take the next "AMD Node ID" value
after the CPU nodes are fully populated.

Aldebaran is an AMD GPU; its drivers are part of the DRM framework:
https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html

Each Aldebaran GPU has 2 Data Fabrics, which are enumerated as 2 nodes.
With this implementation detail, the Data Fabric on the GPU nodes can be
accessed the same way as the Data Fabric on CPU nodes.

Special handling was necessary in northbridge enumeration as the
roots_per_misc value is different for GPU and CPU nodes.
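
For example (an illustrative sketch, not a statement about a specific
topology): on a system with 4 CPU data fabric nodes and 4 GPU data
fabric nodes, amd_cache_northbridges() counts the misc devices of both
types, allocates amd_northbridges.num = 8 entries, and the Linux node
IDs come out as 0-3 for the CPU fabrics followed by 4-7 for the GPU
fabrics.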

Signed-off-by: Muralidhara M K <[email protected]>
Co-developed-by: Naveen Krishna Chatradhi <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
Changes since v3:
1. Use word "gpu" instead of "noncpu" in the patch
2. Do not create pci_dev_ids arrays for gpu nodes
3. Identify the gpu node start index from DF18F1 registers on the GPU nodes.
a. Export cpu node count and gpu start node id

Changes since v2:
1. Added Reviewed-by Yazen Ghannam

Changes since v1:
1. Modified the commit message and comments in the code
2. Squashed patch 1/7: "x86/amd_nb: Add Aldebaran device to PCI IDs"

arch/x86/include/asm/amd_nb.h | 9 +++
arch/x86/kernel/amd_nb.c | 131 ++++++++++++++++++++++++++++------
include/linux/pci_ids.h | 1 +
3 files changed, 118 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 455066a06f60..5898300f11ed 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -68,10 +68,17 @@ struct amd_northbridge {
struct threshold_bank *bank4;
};

+/* heterogeneous system node type map variables */
+struct amd_node_map {
+ u16 gpu_node_start_id;
+ u16 cpu_node_count;
+};
+
struct amd_northbridge_info {
u16 num;
u64 flags;
struct amd_northbridge *nb;
+ struct amd_node_map *nmap;
};

#define AMD_NB_GART BIT(0)
@@ -83,6 +90,8 @@ struct amd_northbridge_info {
u16 amd_nb_num(void);
bool amd_nb_has_feature(unsigned int feature);
struct amd_northbridge *node_to_amd_nb(int node);
+u16 amd_gpu_node_start_id(void);
+u16 amd_cpu_node_count(void);

static inline u16 amd_pci_dev_to_node_id(struct pci_dev *pdev)
{
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index c92c9c774c0e..54a6a7462f07 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -19,6 +19,7 @@
#define PCI_DEVICE_ID_AMD_17H_M10H_ROOT 0x15d0
#define PCI_DEVICE_ID_AMD_17H_M30H_ROOT 0x1480
#define PCI_DEVICE_ID_AMD_17H_M60H_ROOT 0x1630
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT 0x14bb
#define PCI_DEVICE_ID_AMD_17H_DF_F4 0x1464
#define PCI_DEVICE_ID_AMD_17H_M10H_DF_F4 0x15ec
#define PCI_DEVICE_ID_AMD_17H_M30H_DF_F4 0x1494
@@ -28,6 +29,7 @@
#define PCI_DEVICE_ID_AMD_19H_M40H_ROOT 0x14b5
#define PCI_DEVICE_ID_AMD_19H_M40H_DF_F4 0x167d
#define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4

/* Protect the PCI config register pairs used for SMN and DF indirect access. */
static DEFINE_MUTEX(smn_mutex);
@@ -40,6 +42,7 @@ static const struct pci_device_id amd_root_ids[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M30H_ROOT) },
{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M60H_ROOT) },
{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_ROOT) },
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
{}
};

@@ -63,6 +66,7 @@ static const struct pci_device_id amd_nb_misc_ids[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_DF_F3) },
{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F3) },
{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F3) },
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
{}
};

@@ -81,6 +85,7 @@ static const struct pci_device_id amd_nb_link_ids[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F4) },
{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F4) },
{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F4) },
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
{}
};

@@ -126,6 +131,55 @@ struct amd_northbridge *node_to_amd_nb(int node)
}
EXPORT_SYMBOL_GPL(node_to_amd_nb);

+/*
+ * GPU start index and CPU count values on a heterogeneous system;
+ * these values will be used by the AMD EDAC and MCE modules.
+ */
+u16 amd_gpu_node_start_id(void)
+{
+ return (amd_northbridges.nmap) ?
+ amd_northbridges.nmap->gpu_node_start_id : 0;
+}
+EXPORT_SYMBOL_GPL(amd_gpu_node_start_id);
+
+u16 amd_cpu_node_count(void)
+{
+ return (amd_northbridges.nmap) ?
+ amd_northbridges.nmap->cpu_node_count : amd_northbridges.num;
+}
+EXPORT_SYMBOL_GPL(amd_cpu_node_count);
+
+/* DF18xF1 registers on Aldebaran GPU */
+#define REG_LOCAL_NODE_TYPE_MAP 0x144
+#define REG_RMT_NODE_TYPE_MAP 0x148
+
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F1 0x14d1
+
+static int amd_get_node_map(void)
+{
+ struct amd_node_map *np;
+ struct pci_dev *pdev = NULL;
+ u32 tmp;
+
+ pdev = pci_get_device(PCI_VENDOR_ID_AMD,
+ PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F1, pdev);
+ if (!pdev)
+ return -ENODEV;
+
+ np = kmalloc(sizeof(*np), GFP_KERNEL);
+ if (!np)
+ return -ENOMEM;
+
+ pci_read_config_dword(pdev, REG_LOCAL_NODE_TYPE_MAP, &tmp);
+ np->gpu_node_start_id = tmp & 0xFFF;
+
+ pci_read_config_dword(pdev, REG_RMT_NODE_TYPE_MAP, &tmp);
+ np->cpu_node_count = tmp >> 16 & 0xFFF;
+
+ amd_northbridges.nmap = np;
+ return 0;
+}
+
static struct pci_dev *next_northbridge(struct pci_dev *dev,
const struct pci_device_id *ids)
{
@@ -230,6 +284,27 @@ int amd_df_indirect_read(u16 node, u8 func, u16 reg, u8 instance_id, u32 *lo)
}
EXPORT_SYMBOL_GPL(amd_df_indirect_read);

+struct pci_dev *get_root_devs(struct pci_dev *root,
+ const struct pci_device_id *root_ids,
+ u16 roots_per_misc)
+{
+ u16 j;
+
+ /*
+ * If there are more PCI root devices than data fabric/
+ * system management network interfaces, then the (N)
+ * PCI roots per DF/SMN interface are functionally the
+ * same (for DF/SMN access) and N-1 are redundant. N-1
+ * PCI roots should be skipped per DF/SMN interface so
+ * the following DF/SMN interfaces get mapped to
+ * correct PCI roots.
+ */
+ for (j = 0; j < roots_per_misc; j++)
+ root = next_northbridge(root, root_ids);
+
+ return root;
+}
+
int amd_cache_northbridges(void)
{
const struct pci_device_id *misc_ids = amd_nb_misc_ids;
@@ -237,10 +312,10 @@ int amd_cache_northbridges(void)
const struct pci_device_id *root_ids = amd_root_ids;
struct pci_dev *root, *misc, *link;
struct amd_northbridge *nb;
- u16 roots_per_misc = 0;
- u16 misc_count = 0;
- u16 root_count = 0;
- u16 i, j;
+ u16 roots_per_misc = 0, gpu_roots_per_misc = 0;
+ u16 misc_count = 0, gpu_misc_count = 0;
+ u16 root_count = 0, gpu_root_count = 0;
+ u16 i;

if (amd_northbridges.num)
return 0;
@@ -252,15 +327,23 @@ int amd_cache_northbridges(void)
}

misc = NULL;
- while ((misc = next_northbridge(misc, misc_ids)) != NULL)
- misc_count++;
+ while ((misc = next_northbridge(misc, misc_ids)) != NULL) {
+ if (misc->device == PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3)
+ gpu_misc_count++;
+ else
+ misc_count++;
+ }

if (!misc_count)
return -ENODEV;

root = NULL;
- while ((root = next_northbridge(root, root_ids)) != NULL)
- root_count++;
+ while ((root = next_northbridge(root, root_ids)) != NULL) {
+ if (root->device == PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT)
+ gpu_root_count++;
+ else
+ root_count++;
+ }

if (root_count) {
roots_per_misc = root_count / misc_count;
@@ -275,33 +358,35 @@ int amd_cache_northbridges(void)
}
}

- nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
+ /*
+ * The number of miscs, roots and roots_per_misc might vary on different
+ * nodes of a heterogeneous system.
+ * calculate roots_per_misc accordingly in order to skip the redundant
+ * roots and map the DF/SMN interfaces to correct PCI roots.
+ */
+ if (gpu_root_count && gpu_misc_count) {
+ if (amd_get_node_map())
+ return -ENOMEM;
+
+ gpu_roots_per_misc = gpu_root_count / gpu_misc_count;
+ }
+
+ amd_northbridges.num = misc_count + gpu_misc_count;
+ nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
if (!nb)
return -ENOMEM;

amd_northbridges.nb = nb;
- amd_northbridges.num = misc_count;

link = misc = root = NULL;
for (i = 0; i < amd_northbridges.num; i++) {
+ u16 misc_roots = i < misc_count ? roots_per_misc : gpu_roots_per_misc;
node_to_amd_nb(i)->root = root =
- next_northbridge(root, root_ids);
+ get_root_devs(root, root_ids, misc_roots);
node_to_amd_nb(i)->misc = misc =
next_northbridge(misc, misc_ids);
node_to_amd_nb(i)->link = link =
next_northbridge(link, link_ids);
-
- /*
- * If there are more PCI root devices than data fabric/
- * system management network interfaces, then the (N)
- * PCI roots per DF/SMN interface are functionally the
- * same (for DF/SMN access) and N-1 are redundant. N-1
- * PCI roots should be skipped per DF/SMN interface so
- * the following DF/SMN interfaces get mapped to
- * correct PCI roots.
- */
- for (j = 1; j < roots_per_misc; j++)
- root = next_northbridge(root, root_ids);
}

if (amd_gart_present())
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 011f2f1ea5bb..b3a0ec29dbd6 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -557,6 +557,7 @@
#define PCI_DEVICE_ID_AMD_19H_DF_F3 0x1653
#define PCI_DEVICE_ID_AMD_19H_M40H_DF_F3 0x167c
#define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3 0x14d3
#define PCI_DEVICE_ID_AMD_CNB17H_F3 0x1703
#define PCI_DEVICE_ID_AMD_LANCE 0x2000
#define PCI_DEVICE_ID_AMD_LANCE_HOME 0x2001
--
2.25.1

Subject: [PATCH v4 2/4] EDAC/mce_amd: Extract node id from MCA_IPID

On SMCA banks of the GPU nodes, the node id information is
available in register MCA_IPID[47:44](InstanceIdHi).

Convert the hardware node ID to a value used by Linux,
where GPU nodes are enumerated sequentially after the CPU nodes.
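
For example (hypothetical register values): with MCA_IPID[47:44] = 0x9,
amd_gpu_node_start_id() = 8 and amd_cpu_node_count() = 4, the decoded
Linux node ID would be 9 - 8 + 4 = 5.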

Co-developed-by: Muralidhara M K <[email protected]>
Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
Changes since v3:
1. Use APIs from amd_nb to identify the gpu_node_start_id and cpu_node_count,
which are required to map the hardware node ID to the node ID enumerated by Linux.

Changes since v2:
1. Modified subject and commit message
2. Added Reviewed-by from Yazen Ghannam

Changes since v1:
1. Modified the commit message
2. rearranged the conditions before calling decode_dram_ecc()

drivers/edac/mce_amd.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 67dbf4c31271..af6caa76adc7 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -2,6 +2,7 @@
#include <linux/module.h>
#include <linux/slab.h>

+#include <asm/amd_nb.h>
#include <asm/cpu.h>

#include "mce_amd.h"
@@ -1072,8 +1073,27 @@ static void decode_smca_error(struct mce *m)
if (xec < smca_mce_descs[bank_type].num_descs)
pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);

- if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
- decode_dram_ecc(topology_die_id(m->extcpu), m);
+ if (xec == 0 && decode_dram_ecc) {
+ int node_id = 0;
+
+ if (bank_type == SMCA_UMC) {
+ node_id = topology_die_id(m->extcpu);
+ } else if (bank_type == SMCA_UMC_V2) {
+ /*
+ * SMCA_UMC_V2 exists on GPU nodes, extract the node id
+ * from register MCA_IPID[47:44](InstanceIdHi).
+ * The InstanceIdHi field represents the instance ID of the GPU.
+ * Which needs to be mapped to a value used by Linux,
+ * where GPU nodes are simply numerically after the CPU nodes.
+ */
+ node_id = ((m->ipid >> 44) & 0xF) -
+ amd_gpu_node_start_id() + amd_cpu_node_count();
+ } else {
+ return;
+ }
+
+ decode_dram_ecc(node_id, m);
+ }
}

static inline void amd_decode_err_code(u16 ec)
--
2.25.1

Subject: [PATCH v4 3/4] EDAC/amd64: Extend family ops functions

From: Muralidhara M K <[email protected]>

Create new family operation routines and define them for each family.
This simplifies adding support for future platforms.
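
For example, a hypothetical future family would only need to supply its
own low_ops instance and family_types[] entry instead of touching the
shared code paths. An illustrative sketch, reusing the Family 17h
helpers defined in this patch (the name fnew_ops is made up):

static const struct low_ops fnew_ops = {
	.early_channel_count	= f17_early_channel_count,
	.dbam_to_cs		= f17_addr_mask_to_cs_size,
	.prep_chip_select	= f17_prep_chip_selects,
	.get_base_mask		= read_umc_base_mask,
	.display_misc_regs	= __dump_misc_regs_df,
	.get_mc_regs		= __read_mc_regs_df,
	.populate_csrows	= init_csrows_df,
	.get_umc_err_info	= find_umc_channel,
};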

Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
---
Changes since v3:
1. Defined new family operation routines

Changes since v2:
1. new patch

drivers/edac/amd64_edac.c | 291 ++++++++++++++++++++++----------------
drivers/edac/amd64_edac.h | 6 +
2 files changed, 174 insertions(+), 123 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 4fce75013674..131ed19f69dd 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1204,10 +1204,7 @@ static void __dump_misc_regs(struct amd64_pvt *pvt)
/* Display and decode various NB registers for debug purposes. */
static void dump_misc_regs(struct amd64_pvt *pvt)
{
- if (pvt->umc)
- __dump_misc_regs_df(pvt);
- else
- __dump_misc_regs(pvt);
+ pvt->ops->display_misc_regs(pvt);

edac_dbg(1, " DramHoleValid: %s\n", dhar_valid(pvt) ? "yes" : "no");

@@ -1217,25 +1214,31 @@ static void dump_misc_regs(struct amd64_pvt *pvt)
/*
* See BKDG, F2x[1,0][5C:40], F2[1,0][6C:60]
*/
-static void prep_chip_selects(struct amd64_pvt *pvt)
+static void k8_prep_chip_selects(struct amd64_pvt *pvt)
{
- if (pvt->fam == 0xf && pvt->ext_model < K8_REV_F) {
- pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
- pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 8;
- } else if (pvt->fam == 0x15 && pvt->model == 0x30) {
- pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
- pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
- } else if (pvt->fam >= 0x17) {
- int umc;
-
- for_each_umc(umc) {
- pvt->csels[umc].b_cnt = 4;
- pvt->csels[umc].m_cnt = 2;
- }
+ pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
+ pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 8;
+}

- } else {
- pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
- pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
+static void f15m30_prep_chip_selects(struct amd64_pvt *pvt)
+{
+ pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
+ pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
+}
+
+static void fmisc_prep_chip_selects(struct amd64_pvt *pvt)
+{
+ pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
+ pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
+}
+
+static void f17_prep_chip_selects(struct amd64_pvt *pvt)
+{
+ int umc;
+
+ for_each_umc(umc) {
+ pvt->csels[umc].b_cnt = 4;
+ pvt->csels[umc].m_cnt = 2;
}
}

@@ -1297,10 +1300,10 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)
{
int cs;

- prep_chip_selects(pvt);
+ pvt->ops->prep_chip_select(pvt);

- if (pvt->umc)
- return read_umc_base_mask(pvt);
+ if (pvt->ops->get_base_mask)
+ return pvt->ops->get_base_mask(pvt);

for_each_chip_select(cs, 0, pvt) {
int reg0 = DCSB0 + (cs * 4);
@@ -1869,37 +1872,12 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct,
return ddr3_cs_size(cs_mode, false);
}

-static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
- unsigned int cs_mode, int csrow_nr)
+static int __addr_mask_to_cs_size(u32 addr_mask_orig, unsigned int cs_mode,
+ int csrow_nr, int dimm)
{
- u32 addr_mask_orig, addr_mask_deinterleaved;
u32 msb, weight, num_zero_bits;
- int dimm, size = 0;
-
- /* No Chip Selects are enabled. */
- if (!cs_mode)
- return size;
-
- /* Requested size of an even CS but none are enabled. */
- if (!(cs_mode & CS_EVEN) && !(csrow_nr & 1))
- return size;
-
- /* Requested size of an odd CS but none are enabled. */
- if (!(cs_mode & CS_ODD) && (csrow_nr & 1))
- return size;
-
- /*
- * There is one mask per DIMM, and two Chip Selects per DIMM.
- * CS0 and CS1 -> DIMM0
- * CS2 and CS3 -> DIMM1
- */
- dimm = csrow_nr >> 1;
-
- /* Asymmetric dual-rank DIMM support. */
- if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY))
- addr_mask_orig = pvt->csels[umc].csmasks_sec[dimm];
- else
- addr_mask_orig = pvt->csels[umc].csmasks[dimm];
+ u32 addr_mask_deinterleaved;
+ int size = 0;

/*
* The number of zero bits in the mask is equal to the number of bits
@@ -1930,6 +1908,40 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
return size >> 10;
}

+static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+ unsigned int cs_mode, int csrow_nr)
+{
+ u32 addr_mask_orig;
+ int dimm, size = 0;
+
+ /* No Chip Selects are enabled. */
+ if (!cs_mode)
+ return size;
+
+ /* Requested size of an even CS but none are enabled. */
+ if (!(cs_mode & CS_EVEN) && !(csrow_nr & 1))
+ return size;
+
+ /* Requested size of an odd CS but none are enabled. */
+ if (!(cs_mode & CS_ODD) && (csrow_nr & 1))
+ return size;
+
+ /*
+ * There is one mask per DIMM, and two Chip Selects per DIMM.
+ * CS0 and CS1 -> DIMM0
+ * CS2 and CS3 -> DIMM1
+ */
+ dimm = csrow_nr >> 1;
+
+ /* Asymmetric dual-rank DIMM support. */
+ if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY))
+ addr_mask_orig = pvt->csels[umc].csmasks_sec[dimm];
+ else
+ addr_mask_orig = pvt->csels[umc].csmasks[dimm];
+
+ return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, dimm);
+}
+
static void read_dram_ctl_register(struct amd64_pvt *pvt)
{

@@ -2512,143 +2524,168 @@ static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
}
}

+/* Prototypes for family specific ops routines */
+static int init_csrows(struct mem_ctl_info *mci);
+static int init_csrows_df(struct mem_ctl_info *mci);
+static void __read_mc_regs_df(struct amd64_pvt *pvt);
+static void find_umc_channel(struct mce *m, struct err_info *err);
+
+static const struct low_ops k8_ops = {
+ .early_channel_count = k8_early_channel_count,
+ .map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow,
+ .dbam_to_cs = k8_dbam_to_chip_select,
+ .prep_chip_select = k8_prep_chip_selects,
+ .display_misc_regs = __dump_misc_regs,
+ .populate_csrows = init_csrows,
+};
+
+static const struct low_ops f10_ops = {
+ .early_channel_count = f1x_early_channel_count,
+ .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
+ .dbam_to_cs = f10_dbam_to_chip_select,
+ .prep_chip_select = fmisc_prep_chip_selects,
+ .display_misc_regs = __dump_misc_regs,
+ .populate_csrows = init_csrows,
+};
+
+static const struct low_ops f15_ops = {
+ .early_channel_count = f1x_early_channel_count,
+ .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
+ .dbam_to_cs = f15_dbam_to_chip_select,
+ .prep_chip_select = fmisc_prep_chip_selects,
+ .display_misc_regs = __dump_misc_regs,
+ .populate_csrows = init_csrows,
+};
+
+static const struct low_ops f15m30_ops = {
+ .early_channel_count = f1x_early_channel_count,
+ .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
+ .dbam_to_cs = f16_dbam_to_chip_select,
+ .prep_chip_select = f15m30_prep_chip_selects,
+ .display_misc_regs = __dump_misc_regs,
+ .populate_csrows = init_csrows,
+};
+
+static const struct low_ops f16_x_ops = {
+ .early_channel_count = f1x_early_channel_count,
+ .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
+ .dbam_to_cs = f15_m60h_dbam_to_chip_select,
+ .prep_chip_select = fmisc_prep_chip_selects,
+ .display_misc_regs = __dump_misc_regs,
+ .populate_csrows = init_csrows,
+};
+
+static const struct low_ops f16_ops = {
+ .early_channel_count = f1x_early_channel_count,
+ .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
+ .dbam_to_cs = f16_dbam_to_chip_select,
+ .prep_chip_select = fmisc_prep_chip_selects,
+ .display_misc_regs = __dump_misc_regs,
+ .populate_csrows = init_csrows,
+};
+
+static const struct low_ops f17_ops = {
+ .early_channel_count = f17_early_channel_count,
+ .dbam_to_cs = f17_addr_mask_to_cs_size,
+ .prep_chip_select = f17_prep_chip_selects,
+ .get_base_mask = read_umc_base_mask,
+ .display_misc_regs = __dump_misc_regs_df,
+ .get_mc_regs = __read_mc_regs_df,
+ .populate_csrows = init_csrows_df,
+ .get_umc_err_info = find_umc_channel,
+};
+
static struct amd64_family_type family_types[] = {
[K8_CPUS] = {
.ctl_name = "K8",
.f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP,
.f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL,
.max_mcs = 2,
- .ops = {
- .early_channel_count = k8_early_channel_count,
- .map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow,
- .dbam_to_cs = k8_dbam_to_chip_select,
- }
+ .ops = k8_ops,
},
[F10_CPUS] = {
.ctl_name = "F10h",
.f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP,
.f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f1x_early_channel_count,
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f10_dbam_to_chip_select,
- }
+ .ops = f10_ops,
},
[F15_CPUS] = {
.ctl_name = "F15h",
.f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f1x_early_channel_count,
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f15_dbam_to_chip_select,
- }
+ .ops = f15_ops,
},
[F15_M30H_CPUS] = {
.ctl_name = "F15h_M30h",
.f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f1x_early_channel_count,
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f16_dbam_to_chip_select,
- }
+ .ops = f15m30_ops,
},
[F15_M60H_CPUS] = {
.ctl_name = "F15h_M60h",
.f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f1x_early_channel_count,
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f15_m60h_dbam_to_chip_select,
- }
+ .ops = f16_x_ops,
},
[F16_CPUS] = {
.ctl_name = "F16h",
.f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f1x_early_channel_count,
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f16_dbam_to_chip_select,
- }
+ .ops = f16_ops,
},
[F16_M30H_CPUS] = {
.ctl_name = "F16h_M30h",
.f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f1x_early_channel_count,
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f16_dbam_to_chip_select,
- }
+ .ops = f16_ops,
},
[F17_CPUS] = {
.ctl_name = "F17h",
.f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
+ .ops = f17_ops,
},
[F17_M10H_CPUS] = {
.ctl_name = "F17h_M10h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
+ .ops = f17_ops,
},
[F17_M30H_CPUS] = {
.ctl_name = "F17h_M30h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6,
.max_mcs = 8,
- .ops = {
- .early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
+ .ops = f17_ops,
},
[F17_M60H_CPUS] = {
.ctl_name = "F17h_M60h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F6,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
+ .ops = f17_ops,
},
[F17_M70H_CPUS] = {
.ctl_name = "F17h_M70h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6,
.max_mcs = 2,
- .ops = {
- .early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
+ .ops = f17_ops,
},
[F19_CPUS] = {
.ctl_name = "F19h",
.f0_id = PCI_DEVICE_ID_AMD_19H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_19H_DF_F6,
.max_mcs = 8,
- .ops = {
- .early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
+ .ops = f17_ops,
},
};

@@ -2900,9 +2937,10 @@ static inline void decode_bus_error(int node_id, struct mce *m)
* the instance_id. For example, instance_id=0xYXXXXX where Y is the channel
* number.
*/
-static int find_umc_channel(struct mce *m)
+static void find_umc_channel(struct mce *m, struct err_info *err)
{
- return (m->ipid & GENMASK(31, 0)) >> 20;
+ err->channel = (m->ipid & GENMASK(31, 0)) >> 20;
+ err->csrow = m->synd & 0x7;
}

static void decode_umc_error(int node_id, struct mce *m)
@@ -2924,7 +2962,7 @@ static void decode_umc_error(int node_id, struct mce *m)
if (m->status & MCI_STATUS_DEFERRED)
ecc_type = 3;

- err.channel = find_umc_channel(m);
+ pvt->ops->get_umc_err_info(m, &err);

if (!(m->status & MCI_STATUS_SYNDV)) {
err.err_code = ERR_SYND;
@@ -2940,8 +2978,6 @@ static void decode_umc_error(int node_id, struct mce *m)
err.err_code = ERR_CHANNEL;
}

- err.csrow = m->synd & 0x7;
-
if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
err.err_code = ERR_NORM_ADDR;
goto log_error;
@@ -3106,8 +3142,9 @@ static void read_mc_regs(struct amd64_pvt *pvt)
edac_dbg(0, " TOP_MEM2 disabled\n");
}

- if (pvt->umc) {
- __read_mc_regs_df(pvt);
+ if (pvt->ops->get_mc_regs) {
+ pvt->ops->get_mc_regs(pvt);
+
amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar);

goto skip;
@@ -3277,9 +3314,6 @@ static int init_csrows(struct mem_ctl_info *mci)
int nr_pages = 0;
u32 val;

- if (pvt->umc)
- return init_csrows_df(mci);
-
amd64_read_pci_cfg(pvt->F3, NBCFG, &val);

pvt->nbcfg = val;
@@ -3703,6 +3737,17 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
return NULL;
}

+ /* ops required for all the families */
+ if (!pvt->ops->early_channel_count || !pvt->ops->prep_chip_select ||
+ !pvt->ops->display_misc_regs || !pvt->ops->dbam_to_cs ||
+ !pvt->ops->populate_csrows)
+ return NULL;
+
+ /* ops required for families 17h and later */
+ if (pvt->fam >= 0x17 && (!pvt->ops->get_base_mask ||
+ !pvt->ops->get_umc_err_info || !pvt->ops->get_mc_regs))
+ return NULL;
+
return fam_type;
}

@@ -3786,7 +3831,7 @@ static int init_one_instance(struct amd64_pvt *pvt)

setup_mci_misc_attrs(mci);

- if (init_csrows(mci))
+ if (pvt->ops->populate_csrows(mci))
mci->edac_cap = EDAC_FLAG_NONE;

ret = -ENODEV;
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 85aa820bc165..ce21b3cf0825 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -472,6 +472,12 @@ struct low_ops {
struct err_info *);
int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
unsigned cs_mode, int cs_mask_nr);
+ void (*prep_chip_select)(struct amd64_pvt *pvt);
+ void (*get_base_mask)(struct amd64_pvt *pvt);
+ void (*display_misc_regs)(struct amd64_pvt *pvt);
+ void (*get_mc_regs)(struct amd64_pvt *pvt);
+ int (*populate_csrows)(struct mem_ctl_info *mci);
+ void (*get_umc_err_info)(struct mce *m, struct err_info *err);
};

struct amd64_family_type {
--
2.25.1

Subject: [PATCH v4 4/4] EDAC/amd64: Enumerate memory on Aldebaran GPU nodes

On newer heterogeneous systems with AMD CPUs, the data fabrics of the GPUs
are connected directly via custom links.

On one such system, where Aldebaran GPU nodes are connected to
Family 19h, Model 30h CPU nodes, the Aldebaran GPUs can report
memory errors via SMCA banks.

Aldebaran GPU support was added to the DRM framework:
https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html

The GPU nodes come with built-in HBM2 memory; ECC support is enabled by
default, and the UMCs on the GPU nodes differ from the UMCs on the CPU nodes.

GPU-specific ops routines are defined to extend the amd64_edac
module to enumerate HBM memory, leveraging the existing EDAC and
amd64-specific data structures.

Note: The UMC Phys on GPU nodes are enumerated as csrows and the UMC
channels connected to HBM banks are enumerated as ranks.
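
For illustration, with the values used here (max_mcs = 4 UMC PHYs per
GPU node and 8 chip selects per PHY), the EDAC layers set up in
init_one_instance() come out reversed relative to CPU nodes. A sketch
of the resulting assignments (not new code):

	layers[0].size = fam_type->max_mcs;	/* 4 UMC PHYs as csrows */
	layers[1].size = pvt->csels[0].b_cnt;	/* 8 HBM channels as ranks */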

Cc: Yazen Ghannam <[email protected]>
Co-developed-by: Muralidhara M K <[email protected]>
Signed-off-by: Muralidhara M K <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
Changes since v3:
1. Bifurcated the GPU code from v2

Changes since v2:
1. Restored line deletions and handled minor comments
2. Modified commit message and some of the function comments
3. Variable df_inst_id is introduced instead of umc_num

Changes since v1:
1. Modified the commit message
2. Changed the edac_cap
3. Kept sizes of both cpu and noncpu together
4. Return success if the !F3 condition is true and remove unnecessary validation

drivers/edac/amd64_edac.c | 233 +++++++++++++++++++++++++++++++++++++-
drivers/edac/amd64_edac.h | 27 +++++
2 files changed, 254 insertions(+), 6 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 131ed19f69dd..7173310660a3 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1123,6 +1123,20 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
}
}

+static void debug_display_dimm_sizes_gpu(struct amd64_pvt *pvt, u8 ctrl)
+{
+ int size, cs = 0, cs_mode;
+
+ edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
+
+ cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+
+ for_each_chip_select(cs, ctrl, pvt) {
+ size = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs);
+ amd64_info(EDAC_MC ": %d: %5dMB\n", cs, size);
+ }
+}
+
static void __dump_misc_regs_df(struct amd64_pvt *pvt)
{
struct amd64_umc *umc;
@@ -1167,6 +1181,27 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
pvt->dhar, dhar_base(pvt));
}

+static void __dump_misc_regs_gpu(struct amd64_pvt *pvt)
+{
+ struct amd64_umc *umc;
+ u32 i, umc_base;
+
+ for_each_umc(i) {
+ umc_base = get_umc_base(i);
+ umc = &pvt->umc[i];
+
+ edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
+ edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
+ edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
+ edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i);
+
+ debug_display_dimm_sizes_gpu(pvt, i);
+ }
+
+ edac_dbg(1, "F0x104 (DRAM Hole Address): 0x%08x, base: 0x%08x\n",
+ pvt->dhar, dhar_base(pvt));
+}
+
/* Display and decode various NB registers for debug purposes. */
static void __dump_misc_regs(struct amd64_pvt *pvt)
{
@@ -1242,6 +1277,43 @@ static void f17_prep_chip_selects(struct amd64_pvt *pvt)
}
}

+static void gpu_prep_chip_selects(struct amd64_pvt *pvt)
+{
+ int umc;
+
+ for_each_umc(umc) {
+ pvt->csels[umc].b_cnt = 8;
+ pvt->csels[umc].m_cnt = 8;
+ }
+}
+
+static void read_umc_base_mask_gpu(struct amd64_pvt *pvt)
+{
+ u32 base_reg, mask_reg;
+ u32 *base, *mask;
+ int umc, cs;
+
+ for_each_umc(umc) {
+ for_each_chip_select(cs, umc, pvt) {
+ base_reg = get_umc_base_gpu(umc, cs) + UMCCH_BASE_ADDR;
+ base = &pvt->csels[umc].csbases[cs];
+
+ if (!amd_smn_read(pvt->mc_node_id, base_reg, base)) {
+ edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
+ umc, cs, *base, base_reg);
+ }
+
+ mask_reg = get_umc_base_gpu(umc, cs) + UMCCH_ADDR_MASK;
+ mask = &pvt->csels[umc].csmasks[cs];
+
+ if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask)) {
+ edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
+ umc, cs, *mask, mask_reg);
+ }
+ }
+ }
+}
+
static void read_umc_base_mask(struct amd64_pvt *pvt)
{
u32 umc_base_reg, umc_base_reg_sec;
@@ -1745,6 +1817,19 @@ static int f17_early_channel_count(struct amd64_pvt *pvt)
return channels;
}

+static int gpu_early_channel_count(struct amd64_pvt *pvt)
+{
+ int i, channels = 0;
+
+ /* The memory channels on GPU nodes are all fully populated */
+ for_each_umc(i)
+ channels += pvt->csels[i].b_cnt;
+
+ amd64_info("MCT channel count: %d\n", channels);
+
+ return channels;
+}
+
static int ddr3_cs_size(unsigned i, bool dct_width)
{
unsigned shift = 0;
@@ -1942,6 +2027,14 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, dimm);
}

+static int gpu_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+ unsigned int cs_mode, int csrow_nr)
+{
+ u32 addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
+
+ return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, csrow_nr >> 1);
+}
+
static void read_dram_ctl_register(struct amd64_pvt *pvt)
{

@@ -2527,8 +2620,11 @@ static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
/* Prototypes for family specific ops routines */
static int init_csrows(struct mem_ctl_info *mci);
static int init_csrows_df(struct mem_ctl_info *mci);
+static int init_csrows_gpu(struct mem_ctl_info *mci);
static void __read_mc_regs_df(struct amd64_pvt *pvt);
+static void __read_mc_regs_gpu(struct amd64_pvt *pvt);
static void find_umc_channel(struct mce *m, struct err_info *err);
+static void find_umc_channel_gpu(struct mce *m, struct err_info *err);

static const struct low_ops k8_ops = {
.early_channel_count = k8_early_channel_count,
@@ -2595,6 +2691,17 @@ static const struct low_ops f17_ops = {
.get_umc_err_info = find_umc_channel,
};

+static const struct low_ops gpu_ops = {
+ .early_channel_count = gpu_early_channel_count,
+ .dbam_to_cs = gpu_addr_mask_to_cs_size,
+ .prep_chip_select = gpu_prep_chip_selects,
+ .get_base_mask = read_umc_base_mask_gpu,
+ .display_misc_regs = __dump_misc_regs_gpu,
+ .get_mc_regs = __read_mc_regs_gpu,
+ .populate_csrows = init_csrows_gpu,
+ .get_umc_err_info = find_umc_channel_gpu,
+};
+
static struct amd64_family_type family_types[] = {
[K8_CPUS] = {
.ctl_name = "K8",
@@ -2687,6 +2794,14 @@ static struct amd64_family_type family_types[] = {
.max_mcs = 8,
.ops = f17_ops,
},
+ [ALDEBARAN_GPUS] = {
+ .ctl_name = "ALDEBARAN",
+ .f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0,
+ .f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6,
+ .max_mcs = 4,
+ .ops = gpu_ops,
+ },
+
};

/*
@@ -2943,12 +3058,38 @@ static void find_umc_channel(struct mce *m, struct err_info *err)
err->csrow = m->synd & 0x7;
}

+/*
+ * The CPUs have one channel per UMC, so the UMC number is equivalent to a
+ * channel number. The GPUs have 8 channels per UMC, so the UMC number no
+ * longer works as a channel number.
+ * The channel number within a GPU UMC is given in MCA_IPID[15:12].
+ * However, the IDs are split such that two UMC values go to one UMC, and
+ * the channel numbers are split in two groups of four.
+ *
+ * Refer to the comment on get_umc_base_gpu() in amd64_edac.h.
+ *
+ * For example,
+ * UMC0 CH[3:0] = 0x0005[3:0]000
+ * UMC0 CH[7:4] = 0x0015[3:0]000
+ * UMC1 CH[3:0] = 0x0025[3:0]000
+ * UMC1 CH[7:4] = 0x0035[3:0]000
+ */
+static void find_umc_channel_gpu(struct mce *m, struct err_info *err)
+{
+ u8 ch = (m->ipid & GENMASK(31, 0)) >> 20;
+ u8 phy = ((m->ipid >> 12) & 0xf);
+
+ err->channel = ch % 2 ? phy + 4 : phy;
+ err->csrow = phy;
+}
+
static void decode_umc_error(int node_id, struct mce *m)
{
u8 ecc_type = (m->status >> 45) & 0x3;
struct mem_ctl_info *mci;
struct amd64_pvt *pvt;
struct err_info err;
+ u8 df_inst_id;
u64 sys_addr;

mci = edac_mc_find(node_id);
@@ -2978,7 +3119,17 @@ static void decode_umc_error(int node_id, struct mce *m)
err.err_code = ERR_CHANNEL;
}

- if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
+ /*
+ * A GPU node has #phys [X], each of which has #channels [Y].
+ * On GPUs, df_inst_id = [X] * num_ch_per_phy + [Y].
+ * On CPUs, "Channel"="UMC Number"="DF Instance ID".
+ */
+ if (pvt->is_gpu)
+ df_inst_id = (err.csrow * pvt->channel_count / mci->nr_csrows) + err.channel;
+ else
+ df_inst_id = err.channel;
+
+ if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, df_inst_id, &sys_addr)) {
err.err_code = ERR_NORM_ADDR;
goto log_error;
}
@@ -3117,6 +3268,23 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
}
}

+static void __read_mc_regs_gpu(struct amd64_pvt *pvt)
+{
+ u8 nid = pvt->mc_node_id;
+ struct amd64_umc *umc;
+ u32 i, umc_base;
+
+ /* Read registers from each UMC */
+ for_each_umc(i) {
+ umc_base = get_umc_base_gpu(i, 0);
+ umc = &pvt->umc[i];
+
+ amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
+ amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
+ amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
+ }
+}
+
/*
* Retrieve the hardware registers of the memory controller (this includes the
* 'Address Map' and 'Misc' device regs)
@@ -3196,7 +3364,9 @@ static void read_mc_regs(struct amd64_pvt *pvt)
determine_memory_type(pvt);
edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);

- determine_ecc_sym_sz(pvt);
+ /* ECC symbol size is not available on GPU nodes */
+ if (!pvt->is_gpu)
+ determine_ecc_sym_sz(pvt);
}

/*
@@ -3243,7 +3413,10 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig)
csrow_nr >>= 1;
cs_mode = DBAM_DIMM(csrow_nr, dbam);
} else {
- cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt);
+ if (pvt->is_gpu)
+ cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+ else
+ cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt);
}

nr_pages = pvt->ops->dbam_to_cs(pvt, dct, cs_mode, csrow_nr);
@@ -3300,6 +3473,35 @@ static int init_csrows_df(struct mem_ctl_info *mci)
return empty;
}

+static int init_csrows_gpu(struct mem_ctl_info *mci)
+{
+ struct amd64_pvt *pvt = mci->pvt_info;
+ struct dimm_info *dimm;
+ int empty = 1;
+ u8 umc, cs;
+
+ for_each_umc(umc) {
+ for_each_chip_select(cs, umc, pvt) {
+ if (!csrow_enabled(cs, umc, pvt))
+ continue;
+
+ empty = 0;
+ dimm = mci->csrows[umc]->channels[cs]->dimm;
+
+ edac_dbg(1, "MC node: %d, csrow: %d\n",
+ pvt->mc_node_id, cs);
+
+ dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
+ dimm->mtype = MEM_HBM2;
+ dimm->edac_mode = EDAC_SECDED;
+ dimm->dtype = DEV_X16;
+ dimm->grain = 64;
+ }
+ }
+
+ return empty;
+}
+
/*
* Initialize the array of csrow attribute instances, based on the values
* from pci config hardware registers.
@@ -3541,6 +3743,10 @@ static bool ecc_enabled(struct amd64_pvt *pvt)
u8 ecc_en = 0, i;
u32 value;

+ /* ECC is enabled by default on GPU nodes */
+ if (pvt->is_gpu)
+ return true;
+
if (boot_cpu_data.x86 >= 0x17) {
u8 umc_en_mask = 0, ecc_en_mask = 0;
struct amd64_umc *umc;
@@ -3624,7 +3830,10 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
mci->edac_ctl_cap = EDAC_FLAG_NONE;

if (pvt->umc) {
- f17h_determine_edac_ctl_cap(mci, pvt);
+ if (pvt->is_gpu)
+ mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
+ else
+ f17h_determine_edac_ctl_cap(mci, pvt);
} else {
if (pvt->nbcap & NBCAP_SECDED)
mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
@@ -3726,6 +3935,17 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
pvt->ops = &family_types[F17_M70H_CPUS].ops;
fam_type->ctl_name = "F19h_M20h";
break;
+ } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
+ if (pvt->mc_node_id >= amd_cpu_node_count()) {
+ fam_type = &family_types[ALDEBARAN_GPUS];
+ pvt->ops = &family_types[ALDEBARAN_GPUS].ops;
+ pvt->is_gpu = true;
+ } else {
+ fam_type = &family_types[F19_CPUS];
+ pvt->ops = &family_types[F19_CPUS].ops;
+ fam_type->ctl_name = "F19h_M30h";
+ }
+ break;
}
fam_type = &family_types[F19_CPUS];
pvt->ops = &family_types[F19_CPUS].ops;
@@ -3808,9 +4028,10 @@ static int init_one_instance(struct amd64_pvt *pvt)
if (pvt->channel_count < 0)
return ret;

+ /* Define layers for CPU and GPU nodes */
ret = -ENOMEM;
layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
- layers[0].size = pvt->csels[0].b_cnt;
+ layers[0].size = pvt->is_gpu ? fam_type->max_mcs : pvt->csels[0].b_cnt;
layers[0].is_virt_csrow = true;
layers[1].type = EDAC_MC_LAYER_CHANNEL;

@@ -3819,7 +4040,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
* only one channel. Also, this simplifies handling later for the price
* of a couple of KBs tops.
*/
- layers[1].size = fam_type->max_mcs;
+ layers[1].size = pvt->is_gpu ? pvt->csels[0].b_cnt : fam_type->max_mcs;
layers[1].is_virt_csrow = false;

mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index ce21b3cf0825..2dbf6fe14a55 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -126,6 +126,8 @@
#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
#define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650
#define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14d0
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14d6

/*
* Function 1 - Address Map
@@ -298,6 +300,7 @@ enum amd_families {
F17_M60H_CPUS,
F17_M70H_CPUS,
F19_CPUS,
+ ALDEBARAN_GPUS,
NUM_FAMILIES,
};

@@ -389,6 +392,8 @@ struct amd64_pvt {
enum mem_type dram_type;

struct amd64_umc *umc; /* UMC registers */
+
+ bool is_gpu;
};

enum err_codes {
@@ -410,6 +415,28 @@ struct err_info {
u32 offset;
};

+static inline u32 get_umc_base_gpu(u8 umc, u8 channel)
+{
+ /*
+ * On CPUs, there is one channel per UMC, so UMC numbering equals
+ * channel numbering. On GPUs, there are eight channels per UMC,
+ * so the channel numbering is different from UMC numbering.
+ *
+ * On CPU nodes channels are selected in 6th nibble
+ * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000;
+ *
+ * On GPU nodes channels are selected in 3rd nibble
+ * HBM chX[3:0]= [Y ]5X[3:0]000;
+ * HBM chX[7:4]= [Y+1]5X[3:0]000
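+ *
+ * Worked example: umc = 1, channel = 5 gives umc * 2 + 1 = 3 and
+ * channel % 4 = 1, i.e. 0x50000 + (3 << 20) + (1 << 12) = 0x351000.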
+ */
+ umc *= 2;
+
+ if (channel >= 4)
+ umc++;
+
+ return 0x50000 + (umc << 20) + ((channel % 4) << 12);
+}
+
static inline u32 get_umc_base(u8 channel)
{
/* chY: 0xY50000 */
--
2.25.1

2021-10-15 01:42:53

by Borislav Petkov

Subject: Re: [PATCH v4 0/4] x86/edac/amd64: Add support for noncpu nodes

On Fri, Oct 15, 2021 at 12:23:56AM +0530, Naveen Krishna Chatradhi wrote:
> On newer heterogeneous systems with AMD CPUs the data fabrics of GPUs
> can be connected directly via custom links.
>
> This patchset does the following
> 1. amd_nb.c:
> a. Add support for northbridges on Aldebaran GPU nodes
> b. export AMD node map details to be used by edac and mce modules
>
> 2. mce_amd module:
> a. Identify the node ID where the error occurred and map the node id
> to linux enumerated node id.
>
> 2. Modifies the amd64_edac module
> a. Add new family op routines
> b. Enumerate UMCs and HBMs on the GPU nodes
>
> This patchset is rebased on top of
> "
> commit 07416cadfdfa38283b840e700427ae3782c76f6b
> Author: Yazen Ghannam <[email protected]>
> Date: Tue Oct 5 15:44:19 2021 +0000
>
> EDAC/amd64: Handle three rank interleaving mode
> "
>
> Muralidhara M K (2):
> x86/amd_nb: Add support for northbridges on Aldebaran
> EDAC/amd64: Extend family ops functions
>
> Naveen Krishna Chatradhi (2):
> EDAC/mce_amd: Extract node id from MCA_IPID
> EDAC/amd64: Enumerate memory on Aldebaran GPU nodes
>
> arch/x86/include/asm/amd_nb.h | 9 +
> arch/x86/kernel/amd_nb.c | 131 +++++++--
> drivers/edac/amd64_edac.c | 517 +++++++++++++++++++++++++---------
> drivers/edac/amd64_edac.h | 33 +++
> drivers/edac/mce_amd.c | 24 +-
> include/linux/pci_ids.h | 1 +
> 6 files changed, 564 insertions(+), 151 deletions(-)

So which v4 should I be looking at - this one or

https://lore.kernel.org/r/[email protected]

?

Btw, you don't have to do --in-reply-to and keep all patchsets in a
single thread - just send the new revision as a separate thread.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: Re: [PATCH v4 0/4] x86/edac/amd64: Add support for noncpu nodes

Hi Boris,

On 10/15/2021 1:23 AM, Borislav Petkov wrote:
> On Fri, Oct 15, 2021 at 12:23:56AM +0530, Naveen Krishna Chatradhi wrote:
>> On newer heterogeneous systems with AMD CPUs the data fabrics of GPUs
>> can be connected directly via custom links.
>>
>> This patchset does the following
>> 1. amd_nb.c:
>> a. Add support for northbridges on Aldebaran GPU nodes
>> b. export AMD node map details to be used by edac and mce modules
>>
>> 2. mce_amd module:
>> a. Identify the node ID where the error occurred and map the node id
>> to linux enumerated node id.
>>
>> 2. Modifies the amd64_edac module
>> a. Add new family op routines
>> b. Enumerate UMCs and HBMs on the GPU nodes
>>
>> This patchset is rebased on top of
>> "
>> commit 07416cadfdfa38283b840e700427ae3782c76f6b
>> Author: Yazen Ghannam <[email protected]>
>> Date: Tue Oct 5 15:44:19 2021 +0000
>>
>> EDAC/amd64: Handle three rank interleaving mode
>> "
>>
>> Muralidhara M K (2):
>> x86/amd_nb: Add support for northbridges on Aldebaran
>> EDAC/amd64: Extend family ops functions
>>
>> Naveen Krishna Chatradhi (2):
>> EDAC/mce_amd: Extract node id from MCA_IPID
>> EDAC/amd64: Enumerate memory on Aldebaran GPU nodes
>>
>> arch/x86/include/asm/amd_nb.h | 9 +
>> arch/x86/kernel/amd_nb.c | 131 +++++++--
>> drivers/edac/amd64_edac.c | 517 +++++++++++++++++++++++++---------
>> drivers/edac/amd64_edac.h | 33 +++
>> drivers/edac/mce_amd.c | 24 +-
>> include/linux/pci_ids.h | 1 +
>> 6 files changed, 564 insertions(+), 151 deletions(-)
> So which v4 should I be looking at - this one or
I've noticed the v4 tag missing on the 3rd and 4th patches in the series.
I tried to abort, but the git send-email went through.
>
> https://lore.kernel.org/r/[email protected]
Could you please review the latest one (above link), or should I push
them as v5 to avoid the confusion?
>
> ?
>
> Btw, you don't have to do --in-reply-to and keep all patchsets in a
> single thread - just send the new revision as a separate thread.
Sure, will do that. Thank you.
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2021-10-17 21:43:54

by Borislav Petkov

Subject: Re: [PATCH v4 0/4] x86/edac/amd64: Add support for noncpu nodes

On Fri, Oct 15, 2021 at 05:48:32PM +0530, Chatradhi, Naveen Krishna wrote:
> Could you please review the latest one (above link)

Ok.

> or should i push them as v5, to avoid the confusion.

Nah, not necessary.

The goal is to always avoid spamming maintainers with patchsets if not
absolutely necessary. :-)

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-10-21 15:58:04

by Yazen Ghannam

Subject: Re: [PATCH v4 1/4] x86/amd_nb: Add support for northbridges on Aldebaran

On Fri, Oct 15, 2021 at 12:23:57AM +0530, Naveen Krishna Chatradhi wrote:
> From: Muralidhara M K <[email protected]>
>
> On newer systems the CPUs manage MCA errors reported from the GPUs.
> Enumerate the GPU nodes with the AMD NB framework to support EDAC.
>
> GPU nodes are enumerated in sequential order based on the PCI hierarchy,
> and the first GPU node is assumed to have an "AMD Node ID" value after
> CPU Nodes are fully populated.
>
> Aldebaran is an AMD GPU, GPU drivers are part of the DRM framework
> https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html
>
> Each Aldebaran GPU has 2 Data Fabrics, which are enumerated as 2 nodes.
> With this implementation detail, the Data Fabric on the GPU nodes can be
> accessed the same way as the Data Fabric on CPU nodes.
>
> Special handling was necessary in northbridge enumeration as the
> roots_per_misc value is different for GPU and CPU nodes.
>
> Signed-off-by: Muralidhara M K <[email protected]>
> Co-developed-by: Naveen Krishna Chatradhi <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> Link: https://lkml.kernel.org/r/[email protected]
> ---
> Changes since v3:
> 1. Use word "gpu" instead of "noncpu" in the patch
> 2. Do not create pci_dev_ids arrays for gpu nodes
> 3. Identify the gpu node start index from DF18F1 registers on the GPU nodes.
> a. Export cpu node count and gpu start node id
>
> Changes since v2:
> 1. Added Reviewed-by Yazen Ghannam
>
> Changes since v1:
> 1. Modified the commit message and comments in the code
> 2. Squashed patch 1/7: "x86/amd_nb: Add Aldebaran device to PCI IDs"
>
> arch/x86/include/asm/amd_nb.h | 9 +++
> arch/x86/kernel/amd_nb.c | 131 ++++++++++++++++++++++++++++------
> include/linux/pci_ids.h | 1 +
> 3 files changed, 118 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
> index 455066a06f60..5898300f11ed 100644
> --- a/arch/x86/include/asm/amd_nb.h
> +++ b/arch/x86/include/asm/amd_nb.h
> @@ -68,10 +68,17 @@ struct amd_northbridge {
> struct threshold_bank *bank4;
> };
>
> +/* heterogeneous system node type map variables */
> +struct amd_node_map {
> + u16 gpu_node_start_id;
> + u16 cpu_node_count;
> +};
> +
> struct amd_northbridge_info {
> u16 num;
> u64 flags;
> struct amd_northbridge *nb;
> + struct amd_node_map *nmap;

Just a minor nit, but does the name "nmap" conflict with anything in the
kernel? At first glance it looks like "network" map.

> };
>
> #define AMD_NB_GART BIT(0)
> @@ -83,6 +90,8 @@ struct amd_northbridge_info {
> u16 amd_nb_num(void);
> bool amd_nb_has_feature(unsigned int feature);
> struct amd_northbridge *node_to_amd_nb(int node);
> +u16 amd_gpu_node_start_id(void);
> +u16 amd_cpu_node_count(void);
>
> static inline u16 amd_pci_dev_to_node_id(struct pci_dev *pdev)
> {
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index c92c9c774c0e..54a6a7462f07 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -19,6 +19,7 @@
> #define PCI_DEVICE_ID_AMD_17H_M10H_ROOT 0x15d0
> #define PCI_DEVICE_ID_AMD_17H_M30H_ROOT 0x1480
> #define PCI_DEVICE_ID_AMD_17H_M60H_ROOT 0x1630
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT 0x14bb
> #define PCI_DEVICE_ID_AMD_17H_DF_F4 0x1464
> #define PCI_DEVICE_ID_AMD_17H_M10H_DF_F4 0x15ec
> #define PCI_DEVICE_ID_AMD_17H_M30H_DF_F4 0x1494
> @@ -28,6 +29,7 @@
> #define PCI_DEVICE_ID_AMD_19H_M40H_ROOT 0x14b5
> #define PCI_DEVICE_ID_AMD_19H_M40H_DF_F4 0x167d
> #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4
>
> /* Protect the PCI config register pairs used for SMN and DF indirect access. */
> static DEFINE_MUTEX(smn_mutex);
> @@ -40,6 +42,7 @@ static const struct pci_device_id amd_root_ids[] = {
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M30H_ROOT) },
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M60H_ROOT) },
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_ROOT) },
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
> {}
> };
>
> @@ -63,6 +66,7 @@ static const struct pci_device_id amd_nb_misc_ids[] = {
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_DF_F3) },
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F3) },
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F3) },
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
> {}
> };
>
> @@ -81,6 +85,7 @@ static const struct pci_device_id amd_nb_link_ids[] = {
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F4) },
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F4) },
> { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F4) },
> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
> {}
> };
>
> @@ -126,6 +131,55 @@ struct amd_northbridge *node_to_amd_nb(int node)
> }
> EXPORT_SYMBOL_GPL(node_to_amd_nb);
>
> +/*
> + * GPU start index and CPU count values on an heterogeneous system,
> + * these values will be used by the AMD EDAC and MCE modules.
> + */
> +u16 amd_gpu_node_start_id(void)
> +{
> + return (amd_northbridges.nmap) ?
> + amd_northbridges.nmap->gpu_node_start_id : 0;
> +}
> +EXPORT_SYMBOL_GPL(amd_gpu_node_start_id);
> +
> +u16 amd_cpu_node_count(void)
> +{
> + return (amd_northbridges.nmap) ?
> + amd_northbridges.nmap->cpu_node_count : amd_northbridges.num;
> +}
> +EXPORT_SYMBOL_GPL(amd_cpu_node_count);
> +
> +/* DF18xF1 regsters on Aldebaran GPU */
> +#define REG_LOCAL_NODE_TYPE_MAP 0x144
> +#define REG_RMT_NODE_TYPE_MAP 0x148
> +
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F1 0x14d1
> +
> +static int amd_get_node_map(void)
> +{
> + struct amd_node_map *np;
> + struct pci_dev *pdev = NULL;
> + u32 tmp;
> +

These lines should be ordered from longest to shortest.

You could even combine the "struct pci_dev" line with the line below. Just
pass NULL to pci_get_device().

> + pdev = pci_get_device(PCI_VENDOR_ID_AMD,
> + PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F1, pdev);
> + if (!pdev)
> + return -ENODEV;
> +
> + np = kmalloc(sizeof(*np), GFP_KERNEL);
> + if (!np)
> + return -ENOMEM;
> +
> + pci_read_config_dword(pdev, REG_LOCAL_NODE_TYPE_MAP, &tmp);
> + np->gpu_node_start_id = tmp & 0xFFF;
> +
> + pci_read_config_dword(pdev, REG_RMT_NODE_TYPE_MAP, &tmp);
> + np->cpu_node_count = tmp >> 16 & 0xFFF;
> +

The PCI device, register offsets, and bit fields all look correct.

I think a comment with explanation will be helpful. Something that mentions
how some DF devices have these registers with "Base Node ID" and "Node
Count" fields. "Local Node Type" refers to nodes with the same type as that
from which the register is read, and "Remote Node Type" refers to nodes with
a different type. So if you read the registers from a GPU node, then "Local"
refers to GPU nodes and "Remote" refers to CPU nodes, and vice versa.

Since this information is only needed when we have a CPU+GPU system, we only
need to gather it when we find a GPU device.
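
Something like this, perhaps (just a sketch of the wording):

	/*
	 * Some DF devices have registers with "Base Node ID" and "Node
	 * Count" fields. "Local Node Type" refers to nodes with the same
	 * type as that from which the register is read, and "Remote Node
	 * Type" refers to nodes with a different type. So when read from
	 * a GPU node, "Local" describes the GPU nodes and "Remote" the
	 * CPU nodes, and vice versa.
	 */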

> + amd_northbridges.nmap = np;
> + return 0;
> +}
> +
> static struct pci_dev *next_northbridge(struct pci_dev *dev,
> const struct pci_device_id *ids)
> {
> @@ -230,6 +284,27 @@ int amd_df_indirect_read(u16 node, u8 func, u16 reg, u8 instance_id, u32 *lo)
> }
> EXPORT_SYMBOL_GPL(amd_df_indirect_read);
>
> +struct pci_dev *get_root_devs(struct pci_dev *root,
> + const struct pci_device_id *root_ids,
> + u16 roots_per_misc)
> +{
> + u16 j;
> +
> + /*
> + * If there are more PCI root devices than data fabric/
> + * system management network interfaces, then the (N)
> + * PCI roots per DF/SMN interface are functionally the
> + * same (for DF/SMN access) and N-1 are redundant. N-1
> + * PCI roots should be skipped per DF/SMN interface so
> + * the following DF/SMN interfaces get mapped to
> + * correct PCI roots.
> + */
> + for (j = 0; j < roots_per_misc; j++)
> + root = next_northbridge(root, root_ids);
> +
> + return root;
> +}
> +
> int amd_cache_northbridges(void)
> {
> const struct pci_device_id *misc_ids = amd_nb_misc_ids;
> @@ -237,10 +312,10 @@ int amd_cache_northbridges(void)
> const struct pci_device_id *root_ids = amd_root_ids;
> struct pci_dev *root, *misc, *link;
> struct amd_northbridge *nb;
> - u16 roots_per_misc = 0;
> - u16 misc_count = 0;
> - u16 root_count = 0;
> - u16 i, j;
> + u16 roots_per_misc = 0, gpu_roots_per_misc = 0;
> + u16 misc_count = 0, gpu_misc_count = 0;
> + u16 root_count = 0, gpu_root_count = 0;
> + u16 i;
>
> if (amd_northbridges.num)
> return 0;
> @@ -252,15 +327,23 @@ int amd_cache_northbridges(void)
> }
>
> misc = NULL;
> - while ((misc = next_northbridge(misc, misc_ids)) != NULL)
> - misc_count++;
> + while ((misc = next_northbridge(misc, misc_ids)) != NULL) {
> + if (misc->device == PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3)

I think this may need to be extended for future devices. In which case, it may
make sense to go back to the original solution of having another list of IDs
just for GPUs.

I can't say for sure though. So maybe we keep this how you have it and deal
with future systems when we come to them.
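
For reference, that alternative would look something like this (a
sketch only; amd_gpu_misc_ids is a made-up name):

	static const struct pci_device_id amd_gpu_misc_ids[] = {
		{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
		{}
	};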

> + gpu_misc_count++;
> + else
> + misc_count++;
> + }
>
> if (!misc_count)
> return -ENODEV;
>
> root = NULL;
> - while ((root = next_northbridge(root, root_ids)) != NULL)
> - root_count++;
> + while ((root = next_northbridge(root, root_ids)) != NULL) {
> + if (root->device == PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT)
> + gpu_root_count++;
> + else
> + root_count++;
> + }
>
> if (root_count) {
> roots_per_misc = root_count / misc_count;
> @@ -275,33 +358,35 @@ int amd_cache_northbridges(void)
> }
> }
>
> - nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
> + /*
> + * The number of miscs, roots and roots_per_misc might vary on different
> + * nodes of a heterogeneous system.
> + * calculate roots_per_misc accordingly in order to skip the redundant

Capitalize "calculate".

> + * roots and map the DF/SMN interfaces to correct PCI roots.
> + */
> + if (gpu_root_count && gpu_misc_count) {
> + if (amd_get_node_map())
> + return -ENOMEM;

amd_get_node_map() can return ENODEV and ENOMEM, but only ENOMEM is passed
along here.

I'm not sure that the ENODEV case is necessary. I think you can just return
silently if the GPU PCI ID is not found. In this case, the nmap structure
won't be set, so the code will act as if the system is CPU-only.

Or you can save the return value from amd_get_node_map() and return that.
Maybe that would be the more conservative behavior: give an error if we
found some GPU devices but didn't find the one device we need in order to
gather the node map info.
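
i.e. roughly (sketch, assuming an added local variable ret):

	if (gpu_root_count && gpu_misc_count) {
		ret = amd_get_node_map();
		if (ret)
			return ret;
	}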

Thanks,
Yazen

2021-10-21 15:58:20

by Yazen Ghannam

Subject: Re: [PATCH v4 2/4] EDAC/mce_amd: Extract node id from MCA_IPID

On Fri, Oct 15, 2021 at 12:23:58AM +0530, Naveen Krishna Chatradhi wrote:
> On SMCA banks of the GPU nodes, the node id information is
> available in register MCA_IPID[47:44](InstanceIdHi).
>
> Convert the hardware node ID to a value used by Linux, where GPU
> nodes are numbered sequentially after the CPU nodes.
>
> Co-developed-by: Muralidhara M K <[email protected]>
> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> Link: https://lkml.kernel.org/r/[email protected]
> ---
> Changes since v3:
> 1. Use APIs from amd_nb to identify the gpu_node_start_id and cpu_node_count,
>    which are required to map the hardware node ID to the node ID enumerated by Linux.
>
> Changes since v2:
> 1. Modified subject and commit message
> 2. Added Reviewed by Yazen Ghannam
>
> Changes since v1:
> 1. Modified the commit message
> 2. Rearranged the conditions before calling decode_dram_ecc()
>
> drivers/edac/mce_amd.c | 24 ++++++++++++++++++++++--
> 1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> index 67dbf4c31271..af6caa76adc7 100644
> --- a/drivers/edac/mce_amd.c
> +++ b/drivers/edac/mce_amd.c
> @@ -2,6 +2,7 @@
> #include <linux/module.h>
> #include <linux/slab.h>
>
> +#include <asm/amd_nb.h>
> #include <asm/cpu.h>
>
> #include "mce_amd.h"
> @@ -1072,8 +1073,27 @@ static void decode_smca_error(struct mce *m)
> if (xec < smca_mce_descs[bank_type].num_descs)
> pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
>
> - if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
> - decode_dram_ecc(topology_die_id(m->extcpu), m);
> + if (xec == 0 && decode_dram_ecc) {
> + int node_id = 0;
> +
> + if (bank_type == SMCA_UMC) {
> + node_id = topology_die_id(m->extcpu);
> + } else if (bank_type == SMCA_UMC_V2) {
> + /*
> + * SMCA_UMC_V2 exists on GPU nodes; extract the node id
> + * from register MCA_IPID[47:44] (InstanceIdHi).
> + * The InstanceIdHi field holds the instance ID of the GPU,
> + * which needs to be mapped to a value used by Linux, where
> + * GPU nodes are numbered after the CPU nodes.
> + */
> + node_id = ((m->ipid >> 44) & 0xF) -
> + amd_gpu_node_start_id() + amd_cpu_node_count();
> + } else {
> + return;
> + }
> +
> + decode_dram_ecc(node_id, m);
> + }
> }
>
> static inline void amd_decode_err_code(u16 ec)
> --
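
As a quick sanity check with assumed values: if InstanceIdHi
(MCA_IPID[47:44]) reads 0x9 while amd_gpu_node_start_id() returns 8 and
amd_cpu_node_count() returns 2, then node_id = 9 - 8 + 2 = 3, i.e. the
second GPU node, numbered after the two CPU nodes.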

This looks good to me.

Reviewed-by: Yazen Ghannam <[email protected]>

Thanks,
Yazen

2021-10-21 16:30:52

by Yazen Ghannam

Subject: Re: [PATCH v4 3/4] EDAC/amd64: Extend family ops functions

On Fri, Oct 15, 2021 at 12:23:59AM +0530, Naveen Krishna Chatradhi wrote:
> From: Muralidhara M K <[email protected]>
>
> Create new family operation routines and define them respectively.
> This would simplify adding support for future platforms.
>
> Signed-off-by: Muralidhara M K <[email protected]>
> Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
> ---
> Changes since v3:
> 1. Defined new family operation routines
>
> Changes since v2:
> 1. new patch
>
> drivers/edac/amd64_edac.c | 291 ++++++++++++++++++++++----------------
> drivers/edac/amd64_edac.h | 6 +
> 2 files changed, 174 insertions(+), 123 deletions(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 4fce75013674..131ed19f69dd 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -1204,10 +1204,7 @@ static void __dump_misc_regs(struct amd64_pvt *pvt)
> /* Display and decode various NB registers for debug purposes. */
> static void dump_misc_regs(struct amd64_pvt *pvt)
> {
> - if (pvt->umc)
> - __dump_misc_regs_df(pvt);
> - else
> - __dump_misc_regs(pvt);
> + pvt->ops->display_misc_regs(pvt);
>
> edac_dbg(1, " DramHoleValid: %s\n", dhar_valid(pvt) ? "yes" : "no");
>
> @@ -1217,25 +1214,31 @@ static void dump_misc_regs(struct amd64_pvt *pvt)
> /*
> * See BKDG, F2x[1,0][5C:40], F2[1,0][6C:60]
> */
> -static void prep_chip_selects(struct amd64_pvt *pvt)
> +static void k8_prep_chip_selects(struct amd64_pvt *pvt)
> {
> - if (pvt->fam == 0xf && pvt->ext_model < K8_REV_F) {
> - pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
> - pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 8;
> - } else if (pvt->fam == 0x15 && pvt->model == 0x30) {
> - pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
> - pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
> - } else if (pvt->fam >= 0x17) {
> - int umc;
> -
> - for_each_umc(umc) {
> - pvt->csels[umc].b_cnt = 4;
> - pvt->csels[umc].m_cnt = 2;
> - }
> + pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
> + pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 8;

This doesn't exactly match the existing code. Base/mask = 8/8 applies to
revisions less than K8_REV_F, and 8/4 to K8_REV_F and later.

So I think you'll still need the "ext_model" check here in
k8_prep_chip_selects().
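
Something like this (untested sketch, reusing the values from the
original code):

	static void k8_prep_chip_selects(struct amd64_pvt *pvt)
	{
		if (pvt->ext_model < K8_REV_F) {
			pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
			pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 8;
		} else {
			pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
			pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
		}
	}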

> +}
>
> - } else {
> - pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
> - pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
> +static void f15m30_prep_chip_selects(struct amd64_pvt *pvt)
> +{
> + pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
> + pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
> +}
> +
> +static void fmisc_prep_chip_selects(struct amd64_pvt *pvt)

"fmisc" looks weird. Maybe just call it "default" since it was the
default/else path in the code?

> +{
> + pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
> + pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
> +}
> +
> +static void f17_prep_chip_selects(struct amd64_pvt *pvt)
> +{
> + int umc;
> +
> + for_each_umc(umc) {
> + pvt->csels[umc].b_cnt = 4;
> + pvt->csels[umc].m_cnt = 2;
> }
> }
>
> @@ -1297,10 +1300,10 @@ static void read_dct_base_mask(struct amd64_pvt *pvt)
> {
> int cs;
>
> - prep_chip_selects(pvt);
> + pvt->ops->prep_chip_select(pvt);
>
> - if (pvt->umc)
> - return read_umc_base_mask(pvt);
> + if (pvt->ops->get_base_mask)
> + return pvt->ops->get_base_mask(pvt);

The get_base_mask() pointer can be set to this read_dct_base_mask() function
on pre-Family17h systems.

>
> for_each_chip_select(cs, 0, pvt) {
> int reg0 = DCSB0 + (cs * 4);
> @@ -1869,37 +1872,12 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct,
> return ddr3_cs_size(cs_mode, false);
> }
>
> -static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> - unsigned int cs_mode, int csrow_nr)
> +static int __addr_mask_to_cs_size(u32 addr_mask_orig, unsigned int cs_mode,
> + int csrow_nr, int dimm)
> {
> - u32 addr_mask_orig, addr_mask_deinterleaved;
> u32 msb, weight, num_zero_bits;
> - int dimm, size = 0;
> -
> - /* No Chip Selects are enabled. */
> - if (!cs_mode)
> - return size;
> -
> - /* Requested size of an even CS but none are enabled. */
> - if (!(cs_mode & CS_EVEN) && !(csrow_nr & 1))
> - return size;
> -
> - /* Requested size of an odd CS but none are enabled. */
> - if (!(cs_mode & CS_ODD) && (csrow_nr & 1))
> - return size;
> -
> - /*
> - * There is one mask per DIMM, and two Chip Selects per DIMM.
> - * CS0 and CS1 -> DIMM0
> - * CS2 and CS3 -> DIMM1
> - */
> - dimm = csrow_nr >> 1;
> -
> - /* Asymmetric dual-rank DIMM support. */
> - if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY))
> - addr_mask_orig = pvt->csels[umc].csmasks_sec[dimm];
> - else
> - addr_mask_orig = pvt->csels[umc].csmasks[dimm];
> + u32 addr_mask_deinterleaved;
> + int size = 0;
>
> /*
> * The number of zero bits in the mask is equal to the number of bits
> @@ -1930,6 +1908,40 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> return size >> 10;
> }
>
> +static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> + unsigned int cs_mode, int csrow_nr)
> +{
> + u32 addr_mask_orig;
> + int dimm, size = 0;
> +
> + /* No Chip Selects are enabled. */
> + if (!cs_mode)
> + return size;
> +
> + /* Requested size of an even CS but none are enabled. */
> + if (!(cs_mode & CS_EVEN) && !(csrow_nr & 1))
> + return size;
> +
> + /* Requested size of an odd CS but none are enabled. */
> + if (!(cs_mode & CS_ODD) && (csrow_nr & 1))
> + return size;
> +
> + /*
> + * There is one mask per DIMM, and two Chip Selects per DIMM.
> + * CS0 and CS1 -> DIMM0
> + * CS2 and CS3 -> DIMM1
> + */
> + dimm = csrow_nr >> 1;
> +
> + /* Asymmetric dual-rank DIMM support. */
> + if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY))
> + addr_mask_orig = pvt->csels[umc].csmasks_sec[dimm];
> + else
> + addr_mask_orig = pvt->csels[umc].csmasks[dimm];
> +
> + return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, dimm);

The commit message refers to function ops/pointers, but it didn't say why this
helper function is needed.

> +}
> +
> static void read_dram_ctl_register(struct amd64_pvt *pvt)
> {
>
> @@ -2512,143 +2524,168 @@ static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
> }
> }
>
> +/* Prototypes for family specific ops routines */
> +static int init_csrows(struct mem_ctl_info *mci);
> +static int init_csrows_df(struct mem_ctl_info *mci);
> +static void __read_mc_regs_df(struct amd64_pvt *pvt);
> +static void find_umc_channel(struct mce *m, struct err_info *err);
> +
> +static const struct low_ops k8_ops = {
> + .early_channel_count = k8_early_channel_count,
> + .map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow,
> + .dbam_to_cs = k8_dbam_to_chip_select,
> + .prep_chip_select = k8_prep_chip_selects,
> + .display_misc_regs = __dump_misc_regs,
> + .populate_csrows = init_csrows,
> +};
> +
> +static const struct low_ops f10_ops = {
> + .early_channel_count = f1x_early_channel_count,
> + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
> + .dbam_to_cs = f10_dbam_to_chip_select,
> + .prep_chip_select = fmisc_prep_chip_selects,
> + .display_misc_regs = __dump_misc_regs,
> + .populate_csrows = init_csrows,
> +};
> +
> +static const struct low_ops f15_ops = {
> + .early_channel_count = f1x_early_channel_count,
> + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
> + .dbam_to_cs = f15_dbam_to_chip_select,
> + .prep_chip_select = fmisc_prep_chip_selects,
> + .display_misc_regs = __dump_misc_regs,
> + .populate_csrows = init_csrows,
> +};
> +
> +static const struct low_ops f15m30_ops = {
> + .early_channel_count = f1x_early_channel_count,
> + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
> + .dbam_to_cs = f16_dbam_to_chip_select,
> + .prep_chip_select = f15m30_prep_chip_selects,
> + .display_misc_regs = __dump_misc_regs,
> + .populate_csrows = init_csrows,
> +};
> +
> +static const struct low_ops f16_x_ops = {

Why is this "f16_x" rather than "f15m60"?

> + .early_channel_count = f1x_early_channel_count,
> + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
> + .dbam_to_cs = f15_m60h_dbam_to_chip_select,
> + .prep_chip_select = fmisc_prep_chip_selects,
> + .display_misc_regs = __dump_misc_regs,
> + .populate_csrows = init_csrows,
> +};
> +
> +static const struct low_ops f16_ops = {
> + .early_channel_count = f1x_early_channel_count,
> + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
> + .dbam_to_cs = f16_dbam_to_chip_select,
> + .prep_chip_select = fmisc_prep_chip_selects,
> + .display_misc_regs = __dump_misc_regs,
> + .populate_csrows = init_csrows,
> +};
> +
> +static const struct low_ops f17_ops = {
> + .early_channel_count = f17_early_channel_count,
> + .dbam_to_cs = f17_addr_mask_to_cs_size,
> + .prep_chip_select = f17_prep_chip_selects,
> + .get_base_mask = read_umc_base_mask,
> + .display_misc_regs = __dump_misc_regs_df,
> + .get_mc_regs = __read_mc_regs_df,
> + .populate_csrows = init_csrows_df,
> + .get_umc_err_info = find_umc_channel,
> +};
> +
> static struct amd64_family_type family_types[] = {
> [K8_CPUS] = {
> .ctl_name = "K8",
> .f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP,
> .f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL,
> .max_mcs = 2,
> - .ops = {
> - .early_channel_count = k8_early_channel_count,
> - .map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow,
> - .dbam_to_cs = k8_dbam_to_chip_select,
> - }
> + .ops = k8_ops,
> },
> [F10_CPUS] = {
> .ctl_name = "F10h",
> .f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP,
> .f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM,
> .max_mcs = 2,
> - .ops = {
> - .early_channel_count = f1x_early_channel_count,
> - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
> - .dbam_to_cs = f10_dbam_to_chip_select,
> - }
> + .ops = f10_ops,
> },
> [F15_CPUS] = {
> .ctl_name = "F15h",
> .f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1,
> .f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2,
> .max_mcs = 2,
> - .ops = {
> - .early_channel_count = f1x_early_channel_count,
> - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
> - .dbam_to_cs = f15_dbam_to_chip_select,
> - }
> + .ops = f15_ops,
> },
> [F15_M30H_CPUS] = {
> .ctl_name = "F15h_M30h",
> .f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1,
> .f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2,
> .max_mcs = 2,
> - .ops = {
> - .early_channel_count = f1x_early_channel_count,
> - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
> - .dbam_to_cs = f16_dbam_to_chip_select,
> - }
> + .ops = f15m30_ops,
> },
> [F15_M60H_CPUS] = {
> .ctl_name = "F15h_M60h",
> .f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1,
> .f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2,
> .max_mcs = 2,
> - .ops = {
> - .early_channel_count = f1x_early_channel_count,
> - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
> - .dbam_to_cs = f15_m60h_dbam_to_chip_select,
> - }
> + .ops = f16_x_ops,
> },
> [F16_CPUS] = {
> .ctl_name = "F16h",
> .f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1,
> .f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2,
> .max_mcs = 2,
> - .ops = {
> - .early_channel_count = f1x_early_channel_count,
> - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
> - .dbam_to_cs = f16_dbam_to_chip_select,
> - }
> + .ops = f16_ops,
> },
> [F16_M30H_CPUS] = {
> .ctl_name = "F16h_M30h",
> .f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1,
> .f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2,
> .max_mcs = 2,
> - .ops = {
> - .early_channel_count = f1x_early_channel_count,
> - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
> - .dbam_to_cs = f16_dbam_to_chip_select,
> - }
> + .ops = f16_ops,
> },
> [F17_CPUS] = {
> .ctl_name = "F17h",
> .f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0,
> .f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6,
> .max_mcs = 2,
> - .ops = {
> - .early_channel_count = f17_early_channel_count,
> - .dbam_to_cs = f17_addr_mask_to_cs_size,
> - }
> + .ops = f17_ops,
> },
> [F17_M10H_CPUS] = {
> .ctl_name = "F17h_M10h",
> .f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0,
> .f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6,
> .max_mcs = 2,
> - .ops = {
> - .early_channel_count = f17_early_channel_count,
> - .dbam_to_cs = f17_addr_mask_to_cs_size,
> - }
> + .ops = f17_ops,
> },
> [F17_M30H_CPUS] = {
> .ctl_name = "F17h_M30h",
> .f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0,
> .f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6,
> .max_mcs = 8,
> - .ops = {
> - .early_channel_count = f17_early_channel_count,
> - .dbam_to_cs = f17_addr_mask_to_cs_size,
> - }
> + .ops = f17_ops,
> },
> [F17_M60H_CPUS] = {
> .ctl_name = "F17h_M60h",
> .f0_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F0,
> .f6_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F6,
> .max_mcs = 2,
> - .ops = {
> - .early_channel_count = f17_early_channel_count,
> - .dbam_to_cs = f17_addr_mask_to_cs_size,
> - }
> + .ops = f17_ops,
> },
> [F17_M70H_CPUS] = {
> .ctl_name = "F17h_M70h",
> .f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0,
> .f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6,
> .max_mcs = 2,
> - .ops = {
> - .early_channel_count = f17_early_channel_count,
> - .dbam_to_cs = f17_addr_mask_to_cs_size,
> - }
> + .ops = f17_ops,
> },
> [F19_CPUS] = {
> .ctl_name = "F19h",
> .f0_id = PCI_DEVICE_ID_AMD_19H_DF_F0,
> .f6_id = PCI_DEVICE_ID_AMD_19H_DF_F6,
> .max_mcs = 8,
> - .ops = {
> - .early_channel_count = f17_early_channel_count,
> - .dbam_to_cs = f17_addr_mask_to_cs_size,
> - }
> + .ops = f17_ops,
> },
> };
>
> @@ -2900,9 +2937,10 @@ static inline void decode_bus_error(int node_id, struct mce *m)
> * the instance_id. For example, instance_id=0xYXXXXX where Y is the channel
> * number.
> */
> -static int find_umc_channel(struct mce *m)
> +static void find_umc_channel(struct mce *m, struct err_info *err)

This function now gets more than just the channel. Can this be reflected in
the name?
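
Something like get_umc_err_info(), matching the ops member it gets
assigned to, would work (just a suggestion).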

> {
> - return (m->ipid & GENMASK(31, 0)) >> 20;
> + err->channel = (m->ipid & GENMASK(31, 0)) >> 20;
> + err->csrow = m->synd & 0x7;
> }
>
> static void decode_umc_error(int node_id, struct mce *m)
> @@ -2924,7 +2962,7 @@ static void decode_umc_error(int node_id, struct mce *m)
> if (m->status & MCI_STATUS_DEFERRED)
> ecc_type = 3;
>
> - err.channel = find_umc_channel(m);
> + pvt->ops->get_umc_err_info(m, &err);

Because the "csrow" value is derived from the MCA_SYND value, this function
call should go after checking SYNDV below.
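
i.e. roughly:

	if (!(m->status & MCI_STATUS_SYNDV)) {
		err.err_code = ERR_SYND;
		goto log_error;
	}

	pvt->ops->get_umc_err_info(m, &err);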

>
> if (!(m->status & MCI_STATUS_SYNDV)) {
> err.err_code = ERR_SYND;
> @@ -2940,8 +2978,6 @@ static void decode_umc_error(int node_id, struct mce *m)
> err.err_code = ERR_CHANNEL;
> }
>
> - err.csrow = m->synd & 0x7;
> -
> if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
> err.err_code = ERR_NORM_ADDR;
> goto log_error;
> @@ -3106,8 +3142,9 @@ static void read_mc_regs(struct amd64_pvt *pvt)
> edac_dbg(0, " TOP_MEM2 disabled\n");
> }
>
> - if (pvt->umc) {
> - __read_mc_regs_df(pvt);
> + if (pvt->ops->get_mc_regs) {
> + pvt->ops->get_mc_regs(pvt);
> +

I think this is okay for now. Maybe we can break up this function in a future
patch.

> amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar);
>
> goto skip;
> @@ -3277,9 +3314,6 @@ static int init_csrows(struct mem_ctl_info *mci)
> int nr_pages = 0;
> u32 val;
>
> - if (pvt->umc)
> - return init_csrows_df(mci);
> -
> amd64_read_pci_cfg(pvt->F3, NBCFG, &val);
>
> pvt->nbcfg = val;
> @@ -3703,6 +3737,17 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
> return NULL;
> }
>
> + /* ops required for all the families */
> + if (!pvt->ops->early_channel_count | !pvt->ops->prep_chip_select |
> + !pvt->ops->display_misc_regs | !pvt->ops->dbam_to_cs |
> + !pvt->ops->populate_csrows)
> + return NULL;
> +
> + /* ops required for families 17h and later */
> + if (pvt->fam >= 0x17 && (!pvt->ops->get_base_mask |
> + !pvt->ops->get_umc_err_info | !pvt->ops->get_mc_regs))
> + return NULL;
> +

Can you please add an EDAC debug message for these? I think that'll help track
down any coding bugs.

Also, all the "|" should be "||" right?
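
Something like this, for example (the message strings are only a
suggestion):

	/* ops required for all the families */
	if (!pvt->ops->early_channel_count || !pvt->ops->prep_chip_select ||
	    !pvt->ops->display_misc_regs || !pvt->ops->dbam_to_cs ||
	    !pvt->ops->populate_csrows) {
		edac_dbg(1, "Common helper routines not defined.\n");
		return NULL;
	}

	/* ops required for families 17h and later */
	if (pvt->fam >= 0x17 && (!pvt->ops->get_base_mask ||
				 !pvt->ops->get_umc_err_info ||
				 !pvt->ops->get_mc_regs)) {
		edac_dbg(1, "Platform specific helper routines not defined.\n");
		return NULL;
	}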

> return fam_type;
> }
>
> @@ -3786,7 +3831,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
>
> setup_mci_misc_attrs(mci);
>
> - if (init_csrows(mci))
> + if (pvt->ops->populate_csrows(mci))
> mci->edac_cap = EDAC_FLAG_NONE;
>
> ret = -ENODEV;
> diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
> index 85aa820bc165..ce21b3cf0825 100644
> --- a/drivers/edac/amd64_edac.h
> +++ b/drivers/edac/amd64_edac.h
> @@ -472,6 +472,12 @@ struct low_ops {
> struct err_info *);
> int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
> unsigned cs_mode, int cs_mask_nr);
> + void (*prep_chip_select)(struct amd64_pvt *pvt);
> + void (*get_base_mask)(struct amd64_pvt *pvt);
> + void (*display_misc_regs)(struct amd64_pvt *pvt);
> + void (*get_mc_regs)(struct amd64_pvt *pvt);
> + int (*populate_csrows)(struct mem_ctl_info *mci);
> + void (*get_umc_err_info)(struct mce *m, struct err_info *err);

Can you please align all the parentheses?

> };
>
> struct amd64_family_type {
> --

Thanks,
Yazen

2021-10-21 16:42:42

by Yazen Ghannam

Subject: Re: [PATCH v4 4/4] EDAC/amd64: Enumerate memory on Aldebaran GPU nodes

On Fri, Oct 15, 2021 at 12:24:00AM +0530, Naveen Krishna Chatradhi wrote:
...
> @@ -3726,6 +3935,17 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
> pvt->ops = &family_types[F17_M70H_CPUS].ops;
> fam_type->ctl_name = "F19h_M20h";
> break;
> + } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
> + if (pvt->mc_node_id >= amd_cpu_node_count()) {
> + fam_type = &family_types[ALDEBARAN_GPUS];

The fam_type needs to become part of amd64_pvt.

Otherwise, what happens here is the module loads on a CPU node and sets a CPU
family type. Then a GPU node is probed and the family type is overwritten
with a GPU family type.
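
i.e. add a "struct amd64_family_type *fam_type" member to struct
amd64_pvt, have per_family_init() fill pvt->fam_type instead of the
shared variable, and convert the users. Then a GPU node probe can't
clobber the type chosen for a CPU node.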

> + pvt->ops = &family_types[ALDEBARAN_GPUS].ops;
> + pvt->is_gpu = true;
> + } else {
> + fam_type = &family_types[F19_CPUS];
> + pvt->ops = &family_types[F19_CPUS].ops;
> + fam_type->ctl_name = "F19h_M30h";
> + }
> + break;
> }
> fam_type = &family_types[F19_CPUS];
> pvt->ops = &family_types[F19_CPUS].ops;
> @@ -3808,9 +4028,10 @@ static int init_one_instance(struct amd64_pvt *pvt)

Thanks,
Yazen