2023-01-27 17:06:36

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 00/22] AMD64 EDAC Cleanup and Refactor

Hi Boris,

This set removes a good amount of code that is no longer useful. Also,
this set splits the module initialization more clearly between legacy
and modern systems. The original intention was to prep for adding GPU
support, but I think the split is worthwhile on its own.

Patches 1-4 remove useless code for Family 17h and later systems.

Patch 5 removes code that is no longer needed for all systems.

Patches 6-22 merge, split, and rename various functions and structures
to have clear legacy and modern initialization paths. A future patch
set will add a third "GPU" path.

The basis of the first revision of this set was to make a function
pointer for each legacy/modern split in the code path. This resulted in
16 function pointers. And, in my mind, it was still hard to follow.

The basis of this revision is to diverge early during initialization,
and then follow separate paths. This results in 7 functions pointers
including 2 used only for legacy systems and 1 used only for modern
systems. I think this is much easier to follow as each path has a clear
set of helper functions.

Also, I dropped a lot of the boiler plate commit messages. Most are now
fairly terse though I did try to add extra details in some cases.

Thanks!

-Yazen

Link:
https://lore.kernel.org/r/[email protected]

Muralidhara M K (11):
EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
EDAC/amd64: Split prep_chip_selects() into dct/umc functions
EDAC/amd64: Split read_base_mask() into dct/umc functions
EDAC/amd64: Split determine_memory_type() into dct/umc functions
EDAC/amd64: Split read_mc_regs() into dct/umc functions
EDAC/amd64: Split ecc_enabled() into dct/umc functions
EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
EDAC/amd64: Split determine_edac_cap() into dct/umc functions
EDAC/amd64: Split init_csrows() into dct/umc functions
EDAC/amd64: Split dump_misc_regs() into dct/umc functions
EDAC/amd64: Add get_err_info() to pvt->ops

Yazen Ghannam (11):
EDAC/amd64: Don't set up EDAC PCI control on Family 17h+
EDAC/amd64: Remove scrub rate control for Family 17h and later
EDAC/amd64: Remove PCI Function 6
EDAC/amd64: Remove PCI Function 0
EDAC/amd64: Remove early_channel_count()
EDAC/amd64: Rename debug_display_dimm_sizes()
EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions
EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
EDAC/amd64: Don't find ECC symbol size for Family 17h and later
EDAC/amd64: Rework hw_info_{get,put}
EDAC/amd64: Rename f17h_determine_edac_ctl_cap()

drivers/edac/amd64_edac.c | 1221 ++++++++++++++-----------------------
drivers/edac/amd64_edac.h | 89 +--
2 files changed, 483 insertions(+), 827 deletions(-)

--
2.25.1



2023-01-27 17:06:40

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 06/22] EDAC/amd64: Rename debug_display_dimm_sizes()

Use the "dct" and "umc" prefixes for legacy and modern versions
respectively.

Also, move the "dct" version to avoid a forward declaration, and fixup
some checkpatch warnings in the process.

Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lore.kernel.org/r/[email protected]

v1->v2:
* New in v2.

drivers/edac/amd64_edac.c | 128 +++++++++++++++++++-------------------
1 file changed, 63 insertions(+), 65 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 1c4bef1cdf28..2d0558aeca28 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1291,7 +1291,65 @@ static unsigned long determine_edac_cap(struct amd64_pvt *pvt)
return edac_cap;
}

-static void debug_display_dimm_sizes(struct amd64_pvt *, u8);
+/*
+ * debug routine to display the memory sizes of all logical DIMMs and its
+ * CSROWs
+ */
+static void dct_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
+{
+ u32 *dcsb = ctrl ? pvt->csels[1].csbases : pvt->csels[0].csbases;
+ u32 dbam = ctrl ? pvt->dbam1 : pvt->dbam0;
+ int dimm, size0, size1;
+
+ if (pvt->fam == 0xf) {
+ /* K8 families < revF not supported yet */
+ if (pvt->ext_model < K8_REV_F)
+ return;
+
+ WARN_ON(ctrl != 0);
+ }
+
+ if (pvt->fam == 0x10) {
+ dbam = (ctrl && !dct_ganging_enabled(pvt)) ? pvt->dbam1
+ : pvt->dbam0;
+ dcsb = (ctrl && !dct_ganging_enabled(pvt)) ?
+ pvt->csels[1].csbases :
+ pvt->csels[0].csbases;
+ } else if (ctrl) {
+ dbam = pvt->dbam0;
+ dcsb = pvt->csels[1].csbases;
+ }
+ edac_dbg(1, "F2x%d80 (DRAM Bank Address Mapping): 0x%08x\n",
+ ctrl, dbam);
+
+ edac_printk(KERN_DEBUG, EDAC_MC, "DCT%d chip selects:\n", ctrl);
+
+ /* Dump memory sizes for DIMM and its CSROWs */
+ for (dimm = 0; dimm < 4; dimm++) {
+ size0 = 0;
+ if (dcsb[dimm * 2] & DCSB_CS_ENABLE)
+ /*
+ * For F15m60h, we need multiplier for LRDIMM cs_size
+ * calculation. We pass dimm value to the dbam_to_cs
+ * mapper so we can find the multiplier from the
+ * corresponding DCSM.
+ */
+ size0 = pvt->ops->dbam_to_cs(pvt, ctrl,
+ DBAM_DIMM(dimm, dbam),
+ dimm);
+
+ size1 = 0;
+ if (dcsb[dimm * 2 + 1] & DCSB_CS_ENABLE)
+ size1 = pvt->ops->dbam_to_cs(pvt, ctrl,
+ DBAM_DIMM(dimm, dbam),
+ dimm);
+
+ amd64_info(EDAC_MC ": %d: %5dMB %d: %5dMB\n",
+ dimm * 2, size0,
+ dimm * 2 + 1, size1);
+ }
+}
+

static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan)
{
@@ -1366,7 +1424,7 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
return cs_mode;
}

-static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
+static void umc_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
{
int dimm, size0, size1, cs0, cs1, cs_mode;

@@ -1426,7 +1484,7 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
i, 1 << ((tmp >> 4) & 0x3));
}

- debug_display_dimm_sizes_df(pvt, i);
+ umc_debug_display_dimm_sizes(pvt, i);
}
}

@@ -1451,13 +1509,13 @@ static void __dump_misc_regs(struct amd64_pvt *pvt)
(pvt->fam == 0xf) ? k8_dhar_offset(pvt)
: f10_dhar_offset(pvt));

- debug_display_dimm_sizes(pvt, 0);
+ dct_debug_display_dimm_sizes(pvt, 0);

/* everything below this point is Fam10h and above */
if (pvt->fam == 0xf)
return;

- debug_display_dimm_sizes(pvt, 1);
+ dct_debug_display_dimm_sizes(pvt, 1);

/* Only if NOT ganged does dclr1 have valid info */
if (!dct_ganging_enabled(pvt))
@@ -2681,66 +2739,6 @@ static void f1x_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
err->channel = get_channel_from_ecc_syndrome(mci, err->syndrome);
}

-/*
- * debug routine to display the memory sizes of all logical DIMMs and its
- * CSROWs
- */
-static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
-{
- int dimm, size0, size1;
- u32 *dcsb = ctrl ? pvt->csels[1].csbases : pvt->csels[0].csbases;
- u32 dbam = ctrl ? pvt->dbam1 : pvt->dbam0;
-
- if (pvt->fam == 0xf) {
- /* K8 families < revF not supported yet */
- if (pvt->ext_model < K8_REV_F)
- return;
- else
- WARN_ON(ctrl != 0);
- }
-
- if (pvt->fam == 0x10) {
- dbam = (ctrl && !dct_ganging_enabled(pvt)) ? pvt->dbam1
- : pvt->dbam0;
- dcsb = (ctrl && !dct_ganging_enabled(pvt)) ?
- pvt->csels[1].csbases :
- pvt->csels[0].csbases;
- } else if (ctrl) {
- dbam = pvt->dbam0;
- dcsb = pvt->csels[1].csbases;
- }
- edac_dbg(1, "F2x%d80 (DRAM Bank Address Mapping): 0x%08x\n",
- ctrl, dbam);
-
- edac_printk(KERN_DEBUG, EDAC_MC, "DCT%d chip selects:\n", ctrl);
-
- /* Dump memory sizes for DIMM and its CSROWs */
- for (dimm = 0; dimm < 4; dimm++) {
-
- size0 = 0;
- if (dcsb[dimm*2] & DCSB_CS_ENABLE)
- /*
- * For F15m60h, we need multiplier for LRDIMM cs_size
- * calculation. We pass dimm value to the dbam_to_cs
- * mapper so we can find the multiplier from the
- * corresponding DCSM.
- */
- size0 = pvt->ops->dbam_to_cs(pvt, ctrl,
- DBAM_DIMM(dimm, dbam),
- dimm);
-
- size1 = 0;
- if (dcsb[dimm*2 + 1] & DCSB_CS_ENABLE)
- size1 = pvt->ops->dbam_to_cs(pvt, ctrl,
- DBAM_DIMM(dimm, dbam),
- dimm);
-
- amd64_info(EDAC_MC ": %d: %5dMB %d: %5dMB\n",
- dimm * 2, size0,
- dimm * 2 + 1, size1);
- }
-}
-
static struct amd64_family_type family_types[] = {
[K8_CPUS] = {
.ctl_name = "K8",
--
2.25.1


2023-01-27 17:06:42

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 05/22] EDAC/amd64: Remove early_channel_count()

The early_channel_count() function seems to have been useful in the past
for knowing how many EDAC mci structures to populate. However, this is no
longer needed as the maximum channel count for a system is used instead.

Remove the early_channel_count() helper functions and related code. Use the
size of the channel layer when iterating over channel structures.

Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lore.kernel.org/r/[email protected]

v1->v2:
* New in v2.

drivers/edac/amd64_edac.c | 116 +-------------------------------------
drivers/edac/amd64_edac.h | 2 -
2 files changed, 2 insertions(+), 116 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 352cbcda53f9..1c4bef1cdf28 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1703,24 +1703,6 @@ static void determine_memory_type(struct amd64_pvt *pvt)
pvt->dram_type = (pvt->dclr0 & BIT(16)) ? MEM_DDR3 : MEM_RDDR3;
}

-/* Get the number of DCT channels the memory controller is using. */
-static int k8_early_channel_count(struct amd64_pvt *pvt)
-{
- int flag;
-
- if (pvt->ext_model >= K8_REV_F)
- /* RevF (NPT) and later */
- flag = pvt->dclr0 & WIDTH_128;
- else
- /* RevE and earlier */
- flag = pvt->dclr0 & REVE_WIDTH_128;
-
- /* not used */
- pvt->dclr1 = 0;
-
- return (flag) ? 2 : 1;
-}
-
/* On F10h and later ErrAddr is MC4_ADDR[47:1] */
static u64 get_error_address(struct amd64_pvt *pvt, struct mce *m)
{
@@ -1972,69 +1954,6 @@ static int k8_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct,
}
}

-/*
- * Get the number of DCT channels in use.
- *
- * Return:
- * number of Memory Channels in operation
- * Pass back:
- * contents of the DCL0_LOW register
- */
-static int f1x_early_channel_count(struct amd64_pvt *pvt)
-{
- int i, j, channels = 0;
-
- /* On F10h, if we are in 128 bit mode, then we are using 2 channels */
- if (pvt->fam == 0x10 && (pvt->dclr0 & WIDTH_128))
- return 2;
-
- /*
- * Need to check if in unganged mode: In such, there are 2 channels,
- * but they are not in 128 bit mode and thus the above 'dclr0' status
- * bit will be OFF.
- *
- * Need to check DCT0[0] and DCT1[0] to see if only one of them has
- * their CSEnable bit on. If so, then SINGLE DIMM case.
- */
- edac_dbg(0, "Data width is not 128 bits - need more decoding\n");
-
- /*
- * Check DRAM Bank Address Mapping values for each DIMM to see if there
- * is more than just one DIMM present in unganged mode. Need to check
- * both controllers since DIMMs can be placed in either one.
- */
- for (i = 0; i < 2; i++) {
- u32 dbam = (i ? pvt->dbam1 : pvt->dbam0);
-
- for (j = 0; j < 4; j++) {
- if (DBAM_DIMM(j, dbam) > 0) {
- channels++;
- break;
- }
- }
- }
-
- if (channels > 2)
- channels = 2;
-
- amd64_info("MCT channel count: %d\n", channels);
-
- return channels;
-}
-
-static int f17_early_channel_count(struct amd64_pvt *pvt)
-{
- int i, channels = 0;
-
- /* SDP Control bit 31 (SdpInit) is clear for unused UMC channels */
- for_each_umc(i)
- channels += !!(pvt->umc[i].sdp_ctrl & UMC_SDP_INIT);
-
- amd64_info("MCT channel count: %d\n", channels);
-
- return channels;
-}
-
static int ddr3_cs_size(unsigned i, bool dct_width)
{
unsigned shift = 0;
@@ -2829,7 +2748,6 @@ static struct amd64_family_type family_types[] = {
.f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL,
.max_mcs = 2,
.ops = {
- .early_channel_count = k8_early_channel_count,
.map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow,
.dbam_to_cs = k8_dbam_to_chip_select,
}
@@ -2840,7 +2758,6 @@ static struct amd64_family_type family_types[] = {
.f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM,
.max_mcs = 2,
.ops = {
- .early_channel_count = f1x_early_channel_count,
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
.dbam_to_cs = f10_dbam_to_chip_select,
}
@@ -2851,7 +2768,6 @@ static struct amd64_family_type family_types[] = {
.f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2,
.max_mcs = 2,
.ops = {
- .early_channel_count = f1x_early_channel_count,
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
.dbam_to_cs = f15_dbam_to_chip_select,
}
@@ -2862,7 +2778,6 @@ static struct amd64_family_type family_types[] = {
.f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2,
.max_mcs = 2,
.ops = {
- .early_channel_count = f1x_early_channel_count,
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
.dbam_to_cs = f16_dbam_to_chip_select,
}
@@ -2873,7 +2788,6 @@ static struct amd64_family_type family_types[] = {
.f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2,
.max_mcs = 2,
.ops = {
- .early_channel_count = f1x_early_channel_count,
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
.dbam_to_cs = f15_m60h_dbam_to_chip_select,
}
@@ -2884,7 +2798,6 @@ static struct amd64_family_type family_types[] = {
.f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2,
.max_mcs = 2,
.ops = {
- .early_channel_count = f1x_early_channel_count,
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
.dbam_to_cs = f16_dbam_to_chip_select,
}
@@ -2895,7 +2808,6 @@ static struct amd64_family_type family_types[] = {
.f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2,
.max_mcs = 2,
.ops = {
- .early_channel_count = f1x_early_channel_count,
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
.dbam_to_cs = f16_dbam_to_chip_select,
}
@@ -2904,7 +2816,6 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F17h",
.max_mcs = 2,
.ops = {
- .early_channel_count = f17_early_channel_count,
.dbam_to_cs = f17_addr_mask_to_cs_size,
}
},
@@ -2912,7 +2823,6 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F17h_M10h",
.max_mcs = 2,
.ops = {
- .early_channel_count = f17_early_channel_count,
.dbam_to_cs = f17_addr_mask_to_cs_size,
}
},
@@ -2920,7 +2830,6 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F17h_M30h",
.max_mcs = 8,
.ops = {
- .early_channel_count = f17_early_channel_count,
.dbam_to_cs = f17_addr_mask_to_cs_size,
}
},
@@ -2928,7 +2837,6 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F17h_M60h",
.max_mcs = 2,
.ops = {
- .early_channel_count = f17_early_channel_count,
.dbam_to_cs = f17_addr_mask_to_cs_size,
}
},
@@ -2936,7 +2844,6 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F17h_M70h",
.max_mcs = 2,
.ops = {
- .early_channel_count = f17_early_channel_count,
.dbam_to_cs = f17_addr_mask_to_cs_size,
}
},
@@ -2944,7 +2851,6 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F19h",
.max_mcs = 8,
.ops = {
- .early_channel_count = f17_early_channel_count,
.dbam_to_cs = f17_addr_mask_to_cs_size,
}
},
@@ -2953,7 +2859,6 @@ static struct amd64_family_type family_types[] = {
.max_mcs = 12,
.flags.zn_regs_v2 = 1,
.ops = {
- .early_channel_count = f17_early_channel_count,
.dbam_to_cs = f17_addr_mask_to_cs_size,
}
},
@@ -2961,7 +2866,6 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F19h_M50h",
.max_mcs = 2,
.ops = {
- .early_channel_count = f17_early_channel_count,
.dbam_to_cs = f17_addr_mask_to_cs_size,
}
},
@@ -3620,7 +3524,7 @@ static int init_csrows(struct mem_ctl_info *mci)
: EDAC_SECDED;
}

- for (j = 0; j < pvt->channel_count; j++) {
+ for (j = 0; j < fam_type->max_mcs; j++) {
dimm = csrow->channels[j]->dimm;
dimm->mtype = pvt->dram_type;
dimm->edac_mode = edac_mode;
@@ -4057,28 +3961,12 @@ static int init_one_instance(struct amd64_pvt *pvt)
{
struct mem_ctl_info *mci = NULL;
struct edac_mc_layer layers[2];
- int ret = -EINVAL;
+ int ret = -ENOMEM;

- /*
- * We need to determine how many memory channels there are. Then use
- * that information for calculating the size of the dynamic instance
- * tables in the 'mci' structure.
- */
- pvt->channel_count = pvt->ops->early_channel_count(pvt);
- if (pvt->channel_count < 0)
- return ret;
-
- ret = -ENOMEM;
layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
layers[0].size = pvt->csels[0].b_cnt;
layers[0].is_virt_csrow = true;
layers[1].type = EDAC_MC_LAYER_CHANNEL;
-
- /*
- * Always allocate two channels since we can have setups with DIMMs on
- * only one channel. Also, this simplifies handling later for the price
- * of a couple of KBs tops.
- */
layers[1].size = fam_type->max_mcs;
layers[1].is_virt_csrow = false;

diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 398fb58dacbf..e4329dff8cf2 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -346,7 +346,6 @@ struct amd64_pvt {
u8 stepping; /* ... stepping */

int ext_model; /* extended model value of this node */
- int channel_count;

/* Raw registers */
u32 dclr0; /* DRAM Configuration Low DCT0 reg */
@@ -466,7 +465,6 @@ struct ecc_settings {
* functions and per device encoding/decoding logic.
*/
struct low_ops {
- int (*early_channel_count) (struct amd64_pvt *pvt);
void (*map_sysaddr_to_csrow) (struct mem_ctl_info *mci, u64 sys_addr,
struct err_info *);
int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
--
2.25.1


2023-01-27 17:06:44

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 08/22] EDAC/amd64: Drop dbam_to_cs() for Family 17h and later

The same function is used to calculate chip select size for all Zen-based
family/models. Therefore, a family/model function pointer is not necessary.

Drop the dbam_to_cs() function pointer for Family 17h and later systems.
Also, move the Family 17h function to avoid a forward declaration. Rename
it to indicate that the UMC Address Mask is used rather than the legacy
DBAM value.

Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lore.kernel.org/r/[email protected]

v1->v2:
* New in v2.

drivers/edac/amd64_edac.c | 186 +++++++++++++++++---------------------
1 file changed, 81 insertions(+), 105 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 5559d05fb15d..e13fe400bad5 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1424,6 +1424,84 @@ static int umc_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
return cs_mode;
}

+static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+ unsigned int cs_mode, int csrow_nr)
+{
+ u32 addr_mask_orig, addr_mask_deinterleaved;
+ u32 msb, weight, num_zero_bits;
+ int cs_mask_nr = csrow_nr;
+ int dimm, size = 0;
+
+ /* No Chip Selects are enabled. */
+ if (!cs_mode)
+ return size;
+
+ /* Requested size of an even CS but none are enabled. */
+ if (!(cs_mode & CS_EVEN) && !(csrow_nr & 1))
+ return size;
+
+ /* Requested size of an odd CS but none are enabled. */
+ if (!(cs_mode & CS_ODD) && (csrow_nr & 1))
+ return size;
+
+ /*
+ * Family 17h introduced systems with one mask per DIMM,
+ * and two Chip Selects per DIMM.
+ *
+ * CS0 and CS1 -> MASK0 / DIMM0
+ * CS2 and CS3 -> MASK1 / DIMM1
+ *
+ * Family 19h Model 10h introduced systems with one mask per Chip Select,
+ * and two Chip Selects per DIMM.
+ *
+ * CS0 -> MASK0 -> DIMM0
+ * CS1 -> MASK1 -> DIMM0
+ * CS2 -> MASK2 -> DIMM1
+ * CS3 -> MASK3 -> DIMM1
+ *
+ * Keep the mask number equal to the Chip Select number for newer systems,
+ * and shift the mask number for older systems.
+ */
+ dimm = csrow_nr >> 1;
+
+ if (!fam_type->flags.zn_regs_v2)
+ cs_mask_nr >>= 1;
+
+ /* Asymmetric dual-rank DIMM support. */
+ if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY))
+ addr_mask_orig = pvt->csels[umc].csmasks_sec[cs_mask_nr];
+ else
+ addr_mask_orig = pvt->csels[umc].csmasks[cs_mask_nr];
+
+ /*
+ * The number of zero bits in the mask is equal to the number of bits
+ * in a full mask minus the number of bits in the current mask.
+ *
+ * The MSB is the number of bits in the full mask because BIT[0] is
+ * always 0.
+ *
+ * In the special 3 Rank interleaving case, a single bit is flipped
+ * without swapping with the most significant bit. This can be handled
+ * by keeping the MSB where it is and ignoring the single zero bit.
+ */
+ msb = fls(addr_mask_orig) - 1;
+ weight = hweight_long(addr_mask_orig);
+ num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE);
+
+ /* Take the number of zero bits off from the top of the mask. */
+ addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1);
+
+ edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
+ edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig);
+ edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
+
+ /* Register [31:1] = Address [39:9]. Size is in kBs here. */
+ size = (addr_mask_deinterleaved >> 2) + 1;
+
+ /* Return size in MBs. */
+ return size >> 10;
+}
+
static void umc_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
{
int dimm, size0, size1, cs0, cs1, cs_mode;
@@ -1436,8 +1514,8 @@ static void umc_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)

cs_mode = umc_get_cs_mode(dimm, ctrl, pvt);

- size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
- size1 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs1);
+ size0 = umc_addr_mask_to_cs_size(pvt, ctrl, cs_mode, cs0);
+ size1 = umc_addr_mask_to_cs_size(pvt, ctrl, cs_mode, cs1);

amd64_info(EDAC_MC ": %d: %5dMB %d: %5dMB\n",
cs0, size0,
@@ -2139,84 +2217,6 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct,
return ddr3_cs_size(cs_mode, false);
}

-static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
- unsigned int cs_mode, int csrow_nr)
-{
- u32 addr_mask_orig, addr_mask_deinterleaved;
- u32 msb, weight, num_zero_bits;
- int cs_mask_nr = csrow_nr;
- int dimm, size = 0;
-
- /* No Chip Selects are enabled. */
- if (!cs_mode)
- return size;
-
- /* Requested size of an even CS but none are enabled. */
- if (!(cs_mode & CS_EVEN) && !(csrow_nr & 1))
- return size;
-
- /* Requested size of an odd CS but none are enabled. */
- if (!(cs_mode & CS_ODD) && (csrow_nr & 1))
- return size;
-
- /*
- * Family 17h introduced systems with one mask per DIMM,
- * and two Chip Selects per DIMM.
- *
- * CS0 and CS1 -> MASK0 / DIMM0
- * CS2 and CS3 -> MASK1 / DIMM1
- *
- * Family 19h Model 10h introduced systems with one mask per Chip Select,
- * and two Chip Selects per DIMM.
- *
- * CS0 -> MASK0 -> DIMM0
- * CS1 -> MASK1 -> DIMM0
- * CS2 -> MASK2 -> DIMM1
- * CS3 -> MASK3 -> DIMM1
- *
- * Keep the mask number equal to the Chip Select number for newer systems,
- * and shift the mask number for older systems.
- */
- dimm = csrow_nr >> 1;
-
- if (!fam_type->flags.zn_regs_v2)
- cs_mask_nr >>= 1;
-
- /* Asymmetric dual-rank DIMM support. */
- if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY))
- addr_mask_orig = pvt->csels[umc].csmasks_sec[cs_mask_nr];
- else
- addr_mask_orig = pvt->csels[umc].csmasks[cs_mask_nr];
-
- /*
- * The number of zero bits in the mask is equal to the number of bits
- * in a full mask minus the number of bits in the current mask.
- *
- * The MSB is the number of bits in the full mask because BIT[0] is
- * always 0.
- *
- * In the special 3 Rank interleaving case, a single bit is flipped
- * without swapping with the most significant bit. This can be handled
- * by keeping the MSB where it is and ignoring the single zero bit.
- */
- msb = fls(addr_mask_orig) - 1;
- weight = hweight_long(addr_mask_orig);
- num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE);
-
- /* Take the number of zero bits off from the top of the mask. */
- addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1);
-
- edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
- edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig);
- edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
-
- /* Register [31:1] = Address [39:9]. Size is in kBs here. */
- size = (addr_mask_deinterleaved >> 2) + 1;
-
- /* Return size in MBs. */
- return size >> 10;
-}
-
static void read_dram_ctl_register(struct amd64_pvt *pvt)
{

@@ -2813,59 +2813,35 @@ static struct amd64_family_type family_types[] = {
[F17_CPUS] = {
.ctl_name = "F17h",
.max_mcs = 2,
- .ops = {
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
},
[F17_M10H_CPUS] = {
.ctl_name = "F17h_M10h",
.max_mcs = 2,
- .ops = {
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
},
[F17_M30H_CPUS] = {
.ctl_name = "F17h_M30h",
.max_mcs = 8,
- .ops = {
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
},
[F17_M60H_CPUS] = {
.ctl_name = "F17h_M60h",
.max_mcs = 2,
- .ops = {
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
},
[F17_M70H_CPUS] = {
.ctl_name = "F17h_M70h",
.max_mcs = 2,
- .ops = {
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
},
[F19_CPUS] = {
.ctl_name = "F19h",
.max_mcs = 8,
- .ops = {
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
},
[F19_M10H_CPUS] = {
.ctl_name = "F19h_M10h",
.max_mcs = 12,
.flags.zn_regs_v2 = 1,
- .ops = {
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
},
[F19_M50H_CPUS] = {
.ctl_name = "F19h_M50h",
.max_mcs = 2,
- .ops = {
- .dbam_to_cs = f17_addr_mask_to_cs_size,
- }
},
};

@@ -3414,7 +3390,7 @@ static u32 umc_get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_or

cs_mode = umc_get_cs_mode(csrow_nr >> 1, dct, pvt);

- nr_pages = pvt->ops->dbam_to_cs(pvt, dct, cs_mode, csrow_nr);
+ nr_pages = umc_addr_mask_to_cs_size(pvt, dct, cs_mode, csrow_nr);
nr_pages <<= 20 - PAGE_SHIFT;

edac_dbg(0, "csrow: %d, channel: %d, cs_mode %d\n",
--
2.25.1


2023-01-27 17:06:48

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 10/22] EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt

From: Muralidhara M K <[email protected]>

Future AMD systems will support heterogeneous "AMD Node" types, e.g.
CPU+GPU. Therefore, a global "family type" shared across all "AMD Nodes"
is no longer appropriate.

Move struct low_ops routines and members of struct amd64_family_type
to struct amd64_pvt.

Currently, there are many code branches that split between "modern" and
"legacy" systems. Another code branch will be needed in order to cover
"GPU" cases. However, rather than introduce another branching case in
multiple functions, the current branching code should be switched to a
set of function pointers. This change makes the code more readable and
simplifies adding support for new families/models.

In order to reuse code, define two sets of function pointers. Use one
for modern systems (Family 17h and later). This will not change between
current CPU families. Use another set of function pointers for legacy
systems (before Family 17h). Use the Family 16h versions as default
for the legacy ops since these are the latest, and adjust the function
pointers as needed for older families.

Signed-off-by: Muralidhara M K <[email protected]>
Co-developed-by: Naveen Krishna Chatradhi <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
[Rebased/reworked patch and reworded commit message]
Co-developed-by: Yazen Ghannam <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lore.kernel.org/r/[email protected]

v1->v2:
* Same a v1 patch 5, but rebased on new patches.

drivers/edac/amd64_edac.c | 308 ++++++++++++++------------------------
drivers/edac/amd64_edac.h | 62 +++-----
2 files changed, 133 insertions(+), 237 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 1d5c2c97d563..a9b20bc413af 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -13,11 +13,9 @@ module_param(ecc_enable_override, int, 0644);

static struct msr __percpu *msrs;

-static struct amd64_family_type *fam_type;
-
-static inline u32 get_umc_reg(u32 reg)
+static inline u32 get_umc_reg(struct amd64_pvt *pvt, u32 reg)
{
- if (!fam_type->flags.zn_regs_v2)
+ if (!pvt->flags.zn_regs_v2)
return reg;

switch (reg) {
@@ -437,7 +435,7 @@ static void get_cs_base_and_mask(struct amd64_pvt *pvt, int csrow, u8 dct,
for (i = 0; i < pvt->csels[dct].m_cnt; i++)

#define for_each_umc(i) \
- for (i = 0; i < fam_type->max_mcs; i++)
+ for (i = 0; i < pvt->max_mcs; i++)

/*
* @input_addr is an InputAddr associated with the node given by mci. Return the
@@ -1464,7 +1462,7 @@ static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
*/
dimm = csrow_nr >> 1;

- if (!fam_type->flags.zn_regs_v2)
+ if (!pvt->flags.zn_regs_v2)
cs_mask_nr >>= 1;

/* Asymmetric dual-rank DIMM support. */
@@ -1556,7 +1554,7 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)

if (umc->dram_type == MEM_LRDDR4 || umc->dram_type == MEM_LRDDR5) {
amd_smn_read(pvt->mc_node_id,
- umc_base + get_umc_reg(UMCCH_ADDR_CFG),
+ umc_base + get_umc_reg(pvt, UMCCH_ADDR_CFG),
&tmp);
edac_dbg(1, "UMC%d LRDIMM %dx rank multiply\n",
i, 1 << ((tmp >> 4) & 0x3));
@@ -1629,7 +1627,7 @@ static void prep_chip_selects(struct amd64_pvt *pvt)

for_each_umc(umc) {
pvt->csels[umc].b_cnt = 4;
- pvt->csels[umc].m_cnt = fam_type->flags.zn_regs_v2 ? 4 : 2;
+ pvt->csels[umc].m_cnt = pvt->flags.zn_regs_v2 ? 4 : 2;
}

} else {
@@ -1669,7 +1667,7 @@ static void read_umc_base_mask(struct amd64_pvt *pvt)
}

umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
- umc_mask_reg_sec = get_umc_base(umc) + get_umc_reg(UMCCH_ADDR_MASK_SEC);
+ umc_mask_reg_sec = get_umc_base(umc) + get_umc_reg(pvt, UMCCH_ADDR_MASK_SEC);

for_each_chip_select_mask(cs, umc, pvt) {
mask = &pvt->csels[umc].csmasks[cs];
@@ -1757,7 +1755,7 @@ static void determine_memory_type_df(struct amd64_pvt *pvt)
* Check if the system supports the "DDR Type" field in UMC Config
* and has DDR5 DIMMs in use.
*/
- if (fam_type->flags.zn_regs_v2 && ((umc->umc_cfg & GENMASK(2, 0)) == 0x1)) {
+ if (pvt->flags.zn_regs_v2 && ((umc->umc_cfg & GENMASK(2, 0)) == 0x1)) {
if (umc->dimm_cfg & BIT(5))
umc->dram_type = MEM_LRDDR5;
else if (umc->dimm_cfg & BIT(4))
@@ -2739,112 +2737,6 @@ static void f1x_map_sysaddr_to_csrow(struct mem_ctl_info *mci, u64 sys_addr,
err->channel = get_channel_from_ecc_syndrome(mci, err->syndrome);
}

-static struct amd64_family_type family_types[] = {
- [K8_CPUS] = {
- .ctl_name = "K8",
- .f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP,
- .f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL,
- .max_mcs = 2,
- .ops = {
- .map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow,
- .dbam_to_cs = k8_dbam_to_chip_select,
- }
- },
- [F10_CPUS] = {
- .ctl_name = "F10h",
- .f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP,
- .f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM,
- .max_mcs = 2,
- .ops = {
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f10_dbam_to_chip_select,
- }
- },
- [F15_CPUS] = {
- .ctl_name = "F15h",
- .f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1,
- .f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2,
- .max_mcs = 2,
- .ops = {
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f15_dbam_to_chip_select,
- }
- },
- [F15_M30H_CPUS] = {
- .ctl_name = "F15h_M30h",
- .f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1,
- .f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2,
- .max_mcs = 2,
- .ops = {
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f16_dbam_to_chip_select,
- }
- },
- [F15_M60H_CPUS] = {
- .ctl_name = "F15h_M60h",
- .f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1,
- .f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2,
- .max_mcs = 2,
- .ops = {
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f15_m60h_dbam_to_chip_select,
- }
- },
- [F16_CPUS] = {
- .ctl_name = "F16h",
- .f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1,
- .f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2,
- .max_mcs = 2,
- .ops = {
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f16_dbam_to_chip_select,
- }
- },
- [F16_M30H_CPUS] = {
- .ctl_name = "F16h_M30h",
- .f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1,
- .f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2,
- .max_mcs = 2,
- .ops = {
- .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
- .dbam_to_cs = f16_dbam_to_chip_select,
- }
- },
- [F17_CPUS] = {
- .ctl_name = "F17h",
- .max_mcs = 2,
- },
- [F17_M10H_CPUS] = {
- .ctl_name = "F17h_M10h",
- .max_mcs = 2,
- },
- [F17_M30H_CPUS] = {
- .ctl_name = "F17h_M30h",
- .max_mcs = 8,
- },
- [F17_M60H_CPUS] = {
- .ctl_name = "F17h_M60h",
- .max_mcs = 2,
- },
- [F17_M70H_CPUS] = {
- .ctl_name = "F17h_M70h",
- .max_mcs = 2,
- },
- [F19_CPUS] = {
- .ctl_name = "F19h",
- .max_mcs = 8,
- },
- [F19_M10H_CPUS] = {
- .ctl_name = "F19h_M10h",
- .max_mcs = 12,
- .flags.zn_regs_v2 = 1,
- },
- [F19_M50H_CPUS] = {
- .ctl_name = "F19h_M50h",
- .max_mcs = 2,
- },
-};
-
/*
* These are tables of eigenvectors (one per line) which can be used for the
* construction of the syndrome tables. The modified syndrome search algorithm
@@ -3226,7 +3118,7 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
umc_base = get_umc_base(i);
umc = &pvt->umc[i];

- amd_smn_read(nid, umc_base + get_umc_reg(UMCCH_DIMM_CFG), &umc->dimm_cfg);
+ amd_smn_read(nid, umc_base + get_umc_reg(pvt, UMCCH_DIMM_CFG), &umc->dimm_cfg);
amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
@@ -3495,7 +3387,7 @@ static int init_csrows(struct mem_ctl_info *mci)
: EDAC_SECDED;
}

- for (j = 0; j < fam_type->max_mcs; j++) {
+ for (j = 0; j < pvt->max_mcs; j++) {
dimm = csrow->channels[j]->dimm;
dimm->mtype = pvt->dram_type;
dimm->edac_mode = edac_mode;
@@ -3767,7 +3659,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)

mci->edac_cap = determine_edac_cap(pvt);
mci->mod_name = EDAC_MOD_STR;
- mci->ctl_name = fam_type->ctl_name;
+ mci->ctl_name = pvt->ctl_name;
mci->dev_name = pci_name(pvt->F3);
mci->ctl_page_to_phys = NULL;

@@ -3779,114 +3671,145 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
mci->get_sdram_scrub_rate = get_scrub_rate;
}

-/*
- * returns a pointer to the family descriptor on success, NULL otherwise.
- */
-static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
+static struct low_ops umc_ops = {
+};
+
+/* Use Family 16h versions for defaults and adjust as needed below. */
+static struct low_ops dct_ops = {
+ .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
+ .dbam_to_cs = f16_dbam_to_chip_select,
+};
+
+static int per_family_init(struct amd64_pvt *pvt)
{
pvt->ext_model = boot_cpu_data.x86_model >> 4;
pvt->stepping = boot_cpu_data.x86_stepping;
pvt->model = boot_cpu_data.x86_model;
pvt->fam = boot_cpu_data.x86;
+ pvt->max_mcs = 2;
+
+ /*
+ * Decide on which ops group to use here and do any family/model
+ * overrides below.
+ */
+ if (pvt->fam >= 0x17)
+ pvt->ops = &umc_ops;
+ else
+ pvt->ops = &dct_ops;

switch (pvt->fam) {
case 0xf:
- fam_type = &family_types[K8_CPUS];
- pvt->ops = &family_types[K8_CPUS].ops;
+ pvt->ctl_name = (pvt->ext_model > K8_REV_F) ?
+ "K8 revF or later" : "K8 revE or earlier";
+ pvt->f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP;
+ pvt->f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL;
+ pvt->ops->map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow;
+ pvt->ops->dbam_to_cs = k8_dbam_to_chip_select;
break;

case 0x10:
- fam_type = &family_types[F10_CPUS];
- pvt->ops = &family_types[F10_CPUS].ops;
+ pvt->ctl_name = "F10h";
+ pvt->f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP;
+ pvt->f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM;
+ pvt->ops->dbam_to_cs = f10_dbam_to_chip_select;
break;

case 0x15:
- if (pvt->model == 0x30) {
- fam_type = &family_types[F15_M30H_CPUS];
- pvt->ops = &family_types[F15_M30H_CPUS].ops;
+ switch (pvt->model) {
+ case 0x30:
+ pvt->ctl_name = "F15h_M30h";
+ pvt->f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1;
+ pvt->f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2;
break;
- } else if (pvt->model == 0x60) {
- fam_type = &family_types[F15_M60H_CPUS];
- pvt->ops = &family_types[F15_M60H_CPUS].ops;
+ case 0x60:
+ pvt->ctl_name = "F15h_M60h";
+ pvt->f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1;
+ pvt->f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2;
+ pvt->ops->dbam_to_cs = f15_m60h_dbam_to_chip_select;
+ break;
+ case 0x13:
+ /* Richland is only client */
+ return -ENODEV;
+ default:
+ pvt->ctl_name = "F15h";
+ pvt->f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1;
+ pvt->f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2;
+ pvt->ops->dbam_to_cs = f15_dbam_to_chip_select;
break;
- /* Richland is only client */
- } else if (pvt->model == 0x13) {
- return NULL;
- } else {
- fam_type = &family_types[F15_CPUS];
- pvt->ops = &family_types[F15_CPUS].ops;
}
break;

case 0x16:
- if (pvt->model == 0x30) {
- fam_type = &family_types[F16_M30H_CPUS];
- pvt->ops = &family_types[F16_M30H_CPUS].ops;
+ switch (pvt->model) {
+ case 0x30:
+ pvt->ctl_name = "F16h_M30h";
+ pvt->f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1;
+ pvt->f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2;
+ break;
+ default:
+ pvt->ctl_name = "F16h";
+ pvt->f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1;
+ pvt->f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2;
break;
}
- fam_type = &family_types[F16_CPUS];
- pvt->ops = &family_types[F16_CPUS].ops;
break;

case 0x17:
- if (pvt->model >= 0x10 && pvt->model <= 0x2f) {
- fam_type = &family_types[F17_M10H_CPUS];
- pvt->ops = &family_types[F17_M10H_CPUS].ops;
+ switch (pvt->model) {
+ case 0x10 ... 0x2f:
+ pvt->ctl_name = "F17h_M10h";
break;
- } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
- fam_type = &family_types[F17_M30H_CPUS];
- pvt->ops = &family_types[F17_M30H_CPUS].ops;
+ case 0x30 ... 0x3f:
+ pvt->ctl_name = "F17h_M30h";
+ pvt->max_mcs = 8;
break;
- } else if (pvt->model >= 0x60 && pvt->model <= 0x6f) {
- fam_type = &family_types[F17_M60H_CPUS];
- pvt->ops = &family_types[F17_M60H_CPUS].ops;
+ case 0x60 ... 0x6f:
+ pvt->ctl_name = "F17h_M60h";
break;
- } else if (pvt->model >= 0x70 && pvt->model <= 0x7f) {
- fam_type = &family_types[F17_M70H_CPUS];
- pvt->ops = &family_types[F17_M70H_CPUS].ops;
+ case 0x70 ... 0x7f:
+ pvt->ctl_name = "F17h_M70h";
+ break;
+ default:
+ pvt->ctl_name = "F17h";
break;
}
- fallthrough;
- case 0x18:
- fam_type = &family_types[F17_CPUS];
- pvt->ops = &family_types[F17_CPUS].ops;
+ break;

- if (pvt->fam == 0x18)
- family_types[F17_CPUS].ctl_name = "F18h";
+ case 0x18:
+ pvt->ctl_name = "F18h";
break;

case 0x19:
- if (pvt->model >= 0x10 && pvt->model <= 0x1f) {
- fam_type = &family_types[F19_M10H_CPUS];
- pvt->ops = &family_types[F19_M10H_CPUS].ops;
+ switch (pvt->model) {
+ case 0x00 ... 0x0f:
+ pvt->ctl_name = "F19h";
+ pvt->max_mcs = 8;
break;
- } else if (pvt->model >= 0x20 && pvt->model <= 0x2f) {
- fam_type = &family_types[F17_M70H_CPUS];
- pvt->ops = &family_types[F17_M70H_CPUS].ops;
- fam_type->ctl_name = "F19h_M20h";
+ case 0x10 ... 0x1f:
+ pvt->ctl_name = "F19h_M10h";
+ pvt->max_mcs = 12;
+ pvt->flags.zn_regs_v2 = 1;
break;
- } else if (pvt->model >= 0x50 && pvt->model <= 0x5f) {
- fam_type = &family_types[F19_M50H_CPUS];
- pvt->ops = &family_types[F19_M50H_CPUS].ops;
- fam_type->ctl_name = "F19h_M50h";
+ case 0x20 ... 0x2f:
+ pvt->ctl_name = "F19h_M20h";
break;
- } else if (pvt->model >= 0xa0 && pvt->model <= 0xaf) {
- fam_type = &family_types[F19_M10H_CPUS];
- pvt->ops = &family_types[F19_M10H_CPUS].ops;
- fam_type->ctl_name = "F19h_MA0h";
+ case 0x50 ... 0x5f:
+ pvt->ctl_name = "F19h_M50h";
+ break;
+ case 0xa0 ... 0xaf:
+ pvt->ctl_name = "F19h_MA0h";
+ pvt->max_mcs = 12;
+ pvt->flags.zn_regs_v2 = 1;
break;
}
- fam_type = &family_types[F19_CPUS];
- pvt->ops = &family_types[F19_CPUS].ops;
- family_types[F19_CPUS].ctl_name = "F19h";
break;

default:
amd64_err("Unsupported family!\n");
- return NULL;
+ return -ENODEV;
}

- return fam_type;
+ return 0;
}

static const struct attribute_group *amd64_edac_attr_groups[] = {
@@ -3903,12 +3826,12 @@ static int hw_info_get(struct amd64_pvt *pvt)
int ret;

if (pvt->fam >= 0x17) {
- pvt->umc = kcalloc(fam_type->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL);
+ pvt->umc = kcalloc(pvt->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL);
if (!pvt->umc)
return -ENOMEM;
} else {
- pci_id1 = fam_type->f1_id;
- pci_id2 = fam_type->f2_id;
+ pci_id1 = pvt->f1_id;
+ pci_id2 = pvt->f2_id;
}

ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
@@ -3938,7 +3861,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
layers[0].size = pvt->csels[0].b_cnt;
layers[0].is_virt_csrow = true;
layers[1].type = EDAC_MC_LAYER_CHANNEL;
- layers[1].size = fam_type->max_mcs;
+ layers[1].size = pvt->max_mcs;
layers[1].is_virt_csrow = false;

mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
@@ -3968,7 +3891,7 @@ static bool instance_has_memory(struct amd64_pvt *pvt)
bool cs_enabled = false;
int cs = 0, dct = 0;

- for (dct = 0; dct < fam_type->max_mcs; dct++) {
+ for (dct = 0; dct < pvt->max_mcs; dct++) {
for_each_chip_select(cs, dct, pvt)
cs_enabled |= csrow_enabled(cs, dct, pvt);
}
@@ -3997,9 +3920,8 @@ static int probe_one_instance(unsigned int nid)
pvt->mc_node_id = nid;
pvt->F3 = F3;

- ret = -ENODEV;
- fam_type = per_family_init(pvt);
- if (!fam_type)
+ ret = per_family_init(pvt);
+ if (ret < 0)
goto err_enable;

ret = hw_info_get(pvt);
@@ -4038,11 +3960,7 @@ static int probe_one_instance(unsigned int nid)
goto err_enable;
}

- amd64_info("%s %sdetected (node %d).\n", fam_type->ctl_name,
- (pvt->fam == 0xf ?
- (pvt->ext_model >= K8_REV_F ? "revF or later "
- : "revE or earlier ")
- : ""), pvt->mc_node_id);
+ amd64_info("%s detected (node %d).\n", pvt->ctl_name, pvt->mc_node_id);

dump_misc_regs(pvt);

diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index e4329dff8cf2..8eea15546b9f 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -273,25 +273,6 @@

#define UMC_SDP_INIT BIT(31)

-enum amd_families {
- K8_CPUS = 0,
- F10_CPUS,
- F15_CPUS,
- F15_M30H_CPUS,
- F15_M60H_CPUS,
- F16_CPUS,
- F16_M30H_CPUS,
- F17_CPUS,
- F17_M10H_CPUS,
- F17_M30H_CPUS,
- F17_M60H_CPUS,
- F17_M70H_CPUS,
- F19_CPUS,
- F19_M10H_CPUS,
- F19_M50H_CPUS,
- NUM_FAMILIES,
-};
-
/* Error injection control structure */
struct error_injection {
u32 section;
@@ -334,6 +315,16 @@ struct amd64_umc {
enum mem_type dram_type;
};

+struct amd64_family_flags {
+ /*
+ * Indicates that the system supports the new register offsets, etc.
+ * first introduced with Family 19h Model 10h.
+ */
+ __u64 zn_regs_v2 : 1,
+
+ __reserved : 63;
+};
+
struct amd64_pvt {
struct low_ops *ops;

@@ -375,6 +366,12 @@ struct amd64_pvt {
/* x4, x8, or x16 syndromes in use */
u8 ecc_sym_sz;

+ const char *ctl_name;
+ u16 f1_id, f2_id;
+ /* Maximum number of memory controllers per die/node. */
+ u8 max_mcs;
+
+ struct amd64_family_flags flags;
/* place to store error injection parameters prior to issue */
struct error_injection injection;

@@ -465,29 +462,10 @@ struct ecc_settings {
* functions and per device encoding/decoding logic.
*/
struct low_ops {
- void (*map_sysaddr_to_csrow) (struct mem_ctl_info *mci, u64 sys_addr,
- struct err_info *);
- int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct,
- unsigned cs_mode, int cs_mask_nr);
-};
-
-struct amd64_family_flags {
- /*
- * Indicates that the system supports the new register offsets, etc.
- * first introduced with Family 19h Model 10h.
- */
- __u64 zn_regs_v2 : 1,
-
- __reserved : 63;
-};
-
-struct amd64_family_type {
- const char *ctl_name;
- u16 f1_id, f2_id;
- /* Maximum number of memory controllers per die/node. */
- u8 max_mcs;
- struct amd64_family_flags flags;
- struct low_ops ops;
+ void (*map_sysaddr_to_csrow)(struct mem_ctl_info *mci, u64 sys_addr,
+ struct err_info *err);
+ int (*dbam_to_cs)(struct amd64_pvt *pvt, u8 dct,
+ unsigned int cs_mode, int cs_mask_nr);
};

int __amd64_read_pci_cfg_dword(struct pci_dev *pdev, int offset,
--
2.25.1


2023-01-27 17:07:01

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 15/22] EDAC/amd64: Split read_mc_regs() into dct/umc functions

From: Muralidhara M K <[email protected]>

...and call them from their respective hw_info_get() paths.

ECC symbol size is not needed on UMC systems, so determine_ecc_sym_sz()
is left out of the UMC path.

Signed-off-by: Muralidhara M K <[email protected]>
Co-developed-by: Naveen Krishna Chatradhi <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
[Rebased/reworked patch and reworded commit message]
Co-developed-by: Yazen Ghannam <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lore.kernel.org/r/[email protected]

v1->v2:
* Call functions directly instead of using pointers.

drivers/edac/amd64_edac.c | 17 ++++-------------
1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 0d0e563c99db..757d35fad2a6 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3089,7 +3089,7 @@ static void determine_ecc_sym_sz(struct amd64_pvt *pvt)
/*
* Retrieve the hardware registers of the memory controller.
*/
-static void __read_mc_regs_df(struct amd64_pvt *pvt)
+static void umc_read_mc_regs(struct amd64_pvt *pvt)
{
u8 nid = pvt->mc_node_id;
struct amd64_umc *umc;
@@ -3113,7 +3113,7 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
* Retrieve the hardware registers of the memory controller (this includes the
* 'Address Map' and 'Misc' device regs)
*/
-static void read_mc_regs(struct amd64_pvt *pvt)
+static void dct_read_mc_regs(struct amd64_pvt *pvt)
{
unsigned int range;
u64 msr_val;
@@ -3134,12 +3134,6 @@ static void read_mc_regs(struct amd64_pvt *pvt)
edac_dbg(0, " TOP_MEM2 disabled\n");
}

- if (pvt->umc) {
- __read_mc_regs_df(pvt);
-
- goto skip;
- }
-
amd64_read_pci_cfg(pvt->F3, NBCAP, &pvt->nbcap);

read_dram_ctl_register(pvt);
@@ -3180,9 +3174,6 @@ static void read_mc_regs(struct amd64_pvt *pvt)
amd64_read_dct_pci_cfg(pvt, 1, DCHR0, &pvt->dchr1);
}

-skip:
-
-
determine_ecc_sym_sz(pvt);
}

@@ -3658,7 +3649,7 @@ static int dct_hw_info_get(struct amd64_pvt *pvt)

dct_prep_chip_selects(pvt);
dct_read_base_mask(pvt);
- read_mc_regs(pvt);
+ dct_read_mc_regs(pvt);
dct_determine_memory_type(pvt);

return 0;
@@ -3672,7 +3663,7 @@ static int umc_hw_info_get(struct amd64_pvt *pvt)

umc_prep_chip_selects(pvt);
umc_read_base_mask(pvt);
- read_mc_regs(pvt);
+ umc_read_mc_regs(pvt);
umc_determine_memory_type(pvt);

return 0;
--
2.25.1


2023-01-27 17:07:44

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 03/22] EDAC/amd64: Remove PCI Function 6

PCI Function 6 is used on Family 17h and later to access scrub
registers. With scrub access removed, this function has no other use.

Remove all Function 6 PCI IDs and related code.

Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lore.kernel.org/r/[email protected]

v1->v2:
* Also remove "pvt->F6" pointer.

drivers/edac/amd64_edac.c | 22 +---------------------
drivers/edac/amd64_edac.h | 12 ++----------
2 files changed, 3 insertions(+), 31 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 07a89df0d4f4..dce2179ad454 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2906,7 +2906,6 @@ static struct amd64_family_type family_types[] = {
[F17_CPUS] = {
.ctl_name = "F17h",
.f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0,
- .f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6,
.max_mcs = 2,
.ops = {
.early_channel_count = f17_early_channel_count,
@@ -2916,7 +2915,6 @@ static struct amd64_family_type family_types[] = {
[F17_M10H_CPUS] = {
.ctl_name = "F17h_M10h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0,
- .f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6,
.max_mcs = 2,
.ops = {
.early_channel_count = f17_early_channel_count,
@@ -2926,7 +2924,6 @@ static struct amd64_family_type family_types[] = {
[F17_M30H_CPUS] = {
.ctl_name = "F17h_M30h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0,
- .f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6,
.max_mcs = 8,
.ops = {
.early_channel_count = f17_early_channel_count,
@@ -2936,7 +2933,6 @@ static struct amd64_family_type family_types[] = {
[F17_M60H_CPUS] = {
.ctl_name = "F17h_M60h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F0,
- .f6_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F6,
.max_mcs = 2,
.ops = {
.early_channel_count = f17_early_channel_count,
@@ -2946,7 +2942,6 @@ static struct amd64_family_type family_types[] = {
[F17_M70H_CPUS] = {
.ctl_name = "F17h_M70h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0,
- .f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6,
.max_mcs = 2,
.ops = {
.early_channel_count = f17_early_channel_count,
@@ -2956,7 +2951,6 @@ static struct amd64_family_type family_types[] = {
[F19_CPUS] = {
.ctl_name = "F19h",
.f0_id = PCI_DEVICE_ID_AMD_19H_DF_F0,
- .f6_id = PCI_DEVICE_ID_AMD_19H_DF_F6,
.max_mcs = 8,
.ops = {
.early_channel_count = f17_early_channel_count,
@@ -2966,7 +2960,6 @@ static struct amd64_family_type family_types[] = {
[F19_M10H_CPUS] = {
.ctl_name = "F19h_M10h",
.f0_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F0,
- .f6_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F6,
.max_mcs = 12,
.flags.zn_regs_v2 = 1,
.ops = {
@@ -2977,7 +2970,6 @@ static struct amd64_family_type family_types[] = {
[F19_M50H_CPUS] = {
.ctl_name = "F19h_M50h",
.f0_id = PCI_DEVICE_ID_AMD_19H_M50H_DF_F0,
- .f6_id = PCI_DEVICE_ID_AMD_19H_M50H_DF_F6,
.max_mcs = 2,
.ops = {
.early_channel_count = f17_early_channel_count,
@@ -3290,7 +3282,7 @@ static void decode_umc_error(int node_id, struct mce *m)
/*
* Use pvt->F3 which contains the F3 CPU PCI device to get the related
* F1 (AddrMap) and F2 (Dct) devices. Return negative value on error.
- * Reserve F0 and F6 on systems with a UMC.
+ * Reserve F0 on systems with a UMC.
*/
static int
reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2)
@@ -3302,21 +3294,11 @@ reserve_mc_sibling_devs(struct amd64_pvt *pvt, u16 pci_id1, u16 pci_id2)
return -ENODEV;
}

- pvt->F6 = pci_get_related_function(pvt->F3->vendor, pci_id2, pvt->F3);
- if (!pvt->F6) {
- pci_dev_put(pvt->F0);
- pvt->F0 = NULL;
-
- edac_dbg(1, "F6 not found: device 0x%x\n", pci_id2);
- return -ENODEV;
- }
-
if (!pci_ctl_dev)
pci_ctl_dev = &pvt->F0->dev;

edac_dbg(1, "F0: %s\n", pci_name(pvt->F0));
edac_dbg(1, "F3: %s\n", pci_name(pvt->F3));
- edac_dbg(1, "F6: %s\n", pci_name(pvt->F6));

return 0;
}
@@ -3352,7 +3334,6 @@ static void free_mc_sibling_devs(struct amd64_pvt *pvt)
{
if (pvt->umc) {
pci_dev_put(pvt->F0);
- pci_dev_put(pvt->F6);
} else {
pci_dev_put(pvt->F1);
pci_dev_put(pvt->F2);
@@ -4078,7 +4059,6 @@ static int hw_info_get(struct amd64_pvt *pvt)
return -ENOMEM;

pci_id1 = fam_type->f0_id;
- pci_id2 = fam_type->f6_id;
} else {
pci_id1 = fam_type->f1_id;
pci_id2 = fam_type->f2_id;
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 48f1d97e1274..2d5ea9ca3868 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -115,21 +115,13 @@
#define PCI_DEVICE_ID_AMD_16H_M30H_NB_F1 0x1581
#define PCI_DEVICE_ID_AMD_16H_M30H_NB_F2 0x1582
#define PCI_DEVICE_ID_AMD_17H_DF_F0 0x1460
-#define PCI_DEVICE_ID_AMD_17H_DF_F6 0x1466
#define PCI_DEVICE_ID_AMD_17H_M10H_DF_F0 0x15e8
-#define PCI_DEVICE_ID_AMD_17H_M10H_DF_F6 0x15ee
#define PCI_DEVICE_ID_AMD_17H_M30H_DF_F0 0x1490
-#define PCI_DEVICE_ID_AMD_17H_M30H_DF_F6 0x1496
#define PCI_DEVICE_ID_AMD_17H_M60H_DF_F0 0x1448
-#define PCI_DEVICE_ID_AMD_17H_M60H_DF_F6 0x144e
#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F0 0x1440
-#define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
#define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650
-#define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656
#define PCI_DEVICE_ID_AMD_19H_M10H_DF_F0 0x14ad
-#define PCI_DEVICE_ID_AMD_19H_M10H_DF_F6 0x14b3
#define PCI_DEVICE_ID_AMD_19H_M50H_DF_F0 0x166a
-#define PCI_DEVICE_ID_AMD_19H_M50H_DF_F6 0x1670

/*
* Function 1 - Address Map
@@ -354,7 +346,7 @@ struct amd64_pvt {
struct low_ops *ops;

/* pci_device handles which we utilize */
- struct pci_dev *F0, *F1, *F2, *F3, *F6;
+ struct pci_dev *F0, *F1, *F2, *F3;

u16 mc_node_id; /* MC index of this MC node */
u8 fam; /* CPU family */
@@ -501,7 +493,7 @@ struct amd64_family_flags {

struct amd64_family_type {
const char *ctl_name;
- u16 f0_id, f1_id, f2_id, f6_id;
+ u16 f0_id, f1_id, f2_id;
/* Maximum number of memory controllers per die/node. */
u8 max_mcs;
struct amd64_family_flags flags;
--
2.25.1


2023-01-27 17:08:29

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 20/22] EDAC/amd64: Split init_csrows() into dct/umc functions

From: Muralidhara M K <[email protected]>

...and call them from their respective setup_mci_misc_attrs() paths.

Also, drop the check for an "empty" device, i.e. one without memory.
This is redundant and already done in instance_has_memory() earlier in
the init path.

No functional change is intended.

Signed-off-by: Muralidhara M K <[email protected]>
Co-developed-by: Naveen Krishna Chatradhi <[email protected]>
Signed-off-by: Naveen Krishna Chatradhi <[email protected]>
[Rebased/reworked patch and reworded commit message]
Co-developed-by: Yazen Ghannam <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lore.kernel.org/r/[email protected]

v1->v2:
* Call function directly instead of using pointers.
* Clean up redundant code.

drivers/edac/amd64_edac.c | 23 +++++++----------------
1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index f1c613107a22..ddf2178dabff 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3250,13 +3250,12 @@ static u32 umc_get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_or
return nr_pages;
}

-static int init_csrows_df(struct mem_ctl_info *mci)
+static void umc_init_csrows(struct mem_ctl_info *mci)
{
struct amd64_pvt *pvt = mci->pvt_info;
enum edac_type edac_mode = EDAC_NONE;
enum dev_type dev_type = DEV_UNKNOWN;
struct dimm_info *dimm;
- int empty = 1;
u8 umc, cs;

if (mci->edac_ctl_cap & EDAC_FLAG_S16ECD16ED) {
@@ -3277,7 +3276,6 @@ static int init_csrows_df(struct mem_ctl_info *mci)
if (!csrow_enabled(cs, umc, pvt))
continue;

- empty = 0;
dimm = mci->csrows[cs]->channels[umc]->dimm;

edac_dbg(1, "MC node: %d, csrow: %d\n",
@@ -3290,27 +3288,22 @@ static int init_csrows_df(struct mem_ctl_info *mci)
dimm->grain = 64;
}
}
-
- return empty;
}

/*
* Initialize the array of csrow attribute instances, based on the values
* from pci config hardware registers.
*/
-static int init_csrows(struct mem_ctl_info *mci)
+static void dct_init_csrows(struct mem_ctl_info *mci)
{
struct amd64_pvt *pvt = mci->pvt_info;
enum edac_type edac_mode = EDAC_NONE;
struct csrow_info *csrow;
struct dimm_info *dimm;
- int i, j, empty = 1;
int nr_pages = 0;
+ int i, j;
u32 val;

- if (pvt->umc)
- return init_csrows_df(mci);
-
amd64_read_pci_cfg(pvt->F3, NBCFG, &val);

pvt->nbcfg = val;
@@ -3333,7 +3326,6 @@ static int init_csrows(struct mem_ctl_info *mci)
continue;

csrow = mci->csrows[i];
- empty = 0;

edac_dbg(1, "MC node: %d, csrow: %d\n",
pvt->mc_node_id, i);
@@ -3367,8 +3359,6 @@ static int init_csrows(struct mem_ctl_info *mci)
dimm->grain = 64;
}
}
-
- return empty;
}

/* get all cores on this DCT */
@@ -3642,6 +3632,8 @@ static void dct_setup_mci_misc_attrs(struct mem_ctl_info *mci)
/* memory scrubber interface */
mci->set_sdram_scrub_rate = set_scrub_rate;
mci->get_sdram_scrub_rate = get_scrub_rate;
+
+ dct_init_csrows(mci);
}

static void umc_setup_mci_misc_attrs(struct mem_ctl_info *mci)
@@ -3658,6 +3650,8 @@ static void umc_setup_mci_misc_attrs(struct mem_ctl_info *mci)
mci->ctl_name = pvt->ctl_name;
mci->dev_name = pci_name(pvt->F3);
mci->ctl_page_to_phys = NULL;
+
+ umc_init_csrows(mci);
}

static int dct_hw_info_get(struct amd64_pvt *pvt)
@@ -3873,9 +3867,6 @@ static int init_one_instance(struct amd64_pvt *pvt)

pvt->ops->setup_mci_misc_attrs(mci);

- if (init_csrows(mci))
- mci->edac_cap = EDAC_FLAG_NONE;
-
ret = -ENODEV;
if (edac_mc_add_mc_with_groups(mci, amd64_edac_attr_groups)) {
edac_dbg(1, "failed edac_mc_add_mc()\n");
--
2.25.1


2023-01-27 17:11:32

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 02/22] EDAC/amd64: Remove scrub rate control for Family 17h and later

The scrub registers on AMD Family 17h and later may be inaccessible to
the OS. Furthermore, hardware designers recommend that the scrubbing
feature is managed by the firmware.

Remove support for the sdram_scrub_rate interface for AMD Family 17h
systems and later by not setting the scrub function pointers. The EDAC MC
core will then not expose the scrub files in sysfs.

Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lore.kernel.org/r/[email protected]

v1->v2:
* Don't set the scrub function pointers.

drivers/edac/amd64_edac.c | 33 +++++----------------------------
drivers/edac/amd64_edac.h | 2 --
2 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 2cc7336a5121..07a89df0d4f4 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -182,21 +182,6 @@ static inline int amd64_read_dct_pci_cfg(struct amd64_pvt *pvt, u8 dct,
* other archs, we might not have access to the caches directly.
*/

-static inline void __f17h_set_scrubval(struct amd64_pvt *pvt, u32 scrubval)
-{
- /*
- * Fam17h supports scrub values between 0x5 and 0x14. Also, the values
- * are shifted down by 0x5, so scrubval 0x5 is written to the register
- * as 0x0, scrubval 0x6 as 0x1, etc.
- */
- if (scrubval >= 0x5 && scrubval <= 0x14) {
- scrubval -= 0x5;
- pci_write_bits32(pvt->F6, F17H_SCR_LIMIT_ADDR, scrubval, 0xF);
- pci_write_bits32(pvt->F6, F17H_SCR_BASE_ADDR, 1, 0x1);
- } else {
- pci_write_bits32(pvt->F6, F17H_SCR_BASE_ADDR, 0, 0x1);
- }
-}
/*
* Scan the scrub rate mapping table for a close or matching bandwidth value to
* issue. If requested is too big, then use last maximum value found.
@@ -229,9 +214,7 @@ static int __set_scrub_rate(struct amd64_pvt *pvt, u32 new_bw, u32 min_rate)

scrubval = scrubrates[i].scrubval;

- if (pvt->umc) {
- __f17h_set_scrubval(pvt, scrubval);
- } else if (pvt->fam == 0x15 && pvt->model == 0x60) {
+ if (pvt->fam == 0x15 && pvt->model == 0x60) {
f15h_select_dct(pvt, 0);
pci_write_bits32(pvt->F2, F15H_M60H_SCRCTRL, scrubval, 0x001F);
f15h_select_dct(pvt, 1);
@@ -271,16 +254,7 @@ static int get_scrub_rate(struct mem_ctl_info *mci)
int i, retval = -EINVAL;
u32 scrubval = 0;

- if (pvt->umc) {
- amd64_read_pci_cfg(pvt->F6, F17H_SCR_BASE_ADDR, &scrubval);
- if (scrubval & BIT(0)) {
- amd64_read_pci_cfg(pvt->F6, F17H_SCR_LIMIT_ADDR, &scrubval);
- scrubval &= 0xF;
- scrubval += 0x5;
- } else {
- scrubval = 0;
- }
- } else if (pvt->fam == 0x15) {
+ if (pvt->fam == 0x15) {
/* Erratum #505 */
if (pvt->model < 0x10)
f15h_select_dct(pvt, 0);
@@ -3967,6 +3941,9 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
mci->dev_name = pci_name(pvt->F3);
mci->ctl_page_to_phys = NULL;

+ if (pvt->fam >= 0x17)
+ return;
+
/* memory scrubber interface */
mci->set_sdram_scrub_rate = set_scrub_rate;
mci->get_sdram_scrub_rate = get_scrub_rate;
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 38e5ad95d010..48f1d97e1274 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -215,8 +215,6 @@
#define DCT_SEL_HI 0x114

#define F15H_M60H_SCRCTRL 0x1C8
-#define F17H_SCR_BASE_ADDR 0x48
-#define F17H_SCR_LIMIT_ADDR 0x4C

/*
* Function 3 - Misc Control
--
2.25.1


2023-02-09 14:25:34

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 06/22] EDAC/amd64: Rename debug_display_dimm_sizes()

On Fri, Jan 27, 2023 at 05:04:03PM +0000, Yazen Ghannam wrote:
> Use the "dct" and "umc" prefixes for legacy and modern versions
> respectively.
>
> Also, move the "dct" version to avoid a forward declaration, and fixup
> some checkpatch warnings in the process.

Yeah, but it is hard to review a move and changes in a single patch.
Pls split it into

1. Mechanical move only
2. Other changes

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-10 12:16:40

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 05/22] EDAC/amd64: Remove early_channel_count()

On Fri, Jan 27, 2023 at 05:04:02PM +0000, Yazen Ghannam wrote:
> The early_channel_count() function seems to have been useful in the past
> for knowing how many EDAC mci structures to populate. However, this is no
> longer needed as the maximum channel count for a system is used instead.
>
> Remove the early_channel_count() helper functions and related code. Use the
> size of the channel layer when iterating over channel structures.
>
> Signed-off-by: Yazen Ghannam <[email protected]>
> ---
> Link:
> https://lore.kernel.org/r/[email protected]
>
> v1->v2:
> * New in v2.
>
> drivers/edac/amd64_edac.c | 116 +-------------------------------------
> drivers/edac/amd64_edac.h | 2 -
> 2 files changed, 2 insertions(+), 116 deletions(-)

First 5 patches: applied, thanks.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-13 16:54:24

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH v2 06/22] EDAC/amd64: Rename debug_display_dimm_sizes()

On Thu, Feb 09, 2023 at 03:25:19PM +0100, Borislav Petkov wrote:
> On Fri, Jan 27, 2023 at 05:04:03PM +0000, Yazen Ghannam wrote:
> > Use the "dct" and "umc" prefixes for legacy and modern versions
> > respectively.
> >
> > Also, move the "dct" version to avoid a forward declaration, and fixup
> > some checkpatch warnings in the process.
>
> Yeah, but it is hard to review a move and changes in a single patch.
> Pls split it into
>
> 1. Mechanical move only
> 2. Other changes
>

Understood. Patch 8 also has a move, so I'll split that one too.

Would you like me to resend the entire set? Do you have comments on the
remaining patches?

Thanks,
Yazen

2023-02-13 16:55:24

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH v2 05/22] EDAC/amd64: Remove early_channel_count()

On Fri, Feb 10, 2023 at 01:16:29PM +0100, Borislav Petkov wrote:
> On Fri, Jan 27, 2023 at 05:04:02PM +0000, Yazen Ghannam wrote:
> > The early_channel_count() function seems to have been useful in the past
> > for knowing how many EDAC mci structures to populate. However, this is no
> > longer needed as the maximum channel count for a system is used instead.
> >
> > Remove the early_channel_count() helper functions and related code. Use the
> > size of the channel layer when iterating over channel structures.
> >
> > Signed-off-by: Yazen Ghannam <[email protected]>
> > ---
> > Link:
> > https://lore.kernel.org/r/[email protected]
> >
> > v1->v2:
> > * New in v2.
> >
> > drivers/edac/amd64_edac.c | 116 +-------------------------------------
> > drivers/edac/amd64_edac.h | 2 -
> > 2 files changed, 2 insertions(+), 116 deletions(-)
>
> First 5 patches: applied, thanks.
>

Thank you!

-Yazen

2023-03-18 15:39:59

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 10/22] EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt

On Fri, Jan 27, 2023 at 05:04:07PM +0000, Yazen Ghannam wrote:
> From: Muralidhara M K <[email protected]>
>
> Future AMD systems will support heterogeneous "AMD Node" types, e.g.
> CPU+GPU. Therefore, a global "family type" shared across all "AMD Nodes"

"CPU and GPU types."

...

> +static int per_family_init(struct amd64_pvt *pvt)
> {
> pvt->ext_model = boot_cpu_data.x86_model >> 4;
> pvt->stepping = boot_cpu_data.x86_stepping;
> pvt->model = boot_cpu_data.x86_model;
> pvt->fam = boot_cpu_data.x86;
> + pvt->max_mcs = 2;
> +
> + /*
> + * Decide on which ops group to use here and do any family/model
> + * overrides below.
> + */
> + if (pvt->fam >= 0x17)
> + pvt->ops = &umc_ops;
> + else
> + pvt->ops = &dct_ops;
>
> switch (pvt->fam) {
> case 0xf:
> - fam_type = &family_types[K8_CPUS];
> - pvt->ops = &family_types[K8_CPUS].ops;
> + pvt->ctl_name = (pvt->ext_model > K8_REV_F) ?

The original test is >=

- amd64_info("%s %sdetected (node %d).\n", fam_type->ctl_name,
- (pvt->fam == 0xf ?
- (pvt->ext_model >= K8_REV_F ? "revF or later "
^^^^^


- : "revE or earlier ")

> + "K8 revF or later" : "K8 revE or earlier";
> + pvt->f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP;
> + pvt->f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL;
> + pvt->ops->map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow;
> + pvt->ops->dbam_to_cs = k8_dbam_to_chip_select;
> break;

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-03-23 11:12:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 00/22] AMD64 EDAC Cleanup and Refactor

On Fri, Jan 27, 2023 at 05:03:57PM +0000, Yazen Ghannam wrote:
> Muralidhara M K (11):
> EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
> EDAC/amd64: Split prep_chip_selects() into dct/umc functions
> EDAC/amd64: Split read_base_mask() into dct/umc functions
> EDAC/amd64: Split determine_memory_type() into dct/umc functions
> EDAC/amd64: Split read_mc_regs() into dct/umc functions
> EDAC/amd64: Split ecc_enabled() into dct/umc functions
> EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
> EDAC/amd64: Split determine_edac_cap() into dct/umc functions
> EDAC/amd64: Split init_csrows() into dct/umc functions
> EDAC/amd64: Split dump_misc_regs() into dct/umc functions
> EDAC/amd64: Add get_err_info() to pvt->ops
>
> Yazen Ghannam (11):
> EDAC/amd64: Don't set up EDAC PCI control on Family 17h+
> EDAC/amd64: Remove scrub rate control for Family 17h and later
> EDAC/amd64: Remove PCI Function 6
> EDAC/amd64: Remove PCI Function 0
> EDAC/amd64: Remove early_channel_count()
> EDAC/amd64: Rename debug_display_dimm_sizes()
> EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions
> EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
> EDAC/amd64: Don't find ECC symbol size for Family 17h and later
> EDAC/amd64: Rework hw_info_{get,put}
> EDAC/amd64: Rename f17h_determine_edac_ctl_cap()
>
> drivers/edac/amd64_edac.c | 1221 ++++++++++++++-----------------------
> drivers/edac/amd64_edac.h | 89 +--
> 2 files changed, 483 insertions(+), 827 deletions(-)

All small issues fixed up and applied.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-03-23 15:22:06

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH v2 00/22] AMD64 EDAC Cleanup and Refactor

On Thu, Mar 23, 2023 at 12:01:50PM +0100, Borislav Petkov wrote:
> On Fri, Jan 27, 2023 at 05:03:57PM +0000, Yazen Ghannam wrote:
> > Muralidhara M K (11):
> > EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
> > EDAC/amd64: Split prep_chip_selects() into dct/umc functions
> > EDAC/amd64: Split read_base_mask() into dct/umc functions
> > EDAC/amd64: Split determine_memory_type() into dct/umc functions
> > EDAC/amd64: Split read_mc_regs() into dct/umc functions
> > EDAC/amd64: Split ecc_enabled() into dct/umc functions
> > EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
> > EDAC/amd64: Split determine_edac_cap() into dct/umc functions
> > EDAC/amd64: Split init_csrows() into dct/umc functions
> > EDAC/amd64: Split dump_misc_regs() into dct/umc functions
> > EDAC/amd64: Add get_err_info() to pvt->ops
> >
> > Yazen Ghannam (11):
> > EDAC/amd64: Don't set up EDAC PCI control on Family 17h+
> > EDAC/amd64: Remove scrub rate control for Family 17h and later
> > EDAC/amd64: Remove PCI Function 6
> > EDAC/amd64: Remove PCI Function 0
> > EDAC/amd64: Remove early_channel_count()
> > EDAC/amd64: Rename debug_display_dimm_sizes()
> > EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions
> > EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
> > EDAC/amd64: Don't find ECC symbol size for Family 17h and later
> > EDAC/amd64: Rework hw_info_{get,put}
> > EDAC/amd64: Rename f17h_determine_edac_ctl_cap()
> >
> > drivers/edac/amd64_edac.c | 1221 ++++++++++++++-----------------------
> > drivers/edac/amd64_edac.h | 89 +--
> > 2 files changed, 483 insertions(+), 827 deletions(-)
>
> All small issues fixed up and applied.
>

Thank you!

-Yazen