2019-07-09 22:02:03

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 0/7] AMD64 EDAC fixes

From: Yazen Ghannam <[email protected]>

Hi Boris,

This set contains a few fixes for some changes merged in v5.2. There
are also a couple of fixes for older issues. In addition, there are a
couple of patches to add support for Asymmetric Dual-Rank DIMMs.

Thanks,
Yazen

Link:
https://lkml.kernel.org/r/[email protected]

v1->v2:
* Squash patches 1 and 2 together.

Yazen Ghannam (7):
EDAC/amd64: Support more than two controllers for chip selects
handling
EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
EDAC/amd64: Initialize DIMM info for systems with more than two
channels
EDAC/amd64: Find Chip Select memory size using Address Mask
EDAC/amd64: Decode syndrome before translating address
EDAC/amd64: Cache secondary Chip Select registers
EDAC/amd64: Support Asymmetric Dual-Rank DIMMs

drivers/edac/amd64_edac.c | 348 ++++++++++++++++++++++++--------------
drivers/edac/amd64_edac.h | 9 +-
2 files changed, 232 insertions(+), 125 deletions(-)

--
2.17.1


2019-07-09 22:02:34

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP

From: Yazen Ghannam <[email protected]>

AMD Family 17h systems support x4 and x16 DRAM devices. However, the
device type is not checked when setting EDAC_CTL_CAP.

Set the appropriate EDAC_CTL_CAP flag based on the device type.

Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h")
Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]

v1->v2:
* No change.

drivers/edac/amd64_edac.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index dd60cf5a3d96..125d6e2a828e 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -3150,12 +3150,15 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
static inline void
f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
{
- u8 i, ecc_en = 1, cpk_en = 1;
+ u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;

for_each_umc(i) {
if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
cpk_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_CHIPKILL_CAP);
+
+ dev_x4 &= !!(pvt->umc[i].dimm_cfg & BIT(6));
+ dev_x16 &= !!(pvt->umc[i].dimm_cfg & BIT(7));
}
}

@@ -3163,8 +3166,12 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
if (ecc_en) {
mci->edac_ctl_cap |= EDAC_FLAG_SECDED;

- if (cpk_en)
- mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
+ if (cpk_en) {
+ if (dev_x4)
+ mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
+ else if (dev_x16)
+ mci->edac_ctl_cap |= EDAC_FLAG_S16ECD16ED;
+ }
}
}

--
2.17.1

2019-07-09 22:02:35

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 7/7] EDAC/amd64: Support Asymmetric Dual-Rank DIMMs

From: Yazen Ghannam <[email protected]>

Future AMD systems will support "Asymmetric" Dual-Rank DIMMs. These are
DIMMs were the ranks are of different sizes.

The even rank will use the Primary Even Chip Select registers and the
odd rank will use the Secondary Odd Chip Select registers.

Recognize if a Secondary Odd Chip Select is being used. Use the
Secondary Odd Address Mask when calculating the chip select size.

Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]

v1->v2:
* No change.

drivers/edac/amd64_edac.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 006417cb79dc..6c284a4f980c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -790,6 +790,9 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan)

#define CS_EVEN_PRIMARY BIT(0)
#define CS_ODD_PRIMARY BIT(1)
+#define CS_ODD_SECONDARY BIT(2)
+
+#define csrow_sec_enabled(i, dct, pvt) ((pvt)->csels[(dct)].csbases_sec[(i)] & DCSB_CS_ENABLE)

static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
{
@@ -801,6 +804,10 @@ static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
cs_mode |= CS_ODD_PRIMARY;

+ /* Asymmetric Dual-Rank DIMM support. */
+ if (csrow_sec_enabled(2 * dimm + 1, ctrl, pvt))
+ cs_mode |= CS_ODD_SECONDARY;
+
return cs_mode;
}

@@ -1590,7 +1597,11 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
*/
dimm = csrow_nr >> 1;

- addr_mask_orig = pvt->csels[umc].csmasks[dimm];
+ /* Asymmetric Dual-Rank DIMM support. */
+ if (cs_mode & CS_ODD_SECONDARY)
+ addr_mask_orig = pvt->csels[umc].csmasks_sec[dimm];
+ else
+ addr_mask_orig = pvt->csels[umc].csmasks[dimm];

/*
* The number of zero bits in the mask is equal to the number of bits
--
2.17.1

2019-07-09 22:02:39

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling

From: Yazen Ghannam <[email protected]>

The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.

Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.

Fix number of DIMMs and Chip Select bases/masks on Family17h, because AMD
Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
channel.

Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.

This is a second version of a commit that was reverted.

Fixes: 07ed82ef93d6 ("EDAC, amd64: Add Fam17h debug output")
Fixes: 8de9930a4618 ("Revert "EDAC/amd64: Support more than two controllers for chip select handling"")
Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]
https://lkml.kernel.org/r/[email protected]

v1->v2:
* Patches 1 and 2 squashed together.

drivers/edac/amd64_edac.c | 123 +++++++++++++++++++++-----------------
drivers/edac/amd64_edac.h | 5 +-
2 files changed, 71 insertions(+), 57 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 873437be86d9..dd60cf5a3d96 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -810,7 +810,7 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)

edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);

- for (dimm = 0; dimm < 4; dimm++) {
+ for (dimm = 0; dimm < 2; dimm++) {
size0 = 0;
cs0 = dimm * 2;

@@ -942,89 +942,102 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
} else if (pvt->fam == 0x15 && pvt->model == 0x30) {
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2;
+ } else if (pvt->fam >= 0x17) {
+ int umc;
+
+ for_each_umc(umc) {
+ pvt->csels[umc].b_cnt = 4;
+ pvt->csels[umc].m_cnt = 2;
+ }
+
} else {
pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8;
pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4;
}
}

+static void read_umc_base_mask(struct amd64_pvt *pvt)
+{
+ u32 umc_base_reg, umc_mask_reg;
+ u32 base_reg, mask_reg;
+ u32 *base, *mask;
+ int cs, umc;
+
+ for_each_umc(umc) {
+ umc_base_reg = get_umc_base(umc) + UMCCH_BASE_ADDR;
+
+ for_each_chip_select(cs, umc, pvt) {
+ base = &pvt->csels[umc].csbases[cs];
+
+ base_reg = umc_base_reg + (cs * 4);
+
+ if (!amd_smn_read(pvt->mc_node_id, base_reg, base))
+ edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
+ umc, cs, *base, base_reg);
+ }
+
+ umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
+
+ for_each_chip_select_mask(cs, umc, pvt) {
+ mask = &pvt->csels[umc].csmasks[cs];
+
+ mask_reg = umc_mask_reg + (cs * 4);
+
+ if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask))
+ edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
+ umc, cs, *mask, mask_reg);
+ }
+ }
+}
+
/*
* Function 2 Offset F10_DCSB0; read in the DCS Base and DCS Mask registers
*/
static void read_dct_base_mask(struct amd64_pvt *pvt)
{
- int base_reg0, base_reg1, mask_reg0, mask_reg1, cs;
+ int cs;

prep_chip_selects(pvt);

- if (pvt->umc) {
- base_reg0 = get_umc_base(0) + UMCCH_BASE_ADDR;
- base_reg1 = get_umc_base(1) + UMCCH_BASE_ADDR;
- mask_reg0 = get_umc_base(0) + UMCCH_ADDR_MASK;
- mask_reg1 = get_umc_base(1) + UMCCH_ADDR_MASK;
- } else {
- base_reg0 = DCSB0;
- base_reg1 = DCSB1;
- mask_reg0 = DCSM0;
- mask_reg1 = DCSM1;
- }
+ if (pvt->umc)
+ return read_umc_base_mask(pvt);

for_each_chip_select(cs, 0, pvt) {
- int reg0 = base_reg0 + (cs * 4);
- int reg1 = base_reg1 + (cs * 4);
+ int reg0 = DCSB0 + (cs * 4);
+ int reg1 = DCSB1 + (cs * 4);
u32 *base0 = &pvt->csels[0].csbases[cs];
u32 *base1 = &pvt->csels[1].csbases[cs];

- if (pvt->umc) {
- if (!amd_smn_read(pvt->mc_node_id, reg0, base0))
- edac_dbg(0, " DCSB0[%d]=0x%08x reg: 0x%x\n",
- cs, *base0, reg0);
-
- if (!amd_smn_read(pvt->mc_node_id, reg1, base1))
- edac_dbg(0, " DCSB1[%d]=0x%08x reg: 0x%x\n",
- cs, *base1, reg1);
- } else {
- if (!amd64_read_dct_pci_cfg(pvt, 0, reg0, base0))
- edac_dbg(0, " DCSB0[%d]=0x%08x reg: F2x%x\n",
- cs, *base0, reg0);
+ if (!amd64_read_dct_pci_cfg(pvt, 0, reg0, base0))
+ edac_dbg(0, " DCSB0[%d]=0x%08x reg: F2x%x\n",
+ cs, *base0, reg0);

- if (pvt->fam == 0xf)
- continue;
+ if (pvt->fam == 0xf)
+ continue;

- if (!amd64_read_dct_pci_cfg(pvt, 1, reg0, base1))
- edac_dbg(0, " DCSB1[%d]=0x%08x reg: F2x%x\n",
- cs, *base1, (pvt->fam == 0x10) ? reg1
- : reg0);
- }
+ if (!amd64_read_dct_pci_cfg(pvt, 1, reg0, base1))
+ edac_dbg(0, " DCSB1[%d]=0x%08x reg: F2x%x\n",
+ cs, *base1, (pvt->fam == 0x10) ? reg1
+ : reg0);
}

for_each_chip_select_mask(cs, 0, pvt) {
- int reg0 = mask_reg0 + (cs * 4);
- int reg1 = mask_reg1 + (cs * 4);
+ int reg0 = DCSM0 + (cs * 4);
+ int reg1 = DCSM1 + (cs * 4);
u32 *mask0 = &pvt->csels[0].csmasks[cs];
u32 *mask1 = &pvt->csels[1].csmasks[cs];

- if (pvt->umc) {
- if (!amd_smn_read(pvt->mc_node_id, reg0, mask0))
- edac_dbg(0, " DCSM0[%d]=0x%08x reg: 0x%x\n",
- cs, *mask0, reg0);
-
- if (!amd_smn_read(pvt->mc_node_id, reg1, mask1))
- edac_dbg(0, " DCSM1[%d]=0x%08x reg: 0x%x\n",
- cs, *mask1, reg1);
- } else {
- if (!amd64_read_dct_pci_cfg(pvt, 0, reg0, mask0))
- edac_dbg(0, " DCSM0[%d]=0x%08x reg: F2x%x\n",
- cs, *mask0, reg0);
+ if (!amd64_read_dct_pci_cfg(pvt, 0, reg0, mask0))
+ edac_dbg(0, " DCSM0[%d]=0x%08x reg: F2x%x\n",
+ cs, *mask0, reg0);

- if (pvt->fam == 0xf)
- continue;
+ if (pvt->fam == 0xf)
+ continue;

- if (!amd64_read_dct_pci_cfg(pvt, 1, reg0, mask1))
- edac_dbg(0, " DCSM1[%d]=0x%08x reg: F2x%x\n",
- cs, *mask1, (pvt->fam == 0x10) ? reg1
- : reg0);
- }
+ if (!amd64_read_dct_pci_cfg(pvt, 1, reg0, mask1))
+ edac_dbg(0, " DCSM1[%d]=0x%08x reg: F2x%x\n",
+ cs, *mask1, (pvt->fam == 0x10) ? reg1
+ : reg0);
}
}

diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 8f66472f7adc..4dce6a2ac75f 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -96,6 +96,7 @@
/* Hardware limit on ChipSelect rows per MC and processors per system */
#define NUM_CHIPSELECTS 8
#define DRAM_RANGES 8
+#define NUM_CONTROLLERS 8

#define ON true
#define OFF false
@@ -351,8 +352,8 @@ struct amd64_pvt {
u32 dbam0; /* DRAM Base Address Mapping reg for DCT0 */
u32 dbam1; /* DRAM Base Address Mapping reg for DCT1 */

- /* one for each DCT */
- struct chip_select csels[2];
+ /* one for each DCT/UMC */
+ struct chip_select csels[NUM_CONTROLLERS];

/* DRAM base and limit pairs F1x[78,70,68,60,58,50,48,40] */
struct dram_range ranges[DRAM_RANGES];
--
2.17.1

2019-07-09 22:22:21

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 5/7] EDAC/amd64: Decode syndrome before translating address

From: Yazen Ghannam <[email protected]>

AMD Family 17h systems currently require address translation in order to
report the system address of a DRAM ECC error. This is currently done
before decoding the syndrome information. The syndrome information does
not depend on the address translation, so the proper EDAC csrow/channel
reporting can function without the address. However, the syndrome
information will not be decoded if the address translation fails.

Decode the syndrome information before doing the address translation.
The syndrome information is architecturally defined in MCA_SYND and can
be considered robust. The address translation is system-specific and may
fail on newer systems without proper updates to the translation
algorithm.

Fixes: 713ad54675fd ("EDAC, amd64: Define and register UMC error decode function")
Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]

v1->v2:
* No change.

drivers/edac/amd64_edac.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index f0424c10cac0..4058b24b8e04 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2567,13 +2567,6 @@ static void decode_umc_error(int node_id, struct mce *m)

err.channel = find_umc_channel(m);

- if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
- err.err_code = ERR_NORM_ADDR;
- goto log_error;
- }
-
- error_address_to_page_and_offset(sys_addr, &err);
-
if (!(m->status & MCI_STATUS_SYNDV)) {
err.err_code = ERR_SYND;
goto log_error;
@@ -2590,6 +2583,13 @@ static void decode_umc_error(int node_id, struct mce *m)

err.csrow = m->synd & 0x7;

+ if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
+ err.err_code = ERR_NORM_ADDR;
+ goto log_error;
+ }
+
+ error_address_to_page_and_offset(sys_addr, &err);
+
log_error:
__log_ecc_error(mci, &err, ecc_type);
}
--
2.17.1

2019-07-09 22:22:45

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 3/7] EDAC/amd64: Initialize DIMM info for systems with more than two channels

From: Yazen Ghannam <[email protected]>

Currently, the DIMM info for AMD Family 17h systems is initialized in
init_csrows(). This function is shared with legacy systems, and it has a
limit of two channel support.

This prevents initialization of the DIMM info for a number of ranks, so
there will be missing ranks in the EDAC sysfs.

Create a new init_csrows_df() for Family17h+ and revert init_csrows()
back to pre-Family17h support.

Loop over all channels in the new function in order to support systems
with more than two channels.

Fixes: bdcee7747f5c ("EDAC/amd64: Support more than two Unified Memory Controllers")
Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]

v1->v2:
* No change.

drivers/edac/amd64_edac.c | 63 ++++++++++++++++++++++++++++++---------
1 file changed, 49 insertions(+), 14 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 125d6e2a828e..d0926b181c7c 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2837,6 +2837,46 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig)
return nr_pages;
}

+static int init_csrows_df(struct mem_ctl_info *mci)
+{
+ struct amd64_pvt *pvt = mci->pvt_info;
+ enum edac_type edac_mode = EDAC_NONE;
+ enum dev_type dev_type = DEV_UNKNOWN;
+ struct dimm_info *dimm;
+ int empty = 1;
+ u8 umc, cs;
+
+ if (mci->edac_ctl_cap & EDAC_FLAG_S16ECD16ED) {
+ edac_mode = EDAC_S16ECD16ED;
+ dev_type = DEV_X16;
+ } else if (mci->edac_ctl_cap & EDAC_FLAG_S4ECD4ED) {
+ edac_mode = EDAC_S4ECD4ED;
+ dev_type = DEV_X4;
+ } else if (mci->edac_ctl_cap & EDAC_FLAG_SECDED) {
+ edac_mode = EDAC_SECDED;
+ }
+
+ for_each_umc(umc) {
+ for_each_chip_select(cs, umc, pvt) {
+ if (!csrow_enabled(cs, umc, pvt))
+ continue;
+
+ empty = 0;
+ dimm = mci->csrows[cs]->channels[umc]->dimm;
+
+ edac_dbg(1, "MC node: %d, csrow: %d\n",
+ pvt->mc_node_id, cs);
+
+ dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
+ dimm->mtype = pvt->dram_type;
+ dimm->edac_mode = edac_mode;
+ dimm->dtype = dev_type;
+ }
+ }
+
+ return empty;
+}
+
/*
* Initialize the array of csrow attribute instances, based on the values
* from pci config hardware registers.
@@ -2851,15 +2891,16 @@ static int init_csrows(struct mem_ctl_info *mci)
int nr_pages = 0;
u32 val;

- if (!pvt->umc) {
- amd64_read_pci_cfg(pvt->F3, NBCFG, &val);
+ if (pvt->umc)
+ return init_csrows_df(mci);
+
+ amd64_read_pci_cfg(pvt->F3, NBCFG, &val);

- pvt->nbcfg = val;
+ pvt->nbcfg = val;

- edac_dbg(0, "node %d, NBCFG=0x%08x[ChipKillEccCap: %d|DramEccEn: %d]\n",
- pvt->mc_node_id, val,
- !!(val & NBCFG_CHIPKILL), !!(val & NBCFG_ECC_ENABLE));
- }
+ edac_dbg(0, "node %d, NBCFG=0x%08x[ChipKillEccCap: %d|DramEccEn: %d]\n",
+ pvt->mc_node_id, val,
+ !!(val & NBCFG_CHIPKILL), !!(val & NBCFG_ECC_ENABLE));

/*
* We iterate over DCT0 here but we look at DCT1 in parallel, if needed.
@@ -2896,13 +2937,7 @@ static int init_csrows(struct mem_ctl_info *mci)
edac_dbg(1, "Total csrow%d pages: %u\n", i, nr_pages);

/* Determine DIMM ECC mode: */
- if (pvt->umc) {
- if (mci->edac_ctl_cap & EDAC_FLAG_S4ECD4ED)
- edac_mode = EDAC_S4ECD4ED;
- else if (mci->edac_ctl_cap & EDAC_FLAG_SECDED)
- edac_mode = EDAC_SECDED;
-
- } else if (pvt->nbcfg & NBCFG_ECC_ENABLE) {
+ if (pvt->nbcfg & NBCFG_ECC_ENABLE) {
edac_mode = (pvt->nbcfg & NBCFG_CHIPKILL)
? EDAC_S4ECD4ED
: EDAC_SECDED;
--
2.17.1

2019-07-09 22:23:06

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 4/7] EDAC/amd64: Find Chip Select memory size using Address Mask

From: Yazen Ghannam <[email protected]>

Chip Select memory size reporting on AMD Family 17h was recently fixed
in order to account for interleaving. However, the current method is not
robust.

The Chip Select Address Mask can be used to find the memory size. There
are a few cases.

1) For single-rank, use the address mask as the size.

2) For dual-rank non-interleaved, use the address mask divided by 2 as
the size.

3) For dual-rank interleaved, do #2 but "de-interleave" the address mask
first.

Always "de-interleave" the address mask in order to simplify the code
flow. Bit mask manipulation is necessary to check for interleaving, so
just go ahead do the de-interleaving. In the non-interleaved case, the
original and de-interleaved address masks will be the same.

To de-interleave the mask, count the number of zero bits in the middle
of the mask and swap them with the most significant bits.

For example,
Original=0xFFFF9FE, De-interleaved=0x3FFFFFE

Fixes: fc00c6a41638 ("EDAC/amd64: Adjust printed chip select sizes when interleaved")
Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]

v1->v2:
* No change.

drivers/edac/amd64_edac.c | 107 ++++++++++++++++++++++----------------
1 file changed, 63 insertions(+), 44 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index d0926b181c7c..f0424c10cac0 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -788,51 +788,36 @@ static void debug_dump_dramcfg_low(struct amd64_pvt *pvt, u32 dclr, int chan)
(dclr & BIT(15)) ? "yes" : "no");
}

-/*
- * The Address Mask should be a contiguous set of bits in the non-interleaved
- * case. So to check for CS interleaving, find the most- and least-significant
- * bits of the mask, generate a contiguous bitmask, and compare the two.
- */
-static bool f17_cs_interleaved(struct amd64_pvt *pvt, u8 ctrl, int cs)
+#define CS_EVEN_PRIMARY BIT(0)
+#define CS_ODD_PRIMARY BIT(1)
+
+static int f17_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
{
- u32 mask = pvt->csels[ctrl].csmasks[cs >> 1];
- u32 msb = fls(mask) - 1, lsb = ffs(mask) - 1;
- u32 test_mask = GENMASK(msb, lsb);
+ int cs_mode = 0;
+
+ if (csrow_enabled(2 * dimm, ctrl, pvt))
+ cs_mode |= CS_EVEN_PRIMARY;

- edac_dbg(1, "mask=0x%08x test_mask=0x%08x\n", mask, test_mask);
+ if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
+ cs_mode |= CS_ODD_PRIMARY;

- return mask ^ test_mask;
+ return cs_mode;
}

static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
{
- int dimm, size0, size1, cs0, cs1;
+ int dimm, size0, size1, cs0, cs1, cs_mode;

edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);

for (dimm = 0; dimm < 2; dimm++) {
- size0 = 0;
cs0 = dimm * 2;
-
- if (csrow_enabled(cs0, ctrl, pvt))
- size0 = pvt->ops->dbam_to_cs(pvt, ctrl, 0, cs0);
-
- size1 = 0;
cs1 = dimm * 2 + 1;

- if (csrow_enabled(cs1, ctrl, pvt)) {
- /*
- * CS interleaving is only supported if both CSes have
- * the same amount of memory. Because they are
- * interleaved, it will look like both CSes have the
- * full amount of memory. Save the size for both as
- * half the amount we found on CS0, if interleaved.
- */
- if (f17_cs_interleaved(pvt, ctrl, cs1))
- size1 = size0 = (size0 >> 1);
- else
- size1 = pvt->ops->dbam_to_cs(pvt, ctrl, 0, cs1);
- }
+ cs_mode = f17_get_cs_mode(dimm, ctrl, pvt);
+
+ size0 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs0);
+ size1 = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs1);

amd64_info(EDAC_MC ": %d: %5dMB %d: %5dMB\n",
cs0, size0,
@@ -1569,18 +1554,50 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct,
return ddr3_cs_size(cs_mode, false);
}

-static int f17_base_addr_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
unsigned int cs_mode, int csrow_nr)
{
- u32 base_addr = pvt->csels[umc].csbases[csrow_nr];
+ u32 addr_mask_orig, addr_mask_deinterleaved;
+ u32 msb, weight, num_zero_bits;
+ int dimm, dual_rank, size = 0;

- /* Each mask is used for every two base addresses. */
- u32 addr_mask = pvt->csels[umc].csmasks[csrow_nr >> 1];
+ if (!cs_mode)
+ return size;

- /* Register [31:1] = Address [39:9]. Size is in kBs here. */
- u32 size = ((addr_mask >> 1) - (base_addr >> 1) + 1) >> 1;
+ dual_rank = !!(cs_mode & (CS_EVEN_PRIMARY | CS_ODD_PRIMARY));

- edac_dbg(1, "BaseAddr: 0x%x, AddrMask: 0x%x\n", base_addr, addr_mask);
+ /*
+ * There is one mask per DIMM, and two Chip Selects per DIMM.
+ * CS0 and CS1 -> DIMM0
+ * CS2 and CS3 -> DIMM1
+ */
+ dimm = csrow_nr >> 1;
+
+ addr_mask_orig = pvt->csels[umc].csmasks[dimm];
+
+ /*
+ * The number of zero bits in the mask is equal to the number of bits
+ * in a full mask minus the number of bits in the current mask.
+ *
+ * The MSB is the number of bits in the full mask because BIT[0] is
+ * always 0.
+ */
+ msb = fls(addr_mask_orig) - 1;
+ weight = hweight_long(addr_mask_orig);
+ num_zero_bits = msb - weight;
+
+ /* Take the number of zero bits off from the top of the mask. */
+ addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1);
+
+ edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
+ edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig);
+ edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
+
+ /* Register [31:1] = Address [39:9]. Size is in kBs here. */
+ size = addr_mask_deinterleaved >> 1;
+
+ /* Divide size by two if dual rank. */
+ size >>= dual_rank;

/* Return size in MBs. */
return size >> 10;
@@ -2245,7 +2262,7 @@ static struct amd64_family_type family_types[] = {
.f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6,
.ops = {
.early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_base_addr_to_cs_size,
+ .dbam_to_cs = f17_addr_mask_to_cs_size,
}
},
[F17_M10H_CPUS] = {
@@ -2254,7 +2271,7 @@ static struct amd64_family_type family_types[] = {
.f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6,
.ops = {
.early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_base_addr_to_cs_size,
+ .dbam_to_cs = f17_addr_mask_to_cs_size,
}
},
[F17_M30H_CPUS] = {
@@ -2263,7 +2280,7 @@ static struct amd64_family_type family_types[] = {
.f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6,
.ops = {
.early_channel_count = f17_early_channel_count,
- .dbam_to_cs = f17_base_addr_to_cs_size,
+ .dbam_to_cs = f17_addr_mask_to_cs_size,
}
},
};
@@ -2822,10 +2839,12 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig)
int csrow_nr = csrow_nr_orig;
u32 cs_mode, nr_pages;

- if (!pvt->umc)
+ if (!pvt->umc) {
csrow_nr >>= 1;
-
- cs_mode = DBAM_DIMM(csrow_nr, dbam);
+ cs_mode = DBAM_DIMM(csrow_nr, dbam);
+ } else {
+ cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt);
+ }

nr_pages = pvt->ops->dbam_to_cs(pvt, dct, cs_mode, csrow_nr);
nr_pages <<= 20 - PAGE_SHIFT;
--
2.17.1

2019-07-10 18:03:04

by Kim Phillips

[permalink] [raw]
Subject: Re: [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling

On 7/9/19 4:56 PM, Ghannam, Yazen wrote:
> From: Yazen Ghannam <[email protected]>
>
> The struct chip_select array that's used for saving chip select bases
> and masks is fixed at length of two. There should be one struct
> chip_select for each controller, so this array should be increased to
> support systems that may have more than two controllers.
>
> Increase the size of the struct chip_select array to eight, which is the
> largest number of controllers per die currently supported on AMD
> systems.
>
> Fix number of DIMMs and Chip Select bases/masks on Family17h, because AMD
> Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
> channel.
>
> Also, carve out the Family 17h+ reading of the bases/masks into a
> separate function. This effectively reverts the original bases/masks
> reading code to before Family 17h support was added.
>
> This is a second version of a commit that was reverted.
>
> Fixes: 07ed82ef93d6 ("EDAC, amd64: Add Fam17h debug output")
> Fixes: 8de9930a4618 ("Revert "EDAC/amd64: Support more than two controllers for chip select handling"")
> Signed-off-by: Yazen Ghannam <[email protected]>
> ---

For this and the rest of the series:

Tested-by: Kim Phillips <[email protected]>

Thanks,

Kim

2019-08-02 12:11:21

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling

On Tue, Jul 09, 2019 at 09:56:54PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <[email protected]>
>
> The struct chip_select array that's used for saving chip select bases
> and masks is fixed at length of two. There should be one struct
> chip_select for each controller, so this array should be increased to
> support systems that may have more than two controllers.
>
> Increase the size of the struct chip_select array to eight, which is the
> largest number of controllers per die currently supported on AMD
> systems.
>
> Fix number of DIMMs and Chip Select bases/masks on Family17h, because AMD
> Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
> channel.
>
> Also, carve out the Family 17h+ reading of the bases/masks into a
> separate function. This effectively reverts the original bases/masks
> reading code to before Family 17h support was added.
>
> This is a second version of a commit that was reverted.
>
> Fixes: 07ed82ef93d6 ("EDAC, amd64: Add Fam17h debug output")
> Fixes: 8de9930a4618 ("Revert "EDAC/amd64: Support more than two controllers for chip select handling"")

I'm not sure about those Fixes: tags you're slapping everywhere. First
of all, 8de9930a4618 is a revert so how can this be fixing a revert? If
anything, it should be fixing the original commit

0a227af521d6 ("EDAC/amd64: Support more than two controllers for chip select handling")

which tried the more-than-2-memory-controllers thing.

But, it is not really a fix for that commit but a second attempt at it.
Which is not really a fix but hw enablement.

So I'm dropping those tags here. If you want them in stable, pls
backport them properly and test them on the respective stable kernels
before sending them to stable.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-08-02 12:55:02

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP

On Tue, Jul 09, 2019 at 09:56:55PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <[email protected]>
>
> AMD Family 17h systems support x4 and x16 DRAM devices. However, the
> device type is not checked when setting EDAC_CTL_CAP.
>
> Set the appropriate EDAC_CTL_CAP flag based on the device type.
>
> Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h")

This is better: a patch which fixes a previous patch and is simple,
small and clear. That you can tag with Fixes: just fine.

> Signed-off-by: Yazen Ghannam <[email protected]>
> ---
> Link:
> https://lkml.kernel.org/r/[email protected]
>
> v1->v2:
> * No change.
>
> drivers/edac/amd64_edac.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index dd60cf5a3d96..125d6e2a828e 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -3150,12 +3150,15 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
> static inline void
> f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
> {
> - u8 i, ecc_en = 1, cpk_en = 1;
> + u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
>
> for_each_umc(i) {
> if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
> ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
> cpk_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_CHIPKILL_CAP);
> +
> + dev_x4 &= !!(pvt->umc[i].dimm_cfg & BIT(6));
> + dev_x16 &= !!(pvt->umc[i].dimm_cfg & BIT(7));

Are those bits mutually exclusive?

I.e., so that you can do:

if (dev_x4)
mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
else
mci->edac_ctl_cap |= EDAC_FLAG_S16ECD16ED;

?

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-08-03 18:26:05

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 0/7] AMD64 EDAC fixes

On Tue, Jul 09, 2019 at 09:56:54PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <[email protected]>
>
> Hi Boris,
>
> This set contains a few fixes for some changes merged in v5.2. There
> are also a couple of fixes for older issues. In addition, there are a
> couple of patches to add support for Asymmetric Dual-Rank DIMMs.
>
> Thanks,
> Yazen
>
> Link:
> https://lkml.kernel.org/r/[email protected]
>
> v1->v2:
> * Squash patches 1 and 2 together.
>
> Yazen Ghannam (7):
> EDAC/amd64: Support more than two controllers for chip selects
> handling
> EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
> EDAC/amd64: Initialize DIMM info for systems with more than two
> channels
> EDAC/amd64: Find Chip Select memory size using Address Mask
> EDAC/amd64: Decode syndrome before translating address
> EDAC/amd64: Cache secondary Chip Select registers
> EDAC/amd64: Support Asymmetric Dual-Rank DIMMs
>
> drivers/edac/amd64_edac.c | 348 ++++++++++++++++++++++++--------------
> drivers/edac/amd64_edac.h | 9 +-
> 2 files changed, 232 insertions(+), 125 deletions(-)

So this still has this confusing reporting of unpopulated nodes:

[ 4.291774] EDAC MC1: Giving out device to module amd64_edac controller F17h: DEV 0000:00:19.3 (INTERRUPT)
[ 4.292021] EDAC DEBUG: ecc_enabled: Node 2: No enabled UMCs.
[ 4.292231] EDAC amd64: Node 2: DRAM ECC disabled.
[ 4.292405] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
[ 4.292859] EDAC DEBUG: ecc_enabled: Node 3: No enabled UMCs.
[ 4.292963] EDAC amd64: Node 3: DRAM ECC disabled.
[ 4.293063] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
[ 4.293347] AMD64 EDAC driver v3.5.0

which needs fixing.

Regardless, still not good enough. The snowy owl box I have here has 16
GB:

$ head -n1 /proc/meminfo
MemTotal: 15715328 kB

and yet

[ 4.282251] EDAC MC: UMC0 chip selects:
[ 4.282348] EDAC DEBUG: f17_addr_mask_to_cs_size: CS0 DIMM0 AddrMasks:
[ 4.282455] EDAC DEBUG: f17_addr_mask_to_cs_size: Original AddrMask: 0x1fffffe
[ 4.282592] EDAC DEBUG: f17_addr_mask_to_cs_size: Deinterleaved AddrMask: 0x1fffffe
[ 4.282732] EDAC DEBUG: f17_addr_mask_to_cs_size: CS1 DIMM0 AddrMasks:
[ 4.282839] EDAC DEBUG: f17_addr_mask_to_cs_size: Original AddrMask: 0x1fffffe
[ 4.283060] EDAC DEBUG: f17_addr_mask_to_cs_size: Deinterleaved AddrMask: 0x1fffffe
[ 4.283286] EDAC amd64: MC: 0: 8191MB 1: 8191MB
^^^^^^^^^^^^^^^^^

[ 4.283456] EDAC amd64: MC: 2: 0MB 3: 0MB

...

[ 4.285379] EDAC MC: UMC1 chip selects:
[ 4.285476] EDAC DEBUG: f17_addr_mask_to_cs_size: CS0 DIMM0 AddrMasks:
[ 4.285583] EDAC DEBUG: f17_addr_mask_to_cs_size: Original AddrMask: 0x1fffffe
[ 4.285721] EDAC DEBUG: f17_addr_mask_to_cs_size: Deinterleaved AddrMask: 0x1fffffe
[ 4.285860] EDAC DEBUG: f17_addr_mask_to_cs_size: CS1 DIMM0 AddrMasks:
[ 4.285967] EDAC DEBUG: f17_addr_mask_to_cs_size: Original AddrMask: 0x1fffffe
[ 4.286105] EDAC DEBUG: f17_addr_mask_to_cs_size: Deinterleaved AddrMask: 0x1fffffe
[ 4.286244] EDAC amd64: MC: 0: 8191MB 1: 8191MB
^^^^^^^^^^^^^^^^^

[ 4.286345] EDAC amd64: MC: 2: 0MB 3: 0MB

which shows 4 chip selects x 8Gb = 32G.

So something's still wrong. Before the patchset it says:

EDAC MC: UMC0 chip selects:
EDAC amd64: MC: 0: 8192MB 1: 0MB
...
EDAC MC: UMC1 chip selects:
EDAC amd64: MC: 0: 8192MB 1: 0MB

which is the correct output.

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-08-15 20:45:42

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCH v2 0/7] AMD64 EDAC fixes

> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Borislav Petkov
> Sent: Friday, August 2, 2019 9:46 AM
> To: Ghannam, Yazen <[email protected]>
> Cc: [email protected]; [email protected]
> Subject: Re: [PATCH v2 0/7] AMD64 EDAC fixes
>
...
>
> So this still has this confusing reporting of unpopulated nodes:
>
> [ 4.291774] EDAC MC1: Giving out device to module amd64_edac controller F17h: DEV 0000:00:19.3 (INTERRUPT)
> [ 4.292021] EDAC DEBUG: ecc_enabled: Node 2: No enabled UMCs.
> [ 4.292231] EDAC amd64: Node 2: DRAM ECC disabled.
> [ 4.292405] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
> [ 4.292859] EDAC DEBUG: ecc_enabled: Node 3: No enabled UMCs.
> [ 4.292963] EDAC amd64: Node 3: DRAM ECC disabled.
> [ 4.293063] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
> [ 4.293347] AMD64 EDAC driver v3.5.0
>
> which needs fixing.
>

Yes, I agree. I was planning to do a fix in a separate set. Is that okay? Or should I add it here?

> Regardless, still not good enough. The snowy owl box I have here has 16
> GB:
>
> $ head -n1 /proc/meminfo
> MemTotal: 15715328 kB
>
> and yet
>
> [ 4.282251] EDAC MC: UMC0 chip selects:
> [ 4.282348] EDAC DEBUG: f17_addr_mask_to_cs_size: CS0 DIMM0 AddrMasks:
> [ 4.282455] EDAC DEBUG: f17_addr_mask_to_cs_size: Original AddrMask: 0x1fffffe
> [ 4.282592] EDAC DEBUG: f17_addr_mask_to_cs_size: Deinterleaved AddrMask: 0x1fffffe
> [ 4.282732] EDAC DEBUG: f17_addr_mask_to_cs_size: CS1 DIMM0 AddrMasks:
> [ 4.282839] EDAC DEBUG: f17_addr_mask_to_cs_size: Original AddrMask: 0x1fffffe
> [ 4.283060] EDAC DEBUG: f17_addr_mask_to_cs_size: Deinterleaved AddrMask: 0x1fffffe
> [ 4.283286] EDAC amd64: MC: 0: 8191MB 1: 8191MB
> ^^^^^^^^^^^^^^^^^
>
> [ 4.283456] EDAC amd64: MC: 2: 0MB 3: 0MB
>
> ...
>
> [ 4.285379] EDAC MC: UMC1 chip selects:
> [ 4.285476] EDAC DEBUG: f17_addr_mask_to_cs_size: CS0 DIMM0 AddrMasks:
> [ 4.285583] EDAC DEBUG: f17_addr_mask_to_cs_size: Original AddrMask: 0x1fffffe
> [ 4.285721] EDAC DEBUG: f17_addr_mask_to_cs_size: Deinterleaved AddrMask: 0x1fffffe
> [ 4.285860] EDAC DEBUG: f17_addr_mask_to_cs_size: CS1 DIMM0 AddrMasks:
> [ 4.285967] EDAC DEBUG: f17_addr_mask_to_cs_size: Original AddrMask: 0x1fffffe
> [ 4.286105] EDAC DEBUG: f17_addr_mask_to_cs_size: Deinterleaved AddrMask: 0x1fffffe
> [ 4.286244] EDAC amd64: MC: 0: 8191MB 1: 8191MB
> ^^^^^^^^^^^^^^^^^
>
> [ 4.286345] EDAC amd64: MC: 2: 0MB 3: 0MB
>
> which shows 4 chip selects x 8Gb = 32G.
>
> So something's still wrong. Before the patchset it says:
>
> EDAC MC: UMC0 chip selects:
> EDAC amd64: MC: 0: 8192MB 1: 0MB
> ...
> EDAC MC: UMC1 chip selects:
> EDAC amd64: MC: 0: 8192MB 1: 0MB
>
> which is the correct output.
>

Can you please send me the full kernel log and dmidecode output?

Thanks,
Yazen

2019-08-16 06:47:51

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 0/7] AMD64 EDAC fixes

On Thu, Aug 15, 2019 at 08:08:39PM +0000, Ghannam, Yazen wrote:
> Yes, I agree. I was planning to do a fix in a separate set. Is that
> okay? Or should I add it here?

If you have it, send it on, sure.

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2019-08-19 19:57:45

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling

> -----Original Message-----
> From: Borislav Petkov <[email protected]>
> Sent: Friday, August 2, 2019 1:50 AM
> To: Ghannam, Yazen <[email protected]>
> Cc: [email protected]; [email protected]
> Subject: Re: [PATCH v2 1/7] EDAC/amd64: Support more than two controllers for chip selects handling
>
> On Tue, Jul 09, 2019 at 09:56:54PM +0000, Ghannam, Yazen wrote:
> > From: Yazen Ghannam <[email protected]>
> >
> > The struct chip_select array that's used for saving chip select bases
> > and masks is fixed at length of two. There should be one struct
> > chip_select for each controller, so this array should be increased to
> > support systems that may have more than two controllers.
> >
> > Increase the size of the struct chip_select array to eight, which is the
> > largest number of controllers per die currently supported on AMD
> > systems.
> >
> > Fix number of DIMMs and Chip Select bases/masks on Family17h, because AMD
> > Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
> > channel.
> >
> > Also, carve out the Family 17h+ reading of the bases/masks into a
> > separate function. This effectively reverts the original bases/masks
> > reading code to before Family 17h support was added.
> >
> > This is a second version of a commit that was reverted.
> >
> > Fixes: 07ed82ef93d6 ("EDAC, amd64: Add Fam17h debug output")
> > Fixes: 8de9930a4618 ("Revert "EDAC/amd64: Support more than two controllers for chip select handling"")
>
> I'm not sure about those Fixes: tags you're slapping everywhere. First
> of all, 8de9930a4618 is a revert so how can this be fixing a revert? If
> anything, it should be fixing the original commit
>
> 0a227af521d6 ("EDAC/amd64: Support more than two controllers for chip select handling")
>
> which tried the more-than-2-memory-controllers thing.
>
> But, it is not really a fix for that commit but a second attempt at it.
> Which is not really a fix but hw enablement.
>
> So I'm dropping those tags here. If you want them in stable, pls
> backport them properly and test them on the respective stable kernels
> before sending them to stable.
>

Okay, no problem.

Should I drop the Fixes tags on any other of the patches in this set?

Thanks,
Yazen

2019-08-19 20:21:22

by Yazen Ghannam

[permalink] [raw]
Subject: RE: [PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP

> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Borislav Petkov
> Sent: Friday, August 2, 2019 2:42 AM
> To: Ghannam, Yazen <[email protected]>
> Cc: [email protected]; [email protected]
> Subject: Re: [PATCH v2 2/7] EDAC/amd64: Recognize DRAM device type with EDAC_CTL_CAP
>
> On Tue, Jul 09, 2019 at 09:56:55PM +0000, Ghannam, Yazen wrote:
> > From: Yazen Ghannam <[email protected]>
> >
> > AMD Family 17h systems support x4 and x16 DRAM devices. However, the
> > device type is not checked when setting EDAC_CTL_CAP.
> >
> > Set the appropriate EDAC_CTL_CAP flag based on the device type.
> >
> > Fixes: 2d09d8f301f5 ("EDAC, amd64: Determine EDAC MC capabilities on Fam17h")
>
> This is better: a patch which fixes a previous patch and is simple,
> small and clear. That you can tag with Fixes: just fine.
>
> > Signed-off-by: Yazen Ghannam <[email protected]>
> > ---
> > Link:
> > https://lkml.kernel.org/r/[email protected]
> >
> > v1->v2:
> > * No change.
> >
> > drivers/edac/amd64_edac.c | 13 ++++++++++---
> > 1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> > index dd60cf5a3d96..125d6e2a828e 100644
> > --- a/drivers/edac/amd64_edac.c
> > +++ b/drivers/edac/amd64_edac.c
> > @@ -3150,12 +3150,15 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
> > static inline void
> > f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
> > {
> > - u8 i, ecc_en = 1, cpk_en = 1;
> > + u8 i, ecc_en = 1, cpk_en = 1, dev_x4 = 1, dev_x16 = 1;
> >
> > for_each_umc(i) {
> > if (pvt->umc[i].sdp_ctrl & UMC_SDP_INIT) {
> > ecc_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_ENABLED);
> > cpk_en &= !!(pvt->umc[i].umc_cap_hi & UMC_ECC_CHIPKILL_CAP);
> > +
> > + dev_x4 &= !!(pvt->umc[i].dimm_cfg & BIT(6));
> > + dev_x16 &= !!(pvt->umc[i].dimm_cfg & BIT(7));
>
> Are those bits mutually exclusive?
>
> I.e., so that you can do:
>
> if (dev_x4)
> mci->edac_ctl_cap |= EDAC_FLAG_S4ECD4ED;
> else
> mci->edac_ctl_cap |= EDAC_FLAG_S16ECD16ED;
>
> ?
>

I don't think so. I believe they can both be zero. I'll verify and make the change if they are mutually exclusive.

Thanks,
Yazen