2021-12-15 16:01:51

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 0/2] AMD Family 19h Models 10h-1Fh Updates

Hi all,

This set adds support for AMD Family 19h Models 10h-1Fh and A0h-AFh.

This second revision includes patches 3 and 4 from the first. Patches 1
and 2 from the first set were applied.

This set is based on the following branches:
- tip/master
- ras/edac-for-next
- groeck/linux-staging/hwmon-next

The following commit in hwmon-next is needed for functional support of
this set.

49e90c39d0be ("x86/amd_nb: Add AMD Family 19h Models (10h-1Fh) and (A0h-AFh) PCI IDs")

Patch 1 is a small fixup on how some registers are checked. The patch is
functionally the same as patch 3 in set 1, but the commit message
updated to give more details.

Patch 2 adds register offset and other minor changes introduced with
these new models. The patch is updated to use a different name for a new
flag (thanks Boris for the suggestion).

Thanks,
Yazen

Yazen Ghannam (2):
EDAC/amd64: Check register values from all UMCs
EDAC/amd64: Add new register offset support and related changes

drivers/edac/amd64_edac.c | 70 ++++++++++++++++++++++++++++++++++-----
drivers/edac/amd64_edac.h | 14 ++++++++
2 files changed, 76 insertions(+), 8 deletions(-)

--
2.25.1



2021-12-15 16:01:51

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 2/2] EDAC/amd64: Add new register offset support and related changes

Introduce a "family flags" bitmask that can be used to indicate any
special behavior needed on a per-family basis.

Add a flag to indicate a system uses the new register offsets introduced
with Family 19h Model 10h.

Use this flag to account for register offset changes, a new bitfield
indicating DDR5 use on a memory controller, and to set the proper number
of chip select masks.

Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]

v1->v2:
* Was patch 4 in v1.
* Change "has_ddr5" flag to "zn_regs_v2".
* Drop flag check helper function.
* Update determine_memory_type() to check bitfield for DDR5.
* Update code comments.

drivers/edac/amd64_edac.c | 59 +++++++++++++++++++++++++++++++++++----
drivers/edac/amd64_edac.h | 14 ++++++++++
2 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 1df763128483..b7dd87636155 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -15,6 +15,31 @@ static struct msr __percpu *msrs;

static struct amd64_family_type *fam_type;

+/* Family flag helpers */
+static inline u64 get_addr_cfg(void)
+{
+ if (fam_type->flags.zn_regs_v2)
+ return UMCCH_ADDR_CFG_DDR5;
+
+ return UMCCH_ADDR_CFG;
+}
+
+static inline u64 get_addr_mask_sec(void)
+{
+ if (fam_type->flags.zn_regs_v2)
+ return UMCCH_ADDR_MASK_SEC_DDR5;
+
+ return UMCCH_ADDR_MASK_SEC;
+}
+
+static inline u64 get_dimm_cfg(void)
+{
+ if (fam_type->flags.zn_regs_v2)
+ return UMCCH_DIMM_CFG_DDR5;
+
+ return UMCCH_DIMM_CFG;
+}
+
/* Per-node stuff */
static struct ecc_settings **ecc_stngs;

@@ -1429,8 +1454,10 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
edac_dbg(1, "UMC%d x16 DIMMs present: %s\n",
i, (umc->dimm_cfg & BIT(7)) ? "yes" : "no");

- if (pvt->dram_type == MEM_LRDDR4) {
- amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ADDR_CFG, &tmp);
+ if (pvt->dram_type == MEM_LRDDR4 || pvt->dram_type == MEM_LRDDR5) {
+ amd_smn_read(pvt->mc_node_id,
+ umc_base + get_addr_cfg(),
+ &tmp);
edac_dbg(1, "UMC%d LRDIMM %dx rank multiply\n",
i, 1 << ((tmp >> 4) & 0x3));
}
@@ -1505,7 +1532,7 @@ static void prep_chip_selects(struct amd64_pvt *pvt)

for_each_umc(umc) {
pvt->csels[umc].b_cnt = 4;
- pvt->csels[umc].m_cnt = 2;
+ pvt->csels[umc].m_cnt = fam_type->flags.zn_regs_v2 ? 4 : 2;
}

} else {
@@ -1545,7 +1572,7 @@ static void read_umc_base_mask(struct amd64_pvt *pvt)
}

umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
- umc_mask_reg_sec = get_umc_base(umc) + UMCCH_ADDR_MASK_SEC;
+ umc_mask_reg_sec = get_umc_base(umc) + get_addr_mask_sec();

for_each_chip_select_mask(cs, umc, pvt) {
mask = &pvt->csels[umc].csmasks[cs];
@@ -1628,6 +1655,20 @@ static void determine_memory_type(struct amd64_pvt *pvt)
dimm_cfg |= pvt->umc[i].dimm_cfg;
}

+ /*
+ * Check if the system supports the "DDR Type" field in UMC Config
+ * and has DDR5 DIMMs in use.
+ */
+ if (fam_type->flags.zn_regs_v2 && ((umc_cfg & GENMASK(2, 0)) == 0x1)) {
+ if (dimm_cfg & BIT(5))
+ pvt->dram_type = MEM_LRDDR5;
+ else if (dimm_cfg & BIT(4))
+ pvt->dram_type = MEM_RDDR5;
+ else
+ pvt->dram_type = MEM_DDR5;
+ return;
+ }
+
if (dimm_cfg & BIT(5))
pvt->dram_type = MEM_LRDDR4;
else if (dimm_cfg & BIT(4))
@@ -2174,8 +2215,13 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
* There is one mask per DIMM, and two Chip Selects per DIMM.
* CS0 and CS1 -> DIMM0
* CS2 and CS3 -> DIMM1
+ *
+ * Systems with newer register layout have one mask per Chip Select.
*/
- dimm = csrow_nr >> 1;
+ if (fam_type->flags.zn_regs_v2)
+ dimm = csrow_nr;
+ else
+ dimm = csrow_nr >> 1;

/* Asymmetric dual-rank DIMM support. */
if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY))
@@ -2937,6 +2983,7 @@ static struct amd64_family_type family_types[] = {
.f0_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F6,
.max_mcs = 12,
+ .flags.zn_regs_v2 = 1,
.ops = {
.early_channel_count = f17_early_channel_count,
.dbam_to_cs = f17_addr_mask_to_cs_size,
@@ -3365,7 +3412,7 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
umc_base = get_umc_base(i);
umc = &pvt->umc[i];

- amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
+ amd_smn_read(nid, umc_base + get_dimm_cfg(), &umc->dimm_cfg);
amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 650cab401e21..39ecb77873db 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -271,8 +271,11 @@
#define UMCCH_BASE_ADDR_SEC 0x10
#define UMCCH_ADDR_MASK 0x20
#define UMCCH_ADDR_MASK_SEC 0x28
+#define UMCCH_ADDR_MASK_SEC_DDR5 0x30
#define UMCCH_ADDR_CFG 0x30
+#define UMCCH_ADDR_CFG_DDR5 0x40
#define UMCCH_DIMM_CFG 0x80
+#define UMCCH_DIMM_CFG_DDR5 0x90
#define UMCCH_UMC_CFG 0x100
#define UMCCH_SDP_CTRL 0x104
#define UMCCH_ECC_CTRL 0x14C
@@ -477,11 +480,22 @@ struct low_ops {
unsigned cs_mode, int cs_mask_nr);
};

+struct amd64_family_flags {
+ /*
+ * Indicates that the system supports the new register offsets, etc.
+ * first introduced with Family 19h Model 10h.
+ */
+ __u64 zn_regs_v2 : 1,
+
+ __reserved : 63;
+};
+
struct amd64_family_type {
const char *ctl_name;
u16 f0_id, f1_id, f2_id, f6_id;
/* Maximum number of memory controllers per die/node. */
u8 max_mcs;
+ struct amd64_family_flags flags;
struct low_ops ops;
};

--
2.25.1


2021-12-15 16:01:51

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH v2 1/2] EDAC/amd64: Check register values from all UMCs

The initial support for Unified Memory Controllers (UMCs) was added to
AMD64 EDAC for the first generation of Zen systems. These systems have
two UMCs per Die, and the code originally assumed two UMCs in various
places.

Later systems have more than two UMCs, and this assumption was fixed in
the following commits.

commit bdcee7747f5c ("EDAC/amd64: Support more than two Unified Memory Controllers")
commit d971e28e2ce4 ("EDAC/amd64: Support more than two controllers for chip selects handling")

However, the determine_memory_type() function was missed in these
changes, and two UMCs are still assumed.

Update determine_memory_type() to account for all UMCs when checking the
register values.

Signed-off-by: Yazen Ghannam <[email protected]>
---
Link:
https://lkml.kernel.org/r/[email protected]

v1->v2:
* Was patch 3 in v1.
* Update commit message.

drivers/edac/amd64_edac.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index ff29267e46a6..1df763128483 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1621,9 +1621,16 @@ static void determine_memory_type(struct amd64_pvt *pvt)
u32 dram_ctrl, dcsm;

if (pvt->umc) {
- if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
+ u32 umc_cfg = 0, dimm_cfg = 0, i = 0;
+
+ for_each_umc(i) {
+ umc_cfg |= pvt->umc[i].umc_cfg;
+ dimm_cfg |= pvt->umc[i].dimm_cfg;
+ }
+
+ if (dimm_cfg & BIT(5))
pvt->dram_type = MEM_LRDDR4;
- else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
+ else if (dimm_cfg & BIT(4))
pvt->dram_type = MEM_RDDR4;
else
pvt->dram_type = MEM_DDR4;
--
2.25.1


2021-12-15 16:33:07

by William Roche

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] EDAC/amd64: Add new register offset support and related changes

On 15/12/2021 16:53, Yazen Ghannam wrote:
> Introduce a "family flags" bitmask that can be used to indicate any
> special behavior needed on a per-family basis.
>
> Add a flag to indicate a system uses the new register offsets introduced
> with Family 19h Model 10h.
>
> Use this flag to account for register offset changes, a new bitfield
> indicating DDR5 use on a memory controller, and to set the proper number
> of chip select masks.
>
> Signed-off-by: Yazen Ghannam <[email protected]>
> ---
> Link:
> https://lkml.kernel.org/r/[email protected]
>
> v1->v2:
> * Was patch 4 in v1.
> * Change "has_ddr5" flag to "zn_regs_v2".
> * Drop flag check helper function.
> * Update determine_memory_type() to check bitfield for DDR5.
> * Update code comments.
>
> drivers/edac/amd64_edac.c | 59 +++++++++++++++++++++++++++++++++++----
> drivers/edac/amd64_edac.h | 14 ++++++++++
> 2 files changed, 67 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 1df763128483..b7dd87636155 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -15,6 +15,31 @@ static struct msr __percpu *msrs;
>
> static struct amd64_family_type *fam_type;
>
> +/* Family flag helpers */
> +static inline u64 get_addr_cfg(void)
> +{
> + if (fam_type->flags.zn_regs_v2)
> + return UMCCH_ADDR_CFG_DDR5;
> +
> + return UMCCH_ADDR_CFG;
> +}
> +
> +static inline u64 get_addr_mask_sec(void)
> +{
> + if (fam_type->flags.zn_regs_v2)
> + return UMCCH_ADDR_MASK_SEC_DDR5;
> +
> + return UMCCH_ADDR_MASK_SEC;
> +}
> +
> +static inline u64 get_dimm_cfg(void)
> +{
> + if (fam_type->flags.zn_regs_v2)
> + return UMCCH_DIMM_CFG_DDR5;
> +
> + return UMCCH_DIMM_CFG;
> +}
> +
> /* Per-node stuff */
> static struct ecc_settings **ecc_stngs;
>
> @@ -1429,8 +1454,10 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
> edac_dbg(1, "UMC%d x16 DIMMs present: %s\n",
> i, (umc->dimm_cfg & BIT(7)) ? "yes" : "no");
>
> - if (pvt->dram_type == MEM_LRDDR4) {
> - amd_smn_read(pvt->mc_node_id, umc_base + UMCCH_ADDR_CFG, &tmp);
> + if (pvt->dram_type == MEM_LRDDR4 || pvt->dram_type == MEM_LRDDR5) {
> + amd_smn_read(pvt->mc_node_id,
> + umc_base + get_addr_cfg(),
> + &tmp);
> edac_dbg(1, "UMC%d LRDIMM %dx rank multiply\n",
> i, 1 << ((tmp >> 4) & 0x3));
> }
> @@ -1505,7 +1532,7 @@ static void prep_chip_selects(struct amd64_pvt *pvt)
>
> for_each_umc(umc) {
> pvt->csels[umc].b_cnt = 4;
> - pvt->csels[umc].m_cnt = 2;
> + pvt->csels[umc].m_cnt = fam_type->flags.zn_regs_v2 ? 4 : 2;
> }
>
> } else {
> @@ -1545,7 +1572,7 @@ static void read_umc_base_mask(struct amd64_pvt *pvt)
> }
>
> umc_mask_reg = get_umc_base(umc) + UMCCH_ADDR_MASK;
> - umc_mask_reg_sec = get_umc_base(umc) + UMCCH_ADDR_MASK_SEC;
> + umc_mask_reg_sec = get_umc_base(umc) + get_addr_mask_sec();
>
> for_each_chip_select_mask(cs, umc, pvt) {
> mask = &pvt->csels[umc].csmasks[cs];
> @@ -1628,6 +1655,20 @@ static void determine_memory_type(struct amd64_pvt *pvt)
> dimm_cfg |= pvt->umc[i].dimm_cfg;
> }
>
> + /*
> + * Check if the system supports the "DDR Type" field in UMC Config
> + * and has DDR5 DIMMs in use.
> + */
> + if (fam_type->flags.zn_regs_v2 && ((umc_cfg & GENMASK(2, 0)) == 0x1)) {
> + if (dimm_cfg & BIT(5))
> + pvt->dram_type = MEM_LRDDR5;
> + else if (dimm_cfg & BIT(4))
> + pvt->dram_type = MEM_RDDR5;
> + else
> + pvt->dram_type = MEM_DDR5;
> + return;
> + }
> +
> if (dimm_cfg & BIT(5))
> pvt->dram_type = MEM_LRDDR4;
> else if (dimm_cfg & BIT(4))
> @@ -2174,8 +2215,13 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> * There is one mask per DIMM, and two Chip Selects per DIMM.
> * CS0 and CS1 -> DIMM0
> * CS2 and CS3 -> DIMM1
> + *
> + * Systems with newer register layout have one mask per Chip Select.

Just a question about this comment: Can it be translated into this ?

+ * Except on systems with newer register layout where we have one Chip Select per DIMM.

> */
> - dimm = csrow_nr >> 1;
> + if (fam_type->flags.zn_regs_v2)
> + dimm = csrow_nr;
> + else
> + dimm = csrow_nr >> 1;
>
> /* Asymmetric dual-rank DIMM support. */
> if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY))
> @@ -2937,6 +2983,7 @@ static struct amd64_family_type family_types[] = {
> .f0_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F0,
> .f6_id = PCI_DEVICE_ID_AMD_19H_M10H_DF_F6,
> .max_mcs = 12,
> + .flags.zn_regs_v2 = 1,
> .ops = {
> .early_channel_count = f17_early_channel_count,
> .dbam_to_cs = f17_addr_mask_to_cs_size,
> @@ -3365,7 +3412,7 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
> umc_base = get_umc_base(i);
> umc = &pvt->umc[i];
>
> - amd_smn_read(nid, umc_base + UMCCH_DIMM_CFG, &umc->dimm_cfg);
> + amd_smn_read(nid, umc_base + get_dimm_cfg(), &umc->dimm_cfg);
> amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
> amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
> amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
> diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
> index 650cab401e21..39ecb77873db 100644
> --- a/drivers/edac/amd64_edac.h
> +++ b/drivers/edac/amd64_edac.h
> @@ -271,8 +271,11 @@
> #define UMCCH_BASE_ADDR_SEC 0x10
> #define UMCCH_ADDR_MASK 0x20
> #define UMCCH_ADDR_MASK_SEC 0x28
> +#define UMCCH_ADDR_MASK_SEC_DDR5 0x30
> #define UMCCH_ADDR_CFG 0x30
> +#define UMCCH_ADDR_CFG_DDR5 0x40
> #define UMCCH_DIMM_CFG 0x80
> +#define UMCCH_DIMM_CFG_DDR5 0x90
> #define UMCCH_UMC_CFG 0x100
> #define UMCCH_SDP_CTRL 0x104
> #define UMCCH_ECC_CTRL 0x14C
> @@ -477,11 +480,22 @@ struct low_ops {
> unsigned cs_mode, int cs_mask_nr);
> };
>
> +struct amd64_family_flags {
> + /*
> + * Indicates that the system supports the new register offsets, etc.
> + * first introduced with Family 19h Model 10h.
> + */
> + __u64 zn_regs_v2 : 1,
> +
> + __reserved : 63;
> +};
> +
> struct amd64_family_type {
> const char *ctl_name;
> u16 f0_id, f1_id, f2_id, f6_id;
> /* Maximum number of memory controllers per die/node. */
> u8 max_mcs;
> + struct amd64_family_flags flags;
> struct low_ops ops;
> };
>
Thanks in advance,

William.



2021-12-15 17:53:36

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] AMD Family 19h Models 10h-1Fh Updates

On Wed, Dec 15, 2021 at 03:53:07PM +0000, Yazen Ghannam wrote:
> The following commit in hwmon-next is needed for functional support of
> this set.
>
> 49e90c39d0be ("x86/amd_nb: Add AMD Family 19h Models (10h-1Fh) and (A0h-AFh) PCI IDs")

I'd normally want do cross-merge that one so that stuff can get tested
but am being told that those models are not shipping yet so we're not
going to break anything even if those go separately Linus-wards.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-15 18:01:24

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] EDAC/amd64: Check register values from all UMCs

On Wed, Dec 15, 2021 at 03:53:08PM +0000, Yazen Ghannam wrote:
> - if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
> + u32 umc_cfg = 0, dimm_cfg = 0, i = 0;
> +
> + for_each_umc(i) {
> + umc_cfg |= pvt->umc[i].umc_cfg;
> + dimm_cfg |= pvt->umc[i].dimm_cfg;
> + }
> +
> + if (dimm_cfg & BIT(5))
> pvt->dram_type = MEM_LRDDR4;
> - else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
> + else if (dimm_cfg & BIT(4))

You're working here under the assumption that bit 4 and 5 will have the
same value on all those UMCs.

You're probably going to say that that is how the BIOS is programming
them so they should be all the same and any other configuration is
invalid but lemme still ask about it explicitly.

And if so, this would probably need a comment above it which I can add
when applying...

Hmm?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-15 18:07:19

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] EDAC/amd64: Add new register offset support and related changes

On Wed, Dec 15, 2021 at 05:32:27PM +0100, William Roche wrote:
> > @@ -2174,8 +2215,13 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> > * There is one mask per DIMM, and two Chip Selects per DIMM.
> > * CS0 and CS1 -> DIMM0
> > * CS2 and CS3 -> DIMM1
> > + *
> > + * Systems with newer register layout have one mask per Chip Select.
>
> Just a question about this comment: Can it be translated into this ?
>
> + * Except on systems with newer register layout where we have one Chip Select per DIMM.

Sure, but without the "we":

...
* On systems with the newer register layout there is one Chip Select per DIMM.
*/

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-16 15:47:11

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] EDAC/amd64: Add new register offset support and related changes

On Wed, Dec 15, 2021 at 07:07:17PM +0100, Borislav Petkov wrote:
> On Wed, Dec 15, 2021 at 05:32:27PM +0100, William Roche wrote:
> > > @@ -2174,8 +2215,13 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> > > * There is one mask per DIMM, and two Chip Selects per DIMM.
> > > * CS0 and CS1 -> DIMM0
> > > * CS2 and CS3 -> DIMM1
> > > + *
> > > + * Systems with newer register layout have one mask per Chip Select.
> >
> > Just a question about this comment: Can it be translated into this ?
> >
> > + * Except on systems with newer register layout where we have one Chip Select per DIMM.
>
> Sure, but without the "we":
>
> ...
> * On systems with the newer register layout there is one Chip Select per DIMM.
> */
>

Hi William,
Thanks for the suggestion, but it's not quite correct.

There are still two Chip Selects per DIMM module, i.e. the system can support
dual-rank (2R) DIMMs. Current AMD systems can support upto 2 DIMMs per Unified
Memory Controller (UMC). There are two "Address Mask" registers in each UMC,
and each register covers an entire DIMM (and by extension the two Chip Selects
available for each DIMM).

Future systems will still support upto 2 DIMMs per UMC. However, the register
space is updated so that there are now four "Address Mask" registers per UMC.
And each of these registers is now explicitly related to one of the four Chip
Selects available per UMC.

Does this help? I can update the code comments with these details.

Thanks,
Yazen

2021-12-16 16:08:35

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] EDAC/amd64: Check register values from all UMCs

On Wed, Dec 15, 2021 at 07:01:22PM +0100, Borislav Petkov wrote:
> On Wed, Dec 15, 2021 at 03:53:08PM +0000, Yazen Ghannam wrote:
> > - if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
> > + u32 umc_cfg = 0, dimm_cfg = 0, i = 0;
> > +
> > + for_each_umc(i) {
> > + umc_cfg |= pvt->umc[i].umc_cfg;
> > + dimm_cfg |= pvt->umc[i].dimm_cfg;
> > + }
> > +
> > + if (dimm_cfg & BIT(5))
> > pvt->dram_type = MEM_LRDDR4;
> > - else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
> > + else if (dimm_cfg & BIT(4))
>
> You're working here under the assumption that bit 4 and 5 will have the
> same value on all those UMCs.
>
> You're probably going to say that that is how the BIOS is programming
> them so they should be all the same and any other configuration is
> invalid but lemme still ask about it explicitly.
>
> And if so, this would probably need a comment above it which I can add
> when applying...
>
> Hmm?
>

No, that's a good question. And actually the assumption is incorrect. It is
allowed to have different DIMM types in a system though all DIMMs on a single
UMC must match.

Do you recommend a follow up patch or should this one be reworked?

Thanks,
Yazen

2021-12-16 18:44:23

by William Roche

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] EDAC/amd64: Add new register offset support and related changes

On 16/12/2021 16:46, Yazen Ghannam wrote:
> On Wed, Dec 15, 2021 at 07:07:17PM +0100, Borislav Petkov wrote:
>> On Wed, Dec 15, 2021 at 05:32:27PM +0100, William Roche wrote:
>>>> @@ -2174,8 +2215,13 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
>>>> * There is one mask per DIMM, and two Chip Selects per DIMM.
>>>> * CS0 and CS1 -> DIMM0
>>>> * CS2 and CS3 -> DIMM1
>>>> + *
>>>> + * Systems with newer register layout have one mask per Chip Select.
>>> Just a question about this comment: Can it be translated into this ?
>>>
>>> + * Except on systems with newer register layout where we have one Chip Select per DIMM.
>> Sure, but without the "we":
>>
>> ...
>> * On systems with the newer register layout there is one Chip Select per DIMM.
>> */
>>
> Hi William,
> Thanks for the suggestion, but it's not quite correct.

That's exactly what I wanted to know. Thanks.

>
> There are still two Chip Selects per DIMM module, i.e. the system can support
> dual-rank (2R) DIMMs. Current AMD systems can support upto 2 DIMMs per Unified
> Memory Controller (UMC). There are two "Address Mask" registers in each UMC,
> and each register covers an entire DIMM (and by extension the two Chip Selects
> available for each DIMM).
>
> Future systems will still support upto 2 DIMMs per UMC. However, the register
> space is updated so that there are now four "Address Mask" registers per UMC.
> And each of these registers is now explicitly related to one of the four Chip
> Selects available per UMC.

From what I understand, future systems would still support the same
number of dimms per UMC (2), the same number of Chip Select (2 per
dimm), the only thing that changes is the number of Address Mask
registers (going from 2 per UMC  to  4 per UMC).

So I'm confused, we deduce 'dimm' from csrow_nr, which would be in fact
the Chip Select *masks* number (cs_mask_nr from the dbam_to_cs signature
in struct low_ops), so why are we saying and dimm=csrow_nr in the case
of the new layout, but dimm = csrow_nr / 2 in the case on the standard
layout ?

Should we indicate what this 'dimm' value really is ?

Sorry if I'm missing something very obvious here.

Thanks,
William.


> Does this help? I can update the code comments with these details.
>
> Thanks,
> Yazen

2021-12-16 19:22:00

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] EDAC/amd64: Add new register offset support and related changes

On Thu, Dec 16, 2021 at 07:43:55PM +0100, William Roche wrote:
...
> From what I understand, future systems would still support the same number
> of dimms per UMC (2), the same number of Chip Select (2 per dimm), the only
> thing that changes is the number of Address Mask registers (going from 2 per
> UMC? to? 4 per UMC).
>
> So I'm confused, we deduce 'dimm' from csrow_nr, which would be in fact the
> Chip Select *masks* number (cs_mask_nr from the dbam_to_cs signature in
> struct low_ops), so why are we saying and dimm=csrow_nr in the case of the
> new layout, but dimm = csrow_nr / 2 in the case on the standard layout ?
>
> Should we indicate what this 'dimm' value really is ?
>
> Sorry if I'm missing something very obvious here.
>

That's fair. I can rework the patch to explicitly differentiate between "dimm"
and "cs_mask_nr" here.

I think this would resolve an issue in a later debug print statement that
includes the csrow_nr and dimm.

Thanks,
Yazen

2021-12-30 11:36:50

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] EDAC/amd64: Check register values from all UMCs

On Thu, Dec 16, 2021 at 04:08:18PM +0000, Yazen Ghannam wrote:
> No, that's a good question. And actually the assumption is incorrect. It is
> allowed to have different DIMM types in a system though all DIMMs on a single
> UMC must match.

Oh fun, really?!

So a single system can have DDR4 *and* DDR5 on the same board?!

So then that

pvt->dram_type

is insufficient to store the DIMM type for a pvt. If you have multiple
UMCs on a pvt and all have different type DIMMs, then you need the
relevant DIMM type when you dump it in sysfs...

Which then means, you need ->dram_type to be per UMC...

And also, I'm assuming the hw already enforces that DIMMs on a single
UMC must match - it simply won't boot if they don't so you don't have to
enforce that, at least.

> Do you recommend a follow up patch or should this one be reworked?

This one is insufficient, I'm afraid.

One way to address this is, you could use pvt->umc at the places where
dram_type is used and assign directly to the dimm->mtype thing. But then
you'd need a way to map each struct dimm_info *dimm to the UMC so that
you can determine the correct DIMM type.

Which would make pvt->dram_type redundant and can be removed.

Or you might have a better idea...

HTH.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-01-05 16:12:22

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] EDAC/amd64: Check register values from all UMCs

On Thu, Dec 30, 2021 at 12:36:44PM +0100, Borislav Petkov wrote:
> On Thu, Dec 16, 2021 at 04:08:18PM +0000, Yazen Ghannam wrote:
> > No, that's a good question. And actually the assumption is incorrect. It is
> > allowed to have different DIMM types in a system though all DIMMs on a single
> > UMC must match.
>
> Oh fun, really?!
>
> So a single system can have DDR4 *and* DDR5 on the same board?!
>

Well, I don't know about that specifically. There are some restrictions, but
you could have UDIMMs and RDIMMs of the same generation, at least.

> So then that
>
> pvt->dram_type
>
> is insufficient to store the DIMM type for a pvt. If you have multiple
> UMCs on a pvt and all have different type DIMMs, then you need the
> relevant DIMM type when you dump it in sysfs...
>
> Which then means, you need ->dram_type to be per UMC...
>
> And also, I'm assuming the hw already enforces that DIMMs on a single
> UMC must match - it simply won't boot if they don't so you don't have to
> enforce that, at least.
>
> > Do you recommend a follow up patch or should this one be reworked?
>
> This one is insufficient, I'm afraid.
>
> One way to address this is, you could use pvt->umc at the places where
> dram_type is used and assign directly to the dimm->mtype thing. But then
> you'd need a way to map each struct dimm_info *dimm to the UMC so that
> you can determine the correct DIMM type.
>

I did send a patch that did something like this.
https://lkml.kernel.org/r/[email protected]

Though this got a build warning report, so I need to follow up on that.

> Which would make pvt->dram_type redundant and can be removed.
>

I kept this so as to not break legacy systems. But I'll look at it again. I
think you may be right.

Thanks,
Yazen