2021-01-16 17:12:36

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH] EDAC/AMD64: Update scrub register addresses for newer models

From: Yazen Ghannam <[email protected]>

The Family 17h scrubber registers moved to different offset starting
with Model 30h. The new register offsets are used for all currently
available models since then.

Use the new register addresses as the defaults.

Set the proper scrub register addresses during module init for older
models.

Reported-by: WGH <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
---
drivers/edac/amd64_edac.c | 23 ++++++++++++++++++-----
drivers/edac/amd64_edac.h | 2 ++
2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 9868f95a5622..b324b1589e5a 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -167,6 +167,10 @@ static inline int amd64_read_dct_pci_cfg(struct amd64_pvt *pvt, u8 dct,
* other archs, we might not have access to the caches directly.
*/

+static struct {
+ u32 base, limit;
+} f17h_scrub_regs = {F17H_M30H_SCR_BASE_ADDR, F17H_M30H_SCR_LIMIT_ADDR};
+
static inline void __f17h_set_scrubval(struct amd64_pvt *pvt, u32 scrubval)
{
/*
@@ -176,10 +180,10 @@ static inline void __f17h_set_scrubval(struct amd64_pvt *pvt, u32 scrubval)
*/
if (scrubval >= 0x5 && scrubval <= 0x14) {
scrubval -= 0x5;
- pci_write_bits32(pvt->F6, F17H_SCR_LIMIT_ADDR, scrubval, 0xF);
- pci_write_bits32(pvt->F6, F17H_SCR_BASE_ADDR, 1, 0x1);
+ pci_write_bits32(pvt->F6, f17h_scrub_regs.limit, scrubval, 0xF);
+ pci_write_bits32(pvt->F6, f17h_scrub_regs.base, 1, 0x1);
} else {
- pci_write_bits32(pvt->F6, F17H_SCR_BASE_ADDR, 0, 0x1);
+ pci_write_bits32(pvt->F6, f17h_scrub_regs.base, 0, 0x1);
}
}
/*
@@ -257,9 +261,9 @@ static int get_scrub_rate(struct mem_ctl_info *mci)
u32 scrubval = 0;

if (pvt->umc) {
- amd64_read_pci_cfg(pvt->F6, F17H_SCR_BASE_ADDR, &scrubval);
+ amd64_read_pci_cfg(pvt->F6, f17h_scrub_regs.base, &scrubval);
if (scrubval & BIT(0)) {
- amd64_read_pci_cfg(pvt->F6, F17H_SCR_LIMIT_ADDR, &scrubval);
+ amd64_read_pci_cfg(pvt->F6, f17h_scrub_regs.limit, &scrubval);
scrubval &= 0xF;
scrubval += 0x5;
} else {
@@ -3568,6 +3572,14 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
}
}

+static void f17h_set_scrub_regs(struct amd64_pvt *pvt)
+{
+ if ((pvt->fam == 0x17 && pvt->model < 0x30) || pvt->fam == 0x18) {
+ f17h_scrub_regs.base = F17H_SCR_BASE_ADDR;
+ f17h_scrub_regs.limit = F17H_SCR_LIMIT_ADDR;
+ }
+}
+
static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
{
struct amd64_pvt *pvt = mci->pvt_info;
@@ -3577,6 +3589,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)

if (pvt->umc) {
f17h_determine_edac_ctl_cap(mci, pvt);
+ f17h_set_scrub_regs(pvt);
} else {
if (pvt->nbcap & NBCAP_SECDED)
mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 85aa820bc165..4606f72f4258 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -213,6 +213,8 @@
#define F15H_M60H_SCRCTRL 0x1C8
#define F17H_SCR_BASE_ADDR 0x48
#define F17H_SCR_LIMIT_ADDR 0x4C
+#define F17H_M30H_SCR_BASE_ADDR 0x40
+#define F17H_M30H_SCR_LIMIT_ADDR 0x44

/*
* Function 3 - Misc Control
--
2.25.1


2021-01-18 01:44:10

by WGH

[permalink] [raw]
Subject: Re: [PATCH] EDAC/AMD64: Update scrub register addresses for newer models

On 16/01/2021 17:33, Yazen Ghannam wrote:
> From: Yazen Ghannam <[email protected]>
>
> The Family 17h scrubber registers moved to different offset starting
> with Model 30h. The new register offsets are used for all currently
> available models since then.
>
> Use the new register addresses as the defaults.
>
> Set the proper scrub register addresses during module init for older
> models.

So I tested the patch on my machine (AMD Ryzen 9 3900XT on ASRock B550 Extreme4 motherboard, Linux 5.10.7).

The /sys/devices/system/edac/mc/mc0/sdram_scrub_rate value seems to be stuck at 12284069 right after the boot, and does not change.
Writes to the file do not report any errors.

dmesg:

[    0.549451] EDAC MC: Ver: 3.0.0
[    0.817576] EDAC amd64: F17h_M70h detected (node 0).
[    0.818159] EDAC amd64: Node 0: DRAM ECC enabled.
[    0.818717] EDAC amd64: MCT channel count: 2
[    0.819324] EDAC MC0: Giving out device to module amd64_edac controller F17h_M70h: DEV 0000:00:18.3 (INTERRUPT)
[    0.819909] EDAC MC: UMC0 chip selects:
[    0.819910] EDAC amd64: MC: 0: 16384MB 1: 16384MB
[    0.820488] EDAC amd64: MC: 2: 16384MB 3: 16384MB
[    0.821067] EDAC MC: UMC1 chip selects:
[    0.821067] EDAC amd64: MC: 0: 16384MB 1: 16384MB
[    0.821630] EDAC amd64: MC: 2: 16384MB 3: 16384MB
[    0.822187] EDAC amd64: using x16 syndromes.
[    0.822739] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
[    0.823314] AMD64 EDAC driver v3.5.0


2021-01-19 05:11:49

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] EDAC/AMD64: Update scrub register addresses for newer models

On Sat, Jan 16, 2021 at 02:33:53PM +0000, Yazen Ghannam wrote:
> +static struct {
> + u32 base, limit;
> +} f17h_scrub_regs = {F17H_M30H_SCR_BASE_ADDR, F17H_M30H_SCR_LIMIT_ADDR};

Why not make this part of struct amd64_umc so that you can access them
through pvt->umc?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-01-20 14:49:06

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH] EDAC/AMD64: Update scrub register addresses for newer models

On Mon, Jan 18, 2021 at 04:30:58AM +0300, WGH wrote:
> On 16/01/2021 17:33, Yazen Ghannam wrote:
> > From: Yazen Ghannam <[email protected]>
> >
> > The Family 17h scrubber registers moved to different offset starting
> > with Model 30h. The new register offsets are used for all currently
> > available models since then.
> >
> > Use the new register addresses as the defaults.
> >
> > Set the proper scrub register addresses during module init for older
> > models.
>
> So I tested the patch on my machine (AMD Ryzen 9 3900XT on ASRock B550 Extreme4 motherboard, Linux 5.10.7).
>
> The /sys/devices/system/edac/mc/mc0/sdram_scrub_rate value seems to be stuck at 12284069 right after the boot, and does not change.
> Writes to the file do not report any errors.
>
> dmesg:
>
> [??? 0.549451] EDAC MC: Ver: 3.0.0
> [??? 0.817576] EDAC amd64: F17h_M70h detected (node 0).
> [??? 0.818159] EDAC amd64: Node 0: DRAM ECC enabled.
> [??? 0.818717] EDAC amd64: MCT channel count: 2
> [??? 0.819324] EDAC MC0: Giving out device to module amd64_edac controller F17h_M70h: DEV 0000:00:18.3 (INTERRUPT)
> [??? 0.819909] EDAC MC: UMC0 chip selects:
> [??? 0.819910] EDAC amd64: MC: 0: 16384MB 1: 16384MB
> [??? 0.820488] EDAC amd64: MC: 2: 16384MB 3: 16384MB
> [??? 0.821067] EDAC MC: UMC1 chip selects:
> [??? 0.821067] EDAC amd64: MC: 0: 16384MB 1: 16384MB
> [??? 0.821630] EDAC amd64: MC: 2: 16384MB 3: 16384MB
> [??? 0.822187] EDAC amd64: using x16 syndromes.
> [??? 0.822739] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
> [??? 0.823314] AMD64 EDAC driver v3.5.0
>
>

Thanks for testing. I'll try to find a similar system and check it out.

Thanks,
Yazen

2021-01-20 14:50:55

by Yazen Ghannam

[permalink] [raw]
Subject: Re: [PATCH] EDAC/AMD64: Update scrub register addresses for newer models

On Mon, Jan 18, 2021 at 08:31:12PM +0100, Borislav Petkov wrote:
> On Sat, Jan 16, 2021 at 02:33:53PM +0000, Yazen Ghannam wrote:
> > +static struct {
> > + u32 base, limit;
> > +} f17h_scrub_regs = {F17H_M30H_SCR_BASE_ADDR, F17H_M30H_SCR_LIMIT_ADDR};
>
> Why not make this part of struct amd64_umc so that you can access them
> through pvt->umc?
>

We have a struct amd64_umc per channel, so putting these fixed values
there seemed redundant. Would you mind if we put this in struct
amd64_family_type? We can then set the values per family/model group
like we do with the max_mcs.

Thanks,
Yazen

2021-01-20 17:23:23

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] EDAC/AMD64: Update scrub register addresses for newer models

On Wed, Jan 20, 2021 at 08:41:11AM -0600, Yazen Ghannam wrote:
> We have a struct amd64_umc per channel, so putting these fixed values
> there seemed redundant. Would you mind if we put this in struct
> amd64_family_type? We can then set the values per family/model group
> like we do with the max_mcs.

Sure, that's better.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette