2019-02-12 21:53:58

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH 1/2] EDAC/mce_amd: Decode MCA_STATUS[Scrub] bit

From: Yazen Ghannam <[email protected]>

Previous AMD systems have had a bit in MCA_STATUS to indicate that an
error was detected on a scrub operation. However, this bit was defined
differently within different banks and families/models.

Starting with Family 17h, MCA_STATUS[40] is either Reserved/Read-as-Zero
or defined as "Scrub", for all MCA banks and CPU models. Therefore, we
can define this bit as the "Scrub" bit.

Define MCA_STATUS[40] as "Scrub" and decode it in the AMD MCE decoding
module for Family 17h and newer systems.

Signed-off-by: Yazen Ghannam <[email protected]>
---
arch/x86/include/asm/mce.h | 1 +
drivers/edac/mce_amd.c | 3 +++
2 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 299a38536567..22d05e3835f0 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -48,6 +48,7 @@
#define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
#define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
#define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
+#define MCI_STATUS_SCRUB BIT_ULL(40) /* Error detected during scrub operation */

/*
* McaX field if set indicates a given bank supports MCA extensions:
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index f286b880f981..b349c22bb386 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1078,6 +1078,9 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
if (ecc)
pr_cont("|%sECC", ((ecc == 2) ? "C" : "U"));

+ if (fam >= 0x17)
+ pr_cont("|%s", (m->status & MCI_STATUS_SCRUB ? "Scrub" : "-"));
+
pr_cont("]: 0x%016llx\n", m->status);

if (m->status & MCI_STATUS_ADDRV)
--
2.17.1



2019-02-12 21:54:21

by Yazen Ghannam

[permalink] [raw]
Subject: [PATCH 2/2] EDAC/mce_amd: Decode MCA_STATUS in bit definition order

From: Yazen Ghannam <[email protected]>

Reorder how we decode the bits in MCA_STATUS to follow how their defined
in the register.

The order is as follows:

Bit : Decode
61 : UC
59 : MiscV
58 : AddrV
57 : PCC
55 : TCC
53 : SyndV
46 : CECC
45 : UECC
44 : Deferred
43 : Poison
40 : Scrub

Signed-off-by: Yazen Ghannam <[email protected]>
---
drivers/edac/mce_amd.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index b349c22bb386..0a1814dad6cf 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1051,26 +1051,18 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
((m->status & MCI_STATUS_UC) ? "UE" :
(m->status & MCI_STATUS_DEFERRED) ? "-" : "CE"),
((m->status & MCI_STATUS_MISCV) ? "MiscV" : "-"),
- ((m->status & MCI_STATUS_PCC) ? "PCC" : "-"),
- ((m->status & MCI_STATUS_ADDRV) ? "AddrV" : "-"));
-
- if (fam >= 0x15) {
- pr_cont("|%s", (m->status & MCI_STATUS_DEFERRED ? "Deferred" : "-"));
-
- /* F15h, bank4, bit 43 is part of McaStatSubCache. */
- if (fam != 0x15 || m->bank != 4)
- pr_cont("|%s", (m->status & MCI_STATUS_POISON ? "Poison" : "-"));
- }
+ ((m->status & MCI_STATUS_ADDRV) ? "AddrV" : "-"),
+ ((m->status & MCI_STATUS_PCC) ? "PCC" : "-"));

if (boot_cpu_has(X86_FEATURE_SMCA)) {
u32 low, high;
u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);

- pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
-
if (!rdmsr_safe(addr, &low, &high) &&
(low & MCI_CONFIG_MCAX))
pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
+
+ pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
}

/* do the two bits[14:13] together */
@@ -1078,6 +1070,14 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
if (ecc)
pr_cont("|%sECC", ((ecc == 2) ? "C" : "U"));

+ if (fam >= 0x15) {
+ pr_cont("|%s", (m->status & MCI_STATUS_DEFERRED ? "Deferred" : "-"));
+
+ /* F15h, bank4, bit 43 is part of McaStatSubCache. */
+ if (fam != 0x15 || m->bank != 4)
+ pr_cont("|%s", (m->status & MCI_STATUS_POISON ? "Poison" : "-"));
+ }
+
if (fam >= 0x17)
pr_cont("|%s", (m->status & MCI_STATUS_SCRUB ? "Scrub" : "-"));

--
2.17.1


Subject: [tip:ras/core] EDAC/mce_amd: Decode MCA_STATUS in bit definition order

Commit-ID: a0bcd3c0b8a52ba0eb74371fa6be15ad0390ba67
Gitweb: https://git.kernel.org/tip/a0bcd3c0b8a52ba0eb74371fa6be15ad0390ba67
Author: Yazen Ghannam <[email protected]>
AuthorDate: Tue, 12 Feb 2019 21:24:29 +0000
Committer: Borislav Petkov <[email protected]>
CommitDate: Fri, 15 Feb 2019 14:36:31 +0100

EDAC/mce_amd: Decode MCA_STATUS in bit definition order

Sort the MCA_STATUS bits in decode output to follow how they are defined
in the register.

The order is as follows:

Bit | Decode
------------
62 | Over
61 | UC
59 | MiscV
58 | AddrV
57 | PCC
55 | TCC
53 | SyndV
46 | CECC
45 | UECC
44 | Deferred
43 | Poison
40 | Scrub

[ bp: Massage a bit. ]

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: linux-edac <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
---
drivers/edac/mce_amd.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index b349c22bb386..0a1814dad6cf 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1051,26 +1051,18 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
((m->status & MCI_STATUS_UC) ? "UE" :
(m->status & MCI_STATUS_DEFERRED) ? "-" : "CE"),
((m->status & MCI_STATUS_MISCV) ? "MiscV" : "-"),
- ((m->status & MCI_STATUS_PCC) ? "PCC" : "-"),
- ((m->status & MCI_STATUS_ADDRV) ? "AddrV" : "-"));
-
- if (fam >= 0x15) {
- pr_cont("|%s", (m->status & MCI_STATUS_DEFERRED ? "Deferred" : "-"));
-
- /* F15h, bank4, bit 43 is part of McaStatSubCache. */
- if (fam != 0x15 || m->bank != 4)
- pr_cont("|%s", (m->status & MCI_STATUS_POISON ? "Poison" : "-"));
- }
+ ((m->status & MCI_STATUS_ADDRV) ? "AddrV" : "-"),
+ ((m->status & MCI_STATUS_PCC) ? "PCC" : "-"));

if (boot_cpu_has(X86_FEATURE_SMCA)) {
u32 low, high;
u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);

- pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
-
if (!rdmsr_safe(addr, &low, &high) &&
(low & MCI_CONFIG_MCAX))
pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
+
+ pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
}

/* do the two bits[14:13] together */
@@ -1078,6 +1070,14 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
if (ecc)
pr_cont("|%sECC", ((ecc == 2) ? "C" : "U"));

+ if (fam >= 0x15) {
+ pr_cont("|%s", (m->status & MCI_STATUS_DEFERRED ? "Deferred" : "-"));
+
+ /* F15h, bank4, bit 43 is part of McaStatSubCache. */
+ if (fam != 0x15 || m->bank != 4)
+ pr_cont("|%s", (m->status & MCI_STATUS_POISON ? "Poison" : "-"));
+ }
+
if (fam >= 0x17)
pr_cont("|%s", (m->status & MCI_STATUS_SCRUB ? "Scrub" : "-"));


Subject: [tip:ras/core] EDAC/mce_amd: Decode MCA_STATUS[Scrub] bit

Commit-ID: 3f4da372ec8e4ce58c17ac4f2e3c8891bbfea17e
Gitweb: https://git.kernel.org/tip/3f4da372ec8e4ce58c17ac4f2e3c8891bbfea17e
Author: Yazen Ghannam <[email protected]>
AuthorDate: Tue, 12 Feb 2019 21:24:28 +0000
Committer: Borislav Petkov <[email protected]>
CommitDate: Fri, 15 Feb 2019 14:25:58 +0100

EDAC/mce_amd: Decode MCA_STATUS[Scrub] bit

Previous AMD systems have had a bit in MCA_STATUS to indicate that an
error was detected on a scrub operation. However, this bit was defined
differently within different banks and families/models.

Starting with Family 17h, MCA_STATUS[40] is either Reserved/Read-as-Zero
or defined as "Scrub", for all MCA banks and CPU models. Therefore, this
bit can be defined as the "Scrub" bit.

Define MCA_STATUS[40] as "Scrub" and decode it in the AMD MCE decoding
module for Family 17h and newer systems.

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Morse <[email protected]>
Cc: linux-edac <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Pu Wen <[email protected]>
Cc: Qiuxu Zhuo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/include/asm/mce.h | 1 +
drivers/edac/mce_amd.c | 3 +++
2 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 299a38536567..22d05e3835f0 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -48,6 +48,7 @@
#define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
#define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
#define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
+#define MCI_STATUS_SCRUB BIT_ULL(40) /* Error detected during scrub operation */

/*
* McaX field if set indicates a given bank supports MCA extensions:
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index f286b880f981..b349c22bb386 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1078,6 +1078,9 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
if (ecc)
pr_cont("|%sECC", ((ecc == 2) ? "C" : "U"));

+ if (fam >= 0x17)
+ pr_cont("|%s", (m->status & MCI_STATUS_SCRUB ? "Scrub" : "-"));
+
pr_cont("]: 0x%016llx\n", m->status);

if (m->status & MCI_STATUS_ADDRV)