2024-05-23 15:57:22

by Yazen Ghannam

Subject: [PATCH 0/9] AMD MCA interrupts rework

Hi all,

This set unifies the AMD MCA interrupt handlers with common MCA code.
The goal is to avoid duplicating functionality like reading and clearing
MCA banks.

Patches 1-3 are minor changes for issues found during testing.

Patches 4-9 are revised versions of patches 6-12 from the following set:
https://lkml.kernel.org/r/[email protected]

In addition to addressing review comments, I tried to reduce the amount
of refactoring to only what is functionally needed for fixes and
features. I still want to do a broader clean up, but I think that can
come later.

Patch 7 has a minor merge conflict with the following set:
https://lkml.kernel.org/r/[email protected]

The sets do not depend on each other, so I've kept them separate. But I
can rebase this one on top of the other, if needed.

Thanks,
Yazen

Yazen Ghannam (9):
x86/mce/inject: Only write MCA_MISC with user-set value
x86/mce: Remove unused variable and return value in
machine_check_poll()
x86/mce: Increment MCP count only for timer calls
x86/mce: Move machine_check_poll() status checks to helper functions
x86/mce: Skip AMD threshold init if no threshold banks found
x86/mce: Unify AMD THR handler with MCA Polling
x86/mce: Unify AMD DFR handler with MCA Polling
x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems
x86/mce/amd: Support SMCA Corrected Error Interrupt

arch/x86/include/asm/mce.h | 3 +-
arch/x86/kernel/cpu/mce/amd.c | 274 +++++++++--------------------
arch/x86/kernel/cpu/mce/core.c | 143 +++++++++------
arch/x86/kernel/cpu/mce/inject.c | 8 +-
arch/x86/kernel/cpu/mce/internal.h | 7 +-
5 files changed, 186 insertions(+), 249 deletions(-)


base-commit: 108c6494bdf1dfeaefc0a506e2f471aa92fafdd6
--
2.34.1



2024-05-23 15:57:51

by Yazen Ghannam

Subject: [PATCH 4/9] x86/mce: Move machine_check_poll() status checks to helper functions

There are a number of generic and vendor-specific status checks in
machine_check_poll(). These are used to determine if an error should be
skipped.

Move these into helper functions. Future vendor-specific checks will be
added to the helpers.

Signed-off-by: Yazen Ghannam <[email protected]>
---
arch/x86/kernel/cpu/mce/core.c | 79 +++++++++++++++++-----------------
1 file changed, 39 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 70c8df1a766a..704e651203b4 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -662,6 +662,44 @@ static noinstr void mce_read_aux(struct mce *m, int i)

DEFINE_PER_CPU(unsigned, mce_poll_count);

+static bool ser_log_poll_error(struct mce *m)
+{
+ /* Log "not enabled" (speculative) errors */
+ if (!(m->status & MCI_STATUS_EN))
+ return true;
+
+ /*
+ * Log UCNA (SDM: 15.6.3 "UCR Error Classification")
+ * UC == 1 && PCC == 0 && S == 0
+ */
+ if (!(m->status & MCI_STATUS_PCC) && !(m->status & MCI_STATUS_S))
+ return true;
+
+ return false;
+}
+
+static bool log_poll_error(enum mcp_flags flags, struct mce *m)
+{
+ /* If this entry is not valid, ignore it. */
+ if (!(m->status & MCI_STATUS_VAL))
+ return false;
+
+ /*
+ * If we are logging everything (at CPU online) or this
+ * is a corrected error, then we must log it.
+ */
+ if ((flags & MCP_UC) || !(m->status & MCI_STATUS_UC))
+ return true;
+
+ if (mca_cfg.ser)
+ return ser_log_poll_error(m);
+
+ if (m->status & MCI_STATUS_UC)
+ return false;
+
+ return true;
+}
+
/*
* Poll for corrected events or events that happened before reset.
* Those are just logged through /dev/mcelog.
@@ -709,48 +747,9 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
if (!mca_cfg.cmci_disabled)
mce_track_storm(&m);

- /* If this entry is not valid, ignore it */
- if (!(m.status & MCI_STATUS_VAL))
+ if (!log_poll_error(flags, &m))
continue;

- /*
- * If we are logging everything (at CPU online) or this
- * is a corrected error, then we must log it.
- */
- if ((flags & MCP_UC) || !(m.status & MCI_STATUS_UC))
- goto log_it;
-
- /*
- * Newer Intel systems that support software error
- * recovery need to make additional checks. Other
- * CPUs should skip over uncorrected errors, but log
- * everything else.
- */
- if (!mca_cfg.ser) {
- if (m.status & MCI_STATUS_UC)
- continue;
- goto log_it;
- }
-
- /* Log "not enabled" (speculative) errors */
- if (!(m.status & MCI_STATUS_EN))
- goto log_it;
-
- /*
- * Log UCNA (SDM: 15.6.3 "UCR Error Classification")
- * UC == 1 && PCC == 0 && S == 0
- */
- if (!(m.status & MCI_STATUS_PCC) && !(m.status & MCI_STATUS_S))
- goto log_it;
-
- /*
- * Skip anything else. Presumption is that our read of this
- * bank is racing with a machine check. Leave the log alone
- * for do_machine_check() to deal with it.
- */
- continue;
-
-log_it:
if (flags & MCP_DONTLOG)
goto clear_it;

--
2.34.1


2024-05-23 15:58:05

by Yazen Ghannam

Subject: [PATCH 5/9] x86/mce: Skip AMD threshold init if no threshold banks found

AMD systems optionally support MCA thresholding. This feature is
discovered by checking capability bits in the MCA_MISC* registers.

Currently, MCA thresholding is set up in two passes. The first is during
CPU init where available banks are detected, and the "bank_map" variable
is updated. The second is during sysfs/device init when the thresholding
data structures are allocated and hardware is fully configured.

During device init, the "threshold_banks" array is allocated even if no
available banks were discovered. Furthermore, the thresholding reset
flow checks if the top-level "threshold_banks" array is non-NULL, but it
doesn't check if individual "threshold_bank" structures are non-NULL.
This is currently not a problem because the hardware interrupt is not
enabled in this case. However, it becomes an issue if the interrupt is
enabled while the thresholding data structures are left uninitialized.

Check "bank_map" to determine if the thresholding structures should be
allocated and initialized. Also, remove "mce_flags.amd_threshold" which
is redundant when checking "bank_map".

Signed-off-by: Yazen Ghannam <[email protected]>
---
arch/x86/kernel/cpu/mce/amd.c | 2 +-
arch/x86/kernel/cpu/mce/core.c | 1 -
arch/x86/kernel/cpu/mce/internal.h | 5 +----
3 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9a0133ef7e20..d7dee59cc1ca 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1395,7 +1395,7 @@ int mce_threshold_create_device(unsigned int cpu)
struct threshold_bank **bp;
int err;

- if (!mce_flags.amd_threshold)
+ if (!this_cpu_read(bank_map))
return 0;

bp = this_cpu_read(threshold_banks);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 704e651203b4..58b8efdcec0b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1984,7 +1984,6 @@ static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c)
mce_flags.overflow_recov = !!cpu_has(c, X86_FEATURE_OVERFLOW_RECOV);
mce_flags.succor = !!cpu_has(c, X86_FEATURE_SUCCOR);
mce_flags.smca = !!cpu_has(c, X86_FEATURE_SMCA);
- mce_flags.amd_threshold = 1;
}
}

diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 01f8f03969e6..08571b10bf3f 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -214,9 +214,6 @@ struct mce_vendor_flags {
/* Zen IFU quirk */
zen_ifu_quirk : 1,

- /* AMD-style error thresholding banks present. */
- amd_threshold : 1,
-
/* Pentium, family 5-style MCA */
p5 : 1,

@@ -229,7 +226,7 @@ struct mce_vendor_flags {
/* Skylake, Cascade Lake, Cooper Lake REP;MOVS* quirk */
skx_repmov_quirk : 1,

- __reserved_0 : 55;
+ __reserved_0 : 56;
};

extern struct mce_vendor_flags mce_flags;
--
2.34.1


2024-05-23 15:58:18

by Yazen Ghannam

Subject: [PATCH 7/9] x86/mce: Unify AMD DFR handler with MCA Polling

AMD systems optionally support a deferred error interrupt. The interrupt
should be used as another signal to trigger MCA polling. This is similar
to how other MCA interrupts are handled.

Deferred errors do not require any special handling related to the
interrupt, e.g. resetting or rearming the interrupt.

However, Scalable MCA systems include a pair of registers, MCA_DESTAT
and MCA_DEADDR, that should be checked for valid errors. This check
should be done whenever MCA registers are polled. Currently, the
deferred error interrupt does this check, but the MCA polling function
does not.

Call the MCA polling function when handling the deferred error
interrupt. This keeps all "polling" cases in a common function.

Call the polling function only for banks that have the deferred error
interrupt enabled.

Add an SMCA status check helper. It does the same status check and
register clearing that the interrupt handler did, and it extends the
common polling flow to find AMD deferred errors.

Remove old code whose functionality is already covered in the common MCA
code.

Signed-off-by: Yazen Ghannam <[email protected]>
---
arch/x86/kernel/cpu/mce/amd.c | 99 ++--------------------------------
arch/x86/kernel/cpu/mce/core.c | 46 ++++++++++++++--
2 files changed, 46 insertions(+), 99 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 1ac445a0dc12..c6594da95340 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -57,6 +57,7 @@

static bool thresholding_irq_en;
static DEFINE_PER_CPU_READ_MOSTLY(mce_banks_t, mce_thr_intr_banks);
+static DEFINE_PER_CPU_READ_MOSTLY(mce_banks_t, mce_dfr_intr_banks);

static const char * const th_names[] = {
"load_store",
@@ -296,8 +297,10 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
* APIC based interrupt. First, check that no interrupt has been
* set.
*/
- if ((low & BIT(5)) && !((high >> 5) & 0x3))
+ if ((low & BIT(5)) && !((high >> 5) & 0x3)) {
+ __set_bit(bank, this_cpu_ptr(mce_dfr_intr_banks));
high |= BIT(5);
+ }

this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(low & BIT(8));

@@ -778,33 +781,6 @@ bool amd_mce_usable_address(struct mce *m)
return false;
}

-static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
-{
- struct mce m;
-
- mce_setup(&m);
-
- m.status = status;
- m.misc = misc;
- m.bank = bank;
- m.tsc = rdtsc();
-
- if (m.status & MCI_STATUS_ADDRV) {
- m.addr = addr;
-
- smca_extract_err_addr(&m);
- }
-
- if (mce_flags.smca) {
- rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m.ipid);
-
- if (m.status & MCI_STATUS_SYNDV)
- rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd);
- }
-
- mce_log(&m);
-}
-
DEFINE_IDTENTRY_SYSVEC(sysvec_deferred_error)
{
trace_deferred_error_apic_entry(DEFERRED_ERROR_VECTOR);
@@ -814,75 +790,10 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_deferred_error)
apic_eoi();
}

-/*
- * Returns true if the logged error is deferred. False, otherwise.
- */
-static inline bool
-_log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc)
-{
- u64 status, addr = 0;
-
- rdmsrl(msr_stat, status);
- if (!(status & MCI_STATUS_VAL))
- return false;
-
- if (status & MCI_STATUS_ADDRV)
- rdmsrl(msr_addr, addr);
-
- __log_error(bank, status, addr, misc);
-
- wrmsrl(msr_stat, 0);
-
- return status & MCI_STATUS_DEFERRED;
-}
-
-static bool _log_error_deferred(unsigned int bank, u32 misc)
-{
- if (!_log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS),
- mca_msr_reg(bank, MCA_ADDR), misc))
- return false;
-
- /*
- * Non-SMCA systems don't have MCA_DESTAT/MCA_DEADDR registers.
- * Return true here to avoid accessing these registers.
- */
- if (!mce_flags.smca)
- return true;
-
- /* Clear MCA_DESTAT if the deferred error was logged from MCA_STATUS. */
- wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
- return true;
-}
-
-/*
- * We have three scenarios for checking for Deferred errors:
- *
- * 1) Non-SMCA systems check MCA_STATUS and log error if found.
- * 2) SMCA systems check MCA_STATUS. If error is found then log it and also
- * clear MCA_DESTAT.
- * 3) SMCA systems check MCA_DESTAT, if error was not found in MCA_STATUS, and
- * log it.
- */
-static void log_error_deferred(unsigned int bank)
-{
- if (_log_error_deferred(bank, 0))
- return;
-
- /*
- * Only deferred errors are logged in MCA_DE{STAT,ADDR} so just check
- * for a valid error.
- */
- _log_error_bank(bank, MSR_AMD64_SMCA_MCx_DESTAT(bank),
- MSR_AMD64_SMCA_MCx_DEADDR(bank), 0);
-}
-
/* APIC interrupt handler for deferred errors */
static void amd_deferred_error_interrupt(void)
{
- unsigned int bank;
-
- for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank)
- log_error_deferred(bank);
+ machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_dfr_intr_banks));
}

static void reset_block(struct threshold_block *block)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index d6517b93c903..16c999b2cc1f 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -637,7 +637,8 @@ static noinstr void mce_read_aux(struct mce *m, int i)
if (m->status & MCI_STATUS_MISCV)
m->misc = mce_rdmsrl(mca_msr_reg(i, MCA_MISC));

- if (m->status & MCI_STATUS_ADDRV) {
+ /* Don't overwrite an address value that was saved earlier. */
+ if (m->status & MCI_STATUS_ADDRV && !m->addr) {
m->addr = mce_rdmsrl(mca_msr_reg(i, MCA_ADDR));

/*
@@ -668,6 +669,35 @@ static void reset_thr_limit(unsigned int bank)

DEFINE_PER_CPU(unsigned, mce_poll_count);

+static bool smca_log_poll_error(struct mce *m, u32 *status_reg)
+{
+ /*
+ * If this is a deferred error found in MCA_STATUS, then clear
+ * the redundant data from the MCA_DESTAT register.
+ */
+ if (m->status & MCI_STATUS_VAL) {
+ if (m->status & MCI_STATUS_DEFERRED)
+ mce_wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
+
+ return true;
+ }
+
+ /*
+ * If the MCA_DESTAT register has valid data, then use
+ * it as the status register.
+ */
+ *status_reg = MSR_AMD64_SMCA_MCx_DESTAT(m->bank);
+ m->status = mce_rdmsrl(*status_reg);
+
+ if (!(m->status & MCI_STATUS_VAL))
+ return false;
+
+ if (m->status & MCI_STATUS_ADDRV)
+ m->addr = mce_rdmsrl(MSR_AMD64_SMCA_MCx_DEADDR(m->bank));
+
+ return true;
+}
+
static bool ser_log_poll_error(struct mce *m)
{
/* Log "not enabled" (speculative) errors */
@@ -684,8 +714,11 @@ static bool ser_log_poll_error(struct mce *m)
return false;
}

-static bool log_poll_error(enum mcp_flags flags, struct mce *m)
+static bool log_poll_error(enum mcp_flags flags, struct mce *m, u32 *status_reg)
{
+ if (mce_flags.smca)
+ return smca_log_poll_error(m, status_reg);
+
/* If this entry is not valid, ignore it. */
if (!(m->status & MCI_STATUS_VAL))
return false;
@@ -724,6 +757,7 @@ static bool log_poll_error(enum mcp_flags flags, struct mce *m)
void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
{
struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
+ u32 status_reg;
struct mce m;
int i;

@@ -736,12 +770,14 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
if (!mce_banks[i].ctl || !test_bit(i, *b))
continue;

+ status_reg = mca_msr_reg(i, MCA_STATUS);
+
m.misc = 0;
m.addr = 0;
m.bank = i;

barrier();
- m.status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
+ m.status = mce_rdmsrl(status_reg);

/*
* Update storm tracking here, before checking for the
@@ -753,7 +789,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
if (!mca_cfg.cmci_disabled)
mce_track_storm(&m);

- if (!log_poll_error(flags, &m))
+ if (!log_poll_error(flags, &m, &status_reg))
continue;

if (flags & MCP_DONTLOG)
@@ -780,7 +816,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
/*
* Clear state for this bank.
*/
- mce_wrmsrl(mca_msr_reg(i, MCA_STATUS), 0);
+ mce_wrmsrl(status_reg, 0);
}

/*
--
2.34.1


2024-05-23 15:58:27

by Yazen Ghannam

Subject: [PATCH 6/9] x86/mce: Unify AMD THR handler with MCA Polling

AMD systems optionally support an MCA thresholding interrupt. The
interrupt should be used as another signal to trigger MCA polling. This
is similar to how the Intel Corrected Machine Check interrupt (CMCI) is
handled.

AMD MCA thresholding is managed using the MCA_MISC registers within an
MCA bank. The OS will need to modify the hardware error count field in
order to reset the threshold limit and rearm the interrupt. Management
of the MCA_MISC register should be done as a follow up to the basic MCA
polling flow. It should not be the main focus of the interrupt handler.

Furthermore, future systems will have the ability to send an MCA
thresholding interrupt to the OS even when the OS does not manage the
feature, i.e. MCA_MISC registers are Read-as-Zero/Locked.

Call the common MCA polling function when handling the MCA thresholding
interrupt. This will allow the OS to find any valid errors whether or
not the MCA thresholding feature is OS-managed. Also, this allows the
common MCA polling options and kernel parameters to apply to AMD
systems.

Add a callback to the MCA polling function to check and reset any
threshold blocks that have reached their threshold limit.

Signed-off-by: Yazen Ghannam <[email protected]>
---
arch/x86/kernel/cpu/mce/amd.c | 54 +++++++++++++-----------------
arch/x86/kernel/cpu/mce/core.c | 8 +++++
arch/x86/kernel/cpu/mce/internal.h | 2 ++
3 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index d7dee59cc1ca..1ac445a0dc12 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -56,6 +56,7 @@
#define SMCA_THR_LVT_OFF 0xF000

static bool thresholding_irq_en;
+static DEFINE_PER_CPU_READ_MOSTLY(mce_banks_t, mce_thr_intr_banks);

static const char * const th_names[] = {
"load_store",
@@ -578,6 +579,7 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
if (!b.interrupt_capable)
goto done;

+ __set_bit(bank, this_cpu_ptr(mce_thr_intr_banks));
b.interrupt_enable = 1;

if (!mce_flags.smca) {
@@ -883,12 +885,7 @@ static void amd_deferred_error_interrupt(void)
log_error_deferred(bank);
}

-static void log_error_thresholding(unsigned int bank, u64 misc)
-{
- _log_error_deferred(bank, misc);
-}
-
-static void log_and_reset_block(struct threshold_block *block)
+static void reset_block(struct threshold_block *block)
{
struct thresh_restart tr;
u32 low = 0, high = 0;
@@ -902,49 +899,44 @@ static void log_and_reset_block(struct threshold_block *block)
if (!(high & MASK_OVERFLOW_HI))
return;

- /* Log the MCE which caused the threshold event. */
- log_error_thresholding(block->bank, ((u64)high << 32) | low);
-
- /* Reset threshold block after logging error. */
memset(&tr, 0, sizeof(tr));
tr.b = block;
threshold_restart_bank(&tr);
}

-/*
- * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
- * goes off when error_count reaches threshold_limit.
- */
-static void amd_threshold_interrupt(void)
+void amd_reset_thr_limit(unsigned int bank)
{
struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
struct threshold_bank **bp = this_cpu_read(threshold_banks);
- unsigned int bank, cpu = smp_processor_id();

/*
* Validate that the threshold bank has been initialized already. The
* handler is installed at boot time, but on a hotplug event the
* interrupt might fire before the data has been initialized.
*/
- if (!bp)
+ if (!bp || !bp[bank])
return;

- for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
- if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
- continue;
+ first_block = bp[bank]->blocks;
+ if (!first_block)
+ return;

- first_block = bp[bank]->blocks;
- if (!first_block)
- continue;
+ /*
+ * The first block is also the head of the list. Check it first
+ * before iterating over the rest.
+ */
+ reset_block(first_block);
+ list_for_each_entry_safe(block, tmp, &first_block->miscj, miscj)
+ reset_block(block);
+}

- /*
- * The first block is also the head of the list. Check it first
- * before iterating over the rest.
- */
- log_and_reset_block(first_block);
- list_for_each_entry_safe(block, tmp, &first_block->miscj, miscj)
- log_and_reset_block(block);
- }
+/*
+ * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
+ * goes off when error_count reaches threshold_limit.
+ */
+static void amd_threshold_interrupt(void)
+{
+ machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_thr_intr_banks));
}

/*
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 58b8efdcec0b..d6517b93c903 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -660,6 +660,12 @@ static noinstr void mce_read_aux(struct mce *m, int i)
}
}

+static void reset_thr_limit(unsigned int bank)
+{
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+ return amd_reset_thr_limit(bank);
+}
+
DEFINE_PER_CPU(unsigned, mce_poll_count);

static bool ser_log_poll_error(struct mce *m)
@@ -769,6 +775,8 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
mce_log(&m);

clear_it:
+ reset_thr_limit(i);
+
/*
* Clear state for this bank.
*/
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 08571b10bf3f..3e062cf01d4d 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -262,6 +262,7 @@ extern bool filter_mce(struct mce *m);
#ifdef CONFIG_X86_MCE_AMD
extern bool amd_filter_mce(struct mce *m);
bool amd_mce_usable_address(struct mce *m);
+void amd_reset_thr_limit(unsigned int bank);

/*
* If MCA_CONFIG[McaLsbInStatusSupported] is set, extract ErrAddr in bits
@@ -290,6 +291,7 @@ static __always_inline void smca_extract_err_addr(struct mce *m)
#else
static inline bool amd_filter_mce(struct mce *m) { return false; }
static inline bool amd_mce_usable_address(struct mce *m) { return false; }
+static inline void amd_reset_thr_limit(unsigned int bank) { }
static inline void smca_extract_err_addr(struct mce *m) { }
#endif

--
2.34.1


2024-05-23 15:58:53

by Yazen Ghannam

Subject: [PATCH 9/9] x86/mce/amd: Support SMCA Corrected Error Interrupt

AMD systems optionally support MCA thresholding which provides the
ability for hardware to send an interrupt when a set error threshold is
reached. This feature counts errors of all severities, but it is
commonly used to report correctable errors with an interrupt rather than
polling.

Scalable MCA systems allow the Platform to take control of this feature.
In this case, the OS will not see the feature configuration and control
bits in the MCA_MISC* registers. The OS will not receive the MCA
thresholding interrupt, and it will need to poll for correctable errors.

A "corrected error interrupt" will be available on Scalable MCA systems.
This will be used in the same configuration where the Platform controls
MCA thresholding. However, the Platform will now be able to send the
MCA thresholding interrupt to the OS.

Check for the feature bit in the MCA_CONFIG register and confirm that
the MCA thresholding interrupt handler is already enabled. If successful,
set the feature enable bit in the MCA_CONFIG register to indicate to the
Platform that the OS is ready for the interrupt.

Signed-off-by: Yazen Ghannam <[email protected]>
---
arch/x86/kernel/cpu/mce/amd.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 7acaa21e11e1..cc1527ff76fc 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -302,6 +302,11 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
high |= BIT(5);
}

+ if ((low & BIT(10)) && this_cpu_read(smca_thr_intr_enabled)) {
+ __set_bit(bank, this_cpu_ptr(mce_thr_intr_banks));
+ high |= BIT(8);
+ }
+
this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(low & BIT(8));

wrmsr(smca_config, low, high);
--
2.34.1


2024-05-23 15:59:14

by Yazen Ghannam

Subject: [PATCH 2/9] x86/mce: Remove unused variable and return value in machine_check_poll()

The recent CMCI storm handling rework removed the last case that checks
the return value of machine_check_poll().

Therefore the "error_seen" variable is no longer used, so remove it.

Fixes: 3ed57b41a412 ("x86/mce: Remove old CMCI storm mitigation code")
Signed-off-by: Yazen Ghannam <[email protected]>
---
arch/x86/include/asm/mce.h | 3 ++-
arch/x86/kernel/cpu/mce/core.c | 7 +------
2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index de3118305838..bc3813c94c79 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -259,7 +259,8 @@ enum mcp_flags {
MCP_DONTLOG = BIT(2), /* only clear, don't log */
MCP_QUEUE_LOG = BIT(3), /* only queue to genpool */
};
-bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
+
+void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);

int mce_notify_irq(void);

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index b5cc557cfc37..287108de210e 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -677,10 +677,9 @@ DEFINE_PER_CPU(unsigned, mce_poll_count);
* is already totally * confused. In this case it's likely it will
* not fully execute the machine check handler either.
*/
-bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
+void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
{
struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
- bool error_seen = false;
struct mce m;
int i;

@@ -754,8 +753,6 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
continue;

log_it:
- error_seen = true;
-
if (flags & MCP_DONTLOG)
goto clear_it;

@@ -787,8 +784,6 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
*/

sync_core();
-
- return error_seen;
}
EXPORT_SYMBOL_GPL(machine_check_poll);

--
2.34.1


2024-05-23 16:00:12

by Yazen Ghannam

Subject: [PATCH 3/9] x86/mce: Increment MCP count only for timer calls

MCP count is currently incremented for any call to machine_check_poll().
Therefore, the count includes calls from the timer, boot-time polling,
and interrupt handlers.

Only increment the MCP count when called from the timer so as to avoid
double counting the interrupt handlers.

Signed-off-by: Yazen Ghannam <[email protected]>
---
arch/x86/kernel/cpu/mce/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 287108de210e..70c8df1a766a 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -683,8 +683,6 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
struct mce m;
int i;

- this_cpu_inc(mce_poll_count);
-
mce_gather_info(&m, NULL);

if (flags & MCP_TIMESTAMP)
@@ -1667,8 +1665,10 @@ static void mce_timer_fn(struct timer_list *t)

iv = __this_cpu_read(mce_next_interval);

- if (mce_available(this_cpu_ptr(&cpu_info)))
+ if (mce_available(this_cpu_ptr(&cpu_info))) {
+ this_cpu_inc(mce_poll_count);
mc_poll_banks();
+ }

/*
* Alert userspace if needed. If we logged an MCE, reduce the polling
--
2.34.1


2024-05-23 16:00:33

by Yazen Ghannam

Subject: [PATCH 8/9] x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems

Scalable MCA systems have a per-CPU register that gives the APIC LVT
offset for the thresholding and deferred error interrupts.

Currently, this register is read once to set up the deferred error
interrupt and then read again for each thresholding block. Furthermore,
the APIC LVT registers are configured each time, but they only need to
be configured once per-CPU.

Move the APIC LVT setup to the early part of CPU init, so that the
registers are set up once. Also, this ensures that the kernel is ready
to service the interrupts before the individual error sources (each MCA
bank) are enabled.

Apply this change only to SMCA systems to avoid breaking any legacy
behavior. The deferred error interrupt is technically advertised by the
SUCCOR feature. However, this was first made available on SMCA systems.
Therefore, only set up the deferred error interrupt on SMCA systems and
simplify the code.

Signed-off-by: Yazen Ghannam <[email protected]>
---
arch/x86/kernel/cpu/mce/amd.c | 116 +++++++++++++++-------------------
1 file changed, 52 insertions(+), 64 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index c6594da95340..7acaa21e11e1 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -46,9 +46,6 @@
/* Deferred error settings */
#define MSR_CU_DEF_ERR 0xC0000410
#define MASK_DEF_LVTOFF 0x000000F0
-#define MASK_DEF_INT_TYPE 0x00000006
-#define DEF_LVT_OFF 0x2
-#define DEF_INT_TYPE_APIC 0x2

/* Scalable MCA: */

@@ -58,6 +55,8 @@
static bool thresholding_irq_en;
static DEFINE_PER_CPU_READ_MOSTLY(mce_banks_t, mce_thr_intr_banks);
static DEFINE_PER_CPU_READ_MOSTLY(mce_banks_t, mce_dfr_intr_banks);
+static DEFINE_PER_CPU_READ_MOSTLY(bool, smca_thr_intr_enabled);
+static DEFINE_PER_CPU_READ_MOSTLY(bool, smca_dfr_intr_enabled);

static const char * const th_names[] = {
"load_store",
@@ -297,7 +296,8 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
* APIC based interrupt. First, check that no interrupt has been
* set.
*/
- if ((low & BIT(5)) && !((high >> 5) & 0x3)) {
+ if ((low & BIT(5)) && !((high >> 5) & 0x3) &&
+ this_cpu_read(smca_dfr_intr_enabled)) {
__set_bit(bank, this_cpu_ptr(mce_dfr_intr_banks));
high |= BIT(5);
}
@@ -389,6 +389,14 @@ static int lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi)
{
int msr = (hi & MASK_LVTOFF_HI) >> 20;

+ /*
+ * On SMCA CPUs, LVT offset is programmed at a different MSR, and
+ * the BIOS provides the value. The original field where LVT offset
+ * was set is reserved. Return early here:
+ */
+ if (mce_flags.smca)
+ return 0;
+
if (apic < 0) {
pr_err(FW_BUG "cpu %d, failed to setup threshold interrupt "
"for bank %d, block %d (MSR%08X=0x%x%08x)\n", b->cpu,
@@ -397,14 +405,6 @@ static int lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi)
}

if (apic != msr) {
- /*
- * On SMCA CPUs, LVT offset is programmed at a different MSR, and
- * the BIOS provides the value. The original field where LVT offset
- * was set is reserved. Return early here:
- */
- if (mce_flags.smca)
- return 0;
-
pr_err(FW_BUG "cpu %d, invalid threshold interrupt offset %d "
"for bank %d, block %d (MSR%08X=0x%x%08x)\n",
b->cpu, apic, b->bank, b->block, b->address, hi, lo);
@@ -485,41 +485,6 @@ static int setup_APIC_mce_threshold(int reserved, int new)
return reserved;
}

-static int setup_APIC_deferred_error(int reserved, int new)
-{
- if (reserved < 0 && !setup_APIC_eilvt(new, DEFERRED_ERROR_VECTOR,
- APIC_EILVT_MSG_FIX, 0))
- return new;
-
- return reserved;
-}
-
-static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
-{
- u32 low = 0, high = 0;
- int def_offset = -1, def_new;
-
- if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
- return;
-
- def_new = (low & MASK_DEF_LVTOFF) >> 4;
- if (!(low & MASK_DEF_LVTOFF)) {
- pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");
- def_new = DEF_LVT_OFF;
- low = (low & ~MASK_DEF_LVTOFF) | (DEF_LVT_OFF << 4);
- }
-
- def_offset = setup_APIC_deferred_error(def_offset, def_new);
- if ((def_offset == def_new) &&
- (deferred_error_int_vector != amd_deferred_error_interrupt))
- deferred_error_int_vector = amd_deferred_error_interrupt;
-
- if (!mce_flags.smca)
- low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;
-
- wrmsr(MSR_CU_DEF_ERR, low, high);
-}
-
static u32 smca_get_block_address(unsigned int bank, unsigned int block,
unsigned int cpu)
{
@@ -565,7 +530,6 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
int offset, u32 misc_high)
{
unsigned int cpu = smp_processor_id();
- u32 smca_low, smca_high;
struct threshold_block b;
int new;

@@ -585,18 +549,10 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
__set_bit(bank, this_cpu_ptr(mce_thr_intr_banks));
b.interrupt_enable = 1;

- if (!mce_flags.smca) {
- new = (misc_high & MASK_LVTOFF_HI) >> 20;
- goto set_offset;
- }
-
- /* Gather LVT offset for thresholding: */
- if (rdmsr_safe(MSR_CU_DEF_ERR, &smca_low, &smca_high))
- goto out;
-
- new = (smca_low & SMCA_THR_LVT_OFF) >> 12;
+ if (mce_flags.smca)
+ goto done;

-set_offset:
+ new = (misc_high & MASK_LVTOFF_HI) >> 20;
offset = setup_APIC_mce_threshold(offset, new);
if (offset == new)
thresholding_irq_en = true;
@@ -604,7 +560,6 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
done:
mce_threshold_block_init(&b, offset);

-out:
return offset;
}

@@ -673,6 +628,37 @@ static void disable_err_thresholding(struct cpuinfo_x86 *c, unsigned int bank)
wrmsrl(MSR_K7_HWCR, hwcr);
}

+/*
+ * Enable the APIC LVT interrupt vectors once per-CPU. This should be done before hardware is
+ * ready to send interrupts.
+ *
+ * Individual error sources are enabled later during per-bank init.
+ */
+static void smca_enable_interrupt_vectors(struct cpuinfo_x86 *c)
+{
+ u8 thr_offset, dfr_offset;
+ u64 mca_intr_cfg;
+
+ if (!mce_flags.smca || !mce_flags.succor)
+ return;
+
+ if (c == &boot_cpu_data) {
+ mce_threshold_vector = amd_threshold_interrupt;
+ deferred_error_int_vector = amd_deferred_error_interrupt;
+ }
+
+ if (rdmsrl_safe(MSR_CU_DEF_ERR, &mca_intr_cfg))
+ return;
+
+ thr_offset = (mca_intr_cfg & SMCA_THR_LVT_OFF) >> 12;
+ if (!setup_APIC_eilvt(thr_offset, THRESHOLD_APIC_VECTOR, APIC_EILVT_MSG_FIX, 0))
+ this_cpu_write(smca_thr_intr_enabled, true);
+
+ dfr_offset = (mca_intr_cfg & MASK_DEF_LVTOFF) >> 4;
+ if (!setup_APIC_eilvt(dfr_offset, DEFERRED_ERROR_VECTOR, APIC_EILVT_MSG_FIX, 0))
+ this_cpu_write(smca_dfr_intr_enabled, true);
+}
+
/* cpu init entry point, called from mce.c with preempt off */
void mce_amd_feature_init(struct cpuinfo_x86 *c)
{
@@ -680,11 +666,16 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
u32 low = 0, high = 0, address = 0;
int offset = -1;

+ smca_enable_interrupt_vectors(c);

for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
- if (mce_flags.smca)
+ if (mce_flags.smca) {
smca_configure(bank, cpu);

+ if (!this_cpu_read(smca_thr_intr_enabled))
+ continue;
+ }
+
disable_err_thresholding(c, bank);

for (block = 0; block < NR_BLOCKS; ++block) {
@@ -705,9 +696,6 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
offset = prepare_threshold_block(bank, block, address, offset, high);
}
}
-
- if (mce_flags.succor)
- deferred_error_interrupt_enable(c);
}

/*
--
2.34.1


2024-05-24 14:54:11

by Borislav Petkov

Subject: Re: [PATCH 3/9] x86/mce: Increment MCP count only for timer calls

On Thu, May 23, 2024 at 10:56:35AM -0500, Yazen Ghannam wrote:
> MCP count is currently incremented for any call to machine_check_poll().
> Therefore, the count includes calls from the timer, boot-time polling,
> and interrupt handlers.
>
> Only increment the MCP count when called from the timer so as to avoid
> double counting the interrupt handlers.

Well, but, every time the function is called, we did poll the banks.
Sure, the count is part of /proc/interrupts but we did poll the banks in
those other cases too. So I think showing an accurate poll number is
actually representing the truth, no matter where it is shown...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: [tip: ras/core] x86/mce: Remove unused variable and return value in machine_check_poll()

The following commit has been merged into the ras/core branch of tip:

Commit-ID: 5b9d292ea87c836ec47483f98344cb0e7add82fe
Gitweb: https://git.kernel.org/tip/5b9d292ea87c836ec47483f98344cb0e7add82fe
Author: Yazen Ghannam <[email protected]>
AuthorDate: Thu, 23 May 2024 10:56:34 -05:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 27 May 2024 10:49:25 +02:00

x86/mce: Remove unused variable and return value in machine_check_poll()

The recent CMCI storm handling rework removed the last case that checks
the return value of machine_check_poll().

Therefore the "error_seen" variable is no longer used, so remove it.

Fixes: 3ed57b41a412 ("x86/mce: Remove old CMCI storm mitigation code")
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
---
arch/x86/include/asm/mce.h | 3 ++-
arch/x86/kernel/cpu/mce/core.c | 7 +------
2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index dfd2e96..3ad29b1 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -261,7 +261,8 @@ enum mcp_flags {
MCP_DONTLOG = BIT(2), /* only clear, don't log */
MCP_QUEUE_LOG = BIT(3), /* only queue to genpool */
};
-bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
+
+void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);

int mce_notify_irq(void);

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ad0623b..b85ec7a 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -677,10 +677,9 @@ DEFINE_PER_CPU(unsigned, mce_poll_count);
* is already totally * confused. In this case it's likely it will
* not fully execute the machine check handler either.
*/
-bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
+void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
{
struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
- bool error_seen = false;
struct mce m;
int i;

@@ -754,8 +753,6 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
continue;

log_it:
- error_seen = true;
-
if (flags & MCP_DONTLOG)
goto clear_it;

@@ -787,8 +784,6 @@ clear_it:
*/

sync_core();
-
- return error_seen;
}
EXPORT_SYMBOL_GPL(machine_check_poll);


2024-06-03 14:22:42

by Yazen Ghannam

Subject: Re: [PATCH 3/9] x86/mce: Increment MCP count only for timer calls

On 5/24/24 10:53 AM, Borislav Petkov wrote:
> On Thu, May 23, 2024 at 10:56:35AM -0500, Yazen Ghannam wrote:
>> MCP count is currently incremented for any call to machine_check_poll().
>> Therefore, the count includes calls from the timer, boot-time polling,
>> and interrupt handlers.
>>
>> Only increment the MCP count when called from the timer so as to avoid
>> double counting the interrupt handlers.
>
> Well, but, every time the function is called, we did poll the banks.
> Sure, the count is part of /proc/interrupts but we did poll the banks in
> those other cases too. So I think showing an accurate poll number is
> actually representing the truth, no matter where it is shown...
>

Okay, fair enough.

In this case, should we also increment the count in __mc_scan_banks()?

Thanks,
Yazen

2024-06-03 15:24:51

by Borislav Petkov

Subject: Re: [PATCH 3/9] x86/mce: Increment MCP count only for timer calls

On Mon, Jun 03, 2024 at 10:22:22AM -0400, Yazen Ghannam wrote:
> In this case, should we also increment the count in __mc_scan_banks()?

Well, that's called only in do_machine_check(), and the latter is not
polling the banks but is called as a result of a #MC exception being raised.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-06-03 17:39:43

by Borislav Petkov

Subject: Re: [PATCH 4/9] x86/mce: Move machine_check_poll() status checks to helper functions

On Thu, May 23, 2024 at 10:56:36AM -0500, Yazen Ghannam wrote:
> @@ -709,48 +747,9 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
> if (!mca_cfg.cmci_disabled)
> mce_track_storm(&m);
>
> - /* If this entry is not valid, ignore it */
> - if (!(m.status & MCI_STATUS_VAL))
> + if (!log_poll_error(flags, &m))
> continue;
>
> - /*
> - * If we are logging everything (at CPU online) or this
> - * is a corrected error, then we must log it.
> - */
> - if ((flags & MCP_UC) || !(m.status & MCI_STATUS_UC))
> - goto log_it;
> -
> - /*
> - * Newer Intel systems that support software error
> - * recovery need to make additional checks. Other
> - * CPUs should skip over uncorrected errors, but log
> - * everything else.
> - */

You lost that comment.

> - if (!mca_cfg.ser) {
> - if (m.status & MCI_STATUS_UC)
> - continue;
> - goto log_it;
> - }
> -
> - /* Log "not enabled" (speculative) errors */
> - if (!(m.status & MCI_STATUS_EN))
> - goto log_it;
> -
> - /*
> - * Log UCNA (SDM: 15.6.3 "UCR Error Classification")
> - * UC == 1 && PCC == 0 && S == 0
> - */
> - if (!(m.status & MCI_STATUS_PCC) && !(m.status & MCI_STATUS_S))
> - goto log_it;
> -
> - /*
> - * Skip anything else. Presumption is that our read of this
> - * bank is racing with a machine check. Leave the log alone
> - * for do_machine_check() to deal with it.
> - */
> - continue;
> -
> -log_it:
> if (flags & MCP_DONTLOG)
> goto clear_it;

Btw, the code looks really weird now:

	if (!log_poll_error(flags, &m))
		continue;

	if (flags & MCP_DONTLOG)
		goto clear_it;

i.e.,

1. Should I log it?

2. Should I not log it?

Oh well, it was like that before logically so...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-06-04 09:23:01

by Borislav Petkov

Subject: Re: [PATCH 6/9] x86/mce: Unify AMD THR handler with MCA Polling

On Thu, May 23, 2024 at 10:56:38AM -0500, Yazen Ghannam wrote:
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 58b8efdcec0b..d6517b93c903 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -660,6 +660,12 @@ static noinstr void mce_read_aux(struct mce *m, int i)
> }
> }
>
> +static void reset_thr_limit(unsigned int bank)
> +{
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> + return amd_reset_thr_limit(bank);
> +}
> +
> DEFINE_PER_CPU(unsigned, mce_poll_count);
>
> static bool ser_log_poll_error(struct mce *m)
> @@ -769,6 +775,8 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
> mce_log(&m);
>
> clear_it:
> + reset_thr_limit(i);

	if (mca_cfg.thresholding)
		reset_thr_limit(i);

and then you don't have to do a vendor check but simply set
mca_cfg.thresholding on AMD after having defined it in the patch.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-06-04 11:06:07

by Borislav Petkov

Subject: Re: [PATCH 7/9] x86/mce: Unify AMD DFR handler with MCA Polling

On Thu, May 23, 2024 at 10:56:39AM -0500, Yazen Ghannam wrote:
> +static bool smca_log_poll_error(struct mce *m, u32 *status_reg)

That handing of *status_reg back'n'forth just to clear it in the end is
not nice. Let's get rid of it:

---
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0a9cff329487..a0ba82fe6de3 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -669,7 +669,7 @@ static void reset_thr_limit(unsigned int bank)

DEFINE_PER_CPU(unsigned, mce_poll_count);

-static bool smca_log_poll_error(struct mce *m, u32 *status_reg)
+static bool smca_log_poll_error(struct mce *m, u32 status_reg)
{
/*
* If this is a deferred error found in MCA_STATUS, then clear
@@ -686,8 +686,8 @@ static bool smca_log_poll_error(struct mce *m, u32 *status_reg)
* If the MCA_DESTAT register has valid data, then use
* it as the status register.
*/
- *status_reg = MSR_AMD64_SMCA_MCx_DESTAT(m->bank);
- m->status = mce_rdmsrl(*status_reg);
+ status_reg = MSR_AMD64_SMCA_MCx_DESTAT(m->bank);
+ m->status = mce_rdmsrl(status_reg);

if (!(m->status & MCI_STATUS_VAL))
return false;
@@ -695,6 +695,8 @@ static bool smca_log_poll_error(struct mce *m, u32 *status_reg)
if (m->status & MCI_STATUS_ADDRV)
m->addr = mce_rdmsrl(MSR_AMD64_SMCA_MCx_DEADDR(m->bank));

+ mce_wrmsrl(status_reg, 0);
+
return true;
}

@@ -714,7 +716,7 @@ static bool ser_log_poll_error(struct mce *m)
return false;
}

-static bool log_poll_error(enum mcp_flags flags, struct mce *m, u32 *status_reg)
+static bool log_poll_error(enum mcp_flags flags, struct mce *m, u32 status_reg)
{
if (mce_flags.smca)
return smca_log_poll_error(m, status_reg);
@@ -789,7 +791,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
if (!mca_cfg.cmci_disabled)
mce_track_storm(&m);

- if (!log_poll_error(flags, &m, &status_reg))
+ if (!log_poll_error(flags, &m, status_reg))
continue;

if (flags & MCP_DONTLOG)


--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-06-04 11:19:15

by Borislav Petkov

Subject: Re: [PATCH 7/9] x86/mce: Unify AMD DFR handler with MCA Polling

On Thu, May 23, 2024 at 10:56:39AM -0500, Yazen Ghannam wrote:
> -/*
> - * We have three scenarios for checking for Deferred errors:
> - *
> - * 1) Non-SMCA systems check MCA_STATUS and log error if found.
> - * 2) SMCA systems check MCA_STATUS. If error is found then log it and also
> - * clear MCA_DESTAT.
> - * 3) SMCA systems check MCA_DESTAT, if error was not found in MCA_STATUS, and
> - * log it.
> - */

I don't like it when you're killing those written down rules. Are they
not true anymore?

Because smca_log_poll_error() still does exactly that.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-06-04 15:50:05

by Borislav Petkov

Subject: Re: [PATCH 8/9] x86/mce/amd: Enable interrupt vectors once per-CPU on SMCA systems

On Thu, May 23, 2024 at 10:56:40AM -0500, Yazen Ghannam wrote:
> static bool thresholding_irq_en;
> static DEFINE_PER_CPU_READ_MOSTLY(mce_banks_t, mce_thr_intr_banks);
> static DEFINE_PER_CPU_READ_MOSTLY(mce_banks_t, mce_dfr_intr_banks);
> +static DEFINE_PER_CPU_READ_MOSTLY(bool, smca_thr_intr_enabled);
> +static DEFINE_PER_CPU_READ_MOSTLY(bool, smca_dfr_intr_enabled);

So before you add those, we already have:

static DEFINE_PER_CPU_READ_MOSTLY(struct smca_bank[MAX_NR_BANKS], smca_banks);
static DEFINE_PER_CPU_READ_MOSTLY(u8[N_SMCA_BANK_TYPES], smca_bank_counts);
static DEFINE_PER_CPU(struct threshold_bank **, threshold_banks);
static DEFINE_PER_CPU(u64, bank_map);
static DEFINE_PER_CPU(u64, smca_misc_banks_map);

Please think of a proper struct which collects all that info in the
smallest possible format and unify everything.

It is a mess currently.
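
For illustration only, one possible shape for such a structure -- the struct
and field names here are made up, not part of this series:

	struct amd_mca_pcpu {
		struct smca_bank	banks[MAX_NR_BANKS];
		struct threshold_bank	**thr_banks;
		mce_banks_t		thr_intr_banks;
		mce_banks_t		dfr_intr_banks;
		u64			bank_map;
		u64			misc_banks_map;
		u8			bank_counts[N_SMCA_BANK_TYPES];
		bool			thr_intr_enabled;
		bool			dfr_intr_enabled;
	};

	static DEFINE_PER_CPU(struct amd_mca_pcpu, amd_mca_data);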

> +/*
> + * Enable the APIC LVT interrupt vectors once per-CPU. This should be done before hardware is
> + * ready to send interrupts.
> + *
> + * Individual error sources are enabled later during per-bank init.
> + */
> +static void smca_enable_interrupt_vectors(struct cpuinfo_x86 *c)
> +{
> + u8 thr_offset, dfr_offset;
> + u64 mca_intr_cfg;
> +
> + if (!mce_flags.smca || !mce_flags.succor)
> + return;
> +
> + if (c == &boot_cpu_data) {
> + mce_threshold_vector = amd_threshold_interrupt;
> + deferred_error_int_vector = amd_deferred_error_interrupt;
> + }

Nah, this should be done differently: you define a function
cpu_mca_init() which you call from early_identify_cpu(). In it, you do
the proper checks and assign those two vectors above. That in
a pre-patch.

Then, the rest becomes per-CPU code which you simply run in
mce_amd_feature_init(), diligently, one thing after the other.

And then you don't need smca_{dfr,thr}_intr_enabled anymore because you
know that after having run setup_APIC_eilvt().

IOW, mce_amd_feature_init() does *all* per-CPU MCA init on AMD and it is
all concentrated in one place and not spread around.

I think this should be a much better cleanup.
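
As a rough sketch of that direction -- function name, placement and the
feature checks are assumptions, not an actual patch:

	/* cpu/mce/amd.c -- sketch only, called once from early_identify_cpu() */
	void cpu_mca_init(struct cpuinfo_x86 *c)
	{
		if (c->x86_vendor != X86_VENDOR_AMD &&
		    c->x86_vendor != X86_VENDOR_HYGON)
			return;

		if (!cpu_has(c, X86_FEATURE_SMCA) || !cpu_has(c, X86_FEATURE_SUCCOR))
			return;

		/* The vectors are global; assign them once, early. */
		mce_threshold_vector	  = amd_threshold_interrupt;
		deferred_error_int_vector = amd_deferred_error_interrupt;
	}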

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette