2023-12-12 21:22:55

by Luck, Tony

Subject: [PATCH] ACPI: extlog: Clear Extended Error Log status when RAS_CEC handled the error

When both CONFIG_RAS_CEC and CONFIG_ACPI_EXTLOG are enabled, Linux does
not clear the status word of the BIOS-supplied error record for corrected
errors that RAS_CEC has already handled. The stale status may prevent
logging of subsequent uncorrected errors.

Fix by clearing the status before returning in the MCE_HANDLED_CEC case.

Fixes: 23ba710a0864 ("x86/mce: Fix all mce notifiers to update the mce->kflags bitmask")
Reported-by: Erwin Tsaur <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
---
drivers/acpi/acpi_extlog.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index e120a96e1eae..71e8d4e7a36c 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -145,9 +145,14 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 	static u32 err_seq;

 	estatus = extlog_elog_entry_check(cpu, bank);
-	if (estatus == NULL || (mce->kflags & MCE_HANDLED_CEC))
+	if (!estatus)
 		return NOTIFY_DONE;

+	if (mce->kflags & MCE_HANDLED_CEC) {
+		estatus->block_status = 0;
+		return NOTIFY_DONE;
+	}
+
 	memcpy(elog_buf, (void *)estatus, ELOG_ENTRY_LEN);
 	/* clear record status to enable BIOS to update it again */
 	estatus->block_status = 0;
--
2.43.0


2023-12-13 12:53:58

by Rafael J. Wysocki

Subject: Re: [PATCH] ACPI: extlog: Clear Extended Error Log status when RAS_CEC handled the error

On Tue, Dec 12, 2023 at 10:22 PM Tony Luck <[email protected]> wrote:
>
> When both CONFIG_RAS_CEC and CONFIG_ACPI_EXTLOG are enabled, Linux does
> not clear the status word of the BIOS-supplied error record for corrected
> errors that RAS_CEC has already handled. The stale status may prevent
> logging of subsequent uncorrected errors.
>
> Fix by clearing the status before returning in the MCE_HANDLED_CEC case.
>
> Fixes: 23ba710a0864 ("x86/mce: Fix all mce notifiers to update the mce->kflags bitmask")
> Reported-by: Erwin Tsaur <[email protected]>
> Signed-off-by: Tony Luck <[email protected]>
> ---
> drivers/acpi/acpi_extlog.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index e120a96e1eae..71e8d4e7a36c 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -145,9 +145,14 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
>  	static u32 err_seq;
>
>  	estatus = extlog_elog_entry_check(cpu, bank);
> -	if (estatus == NULL || (mce->kflags & MCE_HANDLED_CEC))
> +	if (!estatus)
>  		return NOTIFY_DONE;
>
> +	if (mce->kflags & MCE_HANDLED_CEC) {
> +		estatus->block_status = 0;
> +		return NOTIFY_DONE;
> +	}
> +
>  	memcpy(elog_buf, (void *)estatus, ELOG_ENTRY_LEN);
>  	/* clear record status to enable BIOS to update it again */
>  	estatus->block_status = 0;
> --

Applied as 6.8 material, thanks!
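
For readers who want to trace the control flow outside the kernel tree, here
is a minimal userspace sketch of the patched logic. It is not the kernel code:
struct mock_estatus, struct mock_mce, extlog_print_sketch() and the constant
values are stand-ins invented for illustration, and the NOTIFY_OK return on
the logging path is an assumption about the rest of extlog_print(), which the
hunk does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for kernel definitions; the values are illustrative only. */
#define NOTIFY_DONE     0
#define NOTIFY_OK       1
#define MCE_HANDLED_CEC (1u << 0)

/* Hypothetical model of the BIOS-supplied error record. */
struct mock_estatus {
	uint32_t block_status;	/* nonzero: record holds an unconsumed error */
	uint32_t payload;	/* placeholder for the rest of the record */
};

/* Hypothetical model of the fields of struct mce used here. */
struct mock_mce {
	uint32_t kflags;
};

/*
 * Mirrors the patched flow: block_status is cleared on the CEC-handled
 * path as well as on the normal logging path, so the record slot is
 * always released once an entry has been found.
 */
static int extlog_print_sketch(struct mock_estatus *estatus,
			       const struct mock_mce *mce,
			       struct mock_estatus *out)
{
	assert(mce != NULL && out != NULL);

	/* No record for this CPU/bank: nothing to clear. */
	if (!estatus)
		return NOTIFY_DONE;

	/* CEC already handled the error: skip logging, but free the slot. */
	if (mce->kflags & MCE_HANDLED_CEC) {
		estatus->block_status = 0;
		return NOTIFY_DONE;
	}

	/* Normal path: copy the record out, then free the slot. */
	memcpy(out, estatus, sizeof(*out));
	estatus->block_status = 0;
	return NOTIFY_OK;
}
```

Before the patch, the CEC-handled case shared the early return with the
NULL check, so block_status kept its nonzero value on that path.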