2023-03-01 22:14:39

by Yazen Ghannam

Subject: [PATCH v2] x86/mce: Schedule work after restart from sysfs update

A recent change introduced a flag to queue up errors found during
boot-time polling. These errors will be processed during late init once
the MCE subsystem is fully set up.

A number of sysfs updates call mce_restart() which goes through a subset
of the CPU init flow. This includes polling MCA banks and logging any
errors found. Since the same function is used as boot-time polling,
errors will be queued. However, the system is now past late init, so the
errors will remain queued until another error is found and the workqueue
is triggered.

Call mce_schedule_work() at the end of mce_restart() so that queued
errors are processed.

Fixes: 3bff147b187d ("x86/mce: Defer processing of early errors")
Cc: [email protected]
Signed-off-by: Yazen Ghannam <[email protected]>
---
Link: https://lore.kernel.org/r/[email protected]

v1->v2:
* Refer to correct function in commit message.
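
For readers following along, the behavior described in the commit message
can be modeled in user space roughly as below. This is only a sketch under
stated assumptions, not the kernel implementation: every name prefixed with
model_ is hypothetical, the queue is a plain array rather than the kernel's
data structures, and the "work" is drained synchronously instead of being
deferred to a workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's data structures. */
#define MODEL_QUEUE_MAX 16

struct model_mce_state {
	int queued[MODEL_QUEUE_MAX];	/* bank numbers with logged errors */
	size_t nr_queued;		/* records waiting to be processed */
	size_t nr_processed;		/* records already handed to consumers */
};

/*
 * Models polling with the "queue only" behavior: the record is logged,
 * but nothing is triggered to process it.
 */
static bool model_poll_queue_only(struct model_mce_state *s, int bank)
{
	if (s->nr_queued == MODEL_QUEUE_MAX)
		return false;
	s->queued[s->nr_queued++] = bank;
	return true;
}

/*
 * Models the effect of mce_schedule_work(): everything queued gets
 * processed. Assumption: drained here synchronously for simplicity.
 */
static void model_schedule_work(struct model_mce_state *s)
{
	s->nr_processed += s->nr_queued;
	s->nr_queued = 0;
}

/*
 * Models mce_restart() finding one error in 'bank'. The 'fixed' argument
 * selects whether the call added by this patch is present.
 */
static void model_restart(struct model_mce_state *s, int bank, bool fixed)
{
	model_poll_queue_only(s, bank);
	if (fixed)
		model_schedule_work(s);
}
```

In this model, calling model_restart() with fixed == false leaves the record
queued with nothing processed, which mirrors the stuck state described above.
A later call with fixed == true drains both the new record and the one left
over from before.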

arch/x86/kernel/cpu/mce/core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 7832a69d170e..2eec60f50057 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2355,6 +2355,7 @@ static void mce_restart(void)
 {
 	mce_timer_delete_all();
 	on_each_cpu(mce_cpu_restart, NULL, 1);
+	mce_schedule_work();
 }

 /* Toggle features for corrected errors */
--
2.34.1



2023-03-01 22:16:50

by Luck, Tony

Subject: RE: [PATCH v2] x86/mce: Schedule work after restart from sysfs update

> Call mce_schedule_work() at the end of mce_restart() so that queued
> errors are processed.

Reviewed-by: Tony Luck <[email protected]>

-Tony

2023-03-01 22:19:04

by Slade Watkins

Subject: Re: [PATCH v2] x86/mce: Schedule work after restart from sysfs update

On 3/1/23 17:14, Yazen Ghannam wrote:
> A recent change introduced a flag to queue up errors found during
> boot-time polling. These errors will be processed during late init once
> the MCE subsystem is fully set up.
>
> A number of sysfs updates call mce_restart() which goes through a subset
> of the CPU init flow. This includes polling MCA banks and logging any
> errors found. Since the same function is used as boot-time polling,
> errors will be queued. However, the system is now past late init, so the
> errors will remain queued until another error is found and the workqueue
> is triggered.
>
> Call mce_schedule_work() at the end of mce_restart() so that queued
> errors are processed.
>
> Fixes: 3bff147b187d ("x86/mce: Defer processing of early errors")
> Cc: [email protected]

Yazen,
Despite Cc: [email protected] being here, the list wasn't Cc'd on this
email. Figured I'd let you know in case you create a v3 or resend at any point :).

Cheers,
-- Slade

Subject: [tip: ras/urgent] x86/mce: Make sure logged MCEs are processed after sysfs update

The following commit has been merged into the ras/urgent branch of tip:

Commit-ID: 4783b9cb374af02d49740e00e2da19fd4ed6dec4
Gitweb: https://git.kernel.org/tip/4783b9cb374af02d49740e00e2da19fd4ed6dec4
Author: Yazen Ghannam <[email protected]>
AuthorDate: Wed, 01 Mar 2023 22:14:20
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Sun, 12 Mar 2023 21:12:21 +01:00

x86/mce: Make sure logged MCEs are processed after sysfs update

A recent change introduced a flag to queue up errors found during
boot-time polling. These errors will be processed during late init once
the MCE subsystem is fully set up.

A number of sysfs updates call mce_restart() which goes through a subset
of the CPU init flow. This includes polling MCA banks and logging any
errors found. Since the same function is used as boot-time polling,
errors will be queued. However, the system is now past late init, so the
errors will remain queued until another error is found and the workqueue
is triggered.

Call mce_schedule_work() at the end of mce_restart() so that queued
errors are processed.

Fixes: 3bff147b187d ("x86/mce: Defer processing of early errors")
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/mce/core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 7832a69..2eec60f 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2355,6 +2355,7 @@ static void mce_restart(void)
 {
 	mce_timer_delete_all();
 	on_each_cpu(mce_cpu_restart, NULL, 1);
+	mce_schedule_work();
 }

 /* Toggle features for corrected errors */

2023-03-14 14:40:53

by Yazen Ghannam

Subject: Re: [tip: ras/urgent] x86/mce: Make sure logged MCEs are processed after sysfs update

On Sun, Mar 12, 2023 at 08:38:33PM -0000, tip-bot2 for Yazen Ghannam wrote:
> The following commit has been merged into the ras/urgent branch of tip:
>
> Commit-ID: 4783b9cb374af02d49740e00e2da19fd4ed6dec4
> Gitweb: https://git.kernel.org/tip/4783b9cb374af02d49740e00e2da19fd4ed6dec4
> Author: Yazen Ghannam <[email protected]>
> AuthorDate: Wed, 01 Mar 2023 22:14:20
> Committer: Borislav Petkov (AMD) <[email protected]>
> CommitterDate: Sun, 12 Mar 2023 21:12:21 +01:00
>
> x86/mce: Make sure logged MCEs are processed after sysfs update
>
> A recent change introduced a flag to queue up errors found during
> boot-time polling. These errors will be processed during late init once
> the MCE subsystem is fully set up.
>
> A number of sysfs updates call mce_restart() which goes through a subset
> of the CPU init flow. This includes polling MCA banks and logging any
> errors found. Since the same function is used as boot-time polling,
> errors will be queued. However, the system is now past late init, so the
> errors will remain queued until another error is found and the workqueue
> is triggered.
>
> Call mce_schedule_work() at the end of mce_restart() so that queued
> errors are processed.
>
> Fixes: 3bff147b187d ("x86/mce: Defer processing of early errors")
> Signed-off-by: Yazen Ghannam <[email protected]>
> Signed-off-by: Borislav Petkov (AMD) <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>

Thank you!

-Yazen