2020-11-27 16:20:11

by Gabriele Paoloni

Subject: [PATCH v2 0/5] x86/MCE: some minor fixes

During the safety analysis done by the safety architecture working
group in the context of the ELISA project, some issues were spotted.
This patchset proposes fixes for them.

Changes since v1:
- fixed grammar
- improved readability of patch 1 and Cc'd stable
- kill_it flag renamed to kill_current_task

Signed-off-by: Gabriele Paoloni <[email protected]>
Reviewed-by: Tony Luck <[email protected]>

Gabriele Paoloni (5):
x86/mce: do not overwrite no_way_out if mce_end() fails
x86/mce: move the mce_panic() call and 'kill_it' assignments to the
right places
x86/mce: for LMCE panic only if mca_cfg.tolerant < 3
x86/mce: remove redundant call to irq_work_queue()
x86/mce: rename kill_it as kill_current_task

arch/x86/kernel/cpu/mce/core.c | 39 +++++++++++++++-------------------
1 file changed, 17 insertions(+), 22 deletions(-)

--
2.20.1

---------------------------------------------------------------------
INTEL CORPORATION ITALIA S.p.A. con unico socio
Sede: Milanofiori Palazzo E 4
CAP 20094 Assago (MI)
Capitale Sociale Euro 104.000,00 interamente versato
Partita I.V.A. e Codice Fiscale 04236760155
Repertorio Economico Amministrativo n. 997124
Registro delle Imprese di Milano nr. 183983/5281/33
Soggetta ad attivita' di direzione e coordinamento di
INTEL CORPORATION, USA

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


2020-11-27 16:20:15

by Gabriele Paoloni

Subject: [PATCH v2 2/5] x86/mce: move the mce_panic() call and 'kill_it' assignments to the right places

Right now, for local MCEs we panic(), if needed, right after lmce is
set. For global MCEs, mce_reign() takes care of calling mce_panic().
Hence:
- improve readability by moving the conditional evaluation of
tolerant up to where kill_it is first set;
- move the mce_panic() call up into the branch where mce_end()
fails.
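
As a rough illustration of the first point only (not part of the patch),
the sketch below models how the flag is computed before and after the
move. The function names and the 'ripv' parameter are hypothetical
stand-ins for the MCG_STATUS_RIPV test and cfg->tolerant; the panic path
is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical model of the old flow: the flag is set when RIPV is
 * clear and only much later cleared again if tolerant == 3. */
int sketch_kill_it_old(bool ripv, int tolerant)
{
	int kill_it = 0;

	if (!ripv)
		kill_it = 1;

	/* ... bank scanning and CPU synchronization would happen here ... */

	if (tolerant == 3)
		kill_it = 0;

	return kill_it;
}

/* Hypothetical model of the new flow: tolerant is evaluated at the
 * point where the flag is first set. */
int sketch_kill_it_new(bool ripv, int tolerant)
{
	int kill_it = 0;

	if (!ripv)
		kill_it = (tolerant == 3) ? 0 : 1;

	return kill_it;
}
```

Under these assumptions both variants yield the same flag for every
combination of inputs, which is why the change reads as a readability
improvement for kill_it.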

Signed-off-by: Gabriele Paoloni <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/mce/core.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 32b7099e3511..50e9b0893a92 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1350,8 +1350,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
* severity is MCE_AR_SEVERITY we have other options.
*/
if (!(m.mcgstatus & MCG_STATUS_RIPV))
- kill_it = 1;
-
+ kill_it = (cfg->tolerant == 3) ? 0 : 1;
/*
* Check if this MCE is signaled to only this logical processor,
* on Intel, Zhaoxin only.
@@ -1387,6 +1386,12 @@ noinstr void do_machine_check(struct pt_regs *regs)
if (mce_end(order) < 0) {
if (!no_way_out)
no_way_out = worst >= MCE_PANIC_SEVERITY;
+ /*
+ * mce_reign() has probably failed hence evaluate if we need
+ * to panic
+ */
+ if (no_way_out && mca_cfg.tolerant < 3)
+ mce_panic("Fatal machine check on current CPU", &m, msg);
}
} else {
/*
@@ -1403,15 +1408,6 @@ noinstr void do_machine_check(struct pt_regs *regs)
}
}

- /*
- * If tolerant is at an insane level we drop requests to kill
- * processes and continue even when there is no way out.
- */
- if (cfg->tolerant == 3)
- kill_it = 0;
- else if (no_way_out)
- mce_panic("Fatal machine check on current CPU", &m, msg);
-
if (worst > 0)
irq_work_queue(&mce_irq_work);

--
2.20.1


2020-11-27 16:20:29

by Gabriele Paoloni

Subject: [PATCH v2 3/5] x86/mce: for LMCE panic only if mca_cfg.tolerant < 3

Right now for LMCE if no_way_out is set mce_panic() is called
regardless of mca_cfg.tolerant. This is not correct as, if
mca_cfg.tolerant = 3, the code should never panic.
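
A minimal sketch of the decision being changed, for illustration only:
the helper names below are hypothetical and simply mirror the condition
in the diff that follows, with 'tolerant' standing in for
mca_cfg.tolerant.

```c
#include <stdbool.h>

/* Before: the LMCE path panics whenever no_way_out is set,
 * whatever the tolerant level is. */
bool sketch_lmce_panic_old(bool no_way_out, int tolerant)
{
	(void)tolerant;	/* not consulted in the old condition */
	return no_way_out;
}

/* After: tolerant == 3 suppresses the panic on the LMCE path too. */
bool sketch_lmce_panic_new(bool no_way_out, int tolerant)
{
	return no_way_out && tolerant < 3;
}
```

The two helpers differ only for no_way_out set and tolerant == 3, which
is exactly the case described above.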

Signed-off-by: Gabriele Paoloni <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/mce/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 50e9b0893a92..d766a3f6a343 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1367,7 +1367,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
* to see it will clear it.
*/
if (lmce) {
- if (no_way_out)
+ if (no_way_out && mca_cfg.tolerant < 3)
mce_panic("Fatal local machine check", &m, msg);
} else {
order = mce_start(&no_way_out);
--
2.20.1


2020-11-27 16:20:48

by Gabriele Paoloni

Subject: [PATCH v2 4/5] x86/mce: remove redundant call to irq_work_queue()

Right now in do_machine_check() __mc_scan_banks() triggers
the following call tree:
__mc_scan_banks()->mce_log()->irq_work_queue(&mce_irq_work).

Hence the call of irq_work_queue() below after __mc_scan_banks()
seems redundant. Just remove it.

Signed-off-by: Gabriele Paoloni <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/mce/core.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index d766a3f6a343..802302c54762 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1408,9 +1408,6 @@ noinstr void do_machine_check(struct pt_regs *regs)
}
}

- if (worst > 0)
- irq_work_queue(&mce_irq_work);
-
if (worst != MCE_AR_SEVERITY && !kill_it)
goto out;

--
2.20.1


2020-11-27 16:22:18

by Gabriele Paoloni

Subject: [PATCH v2 1/5] x86/mce: do not overwrite no_way_out if mce_end() fails

Currently, if mce_end() fails, 'no_way_out' is recomputed from 'worst'
alone (worst >= MCE_PANIC_SEVERITY).
'worst' is the worst severity found across the MCA banks associated
with the current CPU; however, at this point 'no_way_out' could
already have been set by mce_start() after looking at the severities
of all CPUs that entered the MCE handler.
If mce_end() fails, first check whether no_way_out is already set and,
if so, stick to it; otherwise use the local worst value.
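
To make the difference concrete, here is a small stand-alone sketch (not
kernel code). The helper names are hypothetical, and
SKETCH_PANIC_SEVERITY is an assumed placeholder for MCE_PANIC_SEVERITY,
whose real value comes from the kernel's severity enum.

```c
#include <stdbool.h>

/* Assumed placeholder threshold; the real MCE_PANIC_SEVERITY is
 * defined by the kernel and is not reproduced here. */
#define SKETCH_PANIC_SEVERITY 8

/* Before the patch: a failed mce_end() recomputes no_way_out from the
 * local 'worst' only, discarding whatever was set earlier. */
bool sketch_no_way_out_old(bool no_way_out, int worst)
{
	(void)no_way_out;	/* previous value is ignored */
	return worst >= SKETCH_PANIC_SEVERITY;
}

/* After the patch: an already-set no_way_out is kept; the local
 * 'worst' is consulted only when no_way_out is still clear. */
bool sketch_no_way_out_new(bool no_way_out, int worst)
{
	if (!no_way_out)
		no_way_out = worst >= SKETCH_PANIC_SEVERITY;
	return no_way_out;
}
```

With no_way_out already set and a low local 'worst', the old helper
clears the flag while the new one preserves it.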

Cc: <[email protected]>
Signed-off-by: Gabriele Paoloni <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/mce/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4102b866e7c0..32b7099e3511 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1384,8 +1384,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
* When there's any problem use only local no_way_out state.
*/
if (!lmce) {
- if (mce_end(order) < 0)
- no_way_out = worst >= MCE_PANIC_SEVERITY;
+ if (mce_end(order) < 0) {
+ if (!no_way_out)
+ no_way_out = worst >= MCE_PANIC_SEVERITY;
+ }
} else {
/*
* If there was a fatal machine check we should have
--
2.20.1


2020-11-27 16:23:21

by Gabriele Paoloni

Subject: [PATCH v2 5/5] x86/mce: rename kill_it as kill_current_task

Currently, if an MCE happens in user mode or while the kernel
is copying data from user space, 'kill_it' is used to check
whether execution of the interrupted task can be recovered;
the flag name, however, is not very meaningful, hence
rename it to match its purpose.
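
For readers unfamiliar with how the flag is consumed, the following is a
simplified sketch. It assumes only what the merged version of this patch
later in the thread shows, namely that queue_task_work() selects
kill_me_now when the flag is set and kill_me_maybe otherwise; the enum
and function names here are hypothetical.

```c
/* Hypothetical labels for the two callbacks chosen by
 * queue_task_work() in the real code. */
enum sketch_action {
	SKETCH_KILL_NOW,	/* corresponds to kill_me_now */
	SKETCH_TRY_RECOVERY	/* corresponds to kill_me_maybe */
};

/* A set flag means the current task is killed outright; a clear flag
 * leaves room for a recovery attempt. */
enum sketch_action sketch_pick_action(int kill_current_task)
{
	return kill_current_task ? SKETCH_KILL_NOW : SKETCH_TRY_RECOVERY;
}
```

Read this way, the new name states directly what a non-zero value
causes.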

Signed-off-by: Gabriele Paoloni <[email protected]>
---
arch/x86/kernel/cpu/mce/core.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 802302c54762..740a4fcc1e90 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1320,10 +1320,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
int no_way_out = 0;

/*
- * If kill_it gets set, there might be a way to recover from this
+ * If kill_current_task is not set, there might be a way to recover from this
* error.
*/
- int kill_it = 0;
+ int kill_current_task = 0;

/*
* MCEs are always local on AMD. Same is determined by MCG_STATUS_LMCES
@@ -1350,7 +1350,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
* severity is MCE_AR_SEVERITY we have other options.
*/
if (!(m.mcgstatus & MCG_STATUS_RIPV))
- kill_it = (cfg->tolerant == 3) ? 0 : 1;
+ kill_current_task = (cfg->tolerant == 3) ? 0 : 1;
/*
* Check if this MCE is signaled to only this logical processor,
* on Intel, Zhaoxin only.
@@ -1408,7 +1408,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
}
}

- if (worst != MCE_AR_SEVERITY && !kill_it)
+ if (worst != MCE_AR_SEVERITY && !kill_current_task)
goto out;

/* Fault was in user mode and we need to take some action */
@@ -1416,7 +1416,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
/* If this triggers there is no way to recover. Die hard. */
BUG_ON(!on_thread_stack() || !user_mode(regs));

- queue_task_work(&m, kill_it);
+ queue_task_work(&m, kill_current_task);

} else {
/*
@@ -1434,7 +1434,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
}

if (m.kflags & MCE_IN_KERNEL_COPYIN)
- queue_task_work(&m, kill_it);
+ queue_task_work(&m, kill_current_task);
}
out:
mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
--
2.20.1


Subject: [tip: x86/urgent] x86/mce: Do not overwrite no_way_out if mce_end() fails

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 25bc65d8ddfc17cc1d7a45bd48e9bdc0e729ced3
Gitweb: https://git.kernel.org/tip/25bc65d8ddfc17cc1d7a45bd48e9bdc0e729ced3
Author: Gabriele Paoloni <[email protected]>
AuthorDate: Fri, 27 Nov 2020 16:18:15
Committer: Borislav Petkov <[email protected]>
CommitterDate: Fri, 27 Nov 2020 17:38:36 +01:00

x86/mce: Do not overwrite no_way_out if mce_end() fails

Currently, if mce_end() fails, no_way_out - the variable denoting
whether the machine can recover from this MCE - is determined by whether
the worst severity that was found across the MCA banks associated with
the current CPU, is of panic severity.

However, at this point no_way_out could have been already set by
mce_start() after looking at all severities of all CPUs that entered the
MCE handler. If mce_end() fails, check first if no_way_out is already
set and, if so, stick to it, otherwise use the local worst value.

[ bp: Massage. ]

Signed-off-by: Gabriele Paoloni <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/mce/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4102b86..32b7099 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1384,8 +1384,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
* When there's any problem use only local no_way_out state.
*/
if (!lmce) {
- if (mce_end(order) < 0)
- no_way_out = worst >= MCE_PANIC_SEVERITY;
+ if (mce_end(order) < 0) {
+ if (!no_way_out)
+ no_way_out = worst >= MCE_PANIC_SEVERITY;
+ }
} else {
/*
* If there was a fatal machine check we should have

Subject: [tip: ras/core] x86/mce: Remove redundant call to irq_work_queue()

The following commit has been merged into the ras/core branch of tip:

Commit-ID: d5b38e3d0fdb1a16994b449bc338fb8b26816b07
Gitweb: https://git.kernel.org/tip/d5b38e3d0fdb1a16994b449bc338fb8b26816b07
Author: Gabriele Paoloni <[email protected]>
AuthorDate: Fri, 27 Nov 2020 16:18:18
Committer: Borislav Petkov <[email protected]>
CommitterDate: Tue, 01 Dec 2020 18:54:32 +01:00

x86/mce: Remove redundant call to irq_work_queue()

Currently, __mc_scan_banks() in do_machine_check() does the following
callchain:

__mc_scan_banks()->mce_log()->irq_work_queue(&mce_irq_work).

Hence, the call to irq_work_queue() below after __mc_scan_banks()
seems redundant. Just remove it.

Signed-off-by: Gabriele Paoloni <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/mce/core.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 99da2e0..a9991a9 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1406,9 +1406,6 @@ noinstr void do_machine_check(struct pt_regs *regs)
}
}

- if (worst > 0)
- irq_work_queue(&mce_irq_work);
-
if (worst != MCE_AR_SEVERITY && !kill_it)
goto out;

Subject: [tip: ras/core] x86/mce: Panic for LMCE only if mca_cfg.tolerant < 3

The following commit has been merged into the ras/core branch of tip:

Commit-ID: 3a866b16fd2360a9c4ebf71cfbf7ebfe968c1409
Gitweb: https://git.kernel.org/tip/3a866b16fd2360a9c4ebf71cfbf7ebfe968c1409
Author: Gabriele Paoloni <[email protected]>
AuthorDate: Fri, 27 Nov 2020 16:18:17
Committer: Borislav Petkov <[email protected]>
CommitterDate: Tue, 01 Dec 2020 18:49:29 +01:00

x86/mce: Panic for LMCE only if mca_cfg.tolerant < 3

Right now for LMCE, if no_way_out is set, mce_panic() is called
regardless of mca_cfg.tolerant. This is not correct as, if
mca_cfg.tolerant = 3, the code should never panic.

Add that check.

[ bp: use local ptr 'cfg'. ]

Signed-off-by: Gabriele Paoloni <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/mce/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ebaa52a..99da2e0 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1368,7 +1368,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
* to see it will clear it.
*/
if (lmce) {
- if (no_way_out)
+ if (no_way_out && cfg->tolerant < 3)
mce_panic("Fatal local machine check", &m, msg);
} else {
order = mce_start(&no_way_out);

Subject: [tip: ras/core] x86/mce: Rename kill_it to kill_current_task

The following commit has been merged into the ras/core branch of tip:

Commit-ID: e1c06d2366e743475b91045ef0c2ce1bbd028cb6
Gitweb: https://git.kernel.org/tip/e1c06d2366e743475b91045ef0c2ce1bbd028cb6
Author: Gabriele Paoloni <[email protected]>
AuthorDate: Fri, 27 Nov 2020 16:18:19
Committer: Borislav Petkov <[email protected]>
CommitterDate: Tue, 01 Dec 2020 18:58:50 +01:00

x86/mce: Rename kill_it to kill_current_task

Currently, if an MCE happens in user-mode or while the kernel is copying
data from user space, 'kill_it' is used to check if execution of the
interrupted task can be recovered or not; the flag name however is not
very meaningful, hence rename it to match its goal.

[ bp: Massage commit message, rename the queue_task_work() arg too. ]

Signed-off-by: Gabriele Paoloni <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/mce/core.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index a9991a9..6af6a3c 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1266,14 +1266,14 @@ static void kill_me_maybe(struct callback_head *cb)
}
}

-static void queue_task_work(struct mce *m, int kill_it)
+static void queue_task_work(struct mce *m, int kill_current_task)
{
current->mce_addr = m->addr;
current->mce_kflags = m->kflags;
current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
current->mce_whole_page = whole_page(m);

- if (kill_it)
+ if (kill_current_task)
current->mce_kill_me.func = kill_me_now;
else
current->mce_kill_me.func = kill_me_maybe;
@@ -1321,10 +1321,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
int no_way_out = 0;

/*
- * If kill_it gets set, there might be a way to recover from this
+ * If kill_current_task is not set, there might be a way to recover from this
* error.
*/
- int kill_it = 0;
+ int kill_current_task = 0;

/*
* MCEs are always local on AMD. Same is determined by MCG_STATUS_LMCES
@@ -1351,7 +1351,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
* severity is MCE_AR_SEVERITY we have other options.
*/
if (!(m.mcgstatus & MCG_STATUS_RIPV))
- kill_it = (cfg->tolerant == 3) ? 0 : 1;
+ kill_current_task = (cfg->tolerant == 3) ? 0 : 1;
/*
* Check if this MCE is signaled to only this logical processor,
* on Intel, Zhaoxin only.
@@ -1406,7 +1406,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
}
}

- if (worst != MCE_AR_SEVERITY && !kill_it)
+ if (worst != MCE_AR_SEVERITY && !kill_current_task)
goto out;

/* Fault was in user mode and we need to take some action */
@@ -1414,7 +1414,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
/* If this triggers there is no way to recover. Die hard. */
BUG_ON(!on_thread_stack() || !user_mode(regs));

- queue_task_work(&m, kill_it);
+ queue_task_work(&m, kill_current_task);

} else {
/*
@@ -1432,7 +1432,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
}

if (m.kflags & MCE_IN_KERNEL_COPYIN)
- queue_task_work(&m, kill_it);
+ queue_task_work(&m, kill_current_task);
}
out:
mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);

Subject: [tip: ras/core] x86/mce: Move the mce_panic() call and 'kill_it' assignments to the right places

The following commit has been merged into the ras/core branch of tip:

Commit-ID: e273e6e12ab1db3eb57712bd60655744d0091fa3
Gitweb: https://git.kernel.org/tip/e273e6e12ab1db3eb57712bd60655744d0091fa3
Author: Gabriele Paoloni <[email protected]>
AuthorDate: Fri, 27 Nov 2020 16:18:16
Committer: Borislav Petkov <[email protected]>
CommitterDate: Tue, 01 Dec 2020 18:45:56 +01:00

x86/mce: Move the mce_panic() call and 'kill_it' assignments to the right places

Right now, for local MCEs the machine calls panic(), if needed, right
after lmce is set. For MCE broadcasting, mce_reign() takes care of
calling mce_panic().

Hence:
- improve readability by moving the conditional evaluation of
tolerant up to when kill_it is set first;
- move the mce_panic() call up into the statement where mce_end()
fails.

[ bp: Massage, remove comment in the mce_end() failure case because it
is superfluous; use local ptr 'cfg' in both tests. ]

Signed-off-by: Gabriele Paoloni <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/mce/core.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index f319bed..ebaa52a 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1351,8 +1351,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
* severity is MCE_AR_SEVERITY we have other options.
*/
if (!(m.mcgstatus & MCG_STATUS_RIPV))
- kill_it = 1;
-
+ kill_it = (cfg->tolerant == 3) ? 0 : 1;
/*
* Check if this MCE is signaled to only this logical processor,
* on Intel, Zhaoxin only.
@@ -1388,6 +1387,9 @@ noinstr void do_machine_check(struct pt_regs *regs)
if (mce_end(order) < 0) {
if (!no_way_out)
no_way_out = worst >= MCE_PANIC_SEVERITY;
+
+ if (no_way_out && cfg->tolerant < 3)
+ mce_panic("Fatal machine check on current CPU", &m, msg);
}
} else {
/*
@@ -1404,15 +1406,6 @@ noinstr void do_machine_check(struct pt_regs *regs)
}
}

- /*
- * If tolerant is at an insane level we drop requests to kill
- * processes and continue even when there is no way out.
- */
- if (cfg->tolerant == 3)
- kill_it = 0;
- else if (no_way_out)
- mce_panic("Fatal machine check on current CPU", &m, msg);
-
if (worst > 0)
irq_work_queue(&mce_irq_work);