Part 1 fixes an oversight in the existing code that didn't show
up while I was testing the data recovery case (because I never tried
injecting an error into a clean LRU data page ... only to dirty pages
where "memory_failure()" says we have to SIGBUS the process).
Part simply adds the right entries to the severity table so that we
will recognise instruction fetch faults.
Tony Luck (2):
x86/mce: Only restart instruction after machine check recovery if it
is safe
x86/mce: Add instruction recovery signatures to mce-severity table
arch/x86/kernel/cpu/mcheck/mce-severity.c | 10 ++++++++++
arch/x86/kernel/cpu/mcheck/mce.c | 9 ++++++---
2 files changed, 16 insertions(+), 3 deletions(-)
--
1.7.9.rc2.1.g69204
Section 15.3.1.2 of the software developer manual has this to say about the
RIPV bit in the IA32_MCG_STATUS register:
RIPV (restart IP valid) flag, bit 0 — Indicates (when set) that program
execution can be restarted reliably at the instruction pointed to by the
instruction pointer pushed on the stack when the machine-check exception
is generated. When clear, the program cannot be reliably restarted at
the pushed instruction pointer.
We need to save the state of this bit in do_machine_check() and use it
in mce_notify_process() to force a signal; even if memory_failure() says
it made a complete recovery ... e.g. replaced a clean LRU page).
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 66e1c51..3b8ebdc 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -947,9 +947,10 @@ struct mce_info {
atomic_t inuse;
struct task_struct *t;
__u64 paddr;
+ int restartable;
} mce_info[MCE_INFO_MAX];
-static void mce_save_info(__u64 addr)
+static void mce_save_info(__u64 addr, int c)
{
struct mce_info *mi;
@@ -957,6 +958,7 @@ static void mce_save_info(__u64 addr)
if (atomic_cmpxchg(&mi->inuse, 0, 1) == 0) {
mi->t = current;
mi->paddr = addr;
+ mi->restartable = c;
return;
}
}
@@ -1136,7 +1138,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
mce_panic("Fatal machine check on current CPU", &m, msg);
if (worst == MCE_AR_SEVERITY) {
/* schedule action before return to userland */
- mce_save_info(m.addr);
+ mce_save_info(m.addr, m.mcgstatus & MCG_STATUS_RIPV);
set_thread_flag(TIF_MCE_NOTIFY);
} else if (kill_it) {
force_sig(SIGBUS, current);
@@ -1185,7 +1187,8 @@ void mce_notify_process(void)
pr_err("Uncorrected hardware memory error in user-access at %llx",
mi->paddr);
- if (memory_failure(pfn, MCE_VECTOR, MF_ACTION_REQUIRED) < 0) {
+ if (memory_failure(pfn, MCE_VECTOR, MF_ACTION_REQUIRED) < 0 ||
+ mi->restartable == 0) {
pr_err("Memory error not recovered");
force_sig(SIGBUS, current);
}
--
1.7.9.rc2.1.g69204
Instruction recovery cases are very similar to the data recovery one
we already have. Just trade out for a new MCACOD value.
Signed-off-by: Tony Luck <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce-severity.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 0c82091..163c4364ca 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -126,6 +126,16 @@ static struct severity {
SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
USER
),
+ MCESEV(
+ KEEP, "HT thread notices Action required: instruction fetch error",
+ SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
+ MCGMASK(MCG_STATUS_EIPV, 0)
+ ),
+ MCESEV(
+ AR, "Action required: instruction fetch error",
+ SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
+ USER
+ ),
#endif
MCESEV(
PANIC, "Action required: unknown MCACOD",
--
1.7.9.rc2.1.g69204
于 2012/5/11 2:01, Tony Luck 写道:
> Section 15.3.1.2 of the software developer manual has this to say
> about the RIPV bit in the IA32_MCG_STATUS register:
>
> RIPV (restart IP valid) flag, bit 0 — Indicates (when set) that
> program execution can be restarted reliably at the instruction
> pointed to by the instruction pointer pushed on the stack when the
> machine-check exception is generated. When clear, the program
> cannot be reliably restarted at the pushed instruction pointer.
>
> We need to save the state of this bit in do_machine_check() and use
> it in mce_notify_process() to force a signal; even if
> memory_failure() says it made a complete recovery ... e.g. replaced
> a clean LRU page).
>
> Signed-off-by: Tony Luck <[email protected]> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 9 ++++++--- 1 files changed,
> 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c
> b/arch/x86/kernel/cpu/mcheck/mce.c index 66e1c51..3b8ebdc 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c +++
> b/arch/x86/kernel/cpu/mcheck/mce.c @@ -947,9 +947,10 @@ struct
> mce_info { atomic_t inuse; struct task_struct *t; __u64 paddr; +
> int restartable; } mce_info[MCE_INFO_MAX];
>
> -static void mce_save_info(__u64 addr) +static void
> mce_save_info(__u64 addr, int c) { struct mce_info *mi;
>
> @@ -957,6 +958,7 @@ static void mce_save_info(__u64 addr) if
> (atomic_cmpxchg(&mi->inuse, 0, 1) == 0) { mi->t = current;
> mi->paddr = addr; + mi->restartable = c; return; } } @@ -1136,7
> +1138,7 @@ void do_machine_check(struct pt_regs *regs, long
> error_code) mce_panic("Fatal machine check on current CPU", &m,
> msg); if (worst == MCE_AR_SEVERITY) { /* schedule action before
> return to userland */ - mce_save_info(m.addr); +
> mce_save_info(m.addr, m.mcgstatus & MCG_STATUS_RIPV);
> set_thread_flag(TIF_MCE_NOTIFY); } else if (kill_it) {
> force_sig(SIGBUS, current); @@ -1185,7 +1187,8 @@ void
> mce_notify_process(void)
>
> pr_err("Uncorrected hardware memory error in user-access at %llx",
> mi->paddr); - if (memory_failure(pfn, MCE_VECTOR,
> MF_ACTION_REQUIRED) < 0) { + if (memory_failure(pfn, MCE_VECTOR,
> MF_ACTION_REQUIRED) < 0 || + mi->restartable == 0) {
> pr_err("Memory error not recovered"); force_sig(SIGBUS, current);
> }
How about using following condition to decrease the execution time?
if (mi->restartable == 0 ||
memory_failure(pfn, MCE_VECTOR, MF_ACTION_REQUIRED) < 0)
Since restart operation is impossible, whether recovery operation can
be avoided?
于 2012/5/11 2:12, Tony Luck 写道:
> Instruction recovery cases are very similar to the data recovery one
> we already have. Just trade out for a new MCACOD value.
>
> Signed-off-by: Tony Luck <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce-severity.c | 10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
> index 0c82091..163c4364ca 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
> @@ -126,6 +126,16 @@ static struct severity {
> SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
> USER
> ),
> + MCESEV(
> + KEEP, "HT thread notices Action required: instruction fetch error",
> + SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
> + MCGMASK(MCG_STATUS_EIPV, 0)
> + ),
> + MCESEV(
> + AR, "Action required: instruction fetch error",
> + SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
> + USER
> + ),
> #endif
> MCESEV(
> PANIC, "Action required: unknown MCACOD",
For IFU, on affected logical processors, RIPV and EIPV both are 0,
since now new IFU entries are added into severity table, the old
entry as below should be removed:
/* Neither return not error IP -- no chance to recover -> PANIC */
MCESEV(
PANIC, "Neither restart nor error IP",
MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0)
),
> How about using following condition to decrease the execution time?
> if (mi->restartable == 0 ||
> memory_failure(pfn, MCE_VECTOR, MF_ACTION_REQUIRED) < 0)
>
> Since restart operation is impossible, whether recovery operation can
> be avoided?
Because I still want the side-effects from memory_failure() of
marking the page as poisoned and SRAO signal to other users of
the page (or no signal at all to the others for clean LRU page).
Perhaps this deserves a comment
-Tony
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?
> For IFU, on affected logical processors, RIPV and EIPV both are 0,
> since now new IFU entries are added into severity table, the old
> entry as below should be removed:
>
> /* Neither return not error IP -- no chance to recover -> PANIC */
> MCESEV(
> PANIC, "Neither restart nor error IP",
> MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0)
> ),
We need to keep this. If EIPV is not set, then CS and IP on the
stack are not guaranteed ... so we can't tell whether the error
happened in user or kernel mode. This makes recovery "challenging".
I'm trying to figure out a quirk for processors that do generate
EIPV=RIPV=0 signature for IFU errors. There are some case where
we can work around the lack of EIPV.
-Tony
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?
On Thu, May 10, 2012 at 11:01:20AM -0700, Tony Luck wrote:
> Section 15.3.1.2 of the software developer manual has this to say about the
> RIPV bit in the IA32_MCG_STATUS register:
>
> RIPV (restart IP valid) flag, bit 0 — Indicates (when set) that program
> execution can be restarted reliably at the instruction pointed to by the
> instruction pointer pushed on the stack when the machine-check exception
> is generated. When clear, the program cannot be reliably restarted at
> the pushed instruction pointer.
>
> We need to save the state of this bit in do_machine_check() and use it
> in mce_notify_process() to force a signal; even if memory_failure() says
> it made a complete recovery ... e.g. replaced a clean LRU page).
>
> Signed-off-by: Tony Luck <[email protected]>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 9 ++++++---
> 1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 66e1c51..3b8ebdc 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -947,9 +947,10 @@ struct mce_info {
> atomic_t inuse;
> struct task_struct *t;
> __u64 paddr;
> + int restartable;
Is it me or does this look like a flag, or a bitfield? Instead of
wasting a whole integer for a single bit of information.
It will probably end up the same size though due to compiler padding
since this struct is currently 4 + 2*8 byte without the ->restartable
thing.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Fri, May 11, 2012 at 05:42:09PM +0000, Luck, Tony wrote:
> > For IFU, on affected logical processors, RIPV and EIPV both are 0,
> > since now new IFU entries are added into severity table, the old
> > entry as below should be removed:
> >
> > /* Neither return not error IP -- no chance to recover -> PANIC */
> > MCESEV(
> > PANIC, "Neither restart nor error IP",
> > MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0)
> > ),
>
> We need to keep this. If EIPV is not set, then CS and IP on the
> stack are not guaranteed ... so we can't tell whether the error
> happened in user or kernel mode. This makes recovery "challenging".
>
> I'm trying to figure out a quirk for processors that do generate
> EIPV=RIPV=0 signature for IFU errors. There are some case where
> we can work around the lack of EIPV.
__mcheck_cpu_apply_quirks?
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
>> + int restartable;
>
> Is it me or does this look like a flag, or a bitfield? Instead of
> wasting a whole integer for a single bit of information.
I could make it "int flags;" and "#define MCE_INFO_RESTARTABLE 1"
to make it clear that we have lots more bits available for special
cases?
> It will probably end up the same size though due to compiler padding
> since this struct is currently 4 + 2*8 byte without the ->restartable
> thing.
Yup - we can't save any memory (unless we introduce more complexity to
the code ... low PAGE_SHIFT bits of the "paddr" field are simply thrown
away ... so we could allocate a bit there ... but I don't think that
the resulting ugliness is worth the memory savings).
-Tony
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?
> > I'm trying to figure out a quirk for processors that do generate
> > EIPV=RIPV=0 signature for IFU errors. There are some case where
> > we can work around the lack of EIPV.
>
> __mcheck_cpu_apply_quirks?
The quirk is a bit more extensive than just setting some flag in there
for "if (c->x86 == 6 && c->x86_model == 45)". Sometime EIPV isn't set
because the regs->ip and regs->cs really aren't valid. So I'll need
to do something like:
if (we are on Intel family 6 model 45 AND
this is a UC=1, PCC=0, AR=1, S=1, ADDRV=1, MCACOD=0x150 error AND
(virtophys(regs->ip) >> PAGE_SHIFT) == (MC(bank)_ADDR >> PAGE_SHIFT)) {
/* ok to trust CS & IP */
m.mcgstatus |= MCG_STATUS_EIPV;
}
before running through the mce_severity() table lookup.
x86 doesn't seem to have a "virtophys()" that I can find. Closest is lookup_address(vaddr, level)
which gets a pte (which is almost all the work).
Sigh!
-Tony
On Mon, May 14, 2012 at 04:16:05PM +0000, Luck, Tony wrote:
> >> + int restartable;
> >
> > Is it me or does this look like a flag, or a bitfield? Instead of
> > wasting a whole integer for a single bit of information.
>
> I could make it "int flags;" and "#define MCE_INFO_RESTARTABLE 1"
> to make it clear that we have lots more bits available for special
> cases?
I was wondering about that but don't have any other flags/use cases.
Well, we can always change it later if needed.
> > It will probably end up the same size though due to compiler padding
> > since this struct is currently 4 + 2*8 byte without the ->restartable
> > thing.
>
> Yup - we can't save any memory (unless we introduce more complexity to
> the code ... low PAGE_SHIFT bits of the "paddr" field are simply thrown
> away ... so we could allocate a bit there ... but I don't think that
> the resulting ugliness is worth the memory savings).
Agreed, that would be too ugly for no reason :-)
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Mon, May 14, 2012 at 04:28:35PM +0000, Luck, Tony wrote:
> > > I'm trying to figure out a quirk for processors that do generate
> > > EIPV=RIPV=0 signature for IFU errors. There are some case where
> > > we can work around the lack of EIPV.
> >
> > __mcheck_cpu_apply_quirks?
>
> The quirk is a bit more extensive than just setting some flag in there
> for "if (c->x86 == 6 && c->x86_model == 45)". Sometime EIPV isn't set
> because the regs->ip and regs->cs really aren't valid. So I'll need
> to do something like:
>
> if (we are on Intel family 6 model 45 AND
> this is a UC=1, PCC=0, AR=1, S=1, ADDRV=1, MCACOD=0x150 error AND
> (virtophys(regs->ip) >> PAGE_SHIFT) == (MC(bank)_ADDR >> PAGE_SHIFT)) {
> /* ok to trust CS & IP */
> m.mcgstatus |= MCG_STATUS_EIPV;
> }
Hmm, maybe add a flag to struct mce or whatever comes handy in that
codepath which says ->on_this_cpu_it_is_ok_to_set_MCG_STATUS_EIPV and
query it in the mce_severity() thing?
Btw, this severities deal could be much more helpful if it were a bunch
of per-vendor, per-bank or whatever function pointers so that you can do
arbitrary filtering there.
> before running through the mce_severity() table lookup.
>
> x86 doesn't seem to have a "virtophys()" that I can find. Closest is lookup_address(vaddr, level)
> which gets a pte (which is almost all the work).
>
> Sigh!
What does __pa() do actually?
Oh, that's only for kernel memory, it seems.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
On Mon, May 14, 2012 at 10:17 AM, Borislav Petkov <[email protected]> wrote:
> I was wondering about that but don't have any other flags/use cases.
> Well, we can always change it later if needed.
So is that an Ack? I was thinking that I should send this part to
Linus now ... since restarting from a fault when RIPV isn't set
could result in an application doing something bad (like getting
the wrong answer).
The instruction stuff is new and needs to wait for the merge window.
-Tony
On Mon, May 14, 2012 at 11:02:59AM -0700, Tony Luck wrote:
> On Mon, May 14, 2012 at 10:17 AM, Borislav Petkov <[email protected]> wrote:
> > I was wondering about that but don't have any other flags/use cases.
> > Well, we can always change it later if needed.
>
> So is that an Ack? I was thinking that I should send this part to
> Linus now ... since restarting from a fault when RIPV isn't set
> could result in an application doing something bad (like getting
> the wrong answer).
Yep, sure.
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551