2021-02-22 03:35:21

by yaoaili [么爱利]

Subject: x86/mce: fix wrong no-return-ip logic in do_machine_check()

Since commit b2f9d678e28c ("x86/mce: Check for faults tagged in
EXTABLE_CLASS_FAULT exception table entries"), when there is a
memory MCE_AR_SEVERITY error with no return IP, only a SIGBUS
signal is sent to current. As the page is not poisoned, the coredump
step that the kernel runs for the SIGBUS will touch the error page
again, which results in a fatal error. We need to poison the page and
then kill current in the memory-failure module.

So fix it by using the original checking method.

Signed-off-by: Aili Yao <[email protected]>
---
arch/x86/kernel/cpu/mce/core.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index e133ce1e562b..ae09b0279422 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1413,9 +1413,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	if ((m.cs & 3) == 3) {
 		/* If this triggers there is no way to recover. Die hard. */
 		BUG_ON(!on_thread_stack() || !user_mode(regs));
-
-		queue_task_work(&m, kill_current_task);
-
+		if (worst == MCE_AR_SEVERITY)
+			queue_task_work(&m, 0);
+		else if (kill_current_task)
+			queue_task_work(&m, kill_current_task);
 	} else {
 		/*
 		 * Handle an MCE which has happened in kernel space but from
--
2.25.1


2021-02-22 03:53:51

by yaoaili [么爱利]

Subject: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

Since commit b2f9d678e28c ("x86/mce: Check for faults tagged in
EXTABLE_CLASS_FAULT exception table entries"), when there is a
memory MCE_AR_SEVERITY error with no return IP, only a SIGBUS
signal is sent to current. As the page is not poisoned, the coredump
step that the kernel runs for the SIGBUS will touch the error page
again, which results in a fatal error. We need to poison the page and
then kill current in the memory-failure module.

So fix it by using the original checking method.

Signed-off-by: Aili Yao <[email protected]>
---
arch/x86/kernel/cpu/mce/core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index e133ce1e562b..70380d7d98b3 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1414,7 +1414,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		/* If this triggers there is no way to recover. Die hard. */
 		BUG_ON(!on_thread_stack() || !user_mode(regs));
 
-		queue_task_work(&m, kill_current_task);
+		if (worst == MCE_AR_SEVERITY)
+			queue_task_work(&m, 0);
+		else if (kill_current_task)
+			queue_task_work(&m, kill_current_task);
 
 	} else {
 		/*
--
2.25.1
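
For reference, the effect of the hunk above can be modelled outside the kernel. This is only a sketch under stated assumptions: the enum names are hypothetical, and the mapping of the second argument of queue_task_work() onto "SIGBUS only" versus "go through memory failure" follows the description in this thread, not a reading of the full kernel source. Only the lines touched by the hunk are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Only the two severity cases the patch distinguishes; names are illustrative. */
enum severity { SEV_OTHER, SEV_AR };

/* Hypothetical labels for what gets queued for the current task. */
enum task_work {
	WORK_NONE,           /* nothing queued */
	WORK_MEMORY_FAILURE, /* poison the page, then kill if needed */
	WORK_KILL_NOW        /* SIGBUS only, page left unpoisoned */
};

/* Before the patch: queue_task_work(&m, kill_current_task), unconditionally. */
static enum task_work before_patch(enum severity worst, bool kill_current_task)
{
	(void)worst; /* severity is not consulted by the removed line */
	return kill_current_task ? WORK_KILL_NOW : WORK_MEMORY_FAILURE;
}

/* After the patch: AR severity always takes the memory-failure path. */
static enum task_work after_patch(enum severity worst, bool kill_current_task)
{
	if (worst == SEV_AR)
		return WORK_MEMORY_FAILURE;
	if (kill_current_task)
		return WORK_KILL_NOW;
	return WORK_NONE;
}

/* The case from the commit message: AR severity with RIPV clear. */
void check_no_ripv_ar_case(void)
{
	assert(before_patch(SEV_AR, true) == WORK_KILL_NOW);
	assert(after_patch(SEV_AR, true) == WORK_MEMORY_FAILURE);
}
```

In this model the only input whose outcome changes is an AR-severity error with kill_current_task set, which is the "no return IP" case the commit message describes.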

2021-02-22 09:26:41

by Borislav Petkov

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Mon, Feb 22, 2021 at 11:50:07AM +0800, Aili Yao wrote:
> From commit b2f9d678e28c ("x86/mce: Check for faults tagged in
> EXTABLE_CLASS_FAULT exception table entries"), When there is a
> memory MCE_AR_SEVERITY error with no return ip,

What is a "no return ip" - MCG_STATUS_RIPV?

How do you trigger this error?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-02-22 09:36:22

by yaoaili [么爱利]

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Mon, 22 Feb 2021 10:24:03 +0100
Borislav Petkov <[email protected]> wrote:

> On Mon, Feb 22, 2021 at 11:50:07AM +0800, Aili Yao wrote:
> > From commit b2f9d678e28c ("x86/mce: Check for faults tagged in
> > EXTABLE_CLASS_FAULT exception table entries"), When there is a
> > memory MCE_AR_SEVERITY error with no return ip,
>
> What is a "no return ip" - MCG_STATUS_RIPV?

yes

> How do you trigger this error?

You can inject a memory UE into a VM; MCG_STATUS_RIPV should then always be 0.

Best Regards!
Aili Yao

2021-02-22 10:11:39

by Borislav Petkov

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Mon, Feb 22, 2021 at 05:31:09PM +0800, Aili Yao wrote:
> you can inject a memory UE to a VM, it should always be MCG_STATUS_RIPV 0.

So the signature you injected is not something the hardware would
generate - you just didn't set MCG_STATUS_RIPV.

If so, why should the code handle invalid signatures which the hardware
cannot generate?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-02-22 10:13:59

by yaoaili [么爱利]

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Mon, 22 Feb 2021 11:03:56 +0100
Borislav Petkov <[email protected]> wrote:

> On Mon, Feb 22, 2021 at 05:31:09PM +0800, Aili Yao wrote:
> > you can inject a memory UE to a VM, it should always be MCG_STATUS_RIPV 0.
>
> So the signature you injected is not something the hardware would
> generate - you just didn't set MCG_STATUS_RIPV.
>
> If so, why should the code handle invalid signatures which the harware
> cannot generate?
>

So why would Intel provide this MCG_STATUS_RIPV flag? If it will never be set, it would be
better to remove it. And is all the related logic for this flag really needed?

2021-02-22 10:26:41

by Borislav Petkov

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Mon, Feb 22, 2021 at 06:08:19PM +0800, Aili Yao wrote:
> So why would intel provide this MCG_STATUS_RIPV flag, it's better to
> remove it as it will never be set, and all the related logic for this
> flag is really needed ?

Why would it never be set - of course it will be. You don't set it. If
you wanna inject errors, then make sure you inject *valid* errors which
the hardware *actually* generates, not some random ones.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-02-22 11:24:08

by yaoaili [么爱利]

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Mon, 22 Feb 2021 11:22:06 +0100
Borislav Petkov <[email protected]> wrote:

> On Mon, Feb 22, 2021 at 06:08:19PM +0800, Aili Yao wrote:
> > So why would intel provide this MCG_STATUS_RIPV flag, it's better to
> > remove it as it will never be set, and all the related logic for this
> > flag is really needed ?
>
> Why would it never be set - of course it will be. You don't set it. If
> you wanna inject errors, then make sure you inject *valid* errors which
> the hardware *actually* generates, not some random ones.
>

As far as I know, most RAS-related tests use faked errors, not real ones, and they are still really meaningful.

You had better reproduce the issue I tried to fix, or at least read the code in more detail, and you will
know whether it's random and invalid.

Best Regards!
Aili Yao

2021-02-22 12:54:53

by yaoaili [么爱利]

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Mon, 22 Feb 2021 19:21:46 +0800
Aili Yao <[email protected]> wrote:

> On Mon, 22 Feb 2021 11:22:06 +0100
> Borislav Petkov <[email protected]> wrote:
>
> > On Mon, Feb 22, 2021 at 06:08:19PM +0800, Aili Yao wrote:
> > > So why would intel provide this MCG_STATUS_RIPV flag, it's better to
> > > remove it as it will never be set, and all the related logic for this
> > > flag is really needed ?
> >
> > Why would it never be set - of course it will be. You don't set it. If
> > you wanna inject errors, then make sure you inject *valid* errors which
> > the hardware *actually* generates, not some random ones.
> >
>
> As far as I know, Most of RAS related tests are faked, not real errors, and it's really meaningful.
>
> You should better reproduce the issue I tried to fix, or at least read the code more detailly and you will
> know if it's random and invalid
>
I see this in SDM 325462:

AR (Action Required) flag, bit 55 - Indicates (when set) that MCA error code specific recovery action must be
performed by system software at the time this error was signaled. This recovery action must be completed
successfully before any additional work is scheduled for this processor.
-------------------
When the RIPV flag in the IA32_MCG_STATUS is clear, an alternative execution stream needs to be provided;
------------------
when the MCA error code
specific recovery action cannot be successfully completed, system software must shut down
the system. When the AR flag in the IA32_MCi_STATUS register is clear, system software may still take MCA
error code specific recovery action but this is optional; system software can safely resume program execution
at the instruction pointer saved on the stack from the machine check exception when the RIPV flag in the
IA32_MCG_STATUS register is set.
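
As a rough illustration of the two flags in the quoted text, the checks can be written as plain bit tests. This is a sketch, not kernel code: the macro names mirror the ones used in this thread, the bit positions are bit 55 for AR as quoted above and bit 0 for RIPV, and the two helper functions are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)  /* restart IP valid, IA32_MCG_STATUS */
#define MCI_STATUS_AR   (1ULL << 55) /* action required, IA32_MCi_STATUS */

/* AR set and RIPV clear: recovery is mandatory and the saved IP cannot be
 * used, so an alternative execution stream is needed. */
static bool needs_alternative_stream(uint64_t mci_status, uint64_t mcg_status)
{
	return (mci_status & MCI_STATUS_AR) && !(mcg_status & MCG_STATUS_RIPV);
}

/* AR clear and RIPV set: recovery is optional and execution may resume at
 * the instruction pointer saved on the stack. */
static bool may_resume_at_saved_ip(uint64_t mci_status, uint64_t mcg_status)
{
	return !(mci_status & MCI_STATUS_AR) && (mcg_status & MCG_STATUS_RIPV);
}

/* Example values chosen for illustration only. */
void check_ar_ripv_examples(void)
{
	assert(needs_alternative_stream(MCI_STATUS_AR, 0));
	assert(!needs_alternative_stream(MCI_STATUS_AR, MCG_STATUS_RIPV));
	assert(may_resume_at_saved_ip(0, MCG_STATUS_RIPV));
}
```

The case this thread is about is the first one: AR set while RIPV is clear.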

Best Regards!
Aili Yao

2021-02-22 12:57:43

by Borislav Petkov

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Mon, Feb 22, 2021 at 08:17:23PM +0800, Aili Yao wrote:
> AR (Action Required) flag, bit 55 - Indicates (when set) that MCA
> error code specific recovery action must be...

Give me the *exact* MCE signature you're injecting please.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-02-22 13:02:46

by yaoaili [么爱利]

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Mon, 22 Feb 2021 13:22:41 +0100
Borislav Petkov <[email protected]> wrote:

> On Mon, Feb 22, 2021 at 08:17:23PM +0800, Aili Yao wrote:
> > AR (Action Required) flag, bit 55 - Indicates (when set) that MCA
> > error code specific recovery action must be...
>
> Give me the *exact* MCE signature you're injecting please.
>
> Thx.
>

Guest VM (qemu has no way to know the RIPV value, so it always ends up cleared):

Hardware event. This is not a software error.
MCE 0
CPU 9 BANK 9 TSC 103d511e68c
RIP 33:401270
MISC 8c ADDR 10e91d000
TIME 1613974147 Mon Feb 22 01:09:07 2021
MCG status:EIPV MCIP LMCE
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
SRAR
MCA: Data CACHE Level-0 Data-Read Error
STATUS bd80000000000134 MCGSTATUS e
MCGCAP 900010a APICID 9 SOCKETID 9
MICROCODE 1
CPUID Vendor Intel Family 6 Model 85 Step 7

Host:
Hardware event. This is not a software error.
MCE 0
CPU 1 BANK 1 TSC 1ee4f074462
RIP 33:4013a6
MISC 86 ADDR 10ed608000
TIME 1613985132 Mon Feb 22 17:12:12 2021
MCG status:RIPV EIPV MCIP LMCE
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
SRAR
MCA: Data CACHE Level-0 Data-Read Error
STATUS bd80000000100134 MCGSTATUS f
MCGCAP f000c14 APICID 2 SOCKETID 0
MICROCODE 5000021
CPUID Vendor Intel Family 6 Model 85
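
The only difference between the two MCGSTATUS values above (e in the guest, f on the host) is bit 0. A small decoder makes that visible. This is a sketch: the bit positions are the ones implied by the "MCG status:" lines in the two logs, and decode_mcg_status() is a hypothetical helper, not an existing tool.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MCG_STATUS_RIPV (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1) /* error IP valid */
#define MCG_STATUS_MCIP (1ULL << 2) /* machine check in progress */
#define MCG_STATUS_LMCE (1ULL << 3) /* local machine check */

/* Write the names of the set flags into buf, separated by spaces. */
static void decode_mcg_status(uint64_t v, char *buf, size_t len)
{
	static const struct { uint64_t bit; const char *name; } flags[] = {
		{ MCG_STATUS_RIPV, "RIPV" },
		{ MCG_STATUS_EIPV, "EIPV" },
		{ MCG_STATUS_MCIP, "MCIP" },
		{ MCG_STATUS_LMCE, "LMCE" },
	};
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		if (!(v & flags[i].bit))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? " " : "", flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break; /* output truncated, stop here */
		used += (size_t)n;
	}
}
```

Called with 0xe and 0xf and a 64-byte buffer, this yields the same flag lists as the guest and host logs respectively.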

2021-02-22 13:59:38

by Borislav Petkov

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Mon, Feb 22, 2021 at 08:35:49PM +0800, Aili Yao wrote:
> Guest VM, the qemu has no way to know the RIPV value, so always get it
> cleared.

What does that mean?

The guest VM will get the MCE signature it gets from the host kernel so
the host kernel most definitely knows the RIPV value.

It looks like you're testing how guests will handle MCEs which the host
has caught and wants to inject into the guest for further handling. What
is your exact use case? Please explain in detail how I can reproduce it
step-by-step locally.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-02-23 02:38:31

by yaoaili [么爱利]

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Mon, 22 Feb 2021 13:45:50 +0100
Borislav Petkov <[email protected]> wrote:

> On Mon, Feb 22, 2021 at 08:35:49PM +0800, Aili Yao wrote:
> > Guest VM, the qemu has no way to know the RIPV value, so always get it
> > cleared.
>
> What does that mean?
>
> The guest VM will get the MCE signature it gets from the host kernel so
> the host kernel most definitely knows the RIPV value.

When the guest accesses an address with a UE error, it exits guest mode and the host
does the recovery job. Then a SIGBUS is sent to the VCPU and qemu catches the signal.
The signal carries only the address and the error level, not RIPV, so qemu
assumes RIPV is cleared and injects the error into the guest OS.

> It looks like you're testing how guests will handle MCEs which the host
> has caught and wants to inject into the guest for further handling. What
> is your exact use case? Please explain in detail how I can reproduce it
> step-by-step locally.

Yeah, there are multiple steps I do:
1. A small test program in the guest OS accesses an address A, which is where the UC error will be injected.
The address is logged, and using vtop you can get the guest physical address.

2. Use "virsh qemu-monitor-command guest --hmp gpa2hva 0xxxxxx" to get the host user
virtual address.

3. Using vtop, get the host physical address from the above user address.

4. Inject a 0x10 level error using the einj module.

5. Then, when the guest accesses the address, you will see what happens.

Please use the latest upstream kernel for the guest OS. You may also need to change monarch_timeout to a bigger
value, or you will see other issues besides the one discussed here.

Tks

Best Regards!
Aili Yao

2021-02-23 11:31:01

by yaoaili [么爱利]

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Tue, 23 Feb 2021 11:05:38 +0100
Borislav Petkov <[email protected]> wrote:

> On Tue, Feb 23, 2021 at 05:56:40PM +0800, Aili Yao wrote:
> > What i inject is AR error, and I don't see MCG_STATUS_RIPV flag.
>
> Then keep debugging qemu to figure out why that is.
>

What I think is that qemu has no easy way to get the MCE signature from the host, or currently has no method for this at all.
So qemu treats every AR error as having no RIPV; doing more is better than doing less.

Thanks
Aili Yao

2021-02-23 13:52:35

by Borislav Petkov

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Tue, Feb 23, 2021 at 10:27:55AM +0800, Aili Yao wrote:
> When Guest access one address with UE error, it will exit guest mode,
> the host will do the recovery job, and then one SIGBUS is send to
> the VCPU and qemu will catch the signal, there is only address and
> error level no RIPV in signal, so qemu will assume RIPV is cleared and
> inject the error into guest OS.

Lemme see:

void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)

    /* If we get an action required MCE, it has been injected by KVM
     * while the VM was running. An action optional MCE instead should
     * be coming from the main thread, which qemu_init_sigbus identifies
     * as the "early kill" thread.
     */
    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);

    ...

    kvm_mce_inject(cpu, paddr, code);

in that function:

    if (code == BUS_MCEERR_AR) {
        status |= MCI_STATUS_AR | 0x134;
        mcg_status |= MCG_STATUS_EIPV;
    } else {
        status |= 0xc0;
        mcg_status |= MCG_STATUS_RIPV;
    }

That looks like a valid RIP bit to me. Then cpu_x86_inject_mce() gets
that mcg_status and injects it into the guest.

So I can't follow your claim - qemu does handle RIPV just fine, it
seems.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-02-23 14:02:50

by Borislav Petkov

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Tue, Feb 23, 2021 at 05:56:40PM +0800, Aili Yao wrote:
> What i inject is AR error, and I don't see MCG_STATUS_RIPV flag.

Then keep debugging qemu to figure out why that is.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-02-23 14:03:07

by yaoaili [么爱利]

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Tue, 23 Feb 2021 10:43:00 +0100
Borislav Petkov <[email protected]> wrote:

> On Tue, Feb 23, 2021 at 10:27:55AM +0800, Aili Yao wrote:
> > When Guest access one address with UE error, it will exit guest mode,
> > the host will do the recovery job, and then one SIGBUS is send to
> > the VCPU and qemu will catch the signal, there is only address and
> > error level no RIPV in signal, so qemu will assume RIPV is cleared and
> > inject the error into guest OS.
>
> Lemme see:
>
> void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>
> /* If we get an action required MCE, it has been injected by KVM
> * while the VM was running. An action optional MCE instead should
> * be coming from the main thread, which qemu_init_sigbus identifies
> * as the "early kill" thread.
> */
> assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>
> ...
>
> kvm_mce_inject(cpu, paddr, code);
>
> in that function:
>
> if (code == BUS_MCEERR_AR) {
> status |= MCI_STATUS_AR | 0x134;
> mcg_status |= MCG_STATUS_EIPV;
> } else {
> status |= 0xc0;
> mcg_status |= MCG_STATUS_RIPV;
> }
>
> That looks like a valid RIP bit to me. Then cpu_x86_inject_mce() gets
> that mcg_status and injects it into the guest.

What I inject is an AR error, and I don't see the MCG_STATUS_RIPV flag.
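
The branch quoted above can be isolated to show which arm sets which bit. This is a sketch of that branch only: the enum stands in for the SIGBUS codes and is not the real header definition, the function name is hypothetical, and the starting value of mcg_status is passed in by the caller because the thread does not show how qemu initialises it.

```c
#include <assert.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)
#define MCG_STATUS_EIPV (1ULL << 1)

/* Illustrative stand-ins for BUS_MCEERR_AR and BUS_MCEERR_AO. */
enum sketch_code { SKETCH_MCEERR_AR, SKETCH_MCEERR_AO };

/* Mirror of the quoted if/else, restricted to mcg_status. */
static uint64_t sketch_mcg_status(enum sketch_code code, uint64_t base)
{
	uint64_t mcg_status = base;

	if (code == SKETCH_MCEERR_AR)
		mcg_status |= MCG_STATUS_EIPV; /* RIPV is not touched here */
	else
		mcg_status |= MCG_STATUS_RIPV;
	return mcg_status;
}

/* With a base that has RIPV clear, the AR arm never sets it. */
void check_ar_arm_leaves_ripv_clear(void)
{
	assert(!(sketch_mcg_status(SKETCH_MCEERR_AR, 0) & MCG_STATUS_RIPV));
	assert(sketch_mcg_status(SKETCH_MCEERR_AO, 0) & MCG_STATUS_RIPV);
}
```

So, as long as the initial value does not already contain RIPV, only the action-optional arm produces a signature with RIPV set.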

Tks
Aili Yao


2021-02-23 20:21:55

by Luck, Tony

Subject: RE: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

> What I think is qemu has not an easy to get the MCE signature from host or currently no methods for this
> So qemu treat all AR will be No RIPV, Do more is better than do less.

RIPV would be important in the guest in the case where the guest can fix the problem that caused
the machine check and return to the failed instruction to continue.

I think the only case where this happens is a fault in a read-only page mapped from a file (typically
a code page, but could be a data page). In this case memory-failure() unmaps the page with the poison
but Linux can recover by reading data from the file into a new page.

Other cases we send SIGBUS (so go to the signal handler instead of to the faulting instruction).

So it would be good if the state of RIPV could be added to the signal state sent to qemu. If that
isn't possible, then this full recovery case turns into another SIGBUS case.

-Tony

2021-02-24 07:11:31

by yaoaili [么爱利]

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Tue, 23 Feb 2021 16:12:43 +0000
"Luck, Tony" <[email protected]> wrote:

> > What I think is qemu has not an easy to get the MCE signature from host or currently no methods for this
> > So qemu treat all AR will be No RIPV, Do more is better than do less.
>
> RIPV would be important in the guest in the case where the guest can fix the problem that caused
> the machine check and return to the failed instruction to continue.
>
> I think the only case where this happens is a fault in a read-only page mapped from a file (typically
> code page, but could be a data page). In this case memory-failure() unmaps the page with the posion
> but Linux can recover by reading data from the file into a new page.
>
> Other cases we send SIGBUS (so go to the signal handler instead of to the faulting instruction).
>
> So it would be good if the state of RIPV could be added to the signal state sent to qemu. If that
> isn't possible, then this full recovery case turns into another SIGBUS case.

This KVM and VM case of failing recovery for SRAR is just one scenario, I think.
If Intel guarantees that RIPV will always be set when a memory SRAR is triggered, then it is qemu's job to
set RIPV.
Otherwise, if an SRAR can be triggered with RIPV cleared, the same issue applies to the host.

And I think it's better for the VM to know the real RIPV value. That needs more work in qemu and the kernel, if it is possible.

Thanks
Aili Yao

2021-03-24 09:30:42

by yaoaili [么爱利]

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Wed, 24 Feb 2021 10:39:21 +0800
Aili Yao <[email protected]> wrote:

> On Tue, 23 Feb 2021 16:12:43 +0000
> "Luck, Tony" <[email protected]> wrote:
>
> > > What I think is qemu has not an easy to get the MCE signature from host or currently no methods for this
> > > So qemu treat all AR will be No RIPV, Do more is better than do less.
> >
> > RIPV would be important in the guest in the case where the guest can fix the problem that caused
> > the machine check and return to the failed instruction to continue.
> >
> > I think the only case where this happens is a fault in a read-only page mapped from a file (typically
> > code page, but could be a data page). In this case memory-failure() unmaps the page with the posion
> > but Linux can recover by reading data from the file into a new page.
> >
> > Other cases we send SIGBUS (so go to the signal handler instead of to the faulting instruction).
> >
> > So it would be good if the state of RIPV could be added to the signal state sent to qemu. If that
> > isn't possible, then this full recovery case turns into another SIGBUS case.
>
> This KVM and VM case of failing recovery for SRAR is just one scenario I think,
> If Intel guarantee that when memory SRAR is triggered, RIPV will always be set, then it's the job of qemu to
> set the RIPV instead.
> Or if When SRAR is triggered with RIPV cleared, the same issue will be true for host.
>
> And I think it's better for VM to know the real RIPV value, It need more work in qemu and kernel if possible.
>
> Thanks
> Aili Yao

Adding the qemu list to this topic; this is a really bad issue.

Issue report:
When a VM receives an SRAR memory failure from the host, it always has RIPV cleared; the VM then processes it and triggers a panic!

Can any qemu maintainer fix this?

Suggestion:
qemu gets the true value of RIPV from the host, then injects it into the VM accordingly.

Thanks
Aili Yao!

2021-03-24 10:26:54

by yaoaili [么爱利]

Subject: Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

On Wed, 24 Mar 2021 10:59:50 +0800
Aili Yao <[email protected]> wrote:

> On Wed, 24 Feb 2021 10:39:21 +0800
> Aili Yao <[email protected]> wrote:
>
> > On Tue, 23 Feb 2021 16:12:43 +0000
> > "Luck, Tony" <[email protected]> wrote:
> >
> > > > What I think is qemu has not an easy to get the MCE signature from host or currently no methods for this
> > > > So qemu treat all AR will be No RIPV, Do more is better than do less.
> > >
> > > RIPV would be important in the guest in the case where the guest can fix the problem that caused
> > > the machine check and return to the failed instruction to continue.
> > >
> > > I think the only case where this happens is a fault in a read-only page mapped from a file (typically
> > > code page, but could be a data page). In this case memory-failure() unmaps the page with the posion
> > > but Linux can recover by reading data from the file into a new page.
> > >
> > > Other cases we send SIGBUS (so go to the signal handler instead of to the faulting instruction).
> > >
> > > So it would be good if the state of RIPV could be added to the signal state sent to qemu. If that
> > > isn't possible, then this full recovery case turns into another SIGBUS case.
> >
> > This KVM and VM case of failing recovery for SRAR is just one scenario I think,
> > If Intel guarantee that when memory SRAR is triggered, RIPV will always be set, then it's the job of qemu to
> > set the RIPV instead.
> > Or if When SRAR is triggered with RIPV cleared, the same issue will be true for host.
> >
> > And I think it's better for VM to know the real RIPV value, It need more work in qemu and kernel if possible.
> >
> > Thanks
> > Aili Yao
>
> ADD this topic to qemu list, this is really one bad issue.
>
> Issue report:
> when VM receive one SRAR memory failure from host, it all has RIPV cleared, and then vm process it and trigger one panic!
>
> Can any qemu maintainer fix this?
>
> Suggestion:
> qemu get the true value of RIPV from host, the inject it to VM accordingly.

Sorry for my previous description; I may not have described the issue clearly.
I found this issue when doing a memory SRAR test for a KVM virtual machine. The steps are:

1. Inject one uncorrectable error at a specific memory address A.
2. A user process in the VM then accesses address A and triggers an MCE exception on the host.
3. In do_machine_check(), the host kernel checks the related registers and does the recovery job via memory_failure().
4. Normally a BUS_MCEERR_AR SIGBUS is sent to the specific core that triggered the error.
5. Qemu takes control and injects this event into the VM. All the information qemu can get currently is the error code
BUS_MCEERR_AR and the virtual address. In the qemu inject function:
    if (code == BUS_MCEERR_AR) {
        status |= MCI_STATUS_AR | 0x134;
        mcg_status |= MCG_STATUS_EIPV;
    } else {
        status |= 0xc0;
        mcg_status |= MCG_STATUS_RIPV;
    }
For the BUS_MCEERR_AR case, MCG_STATUS_RIPV will always be cleared.

6. Then, in the VM kernel, do_machine_check() hits this:
    if (!(m.mcgstatus & MCG_STATUS_RIPV))
        kill_current_task = 1;
and goes on to force_sig(SIGBUS) without calling memory_failure(),
so at this point the page is not marked hwpoison.

7. The VM kernel wants to exit to user mode and then process the SIGBUS signal.
As SIGBUS is a fatal signal, the coredump-related work is called.

8. The coredump then dumps the mapped user-space memory, including the error page.

9. The UE is triggered again and qemu takes control again, then injects this UE event into the VM.
This time the error is triggered in kernel code, so the VM panics.
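
Steps 6 to 9 can be condensed into a toy model of the guest side. It is only a sketch of the sequence described above, with hypothetical names throughout; in particular, it assumes that a page marked hwpoison is not read by the coredump, which is the behaviour the proposed fix relies on.

```c
#include <assert.h>
#include <stdbool.h>

enum outcome { TASK_KILLED, GUEST_PANIC };

/* Minimal state of the page at address A. */
struct page_state {
	bool has_ue;   /* the uncorrectable error is still present */
	bool hwpoison; /* memory_failure() has marked the page */
};

/* Model of the guest handling a user-mode SRAR. */
static enum outcome handle_user_srar(struct page_state *pg, bool call_memory_failure)
{
	if (call_memory_failure)
		pg->hwpoison = true; /* page is marked before the task is killed */

	/* SIGBUS is fatal, so the coredump walks the mapped user pages.
	 * Assumption: a hwpoison page is skipped, an unmarked one is read. */
	if (pg->has_ue && !pg->hwpoison)
		return GUEST_PANIC; /* second UE, now consumed in kernel code */

	return TASK_KILLED;
}

/* The two paths discussed in this thread. */
void check_both_paths(void)
{
	struct page_state a = { .has_ue = true, .hwpoison = false };
	struct page_state b = { .has_ue = true, .hwpoison = false };

	assert(handle_user_srar(&a, false) == GUEST_PANIC);
	assert(handle_user_srar(&b, true) == TASK_KILLED);
}
```

The first call corresponds to the current behaviour with RIPV cleared (SIGBUS only), the second to going through memory_failure() before the kill.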

I don't know how this issue can be fixed cleanly; maybe the qemu developers can help with this.
If qemu can fix this, that will be great!

--
Thanks!
Aili Yao