Both EX_TYPE_FAULT_MCE_SAFE and EX_TYPE_DEFAULT_MCE_SAFE exception
fixup types are used to identify fixups which allow in kernel #MC
recovery, that is the Machine Check Safe Copy.
If an MCE which has happened in kernel space but from which the kernel
can recover, mce.kflags MCE_IN_KERNEL_RECOV will set in error_context(),
and we try to fixup the exception in do_machine_check(). But due to lack
of MCE_IN_KERNEL_COPYIN, although the kernel won't panic, the corrupted
page don't be isolated, new one maybe consume it again, which is not what
we expected.
In order to avoid above issue, some hwpoison recover process[1][2][3],
memory_failure_queue() is called to cope with such unhandled corrupted
pages, and recently coredump hwpoison recovery support[4] is asked to
do the same thing, also there are some other already existed MC-safe
copy scenarios, eg, nvdimm, dm-writecache, dax, which don't isolate
corrupted pages.
The best way to fix them is set MCE_IN_KERNEL_COPYIN for MC-Safe Copy,
then let the core do_machine_check() to isolate corrupted page instead
of doing it one-by-one.
[1] commit d302c2398ba2 ("mm, hwpoison: when copy-on-write hits poison, take page offline")
[2] commit 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage copy-on-write faults")
[3] commit 6b970599e807 ("mm: hwpoison: support recovery from ksm_might_need_to_copy()")
[4] https://lkml.kernel.org/r/[email protected]
Reviewed-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Signed-off-by: Kefeng Wang <[email protected]>
---
v2:
- try to describe more clear problem statement, per Dave Hansen
- collect RB
arch/x86/kernel/cpu/mce/severity.c | 3 +--
mm/ksm.c | 1 -
mm/memory.c | 12 +++---------
3 files changed, 4 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index c4477162c07d..63e94484c5d6 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -293,12 +293,11 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
case EX_TYPE_COPY:
if (!copy_user)
return IN_KERNEL;
- m->kflags |= MCE_IN_KERNEL_COPYIN;
fallthrough;
case EX_TYPE_FAULT_MCE_SAFE:
case EX_TYPE_DEFAULT_MCE_SAFE:
- m->kflags |= MCE_IN_KERNEL_RECOV;
+ m->kflags |= MCE_IN_KERNEL_RECOV | MCE_IN_KERNEL_COPYIN;
return IN_KERNEL_RECOV;
default:
diff --git a/mm/ksm.c b/mm/ksm.c
index 0156bded3a66..7abdf4892387 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2794,7 +2794,6 @@ struct page *ksm_might_need_to_copy(struct page *page,
if (new_page) {
if (copy_mc_user_highpage(new_page, page, address, vma)) {
put_page(new_page);
- memory_failure_queue(page_to_pfn(page), 0);
return ERR_PTR(-EHWPOISON);
}
SetPageDirty(new_page);
diff --git a/mm/memory.c b/mm/memory.c
index 8358f3b853f2..74873e7126aa 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2813,10 +2813,8 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
unsigned long addr = vmf->address;
if (likely(src)) {
- if (copy_mc_user_highpage(dst, src, addr, vma)) {
- memory_failure_queue(page_to_pfn(src), 0);
+ if (copy_mc_user_highpage(dst, src, addr, vma))
return -EHWPOISON;
- }
return 0;
}
@@ -5851,10 +5849,8 @@ static int copy_user_gigantic_page(struct folio *dst, struct folio *src,
cond_resched();
if (copy_mc_user_highpage(dst_page, src_page,
- addr + i*PAGE_SIZE, vma)) {
- memory_failure_queue(page_to_pfn(src_page), 0);
+ addr + i*PAGE_SIZE, vma))
return -EHWPOISON;
- }
}
return 0;
}
@@ -5870,10 +5866,8 @@ static int copy_subpage(unsigned long addr, int idx, void *arg)
struct copy_subpage_arg *copy_arg = arg;
if (copy_mc_user_highpage(copy_arg->dst + idx, copy_arg->src + idx,
- addr, copy_arg->vma)) {
- memory_failure_queue(page_to_pfn(copy_arg->src + idx), 0);
+ addr, copy_arg->vma))
return -EHWPOISON;
- }
return 0;
}
--
2.35.3
On Fri, May 26, 2023 at 02:32:42PM +0800, Kefeng Wang wrote:
> The best way to fix them is set MCE_IN_KERNEL_COPYIN for MC-Safe Copy,
> then let the core do_machine_check() to isolate corrupted page instead
> of doing it one-by-one.
No, this whole thing is confused.
* Indicates an MCE that happened in kernel space while copying data
* from user.
#define MCE_IN_KERNEL_COPYIN
This is a very specific exception type: EX_TYPE_COPY which got added by
278b917f8cb9 ("x86/mce: Add _ASM_EXTABLE_CPY for copy user access")
but Linus then removed all such user copy exception points in
034ff37d3407 ("x86: rewrite '__copy_user_nocache' function")
So now that EX_TYPE_COPY never happens.
And what you're doing is lumping the handling for
EX_TYPE_DEFAULT_MCE_SAFE and EX_TYPE_FAULT_MCE_SAFE together and saying
that the MCE happened while copying data from user.
And XSTATE_OP() is one example where this is not really the case.
So no, this is not correct.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 2023/5/26 15:09, Borislav Petkov wrote:
> On Fri, May 26, 2023 at 02:32:42PM +0800, Kefeng Wang wrote:
>> The best way to fix them is set MCE_IN_KERNEL_COPYIN for MC-Safe Copy,
>> then let the core do_machine_check() to isolate corrupted page instead
>> of doing it one-by-one.
>
> No, this whole thing is confused.
>
> * Indicates an MCE that happened in kernel space while copying data
> * from user.
>
> #define MCE_IN_KERNEL_COPYIN
>
> This is a very specific exception type: EX_TYPE_COPY which got added by
>
> 278b917f8cb9 ("x86/mce: Add _ASM_EXTABLE_CPY for copy user access")
>
> but Linus then removed all such user copy exception points in
>
> 034ff37d3407 ("x86: rewrite '__copy_user_nocache' function")
>
> So now that EX_TYPE_COPY never happens.
Is this broken the recover when kernel was copying from user space?
+ Youquan could you help to check it?
>
> And what you're doing is lumping the handling for
> EX_TYPE_DEFAULT_MCE_SAFE and EX_TYPE_FAULT_MCE_SAFE together and saying
> that the MCE happened while copying data from user.
>
> And XSTATE_OP() is one example where this is not really the case.
>
Oh, for XSTATE_OP(), it uses EX_TYPE_DEFAULT_MCE_SAFE, but I'm focus on
EX_TYPE_DEFAULT_MCE_SAFE, which use copy_mc (arch/x86/lib/copy_mc_64.S),
like I maintained in changelog, CoW/Coredump/nvdimm/dax, they use
copy_mc_xxx function, sorry for mixed them up.
> So no, this is not correct.
so only add MCE_IN_KERNEL_COPYIN for EX_TYPE_DEFAULT_MCE_SAFE?
diff --git a/arch/x86/kernel/cpu/mce/severity.c
b/arch/x86/kernel/cpu/mce/severity.c
index c4477162c07d..6d2587994623 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -293,11 +293,11 @@ static noinstr int error_context(struct mce *m,
struct pt_regs *regs)
case EX_TYPE_COPY:
if (!copy_user)
return IN_KERNEL;
+ fallthrough;
+ case EX_TYPE_DEFAULT_MCE_SAFE:
m->kflags |= MCE_IN_KERNEL_COPYIN;
fallthrough;
-
case EX_TYPE_FAULT_MCE_SAFE:
- case EX_TYPE_DEFAULT_MCE_SAFE:
m->kflags |= MCE_IN_KERNEL_RECOV;
return IN_KERNEL_RECOV;
Correct me if I am wrong, thanks for you reviewing.
>
On 2023/5/26 20:18, Kefeng Wang wrote:
>
>
> On 2023/5/26 15:09, Borislav Petkov wrote:
>> On Fri, May 26, 2023 at 02:32:42PM +0800, Kefeng Wang wrote:
>>> The best way to fix them is set MCE_IN_KERNEL_COPYIN for MC-Safe Copy,
>>> then let the core do_machine_check() to isolate corrupted page instead
>>> of doing it one-by-one.
>>
>> No, this whole thing is confused.
>>
>> * Indicates an MCE that happened in kernel space while copying data
>> * from user.
>>
>> #define MCE_IN_KERNEL_COPYIN
>>
>> This is a very specific exception type: EX_TYPE_COPY which got added by
>>
>> 278b917f8cb9 ("x86/mce: Add _ASM_EXTABLE_CPY for copy user access")
>>
>> but Linus then removed all such user copy exception points in
>>
>> 034ff37d3407 ("x86: rewrite '__copy_user_nocache' function")
>>
>> So now that EX_TYPE_COPY never happens.
>
> Is this broken the recover when kernel was copying from user space?
>
> + Youquan could you help to check it?
>
>>
>> And what you're doing is lumping the handling for
>> EX_TYPE_DEFAULT_MCE_SAFE and EX_TYPE_FAULT_MCE_SAFE together and saying
>> that the MCE happened while copying data from user.
>>
>> And XSTATE_OP() is one example where this is not really the case.
>>
>
> Oh, for XSTATE_OP(), it uses EX_TYPE_DEFAULT_MCE_SAFE, but I'm focus on
> EX_TYPE_DEFAULT_MCE_SAFE, which use copy_mc (arch/x86/lib/copy_mc_64.S),
> like I maintained in changelog, CoW/Coredump/nvdimm/dax, they use
> copy_mc_xxx function, sorry for mixed them up.
>
>
>> So no, this is not correct.
>
> so only add MCE_IN_KERNEL_COPYIN for EX_TYPE_DEFAULT_MCE_SAFE?
>
> diff --git a/arch/x86/kernel/cpu/mce/severity.c
> b/arch/x86/kernel/cpu/mce/severity.c
> index c4477162c07d..6d2587994623 100644
> --- a/arch/x86/kernel/cpu/mce/severity.c
> +++ b/arch/x86/kernel/cpu/mce/severity.c
> @@ -293,11 +293,11 @@ static noinstr int error_context(struct mce *m,
> struct pt_regs *regs)
> case EX_TYPE_COPY:
> if (!copy_user)
> return IN_KERNEL;
> + fallthrough;
> + case EX_TYPE_DEFAULT_MCE_SAFE:
> m->kflags |= MCE_IN_KERNEL_COPYIN;
> fallthrough;
As mentioned above, I am focus on copy_mc_XXX calling, it will
abort if the exception fires when accessing the source, and we
want to isolate the corrupted src page, maybe we could a new flag
to indicate this scenario, the *Final Goals* is to let core
do_machine_check to deal with the corrupted src page.
Any suggestiong, thanks
> -
> case EX_TYPE_FAULT_MCE_SAFE:
> - case EX_TYPE_DEFAULT_MCE_SAFE:
> m->kflags |= MCE_IN_KERNEL_RECOV;
> return IN_KERNEL_RECOV;
>
> Correct me if I am wrong, thanks for you reviewing.
>
>
>>
> As mentioned above, I am focus on copy_mc_XXX calling, it will
> abort if the exception fires when accessing the source, and we
> want to isolate the corrupted src page, maybe we could a new flag
> to indicate this scenario, the *Final Goals* is to let core
> do_machine_check to deal with the corrupted src page.
A new flag seems like a good direction. Re-using the existing
MCE_IN_KERNEL_COPYIN one happens to work now, but
is causing confusing now (since this case isn't a copy-from-user)
and may cause code maintenance issues in the future.
-Tony
On Fri, Jun 02, 2023 at 03:12:30PM +0000, Luck, Tony wrote:
> > As mentioned above, I am focus on copy_mc_XXX calling, it will
> > abort if the exception fires when accessing the source, and we
> > want to isolate the corrupted src page, maybe we could a new flag
> > to indicate this scenario, the *Final Goals* is to let core
> > do_machine_check to deal with the corrupted src page.
>
> A new flag seems like a good direction.
Before anything happens here, the fate of the now unused EX_TYPE_COPY
needs to be decided first. Then new stuff.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 2023/6/3 0:01, Borislav Petkov wrote:
> On Fri, Jun 02, 2023 at 03:12:30PM +0000, Luck, Tony wrote:
>>> As mentioned above, I am focus on copy_mc_XXX calling, it will
>>> abort if the exception fires when accessing the source, and we
>>> want to isolate the corrupted src page, maybe we could a new flag
>>> to indicate this scenario, the *Final Goals* is to let core
>>> do_machine_check to deal with the corrupted src page.
>>
>> A new flag seems like a good direction.
>
> Before anything happens here, the fate of the now unused EX_TYPE_COPY
> needs to be decided first. Then new stuff.
EX_TYPE_COPY is actually not related EX_TYPE_DEFAULT_MCE_SAFE, but we
could recheck it again, it seems that mce recover from copy from user
is broken now. Maybe change _ASM_EXTABLE_UA to _ASM_EXTABLE_CPY in
copy_user_64.S again as commit[1] did before.
Thank you for the advice.
[1] 278b917f8cb9 x86/mce: Add _ASM_EXTABLE_CPY for copy user access
>
> Thx.
>