On Sat, May 28, 2022 at 06:50:54AM +0000, Tong Tiangen wrote:
> During the processing of arm64 kernel hardware memory errors(do_sea()), if
> the errors is consumed in the kernel, the current processing is panic.
> However, it is not optimal.
>
> Take uaccess for example, if the uaccess operation fails due to memory
> error, only the user process will be affected, kill the user process
> and isolate the user page with hardware memory errors is a better choice.
>
> This patch only enable machine error check framework, it add exception
> fixup before kernel panic in do_sea() and only limit the consumption of
> hardware memory errors in kernel mode triggered by user mode processes.
> If fixup successful, panic can be avoided.
>
> Signed-off-by: Tong Tiangen <[email protected]>
> ---
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/extable.h | 1 +
> arch/arm64/mm/extable.c | 17 +++++++++++++++++
> arch/arm64/mm/fault.c | 27 ++++++++++++++++++++++++++-
> 4 files changed, 45 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index aaeb70358979..a3b12ff0cd7f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -19,6 +19,7 @@ config ARM64
> select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
> select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
> select ARCH_HAS_CACHE_LINE_SIZE
> + select ARCH_HAS_COPY_MC if ACPI_APEI_GHES
> select ARCH_HAS_CURRENT_STACK_POINTER
> select ARCH_HAS_DEBUG_VIRTUAL
> select ARCH_HAS_DEBUG_VM_PGTABLE
> diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
> index 72b0e71cc3de..f80ebd0addfd 100644
> --- a/arch/arm64/include/asm/extable.h
> +++ b/arch/arm64/include/asm/extable.h
> @@ -46,4 +46,5 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
> #endif /* !CONFIG_BPF_JIT */
>
> bool fixup_exception(struct pt_regs *regs);
> +bool fixup_exception_mc(struct pt_regs *regs);
> #endif
> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
> index 228d681a8715..c301dcf6335f 100644
> --- a/arch/arm64/mm/extable.c
> +++ b/arch/arm64/mm/extable.c
> @@ -9,6 +9,7 @@
>
> #include <asm/asm-extable.h>
> #include <asm/ptrace.h>
> +#include <asm/esr.h>
>
> static inline unsigned long
> get_ex_fixup(const struct exception_table_entry *ex)
> @@ -76,3 +77,19 @@ bool fixup_exception(struct pt_regs *regs)
>
> BUG();
> }
> +
> +bool fixup_exception_mc(struct pt_regs *regs)
> +{
> + const struct exception_table_entry *ex;
> +
> + ex = search_exception_tables(instruction_pointer(regs));
> + if (!ex)
> + return false;
> +
> + /*
> + * This is not complete, More Machine check safe extable type can
> + * be processed here.
> + */
> +
> + return false;
> +}
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index c5e11768e5c1..b262bd282a89 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -696,6 +696,29 @@ static int do_bad(unsigned long far, unsigned long esr, struct pt_regs *regs)
> return 1; /* "fault" */
> }
>
> +static bool arm64_do_kernel_sea(unsigned long addr, unsigned int esr,
> + struct pt_regs *regs, int sig, int code)
> +{
> + if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
> + return false;
> +
> + if (user_mode(regs) || !current->mm)
> + return false;
What's the `!current->mm` check for?
> +
> + if (apei_claim_sea(regs) < 0)
> + return false;
> +
> + if (!fixup_exception_mc(regs))
> + return false;
I thought we still wanted to signal the task in this case? Or do you expect to
add that into `fixup_exception_mc()` ?
> +
> + set_thread_esr(0, esr);
Why are we not setting the address? Is that deliberate, or an oversight?
> +
> + arm64_force_sig_fault(sig, code, addr,
> + "Uncorrected hardware memory error in kernel-access\n");
I think the wording here is misleading since we don't expect to recover from
accesses to kernel memory, and would be better as something like:

  "Uncorrected memory error on access to user memory\n"

Thanks,
Mark.
> +
> + return true;
> +}
> +
> static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
> {
> const struct fault_info *inf;
> @@ -721,7 +744,9 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
> */
> siaddr = untagged_addr(far);
> }
> - arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
> +
> + if (!arm64_do_kernel_sea(siaddr, esr, regs, inf->sig, inf->code))
> + arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
>
> return 0;
> }
> --
> 2.25.1
>
On 2022/6/17 16:55, Mark Rutland wrote:
> On Sat, May 28, 2022 at 06:50:54AM +0000, Tong Tiangen wrote:
>> During the processing of arm64 kernel hardware memory errors(do_sea()), if
>> the errors is consumed in the kernel, the current processing is panic.
>> However, it is not optimal.
>>
>> Take uaccess for example, if the uaccess operation fails due to memory
>> error, only the user process will be affected, kill the user process
>> and isolate the user page with hardware memory errors is a better choice.
>>
>> This patch only enable machine error check framework, it add exception
>> fixup before kernel panic in do_sea() and only limit the consumption of
>> hardware memory errors in kernel mode triggered by user mode processes.
>> If fixup successful, panic can be avoided.
>>
>> Signed-off-by: Tong Tiangen <[email protected]>
>> ---
>> arch/arm64/Kconfig | 1 +
>> arch/arm64/include/asm/extable.h | 1 +
>> arch/arm64/mm/extable.c | 17 +++++++++++++++++
>> arch/arm64/mm/fault.c | 27 ++++++++++++++++++++++++++-
>> 4 files changed, 45 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index aaeb70358979..a3b12ff0cd7f 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -19,6 +19,7 @@ config ARM64
>> select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
>> select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
>> select ARCH_HAS_CACHE_LINE_SIZE
>> + select ARCH_HAS_COPY_MC if ACPI_APEI_GHES
>> select ARCH_HAS_CURRENT_STACK_POINTER
>> select ARCH_HAS_DEBUG_VIRTUAL
>> select ARCH_HAS_DEBUG_VM_PGTABLE
>> diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
>> index 72b0e71cc3de..f80ebd0addfd 100644
>> --- a/arch/arm64/include/asm/extable.h
>> +++ b/arch/arm64/include/asm/extable.h
>> @@ -46,4 +46,5 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
>> #endif /* !CONFIG_BPF_JIT */
>>
>> bool fixup_exception(struct pt_regs *regs);
>> +bool fixup_exception_mc(struct pt_regs *regs);
>> #endif
>> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
>> index 228d681a8715..c301dcf6335f 100644
>> --- a/arch/arm64/mm/extable.c
>> +++ b/arch/arm64/mm/extable.c
>> @@ -9,6 +9,7 @@
>>
>> #include <asm/asm-extable.h>
>> #include <asm/ptrace.h>
>> +#include <asm/esr.h>
>>
>> static inline unsigned long
>> get_ex_fixup(const struct exception_table_entry *ex)
>> @@ -76,3 +77,19 @@ bool fixup_exception(struct pt_regs *regs)
>>
>> BUG();
>> }
>> +
>> +bool fixup_exception_mc(struct pt_regs *regs)
>> +{
>> + const struct exception_table_entry *ex;
>> +
>> + ex = search_exception_tables(instruction_pointer(regs));
>> + if (!ex)
>> + return false;
>> +
>> + /*
>> + * This is not complete, More Machine check safe extable type can
>> + * be processed here.
>> + */
>> +
>> + return false;
>> +}
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index c5e11768e5c1..b262bd282a89 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -696,6 +696,29 @@ static int do_bad(unsigned long far, unsigned long esr, struct pt_regs *regs)
>> return 1; /* "fault" */
>> }
>>
>> +static bool arm64_do_kernel_sea(unsigned long addr, unsigned int esr,
>> + struct pt_regs *regs, int sig, int code)
>> +{
>> + if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
>> + return false;
>> +
>> + if (user_mode(regs) || !current->mm)
>> + return false;
>
> What's the `!current->mm` check for?
At first, I thought that only user processes should have the opportunity to
recover when they trigger a memory error.

But that restriction seems unreasonable. When a kernel thread triggers a
memory error, it can also be recovered. For instance:

https://lore.kernel.org/linux-mm/[email protected]/

And I think an if (!current->mm) check should be added below:

  if (!current->mm) {
          set_thread_esr(0, esr);
          arm64_force_sig_fault(...);
  }
  return true;
>
>> +
>> + if (apei_claim_sea(regs) < 0)
>> + return false;
>> +
>> + if (!fixup_exception_mc(regs))
>> + return false;
>
> I thought we still wanted to signal the task in this case? Or do you expect to
> add that into `fixup_exception_mc()` ?
Yeah, we return false here, and the task will be signalled in do_sea() ->
arm64_notify_die().
>
>> +
>> + set_thread_esr(0, esr);
>
> Why are we not setting the address? Is that deliberate, or an oversight?
Setting fault_address to 0 here follows the logic of arm64_notify_die():

  void arm64_notify_die(...)
  {
          if (user_mode(regs)) {
                  WARN_ON(regs != current_pt_regs());
                  current->thread.fault_address = 0;
                  current->thread.fault_code = err;

                  arm64_force_sig_fault(signo, sicode, far, str);
          } else {
                  die(str, regs, err);
          }
  }

I don't know exactly why it does this. Do you know why arm64_notify_die()
was written this way? :)
>
>> +
>> + arm64_force_sig_fault(sig, code, addr,
>> + "Uncorrected hardware memory error in kernel-access\n");
>
> I think the wording here is misleading since we don't expect to recover from
> accesses to kernel memory, and would be better as something like:
>
> "Uncorrected memory error on access to user memory\n"
OK, agreed.
Thanks,
Tong.
>
> Thanks,
> Mark.
>
>> +
>> + return true;
>> +}
>> +
>> static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
>> {
>> const struct fault_info *inf;
>> @@ -721,7 +744,9 @@ static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
>> */
>> siaddr = untagged_addr(far);
>> }
>> - arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
>> +
>> + if (!arm64_do_kernel_sea(siaddr, esr, regs, inf->sig, inf->code))
>> + arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
>>
>> return 0;
>> }
>> --
>> 2.25.1
>>
>
On Sat, Jun 18, 2022 at 05:18:55PM +0800, Tong Tiangen wrote:
> On 2022/6/17 16:55, Mark Rutland wrote:
> > On Sat, May 28, 2022 at 06:50:54AM +0000, Tong Tiangen wrote:
> > > +static bool arm64_do_kernel_sea(unsigned long addr, unsigned int esr,
> > > + struct pt_regs *regs, int sig, int code)
> > > +{
> > > + if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
> > > + return false;
> > > +
> > > + if (user_mode(regs) || !current->mm)
> > > + return false;
> >
> > What's the `!current->mm` check for?
>
> At first, I considered that only user processes have the opportunity to
> recover when they trigger memory error.
>
> But it seems that this restriction is unreasonable. When the kernel thread
> triggers memory error, it can also be recovered. for instance:
>
> https://lore.kernel.org/linux-mm/[email protected]/
>
> And i think if(!current->mm) shoud be added below:
>
> if(!current->mm) {
> set_thread_esr(0, esr);
> arm64_force_sig_fault(...);
> }
> return true;
Why does 'current->mm' have anything to do with this, though?
There can be kernel threads with `current->mm` set in unusual circumstances
(and there's a lot of kernel code out there which handles that wrong), so if
you want to treat user tasks differently, we should be doing something like
checking PF_KTHREAD, or adding something like an is_user_task() helper.
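
To make the distinction concrete, here is a minimal userspace sketch, not
kernel code. The task_model and mm_model structs and the has_mm() function are
stand-ins invented for illustration, is_user_task() is the hypothetical helper
mentioned above, and the PF_KTHREAD value is only mirrored here so the example
compiles on its own. It shows how an mm-based check misclassifies a kernel
thread that has temporarily adopted a user mm (e.g. via kthread_use_mm()),
while a flag-based check does not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's PF_KTHREAD task flag (mirrored for illustration). */
#define PF_KTHREAD 0x00200000u

/* Invented placeholder for an address space; only its presence matters here. */
struct mm_model { int unused; };

/* Invented, heavily reduced model of a task: just the two fields discussed. */
struct task_model {
	unsigned int flags;
	struct mm_model *mm;
};

/* The mm-based test: treats any task that currently has an mm as a user task. */
static bool has_mm(const struct task_model *t)
{
	return t->mm != NULL;
}

/* Hypothetical helper: a task is a user task iff it is not a kernel thread. */
static bool is_user_task(const struct task_model *t)
{
	return !(t->flags & PF_KTHREAD);
}
```

For a kernel thread that has borrowed an mm, has_mm() returns true although
the task is not a user task; is_user_task() returns false for it regardless of
the mm pointer.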
[...]
> > > +
> > > + if (apei_claim_sea(regs) < 0)
> > > + return false;
> > > +
> > > + if (!fixup_exception_mc(regs))
> > > + return false;
> >
> > I thought we still wanted to signal the task in this case? Or do you expect to
> > add that into `fixup_exception_mc()` ?
>
> Yeah, here return false and will signal to task in do_sea() ->
> arm64_notify_die().
I mean when we do the fixup.
I thought the idea was to apply the fixup (to stop the kernel from crashing),
but still to deliver a fatal signal to the user task since we can't do what the
user task asked us to.
> > > +
> > > + set_thread_esr(0, esr);
> >
> > Why are we not setting the address? Is that deliberate, or an oversight?
>
> Here set fault_address to 0, i refer to the logic of arm64_notify_die().
>
> void arm64_notify_die(...)
> {
> if (user_mode(regs)) {
> WARN_ON(regs != current_pt_regs());
> current->thread.fault_address = 0;
> current->thread.fault_code = err;
>
> arm64_force_sig_fault(signo, sicode, far, str);
> } else {
> die(str, regs, err);
> }
> }
>
> I don't know exactly why and do you know why arm64_notify_die() did this? :)
To be honest, I don't know, and that looks equally suspicious to me.

Looking at the git history, that was added in commit:

  9141300a5884b57c ("arm64: Provide read/write fault information in compat signal handlers")

... so maybe Catalin recalls why.

Perhaps the assumption is just that this will be fatal and so unimportant? ...
but in that case the same logic would apply to the ESR value, so it's not clear
to me.

Mark.
On 2022/6/18 20:52, Mark Rutland wrote:
> On Sat, Jun 18, 2022 at 05:18:55PM +0800, Tong Tiangen wrote:
>> On 2022/6/17 16:55, Mark Rutland wrote:
>>> On Sat, May 28, 2022 at 06:50:54AM +0000, Tong Tiangen wrote:
>>>> +static bool arm64_do_kernel_sea(unsigned long addr, unsigned int esr,
>>>> + struct pt_regs *regs, int sig, int code)
>>>> +{
>>>> + if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
>>>> + return false;
>>>> +
>>>> + if (user_mode(regs) || !current->mm)
>>>> + return false;
>>>
>>> What's the `!current->mm` check for?
>>
>> At first, I considered that only user processes have the opportunity to
>> recover when they trigger memory error.
>>
>> But it seems that this restriction is unreasonable. When the kernel thread
>> triggers memory error, it can also be recovered. for instance:
>>
>> https://lore.kernel.org/linux-mm/[email protected]/
>>
>> And i think if(!current->mm) shoud be added below:
>>
>> if(!current->mm) {
>> set_thread_esr(0, esr);
>> arm64_force_sig_fault(...);
>> }
>> return true;
>
> Why does 'current->mm' have anything to do with this, though?
Sorry, that was a typo. My original logic was:

  if (current->mm) {
          [...]
  }
>
> There can be kernel threads with `current->mm` set in unusual circumstances
> (and there's a lot of kernel code out there which handles that wrong), so if
> you want to treat user tasks differently, we should be doing something like
> checking PF_KTHREAD, or adding something like an is_user_task() helper.
>
OK, I do want to treat user tasks differently here, and I hadn't taken into
account what you said. This will be fixed in the next version according to
your suggestion, as follows:

  if (!(current->flags & PF_KTHREAD)) {
          set_thread_esr(0, esr);
          arm64_force_sig_fault(...);
  }
  return true;
> [...]
>
>>>> +
>>>> + if (apei_claim_sea(regs) < 0)
>>>> + return false;
>>>> +
>>>> + if (!fixup_exception_mc(regs))
>>>> + return false;
>>>
>>> I thought we still wanted to signal the task in this case? Or do you expect to
>>> add that into `fixup_exception_mc()` ?
>>
>> Yeah, here return false and will signal to task in do_sea() ->
>> arm64_notify_die().
>
> I mean when we do the fixup.
>
> I thought the idea was to apply the fixup (to stop the kernel from crashing),
> but still to deliver a fatal signal to the user task since we can't do what the
> user task asked us to.
>
Yes, that's what I mean. :)
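
As a rough model of the flow agreed on so far, the decision can be written as
a pure function. This is a userspace sketch under assumptions, not the patch
itself: the enum, the struct and classify_kernel_sea() are hypothetical names,
each bool stands in for the result of the kernel check named in its comment,
and the CONFIG_ARCH_HAS_COPY_MC test is left out for brevity.

```c
#include <stdbool.h>

/* Possible results of the modelled decision (names are hypothetical). */
enum sea_outcome {
	SEA_NOTIFY_DIE,       /* not handled here: caller uses arm64_notify_die() */
	SEA_FIXUP_ONLY,       /* fixup applied, no signal (kernel thread) */
	SEA_FIXUP_AND_SIGNAL, /* fixup applied, fatal signal sent to the user task */
};

/* Each field stands in for the result of one kernel-side check. */
struct sea_ctx {
	bool from_user_mode; /* user_mode(regs) */
	bool claimed;        /* apei_claim_sea(regs) >= 0 */
	bool has_mc_fixup;   /* fixup_exception_mc(regs) */
	bool is_kthread;     /* current->flags & PF_KTHREAD */
};

static enum sea_outcome classify_kernel_sea(const struct sea_ctx *c)
{
	/* Errors taken from user mode keep the existing path. */
	if (c->from_user_mode)
		return SEA_NOTIFY_DIE;

	/* Firmware-first handling must claim the error before recovery. */
	if (!c->claimed)
		return SEA_NOTIFY_DIE;

	/* Without a machine-check-safe extable entry there is nothing to fix up. */
	if (!c->has_mc_fixup)
		return SEA_NOTIFY_DIE;

	/* Fixup applied; only user tasks additionally get the signal. */
	return c->is_kthread ? SEA_FIXUP_ONLY : SEA_FIXUP_AND_SIGNAL;
}
```

So a user task whose uaccess hits a claimed error with a fixup available gets
both the fixup and the signal, while a kernel thread in the same situation
only gets the fixup.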
>>>> +
>>>> + set_thread_esr(0, esr);
>>>
>>> Why are we not setting the address? Is that deliberate, or an oversight?
>>
>> Here set fault_address to 0, i refer to the logic of arm64_notify_die().
>>
>> void arm64_notify_die(...)
>> {
>> if (user_mode(regs)) {
>> WARN_ON(regs != current_pt_regs());
>> current->thread.fault_address = 0;
>> current->thread.fault_code = err;
>>
>> arm64_force_sig_fault(signo, sicode, far, str);
>> } else {
>> die(str, regs, err);
>> }
>> }
>>
>> I don't know exactly why and do you know why arm64_notify_die() did this? :)
>
> To be honest, I don't know, and that looks equally suspicious to me.
>
> Looking at the git history, that was added in commit:
>
> 9141300a5884b57c ("arm64: Provide read/write fault information in compat signal handlers")
>
> ... so maybe Catalin recalls why.
>
> Perhaps the assumption is just that this will be fatal and so unimportant? ...
> but in that case the same logic would apply to the ESR value, so it's not clear
> to me.
OK, let's keep setting it to 0 for now. If that changes later, both places
should be changed together.
Thanks,
Tong.
>
> Mark.
>