On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote:
> If user process access memory fails due to hardware memory error, only the
> relevant processes are affected, so it is more reasonable to kill the user
> process and isolate the corrupt page than to panic the kernel.
>
> Signed-off-by: Tong Tiangen <[email protected]>
> ---
> arch/arm64/lib/copy_from_user.S | 10 +++++-----
> arch/arm64/lib/copy_to_user.S | 10 +++++-----
> arch/arm64/mm/extable.c | 8 ++++----
> 3 files changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
> index 34e317907524..1bf676e9201d 100644
> --- a/arch/arm64/lib/copy_from_user.S
> +++ b/arch/arm64/lib/copy_from_user.S
> @@ -25,7 +25,7 @@
> .endm
>
> .macro strb1 reg, ptr, val
> - strb \reg, [\ptr], \val
> + USER(9998f, strb \reg, [\ptr], \val)
> .endm
This is a store to *kernel* memory, not user memory. It should not be marked
with USER().
I understand that you *might* want to handle memory errors on these stores, but
the commit message doesn't describe that and the associated trade-off. For
example, consider that when a copy_form_user fails we'll try to zero the
remaining buffer via memset(); so if a STR* instruction in copy_to_user
faulted, upon handling the fault we'll immediately try to fix that up with some
more stores which will also fault, but won't get fixed up, leading to a panic()
anyway...
Further, this change will also silently fixup unexpected kernel faults if we
pass bad kernel pointers to copy_{to,from}_user, which will hide real bugs.
So NAK to this change as-is; likewise for the addition of USER() to other ldr*
macros in copy_from_user.S and the addition of USER() str* macros in
copy_to_user.S.
If we want to handle memory errors on some kaccesses, we need a new EX_TYPE_*
separate from the usual EX_TYPE_KACESS_ERR_ZERO that means "handle memory
errors, but treat other faults as fatal". That should come with a rationale and
explanation of why it's actually useful.
[...]
> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
> index 478e639f8680..28ec35e3d210 100644
> --- a/arch/arm64/mm/extable.c
> +++ b/arch/arm64/mm/extable.c
> @@ -85,10 +85,10 @@ bool fixup_exception_mc(struct pt_regs *regs)
> if (!ex)
> return false;
>
> - /*
> - * This is not complete, More Machine check safe extable type can
> - * be processed here.
> - */
> + switch (ex->type) {
> + case EX_TYPE_UACCESS_ERR_ZERO:
> + return ex_handler_uaccess_err_zero(ex, regs);
> + }
Please fold this part into the prior patch, and start ogf with *only* handling
errors on accesses already marked with EX_TYPE_UACCESS_ERR_ZERO. I think that
change would be relatively uncontroversial, and it would be much easier to
build atop that.
Thanks,
Mark.