This patchset is based on [1].
As memory capacity and density increase, so does the probability of
memory errors. The growing size and density of server RAM in data
centers and the cloud has led to more uncorrectable memory errors.
Currently, the kernel has a mechanism to recover from hardware memory
errors. This patchset provides a new recovery mechanism.
For ARM64, hardware memory errors are handled in do_sea(), which is
divided into two cases:
1. The error is consumed in user mode: the solution is to kill the
   user process and isolate the error page.
2. The error is consumed in kernel mode: the solution is to panic.
For kernel space, an undifferentiated panic may not be the optimal
choice; it can be handled better.
This patchset deals with four scenarios of hardware memory errors
consumed in kernel space:
1. copy_from_user.
2. get_user.
3. cow (copy on write).
4. pagecache reading.
These four scenarios have something in common: although the error is
consumed in kernel mode, the consumed data belongs to user mode.
The processing scheme is based on CONFIG_ARCH_HAS_COPY_MC and replaces
the kernel panic with killing the affected process and isolating the
error page.
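The core of the scheme is the do_sea() hook added in patch 3; a condensed
sketch is shown below for overview (comments added, the
CONFIG_ARCH_HAS_COPY_MC check omitted, otherwise the names follow the patch):

static bool arm64_process_kernel_sea(unsigned long addr, unsigned int esr,
				     struct pt_regs *regs, int sig, int code)
{
	/* Only handle errors consumed in kernel mode on behalf of a user task */
	if (user_mode(regs) || !current->mm)
		return false;

	/* Claim the error via APEI so the faulty page gets poisoned/isolated */
	if (apei_claim_sea(regs) < 0)
		return false;

	current->thread.fault_address = 0;
	current->thread.fault_code = esr;

	/* Branch to the _MC extable fixup for the faulting instruction, if any */
	if (!fixup_exception(regs, esr))
		return false;

	/* Kill the task instead of panicking the kernel */
	arm64_force_sig_fault(sig, code, addr,
		"Uncorrected hardware memory error in kernel-access\n");
	return true;
}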
[1]https://lore.kernel.org/lkml/[email protected]/
Since V2:
1. Consistent with PPC/x86, use CONFIG_ARCH_HAS_COPY_MC instead of
   ARM64_UCE_KERNEL_RECOVERY.
2. Add two new scenarios: cow and pagecache reading.
3. Fix two small bugs (the first two patches).
Tong Tiangen (7):
x86: fix copy_mc_to_user compile error
arm64: fix page_address return value in copy_highpage
arm64: add support for machine check error safe
arm64: add copy_from_user to machine check safe
arm64: add get_user to machine check safe
arm64: add cow to machine check safe
arm64: add pagecache reading to machine check safe
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/asm-extable.h | 25 +++++++
arch/arm64/include/asm/asm-uaccess.h | 16 +++++
arch/arm64/include/asm/esr.h | 5 ++
arch/arm64/include/asm/extable.h | 2 +-
arch/arm64/include/asm/page.h | 10 +++
arch/arm64/include/asm/uaccess.h | 17 ++++-
arch/arm64/kernel/probes/kprobes.c | 2 +-
arch/arm64/lib/Makefile | 2 +
arch/arm64/lib/copy_from_user.S | 11 ++--
arch/arm64/lib/copy_page_mc.S | 98 ++++++++++++++++++++++++++++
arch/arm64/lib/copy_to_user_mc.S | 78 ++++++++++++++++++++++
arch/arm64/mm/copypage.c | 36 ++++++++--
arch/arm64/mm/extable.c | 21 +++++-
arch/arm64/mm/fault.c | 30 ++++++++-
arch/x86/include/asm/uaccess.h | 1 +
include/linux/highmem.h | 8 +++
include/linux/uaccess.h | 8 +++
include/linux/uio.h | 9 ++-
lib/iov_iter.c | 85 +++++++++++++++++++-----
mm/memory.c | 2 +-
21 files changed, 432 insertions(+), 35 deletions(-)
create mode 100644 arch/arm64/lib/copy_page_mc.S
create mode 100644 arch/arm64/lib/copy_to_user_mc.S
--
2.18.0.huawei.25
A later patch in this series adds a generic copy_mc_to_user() fallback
to include/linux/uaccess.h. x86 already implements copy_mc_to_user()
with a slightly different prototype, so define the copy_mc_to_user macro
there to tell the generic header that an architecture implementation
exists and to avoid a prototype-conflict compile error.
Signed-off-by: Tong Tiangen <[email protected]>
---
arch/x86/include/asm/uaccess.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index f78e2b3501a1..e18c5f098025 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -415,6 +415,7 @@ copy_mc_to_kernel(void *to, const void *from, unsigned len);
unsigned long __must_check
copy_mc_to_user(void *to, const void *from, unsigned len);
+#define copy_mc_to_user copy_mc_to_user
#endif
/*
--
2.18.0.huawei.25
When a user process reads a file, the data is cached in the pagecache
and that data belongs to the user process. When a machine check error
is encountered during pagecache reading, killing the user process and
isolating the page with hardware memory errors is a more reasonable
choice than a kernel panic.
__arch_copy_mc_to_user() in copy_to_user_mc.S largely borrows from
__arch_copy_to_user() in copy_to_user.S; the main difference is that
__arch_copy_mc_to_user() adds extable entries to support machine check
safety.
In _copy_page_to_iter(), machine check safety is only considered for
ITER_IOVEC, which is used by pagecache reading.
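For reference, the buffered read path this targets then looks roughly
like the following call chain (names taken from this patch and the
existing pagecache read code):

	filemap_read()
	  -> copy_folio_to_iter()
	       -> copy_mc_page_to_iter()			/* this patch */
	            -> __copy_page_to_iter(..., mc_safe = true)
	                 -> copy_page_to_iter_iovec() -> copyout_mc()
	                      -> copy_mc_to_user() -> __arch_copy_mc_to_user()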
Signed-off-by: Tong Tiangen <[email protected]>
---
arch/arm64/include/asm/uaccess.h | 15 ++++++
arch/arm64/lib/Makefile | 2 +-
arch/arm64/lib/copy_to_user_mc.S | 78 +++++++++++++++++++++++++++++
include/linux/uio.h | 9 +++-
lib/iov_iter.c | 85 +++++++++++++++++++++++++-------
5 files changed, 170 insertions(+), 19 deletions(-)
create mode 100644 arch/arm64/lib/copy_to_user_mc.S
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 24b662407fbd..f0d5e811165a 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -448,6 +448,21 @@ extern long strncpy_from_user(char *dest, const char __user *src, long count);
extern __must_check long strnlen_user(const char __user *str, long n);
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+extern unsigned long __must_check __arch_copy_mc_to_user(void __user *to,
+ const void *from, unsigned long n);
+static inline unsigned long __must_check
+copy_mc_to_user(void __user *to, const void *from, unsigned long n)
+{
+ uaccess_ttbr0_enable();
+ n = __arch_copy_mc_to_user(__uaccess_mask_ptr(to), from, n);
+ uaccess_ttbr0_disable();
+
+ return n;
+}
+#define copy_mc_to_user copy_mc_to_user
+#endif
+
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
struct page;
void memcpy_page_flushcache(char *to, struct page *page, size_t offset, size_t len);
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 29c578414b12..9b3571227fb4 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -23,4 +23,4 @@ obj-$(CONFIG_ARM64_MTE) += mte.o
obj-$(CONFIG_KASAN_SW_TAGS) += kasan_sw_tags.o
-obj-$(CONFIG_ARCH_HAS_CPY_MC) += copy_page_mc.o
+obj-$(CONFIG_ARCH_HAS_COPY_MC) += copy_page_mc.o copy_to_user_mc.o
diff --git a/arch/arm64/lib/copy_to_user_mc.S b/arch/arm64/lib/copy_to_user_mc.S
new file mode 100644
index 000000000000..9d228ff15446
--- /dev/null
+++ b/arch/arm64/lib/copy_to_user_mc.S
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ */
+
+#include <linux/linkage.h>
+
+#include <asm/asm-uaccess.h>
+#include <asm/assembler.h>
+#include <asm/cache.h>
+
+/*
+ * Copy to user space from a kernel buffer (alignment handled by the hardware)
+ *
+ * Parameters:
+ * x0 - to
+ * x1 - from
+ * x2 - n
+ * Returns:
+ * x0 - bytes not copied
+ */
+ .macro ldrb1 reg, ptr, val
+ 1000: ldrb \reg, [\ptr], \val;
+ _asm_extable_mc 1000b, 9998f;
+ .endm
+
+ .macro strb1 reg, ptr, val
+ user_ldst_mc 9998f, sttrb, \reg, \ptr, \val
+ .endm
+
+ .macro ldrh1 reg, ptr, val
+ 1001: ldrh \reg, [\ptr], \val;
+ _asm_extable_mc 1001b, 9998f;
+ .endm
+
+ .macro strh1 reg, ptr, val
+ user_ldst_mc 9997f, sttrh, \reg, \ptr, \val
+ .endm
+
+ .macro ldr1 reg, ptr, val
+ 1002: ldr \reg, [\ptr], \val;
+ _asm_extable_mc 1002b, 9998f;
+ .endm
+
+ .macro str1 reg, ptr, val
+ user_ldst_mc 9997f, sttr, \reg, \ptr, \val
+ .endm
+
+ .macro ldp1 reg1, reg2, ptr, val
+ 1003: ldp \reg1, \reg2, [\ptr], \val;
+ _asm_extable_mc 1003b, 9998f;
+ .endm
+
+ .macro stp1 reg1, reg2, ptr, val
+ user_stp 9997f, \reg1, \reg2, \ptr, \val
+ .endm
+
+end .req x5
+srcin .req x15
+SYM_FUNC_START(__arch_copy_mc_to_user)
+ add end, x0, x2
+ mov srcin, x1
+#include "copy_template.S"
+ mov x0, #0
+ ret
+
+ // Exception fixups
+9997: cbz x0, 9998f // Check machine check exception
+ cmp dst, dstin
+ b.ne 9998f
+ // Before being absolutely sure we couldn't copy anything, try harder
+ ldrb tmp1w, [srcin]
+USER(9998f, sttrb tmp1w, [dst])
+ add dst, dst, #1
+9998: sub x0, end, dst // bytes not copied
+ ret
+SYM_FUNC_END(__arch_copy_mc_to_user)
+EXPORT_SYMBOL(__arch_copy_mc_to_user)
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 739285fe5a2f..539d9ee9b032 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -147,10 +147,17 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+size_t copy_mc_page_to_iter(struct page *page, size_t offset, size_t bytes,
+ struct iov_iter *i);
+#else
+#define copy_mc_page_to_iter copy_page_to_iter
+#endif
+
static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
size_t bytes, struct iov_iter *i)
{
- return copy_page_to_iter(&folio->page, offset, bytes, i);
+ return copy_mc_page_to_iter(&folio->page, offset, bytes, i);
}
static __always_inline __must_check
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 6dd5330f7a99..2c5f3bb6391d 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -157,6 +157,19 @@ static int copyout(void __user *to, const void *from, size_t n)
return n;
}
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+static int copyout_mc(void __user *to, const void *from, size_t n)
+{
+ if (access_ok(to, n)) {
+ instrument_copy_to_user(to, from, n);
+ n = copy_mc_to_user((__force void *) to, from, n);
+ }
+ return n;
+}
+#else
+#define copyout_mc copyout
+#endif
+
static int copyin(void *to, const void __user *from, size_t n)
{
if (should_fail_usercopy())
@@ -169,7 +182,7 @@ static int copyin(void *to, const void __user *from, size_t n)
}
static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
- struct iov_iter *i)
+ struct iov_iter *i, bool mc_safe)
{
size_t skip, copy, left, wanted;
const struct iovec *iov;
@@ -194,7 +207,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
from = kaddr + offset;
/* first chunk, usually the only one */
- left = copyout(buf, from, copy);
+ if (mc_safe)
+ left = copyout_mc(buf, from, copy);
+ else
+ left = copyout(buf, from, copy);
copy -= left;
skip += copy;
from += copy;
@@ -204,7 +220,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
iov++;
buf = iov->iov_base;
copy = min(bytes, iov->iov_len);
- left = copyout(buf, from, copy);
+ if (mc_safe)
+ left = copyout_mc(buf, from, copy);
+ else
+ left = copyout(buf, from, copy);
copy -= left;
skip = copy;
from += copy;
@@ -223,7 +242,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
kaddr = kmap(page);
from = kaddr + offset;
- left = copyout(buf, from, copy);
+ if (mc_safe)
+ left = copyout_mc(buf, from, copy);
+ else
+ left = copyout(buf, from, copy);
copy -= left;
skip += copy;
from += copy;
@@ -232,7 +254,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
iov++;
buf = iov->iov_base;
copy = min(bytes, iov->iov_len);
- left = copyout(buf, from, copy);
+ if (mc_safe)
+ left = copyout_mc(buf, from, copy);
+ else
+ left = copyout(buf, from, copy);
copy -= left;
skip = copy;
from += copy;
@@ -674,15 +699,6 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
EXPORT_SYMBOL(_copy_to_iter);
#ifdef CONFIG_ARCH_HAS_COPY_MC
-static int copyout_mc(void __user *to, const void *from, size_t n)
-{
- if (access_ok(to, n)) {
- instrument_copy_to_user(to, from, n);
- n = copy_mc_to_user((__force void *) to, from, n);
- }
- return n;
-}
-
static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
struct iov_iter *i)
{
@@ -846,10 +862,10 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
}
static size_t __copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
- struct iov_iter *i)
+ struct iov_iter *i, bool mc_safe)
{
if (likely(iter_is_iovec(i)))
- return copy_page_to_iter_iovec(page, offset, bytes, i);
+ return copy_page_to_iter_iovec(page, offset, bytes, i, mc_safe);
if (iov_iter_is_bvec(i) || iov_iter_is_kvec(i) || iov_iter_is_xarray(i)) {
void *kaddr = kmap_local_page(page);
size_t wanted = _copy_to_iter(kaddr + offset, bytes, i);
@@ -878,7 +894,7 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
offset %= PAGE_SIZE;
while (1) {
size_t n = __copy_page_to_iter(page, offset,
- min(bytes, (size_t)PAGE_SIZE - offset), i);
+ min(bytes, (size_t)PAGE_SIZE - offset), i, false);
res += n;
bytes -= n;
if (!bytes || !n)
@@ -893,6 +909,41 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
}
EXPORT_SYMBOL(copy_page_to_iter);
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+/**
+ * copy_mc_page_to_iter - copy page to iter with source memory error exception handling.
+ *
+ * The filemap_read deploys this for pagecache reading and the main differences between
+ * this and typical copy_page_to_iter() is call __copy_page_to_iter with mc_safe true.
+ *
+ * Return: number of bytes copied (may be %0)
+ */
+size_t copy_mc_page_to_iter(struct page *page, size_t offset, size_t bytes,
+ struct iov_iter *i)
+{
+ size_t res = 0;
+
+ if (unlikely(!page_copy_sane(page, offset, bytes)))
+ return 0;
+ page += offset / PAGE_SIZE; // first subpage
+ offset %= PAGE_SIZE;
+ while (1) {
+ size_t n = __copy_page_to_iter(page, offset,
+ min(bytes, (size_t)PAGE_SIZE - offset), i, true);
+ res += n;
+ bytes -= n;
+ if (!bytes || !n)
+ break;
+ offset += n;
+ if (offset == PAGE_SIZE) {
+ page++;
+ offset = 0;
+ }
+ }
+ return res;
+}
+#endif
+
size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
struct iov_iter *i)
{
--
2.18.0.huawei.25
page_address() returns a void pointer; fix the types of the local
variables it is assigned to.
Signed-off-by: Tong Tiangen <[email protected]>
---
arch/arm64/mm/copypage.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
index b5447e53cd73..0dea80bf6de4 100644
--- a/arch/arm64/mm/copypage.c
+++ b/arch/arm64/mm/copypage.c
@@ -16,8 +16,8 @@
void copy_highpage(struct page *to, struct page *from)
{
- struct page *kto = page_address(to);
- struct page *kfrom = page_address(from);
+ void *kto = page_address(to);
+ void *kfrom = page_address(from);
copy_page(kto, kfrom);
--
2.18.0.huawei.25
In the arm64 kernel hardware memory error handling path (do_sea()), if
the error is consumed in the kernel, the current handling is to panic.
However, that is not optimal: in some cases the page accessed in the
kernel is a user page (such as in copy_from_user/get_user), and killing
the user process and isolating the user page with hardware memory
errors is a better choice.
Consistent with PPC/x86, this is implemented via CONFIG_ARCH_HAS_COPY_MC.
This patch only enables the machine check error framework: it adds an
exception fixup before the kernel panic in do_sea(), and limits recovery
to hardware memory errors consumed in kernel mode on behalf of user mode
processes. If the fixup succeeds, there is no need to panic.
Also add the _asm_extable_mc macro, used to add the extable entries that
make this fixup possible.
Signed-off-by: Tong Tiangen <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/asm-extable.h | 13 ++++++++++++
arch/arm64/include/asm/esr.h | 5 +++++
arch/arm64/include/asm/extable.h | 2 +-
arch/arm64/kernel/probes/kprobes.c | 2 +-
arch/arm64/mm/extable.c | 20 ++++++++++++++++++-
arch/arm64/mm/fault.c | 30 +++++++++++++++++++++++++++-
include/linux/uaccess.h | 8 ++++++++
8 files changed, 77 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index d9325dd95eba..012e38309955 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -19,6 +19,7 @@ config ARM64
select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
select ARCH_HAS_CACHE_LINE_SIZE
+ select ARCH_HAS_COPY_MC if ACPI_APEI_GHES
select ARCH_HAS_CURRENT_STACK_POINTER
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEBUG_VM_PGTABLE
diff --git a/arch/arm64/include/asm/asm-extable.h b/arch/arm64/include/asm/asm-extable.h
index c39f2437e08e..74d1db74fd86 100644
--- a/arch/arm64/include/asm/asm-extable.h
+++ b/arch/arm64/include/asm/asm-extable.h
@@ -8,6 +8,11 @@
#define EX_TYPE_UACCESS_ERR_ZERO 3
#define EX_TYPE_LOAD_UNALIGNED_ZEROPAD 4
+/* _MC indicates that can fixup from machine check errors */
+#define EX_TYPE_FIXUP_MC 5
+
+#define IS_EX_TYPE_MC(type) (type == EX_TYPE_FIXUP_MC)
+
#ifdef __ASSEMBLY__
#define __ASM_EXTABLE_RAW(insn, fixup, type, data) \
@@ -27,6 +32,14 @@
__ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP, 0)
.endm
+/*
+ * Create an exception table entry for `insn`, which will branch to `fixup`
+ * when an unhandled fault(include sea fault) is taken.
+ */
+ .macro _asm_extable_mc, insn, fixup
+ __ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP_MC, 0)
+ .endm
+
/*
* Create an exception table entry for `insn` if `fixup` is provided. Otherwise
* do nothing.
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index d52a0b269ee8..11fcfc002654 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -330,6 +330,11 @@
#ifndef __ASSEMBLY__
#include <asm/types.h>
+static inline bool esr_is_sea(u32 esr)
+{
+ return (esr & ESR_ELx_FSC) == ESR_ELx_FSC_EXTABT;
+}
+
static inline bool esr_is_data_abort(u32 esr)
{
const u32 ec = ESR_ELx_EC(esr);
diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
index 72b0e71cc3de..f7835b0f473b 100644
--- a/arch/arm64/include/asm/extable.h
+++ b/arch/arm64/include/asm/extable.h
@@ -45,5 +45,5 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
}
#endif /* !CONFIG_BPF_JIT */
-bool fixup_exception(struct pt_regs *regs);
+bool fixup_exception(struct pt_regs *regs, unsigned int esr);
#endif
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index d9dfa82c1f18..16a069e8eec3 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -285,7 +285,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr)
* In case the user-specified fault handler returned
* zero, try to fix up.
*/
- if (fixup_exception(regs))
+ if (fixup_exception(regs, fsr))
return 1;
}
return 0;
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
index 489455309695..f1134c88e849 100644
--- a/arch/arm64/mm/extable.c
+++ b/arch/arm64/mm/extable.c
@@ -9,6 +9,7 @@
#include <asm/asm-extable.h>
#include <asm/ptrace.h>
+#include <asm/esr.h>
static inline unsigned long
get_ex_fixup(const struct exception_table_entry *ex)
@@ -23,6 +24,18 @@ static bool ex_handler_fixup(const struct exception_table_entry *ex,
return true;
}
+static bool ex_handler_fixup_mc(const struct exception_table_entry *ex,
+ struct pt_regs *regs, unsigned int esr)
+{
+ if (esr_is_sea(esr))
+ regs->regs[0] = 0;
+ else
+ regs->regs[0] = 1;
+
+ regs->pc = get_ex_fixup(ex);
+ return true;
+}
+
static bool ex_handler_uaccess_err_zero(const struct exception_table_entry *ex,
struct pt_regs *regs)
{
@@ -63,7 +76,7 @@ ex_handler_load_unaligned_zeropad(const struct exception_table_entry *ex,
return true;
}
-bool fixup_exception(struct pt_regs *regs)
+bool fixup_exception(struct pt_regs *regs, unsigned int esr)
{
const struct exception_table_entry *ex;
@@ -71,9 +84,14 @@ bool fixup_exception(struct pt_regs *regs)
if (!ex)
return false;
+ if (esr_is_sea(esr) && !IS_EX_TYPE_MC(ex->type))
+ return false;
+
switch (ex->type) {
case EX_TYPE_FIXUP:
return ex_handler_fixup(ex, regs);
+ case EX_TYPE_FIXUP_MC:
+ return ex_handler_fixup_mc(ex, regs, esr);
case EX_TYPE_BPF:
return ex_handler_bpf(ex, regs);
case EX_TYPE_UACCESS_ERR_ZERO:
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 77341b160aca..ffdfab2fdd60 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -361,7 +361,7 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,
* Are we prepared to handle this kernel fault?
* We are almost certainly not prepared to handle instruction faults.
*/
- if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
+ if (!is_el1_instruction_abort(esr) && fixup_exception(regs, esr))
return;
if (WARN_RATELIMIT(is_spurious_el1_translation_fault(addr, esr, regs),
@@ -695,6 +695,30 @@ static int do_bad(unsigned long far, unsigned int esr, struct pt_regs *regs)
return 1; /* "fault" */
}
+static bool arm64_process_kernel_sea(unsigned long addr, unsigned int esr,
+ struct pt_regs *regs, int sig, int code)
+{
+ if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
+ return false;
+
+ if (user_mode(regs) || !current->mm)
+ return false;
+
+ if (apei_claim_sea(regs) < 0)
+ return false;
+
+ current->thread.fault_address = 0;
+ current->thread.fault_code = esr;
+
+ if (!fixup_exception(regs, esr))
+ return false;
+
+ arm64_force_sig_fault(sig, code, addr,
+ "Uncorrected hardware memory error in kernel-access\n");
+
+ return true;
+}
+
static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
{
const struct fault_info *inf;
@@ -720,6 +744,10 @@ static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
*/
siaddr = untagged_addr(far);
}
+
+ if (arm64_process_kernel_sea(siaddr, esr, regs, inf->sig, inf->code))
+ return 0;
+
arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
return 0;
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 546179418ffa..dd952aeecdc1 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -174,6 +174,14 @@ copy_mc_to_kernel(void *dst, const void *src, size_t cnt)
}
#endif
+#ifndef copy_mc_to_user
+static inline unsigned long __must_check
+copy_mc_to_user(void *dst, const void *src, size_t cnt)
+{
+ return raw_copy_to_user(dst, src, cnt);
+}
+#endif
+
static __always_inline void pagefault_disabled_inc(void)
{
current->pagefault_disabled++;
--
2.18.0.huawei.25
Add the get_user scenario to machine check safety. The handling of
EX_TYPE_UACCESS_ERR_ZERO and EX_TYPE_UACCESS_ERR_ZERO_MC is the same:
both return -EFAULT.
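For reference, both extable types end up in the existing handler, which
roughly does the following (paraphrased from arch/arm64/mm/extable.c for
illustration; not part of this patch):

static bool ex_handler_uaccess_err_zero(const struct exception_table_entry *ex,
					struct pt_regs *regs)
{
	int reg_err = FIELD_GET(EX_DATA_REG_ERR, ex->data);
	int reg_zero = FIELD_GET(EX_DATA_REG_ZERO, ex->data);

	/* report -EFAULT in the error register, zero the value register */
	pt_regs_write_reg(regs, reg_err, -EFAULT);
	pt_regs_write_reg(regs, reg_zero, 0);

	regs->pc = get_ex_fixup(ex);
	return true;
}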
Signed-off-by: Tong Tiangen <[email protected]>
---
arch/arm64/include/asm/asm-extable.h | 14 +++++++++++++-
arch/arm64/include/asm/uaccess.h | 2 +-
arch/arm64/mm/extable.c | 1 +
3 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/asm-extable.h b/arch/arm64/include/asm/asm-extable.h
index 74d1db74fd86..bfc2d224cbae 100644
--- a/arch/arm64/include/asm/asm-extable.h
+++ b/arch/arm64/include/asm/asm-extable.h
@@ -10,8 +10,11 @@
/* _MC indicates that can fixup from machine check errors */
#define EX_TYPE_FIXUP_MC 5
+#define EX_TYPE_UACCESS_ERR_ZERO_MC 6
-#define IS_EX_TYPE_MC(type) (type == EX_TYPE_FIXUP_MC)
+#define IS_EX_TYPE_MC(type) \
+ (type == EX_TYPE_FIXUP_MC || \
+ type == EX_TYPE_UACCESS_ERR_ZERO_MC)
#ifdef __ASSEMBLY__
@@ -77,6 +80,15 @@
#define EX_DATA_REG(reg, gpr) \
"((.L__gpr_num_" #gpr ") << " __stringify(EX_DATA_REG_##reg##_SHIFT) ")"
+#define _ASM_EXTABLE_UACCESS_ERR_ZERO_MC(insn, fixup, err, zero) \
+ __DEFINE_ASM_GPR_NUMS \
+ __ASM_EXTABLE_RAW(#insn, #fixup, \
+ __stringify(EX_TYPE_UACCESS_ERR_ZERO_MC), \
+ "(" \
+ EX_DATA_REG(ERR, err) " | " \
+ EX_DATA_REG(ZERO, zero) \
+ ")")
+
#define _ASM_EXTABLE_UACCESS_ERR_ZERO(insn, fixup, err, zero) \
__DEFINE_ASM_GPR_NUMS \
__ASM_EXTABLE_RAW(#insn, #fixup, \
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index e8dce0cc5eaa..24b662407fbd 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -236,7 +236,7 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
asm volatile( \
"1: " load " " reg "1, [%2]\n" \
"2:\n" \
- _ASM_EXTABLE_UACCESS_ERR_ZERO(1b, 2b, %w0, %w1) \
+ _ASM_EXTABLE_UACCESS_ERR_ZERO_MC(1b, 2b, %w0, %w1) \
: "+r" (err), "=&r" (x) \
: "r" (addr))
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
index f1134c88e849..7c05f8d2bce0 100644
--- a/arch/arm64/mm/extable.c
+++ b/arch/arm64/mm/extable.c
@@ -95,6 +95,7 @@ bool fixup_exception(struct pt_regs *regs, unsigned int esr)
case EX_TYPE_BPF:
return ex_handler_bpf(ex, regs);
case EX_TYPE_UACCESS_ERR_ZERO:
+ case EX_TYPE_UACCESS_ERR_ZERO_MC:
return ex_handler_uaccess_err_zero(ex, regs);
case EX_TYPE_LOAD_UNALIGNED_ZEROPAD:
return ex_handler_load_unaligned_zeropad(ex, regs);
--
2.18.0.huawei.25
In cow (copy on write) processing, the data of the user process is
copied. When a machine check error is encountered during the copy,
killing the user process and isolating the user page with hardware
memory errors is a more reasonable choice than a kernel panic.
copy_page_mc() in copy_page_mc.S largely borrows from copy_page() in
copy_page.S; the main difference is that copy_page_mc() adds extable
entries to support machine check safety.
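With this patch the cow path becomes, roughly (names from this patch
and mm/memory.c):

	cow_user_page()
	  -> copy_user_highpage_mc()	/* falls back to copy_user_highpage()
					   when the arch has no MC variant */
	       -> copy_highpage_mc()
	            -> copy_page_mc() + do_mte()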
Signed-off-by: Tong Tiangen <[email protected]>
---
arch/arm64/include/asm/page.h | 10 ++++
arch/arm64/lib/Makefile | 2 +
arch/arm64/lib/copy_page_mc.S | 98 +++++++++++++++++++++++++++++++++++
arch/arm64/mm/copypage.c | 36 ++++++++++---
include/linux/highmem.h | 8 +++
mm/memory.c | 2 +-
6 files changed, 149 insertions(+), 7 deletions(-)
create mode 100644 arch/arm64/lib/copy_page_mc.S
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 993a27ea6f54..832571a7dddb 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -29,6 +29,16 @@ void copy_user_highpage(struct page *to, struct page *from,
void copy_highpage(struct page *to, struct page *from);
#define __HAVE_ARCH_COPY_HIGHPAGE
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+extern void copy_page_mc(void *to, const void *from);
+void copy_highpage_mc(struct page *to, struct page *from);
+#define __HAVE_ARCH_COPY_HIGHPAGE_MC
+
+void copy_user_highpage_mc(struct page *to, struct page *from,
+ unsigned long vaddr, struct vm_area_struct *vma);
+#define __HAVE_ARCH_COPY_USER_HIGHPAGE_MC
+#endif
+
struct page *alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma,
unsigned long vaddr);
#define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE_MOVABLE
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 29490be2546b..29c578414b12 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -22,3 +22,5 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
obj-$(CONFIG_ARM64_MTE) += mte.o
obj-$(CONFIG_KASAN_SW_TAGS) += kasan_sw_tags.o
+
+obj-$(CONFIG_ARCH_HAS_CPY_MC) += copy_page_mc.o
diff --git a/arch/arm64/lib/copy_page_mc.S b/arch/arm64/lib/copy_page_mc.S
new file mode 100644
index 000000000000..cbf56e661efe
--- /dev/null
+++ b/arch/arm64/lib/copy_page_mc.S
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ */
+
+#include <linux/linkage.h>
+#include <linux/const.h>
+#include <asm/assembler.h>
+#include <asm/page.h>
+#include <asm/cpufeature.h>
+#include <asm/alternative.h>
+
+/*
+ * Copy a page from src to dest (both are page aligned) with machine check
+ *
+ * Parameters:
+ * x0 - dest
+ * x1 - src
+ */
+SYM_FUNC_START(__pi_copy_page_mc)
+alternative_if ARM64_HAS_NO_HW_PREFETCH
+ // Prefetch three cache lines ahead.
+ prfm pldl1strm, [x1, #128]
+ prfm pldl1strm, [x1, #256]
+ prfm pldl1strm, [x1, #384]
+alternative_else_nop_endif
+
+100: ldp x2, x3, [x1]
+101: ldp x4, x5, [x1, #16]
+102: ldp x6, x7, [x1, #32]
+103: ldp x8, x9, [x1, #48]
+104: ldp x10, x11, [x1, #64]
+105: ldp x12, x13, [x1, #80]
+106: ldp x14, x15, [x1, #96]
+107: ldp x16, x17, [x1, #112]
+
+ add x0, x0, #256
+ add x1, x1, #128
+1:
+ tst x0, #(PAGE_SIZE - 1)
+
+alternative_if ARM64_HAS_NO_HW_PREFETCH
+ prfm pldl1strm, [x1, #384]
+alternative_else_nop_endif
+
+ stnp x2, x3, [x0, #-256]
+200: ldp x2, x3, [x1]
+ stnp x4, x5, [x0, #16 - 256]
+201: ldp x4, x5, [x1, #16]
+ stnp x6, x7, [x0, #32 - 256]
+202: ldp x6, x7, [x1, #32]
+ stnp x8, x9, [x0, #48 - 256]
+203: ldp x8, x9, [x1, #48]
+ stnp x10, x11, [x0, #64 - 256]
+204: ldp x10, x11, [x1, #64]
+ stnp x12, x13, [x0, #80 - 256]
+205: ldp x12, x13, [x1, #80]
+ stnp x14, x15, [x0, #96 - 256]
+206: ldp x14, x15, [x1, #96]
+ stnp x16, x17, [x0, #112 - 256]
+207: ldp x16, x17, [x1, #112]
+
+ add x0, x0, #128
+ add x1, x1, #128
+
+ b.ne 1b
+
+ stnp x2, x3, [x0, #-256]
+ stnp x4, x5, [x0, #16 - 256]
+ stnp x6, x7, [x0, #32 - 256]
+ stnp x8, x9, [x0, #48 - 256]
+ stnp x10, x11, [x0, #64 - 256]
+ stnp x12, x13, [x0, #80 - 256]
+ stnp x14, x15, [x0, #96 - 256]
+ stnp x16, x17, [x0, #112 - 256]
+
+300: ret
+
+_asm_extable_mc 100b, 300b
+_asm_extable_mc 101b, 300b
+_asm_extable_mc 102b, 300b
+_asm_extable_mc 103b, 300b
+_asm_extable_mc 104b, 300b
+_asm_extable_mc 105b, 300b
+_asm_extable_mc 106b, 300b
+_asm_extable_mc 107b, 300b
+_asm_extable_mc 200b, 300b
+_asm_extable_mc 201b, 300b
+_asm_extable_mc 202b, 300b
+_asm_extable_mc 203b, 300b
+_asm_extable_mc 204b, 300b
+_asm_extable_mc 205b, 300b
+_asm_extable_mc 206b, 300b
+_asm_extable_mc 207b, 300b
+
+SYM_FUNC_END(__pi_copy_page_mc)
+SYM_FUNC_ALIAS(copy_page_mc, __pi_copy_page_mc)
+EXPORT_SYMBOL(copy_page_mc)
diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
index 0dea80bf6de4..0f28edfcb234 100644
--- a/arch/arm64/mm/copypage.c
+++ b/arch/arm64/mm/copypage.c
@@ -14,13 +14,8 @@
#include <asm/cpufeature.h>
#include <asm/mte.h>
-void copy_highpage(struct page *to, struct page *from)
+static void do_mte(struct page *to, struct page *from, void *kto, void *kfrom)
{
- void *kto = page_address(to);
- void *kfrom = page_address(from);
-
- copy_page(kto, kfrom);
-
if (system_supports_mte() && test_bit(PG_mte_tagged, &from->flags)) {
set_bit(PG_mte_tagged, &to->flags);
page_kasan_tag_reset(to);
@@ -35,6 +30,15 @@ void copy_highpage(struct page *to, struct page *from)
mte_copy_page_tags(kto, kfrom);
}
}
+
+void copy_highpage(struct page *to, struct page *from)
+{
+ void *kto = page_address(to);
+ void *kfrom = page_address(from);
+
+ copy_page(kto, kfrom);
+ do_mte(to, from, kto, kfrom);
+}
EXPORT_SYMBOL(copy_highpage);
void copy_user_highpage(struct page *to, struct page *from,
@@ -44,3 +48,23 @@ void copy_user_highpage(struct page *to, struct page *from,
flush_dcache_page(to);
}
EXPORT_SYMBOL_GPL(copy_user_highpage);
+
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+void copy_highpage_mc(struct page *to, struct page *from)
+{
+ void *kto = page_address(to);
+ void *kfrom = page_address(from);
+
+ copy_page_mc(kto, kfrom);
+ do_mte(to, from, kto, kfrom);
+}
+EXPORT_SYMBOL(copy_highpage_mc);
+
+void copy_user_highpage_mc(struct page *to, struct page *from,
+ unsigned long vaddr, struct vm_area_struct *vma)
+{
+ copy_highpage_mc(to, from);
+ flush_dcache_page(to);
+}
+EXPORT_SYMBOL_GPL(copy_user_highpage_mc);
+#endif
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 39bb9b47fa9c..a9dbf331b038 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -283,6 +283,10 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
#endif
+#ifndef __HAVE_ARCH_COPY_USER_HIGHPAGE_MC
+#define copy_user_highpage_mc copy_user_highpage
+#endif
+
#ifndef __HAVE_ARCH_COPY_HIGHPAGE
static inline void copy_highpage(struct page *to, struct page *from)
@@ -298,6 +302,10 @@ static inline void copy_highpage(struct page *to, struct page *from)
#endif
+#ifndef __HAVE_ARCH_COPY_HIGHPAGE_MC
+#define copy_highpage_mc copy_highpage
+#endif
+
static inline void memcpy_page(struct page *dst_page, size_t dst_off,
struct page *src_page, size_t src_off,
size_t len)
diff --git a/mm/memory.c b/mm/memory.c
index 76e3af9639d9..d5f62234152d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2767,7 +2767,7 @@ static inline bool cow_user_page(struct page *dst, struct page *src,
unsigned long addr = vmf->address;
if (likely(src)) {
- copy_user_highpage(dst, src, addr, vma);
+ copy_user_highpage_mc(dst, src, addr, vma);
return true;
}
--
2.18.0.huawei.25
Add the copy_from_user scenario to machine check safety.
The data being copied is user data, so when a machine check error is
consumed here it is enough to kill the user process and isolate the
error page; a kernel panic is not necessary.
Signed-off-by: Tong Tiangen <[email protected]>
---
arch/arm64/include/asm/asm-uaccess.h | 16 ++++++++++++++++
arch/arm64/lib/copy_from_user.S | 11 ++++++-----
2 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/asm-uaccess.h b/arch/arm64/include/asm/asm-uaccess.h
index 0557af834e03..f31c8978e1af 100644
--- a/arch/arm64/include/asm/asm-uaccess.h
+++ b/arch/arm64/include/asm/asm-uaccess.h
@@ -92,4 +92,20 @@ alternative_else_nop_endif
_asm_extable 8888b,\l;
.endm
+
+ .macro user_ldp_mc l, reg1, reg2, addr, post_inc
+8888: ldtr \reg1, [\addr];
+8889: ldtr \reg2, [\addr, #8];
+ add \addr, \addr, \post_inc;
+
+ _asm_extable_mc 8888b, \l;
+ _asm_extable_mc 8889b, \l;
+ .endm
+
+ .macro user_ldst_mc l, inst, reg, addr, post_inc
+8888: \inst \reg, [\addr];
+ add \addr, \addr, \post_inc;
+
+ _asm_extable_mc 8888b, \l;
+ .endm
#endif
diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
index 34e317907524..d9d7c5291871 100644
--- a/arch/arm64/lib/copy_from_user.S
+++ b/arch/arm64/lib/copy_from_user.S
@@ -21,7 +21,7 @@
*/
.macro ldrb1 reg, ptr, val
- user_ldst 9998f, ldtrb, \reg, \ptr, \val
+ user_ldst_mc 9998f, ldtrb, \reg, \ptr, \val
.endm
.macro strb1 reg, ptr, val
@@ -29,7 +29,7 @@
.endm
.macro ldrh1 reg, ptr, val
- user_ldst 9997f, ldtrh, \reg, \ptr, \val
+ user_ldst_mc 9997f, ldtrh, \reg, \ptr, \val
.endm
.macro strh1 reg, ptr, val
@@ -37,7 +37,7 @@
.endm
.macro ldr1 reg, ptr, val
- user_ldst 9997f, ldtr, \reg, \ptr, \val
+ user_ldst_mc 9997f, ldtr, \reg, \ptr, \val
.endm
.macro str1 reg, ptr, val
@@ -45,7 +45,7 @@
.endm
.macro ldp1 reg1, reg2, ptr, val
- user_ldp 9997f, \reg1, \reg2, \ptr, \val
+ user_ldp_mc 9997f, \reg1, \reg2, \ptr, \val
.endm
.macro stp1 reg1, reg2, ptr, val
@@ -62,7 +62,8 @@ SYM_FUNC_START(__arch_copy_from_user)
ret
// Exception fixups
-9997: cmp dst, dstin
+9997: cbz x0, 9998f // Check machine check exception
+ cmp dst, dstin
b.ne 9998f
// Before being absolutely sure we couldn't copy anything, try harder
USER(9998f, ldtrb tmp1w, [srcin])
--
2.18.0.huawei.25
On Wed, Apr 06, 2022 at 09:13:06AM +0000, Tong Tiangen wrote:
> Function page_address return void, fix it.
>
> Signed-off-by: Tong Tiangen <[email protected]>
This looks like a sensible cleanup, but the commit title and message aren't
that clear.
Can you please make this:
| arm64: fix types in copy_highpage()
|
| In copy_highpage() the `kto` and `kfrom` local variables are pointers to
| struct page, but these are used to hold arbitrary pointers to kernel memory.
| Each call to page_address() returns a void pointer to memory associated with
| the relevant page, and copy_page() expects void pointers to this memory.
|
| This inconsistency was introduced in commit:
|
| 2563776b41c31908 ("arm64: mte: Tags-aware copy_{user_,}highpage() implementations")
|
| ... and while this doesn't appear to be harmful in practice it is clearly wrong.
|
| Correct this by making `kto` and `kfrom` void pointers.
|
| Fixes: 2563776b41c31908 ("arm64: mte: Tags-aware copy_{user_,}highpage() implementations")
With that:
Acked-by: Mark Rutland <[email protected]>
Thanks,
Mark.
> ---
> arch/arm64/mm/copypage.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
> index b5447e53cd73..0dea80bf6de4 100644
> --- a/arch/arm64/mm/copypage.c
> +++ b/arch/arm64/mm/copypage.c
> @@ -16,8 +16,8 @@
>
> void copy_highpage(struct page *to, struct page *from)
> {
> - struct page *kto = page_address(to);
> - struct page *kfrom = page_address(from);
> + void *kto = page_address(to);
> + void *kfrom = page_address(from);
>
> copy_page(kto, kfrom);
>
> --
> 2.18.0.huawei.25
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 2022/4/6 17:22, Borislav Petkov wrote:
> On Wed, Apr 06, 2022 at 09:13:05AM +0000, Tong Tiangen wrote:
>> The follow patch
>
> There's no concept of "follow patch" in git - you need to explain this
> differently.
>
>> will add copy_mc_to_user to include/linux/uaccess.h, X86
>> must declare Implemented to avoid compile error.
>
> I don't know what that means. Try again pls.
>
This description is not good; I will rewrite it in the next version.
Here are the reasons for this change:
Patch 3/7 in patchset [1] introduces copy_mc_to_user() in
include/linux/uaccess.h, with the prototype:
static inline unsigned long __must_check
copy_mc_to_user(void *dst, const void *src, size_t cnt)
The prototype in x86 is:
unsigned long __must_check
copy_mc_to_user(void *to, const void *from, unsigned len);
These two are a little different, so I added the following code to x86
to avoid a prototype-conflict compile error:
#define copy_mc_to_user copy_mc_to_user
In addition, I think this #define should be added here.
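To make the conflict concrete, a minimal sketch (the prototypes are the
ones quoted above; this is not a new patch):

/* arch/x86/include/asm/uaccess.h: the arch-specific declaration */
unsigned long __must_check
copy_mc_to_user(void *to, const void *from, unsigned len);
#define copy_mc_to_user copy_mc_to_user	/* tell the generic header it exists */

/* include/linux/uaccess.h: generic fallback added by patch 3/7 */
#ifndef copy_mc_to_user
static inline unsigned long __must_check
copy_mc_to_user(void *dst, const void *src, size_t cnt)
{
	return raw_copy_to_user(dst, src, cnt);
}
#endif

Without the #define, both declarations would be visible and the
differing argument types ("unsigned len" vs "size_t cnt") would trigger
a conflicting-types compile error.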
[1]https://patchwork.kernel.org/project/linux-mm/cover/[email protected]/
Thanks.
Tong.
On Wed, Apr 06, 2022 at 09:13:07AM +0000, Tong Tiangen wrote:
> In arm64 kernel hardware memory errors process(do_sea()), if the errors
> is consumed in the kernel, the current processing is panic. However,
> it is not optimal. In some case, the page accessed in kernel is a user
> page (such as copy_from_user/get_user), kill the user process and
> isolate the user page with hardware memory errors is a better choice.
>
> Consistent with PPC/x86, it is implemented by CONFIG_ARCH_HAS_COPY_MC.
Why do we need new helpers for this, rather than doing this for *any* uaccess?
I understand this is consistent with PPC & X86, but *why* is it done that way
today? e.g. are there cases where we access memory where we do not expect the
situation to be recoverable?
> This patch only enable machine error check framework, it add exception
> fixup before kernel panic in do_sea() and only limit the consumption of
> hardware memory errors in kernel mode triggered by user mode processes.
> If fixup successful, there is no need to panic.
>
> Also add _asm_extable_mc macro used for add extable entry to help
> fixup.
>
> Signed-off-by: Tong Tiangen <[email protected]>
> ---
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/asm-extable.h | 13 ++++++++++++
> arch/arm64/include/asm/esr.h | 5 +++++
> arch/arm64/include/asm/extable.h | 2 +-
> arch/arm64/kernel/probes/kprobes.c | 2 +-
> arch/arm64/mm/extable.c | 20 ++++++++++++++++++-
> arch/arm64/mm/fault.c | 30 +++++++++++++++++++++++++++-
> include/linux/uaccess.h | 8 ++++++++
> 8 files changed, 77 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index d9325dd95eba..012e38309955 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -19,6 +19,7 @@ config ARM64
> select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
> select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
> select ARCH_HAS_CACHE_LINE_SIZE
> + select ARCH_HAS_COPY_MC if ACPI_APEI_GHES
> select ARCH_HAS_CURRENT_STACK_POINTER
> select ARCH_HAS_DEBUG_VIRTUAL
> select ARCH_HAS_DEBUG_VM_PGTABLE
> diff --git a/arch/arm64/include/asm/asm-extable.h b/arch/arm64/include/asm/asm-extable.h
> index c39f2437e08e..74d1db74fd86 100644
> --- a/arch/arm64/include/asm/asm-extable.h
> +++ b/arch/arm64/include/asm/asm-extable.h
> @@ -8,6 +8,11 @@
> #define EX_TYPE_UACCESS_ERR_ZERO 3
> #define EX_TYPE_LOAD_UNALIGNED_ZEROPAD 4
>
> +/* _MC indicates that can fixup from machine check errors */
> +#define EX_TYPE_FIXUP_MC 5
> +
> +#define IS_EX_TYPE_MC(type) (type == EX_TYPE_FIXUP_MC)
If we need this, I'd strongly prefer that we have a EX_TYPE_UACCESS_MC or
EX_TYPE_UACCESS_MC_ERR_ZERO for the uaccess cases, so that we can clearly
distinguish those from non-uaccess cases.
AFAICT the only remaining raw EX_TYPE_FIXUP cases we have today are in some
cache maintenance routines, and we should be able to convert those to a new
EX_TYPE_FIXUP_UACCESS, or EX_TYPE_UACCESS_ERR_ZERO.
> +
> #ifdef __ASSEMBLY__
>
> #define __ASM_EXTABLE_RAW(insn, fixup, type, data) \
> @@ -27,6 +32,14 @@
> __ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP, 0)
> .endm
>
> +/*
> + * Create an exception table entry for `insn`, which will branch to `fixup`
> + * when an unhandled fault(include sea fault) is taken.
> + */
> + .macro _asm_extable_mc, insn, fixup
> + __ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP_MC, 0)
> + .endm
> +
> /*
> * Create an exception table entry for `insn` if `fixup` is provided. Otherwise
> * do nothing.
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index d52a0b269ee8..11fcfc002654 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -330,6 +330,11 @@
> #ifndef __ASSEMBLY__
> #include <asm/types.h>
>
> +static inline bool esr_is_sea(u32 esr)
> +{
> + return (esr & ESR_ELx_FSC) == ESR_ELx_FSC_EXTABT;
> +}
> +
> static inline bool esr_is_data_abort(u32 esr)
> {
> const u32 ec = ESR_ELx_EC(esr);
> diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
> index 72b0e71cc3de..f7835b0f473b 100644
> --- a/arch/arm64/include/asm/extable.h
> +++ b/arch/arm64/include/asm/extable.h
> @@ -45,5 +45,5 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
> }
> #endif /* !CONFIG_BPF_JIT */
>
> -bool fixup_exception(struct pt_regs *regs);
> +bool fixup_exception(struct pt_regs *regs, unsigned int esr);
> #endif
> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
> index d9dfa82c1f18..16a069e8eec3 100644
> --- a/arch/arm64/kernel/probes/kprobes.c
> +++ b/arch/arm64/kernel/probes/kprobes.c
> @@ -285,7 +285,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr)
> * In case the user-specified fault handler returned
> * zero, try to fix up.
> */
> - if (fixup_exception(regs))
> + if (fixup_exception(regs, fsr))
> return 1;
> }
> return 0;
> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
> index 489455309695..f1134c88e849 100644
> --- a/arch/arm64/mm/extable.c
> +++ b/arch/arm64/mm/extable.c
> @@ -9,6 +9,7 @@
>
> #include <asm/asm-extable.h>
> #include <asm/ptrace.h>
> +#include <asm/esr.h>
>
> static inline unsigned long
> get_ex_fixup(const struct exception_table_entry *ex)
> @@ -23,6 +24,18 @@ static bool ex_handler_fixup(const struct exception_table_entry *ex,
> return true;
> }
>
> +static bool ex_handler_fixup_mc(const struct exception_table_entry *ex,
> + struct pt_regs *regs, unsigned int esr)
> +{
> + if (esr_is_sea(esr))
> + regs->regs[0] = 0;
> + else
> + regs->regs[0] = 1;
This needs more explanation.
Why does this hard-code an assumption that we can alter x0?
Why is the x0 value distinct for SEA or non-SEA? What is this meant to
represent specifically?
What if this SEA was taken for a reason other than a memory error?
> +
> + regs->pc = get_ex_fixup(ex);
> + return true;
> +}
> +
> static bool ex_handler_uaccess_err_zero(const struct exception_table_entry *ex,
> struct pt_regs *regs)
> {
> @@ -63,7 +76,7 @@ ex_handler_load_unaligned_zeropad(const struct exception_table_entry *ex,
> return true;
> }
>
> -bool fixup_exception(struct pt_regs *regs)
> +bool fixup_exception(struct pt_regs *regs, unsigned int esr)
> {
> const struct exception_table_entry *ex;
>
> @@ -71,9 +84,14 @@ bool fixup_exception(struct pt_regs *regs)
> if (!ex)
> return false;
>
> + if (esr_is_sea(esr) && !IS_EX_TYPE_MC(ex->type))
> + return false;
I don't think this check belongs here.
Either this should be folded into ex_handler_fixup_mc(), or we should make the
judgement earlier in the fault handling path, and have a separate
fixup_exception_mc() that we can call specifically in the case of a memory
error.
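e.g. something along the lines of the following (a rough, untested
sketch of the latter option, keeping this patch's handler signature):

bool fixup_exception_mc(struct pt_regs *regs, unsigned int esr)
{
	const struct exception_table_entry *ex;

	ex = search_exception_tables(instruction_pointer(regs));
	if (!ex || !IS_EX_TYPE_MC(ex->type))
		return false;

	return ex_handler_fixup_mc(ex, regs, esr);
}

... and have do_sea() call fixup_exception_mc() directly, leaving
fixup_exception() untouched.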
> +
> switch (ex->type) {
> case EX_TYPE_FIXUP:
> return ex_handler_fixup(ex, regs);
> + case EX_TYPE_FIXUP_MC:
> + return ex_handler_fixup_mc(ex, regs, esr);
> case EX_TYPE_BPF:
> return ex_handler_bpf(ex, regs);
> case EX_TYPE_UACCESS_ERR_ZERO:
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 77341b160aca..ffdfab2fdd60 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -361,7 +361,7 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,
> * Are we prepared to handle this kernel fault?
> * We are almost certainly not prepared to handle instruction faults.
> */
> - if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
> + if (!is_el1_instruction_abort(esr) && fixup_exception(regs, esr))
> return;
>
> if (WARN_RATELIMIT(is_spurious_el1_translation_fault(addr, esr, regs),
> @@ -695,6 +695,30 @@ static int do_bad(unsigned long far, unsigned int esr, struct pt_regs *regs)
> return 1; /* "fault" */
> }
>
> +static bool arm64_process_kernel_sea(unsigned long addr, unsigned int esr,
> + struct pt_regs *regs, int sig, int code)
> +{
> + if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
> + return false;
> +
> + if (user_mode(regs) || !current->mm)
> + return false;
> +
> + if (apei_claim_sea(regs) < 0)
> + return false;
> +
> + current->thread.fault_address = 0;
> + current->thread.fault_code = esr;
> +
> + if (!fixup_exception(regs, esr))
> + return false;
> +
> + arm64_force_sig_fault(sig, code, addr,
> + "Uncorrected hardware memory error in kernel-access\n");
> +
> + return true;
> +}
> +
> static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
> {
> const struct fault_info *inf;
> @@ -720,6 +744,10 @@ static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
> */
> siaddr = untagged_addr(far);
> }
> +
> + if (arm64_process_kernel_sea(siaddr, esr, regs, inf->sig, inf->code))
> + return 0;
> +
> arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
>
> return 0;
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 546179418ffa..dd952aeecdc1 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -174,6 +174,14 @@ copy_mc_to_kernel(void *dst, const void *src, size_t cnt)
> }
> #endif
>
> +#ifndef copy_mc_to_user
> +static inline unsigned long __must_check
> +copy_mc_to_user(void *dst, const void *src, size_t cnt)
> +{
> + return raw_copy_to_user(dst, src, cnt);
> +}
... this isn't using the new EX_TYPE_FIXUP_MC type, so isn't this just broken
as of this patch?
Thanks,
Mark.
> +#endif
> +
> static __always_inline void pagefault_disabled_inc(void)
> {
> current->pagefault_disabled++;
> --
> 2.18.0.huawei.25
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Wed, Apr 06, 2022 at 09:13:08AM +0000, Tong Tiangen wrote:
> Add scenarios copy_from_user to machine check safe.
>
> The data copied is user data and is machine check safe, so just kill
> the user process and isolate the error page, not necessary panic.
>
> Signed-off-by: Tong Tiangen <[email protected]>
> ---
> arch/arm64/include/asm/asm-uaccess.h | 16 ++++++++++++++++
> arch/arm64/lib/copy_from_user.S | 11 ++++++-----
> 2 files changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/include/asm/asm-uaccess.h b/arch/arm64/include/asm/asm-uaccess.h
> index 0557af834e03..f31c8978e1af 100644
> --- a/arch/arm64/include/asm/asm-uaccess.h
> +++ b/arch/arm64/include/asm/asm-uaccess.h
> @@ -92,4 +92,20 @@ alternative_else_nop_endif
>
> _asm_extable 8888b,\l;
> .endm
> +
> + .macro user_ldp_mc l, reg1, reg2, addr, post_inc
> +8888: ldtr \reg1, [\addr];
> +8889: ldtr \reg2, [\addr, #8];
> + add \addr, \addr, \post_inc;
> +
> + _asm_extable_mc 8888b, \l;
> + _asm_extable_mc 8889b, \l;
> + .endm
> +
> + .macro user_ldst_mc l, inst, reg, addr, post_inc
> +8888: \inst \reg, [\addr];
> + add \addr, \addr, \post_inc;
> +
> + _asm_extable_mc 8888b, \l;
> + .endm
> #endif
> diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
> index 34e317907524..d9d7c5291871 100644
> --- a/arch/arm64/lib/copy_from_user.S
> +++ b/arch/arm64/lib/copy_from_user.S
> @@ -21,7 +21,7 @@
> */
>
> .macro ldrb1 reg, ptr, val
> - user_ldst 9998f, ldtrb, \reg, \ptr, \val
> + user_ldst_mc 9998f, ldtrb, \reg, \ptr, \val
> .endm
>
> .macro strb1 reg, ptr, val
> @@ -29,7 +29,7 @@
> .endm
>
> .macro ldrh1 reg, ptr, val
> - user_ldst 9997f, ldtrh, \reg, \ptr, \val
> + user_ldst_mc 9997f, ldtrh, \reg, \ptr, \val
> .endm
>
> .macro strh1 reg, ptr, val
> @@ -37,7 +37,7 @@
> .endm
>
> .macro ldr1 reg, ptr, val
> - user_ldst 9997f, ldtr, \reg, \ptr, \val
> + user_ldst_mc 9997f, ldtr, \reg, \ptr, \val
> .endm
>
> .macro str1 reg, ptr, val
> @@ -45,7 +45,7 @@
> .endm
>
> .macro ldp1 reg1, reg2, ptr, val
> - user_ldp 9997f, \reg1, \reg2, \ptr, \val
> + user_ldp_mc 9997f, \reg1, \reg2, \ptr, \val
> .endm
>
> .macro stp1 reg1, reg2, ptr, val
> @@ -62,7 +62,8 @@ SYM_FUNC_START(__arch_copy_from_user)
> ret
>
> // Exception fixups
> -9997: cmp dst, dstin
> +9997: cbz x0, 9998f // Check machine check exception
> + cmp dst, dstin
> b.ne 9998f
If you look at the copy template, you'd see that `dstin` *is* x0.
Consider if we took a non-SEA fault. Then the fixup handler will overwrite x0;
it's likely `dst` != `dstin`, and we'll branch to the byte-by-byte copy. Or if
we're doing something odd and mmap_min_addr is 0, we can do the wrong thing the
other way around and *not* branch to the byte-by-byte copy when we should.
So this is at best confusing, but likely broken too.
Thanks,
Mark.
> // Before being absolutely sure we couldn't copy anything, try harder
> USER(9998f, ldtrb tmp1w, [srcin])
> --
> 2.18.0.huawei.25
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Wed, Apr 06, 2022 at 09:13:05AM +0000, Tong Tiangen wrote:
> The follow patch
There's no concept of "follow patch" in git - you need to explain this
differently.
> will add copy_mc_to_user to include/linux/uaccess.h, X86
> must declare Implemented to avoid compile error.
I don't know what that means. Try again pls.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Hi,
In future, for the arm64 uaccess stuff, could you please CC me, and for the
arm64 RAS bits (e.g. the SEA handling), could you please CC James Morse?
On Wed, Apr 06, 2022 at 09:13:04AM +0000, Tong Tiangen wrote:
> This patchset is based on[1].
That link below appears to be a single patch. Sending that separately makes
this harder to review, so in future could you please send this as a combined
series?
> With the increase of memory capacity and density, the probability of
> memory error increases. The increasing size and density of server RAM
> in the data center and cloud have shown increased uncorrectable memory
> errors.
>
> Currently, the kernel has a mechanism to recover from hardware memory
> errors. This patchset provides an new recovery mechanism.
>
> For ARM64, the hardware error handling is do_sea() which divided into
> two cases:
> 1. The user state consumed the memory errors, the solution is kill th
> user process and isolate the error page.
> 2. The kernel state consumed the memory errors, the solution is panic.
>
> For kernelspace, Undifferentiated panic maybe not the optimal choice,
> it can be handled better.
>
> This patchset deals with four sscenarios of hardware memory error consumed
> in kernelspace:
> 1. copy_from_user.
> 2. get_user.
What about atomics to user memory? e.g. futexes, or the armv8_deprecated
emulations?
It seems the assumption is that writing to user memory (e.g. copy_to_user() and
put_user()) doesn't matter? Could you please mention why? e.g. do we never take
an exception for writes to memory with errors?
> 3. cow(copy on write).
> 4. pagecache reading.
There are a bunch of other places where we'll access user memory via the linear
map, so I assume this is just a best-effort "try not to die" rather than "never
die" ?
Are there other places we might need/want to expand this to in future?
Thanks,
Mark.
> These four scenarios have similarities. Although the error is consumed in
> the kernel state, but the consumed data belongs to the user state.
>
> The processing scheme is based on CONFIG_ARCH_HAS_COPY_MC and uses the
> process killing plus isolate error page to replace kernel panic.
>
> [1]https://lore.kernel.org/lkml/[email protected]/
>
> Since V2:
> 1.Consistent with PPC/x86, Using CONFIG_ARCH_HAS_COPY_MC instead of
> ARM64_UCE_KERNEL_RECOVERY.
> 2.Add two new scenarios, cow and pagecache reading.
> 3.Fix two small bug(the first two patch).
>
> Tong Tiangen (7):
> x86: fix copy_mc_to_user compile error
> arm64: fix page_address return value in copy_highpage
> arm64: add support for machine check error safe
> arm64: add copy_from_user to machine check safe
> arm64: add get_user to machine check safe
> arm64: add cow to machine check safe
> arm64: add pagecache reading to machine check safe
>
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/asm-extable.h | 25 +++++++
> arch/arm64/include/asm/asm-uaccess.h | 16 +++++
> arch/arm64/include/asm/esr.h | 5 ++
> arch/arm64/include/asm/extable.h | 2 +-
> arch/arm64/include/asm/page.h | 10 +++
> arch/arm64/include/asm/uaccess.h | 17 ++++-
> arch/arm64/kernel/probes/kprobes.c | 2 +-
> arch/arm64/lib/Makefile | 2 +
> arch/arm64/lib/copy_from_user.S | 11 ++--
> arch/arm64/lib/copy_page_mc.S | 98 ++++++++++++++++++++++++++++
> arch/arm64/lib/copy_to_user_mc.S | 78 ++++++++++++++++++++++
> arch/arm64/mm/copypage.c | 36 ++++++++--
> arch/arm64/mm/extable.c | 21 +++++-
> arch/arm64/mm/fault.c | 30 ++++++++-
> arch/x86/include/asm/uaccess.h | 1 +
> include/linux/highmem.h | 8 +++
> include/linux/uaccess.h | 8 +++
> include/linux/uio.h | 9 ++-
> lib/iov_iter.c | 85 +++++++++++++++++++-----
> mm/memory.c | 2 +-
> 21 files changed, 432 insertions(+), 35 deletions(-)
> create mode 100644 arch/arm64/lib/copy_page_mc.S
> create mode 100644 arch/arm64/lib/copy_to_user_mc.S
>
> --
> 2.18.0.huawei.25
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Wed, Apr 06, 2022 at 09:13:11AM +0000, Tong Tiangen wrote:
> When user process reading file, the data is cached in pagecache and
> the data belongs to the user process, When machine check error is
> encountered during pagecache reading, killing the user process and
> isolate the user page with hardware memory errors is a more reasonable
> choice than kernel panic.
>
> The __arch_copy_mc_to_user() in copy_to_user_mc.S is largely borrows
> from __arch_copy_to_user() in copy_to_user.S and the main difference
> is __arch_copy_mc_to_user() add the extable entry to support machine
> check safe.
As with prior patches, *why* is the distinction necessary?
This patch adds a bunch of conditional logic, but *structurally* it doesn't
alter the handling to be substantially different for the MC and non-MC cases.
This seems like pointless duplication that just makes it harder to maintain
this code.
Thanks,
Mark.
> In _copy_page_to_iter(), machine check safe only considered ITER_IOVEC
> which is used by pagecache reading.
>
> Signed-off-by: Tong Tiangen <[email protected]>
> ---
> arch/arm64/include/asm/uaccess.h | 15 ++++++
> arch/arm64/lib/Makefile | 2 +-
> arch/arm64/lib/copy_to_user_mc.S | 78 +++++++++++++++++++++++++++++
> include/linux/uio.h | 9 +++-
> lib/iov_iter.c | 85 +++++++++++++++++++++++++-------
> 5 files changed, 170 insertions(+), 19 deletions(-)
> create mode 100644 arch/arm64/lib/copy_to_user_mc.S
>
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index 24b662407fbd..f0d5e811165a 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -448,6 +448,21 @@ extern long strncpy_from_user(char *dest, const char __user *src, long count);
>
> extern __must_check long strnlen_user(const char __user *str, long n);
>
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> +extern unsigned long __must_check __arch_copy_mc_to_user(void __user *to,
> + const void *from, unsigned long n);
> +static inline unsigned long __must_check
> +copy_mc_to_user(void __user *to, const void *from, unsigned long n)
> +{
> + uaccess_ttbr0_enable();
> + n = __arch_copy_mc_to_user(__uaccess_mask_ptr(to), from, n);
> + uaccess_ttbr0_disable();
> +
> + return n;
> +}
> +#define copy_mc_to_user copy_mc_to_user
> +#endif
> +
> #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
> struct page;
> void memcpy_page_flushcache(char *to, struct page *page, size_t offset, size_t len);
> diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
> index 29c578414b12..9b3571227fb4 100644
> --- a/arch/arm64/lib/Makefile
> +++ b/arch/arm64/lib/Makefile
> @@ -23,4 +23,4 @@ obj-$(CONFIG_ARM64_MTE) += mte.o
>
> obj-$(CONFIG_KASAN_SW_TAGS) += kasan_sw_tags.o
>
> -obj-$(CONFIG_ARCH_HAS_CPY_MC) += copy_page_mc.o
> +obj-$(CONFIG_ARCH_HAS_COPY_MC) += copy_page_mc.o copy_to_user_mc.o
> diff --git a/arch/arm64/lib/copy_to_user_mc.S b/arch/arm64/lib/copy_to_user_mc.S
> new file mode 100644
> index 000000000000..9d228ff15446
> --- /dev/null
> +++ b/arch/arm64/lib/copy_to_user_mc.S
> @@ -0,0 +1,78 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + */
> +
> +#include <linux/linkage.h>
> +
> +#include <asm/asm-uaccess.h>
> +#include <asm/assembler.h>
> +#include <asm/cache.h>
> +
> +/*
> + * Copy to user space from a kernel buffer (alignment handled by the hardware)
> + *
> + * Parameters:
> + * x0 - to
> + * x1 - from
> + * x2 - n
> + * Returns:
> + * x0 - bytes not copied
> + */
> + .macro ldrb1 reg, ptr, val
> + 1000: ldrb \reg, [\ptr], \val;
> + _asm_extable_mc 1000b, 9998f;
> + .endm
> +
> + .macro strb1 reg, ptr, val
> + user_ldst_mc 9998f, sttrb, \reg, \ptr, \val
> + .endm
> +
> + .macro ldrh1 reg, ptr, val
> + 1001: ldrh \reg, [\ptr], \val;
> + _asm_extable_mc 1001b, 9998f;
> + .endm
> +
> + .macro strh1 reg, ptr, val
> + user_ldst_mc 9997f, sttrh, \reg, \ptr, \val
> + .endm
> +
> + .macro ldr1 reg, ptr, val
> + 1002: ldr \reg, [\ptr], \val;
> + _asm_extable_mc 1002b, 9998f;
> + .endm
> +
> + .macro str1 reg, ptr, val
> + user_ldst_mc 9997f, sttr, \reg, \ptr, \val
> + .endm
> +
> + .macro ldp1 reg1, reg2, ptr, val
> + 1003: ldp \reg1, \reg2, [\ptr], \val;
> + _asm_extable_mc 1003b, 9998f;
> + .endm
> +
> + .macro stp1 reg1, reg2, ptr, val
> + user_stp 9997f, \reg1, \reg2, \ptr, \val
> + .endm
> +
> +end .req x5
> +srcin .req x15
> +SYM_FUNC_START(__arch_copy_mc_to_user)
> + add end, x0, x2
> + mov srcin, x1
> +#include "copy_template.S"
> + mov x0, #0
> + ret
> +
> + // Exception fixups
> +9997: cbz x0, 9998f // Check machine check exception
> + cmp dst, dstin
> + b.ne 9998f
> + // Before being absolutely sure we couldn't copy anything, try harder
> + ldrb tmp1w, [srcin]
> +USER(9998f, sttrb tmp1w, [dst])
> + add dst, dst, #1
> +9998: sub x0, end, dst // bytes not copied
> + ret
> +SYM_FUNC_END(__arch_copy_mc_to_user)
> +EXPORT_SYMBOL(__arch_copy_mc_to_user)
> diff --git a/include/linux/uio.h b/include/linux/uio.h
> index 739285fe5a2f..539d9ee9b032 100644
> --- a/include/linux/uio.h
> +++ b/include/linux/uio.h
> @@ -147,10 +147,17 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
> size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
> size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
>
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> +size_t copy_mc_page_to_iter(struct page *page, size_t offset, size_t bytes,
> + struct iov_iter *i);
> +#else
> +#define copy_mc_page_to_iter copy_page_to_iter
> +#endif
> +
> static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
> size_t bytes, struct iov_iter *i)
> {
> - return copy_page_to_iter(&folio->page, offset, bytes, i);
> + return copy_mc_page_to_iter(&folio->page, offset, bytes, i);
> }
>
> static __always_inline __must_check
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 6dd5330f7a99..2c5f3bb6391d 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -157,6 +157,19 @@ static int copyout(void __user *to, const void *from, size_t n)
> return n;
> }
>
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> +static int copyout_mc(void __user *to, const void *from, size_t n)
> +{
> + if (access_ok(to, n)) {
> + instrument_copy_to_user(to, from, n);
> + n = copy_mc_to_user((__force void *) to, from, n);
> + }
> + return n;
> +}
> +#else
> +#define copyout_mc copyout
> +#endif
> +
> static int copyin(void *to, const void __user *from, size_t n)
> {
> if (should_fail_usercopy())
> @@ -169,7 +182,7 @@ static int copyin(void *to, const void __user *from, size_t n)
> }
>
> static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
> - struct iov_iter *i)
> + struct iov_iter *i, bool mc_safe)
> {
> size_t skip, copy, left, wanted;
> const struct iovec *iov;
> @@ -194,7 +207,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
> from = kaddr + offset;
>
> /* first chunk, usually the only one */
> - left = copyout(buf, from, copy);
> + if (mc_safe)
> + left = copyout_mc(buf, from, copy);
> + else
> + left = copyout(buf, from, copy);
> copy -= left;
> skip += copy;
> from += copy;
> @@ -204,7 +220,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
> iov++;
> buf = iov->iov_base;
> copy = min(bytes, iov->iov_len);
> - left = copyout(buf, from, copy);
> + if (mc_safe)
> + left = copyout_mc(buf, from, copy);
> + else
> + left = copyout(buf, from, copy);
> copy -= left;
> skip = copy;
> from += copy;
> @@ -223,7 +242,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
>
> kaddr = kmap(page);
> from = kaddr + offset;
> - left = copyout(buf, from, copy);
> + if (mc_safe)
> + left = copyout_mc(buf, from, copy);
> + else
> + left = copyout(buf, from, copy);
> copy -= left;
> skip += copy;
> from += copy;
> @@ -232,7 +254,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
> iov++;
> buf = iov->iov_base;
> copy = min(bytes, iov->iov_len);
> - left = copyout(buf, from, copy);
> + if (mc_safe)
> + left = copyout_mc(buf, from, copy);
> + else
> + left = copyout(buf, from, copy);
> copy -= left;
> skip = copy;
> from += copy;
> @@ -674,15 +699,6 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
> EXPORT_SYMBOL(_copy_to_iter);
>
> #ifdef CONFIG_ARCH_HAS_COPY_MC
> -static int copyout_mc(void __user *to, const void *from, size_t n)
> -{
> - if (access_ok(to, n)) {
> - instrument_copy_to_user(to, from, n);
> - n = copy_mc_to_user((__force void *) to, from, n);
> - }
> - return n;
> -}
> -
> static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
> struct iov_iter *i)
> {
> @@ -846,10 +862,10 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
> }
>
> static size_t __copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
> - struct iov_iter *i)
> + struct iov_iter *i, bool mc_safe)
> {
> if (likely(iter_is_iovec(i)))
> - return copy_page_to_iter_iovec(page, offset, bytes, i);
> + return copy_page_to_iter_iovec(page, offset, bytes, i, mc_safe);
> if (iov_iter_is_bvec(i) || iov_iter_is_kvec(i) || iov_iter_is_xarray(i)) {
> void *kaddr = kmap_local_page(page);
> size_t wanted = _copy_to_iter(kaddr + offset, bytes, i);
> @@ -878,7 +894,7 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
> offset %= PAGE_SIZE;
> while (1) {
> size_t n = __copy_page_to_iter(page, offset,
> - min(bytes, (size_t)PAGE_SIZE - offset), i);
> + min(bytes, (size_t)PAGE_SIZE - offset), i, false);
> res += n;
> bytes -= n;
> if (!bytes || !n)
> @@ -893,6 +909,41 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
> }
> EXPORT_SYMBOL(copy_page_to_iter);
>
> +#ifdef CONFIG_ARCH_HAS_COPY_MC
> +/**
> + * copy_mc_page_to_iter - copy page to iter with source memory error exception handling.
> + *
> + * The filemap_read deploys this for pagecache reading and the main differences between
> + * this and typical copy_page_to_iter() is call __copy_page_to_iter with mc_safe true.
> + *
> + * Return: number of bytes copied (may be %0)
> + */
> +size_t copy_mc_page_to_iter(struct page *page, size_t offset, size_t bytes,
> + struct iov_iter *i)
> +{
> + size_t res = 0;
> +
> + if (unlikely(!page_copy_sane(page, offset, bytes)))
> + return 0;
> + page += offset / PAGE_SIZE; // first subpage
> + offset %= PAGE_SIZE;
> + while (1) {
> + size_t n = __copy_page_to_iter(page, offset,
> + min(bytes, (size_t)PAGE_SIZE - offset), i, true);
> + res += n;
> + bytes -= n;
> + if (!bytes || !n)
> + break;
> + offset += n;
> + if (offset == PAGE_SIZE) {
> + page++;
> + offset = 0;
> + }
> + }
> + return res;
> +}
> +#endif
> +
> size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
> struct iov_iter *i)
> {
> --
> 2.18.0.huawei.25
>
>
On Wed, Apr 06, 2022 at 09:13:09AM +0000, Tong Tiangen wrote:
> Add scenarios get_user to machine check safe. The processing of
> EX_TYPE_UACCESS_ERR_ZERO and EX_TYPE_UACCESS_ERR_ZERO_UCE_RECOVERY is same
> and both return -EFAULT.
Which uaccess cases do we expect to *not* be recoverable?
Naively I would assume that if we're going to treat a memory error on a uaccess
as fatal to userspace we should be able to do that for *any* uacesses.
The commit message should explain why we need the distinction between a
recoverable uaccess and a non-recoverable uaccess.
Thanks,
Mark.
>
> Signed-off-by: Tong Tiangen <[email protected]>
> ---
> arch/arm64/include/asm/asm-extable.h | 14 +++++++++++++-
> arch/arm64/include/asm/uaccess.h | 2 +-
> arch/arm64/mm/extable.c | 1 +
> 3 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/asm-extable.h b/arch/arm64/include/asm/asm-extable.h
> index 74d1db74fd86..bfc2d224cbae 100644
> --- a/arch/arm64/include/asm/asm-extable.h
> +++ b/arch/arm64/include/asm/asm-extable.h
> @@ -10,8 +10,11 @@
>
> /* _MC indicates that can fixup from machine check errors */
> #define EX_TYPE_FIXUP_MC 5
> +#define EX_TYPE_UACCESS_ERR_ZERO_MC 6
>
> -#define IS_EX_TYPE_MC(type) (type == EX_TYPE_FIXUP_MC)
> +#define IS_EX_TYPE_MC(type) \
> + (type == EX_TYPE_FIXUP_MC || \
> + type == EX_TYPE_UACCESS_ERR_ZERO_MC)
>
> #ifdef __ASSEMBLY__
>
> @@ -77,6 +80,15 @@
> #define EX_DATA_REG(reg, gpr) \
> "((.L__gpr_num_" #gpr ") << " __stringify(EX_DATA_REG_##reg##_SHIFT) ")"
>
> +#define _ASM_EXTABLE_UACCESS_ERR_ZERO_MC(insn, fixup, err, zero) \
> + __DEFINE_ASM_GPR_NUMS \
> + __ASM_EXTABLE_RAW(#insn, #fixup, \
> + __stringify(EX_TYPE_UACCESS_ERR_ZERO_MC), \
> + "(" \
> + EX_DATA_REG(ERR, err) " | " \
> + EX_DATA_REG(ZERO, zero) \
> + ")")
> +
> #define _ASM_EXTABLE_UACCESS_ERR_ZERO(insn, fixup, err, zero) \
> __DEFINE_ASM_GPR_NUMS \
> __ASM_EXTABLE_RAW(#insn, #fixup, \
> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
> index e8dce0cc5eaa..24b662407fbd 100644
> --- a/arch/arm64/include/asm/uaccess.h
> +++ b/arch/arm64/include/asm/uaccess.h
> @@ -236,7 +236,7 @@ static inline void __user *__uaccess_mask_ptr(const void __user *ptr)
> asm volatile( \
> "1: " load " " reg "1, [%2]\n" \
> "2:\n" \
> - _ASM_EXTABLE_UACCESS_ERR_ZERO(1b, 2b, %w0, %w1) \
> + _ASM_EXTABLE_UACCESS_ERR_ZERO_MC(1b, 2b, %w0, %w1) \
> : "+r" (err), "=&r" (x) \
> : "r" (addr))
>
> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
> index f1134c88e849..7c05f8d2bce0 100644
> --- a/arch/arm64/mm/extable.c
> +++ b/arch/arm64/mm/extable.c
> @@ -95,6 +95,7 @@ bool fixup_exception(struct pt_regs *regs, unsigned int esr)
> case EX_TYPE_BPF:
> return ex_handler_bpf(ex, regs);
> case EX_TYPE_UACCESS_ERR_ZERO:
> + case EX_TYPE_UACCESS_ERR_ZERO_MC:
> return ex_handler_uaccess_err_zero(ex, regs);
> case EX_TYPE_LOAD_UNALIGNED_ZEROPAD:
> return ex_handler_load_unaligned_zeropad(ex, regs);
> --
> 2.18.0.huawei.25
>
>
On 2022/4/6 18:22, Mark Rutland wrote:
> On Wed, Apr 06, 2022 at 09:13:06AM +0000, Tong Tiangen wrote:
>> Function page_address return void, fix it.
>>
>> Signed-off-by: Tong Tiangen <[email protected]>
>
> This looks like a sensible cleanup, but the commit title and message aren't
> that clear.
>
> Can you please make this:
>
> | arm64: fix types in copy_highpage()
> |
> | In copy_highpage() the `kto` and `kfrom` local variables are pointers to
> | struct page, but these are used to hold arbitrary pointers to kernel memory.
> | Each call to page_address() returns a void pointer to memory associated with
> | the relevant page, and copy_page() expects void pointers to this memory.
> |
> | This inconsistency was introduced in commit:
> |
> | 2563776b41c31908 ("arm64: mte: Tags-aware copy_{user_,}highpage() implementations")
> |
> | ... and while this doesn't appear to be harmful in practice it is clearly wrong.
> |
> | Correct this by making `kto` and `kfrom` void pointers.
> |
> | Fixes: 2563776b41c31908 ("arm64: mte: Tags-aware copy_{user_,}highpage() implementations")
>
> With that:
>
> Acked-by: Mark Rutland <[email protected]>
>
> Thanks,
> Mark.
>
OK, sure, will do in the next version.
Thanks.
Tong.
>> ---
>> arch/arm64/mm/copypage.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
>> index b5447e53cd73..0dea80bf6de4 100644
>> --- a/arch/arm64/mm/copypage.c
>> +++ b/arch/arm64/mm/copypage.c
>> @@ -16,8 +16,8 @@
>>
>> void copy_highpage(struct page *to, struct page *from)
>> {
>> - struct page *kto = page_address(to);
>> - struct page *kfrom = page_address(from);
>> + void *kto = page_address(to);
>> + void *kfrom = page_address(from);
>>
>> copy_page(kto, kfrom);
>>
>> --
>> 2.18.0.huawei.25
>>
>>
On 2022/4/6 19:19, Mark Rutland wrote:
> On Wed, Apr 06, 2022 at 09:13:08AM +0000, Tong Tiangen wrote:
>> Add scenarios copy_from_user to machine check safe.
>>
>> The data copied is user data and is machine check safe, so just kill
>> the user process and isolate the error page, not necessary panic.
>>
>> Signed-off-by: Tong Tiangen <[email protected]>
>> ---
>> arch/arm64/include/asm/asm-uaccess.h | 16 ++++++++++++++++
>> arch/arm64/lib/copy_from_user.S | 11 ++++++-----
>> 2 files changed, 22 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/asm-uaccess.h b/arch/arm64/include/asm/asm-uaccess.h
>> index 0557af834e03..f31c8978e1af 100644
>> --- a/arch/arm64/include/asm/asm-uaccess.h
>> +++ b/arch/arm64/include/asm/asm-uaccess.h
>> @@ -92,4 +92,20 @@ alternative_else_nop_endif
>>
>> _asm_extable 8888b,\l;
>> .endm
>> +
>> + .macro user_ldp_mc l, reg1, reg2, addr, post_inc
>> +8888: ldtr \reg1, [\addr];
>> +8889: ldtr \reg2, [\addr, #8];
>> + add \addr, \addr, \post_inc;
>> +
>> + _asm_extable_mc 8888b, \l;
>> + _asm_extable_mc 8889b, \l;
>> + .endm
>> +
>> + .macro user_ldst_mc l, inst, reg, addr, post_inc
>> +8888: \inst \reg, [\addr];
>> + add \addr, \addr, \post_inc;
>> +
>> + _asm_extable_mc 8888b, \l;
>> + .endm
>> #endif
>> diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
>> index 34e317907524..d9d7c5291871 100644
>> --- a/arch/arm64/lib/copy_from_user.S
>> +++ b/arch/arm64/lib/copy_from_user.S
>> @@ -21,7 +21,7 @@
>> */
>>
>> .macro ldrb1 reg, ptr, val
>> - user_ldst 9998f, ldtrb, \reg, \ptr, \val
>> + user_ldst_mc 9998f, ldtrb, \reg, \ptr, \val
>> .endm
>>
>> .macro strb1 reg, ptr, val
>> @@ -29,7 +29,7 @@
>> .endm
>>
>> .macro ldrh1 reg, ptr, val
>> - user_ldst 9997f, ldtrh, \reg, \ptr, \val
>> + user_ldst_mc 9997f, ldtrh, \reg, \ptr, \val
>> .endm
>>
>> .macro strh1 reg, ptr, val
>> @@ -37,7 +37,7 @@
>> .endm
>>
>> .macro ldr1 reg, ptr, val
>> - user_ldst 9997f, ldtr, \reg, \ptr, \val
>> + user_ldst_mc 9997f, ldtr, \reg, \ptr, \val
>> .endm
>>
>> .macro str1 reg, ptr, val
>> @@ -45,7 +45,7 @@
>> .endm
>>
>> .macro ldp1 reg1, reg2, ptr, val
>> - user_ldp 9997f, \reg1, \reg2, \ptr, \val
>> + user_ldp_mc 9997f, \reg1, \reg2, \ptr, \val
>> .endm
>>
>> .macro stp1 reg1, reg2, ptr, val
>> @@ -62,7 +62,8 @@ SYM_FUNC_START(__arch_copy_from_user)
>> ret
>>
>> // Exception fixups
>> -9997: cmp dst, dstin
>> +9997: cbz x0, 9998f // Check machine check exception
>> + cmp dst, dstin
>> b.ne 9998f
>
> If you look at the copy template, you'd see that `dstin` *is* x0.
>
> Consider if we took a non-SEA fault. Then the fixup handler will overwrite x0,
> it's likely `dst` != `dstin`, and we'll branch to the byte-by-byte copy. Or if
> we're doing something odd and mmap_min_addr is 0, we can do the wrong thing the
> other way around and *not* branch to the byte-by-byte copy when we should.
>
> So this is at best confusing, but likely broken too.
>
> Thanks,
> Mark.
OK, I missed that; it will be fixed in the next version.
Thanks,
Tong.
>
>> // Before being absolutely sure we couldn't copy anything, try harder
>> USER(9998f, ldtrb tmp1w, [srcin])
>> --
>> 2.18.0.huawei.25
>>
>>
On 2022/4/6 19:22, Mark Rutland wrote:
> On Wed, Apr 06, 2022 at 09:13:09AM +0000, Tong Tiangen wrote:
>> Add scenarios get_user to machine check safe. The processing of
>> EX_TYPE_UACCESS_ERR_ZERO and EX_TYPE_UACCESS_ERR_ZERO_UCE_RECOVERY is same
>> and both return -EFAULT.
>
> Which uaccess cases do we expect to *not* be recoverable?
>
> Naively I would assume that if we're going to treat a memory error on a uaccess
> as fatal to userspace we should be able to do that for *any* uacesses.
>
> The commit message should explain why we need the distinction between a
> recoverable uaccess and a non-recoverable uaccess.
>
> Thanks,
> Mark.
>
Currently, any memory error consumed in kernel mode leads to panic
(do_sea()).
My view is that not all memory errors consumed in kernel mode are fatal.
For example, copy_from_user()/get_user() consume the error while reading
user data in process context. In that case we do not need to let the
kernel panic; killing the process is enough and the rest of the system
keeps running.
However, not every uaccess can be recovered without affecting the normal
operation of the system. The key is not whether the access is a uaccess,
but whether the page being read holds data that the system as a whole
depends on.
Thanks,
Tong.
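For illustration, a minimal userspace sketch of what "kill the user process"
looks like from the receiving end. It assumes the error is reported as SIGBUS
with a BUS_MCEERR_* si_code, as the generic memory-failure path does; the
exact si_code delivered by this series may differ, and the program is only a
sketch, not part of the patchset:

#define _GNU_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Report why we were killed, then exit. */
static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
        const char *msg = (info->si_code == BUS_MCEERR_AR ||
                           info->si_code == BUS_MCEERR_AO) ?
                          "SIGBUS: hardware memory error\n" :
                          "SIGBUS: other bus error\n";

        (void)sig;
        (void)ctx;
        write(STDERR_FILENO, msg, strlen(msg));
        _exit(EXIT_FAILURE);
}

int main(void)
{
        struct sigaction sa = {
                .sa_sigaction = on_sigbus,
                .sa_flags = SA_SIGINFO,
        };

        sigaction(SIGBUS, &sa, NULL);
        /* ... read() a file whose pagecache page carries a latent error ... */
        return 0;
}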
On 2022/4/6 18:04, Mark Rutland wrote:
> Hi,
>
> In future, for the arm64 uaccess stuff, could you please CC me, and for the
> arm64 RAS bits (e.g. the SEA handling), could you please CC James Morse?
ok :)
>
> On Wed, Apr 06, 2022 at 09:13:04AM +0000, Tong Tiangen wrote:
>> This patchset is based on[1].
>
> That link below appears to be a single patch. Sending that separately makes
> this harder to review, so in future could you please send this as a combined
> series?
>
>> With the increase of memory capacity and density, the probability of
>> memory error increases. The increasing size and density of server RAM
>> in the data center and cloud have shown increased uncorrectable memory
>> errors.
>>
>> Currently, the kernel has a mechanism to recover from hardware memory
>> errors. This patchset provides an new recovery mechanism.
>>
>> For ARM64, the hardware error handling is do_sea() which divided into
>> two cases:
>> 1. The user state consumed the memory errors, the solution is kill th
>> user process and isolate the error page.
>> 2. The kernel state consumed the memory errors, the solution is panic.
>>
>> For kernelspace, Undifferentiated panic maybe not the optimal choice,
>> it can be handled better.
>>
>> This patchset deals with four sscenarios of hardware memory error consumed
>> in kernelspace:
>> 1. copy_from_user.
>> 2. get_user.
>
> What about atomics to user memory? e.g. futexes, or the armv8_deprecated
> emulations?
>
> It seems the assumption is that writing to user memory (e.g. copy_to_user() and
> put_user()) don't matter? Could you please mention why? e.g. do we never take
> an exception for writes to memory with errors?
First, why we only pay attention to errors that occur when reading
memory and not when writing it:
1. When Linux reads a page, Linux is the consumer[*] and the DDR
controller is the producer. If the page has a memory error, Linux
consumes that error, receives an error signal and has to handle it.
2. When Linux writes a page, Linux is the producer and the DDR
controller is the consumer; the DDR controller handles the memory error.
3. From the perspective of Linux, we therefore only care about its role
as a consumer, i.e. how Linux responds to errors when reading pages.
[*] For the definitions of producer and consumer, refer to the
documentation: Reliability, Availability, and Serviceability (RAS)
Architecture Extension.
Second, why writes to user memory don't matter here. "Don't matter"
means we will not handle them in this patchset; they keep the kernel's
current strategy (panic). Take copy_from[to]_user()/get[put]_user() as
an example (see the sketch after this list):
1. copy_to_user()/put_user() read a kernel page and write to a user
page. We cannot judge how important that kernel page is, and if a memory
error is hit while reading it, normal operation of the system after
recovery cannot be guaranteed, so the kernel's current strategy (panic)
stays unchanged.
2. copy_from_user()/get_user() read a user page and write to a kernel
page in user process context. That user data is only critical to the
user process and does not affect the operation of the whole system, so
on a memory error during the read we can recover by killing the process
and isolating the faulty user page instead of panicking. This patchset
is aimed at exactly this situation.
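To make the consumer side concrete, a small kernel-style sketch (purely
illustrative, not part of the series) annotating which memory each helper
reads; a synchronous external abort is only raised when poisoned data is
consumed, so only the source side matters for recovery:

#include <linux/errno.h>
#include <linux/uaccess.h>

/* Illustrative only: this series targets helpers whose *source* is a user
 * page, because that is the side being consumed (read). */
static int uaccess_read_sides(int __user *uptr, int *kbuf)
{
        int val;

        if (get_user(val, uptr))                        /* reads a user page: recoverable */
                return -EFAULT;
        if (copy_from_user(kbuf, uptr, sizeof(*kbuf)))  /* reads user pages: recoverable */
                return -EFAULT;
        if (put_user(val, uptr))                        /* reads kernel data: out of scope */
                return -EFAULT;
        if (copy_to_user(uptr, kbuf, sizeof(*kbuf)))    /* reads kernel pages: out of scope */
                return -EFAULT;

        return 0;
}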
>
>> 3. cow(copy on write).
>> 4. pagecache reading.
>
> There are a bunch of other places where we'll access user memory via the linear
> map, so I assume this is just a best-effort "try not to die" rather than "never
> die" ?
>
> Are there other places we might need/want to expand this to in future?
>
> Thanks,
> Mark.
Yes.
The strategy is "try not to die" in some specific scenarios.
In both cases (cow and pagecache reading), when the page with the memory
error is read in user process context the result is not fatal, because
the data in the faulty page is only critical to that user process.
Killing the process and isolating the error page will not affect the
normal operation of the system.
I hope that explains it clearly.
Great thanks to Mark, and I hope James can take a look at this idea.
Thanks.
Tong.
>
>> These four scenarios have similarities. Although the error is consumed in
>> the kernel state, but the consumed data belongs to the user state.
>>
>> The processing scheme is based on CONFIG_ARCH_HAS_COPY_MC and uses the
>> process killing plus isolate error page to replace kernel panic.
>>
>> [1]https://lore.kernel.org/lkml/[email protected]/
>>
>> Since V2:
>> 1.Consistent with PPC/x86, Using CONFIG_ARCH_HAS_COPY_MC instead of
>> ARM64_UCE_KERNEL_RECOVERY.
>> 2.Add two new scenarios, cow and pagecache reading.
>> 3.Fix two small bug(the first two patch).
>>
>> Tong Tiangen (7):
>> x86: fix copy_mc_to_user compile error
>> arm64: fix page_address return value in copy_highpage
>> arm64: add support for machine check error safe
>> arm64: add copy_from_user to machine check safe
>> arm64: add get_user to machine check safe
>> arm64: add cow to machine check safe
>> arm64: add pagecache reading to machine check safe
>>
>> arch/arm64/Kconfig | 1 +
>> arch/arm64/include/asm/asm-extable.h | 25 +++++++
>> arch/arm64/include/asm/asm-uaccess.h | 16 +++++
>> arch/arm64/include/asm/esr.h | 5 ++
>> arch/arm64/include/asm/extable.h | 2 +-
>> arch/arm64/include/asm/page.h | 10 +++
>> arch/arm64/include/asm/uaccess.h | 17 ++++-
>> arch/arm64/kernel/probes/kprobes.c | 2 +-
>> arch/arm64/lib/Makefile | 2 +
>> arch/arm64/lib/copy_from_user.S | 11 ++--
>> arch/arm64/lib/copy_page_mc.S | 98 ++++++++++++++++++++++++++++
>> arch/arm64/lib/copy_to_user_mc.S | 78 ++++++++++++++++++++++
>> arch/arm64/mm/copypage.c | 36 ++++++++--
>> arch/arm64/mm/extable.c | 21 +++++-
>> arch/arm64/mm/fault.c | 30 ++++++++-
>> arch/x86/include/asm/uaccess.h | 1 +
>> include/linux/highmem.h | 8 +++
>> include/linux/uaccess.h | 8 +++
>> include/linux/uio.h | 9 ++-
>> lib/iov_iter.c | 85 +++++++++++++++++++-----
>> mm/memory.c | 2 +-
>> 21 files changed, 432 insertions(+), 35 deletions(-)
>> create mode 100644 arch/arm64/lib/copy_page_mc.S
>> create mode 100644 arch/arm64/lib/copy_to_user_mc.S
>>
>> --
>> 2.18.0.huawei.25
>>
>>
On 2022-04-07 15:56, Tong Tiangen wrote:
>
>
> 在 2022/4/6 19:27, Mark Rutland 写道:
>> On Wed, Apr 06, 2022 at 09:13:11AM +0000, Tong Tiangen wrote:
>>> When user process reading file, the data is cached in pagecache and
>>> the data belongs to the user process, When machine check error is
>>> encountered during pagecache reading, killing the user process and
>>> isolate the user page with hardware memory errors is a more reasonable
>>> choice than kernel panic.
>>>
>>> The __arch_copy_mc_to_user() in copy_to_user_mc.S is largely borrows
>>> from __arch_copy_to_user() in copy_to_user.S and the main difference
>>> is __arch_copy_mc_to_user() add the extable entry to support machine
>>> check safe.
>>
>> As with prior patches, *why* is the distinction necessary?
>>
>> This patch adds a bunch of conditional logic, but *structurally* it
>> doesn't
>> alter the handling to be substantially different for the MC and non-MC
>> cases.
>>
>> This seems like pointless duplication that just makes it harder to
>> maintain
>> this code.
>>
>> Thanks,
>> Mark.
>
> Agreed, The implementation here looks a little ugly and harder to maintain.
>
> The purpose of my doing this is not all copy_to_user can be recovered.
>
> A memory error is consumed when reading pagecache using copy_to_user.
> I think in this scenario, only the process is affected because it can't
> read
> pagecache data correctly. Just kill the process and don't need the whole
> kernel panic.
>
> So I need two different copy_to_user implementation, one is existing
> __arch_copy_to_user,
> this function will panic when consuming memory errors. The other one is
> this new helper
> __arch_copy_mc_to_user, this interface is used when reading pagecache.
> It can recover from
> consume memory error.
OK, but do we really need two almost-identical implementations of every
function where the only difference is how the exception table entries
are annotated? Could the exception handler itself just figure out who
owns the page where the fault occurred and decide what action to take as
appropriate?
Robin.
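For reference, a rough sketch of that direction: decide in the SEA fixup path
whether the poisoned kernel address backs data that only matters to userspace
(pagecache or anonymous memory). This only illustrates Robin's suggestion
under simplifying assumptions; it is not something the series proposes, and
the helper name is made up:

#include <linux/mm.h>
#include <linux/pagemap.h>

/* Rough sketch: does the faulting kernel address map a page whose contents
 * are user-owned?  If so, killing the task(s) using it may be enough;
 * otherwise fall back to the existing panic. */
static bool sea_addr_is_user_data(unsigned long addr)
{
        struct page *page;

        if (!virt_addr_valid(addr))
                return false;

        page = virt_to_page(addr);
        return PageAnon(page) || page_mapping(page) != NULL;
}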
>
> In future, if find a scenario use copy_to_user can also be recovered, we
> can also use this mc
> safe helper instead it.
>
> Thanks,
> Tong.
On 2022/4/6 18:58, Mark Rutland wrote:
> On Wed, Apr 06, 2022 at 09:13:07AM +0000, Tong Tiangen wrote:
>> In arm64 kernel hardware memory errors process(do_sea()), if the errors
>> is consumed in the kernel, the current processing is panic. However,
>> it is not optimal. In some case, the page accessed in kernel is a user
>> page (such as copy_from_user/get_user), kill the user process and
>> isolate the user page with hardware memory errors is a better choice.
>>
>> Consistent with PPC/x86, it is implemented by CONFIG_ARCH_HAS_COPY_MC.
>
> Why do we need new helpers for this, rather than doing this for *any* uaccess?
>
> I understand this is consistent with PPC & X86, but *why* is it done that way
> today? e.g. are there cases where we access memory where we do not expect the
> situation to be recoverable?
Here [1] I explain why not all uaccesses need to be recoverable.
[1]https://lore.kernel.org/lkml/[email protected]/
>
>> This patch only enable machine error check framework, it add exception
>> fixup before kernel panic in do_sea() and only limit the consumption of
>> hardware memory errors in kernel mode triggered by user mode processes.
>> If fixup successful, there is no need to panic.
>>
>> Also add _asm_extable_mc macro used for add extable entry to help
>> fixup.
>>
>> Signed-off-by: Tong Tiangen <[email protected]>
>> ---
>> arch/arm64/Kconfig | 1 +
>> arch/arm64/include/asm/asm-extable.h | 13 ++++++++++++
>> arch/arm64/include/asm/esr.h | 5 +++++
>> arch/arm64/include/asm/extable.h | 2 +-
>> arch/arm64/kernel/probes/kprobes.c | 2 +-
>> arch/arm64/mm/extable.c | 20 ++++++++++++++++++-
>> arch/arm64/mm/fault.c | 30 +++++++++++++++++++++++++++-
>> include/linux/uaccess.h | 8 ++++++++
>> 8 files changed, 77 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index d9325dd95eba..012e38309955 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -19,6 +19,7 @@ config ARM64
>> select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2
>> select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
>> select ARCH_HAS_CACHE_LINE_SIZE
>> + select ARCH_HAS_COPY_MC if ACPI_APEI_GHES
>> select ARCH_HAS_CURRENT_STACK_POINTER
>> select ARCH_HAS_DEBUG_VIRTUAL
>> select ARCH_HAS_DEBUG_VM_PGTABLE
>> diff --git a/arch/arm64/include/asm/asm-extable.h b/arch/arm64/include/asm/asm-extable.h
>> index c39f2437e08e..74d1db74fd86 100644
>> --- a/arch/arm64/include/asm/asm-extable.h
>> +++ b/arch/arm64/include/asm/asm-extable.h
>> @@ -8,6 +8,11 @@
>> #define EX_TYPE_UACCESS_ERR_ZERO 3
>> #define EX_TYPE_LOAD_UNALIGNED_ZEROPAD 4
>>
>> +/* _MC indicates that can fixup from machine check errors */
>> +#define EX_TYPE_FIXUP_MC 5
>> +
>> +#define IS_EX_TYPE_MC(type) (type == EX_TYPE_FIXUP_MC)
>
> If we need this, I'd strongly prefer that we have a EX_TYPE_UACCESS_MC or
> EX_TYPE_UACCESS_MC_ERR_ZERO for the uaccess cases, so that we can clearly
> distinguish those from non-uaccess cases.
Agreed.
>
> AFAICT the only remaining raw EX_TYPE_FIXUP cases we have today are in some
> cache maintenance routines, and we should be able to convert those to a new
> EX_TYPE_FIXUP_UACCESS, or EX_TYPE_UACCESS_ERR_ZERO.
I will redesign this part. The general idea is to stop using x0 to carry
exception information; at that point it may be merged into the raw
EX_TYPE_FIXUP processing.
>
>> +
>> #ifdef __ASSEMBLY__
>>
>> #define __ASM_EXTABLE_RAW(insn, fixup, type, data) \
>> @@ -27,6 +32,14 @@
>> __ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP, 0)
>> .endm
>>
>> +/*
>> + * Create an exception table entry for `insn`, which will branch to `fixup`
>> + * when an unhandled fault(include sea fault) is taken.
>> + */
>> + .macro _asm_extable_mc, insn, fixup
>> + __ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP_MC, 0)
>> + .endm
>> +
>> /*
>> * Create an exception table entry for `insn` if `fixup` is provided. Otherwise
>> * do nothing.
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index d52a0b269ee8..11fcfc002654 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -330,6 +330,11 @@
>> #ifndef __ASSEMBLY__
>> #include <asm/types.h>
>>
>> +static inline bool esr_is_sea(u32 esr)
>> +{
>> + return (esr & ESR_ELx_FSC) == ESR_ELx_FSC_EXTABT;
>> +}
>> +
>> static inline bool esr_is_data_abort(u32 esr)
>> {
>> const u32 ec = ESR_ELx_EC(esr);
>> diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
>> index 72b0e71cc3de..f7835b0f473b 100644
>> --- a/arch/arm64/include/asm/extable.h
>> +++ b/arch/arm64/include/asm/extable.h
>> @@ -45,5 +45,5 @@ bool ex_handler_bpf(const struct exception_table_entry *ex,
>> }
>> #endif /* !CONFIG_BPF_JIT */
>>
>> -bool fixup_exception(struct pt_regs *regs);
>> +bool fixup_exception(struct pt_regs *regs, unsigned int esr);
>> #endif
>> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
>> index d9dfa82c1f18..16a069e8eec3 100644
>> --- a/arch/arm64/kernel/probes/kprobes.c
>> +++ b/arch/arm64/kernel/probes/kprobes.c
>> @@ -285,7 +285,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr)
>> * In case the user-specified fault handler returned
>> * zero, try to fix up.
>> */
>> - if (fixup_exception(regs))
>> + if (fixup_exception(regs, fsr))
>> return 1;
>> }
>> return 0;
>> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
>> index 489455309695..f1134c88e849 100644
>> --- a/arch/arm64/mm/extable.c
>> +++ b/arch/arm64/mm/extable.c
>> @@ -9,6 +9,7 @@
>>
>> #include <asm/asm-extable.h>
>> #include <asm/ptrace.h>
>> +#include <asm/esr.h>
>>
>> static inline unsigned long
>> get_ex_fixup(const struct exception_table_entry *ex)
>> @@ -23,6 +24,18 @@ static bool ex_handler_fixup(const struct exception_table_entry *ex,
>> return true;
>> }
>>
>> +static bool ex_handler_fixup_mc(const struct exception_table_entry *ex,
>> + struct pt_regs *regs, unsigned int esr)
>> +{
>> + if (esr_is_sea(esr))
>> + regs->regs[0] = 0;
>> + else
>> + regs->regs[0] = 1;
>
> This needs more explanation.
>
> Why does this hard-code an assumption that we can alter x0?
>
> Why is the x0 value distinct for SEA or non-SEA? What is this meant to
> represent specifically?
>
> What if this SEA was taken for a reason other than a memory error?
There is a problem with the design here and it needs to be changed: the
value in x0 must not be overwritten. This will be fixed in the next
version.
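For context, the existing (non-MC) ex_handler_uaccess_err_zero() already
avoids clobbering a fixed register by writing through the GPR numbers encoded
in the extable data. A sketch of that pattern follows, which an MC variant
could reuse; the EX_DATA_REG_* masks are reproduced here only for
illustration, since in mainline they are local to arch/arm64/mm/extable.c:

#include <linux/bitfield.h>
#include <linux/bits.h>
#include <linux/errno.h>
#include <asm/extable.h>
#include <asm/ptrace.h>

/* Field layout written by _ASM_EXTABLE_UACCESS_ERR_ZERO(). */
#define EX_DATA_REG_ERR         GENMASK(4, 0)
#define EX_DATA_REG_ZERO        GENMASK(9, 5)

/* Nothing hard-codes x0: the registers to update come from ex->data, so the
 * same shape works whether the fault was a page fault or a memory error. */
static bool fixup_uaccess_err_zero(const struct exception_table_entry *ex,
                                   struct pt_regs *regs)
{
        int reg_err = FIELD_GET(EX_DATA_REG_ERR, ex->data);
        int reg_zero = FIELD_GET(EX_DATA_REG_ZERO, ex->data);

        pt_regs_write_reg(regs, reg_err, -EFAULT);
        pt_regs_write_reg(regs, reg_zero, 0);
        regs->pc = (unsigned long)&ex->fixup + ex->fixup;       /* get_ex_fixup() */

        return true;
}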
>
>> +
>> + regs->pc = get_ex_fixup(ex);
>> + return true;
>> +}
>> +
>> static bool ex_handler_uaccess_err_zero(const struct exception_table_entry *ex,
>> struct pt_regs *regs)
>> {
>> @@ -63,7 +76,7 @@ ex_handler_load_unaligned_zeropad(const struct exception_table_entry *ex,
>> return true;
>> }
>>
>> -bool fixup_exception(struct pt_regs *regs)
>> +bool fixup_exception(struct pt_regs *regs, unsigned int esr)
>> {
>> const struct exception_table_entry *ex;
>>
>> @@ -71,9 +84,14 @@ bool fixup_exception(struct pt_regs *regs)
>> if (!ex)
>> return false;
>>
>> + if (esr_is_sea(esr) && !IS_EX_TYPE_MC(ex->type))
>> + return false;
>
> I don't think this check belongs here.
>
> Either this should be folded into ex_handler_fixup_mc(), or we should make the
> judgement earlier in the fault handling path, and have a separate
> fixup_exception_mc() that we can call specifically in the case of a memory
> error.
Agreed. Maybe it's better to use a separate fixup_exception_mc().
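A minimal sketch of what such a separate entry point could look like, called
only from the SEA path so the regular fixup_exception() stays untouched; the
EX_TYPE_UACCESS_MC name follows Mark's suggested naming and the value here is
made up:

#include <linux/extable.h>
#include <asm/extable.h>
#include <asm/ptrace.h>

#define EX_TYPE_UACCESS_MC      6       /* hypothetical; real value would live in asm-extable.h */

/* Only extable entries explicitly annotated as MC-safe are honoured when we
 * get here from the SEA path; anything else keeps the panic behaviour. */
static bool fixup_exception_mc(struct pt_regs *regs)
{
        const struct exception_table_entry *ex;

        ex = search_exception_tables(instruction_pointer(regs));
        if (!ex)
                return false;

        switch (ex->type) {
        case EX_TYPE_UACCESS_MC:
                /* an _ERR_ZERO variant would also write the err/zero GPRs */
                regs->pc = (unsigned long)&ex->fixup + ex->fixup;
                return true;
        }

        return false;
}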
>
>> +
>> switch (ex->type) {
>> case EX_TYPE_FIXUP:
>> return ex_handler_fixup(ex, regs);
>> + case EX_TYPE_FIXUP_MC:
>> + return ex_handler_fixup_mc(ex, regs, esr);
>> case EX_TYPE_BPF:
>> return ex_handler_bpf(ex, regs);
>> case EX_TYPE_UACCESS_ERR_ZERO:
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 77341b160aca..ffdfab2fdd60 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -361,7 +361,7 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,
>> * Are we prepared to handle this kernel fault?
>> * We are almost certainly not prepared to handle instruction faults.
>> */
>> - if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
>> + if (!is_el1_instruction_abort(esr) && fixup_exception(regs, esr))
>> return;
>>
>> if (WARN_RATELIMIT(is_spurious_el1_translation_fault(addr, esr, regs),
>> @@ -695,6 +695,30 @@ static int do_bad(unsigned long far, unsigned int esr, struct pt_regs *regs)
>> return 1; /* "fault" */
>> }
>>
>> +static bool arm64_process_kernel_sea(unsigned long addr, unsigned int esr,
>> + struct pt_regs *regs, int sig, int code)
>> +{
>> + if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
>> + return false;
>> +
>> + if (user_mode(regs) || !current->mm)
>> + return false;
>> +
>> + if (apei_claim_sea(regs) < 0)
>> + return false;
>> +
>> + current->thread.fault_address = 0;
>> + current->thread.fault_code = esr;
>> +
>> + if (!fixup_exception(regs, esr))
>> + return false;
>> +
>> + arm64_force_sig_fault(sig, code, addr,
>> + "Uncorrected hardware memory error in kernel-access\n");
>> +
>> + return true;
>> +}
>> +
>> static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
>> {
>> const struct fault_info *inf;
>> @@ -720,6 +744,10 @@ static int do_sea(unsigned long far, unsigned int esr, struct pt_regs *regs)
>> */
>> siaddr = untagged_addr(far);
>> }
>> +
>> + if (arm64_process_kernel_sea(siaddr, esr, regs, inf->sig, inf->code))
>> + return 0;
>> +
>> arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);
>>
>> return 0;
>> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
>> index 546179418ffa..dd952aeecdc1 100644
>> --- a/include/linux/uaccess.h
>> +++ b/include/linux/uaccess.h
>> @@ -174,6 +174,14 @@ copy_mc_to_kernel(void *dst, const void *src, size_t cnt)
>> }
>> #endif
>>
>> +#ifndef copy_mc_to_user
>> +static inline unsigned long __must_check
>> +copy_mc_to_user(void *dst, const void *src, size_t cnt)
>> +{
>> + return raw_copy_to_user(dst, src, cnt);
>> +}
>
> ... this isn't using the new EX_TYPE_FIXUP_MC type, so isn't this just broken
> as of this patch?
>
> Thanks,
> Mark.
It is used by the later patches that add the specific scenarios to this
framework; here we just add the helper.
Great thanks,
Tong.
>
>> +#endif
>> +
>> static __always_inline void pagefault_disabled_inc(void)
>> {
>> current->pagefault_disabled++;
>> --
>> 2.18.0.huawei.25
>>
>>
On 2022/4/6 19:27, Mark Rutland wrote:
> On Wed, Apr 06, 2022 at 09:13:11AM +0000, Tong Tiangen wrote:
>> When user process reading file, the data is cached in pagecache and
>> the data belongs to the user process, When machine check error is
>> encountered during pagecache reading, killing the user process and
>> isolate the user page with hardware memory errors is a more reasonable
>> choice than kernel panic.
>>
>> The __arch_copy_mc_to_user() in copy_to_user_mc.S is largely borrows
>> from __arch_copy_to_user() in copy_to_user.S and the main difference
>> is __arch_copy_mc_to_user() add the extable entry to support machine
>> check safe.
>
> As with prior patches, *why* is the distinction necessary?
>
> This patch adds a bunch of conditional logic, but *structurally* it doesn't
> alter the handling to be substantially different for the MC and non-MC cases.
>
> This seems like pointless duplication that just makes it harder to maintain
> this code.
>
> Thanks,
> Mark.
Agreed, the implementation here looks a little ugly and harder to maintain.
The reason for doing it this way is that not every copy_to_user() can be
recovered.
A memory error is consumed when reading the pagecache via copy_to_user().
In that scenario only the process is affected, because it cannot read the
pagecache data correctly; killing the process is enough and the whole
kernel does not need to panic.
So I need two different copy_to_user() implementations: the existing
__arch_copy_to_user(), which still panics when it consumes a memory error,
and the new helper __arch_copy_mc_to_user(), used when reading the
pagecache, which can recover from a consumed memory error.
In future, if we find another copy_to_user() scenario that can also be
recovered, we can switch it to this MC-safe helper as well.
Thanks,
Tong.
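A caller-side sketch of that split (the wrapper name is illustrative; the
__force cast mirrors what copyout_mc() does in this patch, and when
CONFIG_ARCH_HAS_COPY_MC is off copy_mc_to_user() falls back to a plain copy
via the generic helper added earlier in the series):

#include <linux/uaccess.h>

/* Illustrative only: paths known to be copying user-owned data, such as
 * pagecache reads, opt in to the MC-safe variant; every other caller keeps
 * the existing copy_to_user() behaviour, where a SEA still panics. */
static unsigned long copy_cached_data_to_user(void __user *ubuf,
                                              const void *kaddr, size_t len)
{
        if (IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
                return copy_mc_to_user((__force void *)ubuf, kaddr, len);

        return copy_to_user(ubuf, kaddr, len);
}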
>
>> In _copy_page_to_iter(), machine check safe only considered ITER_IOVEC
>> which is used by pagecache reading.
>>
>> Signed-off-by: Tong Tiangen <[email protected]>
>> ---
>> arch/arm64/include/asm/uaccess.h | 15 ++++++
>> arch/arm64/lib/Makefile | 2 +-
>> arch/arm64/lib/copy_to_user_mc.S | 78 +++++++++++++++++++++++++++++
>> include/linux/uio.h | 9 +++-
>> lib/iov_iter.c | 85 +++++++++++++++++++++++++-------
>> 5 files changed, 170 insertions(+), 19 deletions(-)
>> create mode 100644 arch/arm64/lib/copy_to_user_mc.S
>>
>> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
>> index 24b662407fbd..f0d5e811165a 100644
>> --- a/arch/arm64/include/asm/uaccess.h
>> +++ b/arch/arm64/include/asm/uaccess.h
>> @@ -448,6 +448,21 @@ extern long strncpy_from_user(char *dest, const char __user *src, long count);
>>
>> extern __must_check long strnlen_user(const char __user *str, long n);
>>
>> +#ifdef CONFIG_ARCH_HAS_COPY_MC
>> +extern unsigned long __must_check __arch_copy_mc_to_user(void __user *to,
>> + const void *from, unsigned long n);
>> +static inline unsigned long __must_check
>> +copy_mc_to_user(void __user *to, const void *from, unsigned long n)
>> +{
>> + uaccess_ttbr0_enable();
>> + n = __arch_copy_mc_to_user(__uaccess_mask_ptr(to), from, n);
>> + uaccess_ttbr0_disable();
>> +
>> + return n;
>> +}
>> +#define copy_mc_to_user copy_mc_to_user
>> +#endif
>> +
>> #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
>> struct page;
>> void memcpy_page_flushcache(char *to, struct page *page, size_t offset, size_t len);
>> diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
>> index 29c578414b12..9b3571227fb4 100644
>> --- a/arch/arm64/lib/Makefile
>> +++ b/arch/arm64/lib/Makefile
>> @@ -23,4 +23,4 @@ obj-$(CONFIG_ARM64_MTE) += mte.o
>>
>> obj-$(CONFIG_KASAN_SW_TAGS) += kasan_sw_tags.o
>>
>> -obj-$(CONFIG_ARCH_HAS_CPY_MC) += copy_page_mc.o
>> +obj-$(CONFIG_ARCH_HAS_COPY_MC) += copy_page_mc.o copy_to_user_mc.o
>> diff --git a/arch/arm64/lib/copy_to_user_mc.S b/arch/arm64/lib/copy_to_user_mc.S
>> new file mode 100644
>> index 000000000000..9d228ff15446
>> --- /dev/null
>> +++ b/arch/arm64/lib/copy_to_user_mc.S
>> @@ -0,0 +1,78 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright (C) 2012 ARM Ltd.
>> + */
>> +
>> +#include <linux/linkage.h>
>> +
>> +#include <asm/asm-uaccess.h>
>> +#include <asm/assembler.h>
>> +#include <asm/cache.h>
>> +
>> +/*
>> + * Copy to user space from a kernel buffer (alignment handled by the hardware)
>> + *
>> + * Parameters:
>> + * x0 - to
>> + * x1 - from
>> + * x2 - n
>> + * Returns:
>> + * x0 - bytes not copied
>> + */
>> + .macro ldrb1 reg, ptr, val
>> + 1000: ldrb \reg, [\ptr], \val;
>> + _asm_extable_mc 1000b, 9998f;
>> + .endm
>> +
>> + .macro strb1 reg, ptr, val
>> + user_ldst_mc 9998f, sttrb, \reg, \ptr, \val
>> + .endm
>> +
>> + .macro ldrh1 reg, ptr, val
>> + 1001: ldrh \reg, [\ptr], \val;
>> + _asm_extable_mc 1001b, 9998f;
>> + .endm
>> +
>> + .macro strh1 reg, ptr, val
>> + user_ldst_mc 9997f, sttrh, \reg, \ptr, \val
>> + .endm
>> +
>> + .macro ldr1 reg, ptr, val
>> + 1002: ldr \reg, [\ptr], \val;
>> + _asm_extable_mc 1002b, 9998f;
>> + .endm
>> +
>> + .macro str1 reg, ptr, val
>> + user_ldst_mc 9997f, sttr, \reg, \ptr, \val
>> + .endm
>> +
>> + .macro ldp1 reg1, reg2, ptr, val
>> + 1003: ldp \reg1, \reg2, [\ptr], \val;
>> + _asm_extable_mc 1003b, 9998f;
>> + .endm
>> +
>> + .macro stp1 reg1, reg2, ptr, val
>> + user_stp 9997f, \reg1, \reg2, \ptr, \val
>> + .endm
>> +
>> +end .req x5
>> +srcin .req x15
>> +SYM_FUNC_START(__arch_copy_mc_to_user)
>> + add end, x0, x2
>> + mov srcin, x1
>> +#include "copy_template.S"
>> + mov x0, #0
>> + ret
>> +
>> + // Exception fixups
>> +9997: cbz x0, 9998f // Check machine check exception
>> + cmp dst, dstin
>> + b.ne 9998f
>> + // Before being absolutely sure we couldn't copy anything, try harder
>> + ldrb tmp1w, [srcin]
>> +USER(9998f, sttrb tmp1w, [dst])
>> + add dst, dst, #1
>> +9998: sub x0, end, dst // bytes not copied
>> + ret
>> +SYM_FUNC_END(__arch_copy_mc_to_user)
>> +EXPORT_SYMBOL(__arch_copy_mc_to_user)
>> diff --git a/include/linux/uio.h b/include/linux/uio.h
>> index 739285fe5a2f..539d9ee9b032 100644
>> --- a/include/linux/uio.h
>> +++ b/include/linux/uio.h
>> @@ -147,10 +147,17 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
>> size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
>> size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
>>
>> +#ifdef CONFIG_ARCH_HAS_COPY_MC
>> +size_t copy_mc_page_to_iter(struct page *page, size_t offset, size_t bytes,
>> + struct iov_iter *i);
>> +#else
>> +#define copy_mc_page_to_iter copy_page_to_iter
>> +#endif
>> +
>> static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
>> size_t bytes, struct iov_iter *i)
>> {
>> - return copy_page_to_iter(&folio->page, offset, bytes, i);
>> + return copy_mc_page_to_iter(&folio->page, offset, bytes, i);
>> }
>>
>> static __always_inline __must_check
>> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
>> index 6dd5330f7a99..2c5f3bb6391d 100644
>> --- a/lib/iov_iter.c
>> +++ b/lib/iov_iter.c
>> @@ -157,6 +157,19 @@ static int copyout(void __user *to, const void *from, size_t n)
>> return n;
>> }
>>
>> +#ifdef CONFIG_ARCH_HAS_COPY_MC
>> +static int copyout_mc(void __user *to, const void *from, size_t n)
>> +{
>> + if (access_ok(to, n)) {
>> + instrument_copy_to_user(to, from, n);
>> + n = copy_mc_to_user((__force void *) to, from, n);
>> + }
>> + return n;
>> +}
>> +#else
>> +#define copyout_mc copyout
>> +#endif
>> +
>> static int copyin(void *to, const void __user *from, size_t n)
>> {
>> if (should_fail_usercopy())
>> @@ -169,7 +182,7 @@ static int copyin(void *to, const void __user *from, size_t n)
>> }
>>
>> static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
>> - struct iov_iter *i)
>> + struct iov_iter *i, bool mc_safe)
>> {
>> size_t skip, copy, left, wanted;
>> const struct iovec *iov;
>> @@ -194,7 +207,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
>> from = kaddr + offset;
>>
>> /* first chunk, usually the only one */
>> - left = copyout(buf, from, copy);
>> + if (mc_safe)
>> + left = copyout_mc(buf, from, copy);
>> + else
>> + left = copyout(buf, from, copy);
>> copy -= left;
>> skip += copy;
>> from += copy;
>> @@ -204,7 +220,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
>> iov++;
>> buf = iov->iov_base;
>> copy = min(bytes, iov->iov_len);
>> - left = copyout(buf, from, copy);
>> + if (mc_safe)
>> + left = copyout_mc(buf, from, copy);
>> + else
>> + left = copyout(buf, from, copy);
>> copy -= left;
>> skip = copy;
>> from += copy;
>> @@ -223,7 +242,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
>>
>> kaddr = kmap(page);
>> from = kaddr + offset;
>> - left = copyout(buf, from, copy);
>> + if (mc_safe)
>> + left = copyout_mc(buf, from, copy);
>> + else
>> + left = copyout(buf, from, copy);
>> copy -= left;
>> skip += copy;
>> from += copy;
>> @@ -232,7 +254,10 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b
>> iov++;
>> buf = iov->iov_base;
>> copy = min(bytes, iov->iov_len);
>> - left = copyout(buf, from, copy);
>> + if (mc_safe)
>> + left = copyout_mc(buf, from, copy);
>> + else
>> + left = copyout(buf, from, copy);
>> copy -= left;
>> skip = copy;
>> from += copy;
>> @@ -674,15 +699,6 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
>> EXPORT_SYMBOL(_copy_to_iter);
>>
>> #ifdef CONFIG_ARCH_HAS_COPY_MC
>> -static int copyout_mc(void __user *to, const void *from, size_t n)
>> -{
>> - if (access_ok(to, n)) {
>> - instrument_copy_to_user(to, from, n);
>> - n = copy_mc_to_user((__force void *) to, from, n);
>> - }
>> - return n;
>> -}
>> -
>> static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
>> struct iov_iter *i)
>> {
>> @@ -846,10 +862,10 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
>> }
>>
>> static size_t __copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
>> - struct iov_iter *i)
>> + struct iov_iter *i, bool mc_safe)
>> {
>> if (likely(iter_is_iovec(i)))
>> - return copy_page_to_iter_iovec(page, offset, bytes, i);
>> + return copy_page_to_iter_iovec(page, offset, bytes, i, mc_safe);
>> if (iov_iter_is_bvec(i) || iov_iter_is_kvec(i) || iov_iter_is_xarray(i)) {
>> void *kaddr = kmap_local_page(page);
>> size_t wanted = _copy_to_iter(kaddr + offset, bytes, i);
>> @@ -878,7 +894,7 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
>> offset %= PAGE_SIZE;
>> while (1) {
>> size_t n = __copy_page_to_iter(page, offset,
>> - min(bytes, (size_t)PAGE_SIZE - offset), i);
>> + min(bytes, (size_t)PAGE_SIZE - offset), i, false);
>> res += n;
>> bytes -= n;
>> if (!bytes || !n)
>> @@ -893,6 +909,41 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
>> }
>> EXPORT_SYMBOL(copy_page_to_iter);
>>
>> +#ifdef CONFIG_ARCH_HAS_COPY_MC
>> +/**
>> + * copy_mc_page_to_iter - copy page to iter with source memory error exception handling.
>> + *
>> + * The filemap_read deploys this for pagecache reading and the main differences between
>> + * this and typical copy_page_to_iter() is call __copy_page_to_iter with mc_safe true.
>> + *
>> + * Return: number of bytes copied (may be %0)
>> + */
>> +size_t copy_mc_page_to_iter(struct page *page, size_t offset, size_t bytes,
>> + struct iov_iter *i)
>> +{
>> + size_t res = 0;
>> +
>> + if (unlikely(!page_copy_sane(page, offset, bytes)))
>> + return 0;
>> + page += offset / PAGE_SIZE; // first subpage
>> + offset %= PAGE_SIZE;
>> + while (1) {
>> + size_t n = __copy_page_to_iter(page, offset,
>> + min(bytes, (size_t)PAGE_SIZE - offset), i, true);
>> + res += n;
>> + bytes -= n;
>> + if (!bytes || !n)
>> + break;
>> + offset += n;
>> + if (offset == PAGE_SIZE) {
>> + page++;
>> + offset = 0;
>> + }
>> + }
>> + return res;
>> +}
>> +#endif
>> +
>> size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
>> struct iov_iter *i)
>> {
>> --
>> 2.18.0.huawei.25
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> [email protected]
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> .
On 2022/4/7 23:53, Robin Murphy wrote:
> On 2022-04-07 15:56, Tong Tiangen wrote:
>>
>>
>> On 2022/4/6 19:27, Mark Rutland wrote:
>>> On Wed, Apr 06, 2022 at 09:13:11AM +0000, Tong Tiangen wrote:
>>>> When user process reading file, the data is cached in pagecache and
>>>> the data belongs to the user process, When machine check error is
>>>> encountered during pagecache reading, killing the user process and
>>>> isolate the user page with hardware memory errors is a more reasonable
>>>> choice than kernel panic.
>>>>
>>>> The __arch_copy_mc_to_user() in copy_to_user_mc.S is largely borrows
>>>> from __arch_copy_to_user() in copy_to_user.S and the main difference
>>>> is __arch_copy_mc_to_user() add the extable entry to support machine
>>>> check safe.
>>>
>>> As with prior patches, *why* is the distinction necessary?
>>>
>>> This patch adds a bunch of conditional logic, but *structurally* it
>>> doesn't
>>> alter the handling to be substantially different for the MC and
>>> non-MC cases.
>>>
>>> This seems like pointless duplication that just makes it harder to
>>> maintain
>>> this code.
>>>
>>> Thanks,
>>> Mark.
>>
>> Agreed, The implementation here looks a little ugly and harder to
>> maintain.
>>
>> The purpose of my doing this is not all copy_to_user can be recovered.
>>
>> A memory error is consumed when reading pagecache using copy_to_user.
>> I think in this scenario, only the process is affected because it
>> can't read
>> pagecache data correctly. Just kill the process and don't need the whole
>> kernel panic.
>>
>> So I need two different copy_to_user implementation, one is existing
>> __arch_copy_to_user,
>> this function will panic when consuming memory errors. The other one
>> is this new helper
>> __arch_copy_mc_to_user, this interface is used when reading pagecache.
>> It can recover from
>> consume memory error.
>
> OK, but do we really need two almost-identical implementations of every
> function where the only difference is how the exception table entries
> are annotated? Could the exception handler itself just figure out who
> owns the page where the fault occurred and decide what action to take as
> appropriate?
>
> Robin.
>
Thank you, Robin.
I added this call path in this patchset: do_sea() -> fixup_exception();
the purpose is to give a SEA fault a chance to be fixed up (see patch
3/7).

If the fixup succeeds, the panic can be avoided. Otherwise, the panic
happens according to the original logic.

fixup_exception() sets regs->pc, and execution resumes at regs->pc once
the exception context is restored.

If the mc-safe fixup were added to __arch_copy_to_user(), then a memory
error hit in a *non pagecache reading* scenario that calls
__arch_copy_to_user() would make do_sea() -> fixup_exception() succeed,
and the panic logic in do_sea() would be skipped.

So I added the new helper __arch_copy_mc_to_user() with the mc-safe
fixup, to be used only in the scenarios that require it. At present
there is only the pagecache reading scenario; other scenarios still
need to be developed.
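
To make that concrete, roughly what I have in mind is the following (a
simplified sketch only, not the literal code of patch 3/7; the
assumption is that only accesses annotated with the new
machine-check-safe extable entries succeed in the fixup on this path):

static int do_sea(unsigned long far, unsigned long esr, struct pt_regs *regs)
{
	const struct fault_info *inf = esr_to_fault_info(esr);

	/*
	 * Kernel consumed the error: give the machine-check-safe
	 * extable entries a chance to fix it up before panicking.
	 */
	if (!user_mode(regs) && fixup_exception(regs))
		return 0;

	/* Otherwise the original handling applies (fatal in kernel mode). */
	arm64_notify_die(inf->name, regs, inf->sig, inf->code, far, esr);
	return 0;
}

That is also why the fixup must only match the new _MC annotations: a
plain extable entry succeeding here would silently skip the panic for
errors we cannot actually recover from.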
This is my current difficulty. Of course, I will think about how to
solve the duplicate code problem.
Thanks,
Tong.
>>
>> In future, if find a scenario use copy_to_user can also be recovered,
>> we can also use this mc
>> safe helper instead it.
>>
>> Thanks,
>> Tong.
>>
>>>
>>>> In _copy_page_to_iter(), machine check safe only considered ITER_IOVEC
>>>> which is used by pagecache reading.
>>>>
>>>> Signed-off-by: Tong Tiangen <[email protected]>
>>>> ---
>>>> arch/arm64/include/asm/uaccess.h | 15 ++++++
>>>> arch/arm64/lib/Makefile | 2 +-
>>>> arch/arm64/lib/copy_to_user_mc.S | 78 +++++++++++++++++++++++++++++
>>>> include/linux/uio.h | 9 +++-
>>>> lib/iov_iter.c | 85
>>>> +++++++++++++++++++++++++-------
>>>> 5 files changed, 170 insertions(+), 19 deletions(-)
>>>> create mode 100644 arch/arm64/lib/copy_to_user_mc.S
>>>>
>>>> diff --git a/arch/arm64/include/asm/uaccess.h
>>>> b/arch/arm64/include/asm/uaccess.h
>>>> index 24b662407fbd..f0d5e811165a 100644
>>>> --- a/arch/arm64/include/asm/uaccess.h
>>>> +++ b/arch/arm64/include/asm/uaccess.h
>>>> @@ -448,6 +448,21 @@ extern long strncpy_from_user(char *dest, const
>>>> char __user *src, long count);
>>>> extern __must_check long strnlen_user(const char __user *str, long
>>>> n);
>>>> +#ifdef CONFIG_ARCH_HAS_COPY_MC
>>>> +extern unsigned long __must_check __arch_copy_mc_to_user(void
>>>> __user *to,
>>>> + const void *from, unsigned long n);
>>>> +static inline unsigned long __must_check
>>>> +copy_mc_to_user(void __user *to, const void *from, unsigned long n)
>>>> +{
>>>> + uaccess_ttbr0_enable();
>>>> + n = __arch_copy_mc_to_user(__uaccess_mask_ptr(to), from, n);
>>>> + uaccess_ttbr0_disable();
>>>> +
>>>> + return n;
>>>> +}
>>>> +#define copy_mc_to_user copy_mc_to_user
>>>> +#endif
>>>> +
>>>> #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
>>>> struct page;
>>>> void memcpy_page_flushcache(char *to, struct page *page, size_t
>>>> offset, size_t len);
>>>> diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
>>>> index 29c578414b12..9b3571227fb4 100644
>>>> --- a/arch/arm64/lib/Makefile
>>>> +++ b/arch/arm64/lib/Makefile
>>>> @@ -23,4 +23,4 @@ obj-$(CONFIG_ARM64_MTE) += mte.o
>>>> obj-$(CONFIG_KASAN_SW_TAGS) += kasan_sw_tags.o
>>>> -obj-$(CONFIG_ARCH_HAS_CPY_MC) += copy_page_mc.o
>>>> +obj-$(CONFIG_ARCH_HAS_COPY_MC) += copy_page_mc.o copy_to_user_mc.o
>>>> diff --git a/arch/arm64/lib/copy_to_user_mc.S
>>>> b/arch/arm64/lib/copy_to_user_mc.S
>>>> new file mode 100644
>>>> index 000000000000..9d228ff15446
>>>> --- /dev/null
>>>> +++ b/arch/arm64/lib/copy_to_user_mc.S
>>>> @@ -0,0 +1,78 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>>> +/*
>>>> + * Copyright (C) 2012 ARM Ltd.
>>>> + */
>>>> +
>>>> +#include <linux/linkage.h>
>>>> +
>>>> +#include <asm/asm-uaccess.h>
>>>> +#include <asm/assembler.h>
>>>> +#include <asm/cache.h>
>>>> +
>>>> +/*
>>>> + * Copy to user space from a kernel buffer (alignment handled by
>>>> the hardware)
>>>> + *
>>>> + * Parameters:
>>>> + * x0 - to
>>>> + * x1 - from
>>>> + * x2 - n
>>>> + * Returns:
>>>> + * x0 - bytes not copied
>>>> + */
>>>> + .macro ldrb1 reg, ptr, val
>>>> + 1000: ldrb \reg, [\ptr], \val;
>>>> + _asm_extable_mc 1000b, 9998f;
>>>> + .endm
>>>> +
>>>> + .macro strb1 reg, ptr, val
>>>> + user_ldst_mc 9998f, sttrb, \reg, \ptr, \val
>>>> + .endm
>>>> +
>>>> + .macro ldrh1 reg, ptr, val
>>>> + 1001: ldrh \reg, [\ptr], \val;
>>>> + _asm_extable_mc 1001b, 9998f;
>>>> + .endm
>>>> +
>>>> + .macro strh1 reg, ptr, val
>>>> + user_ldst_mc 9997f, sttrh, \reg, \ptr, \val
>>>> + .endm
>>>> +
>>>> + .macro ldr1 reg, ptr, val
>>>> + 1002: ldr \reg, [\ptr], \val;
>>>> + _asm_extable_mc 1002b, 9998f;
>>>> + .endm
>>>> +
>>>> + .macro str1 reg, ptr, val
>>>> + user_ldst_mc 9997f, sttr, \reg, \ptr, \val
>>>> + .endm
>>>> +
>>>> + .macro ldp1 reg1, reg2, ptr, val
>>>> + 1003: ldp \reg1, \reg2, [\ptr], \val;
>>>> + _asm_extable_mc 1003b, 9998f;
>>>> + .endm
>>>> +
>>>> + .macro stp1 reg1, reg2, ptr, val
>>>> + user_stp 9997f, \reg1, \reg2, \ptr, \val
>>>> + .endm
>>>> +
>>>> +end .req x5
>>>> +srcin .req x15
>>>> +SYM_FUNC_START(__arch_copy_mc_to_user)
>>>> + add end, x0, x2
>>>> + mov srcin, x1
>>>> +#include "copy_template.S"
>>>> + mov x0, #0
>>>> + ret
>>>> +
>>>> + // Exception fixups
>>>> +9997: cbz x0, 9998f // Check machine check exception
>>>> + cmp dst, dstin
>>>> + b.ne 9998f
>>>> + // Before being absolutely sure we couldn't copy anything, try
>>>> harder
>>>> + ldrb tmp1w, [srcin]
>>>> +USER(9998f, sttrb tmp1w, [dst])
>>>> + add dst, dst, #1
>>>> +9998: sub x0, end, dst // bytes not copied
>>>> + ret
>>>> +SYM_FUNC_END(__arch_copy_mc_to_user)
>>>> +EXPORT_SYMBOL(__arch_copy_mc_to_user)
>>>> diff --git a/include/linux/uio.h b/include/linux/uio.h
>>>> index 739285fe5a2f..539d9ee9b032 100644
>>>> --- a/include/linux/uio.h
>>>> +++ b/include/linux/uio.h
>>>> @@ -147,10 +147,17 @@ size_t _copy_to_iter(const void *addr, size_t
>>>> bytes, struct iov_iter *i);
>>>> size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
>>>> size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct
>>>> iov_iter *i);
>>>> +#ifdef CONFIG_ARCH_HAS_COPY_MC
>>>> +size_t copy_mc_page_to_iter(struct page *page, size_t offset,
>>>> size_t bytes,
>>>> + struct iov_iter *i);
>>>> +#else
>>>> +#define copy_mc_page_to_iter copy_page_to_iter
>>>> +#endif
>>>> +
>>>> static inline size_t copy_folio_to_iter(struct folio *folio,
>>>> size_t offset,
>>>> size_t bytes, struct iov_iter *i)
>>>> {
>>>> - return copy_page_to_iter(&folio->page, offset, bytes, i);
>>>> + return copy_mc_page_to_iter(&folio->page, offset, bytes, i);
>>>> }
>>>> static __always_inline __must_check
>>>> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
>>>> index 6dd5330f7a99..2c5f3bb6391d 100644
>>>> --- a/lib/iov_iter.c
>>>> +++ b/lib/iov_iter.c
>>>> @@ -157,6 +157,19 @@ static int copyout(void __user *to, const void
>>>> *from, size_t n)
>>>> return n;
>>>> }
>>>> +#ifdef CONFIG_ARCH_HAS_COPY_MC
>>>> +static int copyout_mc(void __user *to, const void *from, size_t n)
>>>> +{
>>>> + if (access_ok(to, n)) {
>>>> + instrument_copy_to_user(to, from, n);
>>>> + n = copy_mc_to_user((__force void *) to, from, n);
>>>> + }
>>>> + return n;
>>>> +}
>>>> +#else
>>>> +#define copyout_mc copyout
>>>> +#endif
>>>> +
>>>> static int copyin(void *to, const void __user *from, size_t n)
>>>> {
>>>> if (should_fail_usercopy())
>>>> @@ -169,7 +182,7 @@ static int copyin(void *to, const void __user
>>>> *from, size_t n)
>>>> }
>>>> static size_t copy_page_to_iter_iovec(struct page *page, size_t
>>>> offset, size_t bytes,
>>>> - struct iov_iter *i)
>>>> + struct iov_iter *i, bool mc_safe)
>>>> {
>>>> size_t skip, copy, left, wanted;
>>>> const struct iovec *iov;
>>>> @@ -194,7 +207,10 @@ static size_t copy_page_to_iter_iovec(struct
>>>> page *page, size_t offset, size_t b
>>>> from = kaddr + offset;
>>>> /* first chunk, usually the only one */
>>>> - left = copyout(buf, from, copy);
>>>> + if (mc_safe)
>>>> + left = copyout_mc(buf, from, copy);
>>>> + else
>>>> + left = copyout(buf, from, copy);
>>>> copy -= left;
>>>> skip += copy;
>>>> from += copy;
>>>> @@ -204,7 +220,10 @@ static size_t copy_page_to_iter_iovec(struct
>>>> page *page, size_t offset, size_t b
>>>> iov++;
>>>> buf = iov->iov_base;
>>>> copy = min(bytes, iov->iov_len);
>>>> - left = copyout(buf, from, copy);
>>>> + if (mc_safe)
>>>> + left = copyout_mc(buf, from, copy);
>>>> + else
>>>> + left = copyout(buf, from, copy);
>>>> copy -= left;
>>>> skip = copy;
>>>> from += copy;
>>>> @@ -223,7 +242,10 @@ static size_t copy_page_to_iter_iovec(struct
>>>> page *page, size_t offset, size_t b
>>>> kaddr = kmap(page);
>>>> from = kaddr + offset;
>>>> - left = copyout(buf, from, copy);
>>>> + if (mc_safe)
>>>> + left = copyout_mc(buf, from, copy);
>>>> + else
>>>> + left = copyout(buf, from, copy);
>>>> copy -= left;
>>>> skip += copy;
>>>> from += copy;
>>>> @@ -232,7 +254,10 @@ static size_t copy_page_to_iter_iovec(struct
>>>> page *page, size_t offset, size_t b
>>>> iov++;
>>>> buf = iov->iov_base;
>>>> copy = min(bytes, iov->iov_len);
>>>> - left = copyout(buf, from, copy);
>>>> + if (mc_safe)
>>>> + left = copyout_mc(buf, from, copy);
>>>> + else
>>>> + left = copyout(buf, from, copy);
>>>> copy -= left;
>>>> skip = copy;
>>>> from += copy;
>>>> @@ -674,15 +699,6 @@ size_t _copy_to_iter(const void *addr, size_t
>>>> bytes, struct iov_iter *i)
>>>> EXPORT_SYMBOL(_copy_to_iter);
>>>> #ifdef CONFIG_ARCH_HAS_COPY_MC
>>>> -static int copyout_mc(void __user *to, const void *from, size_t n)
>>>> -{
>>>> - if (access_ok(to, n)) {
>>>> - instrument_copy_to_user(to, from, n);
>>>> - n = copy_mc_to_user((__force void *) to, from, n);
>>>> - }
>>>> - return n;
>>>> -}
>>>> -
>>>> static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
>>>> struct iov_iter *i)
>>>> {
>>>> @@ -846,10 +862,10 @@ static inline bool page_copy_sane(struct page
>>>> *page, size_t offset, size_t n)
>>>> }
>>>> static size_t __copy_page_to_iter(struct page *page, size_t
>>>> offset, size_t bytes,
>>>> - struct iov_iter *i)
>>>> + struct iov_iter *i, bool mc_safe)
>>>> {
>>>> if (likely(iter_is_iovec(i)))
>>>> - return copy_page_to_iter_iovec(page, offset, bytes, i);
>>>> + return copy_page_to_iter_iovec(page, offset, bytes, i,
>>>> mc_safe);
>>>> if (iov_iter_is_bvec(i) || iov_iter_is_kvec(i) ||
>>>> iov_iter_is_xarray(i)) {
>>>> void *kaddr = kmap_local_page(page);
>>>> size_t wanted = _copy_to_iter(kaddr + offset, bytes, i);
>>>> @@ -878,7 +894,7 @@ size_t copy_page_to_iter(struct page *page,
>>>> size_t offset, size_t bytes,
>>>> offset %= PAGE_SIZE;
>>>> while (1) {
>>>> size_t n = __copy_page_to_iter(page, offset,
>>>> - min(bytes, (size_t)PAGE_SIZE - offset), i);
>>>> + min(bytes, (size_t)PAGE_SIZE - offset), i, false);
>>>> res += n;
>>>> bytes -= n;
>>>> if (!bytes || !n)
>>>> @@ -893,6 +909,41 @@ size_t copy_page_to_iter(struct page *page,
>>>> size_t offset, size_t bytes,
>>>> }
>>>> EXPORT_SYMBOL(copy_page_to_iter);
>>>> +#ifdef CONFIG_ARCH_HAS_COPY_MC
>>>> +/**
>>>> + * copy_mc_page_to_iter - copy page to iter with source memory
>>>> error exception handling.
>>>> + *
>>>> + * The filemap_read deploys this for pagecache reading and the main
>>>> differences between
>>>> + * this and typical copy_page_to_iter() is call __copy_page_to_iter
>>>> with mc_safe true.
>>>> + *
>>>> + * Return: number of bytes copied (may be %0)
>>>> + */
>>>> +size_t copy_mc_page_to_iter(struct page *page, size_t offset,
>>>> size_t bytes,
>>>> + struct iov_iter *i)
>>>> +{
>>>> + size_t res = 0;
>>>> +
>>>> + if (unlikely(!page_copy_sane(page, offset, bytes)))
>>>> + return 0;
>>>> + page += offset / PAGE_SIZE; // first subpage
>>>> + offset %= PAGE_SIZE;
>>>> + while (1) {
>>>> + size_t n = __copy_page_to_iter(page, offset,
>>>> + min(bytes, (size_t)PAGE_SIZE - offset), i, true);
>>>> + res += n;
>>>> + bytes -= n;
>>>> + if (!bytes || !n)
>>>> + break;
>>>> + offset += n;
>>>> + if (offset == PAGE_SIZE) {
>>>> + page++;
>>>> + offset = 0;
>>>> + }
>>>> + }
>>>> + return res;
>>>> +}
>>>> +#endif
>>>> +
>>>> size_t copy_page_from_iter(struct page *page, size_t offset,
>>>> size_t bytes,
>>>> struct iov_iter *i)
>>>> {
>>>> --
>>>> 2.18.0.huawei.25
>>>>
>>>>
>>>> _______________________________________________
>>>> linux-arm-kernel mailing list
>>>> [email protected]
>>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>> .
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> [email protected]
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> .
On 2022-04-08 03:43, Tong Tiangen wrote:
>
>
> On 2022/4/7 23:53, Robin Murphy wrote:
>> On 2022-04-07 15:56, Tong Tiangen wrote:
>>>
>>>
>>> On 2022/4/6 19:27, Mark Rutland wrote:
>>>> On Wed, Apr 06, 2022 at 09:13:11AM +0000, Tong Tiangen wrote:
>>>>> When user process reading file, the data is cached in pagecache and
>>>>> the data belongs to the user process, When machine check error is
>>>>> encountered during pagecache reading, killing the user process and
>>>>> isolate the user page with hardware memory errors is a more reasonable
>>>>> choice than kernel panic.
>>>>>
>>>>> The __arch_copy_mc_to_user() in copy_to_user_mc.S is largely borrows
>>>>> from __arch_copy_to_user() in copy_to_user.S and the main difference
>>>>> is __arch_copy_mc_to_user() add the extable entry to support machine
>>>>> check safe.
>>>>
>>>> As with prior patches, *why* is the distinction necessary?
>>>>
>>>> This patch adds a bunch of conditional logic, but *structurally* it
>>>> doesn't
>>>> alter the handling to be substantially different for the MC and
>>>> non-MC cases.
>>>>
>>>> This seems like pointless duplication that just makes it harder to
>>>> maintain
>>>> this code.
>>>>
>>>> Thanks,
>>>> Mark.
>>>
>>> Agreed, The implementation here looks a little ugly and harder to
>>> maintain.
>>>
>>> The purpose of my doing this is not all copy_to_user can be recovered.
>>>
>>> A memory error is consumed when reading pagecache using copy_to_user.
>>> I think in this scenario, only the process is affected because it
>>> can't read
>>> pagecache data correctly. Just kill the process and don't need the whole
>>> kernel panic.
>>>
>>> So I need two different copy_to_user implementation, one is existing
>>> __arch_copy_to_user,
>>> this function will panic when consuming memory errors. The other one
>>> is this new helper
>>> __arch_copy_mc_to_user, this interface is used when reading
>>> pagecache. It can recover from
>>> consume memory error.
>>
>> OK, but do we really need two almost-identical implementations of
>> every function where the only difference is how the exception table
>> entries are annotated? Could the exception handler itself just figure
>> out who owns the page where the fault occurred and decide what action
>> to take as appropriate?
>>
>> Robin.
>>
>
> Thank you, Robin.
>
> I added this call path in this patchset: do_sea() -> fixup_exception(),
> the purpose is to provide a chance for sea fault to fixup (refer patch
> 3/7).
>
> If fixup successful, panic can be avoided. Otherwise, panic can be
> eliminated according to the original logic.
>
> fixup_exception() will set regs->pc and jump to regs->pc when the
> context recovery of an exception occurs.
>
> If mc-safe-fixup added to __arch_copy_to_user(), in *non pagecache
> reading* scenario encount memory error when call __arch_copy_to_user() ,
> do_sea() -> fixup_exception() will not return fail and will miss the
> panic logic in do_sea().
>
> So I add new helper __arch_copy_mc_to_user() and add mc-safe-fixup to
> this helper, which can be used in the required scenarios. At present,
> there is only one pagecache reading scenario, other scenarios need to be
> developed.
>
> This is my current confusion. Of course, I will think about the solution
> to solve the duplicate code problem.
Right, but if the point is that faults in pagecache pages are special,
surely __do_kernel_fault() could ultimately figure out from the address
whether that's the case or not?
In general, if the principle is that whether a fault is recoverable or
not depends on what was being accessed, then to me it seems
fundamentally more robust to base that decision on details of the fault
that actually occurred, rather than what the caller thought it was
supposed to be doing at the time.
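
For example - an entirely untested sketch, and assuming the faulting
kernel address is even usable in the handler at that point - the
decision could be something like:

static bool sea_page_is_recoverable(unsigned long addr)
{
	struct page *page;

	if (!virt_addr_valid(addr))
		return false;

	page = virt_to_page(addr);
	/* Pagecache or anonymous data only affects the owning process. */
	return PageAnon(page) || page_mapping(page) != NULL;
}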
Thanks,
Robin.
On Thu, Apr 07, 2022 at 10:38:04PM +0800, Tong Tiangen wrote:
> On 2022/4/6 19:22, Mark Rutland wrote:
> > On Wed, Apr 06, 2022 at 09:13:09AM +0000, Tong Tiangen wrote:
> > > Add scenarios get_user to machine check safe. The processing of
> > > EX_TYPE_UACCESS_ERR_ZERO and EX_TYPE_UACCESS_ERR_ZERO_UCE_RECOVERY is same
> > > and both return -EFAULT.
> >
> > Which uaccess cases do we expect to *not* be recoverable?
> >
> > Naively I would assume that if we're going to treat a memory error on a uaccess
> > as fatal to userspace we should be able to do that for *any* uacesses.
> >
> > The commit message should explain why we need the distinction between a
> > recoverable uaccess and a non-recoverable uaccess.
> >
> > Thanks,
> > Mark.
>
> Currently, any memory error consumed in kernel mode will lead to panic
> (do_sea()).
>
> My idea is that not all memory errors consumed in kernel mode are fatal,
> such as copy_ from_ user/get_ user is a memory error consumed when
> reading user data in the process context. In this case, we can not let the
> kernel panic, just kill the process without affecting the operation
> of the system.
I understood this part.
> However, not all uaccess can be recovered without affecting the normal
> operation of the system. The key is not whether it is uaccess, but whether
> there are key data affecting the normal operation of the system in the read
> page.
Ok. Can you give an example of such a case where a uaccess that hits
a memory error must be fatal?
I think you might be trying to say that for copy_{to,from}_user() we can
make that judgement, but those are combined user+kernel access
primitives, and the *uaccess* part should never be reading from a page
with "key data affecting the normal operation of the system", since
that's userspace memory.
Is there any *userspace access* (e.g. where we use LDTR/STTR today)
where we must treat a memory error as fatal to the system?
Thanks,
Mark.
On 2022/4/8 23:22, Mark Rutland wrote:
> On Thu, Apr 07, 2022 at 10:38:04PM +0800, Tong Tiangen wrote:
>> On 2022/4/6 19:22, Mark Rutland wrote:
>>> On Wed, Apr 06, 2022 at 09:13:09AM +0000, Tong Tiangen wrote:
>>>> Add scenarios get_user to machine check safe. The processing of
>>>> EX_TYPE_UACCESS_ERR_ZERO and EX_TYPE_UACCESS_ERR_ZERO_UCE_RECOVERY is same
>>>> and both return -EFAULT.
>>>
>>> Which uaccess cases do we expect to *not* be recoverable?
>>>
>>> Naively I would assume that if we're going to treat a memory error on a uaccess
>>> as fatal to userspace we should be able to do that for *any* uacesses.
>>>
>>> The commit message should explain why we need the distinction between a
>>> recoverable uaccess and a non-recoverable uaccess.
>>>
>>> Thanks,
>>> Mark.
>>
>> Currently, any memory error consumed in kernel mode will lead to panic
>> (do_sea()).
>>
>> My idea is that not all memory errors consumed in kernel mode are fatal,
>> such as copy_ from_ user/get_ user is a memory error consumed when
>> reading user data in the process context. In this case, we can not let the
>> kernel panic, just kill the process without affecting the operation
>> of the system.
>
> I understood this part.
>
>> However, not all uaccess can be recovered without affecting the normal
>> operation of the system. The key is not whether it is uaccess, but whether
>> there are key data affecting the normal operation of the system in the read
>> page.
>
> Ok. Can you give an example of such a case where the a uaccess that hits
> a memory error must be fatal?
>
> I think you might be trying to say that for copy_{to,from}_user() we can
> make that judgement, but those are combined user+kernel access
> primitives, and the *uaccess* part should never be reading from a page
> with "key data affecting the normal operation of the system", since
> that's userspace memory.
>
> Is there any *userspace access* (e.g. where we use LDTR/STTR today)
> where we must treat a memory error as fatal to the system?
>
> Thanks,
> Mark.
> .
I seem to understand what you mean.
Take copy_to_user()/put_user() as an example: if it encounters a memory
error, only the related process is affected. By that reasoning, it seems
that all uaccess can be recovered.
Thanks,
Tong.
On 2022/4/8 19:11, Robin Murphy wrote:
> On 2022-04-08 03:43, Tong Tiangen wrote:
>>
>>
>> On 2022/4/7 23:53, Robin Murphy wrote:
>>> On 2022-04-07 15:56, Tong Tiangen wrote:
>>>>
>>>>
>>>> On 2022/4/6 19:27, Mark Rutland wrote:
>>>>> On Wed, Apr 06, 2022 at 09:13:11AM +0000, Tong Tiangen wrote:
>>>>>> When user process reading file, the data is cached in pagecache and
>>>>>> the data belongs to the user process, When machine check error is
>>>>>> encountered during pagecache reading, killing the user process and
>>>>>> isolate the user page with hardware memory errors is a more
>>>>>> reasonable
>>>>>> choice than kernel panic.
>>>>>>
>>>>>> The __arch_copy_mc_to_user() in copy_to_user_mc.S is largely borrows
>>>>>> from __arch_copy_to_user() in copy_to_user.S and the main difference
>>>>>> is __arch_copy_mc_to_user() add the extable entry to support machine
>>>>>> check safe.
>>>>>
>>>>> As with prior patches, *why* is the distinction necessary?
>>>>>
>>>>> This patch adds a bunch of conditional logic, but *structurally* it
>>>>> doesn't
>>>>> alter the handling to be substantially different for the MC and
>>>>> non-MC cases.
>>>>>
>>>>> This seems like pointless duplication that just makes it harder to
>>>>> maintain
>>>>> this code.
>>>>>
>>>>> Thanks,
>>>>> Mark.
>>>>
>>>> Agreed, The implementation here looks a little ugly and harder to
>>>> maintain.
>>>>
>>>> The purpose of my doing this is not all copy_to_user can be recovered.
>>>>
>>>> A memory error is consumed when reading pagecache using copy_to_user.
>>>> I think in this scenario, only the process is affected because it
>>>> can't read
>>>> pagecache data correctly. Just kill the process and don't need the
>>>> whole
>>>> kernel panic.
>>>>
>>>> So I need two different copy_to_user implementation, one is existing
>>>> __arch_copy_to_user,
>>>> this function will panic when consuming memory errors. The other one
>>>> is this new helper
>>>> __arch_copy_mc_to_user, this interface is used when reading
>>>> pagecache. It can recover from
>>>> consume memory error.
>>>
>>> OK, but do we really need two almost-identical implementations of
>>> every function where the only difference is how the exception table
>>> entries are annotated? Could the exception handler itself just figure
>>> out who owns the page where the fault occurred and decide what action
>>> to take as appropriate?
>>>
>>> Robin.
>>>
>>
>> Thank you, Robin.
>>
>> I added this call path in this patchset: do_sea() ->
>> fixup_exception(), the purpose is to provide a chance for sea fault to
>> fixup (refer patch 3/7).
>>
>> If fixup successful, panic can be avoided. Otherwise, panic can be
>> eliminated according to the original logic.
>>
>> fixup_exception() will set regs->pc and jump to regs->pc when the
>> context recovery of an exception occurs.
>>
>> If mc-safe-fixup added to __arch_copy_to_user(), in *non pagecache
>> reading* scenario encount memory error when call __arch_copy_to_user()
>> , do_sea() -> fixup_exception() will not return fail and will miss the
>> panic logic in do_sea().
>>
>> So I add new helper __arch_copy_mc_to_user() and add mc-safe-fixup to
>> this helper, which can be used in the required scenarios. At present,
>> there is only one pagecache reading scenario, other scenarios need to
>> be developed.
>>
>> This is my current confusion. Of course, I will think about the
>> solution to solve the duplicate code problem.
>
> Right, but if the point is that faults in pagecahe pages are special,
> surely __do_kernel_fault() could ultimately figure out from the address
> whether that's the case or not?
>
> In general, if the principle is that whether a fault is recoverable or
> not depends on what was being accessed, then to me it seems
> fundamentally more robust to base that decision on details of the fault
> that actually occurred, rather than what the caller thought it was
> supposed to be doing at the time.
>
> Thanks,
> Robin.
> .
According to Mark's suggestion, all uaccess can be recovered, including
copy_to_user(), so there is no need to add the new helper
__arch_copy_mc_to_user().
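
For illustration, the handling could then live entirely in the SEA
fixup path, along these lines (sketch only; fixup_exception_mc() is a
placeholder name, and the handler names follow the existing arm64
extable code):

bool fixup_exception_mc(struct pt_regs *regs)
{
	const struct exception_table_entry *ex;

	ex = search_exception_tables(instruction_pointer(regs));
	if (!ex)
		return false;

	switch (ex->type) {
	case EX_TYPE_UACCESS_ERR_ZERO:
		/* A user access: kill the task instead of panicking. */
		return ex_handler_uaccess_err_zero(ex, regs);
	}

	return false;
}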
Thanks,
Tong.