2020-08-01 17:39:29

by Dan Williams

Subject: [PATCH v8 0/2] Renovate memcpy_mcsafe with copy_mc_to_{user, kernel}

Changes since v7 [1]:
- Rebased on v5.8-rc5 to resolve a conflict with commit eb25de276505
("tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench
mem memcpy'")

[1]: http://lore.kernel.org/r/159408043801.2272533.17485467640602344900.stgit@dwillia2-desk3.amr.corp.intel.com

---
Vishal, since this patch set has experienced unprecedented silence from
x86 folks I expect you will need to send it to Linus directly during the
merge window. It merges cleanly with recent -next.

Thomas, Ingo, Boris, please chime in to save Vishal from that
awkwardness. I am only going to be sporadically online for the next few
weeks.

---

The primary motivation to go touch memcpy_mcsafe() is that the existing
benefit of doing slow and careful copies is obviated on newer CPUs. That
fact also removes the need to detect machine-check recovery capability
before picking a copy routine. Now the old "mcsafe_key" opt-in to careful
copying can be inverted into an opt-out from the default fast copy
implementation, as sketched below.
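
Concretely the dispatch inverts; a minimal before / after sketch,
condensed from the patches below:

	/* Before: careful copy only when the "mcsafe_key" opt-in fires */
	if (static_branch_unlikely(&mcsafe_key))
		return __memcpy_mcsafe(dst, src, cnt);	/* slow, careful */
	memcpy(dst, src, cnt);				/* fast, no recovery */
	return 0;

	/* After: fast copy by default, careful copy as a quirk opt-out */
	if (static_branch_unlikely(&copy_mc_fragile_key))
		return copy_mc_fragile(dst, src, cnt);	/* quirk: slow, careful */
	return copy_mc_generic(dst, src, cnt);		/* fast, recovery-ready */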

The discussion with Linus further made clear that this facility had
already lost its x86-machine-check specificity starting with commit
12c89130a56a ("x86/asm/memcpy_mcsafe: Add write-protection-fault
handling"). The new changes that drop the "careful copy" requirement
further de-emphasize the role that x86-MCA plays in the implementation,
reducing it to just one more source of recoverable trap during the
operation.

With the above realizations the name "mcsafe" is no longer accurate, and
copy_mc_to_{user,kernel}() is proposed as its replacement. x86 grows a
copy_mc_generic() implementation as the default backend that is
independent of detecting the presence of x86-MCA.

---

Dan Williams (2):
x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user,kernel}()
x86/copy_mc: Introduce copy_mc_generic()


arch/powerpc/Kconfig | 2
arch/powerpc/include/asm/string.h | 2
arch/powerpc/include/asm/uaccess.h | 40 +++--
arch/powerpc/lib/Makefile | 2
arch/powerpc/lib/copy_mc_64.S | 4
arch/x86/Kconfig | 2
arch/x86/Kconfig.debug | 2
arch/x86/include/asm/copy_mc_test.h | 75 +++++++++
arch/x86/include/asm/mcsafe_test.h | 75 ---------
arch/x86/include/asm/string_64.h | 32 ----
arch/x86/include/asm/uaccess.h | 21 +++
arch/x86/include/asm/uaccess_64.h | 20 --
arch/x86/kernel/cpu/mce/core.c | 8 -
arch/x86/kernel/quirks.c | 9 -
arch/x86/lib/Makefile | 1
arch/x86/lib/copy_mc.c | 64 ++++++++
arch/x86/lib/copy_mc_64.S | 165 ++++++++++++++++++++
arch/x86/lib/memcpy_64.S | 115 --------------
arch/x86/lib/usercopy_64.c | 21 ---
drivers/md/dm-writecache.c | 15 +-
drivers/nvdimm/claim.c | 2
drivers/nvdimm/pmem.c | 6 -
include/linux/string.h | 9 -
include/linux/uaccess.h | 9 +
include/linux/uio.h | 10 +
lib/Kconfig | 7 +
lib/iov_iter.c | 43 +++--
tools/arch/x86/include/asm/mcsafe_test.h | 13 --
tools/arch/x86/lib/memcpy_64.S | 115 --------------
tools/objtool/check.c | 5 -
tools/perf/bench/Build | 1
tools/perf/bench/mem-memcpy-x86-64-lib.c | 24 ---
tools/testing/nvdimm/test/nfit.c | 48 +++---
.../testing/selftests/powerpc/copyloops/.gitignore | 2
tools/testing/selftests/powerpc/copyloops/Makefile | 6 -
.../selftests/powerpc/copyloops/copy_mc_64.S | 1
.../selftests/powerpc/copyloops/memcpy_mcsafe_64.S | 1
37 files changed, 451 insertions(+), 526 deletions(-)
rename arch/powerpc/lib/{memcpy_mcsafe_64.S => copy_mc_64.S} (98%)
create mode 100644 arch/x86/include/asm/copy_mc_test.h
delete mode 100644 arch/x86/include/asm/mcsafe_test.h
create mode 100644 arch/x86/lib/copy_mc.c
create mode 100644 arch/x86/lib/copy_mc_64.S
delete mode 100644 tools/arch/x86/include/asm/mcsafe_test.h
delete mode 100644 tools/perf/bench/mem-memcpy-x86-64-lib.c
create mode 120000 tools/testing/selftests/powerpc/copyloops/copy_mc_64.S
delete mode 120000 tools/testing/selftests/powerpc/copyloops/memcpy_mcsafe_64.S

base-commit: 11ba468877bb23f28956a35e896356252d63c983


2020-08-01 17:39:47

by Dan Williams

Subject: [PATCH v8 1/2] x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()

In reaction to a proposal to introduce a memcpy_mcsafe_fast()
implementation Linus points out that memcpy_mcsafe() is poorly named in
terms of communicating the scope of the interface: specifically, which
addresses are valid to pass as source and destination, and which faults /
exceptions are handled. Of particular concern is that even though x86
might be able to handle the semantics of copy_mc_to_user() with its
common copy_user_generic() implementation, other archs likely need / want
an explicit path for this case:

On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <[email protected]> wrote:
>
> On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <[email protected]> wrote:
> >
> > However now I see that copy_user_generic() works for the wrong reason.
> > It works because the exception on the source address due to poison
> > looks no different than a write fault on the user address to the
> > caller, it's still just a short copy. So it makes copy_to_user() work
> > for the wrong reason relative to the name.
>
> Right.
>
> And it won't work that way on other architectures. On x86, we have a
> generic function that can take faults on either side, and we use it
> for both cases (and for the "in_user" case too), but that's an
> artifact of the architecture oddity.
>
> In fact, it's probably wrong even on x86 - because it can hide bugs -
> but writing those things is painful enough that everybody prefers
> having just one function.
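
To make the interface scoping concrete, an illustrative summary (not
from the patch) of what each call is expected to handle:

	rem = copy_to_user(udst, ksrc, len);	  /* faults on udst writes */
	rem = copy_mc_to_user(udst, ksrc, len);	  /* faults on udst writes,
						   * plus exceptions (e.g.
						   * poison) reading ksrc */
	rem = copy_mc_to_kernel(kdst, ksrc, len); /* exceptions reading ksrc */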

The rename replaces a single top-level memcpy_mcsafe() with either
copy_mc_to_user() or copy_mc_to_kernel().

The low-level x86 implementation formerly named memcpy_mcsafe() is
renamed to copy_mc_fragile(). It serves as the slow / careful backend
that is supplanted by a fast copy_mc_generic() in a follow-on patch.
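
For callers the conversion is mechanical; a before / after sketch of a
typical pmem-style call site (buffer names illustrative):

	/* Before: one name regardless of destination address space */
	rem = memcpy_mcsafe(buf, pmem_addr, chunk);

	/* After: the destination address space is explicit in the name */
	rem = copy_mc_to_kernel(buf, pmem_addr, chunk);	/* kernel destination */
	rem = copy_mc_to_user(ubuf, pmem_addr, chunk);	/* user destination */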

One side effect of this reorganization is that, with copy_mc_64.S split
out into its own file, perf no longer needs to track that code as a
dependency for its memcpy_64.S benchmarks.

Cc: [email protected]
Cc: <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mikulas Patocka <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
Signed-off-by: Dan Williams <[email protected]>
---
arch/powerpc/Kconfig | 2
arch/powerpc/include/asm/string.h | 2
arch/powerpc/include/asm/uaccess.h | 40 ++++--
arch/powerpc/lib/Makefile | 2
arch/powerpc/lib/copy_mc_64.S | 4 -
arch/x86/Kconfig | 2
arch/x86/Kconfig.debug | 2
arch/x86/include/asm/copy_mc_test.h | 75 ++++++++++++
arch/x86/include/asm/mcsafe_test.h | 75 ------------
arch/x86/include/asm/string_64.h | 32 -----
arch/x86/include/asm/uaccess.h | 18 +++
arch/x86/include/asm/uaccess_64.h | 20 ---
arch/x86/kernel/cpu/mce/core.c | 8 -
arch/x86/kernel/quirks.c | 9 -
arch/x86/lib/Makefile | 1
arch/x86/lib/copy_mc.c | 66 +++++++++++
arch/x86/lib/copy_mc_64.S | 125 ++++++++++++++++++++
arch/x86/lib/memcpy_64.S | 115 ------------------
arch/x86/lib/usercopy_64.c | 21 ---
drivers/md/dm-writecache.c | 15 +-
drivers/nvdimm/claim.c | 2
drivers/nvdimm/pmem.c | 6 -
include/linux/string.h | 9 -
include/linux/uaccess.h | 9 +
include/linux/uio.h | 10 +-
lib/Kconfig | 7 +
lib/iov_iter.c | 43 ++++---
tools/arch/x86/include/asm/mcsafe_test.h | 13 --
tools/arch/x86/lib/memcpy_64.S | 115 ------------------
tools/objtool/check.c | 4 -
tools/perf/bench/Build | 1
tools/perf/bench/mem-memcpy-x86-64-lib.c | 24 ----
tools/testing/nvdimm/test/nfit.c | 48 ++++----
.../testing/selftests/powerpc/copyloops/.gitignore | 2
tools/testing/selftests/powerpc/copyloops/Makefile | 6 -
.../selftests/powerpc/copyloops/copy_mc_64.S | 1
.../selftests/powerpc/copyloops/memcpy_mcsafe_64.S | 1
37 files changed, 409 insertions(+), 526 deletions(-)
rename arch/powerpc/lib/{memcpy_mcsafe_64.S => copy_mc_64.S} (98%)
create mode 100644 arch/x86/include/asm/copy_mc_test.h
delete mode 100644 arch/x86/include/asm/mcsafe_test.h
create mode 100644 arch/x86/lib/copy_mc.c
create mode 100644 arch/x86/lib/copy_mc_64.S
delete mode 100644 tools/arch/x86/include/asm/mcsafe_test.h
delete mode 100644 tools/perf/bench/mem-memcpy-x86-64-lib.c
create mode 120000 tools/testing/selftests/powerpc/copyloops/copy_mc_64.S
delete mode 120000 tools/testing/selftests/powerpc/copyloops/memcpy_mcsafe_64.S

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9fa23eb320ff..cf78ad7ff0b7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -135,7 +135,7 @@ config PPC
select ARCH_HAS_STRICT_KERNEL_RWX if (PPC32 && !HIBERNATION)
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UACCESS_FLUSHCACHE
- select ARCH_HAS_UACCESS_MCSAFE if PPC64
+ select ARCH_HAS_COPY_MC if PPC64
select ARCH_HAS_UBSAN_SANITIZE_ALL
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_KEEP_MEMBLOCK
diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
index b72692702f35..9bf6dffb4090 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -53,9 +53,7 @@ void *__memmove(void *to, const void *from, __kernel_size_t n);
#ifndef CONFIG_KASAN
#define __HAVE_ARCH_MEMSET32
#define __HAVE_ARCH_MEMSET64
-#define __HAVE_ARCH_MEMCPY_MCSAFE

-extern int memcpy_mcsafe(void *dst, const void *src, __kernel_size_t sz);
extern void *__memset16(uint16_t *, uint16_t v, __kernel_size_t);
extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
extern void *__memset64(uint64_t *, uint64_t v, __kernel_size_t);
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 64c04ab09112..97506441c15b 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -436,6 +436,32 @@ do { \
extern unsigned long __copy_tofrom_user(void __user *to,
const void __user *from, unsigned long size);

+#ifdef CONFIG_ARCH_HAS_COPY_MC
+unsigned long __must_check
+copy_mc_generic(void *to, const void *from, unsigned long size);
+
+static inline unsigned long __must_check
+copy_mc_to_kernel(void *to, const void *from, unsigned long size)
+{
+ return copy_mc_generic(to, from, size);
+}
+#define copy_mc_to_kernel copy_mc_to_kernel
+
+static inline unsigned long __must_check
+copy_mc_to_user(void __user *to, const void *from, unsigned long n)
+{
+ if (likely(check_copy_size(from, n, true))) {
+ if (access_ok(to, n)) {
+ allow_write_to_user(to, n);
+ n = copy_mc_generic((void *)to, from, n);
+ prevent_write_to_user(to, n);
+ }
+ }
+
+ return n;
+}
+#endif
+
#ifdef __powerpc64__
static inline unsigned long
raw_copy_in_user(void __user *to, const void __user *from, unsigned long n)
@@ -524,20 +550,6 @@ raw_copy_to_user(void __user *to, const void *from, unsigned long n)
return ret;
}

-static __always_inline unsigned long __must_check
-copy_to_user_mcsafe(void __user *to, const void *from, unsigned long n)
-{
- if (likely(check_copy_size(from, n, true))) {
- if (access_ok(to, n)) {
- allow_write_to_user(to, n);
- n = memcpy_mcsafe((void *)to, from, n);
- prevent_write_to_user(to, n);
- }
- }
-
- return n;
-}
-
unsigned long __arch_clear_user(void __user *addr, unsigned long size);

static inline unsigned long clear_user(void __user *addr, unsigned long size)
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 5e994cda8e40..c254f5f733a8 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -39,7 +39,7 @@ obj-$(CONFIG_PPC_BOOK3S_64) += copyuser_power7.o copypage_power7.o \
memcpy_power7.o

obj64-y += copypage_64.o copyuser_64.o mem_64.o hweight_64.o \
- memcpy_64.o memcpy_mcsafe_64.o
+ memcpy_64.o copy_mc_64.o

obj64-$(CONFIG_SMP) += locks.o
obj64-$(CONFIG_ALTIVEC) += vmx-helper.o
diff --git a/arch/powerpc/lib/memcpy_mcsafe_64.S b/arch/powerpc/lib/copy_mc_64.S
similarity index 98%
rename from arch/powerpc/lib/memcpy_mcsafe_64.S
rename to arch/powerpc/lib/copy_mc_64.S
index cb882d9a6d8a..88d46c471493 100644
--- a/arch/powerpc/lib/memcpy_mcsafe_64.S
+++ b/arch/powerpc/lib/copy_mc_64.S
@@ -50,7 +50,7 @@ err3; stb r0,0(r3)
blr


-_GLOBAL(memcpy_mcsafe)
+_GLOBAL(copy_mc_generic)
mr r7,r5
cmpldi r5,16
blt .Lshort_copy
@@ -239,4 +239,4 @@ err1; stb r0,0(r3)
15: li r3,0
blr

-EXPORT_SYMBOL_GPL(memcpy_mcsafe);
+EXPORT_SYMBOL_GPL(copy_mc_generic);
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 883da0abf779..1f4104f8852b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -75,7 +75,7 @@ config X86
select ARCH_HAS_PTE_DEVMAP if X86_64
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_UACCESS_FLUSHCACHE if X86_64
- select ARCH_HAS_UACCESS_MCSAFE if X86_64 && X86_MCE
+ select ARCH_HAS_COPY_MC if X86_64
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_SET_DIRECT_MAP
select ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 0dd319e6e5b4..ec98b400e38f 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -59,7 +59,7 @@ config EARLY_PRINTK_USB_XDBC
You should normally say N here, unless you want to debug early
crashes or need a very simple printk logging facility.

-config MCSAFE_TEST
+config COPY_MC_TEST
def_bool n

config EFI_PGT_DUMP
diff --git a/arch/x86/include/asm/copy_mc_test.h b/arch/x86/include/asm/copy_mc_test.h
new file mode 100644
index 000000000000..e4991ba96726
--- /dev/null
+++ b/arch/x86/include/asm/copy_mc_test.h
@@ -0,0 +1,75 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _COPY_MC_TEST_H_
+#define _COPY_MC_TEST_H_
+
+#ifndef __ASSEMBLY__
+#ifdef CONFIG_COPY_MC_TEST
+extern unsigned long copy_mc_test_src;
+extern unsigned long copy_mc_test_dst;
+
+static inline void copy_mc_inject_src(void *addr)
+{
+ if (addr)
+ copy_mc_test_src = (unsigned long) addr;
+ else
+ copy_mc_test_src = ~0UL;
+}
+
+static inline void copy_mc_inject_dst(void *addr)
+{
+ if (addr)
+ copy_mc_test_dst = (unsigned long) addr;
+ else
+ copy_mc_test_dst = ~0UL;
+}
+#else /* CONFIG_COPY_MC_TEST */
+static inline void copy_mc_inject_src(void *addr)
+{
+}
+
+static inline void copy_mc_inject_dst(void *addr)
+{
+}
+#endif /* CONFIG_COPY_MC_TEST */
+
+#else /* __ASSEMBLY__ */
+#include <asm/export.h>
+
+#ifdef CONFIG_COPY_MC_TEST
+.macro COPY_MC_TEST_CTL
+ .pushsection .data
+ .align 8
+ .globl copy_mc_test_src
+ copy_mc_test_src:
+ .quad 0
+ EXPORT_SYMBOL_GPL(copy_mc_test_src)
+ .globl copy_mc_test_dst
+ copy_mc_test_dst:
+ .quad 0
+ EXPORT_SYMBOL_GPL(copy_mc_test_dst)
+ .popsection
+.endm
+
+.macro COPY_MC_TEST_SRC reg count target
+ leaq \count(\reg), %r9
+ cmp copy_mc_test_src, %r9
+ ja \target
+.endm
+
+.macro COPY_MC_TEST_DST reg count target
+ leaq \count(\reg), %r9
+ cmp copy_mc_test_dst, %r9
+ ja \target
+.endm
+#else
+.macro COPY_MC_TEST_CTL
+.endm
+
+.macro COPY_MC_TEST_SRC reg count target
+.endm
+
+.macro COPY_MC_TEST_DST reg count target
+.endm
+#endif /* CONFIG_COPY_MC_TEST */
+#endif /* __ASSEMBLY__ */
+#endif /* _COPY_MC_TEST_H_ */
diff --git a/arch/x86/include/asm/mcsafe_test.h b/arch/x86/include/asm/mcsafe_test.h
deleted file mode 100644
index eb59804b6201..000000000000
--- a/arch/x86/include/asm/mcsafe_test.h
+++ /dev/null
@@ -1,75 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _MCSAFE_TEST_H_
-#define _MCSAFE_TEST_H_
-
-#ifndef __ASSEMBLY__
-#ifdef CONFIG_MCSAFE_TEST
-extern unsigned long mcsafe_test_src;
-extern unsigned long mcsafe_test_dst;
-
-static inline void mcsafe_inject_src(void *addr)
-{
- if (addr)
- mcsafe_test_src = (unsigned long) addr;
- else
- mcsafe_test_src = ~0UL;
-}
-
-static inline void mcsafe_inject_dst(void *addr)
-{
- if (addr)
- mcsafe_test_dst = (unsigned long) addr;
- else
- mcsafe_test_dst = ~0UL;
-}
-#else /* CONFIG_MCSAFE_TEST */
-static inline void mcsafe_inject_src(void *addr)
-{
-}
-
-static inline void mcsafe_inject_dst(void *addr)
-{
-}
-#endif /* CONFIG_MCSAFE_TEST */
-
-#else /* __ASSEMBLY__ */
-#include <asm/export.h>
-
-#ifdef CONFIG_MCSAFE_TEST
-.macro MCSAFE_TEST_CTL
- .pushsection .data
- .align 8
- .globl mcsafe_test_src
- mcsafe_test_src:
- .quad 0
- EXPORT_SYMBOL_GPL(mcsafe_test_src)
- .globl mcsafe_test_dst
- mcsafe_test_dst:
- .quad 0
- EXPORT_SYMBOL_GPL(mcsafe_test_dst)
- .popsection
-.endm
-
-.macro MCSAFE_TEST_SRC reg count target
- leaq \count(\reg), %r9
- cmp mcsafe_test_src, %r9
- ja \target
-.endm
-
-.macro MCSAFE_TEST_DST reg count target
- leaq \count(\reg), %r9
- cmp mcsafe_test_dst, %r9
- ja \target
-.endm
-#else
-.macro MCSAFE_TEST_CTL
-.endm
-
-.macro MCSAFE_TEST_SRC reg count target
-.endm
-
-.macro MCSAFE_TEST_DST reg count target
-.endm
-#endif /* CONFIG_MCSAFE_TEST */
-#endif /* __ASSEMBLY__ */
-#endif /* _MCSAFE_TEST_H_ */
diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 75314c3dbe47..6e450827f677 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -82,38 +82,6 @@ int strcmp(const char *cs, const char *ct);

#endif

-#define __HAVE_ARCH_MEMCPY_MCSAFE 1
-__must_check unsigned long __memcpy_mcsafe(void *dst, const void *src,
- size_t cnt);
-DECLARE_STATIC_KEY_FALSE(mcsafe_key);
-
-/**
- * memcpy_mcsafe - copy memory with indication if a machine check happened
- *
- * @dst: destination address
- * @src: source address
- * @cnt: number of bytes to copy
- *
- * Low level memory copy function that catches machine checks
- * We only call into the "safe" function on systems that can
- * actually do machine check recovery. Everyone else can just
- * use memcpy().
- *
- * Return 0 for success, or number of bytes not copied if there was an
- * exception.
- */
-static __always_inline __must_check unsigned long
-memcpy_mcsafe(void *dst, const void *src, size_t cnt)
-{
-#ifdef CONFIG_X86_MCE
- if (static_branch_unlikely(&mcsafe_key))
- return __memcpy_mcsafe(dst, src, cnt);
- else
-#endif
- memcpy(dst, src, cnt);
- return 0;
-}
-
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
#define __HAVE_ARCH_MEMCPY_FLUSHCACHE 1
void __memcpy_flushcache(void *dst, const void *src, size_t cnt);
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 18dfa07d3ef0..4b2082b61e3e 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -452,6 +452,24 @@ extern __must_check long strnlen_user(const char __user *str, long n);
unsigned long __must_check clear_user(void __user *mem, unsigned long len);
unsigned long __must_check __clear_user(void __user *mem, unsigned long len);

+#ifdef CONFIG_ARCH_HAS_COPY_MC
+void enable_copy_mc_fragile(void);
+
+unsigned long __must_check
+copy_mc_to_kernel(void *to, const void *from, unsigned len);
+#define copy_mc_to_kernel copy_mc_to_kernel
+
+unsigned long __must_check
+copy_mc_to_user(void *to, const void *from, unsigned len);
+
+unsigned long __must_check
+copy_mc_fragile(void *dst, const void *src, unsigned cnt);
+#else
+static inline void enable_copy_mc_fragile(void)
+{
+}
+#endif
+
/*
* movsl can be slow when source and dest are not both 8-byte aligned
*/
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index bc10e3dc64fe..e7265a552f4f 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -46,22 +46,6 @@ copy_user_generic(void *to, const void *from, unsigned len)
return ret;
}

-static __always_inline __must_check unsigned long
-copy_to_user_mcsafe(void *to, const void *from, unsigned len)
-{
- unsigned long ret;
-
- __uaccess_begin();
- /*
- * Note, __memcpy_mcsafe() is explicitly used since it can
- * handle exceptions / faults. memcpy_mcsafe() may fall back to
- * memcpy() which lacks this handling.
- */
- ret = __memcpy_mcsafe(to, from, len);
- __uaccess_end();
- return ret;
-}
-
static __always_inline __must_check unsigned long
raw_copy_from_user(void *dst, const void __user *src, unsigned long size)
{
@@ -102,8 +86,4 @@ __copy_from_user_flushcache(void *dst, const void __user *src, unsigned size)
kasan_check_write(dst, size);
return __copy_user_flushcache(dst, src, size);
}
-
-unsigned long
-mcsafe_handle_tail(char *to, char *from, unsigned len);
-
#endif /* _ASM_X86_UACCESS_64_H */
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 14e4b4d17ee5..5ec3a2d2bd8f 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -40,7 +40,6 @@
#include <linux/debugfs.h>
#include <linux/irq_work.h>
#include <linux/export.h>
-#include <linux/jump_label.h>
#include <linux/set_memory.h>
#include <linux/task_work.h>
#include <linux/hardirq.h>
@@ -2059,7 +2058,7 @@ void mce_disable_bank(int bank)
and older.
* mce=nobootlog Don't log MCEs from before booting.
* mce=bios_cmci_threshold Don't program the CMCI threshold
- * mce=recovery force enable memcpy_mcsafe()
+ * mce=recovery force enable copy_mc_fragile()
*/
static int __init mcheck_enable(char *str)
{
@@ -2667,13 +2666,10 @@ static void __init mcheck_debugfs_init(void)
static void __init mcheck_debugfs_init(void) { }
#endif

-DEFINE_STATIC_KEY_FALSE(mcsafe_key);
-EXPORT_SYMBOL_GPL(mcsafe_key);
-
static int __init mcheck_late_init(void)
{
if (mca_cfg.recovery)
- static_branch_inc(&mcsafe_key);
+ enable_copy_mc_fragile();

mcheck_debugfs_init();

diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
index 896d74cb5081..a6640d30c69d 100644
--- a/arch/x86/kernel/quirks.c
+++ b/arch/x86/kernel/quirks.c
@@ -624,10 +624,6 @@ static void amd_disable_seq_and_redirect_scrub(struct pci_dev *dev)
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_NB_F3,
amd_disable_seq_and_redirect_scrub);

-#if defined(CONFIG_X86_64) && defined(CONFIG_X86_MCE)
-#include <linux/jump_label.h>
-#include <asm/string_64.h>
-
/* Ivy Bridge, Haswell, Broadwell */
static void quirk_intel_brickland_xeon_ras_cap(struct pci_dev *pdev)
{
@@ -636,7 +632,7 @@ static void quirk_intel_brickland_xeon_ras_cap(struct pci_dev *pdev)
pci_read_config_dword(pdev, 0x84, &capid0);

if (capid0 & 0x10)
- static_branch_inc(&mcsafe_key);
+ enable_copy_mc_fragile();
}

/* Skylake */
@@ -653,7 +649,7 @@ static void quirk_intel_purley_xeon_ras_cap(struct pci_dev *pdev)
* enabled, so memory machine check recovery is also enabled.
*/
if ((capid0 & 0xc0) == 0xc0 || (capid5 & 0x1e0))
- static_branch_inc(&mcsafe_key);
+ enable_copy_mc_fragile();

}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x0ec3, quirk_intel_brickland_xeon_ras_cap);
@@ -661,7 +657,6 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2fc0, quirk_intel_brickland_xeon_
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x6fc0, quirk_intel_brickland_xeon_ras_cap);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2083, quirk_intel_purley_xeon_ras_cap);
#endif
-#endif

bool x86_apple_machine;
EXPORT_SYMBOL(x86_apple_machine);
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 6110bce7237b..02c3cec7e515 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -44,6 +44,7 @@ obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o
lib-y := delay.o misc.o cmdline.o cpu.o
lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
lib-y += memcpy_$(BITS).o
+lib-$(CONFIG_ARCH_HAS_COPY_MC) += copy_mc.o copy_mc_64.o
lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
lib-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
diff --git a/arch/x86/lib/copy_mc.c b/arch/x86/lib/copy_mc.c
new file mode 100644
index 000000000000..cdb8f5dc403d
--- /dev/null
+++ b/arch/x86/lib/copy_mc.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2016-2020 Intel Corporation. All rights reserved. */
+
+#include <linux/jump_label.h>
+#include <linux/uaccess.h>
+#include <linux/export.h>
+#include <linux/string.h>
+#include <linux/types.h>
+
+static DEFINE_STATIC_KEY_FALSE(copy_mc_fragile_key);
+
+void enable_copy_mc_fragile(void)
+{
+ static_branch_inc(&copy_mc_fragile_key);
+}
+
+/**
+ * copy_mc_to_kernel - memory copy that handles source exceptions
+ *
+ * @dst: destination address
+ * @src: source address
+ * @cnt: number of bytes to copy
+ *
+ * Call into the 'fragile' version on systems that have trouble
+ * actually doing machine check recovery. Everyone else can just
+ * use memcpy().
+ *
+ * Return 0 for success, or number of bytes not copied if there was an
+ * exception.
+ */
+unsigned long __must_check
+copy_mc_to_kernel(void *dst, const void *src, unsigned cnt)
+{
+ if (static_branch_unlikely(&copy_mc_fragile_key))
+ return copy_mc_fragile(dst, src, cnt);
+ memcpy(dst, src, cnt);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(copy_mc_to_kernel);
+
+/*
+ * Similar to copy_user_handle_tail, probe for the write fault point, or
+ * source exception point.
+ */
+__visible notrace unsigned long
+copy_mc_fragile_handle_tail(char *to, char *from, unsigned len)
+{
+ for (; len; --len, to++, from++)
+ if (copy_mc_fragile(to, from, 1))
+ break;
+ return len;
+}
+
+unsigned long __must_check
+copy_mc_to_user(void *to, const void *from, unsigned len)
+{
+ unsigned long ret;
+
+ if (!static_branch_unlikely(&copy_mc_fragile_key))
+ return copy_user_generic(to, from, len);
+
+ __uaccess_begin();
+ ret = copy_mc_fragile(to, from, len);
+ __uaccess_end();
+ return ret;
+}
diff --git a/arch/x86/lib/copy_mc_64.S b/arch/x86/lib/copy_mc_64.S
new file mode 100644
index 000000000000..35a67c50890b
--- /dev/null
+++ b/arch/x86/lib/copy_mc_64.S
@@ -0,0 +1,125 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2016-2020 Intel Corporation. All rights reserved. */
+
+#include <linux/linkage.h>
+#include <asm/copy_mc_test.h>
+#include <asm/export.h>
+#include <asm/asm.h>
+
+#ifndef CONFIG_UML
+
+COPY_MC_TEST_CTL
+
+/*
+ * copy_mc_fragile - copy memory with indication if an exception / fault happened
+ *
+ * The 'fragile' version is opted into by platform quirks and takes
+ * pains to avoid unrecoverable corner cases like 'fast-string'
+ * instruction sequences, and consuming poison across a cacheline
+ * boundary. The non-fragile version is equivalent to memcpy()
+ * regardless of CPU machine-check-recovery capability.
+ */
+SYM_FUNC_START(copy_mc_fragile)
+ cmpl $8, %edx
+ /* Less than 8 bytes? Go to byte copy loop */
+ jb .L_no_whole_words
+
+ /* Check for bad alignment of source */
+ testl $7, %esi
+ /* Already aligned */
+ jz .L_8byte_aligned
+
+ /* Copy one byte at a time until source is 8-byte aligned */
+ movl %esi, %ecx
+ andl $7, %ecx
+ subl $8, %ecx
+ negl %ecx
+ subl %ecx, %edx
+.L_read_leading_bytes:
+ movb (%rsi), %al
+ COPY_MC_TEST_SRC %rsi 1 .E_leading_bytes
+ COPY_MC_TEST_DST %rdi 1 .E_leading_bytes
+.L_write_leading_bytes:
+ movb %al, (%rdi)
+ incq %rsi
+ incq %rdi
+ decl %ecx
+ jnz .L_read_leading_bytes
+
+.L_8byte_aligned:
+ movl %edx, %ecx
+ andl $7, %edx
+ shrl $3, %ecx
+ jz .L_no_whole_words
+
+.L_read_words:
+ movq (%rsi), %r8
+ COPY_MC_TEST_SRC %rsi 8 .E_read_words
+ COPY_MC_TEST_DST %rdi 8 .E_write_words
+.L_write_words:
+ movq %r8, (%rdi)
+ addq $8, %rsi
+ addq $8, %rdi
+ decl %ecx
+ jnz .L_read_words
+
+ /* Any trailing bytes? */
+.L_no_whole_words:
+ andl %edx, %edx
+ jz .L_done_memcpy_trap
+
+ /* Copy trailing bytes */
+ movl %edx, %ecx
+.L_read_trailing_bytes:
+ movb (%rsi), %al
+ COPY_MC_TEST_SRC %rsi 1 .E_trailing_bytes
+ COPY_MC_TEST_DST %rdi 1 .E_trailing_bytes
+.L_write_trailing_bytes:
+ movb %al, (%rdi)
+ incq %rsi
+ incq %rdi
+ decl %ecx
+ jnz .L_read_trailing_bytes
+
+ /* Copy successful. Return zero */
+.L_done_memcpy_trap:
+ xorl %eax, %eax
+.L_done:
+ ret
+SYM_FUNC_END(copy_mc_fragile)
+EXPORT_SYMBOL_GPL(copy_mc_fragile)
+
+ .section .fixup, "ax"
+ /*
+ * Return number of bytes not copied for any failure. Note that
+ * there is no "tail" handling since the source buffer is 8-byte
+ * aligned and poison is cacheline aligned.
+ */
+.E_read_words:
+ shll $3, %ecx
+.E_leading_bytes:
+ addl %edx, %ecx
+.E_trailing_bytes:
+ mov %ecx, %eax
+ jmp .L_done
+
+ /*
+ * For write fault handling, given the destination is unaligned,
+ * we handle faults on multi-byte writes with a byte-by-byte
+ * copy up to the write-protected page.
+ */
+.E_write_words:
+ shll $3, %ecx
+ addl %edx, %ecx
+ movl %ecx, %edx
+ jmp copy_mc_fragile_handle_tail
+
+ .previous
+
+ _ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes)
+ _ASM_EXTABLE_FAULT(.L_read_words, .E_read_words)
+ _ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes)
+ _ASM_EXTABLE(.L_write_leading_bytes, .E_leading_bytes)
+ _ASM_EXTABLE(.L_write_words, .E_write_words)
+ _ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes)
+#endif
diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
index bbcc05bcefad..037faac46b0c 100644
--- a/arch/x86/lib/memcpy_64.S
+++ b/arch/x86/lib/memcpy_64.S
@@ -4,7 +4,6 @@
#include <linux/linkage.h>
#include <asm/errno.h>
#include <asm/cpufeatures.h>
-#include <asm/mcsafe_test.h>
#include <asm/alternative-asm.h>
#include <asm/export.h>

@@ -187,117 +186,3 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
SYM_FUNC_END(memcpy_orig)

.popsection
-
-#ifndef CONFIG_UML
-
-MCSAFE_TEST_CTL
-
-/*
- * __memcpy_mcsafe - memory copy with machine check exception handling
- * Note that we only catch machine checks when reading the source addresses.
- * Writes to target are posted and don't generate machine checks.
- */
-SYM_FUNC_START(__memcpy_mcsafe)
- cmpl $8, %edx
- /* Less than 8 bytes? Go to byte copy loop */
- jb .L_no_whole_words
-
- /* Check for bad alignment of source */
- testl $7, %esi
- /* Already aligned */
- jz .L_8byte_aligned
-
- /* Copy one byte at a time until source is 8-byte aligned */
- movl %esi, %ecx
- andl $7, %ecx
- subl $8, %ecx
- negl %ecx
- subl %ecx, %edx
-.L_read_leading_bytes:
- movb (%rsi), %al
- MCSAFE_TEST_SRC %rsi 1 .E_leading_bytes
- MCSAFE_TEST_DST %rdi 1 .E_leading_bytes
-.L_write_leading_bytes:
- movb %al, (%rdi)
- incq %rsi
- incq %rdi
- decl %ecx
- jnz .L_read_leading_bytes
-
-.L_8byte_aligned:
- movl %edx, %ecx
- andl $7, %edx
- shrl $3, %ecx
- jz .L_no_whole_words
-
-.L_read_words:
- movq (%rsi), %r8
- MCSAFE_TEST_SRC %rsi 8 .E_read_words
- MCSAFE_TEST_DST %rdi 8 .E_write_words
-.L_write_words:
- movq %r8, (%rdi)
- addq $8, %rsi
- addq $8, %rdi
- decl %ecx
- jnz .L_read_words
-
- /* Any trailing bytes? */
-.L_no_whole_words:
- andl %edx, %edx
- jz .L_done_memcpy_trap
-
- /* Copy trailing bytes */
- movl %edx, %ecx
-.L_read_trailing_bytes:
- movb (%rsi), %al
- MCSAFE_TEST_SRC %rsi 1 .E_trailing_bytes
- MCSAFE_TEST_DST %rdi 1 .E_trailing_bytes
-.L_write_trailing_bytes:
- movb %al, (%rdi)
- incq %rsi
- incq %rdi
- decl %ecx
- jnz .L_read_trailing_bytes
-
- /* Copy successful. Return zero */
-.L_done_memcpy_trap:
- xorl %eax, %eax
-.L_done:
- ret
-SYM_FUNC_END(__memcpy_mcsafe)
-EXPORT_SYMBOL_GPL(__memcpy_mcsafe)
-
- .section .fixup, "ax"
- /*
- * Return number of bytes not copied for any failure. Note that
- * there is no "tail" handling since the source buffer is 8-byte
- * aligned and poison is cacheline aligned.
- */
-.E_read_words:
- shll $3, %ecx
-.E_leading_bytes:
- addl %edx, %ecx
-.E_trailing_bytes:
- mov %ecx, %eax
- jmp .L_done
-
- /*
- * For write fault handling, given the destination is unaligned,
- * we handle faults on multi-byte writes with a byte-by-byte
- * copy up to the write-protected page.
- */
-.E_write_words:
- shll $3, %ecx
- addl %edx, %ecx
- movl %ecx, %edx
- jmp mcsafe_handle_tail
-
- .previous
-
- _ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes)
- _ASM_EXTABLE_FAULT(.L_read_words, .E_read_words)
- _ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes)
- _ASM_EXTABLE(.L_write_leading_bytes, .E_leading_bytes)
- _ASM_EXTABLE(.L_write_words, .E_write_words)
- _ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes)
-#endif
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index b0dfac3d3df7..5f1d4a9ebd5a 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -56,27 +56,6 @@ unsigned long clear_user(void __user *to, unsigned long n)
}
EXPORT_SYMBOL(clear_user);

-/*
- * Similar to copy_user_handle_tail, probe for the write fault point,
- * but reuse __memcpy_mcsafe in case a new read error is encountered.
- * clac() is handled in _copy_to_iter_mcsafe().
- */
-__visible notrace unsigned long
-mcsafe_handle_tail(char *to, char *from, unsigned len)
-{
- for (; len; --len, to++, from++) {
- /*
- * Call the assembly routine back directly since
- * memcpy_mcsafe() may silently fallback to memcpy.
- */
- unsigned long rem = __memcpy_mcsafe(to, from, 1);
-
- if (rem)
- break;
- }
- return len;
-}
-
#ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
/**
* clean_cache_range - write back a cache range with CLWB
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 5358894bb9fd..b0522e3ed604 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -49,7 +49,7 @@ do { \
#define pmem_assign(dest, src) ((dest) = (src))
#endif

-#if defined(__HAVE_ARCH_MEMCPY_MCSAFE) && defined(DM_WRITECACHE_HAS_PMEM)
+#if IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC) && defined(DM_WRITECACHE_HAS_PMEM)
#define DM_WRITECACHE_HANDLE_HARDWARE_ERRORS
#endif

@@ -984,7 +984,8 @@ static void writecache_resume(struct dm_target *ti)
}
wc->freelist_size = 0;

- r = memcpy_mcsafe(&sb_seq_count, &sb(wc)->seq_count, sizeof(uint64_t));
+ r = copy_mc_to_kernel(&sb_seq_count, &sb(wc)->seq_count,
+ sizeof(uint64_t));
if (r) {
writecache_error(wc, r, "hardware memory error when reading superblock: %d", r);
sb_seq_count = cpu_to_le64(0);
@@ -1000,7 +1001,8 @@ static void writecache_resume(struct dm_target *ti)
e->seq_count = -1;
continue;
}
- r = memcpy_mcsafe(&wme, memory_entry(wc, e), sizeof(struct wc_memory_entry));
+ r = copy_mc_to_kernel(&wme, memory_entry(wc, e),
+ sizeof(struct wc_memory_entry));
if (r) {
writecache_error(wc, r, "hardware memory error when reading metadata entry %lu: %d",
(unsigned long)b, r);
@@ -1198,7 +1200,7 @@ static void bio_copy_block(struct dm_writecache *wc, struct bio *bio, void *data

if (rw == READ) {
int r;
- r = memcpy_mcsafe(buf, data, size);
+ r = copy_mc_to_kernel(buf, data, size);
flush_dcache_page(bio_page(bio));
if (unlikely(r)) {
writecache_error(wc, r, "hardware memory error when reading data: %d", r);
@@ -2341,7 +2343,7 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv)
}
}

- r = memcpy_mcsafe(&s, sb(wc), sizeof(struct wc_memory_superblock));
+ r = copy_mc_to_kernel(&s, sb(wc), sizeof(struct wc_memory_superblock));
if (r) {
ti->error = "Hardware memory error when reading superblock";
goto bad;
@@ -2352,7 +2354,8 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv)
ti->error = "Unable to initialize device";
goto bad;
}
- r = memcpy_mcsafe(&s, sb(wc), sizeof(struct wc_memory_superblock));
+ r = copy_mc_to_kernel(&s, sb(wc),
+ sizeof(struct wc_memory_superblock));
if (r) {
ti->error = "Hardware memory error when reading superblock";
goto bad;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index 45964acba944..22d865ba6353 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -268,7 +268,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
if (rw == READ) {
if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align)))
return -EIO;
- if (memcpy_mcsafe(buf, nsio->addr + offset, size) != 0)
+ if (copy_mc_to_kernel(buf, nsio->addr + offset, size) != 0)
return -EIO;
return 0;
}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index d25e66fd942d..5a4f588605ca 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -125,7 +125,7 @@ static blk_status_t read_pmem(struct page *page, unsigned int off,
while (len) {
mem = kmap_atomic(page);
chunk = min_t(unsigned int, len, PAGE_SIZE - off);
- rem = memcpy_mcsafe(mem + off, pmem_addr, chunk);
+ rem = copy_mc_to_kernel(mem + off, pmem_addr, chunk);
kunmap_atomic(mem);
if (rem)
return BLK_STS_IOERR;
@@ -305,7 +305,7 @@ static long pmem_dax_direct_access(struct dax_device *dax_dev,

/*
* Use the 'no check' versions of copy_from_iter_flushcache() and
- * copy_to_iter_mcsafe() to bypass HARDENED_USERCOPY overhead. Bounds
+ * copy_mc_to_iter() to bypass HARDENED_USERCOPY overhead. Bounds
* checking, both file offset and device offset, is handled by
* dax_iomap_actor()
*/
@@ -318,7 +318,7 @@ static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
static size_t pmem_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
{
- return _copy_to_iter_mcsafe(addr, bytes, i);
+ return _copy_mc_to_iter(addr, bytes, i);
}

static const struct dax_operations pmem_dax_ops = {
diff --git a/include/linux/string.h b/include/linux/string.h
index 9b7a0632e87a..b1f3894a0a3e 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -161,20 +161,13 @@ extern int bcmp(const void *,const void *,__kernel_size_t);
#ifndef __HAVE_ARCH_MEMCHR
extern void * memchr(const void *,int,__kernel_size_t);
#endif
-#ifndef __HAVE_ARCH_MEMCPY_MCSAFE
-static inline __must_check unsigned long memcpy_mcsafe(void *dst,
- const void *src, size_t cnt)
-{
- memcpy(dst, src, cnt);
- return 0;
-}
-#endif
#ifndef __HAVE_ARCH_MEMCPY_FLUSHCACHE
static inline void memcpy_flushcache(void *dst, const void *src, size_t cnt)
{
memcpy(dst, src, cnt);
}
#endif
+
void *memchr_inv(const void *s, int c, size_t n);
char *strreplace(char *s, char old, char new);

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 0a76ddc07d59..77923b290171 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -163,6 +163,15 @@ copy_in_user(void __user *to, const void __user *from, unsigned long n)
}
#endif

+#ifndef copy_mc_to_kernel
+static inline unsigned long __must_check
+copy_mc_to_kernel(void *dst, const void *src, size_t cnt)
+{
+ memcpy(dst, src, cnt);
+ return 0;
+}
+#endif
+
static __always_inline void pagefault_disabled_inc(void)
{
current->pagefault_disabled++;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 9576fd8158d7..6a97b4d10b2e 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -186,10 +186,10 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
#define _copy_from_iter_flushcache _copy_from_iter_nocache
#endif

-#ifdef CONFIG_ARCH_HAS_UACCESS_MCSAFE
-size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i);
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
#else
-#define _copy_to_iter_mcsafe _copy_to_iter
+#define _copy_mc_to_iter _copy_to_iter
#endif

static __always_inline __must_check
@@ -202,12 +202,12 @@ size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
}

static __always_inline __must_check
-size_t copy_to_iter_mcsafe(void *addr, size_t bytes, struct iov_iter *i)
+size_t copy_mc_to_iter(void *addr, size_t bytes, struct iov_iter *i)
{
if (unlikely(!check_copy_size(addr, bytes, true)))
return 0;
else
- return _copy_to_iter_mcsafe(addr, bytes, i);
+ return _copy_mc_to_iter(addr, bytes, i);
}

size_t iov_iter_zero(size_t bytes, struct iov_iter *);
diff --git a/lib/Kconfig b/lib/Kconfig
index df3f3da95990..776145864937 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -631,7 +631,12 @@ config UACCESS_MEMCPY
config ARCH_HAS_UACCESS_FLUSHCACHE
bool

-config ARCH_HAS_UACCESS_MCSAFE
+# arch has a concept of a recoverable synchronous exception due to a
+# memory-read error like x86 machine-check or ARM data-abort, and
+# implements copy_mc_to_{user,kernel} to abort and report
+# 'bytes-transferred' if that exception fires when accessing the source
+# buffer.
+config ARCH_HAS_COPY_MC
bool

# Temporary. Goes away when all archs are cleaned up
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index bf538c2bec77..315bb2ff1ee8 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -636,30 +636,30 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
}
EXPORT_SYMBOL(_copy_to_iter);

-#ifdef CONFIG_ARCH_HAS_UACCESS_MCSAFE
-static int copyout_mcsafe(void __user *to, const void *from, size_t n)
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+static int copyout_mc(void __user *to, const void *from, size_t n)
{
if (access_ok(to, n)) {
instrument_copy_to_user(to, from, n);
- n = copy_to_user_mcsafe((__force void *) to, from, n);
+ n = copy_mc_to_user((__force void *) to, from, n);
}
return n;
}

-static unsigned long memcpy_mcsafe_to_page(struct page *page, size_t offset,
+static unsigned long copy_mc_to_page(struct page *page, size_t offset,
const char *from, size_t len)
{
unsigned long ret;
char *to;

to = kmap_atomic(page);
- ret = memcpy_mcsafe(to + offset, from, len);
+ ret = copy_mc_to_kernel(to + offset, from, len);
kunmap_atomic(to);

return ret;
}

-static size_t copy_pipe_to_iter_mcsafe(const void *addr, size_t bytes,
+static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
struct iov_iter *i)
{
struct pipe_inode_info *pipe = i->pipe;
@@ -677,7 +677,7 @@ static size_t copy_pipe_to_iter_mcsafe(const void *addr, size_t bytes,
size_t chunk = min_t(size_t, n, PAGE_SIZE - off);
unsigned long rem;

- rem = memcpy_mcsafe_to_page(pipe->bufs[i_head & p_mask].page,
+ rem = copy_mc_to_page(pipe->bufs[i_head & p_mask].page,
off, addr, chunk);
i->head = i_head;
i->iov_offset = off + chunk - rem;
@@ -694,7 +694,7 @@ static size_t copy_pipe_to_iter_mcsafe(const void *addr, size_t bytes,
}

/**
- * _copy_to_iter_mcsafe - copy to user with source-read error exception handling
+ * _copy_mc_to_iter - copy to iter with source memory error exception handling
* @addr: source kernel address
* @bytes: total transfer length
* @iter: destination iterator
@@ -702,8 +702,9 @@ static size_t copy_pipe_to_iter_mcsafe(const void *addr, size_t bytes,
* The pmem driver arranges for filesystem-dax to use this facility via
* dax_copy_to_iter() for protecting read/write to persistent memory.
* Unless / until an architecture can guarantee identical performance
- * between _copy_to_iter_mcsafe() and _copy_to_iter() it would be a
- * performance regression to switch more users to the mcsafe version.
+ * between _copy_mc_to_iter() and _copy_to_iter(), see
+ * copy_mc_fragile(), it would be a performance regression to switch
+ * more users to the source exception handling version.
*
* Otherwise, the main differences between this and typical _copy_to_iter().
*
@@ -717,22 +718,24 @@ static size_t copy_pipe_to_iter_mcsafe(const void *addr, size_t bytes,
* Compare to copy_to_iter() where only ITER_IOVEC attempts might return
* a short copy.
*
- * See MCSAFE_TEST for self-test.
+ * See COPY_MC_TEST for self-test of the copy_mc_fragile()
+ * implementation.
*/
-size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i)
+size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
{
const char *from = addr;
unsigned long rem, curr_addr, s_addr = (unsigned long) addr;

if (unlikely(iov_iter_is_pipe(i)))
- return copy_pipe_to_iter_mcsafe(addr, bytes, i);
+ return copy_mc_pipe_to_iter(addr, bytes, i);
if (iter_is_iovec(i))
might_fault();
iterate_and_advance(i, bytes, v,
- copyout_mcsafe(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len),
+ copyout_mc(v.iov_base, (from += v.iov_len) - v.iov_len,
+ v.iov_len),
({
- rem = memcpy_mcsafe_to_page(v.bv_page, v.bv_offset,
- (from += v.bv_len) - v.bv_len, v.bv_len);
+ rem = copy_mc_to_page(v.bv_page, v.bv_offset,
+ (from += v.bv_len) - v.bv_len, v.bv_len);
if (rem) {
curr_addr = (unsigned long) from;
bytes = curr_addr - s_addr - rem;
@@ -740,8 +743,8 @@ size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i)
}
}),
({
- rem = memcpy_mcsafe(v.iov_base, (from += v.iov_len) - v.iov_len,
- v.iov_len);
+ rem = copy_mc_to_kernel(v.iov_base, (from += v.iov_len)
+ - v.iov_len, v.iov_len);
if (rem) {
curr_addr = (unsigned long) from;
bytes = curr_addr - s_addr - rem;
@@ -752,8 +755,8 @@ size_t _copy_to_iter_mcsafe(const void *addr, size_t bytes, struct iov_iter *i)

return bytes;
}
-EXPORT_SYMBOL_GPL(_copy_to_iter_mcsafe);
-#endif /* CONFIG_ARCH_HAS_UACCESS_MCSAFE */
+EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
+#endif /* CONFIG_ARCH_HAS_COPY_MC */

size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
{
diff --git a/tools/arch/x86/include/asm/mcsafe_test.h b/tools/arch/x86/include/asm/mcsafe_test.h
deleted file mode 100644
index 2ccd588fbad4..000000000000
--- a/tools/arch/x86/include/asm/mcsafe_test.h
+++ /dev/null
@@ -1,13 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _MCSAFE_TEST_H_
-#define _MCSAFE_TEST_H_
-
-.macro MCSAFE_TEST_CTL
-.endm
-
-.macro MCSAFE_TEST_SRC reg count target
-.endm
-
-.macro MCSAFE_TEST_DST reg count target
-.endm
-#endif /* _MCSAFE_TEST_H_ */
diff --git a/tools/arch/x86/lib/memcpy_64.S b/tools/arch/x86/lib/memcpy_64.S
index 45f8e1b02241..0b5b8ae56bd9 100644
--- a/tools/arch/x86/lib/memcpy_64.S
+++ b/tools/arch/x86/lib/memcpy_64.S
@@ -4,7 +4,6 @@
#include <linux/linkage.h>
#include <asm/errno.h>
#include <asm/cpufeatures.h>
-#include <asm/mcsafe_test.h>
#include <asm/alternative-asm.h>
#include <asm/export.h>

@@ -187,117 +186,3 @@ SYM_FUNC_START(memcpy_orig)
SYM_FUNC_END(memcpy_orig)

.popsection
-
-#ifndef CONFIG_UML
-
-MCSAFE_TEST_CTL
-
-/*
- * __memcpy_mcsafe - memory copy with machine check exception handling
- * Note that we only catch machine checks when reading the source addresses.
- * Writes to target are posted and don't generate machine checks.
- */
-SYM_FUNC_START(__memcpy_mcsafe)
- cmpl $8, %edx
- /* Less than 8 bytes? Go to byte copy loop */
- jb .L_no_whole_words
-
- /* Check for bad alignment of source */
- testl $7, %esi
- /* Already aligned */
- jz .L_8byte_aligned
-
- /* Copy one byte at a time until source is 8-byte aligned */
- movl %esi, %ecx
- andl $7, %ecx
- subl $8, %ecx
- negl %ecx
- subl %ecx, %edx
-.L_read_leading_bytes:
- movb (%rsi), %al
- MCSAFE_TEST_SRC %rsi 1 .E_leading_bytes
- MCSAFE_TEST_DST %rdi 1 .E_leading_bytes
-.L_write_leading_bytes:
- movb %al, (%rdi)
- incq %rsi
- incq %rdi
- decl %ecx
- jnz .L_read_leading_bytes
-
-.L_8byte_aligned:
- movl %edx, %ecx
- andl $7, %edx
- shrl $3, %ecx
- jz .L_no_whole_words
-
-.L_read_words:
- movq (%rsi), %r8
- MCSAFE_TEST_SRC %rsi 8 .E_read_words
- MCSAFE_TEST_DST %rdi 8 .E_write_words
-.L_write_words:
- movq %r8, (%rdi)
- addq $8, %rsi
- addq $8, %rdi
- decl %ecx
- jnz .L_read_words
-
- /* Any trailing bytes? */
-.L_no_whole_words:
- andl %edx, %edx
- jz .L_done_memcpy_trap
-
- /* Copy trailing bytes */
- movl %edx, %ecx
-.L_read_trailing_bytes:
- movb (%rsi), %al
- MCSAFE_TEST_SRC %rsi 1 .E_trailing_bytes
- MCSAFE_TEST_DST %rdi 1 .E_trailing_bytes
-.L_write_trailing_bytes:
- movb %al, (%rdi)
- incq %rsi
- incq %rdi
- decl %ecx
- jnz .L_read_trailing_bytes
-
- /* Copy successful. Return zero */
-.L_done_memcpy_trap:
- xorl %eax, %eax
-.L_done:
- ret
-SYM_FUNC_END(__memcpy_mcsafe)
-EXPORT_SYMBOL_GPL(__memcpy_mcsafe)
-
- .section .fixup, "ax"
- /*
- * Return number of bytes not copied for any failure. Note that
- * there is no "tail" handling since the source buffer is 8-byte
- * aligned and poison is cacheline aligned.
- */
-.E_read_words:
- shll $3, %ecx
-.E_leading_bytes:
- addl %edx, %ecx
-.E_trailing_bytes:
- mov %ecx, %eax
- jmp .L_done
-
- /*
- * For write fault handling, given the destination is unaligned,
- * we handle faults on multi-byte writes with a byte-by-byte
- * copy up to the write-protected page.
- */
-.E_write_words:
- shll $3, %ecx
- addl %edx, %ecx
- movl %ecx, %edx
- jmp mcsafe_handle_tail
-
- .previous
-
- _ASM_EXTABLE_FAULT(.L_read_leading_bytes, .E_leading_bytes)
- _ASM_EXTABLE_FAULT(.L_read_words, .E_read_words)
- _ASM_EXTABLE_FAULT(.L_read_trailing_bytes, .E_trailing_bytes)
- _ASM_EXTABLE(.L_write_leading_bytes, .E_leading_bytes)
- _ASM_EXTABLE(.L_write_words, .E_write_words)
- _ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes)
-#endif
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 5e0d70a89fb8..17cb0933bf42 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -548,8 +548,8 @@ static const char *uaccess_safe_builtin[] = {
"__ubsan_handle_shift_out_of_bounds",
/* misc */
"csum_partial_copy_generic",
- "__memcpy_mcsafe",
- "mcsafe_handle_tail",
+ "copy_mc_fragile",
+ "copy_mc_fragile_handle_tail",
"ftrace_likely_update", /* CONFIG_TRACE_BRANCH_PROFILING */
NULL
};
diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
index 768e408757a0..5352303518e1 100644
--- a/tools/perf/bench/Build
+++ b/tools/perf/bench/Build
@@ -11,7 +11,6 @@ perf-y += epoll-ctl.o
perf-y += synthesize.o
perf-y += kallsyms-parse.o

-perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-lib.o
perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o
perf-$(CONFIG_X86_64) += mem-memset-x86-64-asm.o

diff --git a/tools/perf/bench/mem-memcpy-x86-64-lib.c b/tools/perf/bench/mem-memcpy-x86-64-lib.c
deleted file mode 100644
index 4130734dde84..000000000000
--- a/tools/perf/bench/mem-memcpy-x86-64-lib.c
+++ /dev/null
@@ -1,24 +0,0 @@
-/*
- * From code in arch/x86/lib/usercopy_64.c, copied to keep tools/ copy
- * of the kernel's arch/x86/lib/memcpy_64.s used in 'perf bench mem memcpy'
- * happy.
- */
-#include <linux/types.h>
-
-unsigned long __memcpy_mcsafe(void *dst, const void *src, size_t cnt);
-unsigned long mcsafe_handle_tail(char *to, char *from, unsigned len);
-
-unsigned long mcsafe_handle_tail(char *to, char *from, unsigned len)
-{
- for (; len; --len, to++, from++) {
- /*
- * Call the assembly routine back directly since
- * memcpy_mcsafe() may silently fallback to memcpy.
- */
- unsigned long rem = __memcpy_mcsafe(to, from, 1);
-
- if (rem)
- break;
- }
- return len;
-}
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index a8ee5c4d41eb..41d467c83729 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -23,7 +23,7 @@
#include "nfit_test.h"
#include "../watermark.h"

-#include <asm/mcsafe_test.h>
+#include <asm/copy_mc_test.h>

/*
* Generate an NFIT table to describe the following topology:
@@ -3052,7 +3052,7 @@ static struct platform_driver nfit_test_driver = {
.id_table = nfit_test_id,
};

-static char mcsafe_buf[PAGE_SIZE] __attribute__((__aligned__(PAGE_SIZE)));
+static char copy_mc_buf[PAGE_SIZE] __attribute__((__aligned__(PAGE_SIZE)));

enum INJECT {
INJECT_NONE,
@@ -3060,7 +3060,7 @@ enum INJECT {
INJECT_DST,
};

-static void mcsafe_test_init(char *dst, char *src, size_t size)
+static void copy_mc_test_init(char *dst, char *src, size_t size)
{
size_t i;

@@ -3069,7 +3069,7 @@ static void mcsafe_test_init(char *dst, char *src, size_t size)
src[i] = (char) i;
}

-static bool mcsafe_test_validate(unsigned char *dst, unsigned char *src,
+static bool copy_mc_test_validate(unsigned char *dst, unsigned char *src,
size_t size, unsigned long rem)
{
size_t i;
@@ -3090,12 +3090,12 @@ static bool mcsafe_test_validate(unsigned char *dst, unsigned char *src,
return true;
}

-void mcsafe_test(void)
+void copy_mc_test(void)
{
char *inject_desc[] = { "none", "source", "destination" };
enum INJECT inj;

- if (IS_ENABLED(CONFIG_MCSAFE_TEST)) {
+ if (IS_ENABLED(CONFIG_COPY_MC_TEST)) {
pr_info("%s: run...\n", __func__);
} else {
pr_info("%s: disabled, skip.\n", __func__);
@@ -3113,31 +3113,31 @@ void mcsafe_test(void)

switch (inj) {
case INJECT_NONE:
- mcsafe_inject_src(NULL);
- mcsafe_inject_dst(NULL);
- dst = &mcsafe_buf[2048];
- src = &mcsafe_buf[1024 - i];
+ copy_mc_inject_src(NULL);
+ copy_mc_inject_dst(NULL);
+ dst = &copy_mc_buf[2048];
+ src = &copy_mc_buf[1024 - i];
expect = 0;
break;
case INJECT_SRC:
- mcsafe_inject_src(&mcsafe_buf[1024]);
- mcsafe_inject_dst(NULL);
- dst = &mcsafe_buf[2048];
- src = &mcsafe_buf[1024 - i];
+ copy_mc_inject_src(&copy_mc_buf[1024]);
+ copy_mc_inject_dst(NULL);
+ dst = &copy_mc_buf[2048];
+ src = &copy_mc_buf[1024 - i];
expect = 512 - i;
break;
case INJECT_DST:
- mcsafe_inject_src(NULL);
- mcsafe_inject_dst(&mcsafe_buf[2048]);
- dst = &mcsafe_buf[2048 - i];
- src = &mcsafe_buf[1024];
+ copy_mc_inject_src(NULL);
+ copy_mc_inject_dst(&copy_mc_buf[2048]);
+ dst = &copy_mc_buf[2048 - i];
+ src = &copy_mc_buf[1024];
expect = 512 - i;
break;
}

- mcsafe_test_init(dst, src, 512);
- rem = __memcpy_mcsafe(dst, src, 512);
- valid = mcsafe_test_validate(dst, src, 512, expect);
+ copy_mc_test_init(dst, src, 512);
+ rem = copy_mc_fragile(dst, src, 512);
+ valid = copy_mc_test_validate(dst, src, 512, expect);
if (rem == expect && valid)
continue;
pr_info("%s: copy(%#lx, %#lx, %d) off: %d rem: %ld %s expect: %ld\n",
@@ -3149,8 +3149,8 @@ void mcsafe_test(void)
}
}

- mcsafe_inject_src(NULL);
- mcsafe_inject_dst(NULL);
+ copy_mc_inject_src(NULL);
+ copy_mc_inject_dst(NULL);
}

static __init int nfit_test_init(void)
@@ -3161,7 +3161,7 @@ static __init int nfit_test_init(void)
libnvdimm_test();
acpi_nfit_test();
device_dax_test();
- mcsafe_test();
+ copy_mc_test();
dax_pmem_test();
dax_pmem_core_test();
#ifdef CONFIG_DEV_DAX_PMEM_COMPAT
diff --git a/tools/testing/selftests/powerpc/copyloops/.gitignore b/tools/testing/selftests/powerpc/copyloops/.gitignore
index ddaf140b8255..994b11af765c 100644
--- a/tools/testing/selftests/powerpc/copyloops/.gitignore
+++ b/tools/testing/selftests/powerpc/copyloops/.gitignore
@@ -12,4 +12,4 @@ memcpy_p7_t1
copyuser_64_exc_t0
copyuser_64_exc_t1
copyuser_64_exc_t2
-memcpy_mcsafe_64
+copy_mc_64
diff --git a/tools/testing/selftests/powerpc/copyloops/Makefile b/tools/testing/selftests/powerpc/copyloops/Makefile
index 0917983a1c78..3095b1f1c02b 100644
--- a/tools/testing/selftests/powerpc/copyloops/Makefile
+++ b/tools/testing/selftests/powerpc/copyloops/Makefile
@@ -12,7 +12,7 @@ ASFLAGS = $(CFLAGS) -Wa,-mpower4
TEST_GEN_PROGS := copyuser_64_t0 copyuser_64_t1 copyuser_64_t2 \
copyuser_p7_t0 copyuser_p7_t1 \
memcpy_64_t0 memcpy_64_t1 memcpy_64_t2 \
- memcpy_p7_t0 memcpy_p7_t1 memcpy_mcsafe_64 \
+ memcpy_p7_t0 memcpy_p7_t1 copy_mc_64 \
copyuser_64_exc_t0 copyuser_64_exc_t1 copyuser_64_exc_t2

EXTRA_SOURCES := validate.c ../harness.c stubs.S
@@ -45,9 +45,9 @@ $(OUTPUT)/memcpy_p7_t%: memcpy_power7.S $(EXTRA_SOURCES)
-D SELFTEST_CASE=$(subst memcpy_p7_t,,$(notdir $@)) \
-o $@ $^

-$(OUTPUT)/memcpy_mcsafe_64: memcpy_mcsafe_64.S $(EXTRA_SOURCES)
+$(OUTPUT)/copy_mc_64: copy_mc_64.S $(EXTRA_SOURCES)
$(CC) $(CPPFLAGS) $(CFLAGS) \
- -D COPY_LOOP=test_memcpy_mcsafe \
+ -D COPY_LOOP=test_copy_mc_generic \
-o $@ $^

$(OUTPUT)/copyuser_64_exc_t%: copyuser_64.S exc_validate.c ../harness.c \
diff --git a/tools/testing/selftests/powerpc/copyloops/copy_mc_64.S b/tools/testing/selftests/powerpc/copyloops/copy_mc_64.S
new file mode 120000
index 000000000000..dcbe06d500fb
--- /dev/null
+++ b/tools/testing/selftests/powerpc/copyloops/copy_mc_64.S
@@ -0,0 +1 @@
+../../../../../arch/powerpc/lib/copy_mc_64.S
\ No newline at end of file
diff --git a/tools/testing/selftests/powerpc/copyloops/memcpy_mcsafe_64.S b/tools/testing/selftests/powerpc/copyloops/memcpy_mcsafe_64.S
deleted file mode 120000
index f0feef3062f6..000000000000
--- a/tools/testing/selftests/powerpc/copyloops/memcpy_mcsafe_64.S
+++ /dev/null
@@ -1 +0,0 @@
-../../../../../arch/powerpc/lib/memcpy_mcsafe_64.S
\ No newline at end of file

2020-08-01 17:39:59

by Dan Williams

Subject: [PATCH v8 2/2] x86/copy_mc: Introduce copy_mc_generic()

The original copy_mc_fragile() implementation had negative performance
implications since it did not use the fast-string instruction sequence
to perform copies. For this reason copy_mc_to_kernel() fell back to
plain memcpy() to preserve performance on platforms that did not
indicate the capability to recover from machine check exceptions.
However, that capability detection was not architectural, and now that
some platforms can recover from fast-string consumption of memory
errors, the memcpy() fallback causes these more capable platforms to fail.

Introduce copy_mc_generic() as the fast default implementation of
copy_mc_to_kernel() and finalize the transition of copy_mc_fragile() to
be a platform quirk to indicate 'fragility'. With this in place
copy_mc_to_kernel() is fast and recovery-ready by default regardless of
hardware capability.
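
The feature dispatch is a boot-time patched alternative; below is a
hedged C-level sketch of what copy_mc_generic() resolves to (the
function name is hypothetical, and the exception-table fixup that makes
the 'rep movsb' recoverable is elided; see the copy_mc_64.S hunk below):

	unsigned long copy_mc_generic_sketch(void *dst, const void *src,
					     unsigned cnt)
	{
		if (!static_cpu_has(X86_FEATURE_ERMS))
			/* ALTERNATIVE leaves the 'jmp copy_mc_fragile' in place */
			return copy_mc_fragile(dst, src, cnt);
		/* ERMS: a single 'rep movsb' performs the whole copy */
		asm volatile("rep movsb"
			     : "+D" (dst), "+S" (src), "+c" (cnt) : : "memory");
		return 0;
	}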

Thanks to Vivek for identifying that copy_user_generic() is not suitable
as the copy_mc_to_user() backend since the #MC handler explicitly checks
ex_has_fault_handler().

Cc: [email protected]
Cc: <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Linus Torvalds <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reported-by: Erwin Tsaur <[email protected]>
Tested-by: Erwin Tsaur <[email protected]>
Fixes: 92b0729c34ca ("x86/mm, x86/mce: Add memcpy_mcsafe()")
Signed-off-by: Dan Williams <[email protected]>
---
arch/x86/include/asm/uaccess.h | 3 +++
arch/x86/lib/copy_mc.c | 12 +++++-------
arch/x86/lib/copy_mc_64.S | 40 ++++++++++++++++++++++++++++++++++++++++
tools/objtool/check.c | 1 +
4 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 4b2082b61e3e..b038eda58958 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -464,6 +464,9 @@ copy_mc_to_user(void *to, const void *from, unsigned len);

unsigned long __must_check
copy_mc_fragile(void *dst, const void *src, unsigned cnt);
+
+unsigned long __must_check
+copy_mc_generic(void *dst, const void *src, unsigned cnt);
#else
static inline void enable_copy_mc_fragile(void)
{
diff --git a/arch/x86/lib/copy_mc.c b/arch/x86/lib/copy_mc.c
index cdb8f5dc403d..9e6fac1ab72e 100644
--- a/arch/x86/lib/copy_mc.c
+++ b/arch/x86/lib/copy_mc.c
@@ -23,7 +23,7 @@ void enable_copy_mc_fragile(void)
*
* Call into the 'fragile' version on systems that have trouble
* actually do machine check recovery. Everyone else can just
- * use memcpy().
+ * use copy_mc_generic().
*
* Return 0 for success, or number of bytes not copied if there was an
* exception.
@@ -33,8 +33,7 @@ copy_mc_to_kernel(void *dst, const void *src, unsigned cnt)
{
if (static_branch_unlikely(&copy_mc_fragile_key))
return copy_mc_fragile(dst, src, cnt);
- memcpy(dst, src, cnt);
- return 0;
+ return copy_mc_generic(dst, src, cnt);
}
EXPORT_SYMBOL_GPL(copy_mc_to_kernel);

@@ -56,11 +55,10 @@ copy_mc_to_user(void *to, const void *from, unsigned len)
{
unsigned long ret;

- if (!static_branch_unlikely(&copy_mc_fragile_key))
- return copy_user_generic(to, from, len);
-
__uaccess_begin();
- ret = copy_mc_fragile(to, from, len);
+ if (static_branch_unlikely(&copy_mc_fragile_key))
+ ret = copy_mc_fragile(to, from, len);
+ ret = copy_mc_generic(to, from, len);
__uaccess_end();
return ret;
}
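
A note on the mechanism: copy_mc_fragile_key is a jump-label static
key, so the branch above compiles down to a patched no-op on the common
(non-fragile) path. A minimal sketch of the opt-in side, assuming the
definitions that live elsewhere in arch/x86/lib/copy_mc.c:

	#include <linux/jump_label.h>

	static DEFINE_STATIC_KEY_FALSE(copy_mc_fragile_key);

	void enable_copy_mc_fragile(void)
	{
		/* quirk code calls this on CPUs that need careful copies */
		static_branch_enable(&copy_mc_fragile_key);
	}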
diff --git a/arch/x86/lib/copy_mc_64.S b/arch/x86/lib/copy_mc_64.S
index 35a67c50890b..a08e7a4d9e28 100644
--- a/arch/x86/lib/copy_mc_64.S
+++ b/arch/x86/lib/copy_mc_64.S
@@ -2,7 +2,9 @@
/* Copyright(c) 2016-2020 Intel Corporation. All rights reserved. */

#include <linux/linkage.h>
+#include <asm/alternative-asm.h>
#include <asm/copy_mc_test.h>
+#include <asm/cpufeatures.h>
#include <asm/export.h>
#include <asm/asm.h>

@@ -122,4 +124,42 @@ EXPORT_SYMBOL_GPL(copy_mc_fragile)
_ASM_EXTABLE(.L_write_leading_bytes, .E_leading_bytes)
_ASM_EXTABLE(.L_write_words, .E_write_words)
_ASM_EXTABLE(.L_write_trailing_bytes, .E_trailing_bytes)
+
+/*
+ * copy_mc_generic - memory copy with exception handling
+ *
+ * Fast string copy + fault / exception handling. If the CPU does
+ * support machine check exception recovery, but does not support
+ * recovering from fast-string exceptions, then this CPU needs to be
+ * added to the copy_mc_fragile_key set of quirks. Otherwise, absent any
+ * machine check recovery support, this version should be no slower than
+ * standard memcpy.
+ */
+SYM_FUNC_START(copy_mc_generic)
+ ALTERNATIVE "jmp copy_mc_fragile", "", X86_FEATURE_ERMS
+ movq %rdi, %rax
+ movq %rdx, %rcx
+.L_copy:
+ rep movsb
+ /* Copy successful. Return zero */
+ xorl %eax, %eax
+ ret
+SYM_FUNC_END(copy_mc_generic)
+EXPORT_SYMBOL_GPL(copy_mc_generic)
+
+ .section .fixup, "ax"
+.E_copy:
+ /*
+ * On fault %rcx is updated such that the copy instruction could
+ * optionally be restarted at the fault position, i.e. it
+ * contains 'bytes remaining'. A non-zero return indicates error
+ * to copy_mc_generic() users, or indicates short transfers to
+ * user-copy routines.
+ */
+ movq %rcx, %rax
+ ret
+
+ .previous
+
+ _ASM_EXTABLE_FAULT(.L_copy, .E_copy)
#endif
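
A note on the ALTERNATIVE line in copy_mc_generic(): when the CPU lacks
X86_FEATURE_ERMS the "jmp copy_mc_fragile" stays in place, and when
ERMS is present the jump is patched out at boot so execution falls
through to the rep movsb fast path. In C-like pseudocode (a sketch;
rep_movsb_with_fixup() is a hypothetical stand-in for the rep movsb
plus extable fixup above):

	unsigned long copy_mc_generic(void *dst, const void *src, unsigned cnt)
	{
		if (!boot_cpu_has(X86_FEATURE_ERMS))	/* alternative unpatched */
			return copy_mc_fragile(dst, src, cnt);
		/* returns 0 on success, or 'bytes remaining' from the fixup */
		return rep_movsb_with_fixup(dst, src, cnt);
	}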
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 17cb0933bf42..d2e1f01df10b 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -548,6 +548,7 @@ static const char *uaccess_safe_builtin[] = {
"__ubsan_handle_shift_out_of_bounds",
/* misc */
"csum_partial_copy_generic",
+ "copy_mc_generic",
"copy_mc_fragile",
"copy_mc_fragile_handle_tail",
"ftrace_likely_update", /* CONFIG_TRACE_BRANCH_PROFILING */

2020-08-03 09:49:59

by Chen, Rong A

[permalink] [raw]
Subject: [x86/copy_mc] a0ac629ebe: fio.read_iops -43.3% regression

Greeting,

FYI, we noticed a -43.3% regression of fio.read_iops due to commit:


commit: a0ac629ebe7b3d248cb93807782a00d9142fdb98 ("x86/copy_mc: Introduce copy_mc_generic()")
url: https://github.com/0day-ci/linux/commits/Dan-Williams/Renovate-memcpy_mcsafe-with-copy_mc_to_-user-kernel/20200802-014046


in testcase: fio-basic
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory
with following parameters:

disk: 2pmem
fs: xfs
mount_option: dax
runtime: 200s
nr_task: 50%
time_based: tb
rw: read
bs: 2M
ioengine: libaio
test_size: 200G
cpufreq_governor: performance
ucode: 0x5002f01

test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio

In addition to that, the commit also has significant impact on the following tests:

+------------------+----------------------------------------------------------------------+
| testcase: change | fio-basic: fio.read_iops -55.6% regression |
| test machine | 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory |
| test parameters | bs=2M |
| | cpufreq_governor=performance |
| | disk=2pmem |
| | fs=xfs |
| | ioengine=sync |
| | mount_option=dax |
| | nr_task=50% |
| | runtime=200s |
| | rw=read |
| | test_size=200G |
| | time_based=tb |
| | ucode=0x5002f01 |
+------------------+----------------------------------------------------------------------+


If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based/ucode:
2M/gcc-9/performance/2pmem/xfs/libaio/x86_64-rhel-8.3/dax/50%/debian-10.4-x86_64-20200603.cgz/200s/read/lkp-csl-2sp6/200G/fio-basic/tb/0x5002f01

commit:
7476b91d4d ("x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()")
a0ac629ebe ("x86/copy_mc: Introduce copy_mc_generic()")

7476b91d4db369d8 a0ac629ebe7b3d248cb93807782
---------------- ---------------------------
%stddev %change %stddev
\ | \
97.22 -96.0 1.19 ± 21% fio.latency_100ms%
0.14 -0.1 0.05 fio.latency_10ms%
0.27 ± 13% -0.1 0.14 fio.latency_20ms%
0.04 ± 6% -0.0 0.03 ± 12% fio.latency_20us%
1.00 ± 28% +96.6 97.57 fio.latency_250ms%
0.05 -0.0 0.05 fio.latency_4ms%
0.02 ± 48% +0.3 0.31 ± 15% fio.latency_500ms%
1.25 ± 47% -0.6 0.63 ± 11% fio.latency_50ms%
0.01 ± 9% +0.0 0.02 ± 24% fio.latency_50us%
44292 -43.3% 25124 fio.read_bw_MBps
67895296 +76.8% 1.201e+08 fio.read_clat_90%_us
68681728 +76.7% 1.214e+08 fio.read_clat_95%_us
98304000 ± 19% +80.3% 1.772e+08 ± 4% fio.read_clat_99%_us
66674508 +76.2% 1.175e+08 fio.read_clat_mean_us
9950116 ± 12% +80.3% 17935634 fio.read_clat_stddev
22146 -43.3% 12562 fio.read_iops
2152824 +76.8% 3805428 fio.read_slat_mean_us
291719 ± 14% +86.6% 544324 fio.read_slat_stddev
12923 -2.5% 12594 fio.time.involuntary_context_switches
77.65 ± 3% -39.1% 47.29 fio.time.user_time
4429275 -43.3% 2512537 fio.workload
0.14 ± 3% +0.0 0.16 ± 4% mpstat.cpu.all.soft%
0.47 ± 3% -0.2 0.31 mpstat.cpu.all.usr%
53185 ± 91% +121.2% 117642 ± 40% numa-vmstat.node0.numa_other
122640 ± 39% -52.6% 58092 ± 81% numa-vmstat.node1.numa_other
60096 +1.5% 61021 proc-vmstat.nr_slab_unreclaimable
20103 ± 5% -17.9% 16495 ± 12% proc-vmstat.pgactivate
49.00 -2.0% 48.00 vmstat.cpu.id
1612 -1.6% 1587 vmstat.system.cs
2713 ± 4% +8.0% 2931 ± 4% slabinfo.PING.active_objs
2713 ± 4% +8.0% 2931 ± 4% slabinfo.PING.num_objs
1164 ± 9% +16.8% 1360 ± 6% slabinfo.task_group.active_objs
1164 ± 9% +16.8% 1360 ± 6% slabinfo.task_group.num_objs
379.25 ± 85% +279.7% 1439 ± 75% sched_debug.cfs_rq:/.exec_clock.min
29948 ± 5% -15.5% 25309 ± 5% sched_debug.cfs_rq:/.exec_clock.stddev
21606 ± 7% +25.1% 27034 ± 7% sched_debug.cfs_rq:/.min_vruntime.min
33321 ± 6% -16.5% 27820 ± 6% sched_debug.cfs_rq:/.min_vruntime.stddev
13783 ±109% +184.1% 39158 ± 20% sched_debug.cfs_rq:/.spread0.avg
-38497 -76.6% -9012 sched_debug.cfs_rq:/.spread0.min
33321 ± 6% -16.5% 27820 ± 6% sched_debug.cfs_rq:/.spread0.stddev
12.22 ± 10% +27.9% 15.62 ± 3% sched_debug.cpu.clock.stddev
3716 ±173% -100.0% 1.50 ± 57% softirqs.CPU10.NET_RX
17411 ± 36% -41.8% 10126 ± 19% softirqs.CPU24.SCHED
9179 ± 67% +87.1% 17173 ± 23% softirqs.CPU35.SCHED
9611 ± 34% -58.9% 3951 ± 10% softirqs.CPU48.SCHED
17177 ± 30% -42.6% 9864 ± 37% softirqs.CPU69.SCHED
86644 ± 29% -22.3% 67339 ± 5% softirqs.CPU76.TIMER
6339 ± 66% +115.9% 13686 ± 31% softirqs.CPU78.SCHED
10156 ± 64% +91.8% 19477 ± 25% softirqs.CPU81.SCHED
1239 ±172% -100.0% 0.00 interrupts.62:PCI-MSI.31981595-edge.i40e-eth0-TxRx-26
47482 +5.4% 50055 ± 4% interrupts.CAL:Function_call_interrupts
209.00 ± 23% -50.4% 103.75 ± 8% interrupts.CPU0.RES:Rescheduling_interrupts
146.25 ± 16% -27.4% 106.25 ± 16% interrupts.CPU15.RES:Rescheduling_interrupts
168.75 ± 81% -64.6% 59.75 ± 33% interrupts.CPU15.TLB:TLB_shootdowns
7321 ± 5% -52.7% 3461 ± 39% interrupts.CPU20.NMI:Non-maskable_interrupts
7321 ± 5% -52.7% 3461 ± 39% interrupts.CPU20.PMI:Performance_monitoring_interrupts
6665 ± 14% -61.2% 2586 ± 26% interrupts.CPU21.NMI:Non-maskable_interrupts
6665 ± 14% -61.2% 2586 ± 26% interrupts.CPU21.PMI:Performance_monitoring_interrupts
64.50 ± 23% +41.9% 91.50 ± 22% interrupts.CPU21.TLB:TLB_shootdowns
100.00 ± 41% +66.0% 166.00 ± 9% interrupts.CPU24.RES:Rescheduling_interrupts
1238 ±173% -100.0% 0.00 interrupts.CPU26.62:PCI-MSI.31981595-edge.i40e-eth0-TxRx-26
438.25 ± 4% +16.1% 509.00 ± 18% interrupts.CPU28.CAL:Function_call_interrupts
145.50 ± 20% -34.4% 95.50 ± 25% interrupts.CPU35.RES:Rescheduling_interrupts
7134 ± 11% -28.3% 5118 ± 19% interrupts.CPU41.NMI:Non-maskable_interrupts
7134 ± 11% -28.3% 5118 ± 19% interrupts.CPU41.PMI:Performance_monitoring_interrupts
107.75 ± 34% -47.3% 56.75 ± 40% interrupts.CPU93.RES:Rescheduling_interrupts
63.18 ± 12% -26.1 37.12 ± 15% perf-profile.calltrace.cycles-pp.copy_mc_fragile.copy_mc_to_user.copyout_mc._copy_mc_to_iter.dax_iomap_actor
0.00 +3.7 3.72 ± 52% perf-profile.calltrace.cycles-pp.copy_mc_generic.copy_mc_to_user.copyout_mc._copy_mc_to_iter.dax_iomap_actor
0.00 +37.8 37.83 ± 12% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.copy_mc_generic.copy_mc_to_user.copyout_mc._copy_mc_to_iter
63.34 ± 12% -26.2 37.14 ± 15% perf-profile.children.cycles-pp.copy_mc_fragile
2.41 ±112% -2.2 0.25 ±108% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
2.26 ±109% -2.0 0.29 ± 89% perf-profile.children.cycles-pp.asm_call_on_stack
2.15 ±112% -1.9 0.23 ±110% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
2.12 ±113% -1.9 0.23 ±110% perf-profile.children.cycles-pp.hrtimer_interrupt
1.68 ±114% -1.5 0.17 ±119% perf-profile.children.cycles-pp.__hrtimer_run_queues
1.48 ±123% -1.3 0.15 ±121% perf-profile.children.cycles-pp.tick_sched_timer
1.34 ±120% -1.2 0.14 ±122% perf-profile.children.cycles-pp.tick_sched_handle
1.28 ±119% -1.1 0.14 ±122% perf-profile.children.cycles-pp.update_process_times
0.70 ±107% -0.6 0.10 ±120% perf-profile.children.cycles-pp.scheduler_tick
2.65 ±106% +16.5 19.13 ± 12% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.00 +22.6 22.58 ± 7% perf-profile.children.cycles-pp.copy_mc_generic
62.52 ± 12% -25.5 37.00 ± 15% perf-profile.self.cycles-pp.copy_mc_fragile
0.00 +22.4 22.41 ± 6% perf-profile.self.cycles-pp.copy_mc_generic
42.43 +68.7% 71.58 perf-stat.i.MPKI
5.949e+09 -42.2% 3.44e+09 perf-stat.i.branch-instructions
0.07 +0.0 0.10 ± 5% perf-stat.i.branch-miss-rate%
3554006 ± 2% -7.5% 3286479 ± 3% perf-stat.i.branch-misses
95.02 -2.4 92.63 perf-stat.i.cache-miss-rate%
1.444e+09 -5.2% 1.369e+09 perf-stat.i.cache-misses
1.513e+09 -2.8% 1.471e+09 perf-stat.i.cache-references
3.81 +72.5% 6.58 perf-stat.i.cpi
102.49 +4.5% 107.13 perf-stat.i.cycles-between-cache-misses
0.00 ± 4% +0.0 0.00 ± 41% perf-stat.i.dTLB-load-miss-rate%
6.03e+09 -42.0% 3.495e+09 perf-stat.i.dTLB-loads
0.00 ± 5% +0.0 0.00 ± 7% perf-stat.i.dTLB-store-miss-rate%
5.909e+09 -42.5% 3.4e+09 perf-stat.i.dTLB-stores
47.00 +1.4 48.45 perf-stat.i.iTLB-load-miss-rate%
2270674 -11.0% 2021114 perf-stat.i.iTLB-load-misses
2563127 -16.0% 2151931 perf-stat.i.iTLB-loads
3.548e+10 -42.4% 2.044e+10 perf-stat.i.instructions
15634 -35.2% 10127 perf-stat.i.instructions-per-iTLB-miss
0.26 -41.6% 0.15 perf-stat.i.ipc
207.77 -37.5% 129.85 perf-stat.i.metric.M/sec
78061415 ± 13% +98.0% 1.546e+08 ± 20% perf-stat.i.node-load-misses
85582855 ± 11% +58.1% 1.353e+08 ± 20% perf-stat.i.node-loads
3.817e+08 -2.8% 3.709e+08 perf-stat.i.node-stores
42.66 +68.7% 71.96 perf-stat.overall.MPKI
0.06 +0.0 0.09 ± 3% perf-stat.overall.branch-miss-rate%
95.45 -2.4 93.07 perf-stat.overall.cache-miss-rate%
3.81 +73.0% 6.59 perf-stat.overall.cpi
93.55 +5.2% 98.41 perf-stat.overall.cycles-between-cache-misses
0.00 ± 5% +0.0 0.00 ± 13% perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 5% +0.0 0.00 ± 2% perf-stat.overall.dTLB-store-miss-rate%
46.98 +1.5 48.43 perf-stat.overall.iTLB-load-miss-rate%
15639 -35.2% 10127 perf-stat.overall.instructions-per-iTLB-miss
0.26 -42.2% 0.15 perf-stat.overall.ipc
1605743 +1.5% 1630326 perf-stat.overall.path-length
5.919e+09 -42.2% 3.422e+09 perf-stat.ps.branch-instructions
3519866 ± 2% -7.8% 3245208 ± 3% perf-stat.ps.branch-misses
1.437e+09 -5.2% 1.362e+09 perf-stat.ps.cache-misses
1.506e+09 -2.8% 1.463e+09 perf-stat.ps.cache-references
1552 -1.4% 1530 perf-stat.ps.context-switches
6e+09 -42.1% 3.477e+09 perf-stat.ps.dTLB-loads
5.88e+09 -42.5% 3.382e+09 perf-stat.ps.dTLB-stores
2257568 -11.0% 2008542 perf-stat.ps.iTLB-load-misses
2547705 -16.1% 2138603 perf-stat.ps.iTLB-loads
3.53e+10 -42.4% 2.034e+10 perf-stat.ps.instructions
77685715 ± 13% +97.9% 1.538e+08 ± 20% perf-stat.ps.node-load-misses
85143339 ± 11% +58.1% 1.346e+08 ± 20% perf-stat.ps.node-loads
3.797e+08 -2.8% 3.69e+08 perf-stat.ps.node-stores
7.112e+12 -42.4% 4.096e+12 perf-stat.total.instructions



fio.read_bw_MBps

46000 +-------------------------------------------------------------------+
44000 |..+.+..+.+..+..+.+..+..+.+.. .+.. .+.+..+.+..+ |
| + +. |
42000 |-+ |
40000 |-+ |
38000 |-+ |
36000 |-+ |
| |
34000 |-+ |
32000 |-+ |
30000 |-+ |
28000 |-+ |
| |
26000 |-+O O O O O O O O O O O O O O O O O O O O O O O O O |
24000 +-------------------------------------------------------------------+


fio.read_iops

23000 +-------------------------------------------------------------------+
22000 |..+.+..+.+..+..+.+..+..+.+.. .+.. .+.+..+.+..+ |
| + +. |
21000 |-+ |
20000 |-+ |
19000 |-+ |
18000 |-+ |
| |
17000 |-+ |
16000 |-+ |
15000 |-+ |
14000 |-+ |
| |
13000 |-+O O O O O O O O O O O O O O O O O O O O O O O O O |
12000 +-------------------------------------------------------------------+


fio.read_clat_mean_us

1.2e+08 +-----------------------------------------------------------------+
| O O O O O O O O O O O O O O O O O O O O O O O O |
1.1e+08 |-+ |
| |
| |
1e+08 |-+ |
| |
9e+07 |-+ |
| |
8e+07 |-+ |
| |
| |
7e+07 |..+.+.. .+..+. .+..+.+..+..+.+..+.+.. |
| +.+..+ +..+ + |
6e+07 +-----------------------------------------------------------------+


fio.read_clat_90__us

1.3e+08 +-----------------------------------------------------------------+
| |
1.2e+08 |-+O O O O O O O O O O O O O O O O O O O O O O O O O |
| |
1.1e+08 |-+ |
| |
1e+08 |-+ |
| |
9e+07 |-+ |
| |
8e+07 |-+ |
| |
7e+07 |..+.+..+.+..+.+..+.+..+.+..+.+..+..+.+..+.+..+ |
| |
6e+07 +-----------------------------------------------------------------+


fio.read_clat_95__us

1.3e+08 +-----------------------------------------------------------------+
| O O O O O |
1.2e+08 |-+O O O O O O O O O O O O O O O O O O O O |
| |
1.1e+08 |-+ |
| |
1e+08 |-+ |
| |
9e+07 |-+ |
| |
8e+07 |-+ |
| .+. .+.. |
7e+07 |..+.+..+.+..+.+..+.+..+.+. +. +.+..+.+..+ |
| |
6e+07 +-----------------------------------------------------------------+


fio.read_slat_mean_us

4e+06 +-----------------------------------------------------------------+
3.8e+06 |-+O O O O O O O O O O O O O O O O O O O O O O O O O |
| |
3.6e+06 |-+ |
3.4e+06 |-+ |
| |
3.2e+06 |-+ |
3e+06 |-+ |
2.8e+06 |-+ |
| |
2.6e+06 |-+ |
2.4e+06 |-+ |
| |
2.2e+06 |..+.+..+.+..+.+..+.+..+.+..+.+..+..+.+..+.+..+ |
2e+06 +-----------------------------------------------------------------+


fio.latency_10ms_

0.15 +--------------------------------------------------------------------+
0.14 |..+.+..+..+.+..+..+.+..+..+.+..+..+.+..+.+..+..+ |
| |
0.13 |-+ |
0.12 |-+ |
0.11 |-+ |
0.1 |-+ |
| |
0.09 |-+ |
0.08 |-+ |
0.07 |-+ |
0.06 |-+ |
| |
0.05 |-+O O O O O O O O O O O O O O O O O O O O O O O O O |
0.04 +--------------------------------------------------------------------+


fio.latency_20ms_

0.55 +--------------------------------------------------------------------+
| + + + |
0.5 |-+ : + + :: |
0.45 |++ : + + + : : |
| + : + + : |
0.4 |-+ : : : : : |
0.35 |-+ : : : : : |
| : : : : : + |
0.3 |-+ : : : +. : + + |
0.25 |-+ : : .. +.. : + + |
| + +..+ +..+ +..+ |
0.2 |-+ |
0.15 |-+ |
| O O O O O O O O O O O O O O O O O O O O O O O O O |
0.1 +--------------------------------------------------------------------+


fio.latency_100ms_

100 +---------------------------------------------------------------------+
90 |-+ + +.+..+..+ +. |
| |
80 |-+ |
70 |-+ |
| |
60 |-+ |
50 |-+ |
40 |-+ |
| |
30 |-+ |
20 |-+ |
| |
10 |-+ |
0 +---------------------------------------------------------------------+


fio.latency_250ms_

100 +---------------------------------------------------------------------+
90 |-+ O |
| |
80 |-+ |
70 |-+ |
| |
60 |-+ |
50 |-+ |
40 |-+ |
| |
30 |-+ |
20 |-+ |
| |
10 |-+ |
0 +---------------------------------------------------------------------+


fio.workload

4.6e+06 +-----------------------------------------------------------------+
4.4e+06 |..+.+..+.+..+.+..+.+..+.+.. .+.. .+.+..+.+..+ |
| + +. |
4.2e+06 |-+ |
4e+06 |-+ |
3.8e+06 |-+ |
3.6e+06 |-+ |
| |
3.4e+06 |-+ |
3.2e+06 |-+ |
3e+06 |-+ |
2.8e+06 |-+ |
| |
2.6e+06 |-+O O O O O O O O O O O O O O O O O O O O O O O O O |
2.4e+06 +-----------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample

***************************************************************************************************
lkp-csl-2sp6: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory
=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based/ucode:
2M/gcc-9/performance/2pmem/xfs/sync/x86_64-rhel-8.3/dax/50%/debian-10.4-x86_64-20200603.cgz/200s/read/lkp-csl-2sp6/200G/fio-basic/tb/0x5002f01

commit:
7476b91d4d ("x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()")
a0ac629ebe ("x86/copy_mc: Introduce copy_mc_generic()")

7476b91d4db369d8 a0ac629ebe7b3d248cb93807782
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.61 ± 15% -0.4 0.22 ± 94% fio.latency_1000us%
0.01 ± 11% +1.3 1.27 ± 25% fio.latency_10ms%
96.06 -95.5 0.60 ± 80% fio.latency_2ms%
1.27 ± 33% +96.2 97.48 fio.latency_4ms%
1.29 ± 55% -1.2 0.05 ± 54% fio.latency_500us%
75143 -55.6% 33381 fio.read_bw_MBps
1372160 +118.5% 2998272 fio.read_clat_90%_us
1409024 +116.9% 3055616 fio.read_clat_95%_us
2142208 ± 19% +120.3% 4718592 ± 17% fio.read_clat_99%_us
1272849 +125.4% 2869293 fio.read_clat_mean_us
228201 ± 15% +103.6% 464620 ± 14% fio.read_clat_stddev
37571 -55.6% 16690 fio.read_iops
69.28 ± 2% -40.3% 41.38 ± 3% fio.time.user_time
7514438 -55.6% 3338252 fio.workload
0.11 ± 3% +0.0 0.14 ± 5% mpstat.cpu.all.soft%
0.43 ± 3% -0.1 0.28 ± 2% mpstat.cpu.all.usr%
115069 -2.3% 112454 proc-vmstat.nr_shmem
20846 ± 6% -27.8% 15052 ± 3% proc-vmstat.pgactivate
967.50 ± 27% -50.0% 483.75 ± 78% slabinfo.xfs_buf_item.active_objs
967.50 ± 27% -50.0% 483.75 ± 78% slabinfo.xfs_buf_item.num_objs
100.00 -2.0% 98.00 vmstat.io.bo
1672 -3.3% 1616 vmstat.system.cs
9.059e+09 ± 6% -32.3% 6.131e+09 ± 54% cpuidle.C1E.time
19004364 ± 3% -22.4% 14741281 ± 34% cpuidle.C1E.usage
4.034e+08 ±133% +713.0% 3.28e+09 ±100% cpuidle.C6.time
570211 ±122% +571.6% 3829822 ± 86% cpuidle.C6.usage
61.80 ± 9% -17.6 44.19 perf-profile.calltrace.cycles-pp.copy_mc_fragile.copy_mc_to_user.copyout_mc._copy_mc_to_iter.dax_iomap_actor
0.00 +7.8 7.81 ± 6% perf-profile.calltrace.cycles-pp.copy_mc_generic.copy_mc_to_user.copyout_mc._copy_mc_to_iter.dax_iomap_actor
0.00 +29.2 29.21 ± 5% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.copy_mc_generic.copy_mc_to_user.copyout_mc._copy_mc_to_iter
61.92 ± 9% -17.7 44.25 perf-profile.children.cycles-pp.copy_mc_fragile
3.47 ±132% +11.7 15.21 ± 5% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.00 +22.3 22.32 perf-profile.children.cycles-pp.copy_mc_generic
61.16 ± 9% -17.4 43.78 perf-profile.self.cycles-pp.copy_mc_fragile
0.00 +22.1 22.09 perf-profile.self.cycles-pp.copy_mc_generic
212.00 ± 38% +288.6% 823.90 ± 67% sched_debug.cfs_rq:/.exec_clock.min
34013 ± 3% -17.1% 28181 ± 2% sched_debug.cfs_rq:/.exec_clock.stddev
36118 ± 5% -15.0% 30710 ± 2% sched_debug.cfs_rq:/.min_vruntime.stddev
36118 ± 5% -15.0% 30707 ± 2% sched_debug.cfs_rq:/.spread0.stddev
9.52 ± 11% +33.8% 12.73 ± 9% sched_debug.cpu.clock.stddev
17832 ± 13% -47.5% 9368 ± 17% sched_debug.cpu.sched_count.max
2475 ± 9% -34.4% 1624 ± 8% sched_debug.cpu.sched_count.stddev
8858 ± 13% -48.3% 4577 ± 18% sched_debug.cpu.sched_goidle.max
1260 ± 9% -33.2% 841.68 ± 8% sched_debug.cpu.sched_goidle.stddev
8285 ± 16% -32.1% 5622 ± 7% sched_debug.cpu.ttwu_count.max
1169 ± 9% -24.9% 878.40 ± 4% sched_debug.cpu.ttwu_count.stddev
26587 ± 8% -21.9% 20773 ± 22% softirqs.CPU1.SCHED
19906 ± 37% -55.7% 8824 ± 96% softirqs.CPU10.SCHED
21997 ± 34% -82.2% 3910 ± 55% softirqs.CPU20.SCHED
5126 ± 70% +166.6% 13666 ± 15% softirqs.CPU30.SCHED
5567 ± 56% +165.3% 14772 ± 29% softirqs.CPU31.SCHED
10027 ± 35% +101.3% 20182 ± 18% softirqs.CPU33.SCHED
4868 ± 50% +112.6% 10349 ± 14% softirqs.CPU44.SCHED
6304 ± 60% +154.5% 16043 ± 22% softirqs.CPU46.SCHED
4127 ± 76% +198.6% 12326 ± 32% softirqs.CPU49.SCHED
6313 ± 62% +98.5% 12530 ± 19% softirqs.CPU51.SCHED
8249 ± 58% +148.7% 20515 ± 31% softirqs.CPU57.SCHED
6971 ±109% +268.6% 25698 ± 8% softirqs.CPU68.SCHED
25116 ± 15% -32.4% 16974 ± 12% softirqs.CPU78.SCHED
24757 ± 12% -36.8% 15657 ± 27% softirqs.CPU79.SCHED
20231 ± 14% -45.5% 11024 ± 24% softirqs.CPU81.SCHED
21830 ± 23% -55.4% 9733 ± 67% softirqs.CPU9.SCHED
24043 ± 16% -39.9% 14449 ± 23% softirqs.CPU94.SCHED
42.31 +68.3% 71.22 perf-stat.i.MPKI
9.958e+09 -54.7% 4.511e+09 perf-stat.i.branch-instructions
0.05 ± 2% +0.0 0.08 ± 4% perf-stat.i.branch-miss-rate%
3682118 ± 2% -8.2% 3381534 perf-stat.i.branch-misses
67.34 +10.4 77.74 perf-stat.i.cache-miss-rate%
1.709e+09 -12.2% 1.501e+09 perf-stat.i.cache-misses
2.531e+09 -24.0% 1.923e+09 perf-stat.i.cache-references
1639 -4.1% 1571 perf-stat.i.context-switches
2.25 +121.4% 4.98 perf-stat.i.cpi
99.03 -1.8% 97.24 perf-stat.i.cpu-migrations
85.60 +14.2% 97.78 perf-stat.i.cycles-between-cache-misses
0.00 ± 18% +0.0 0.00 ± 44% perf-stat.i.dTLB-load-miss-rate%
9.996e+09 -54.5% 4.549e+09 perf-stat.i.dTLB-loads
0.00 ± 7% +0.0 0.00 ± 6% perf-stat.i.dTLB-store-miss-rate%
9.904e+09 -54.9% 4.466e+09 perf-stat.i.dTLB-stores
44.79 +4.2 48.99 perf-stat.i.iTLB-load-miss-rate%
2535885 -13.8% 2185118 perf-stat.i.iTLB-load-misses
3134177 -27.3% 2278467 perf-stat.i.iTLB-loads
5.952e+10 -54.9% 2.687e+10 perf-stat.i.instructions
23480 -47.6% 12304 perf-stat.i.instructions-per-iTLB-miss
0.45 -54.6% 0.20 perf-stat.i.ipc
342.39 -51.0% 167.90 perf-stat.i.metric.M/sec
1.165e+08 ± 30% +72.8% 2.013e+08 ± 9% perf-stat.i.node-load-misses
1.257e+08 ± 26% +41.1% 1.773e+08 ± 9% perf-stat.i.node-loads
2.42e+08 +19.6% 2.895e+08 perf-stat.i.node-stores
42.53 +68.3% 71.58 perf-stat.overall.MPKI
0.04 ± 2% +0.0 0.07 perf-stat.overall.branch-miss-rate%
67.52 +10.5 78.07 perf-stat.overall.cache-miss-rate%
2.24 +122.4% 4.99 perf-stat.overall.cpi
78.17 +14.3% 89.34 perf-stat.overall.cycles-between-cache-misses
0.00 ± 25% +0.0 0.00 ± 12% perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 13% +0.0 0.00 ± 10% perf-stat.overall.dTLB-store-miss-rate%
44.72 +4.2 48.96 perf-stat.overall.iTLB-load-miss-rate%
23499 -47.6% 12306 perf-stat.overall.instructions-per-iTLB-miss
0.45 -55.0% 0.20 perf-stat.overall.ipc
1587395 +1.5% 1611895 perf-stat.overall.path-length
9.912e+09 -54.7% 4.489e+09 perf-stat.ps.branch-instructions
3650903 ± 2% -8.4% 3345674 perf-stat.ps.branch-misses
1.701e+09 -12.2% 1.494e+09 perf-stat.ps.cache-misses
2.52e+09 -24.1% 1.914e+09 perf-stat.ps.cache-references
1616 -3.7% 1556 perf-stat.ps.context-switches
9.95e+09 -54.5% 4.526e+09 perf-stat.ps.dTLB-loads
9.859e+09 -54.9% 4.445e+09 perf-stat.ps.dTLB-stores
2521574 -13.8% 2172342 perf-stat.ps.iTLB-load-misses
3116655 -27.3% 2264894 perf-stat.ps.iTLB-loads
5.925e+10 -54.9% 2.673e+10 perf-stat.ps.instructions
1.159e+08 ± 30% +72.8% 2.003e+08 ± 9% perf-stat.ps.node-load-misses
1.25e+08 ± 26% +41.1% 1.764e+08 ± 9% perf-stat.ps.node-loads
2.407e+08 +19.6% 2.878e+08 perf-stat.ps.node-stores
1.193e+13 -54.9% 5.381e+12 perf-stat.total.instructions
0.00 +2.7e+105% 2689 ±171% interrupts.115:PCI-MSI.31981648-edge.i40e-eth0-TxRx-79
62.75 ± 27% +51.8% 95.25 ± 21% interrupts.CPU1.RES:Rescheduling_interrupts
6530 ± 17% -44.1% 3647 ± 35% interrupts.CPU17.NMI:Non-maskable_interrupts
6530 ± 17% -44.1% 3647 ± 35% interrupts.CPU17.PMI:Performance_monitoring_interrupts
62.00 ± 74% +187.9% 178.50 ± 5% interrupts.CPU20.RES:Rescheduling_interrupts
365.00 ± 78% -76.0% 87.50 ± 53% interrupts.CPU25.TLB:TLB_shootdowns
170.50 ± 15% -26.8% 124.75 ± 10% interrupts.CPU30.RES:Rescheduling_interrupts
7605 -43.3% 4316 ± 32% interrupts.CPU31.NMI:Non-maskable_interrupts
7605 -43.3% 4316 ± 32% interrupts.CPU31.PMI:Performance_monitoring_interrupts
169.00 ± 12% -37.1% 106.25 ± 23% interrupts.CPU31.RES:Rescheduling_interrupts
7145 ± 11% -33.0% 4786 ± 18% interrupts.CPU36.NMI:Non-maskable_interrupts
7145 ± 11% -33.0% 4786 ± 18% interrupts.CPU36.PMI:Performance_monitoring_interrupts
136.50 ± 27% -44.7% 75.50 ± 60% interrupts.CPU39.TLB:TLB_shootdowns
149.25 ± 24% -24.6% 112.50 ± 30% interrupts.CPU4.RES:Rescheduling_interrupts
7599 -46.6% 4061 ± 35% interrupts.CPU41.NMI:Non-maskable_interrupts
7599 -46.6% 4061 ± 35% interrupts.CPU41.PMI:Performance_monitoring_interrupts
6661 ± 24% -52.1% 3191 ± 51% interrupts.CPU44.NMI:Non-maskable_interrupts
6661 ± 24% -52.1% 3191 ± 51% interrupts.CPU44.PMI:Performance_monitoring_interrupts
7622 -43.5% 4307 ± 33% interrupts.CPU46.NMI:Non-maskable_interrupts
7622 -43.5% 4307 ± 33% interrupts.CPU46.PMI:Performance_monitoring_interrupts
7613 -43.1% 4331 ± 31% interrupts.CPU47.NMI:Non-maskable_interrupts
7613 -43.1% 4331 ± 31% interrupts.CPU47.PMI:Performance_monitoring_interrupts
5823 ± 32% -36.4% 3703 ± 34% interrupts.CPU5.NMI:Non-maskable_interrupts
5823 ± 32% -36.4% 3703 ± 34% interrupts.CPU5.PMI:Performance_monitoring_interrupts
89.25 ± 48% -61.1% 34.75 ± 31% interrupts.CPU53.TLB:TLB_shootdowns
5698 ± 33% -42.5% 3277 ± 49% interrupts.CPU55.NMI:Non-maskable_interrupts
5698 ± 33% -42.5% 3277 ± 49% interrupts.CPU55.PMI:Performance_monitoring_interrupts
172.00 ± 14% -35.2% 111.50 ± 41% interrupts.CPU56.RES:Rescheduling_interrupts
64.00 ± 42% -39.5% 38.75 ± 29% interrupts.CPU56.TLB:TLB_shootdowns
156.00 ± 17% -36.2% 99.50 ± 21% interrupts.CPU57.RES:Rescheduling_interrupts
146.25 ± 28% -48.9% 74.75 ± 67% interrupts.CPU58.RES:Rescheduling_interrupts
7627 -47.0% 4043 ± 31% interrupts.CPU62.NMI:Non-maskable_interrupts
7627 -47.0% 4043 ± 31% interrupts.CPU62.PMI:Performance_monitoring_interrupts
174.75 ± 12% -29.9% 122.50 ± 30% interrupts.CPU62.RES:Rescheduling_interrupts
76.00 ± 29% -48.4% 39.25 ± 29% interrupts.CPU62.TLB:TLB_shootdowns
7159 ± 11% -50.2% 3564 ± 32% interrupts.CPU63.NMI:Non-maskable_interrupts
7159 ± 11% -50.2% 3564 ± 32% interrupts.CPU63.PMI:Performance_monitoring_interrupts
7628 -62.9% 2831 interrupts.CPU66.NMI:Non-maskable_interrupts
7628 -62.9% 2831 interrupts.CPU66.PMI:Performance_monitoring_interrupts
174.50 ± 10% -36.4% 111.00 ± 50% interrupts.CPU66.RES:Rescheduling_interrupts
4370 ± 18% -34.7% 2853 interrupts.CPU69.NMI:Non-maskable_interrupts
4370 ± 18% -34.7% 2853 interrupts.CPU69.PMI:Performance_monitoring_interrupts
6885 ± 18% -45.8% 3731 ± 28% interrupts.CPU74.NMI:Non-maskable_interrupts
6885 ± 18% -45.8% 3731 ± 28% interrupts.CPU74.PMI:Performance_monitoring_interrupts
5900 ± 18% -57.5% 2510 ± 24% interrupts.CPU77.NMI:Non-maskable_interrupts
5900 ± 18% -57.5% 2510 ± 24% interrupts.CPU77.PMI:Performance_monitoring_interrupts
62.00 ± 41% +58.9% 98.50 ± 14% interrupts.CPU78.RES:Rescheduling_interrupts
0.00 +2.7e+105% 2689 ±171% interrupts.CPU79.115:PCI-MSI.31981648-edge.i40e-eth0-TxRx-79
49.75 ± 47% +119.6% 109.25 ± 28% interrupts.CPU79.RES:Rescheduling_interrupts
61.50 ± 54% +115.0% 132.25 ± 31% interrupts.CPU8.RES:Rescheduling_interrupts
5871 ± 19% -38.8% 3594 ± 32% interrupts.CPU80.NMI:Non-maskable_interrupts
5871 ± 19% -38.8% 3594 ± 32% interrupts.CPU80.PMI:Performance_monitoring_interrupts
60.50 ± 19% +120.2% 133.25 ± 14% interrupts.CPU81.RES:Rescheduling_interrupts
36.00 ± 79% +179.2% 100.50 ± 35% interrupts.CPU86.RES:Rescheduling_interrupts
6322 ± 21% -60.6% 2490 ± 25% interrupts.CPU88.NMI:Non-maskable_interrupts
6322 ± 21% -60.6% 2490 ± 25% interrupts.CPU88.PMI:Performance_monitoring_interrupts
32.50 ± 40% +150.0% 81.25 ± 41% interrupts.CPU92.RES:Rescheduling_interrupts
124.00 ± 11% -22.8% 95.75 ± 4% interrupts.IWI:IRQ_work_interrupts
538989 ± 8% -28.0% 387910 ± 2% interrupts.NMI:Non-maskable_interrupts
538989 ± 8% -28.0% 387910 ± 2% interrupts.PMI:Performance_monitoring_interrupts





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
(No filename) (46.84 kB)
config-5.8.0-rc5-00002-ga0ac629ebe7b3d (160.83 kB)
job-script (8.31 kB)
job.yaml (5.76 kB)
reproduce (997.00 B)

2020-08-06 16:51:33

by Ingo Molnar

[permalink] [raw]
Subject: Re: [x86/copy_mc] a0ac629ebe: fio.read_iops -43.3% regression


* Dan Williams <[email protected]> wrote:

> On Thu, Aug 6, 2020 at 6:35 AM Ingo Molnar <[email protected]> wrote:
> >
> >
> > * kernel test robot <[email protected]> wrote:
> >
> > > Greeting,
> > >
> > > FYI, we noticed a -43.3% regression of fio.read_iops due to commit:
> > >
> > >
> > > commit: a0ac629ebe7b3d248cb93807782a00d9142fdb98 ("x86/copy_mc: Introduce copy_mc_generic()")
> > > url: https://github.com/0day-ci/linux/commits/Dan-Williams/Renovate-memcpy_mcsafe-with-copy_mc_to_-user-kernel/20200802-014046
> > >
> > >
> > > in testcase: fio-basic
> > > on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory
> > > with following parameters:
> >
> > So this performance regression, if it isn't a spurious result, looks
> > concerning. Is this expected?
>
> This is not expected and I think delays these patches until I'm back
> from leave in a few weeks. I know that we might lose some inlining
> effect due to replacing native memcpy, but I did not expect it would
> have an impact like this. In my testing I was seeing a performance
> improvement from replacing the careful / open-coded copy with rep;
> mov;, which increases the surprise of this result.

It would be nice to double check this on the kernel-test-robot side as
well, to make sure it's not a false positive.

Thanks,

Ingo

2020-08-06 17:24:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [x86/copy_mc] a0ac629ebe: fio.read_iops -43.3% regression


* kernel test robot <[email protected]> wrote:

> Greeting,
>
> FYI, we noticed a -43.3% regression of fio.read_iops due to commit:
>
>
> commit: a0ac629ebe7b3d248cb93807782a00d9142fdb98 ("x86/copy_mc: Introduce copy_mc_generic()")
> url: https://github.com/0day-ci/linux/commits/Dan-Williams/Renovate-memcpy_mcsafe-with-copy_mc_to_-user-kernel/20200802-014046
>
>
> in testcase: fio-basic
> on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory
> with following parameters:

So this performance regression, if it isn't a spurious result, looks
concerning. Is this expected?

Thanks,

Ingo

2020-08-06 17:36:57

by Dan Williams

[permalink] [raw]
Subject: Re: [x86/copy_mc] a0ac629ebe: fio.read_iops -43.3% regression

On Thu, Aug 6, 2020 at 6:35 AM Ingo Molnar <[email protected]> wrote:
>
>
> * kernel test robot <[email protected]> wrote:
>
> > Greeting,
> >
> > FYI, we noticed a -43.3% regression of fio.read_iops due to commit:
> >
> >
> > commit: a0ac629ebe7b3d248cb93807782a00d9142fdb98 ("x86/copy_mc: Introduce copy_mc_generic()")
> > url: https://github.com/0day-ci/linux/commits/Dan-Williams/Renovate-memcpy_mcsafe-with-copy_mc_to_-user-kernel/20200802-014046
> >
> >
> > in testcase: fio-basic
> > on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory
> > with following parameters:
>
> So this performance regression, if it isn't a spurious result, looks
> concerning. Is this expected?

This is not expected and I think delays these patches until I'm back
from leave in a few weeks. I know that we might lose some inlining
effect due to replacing native memcpy, but I did not expect it would
have an impact like this. In my testing I was seeing a performance
improvement from replacing the careful / open-coded copy with rep;
mov;, which increases the surprise of this result.

2020-08-07 07:21:01

by Chen, Rong A

[permalink] [raw]
Subject: Re: [x86/copy_mc] a0ac629ebe: fio.read_iops -43.3% regression

On Thu, Aug 06, 2020 at 05:35:00PM +0200, Ingo Molnar wrote:
>
> * Dan Williams <[email protected]> wrote:
>
> > On Thu, Aug 6, 2020 at 6:35 AM Ingo Molnar <[email protected]> wrote:
> > >
> > >
> > > * kernel test robot <[email protected]> wrote:
> > >
> > > > Greeting,
> > > >
> > > > FYI, we noticed a -43.3% regression of fio.read_iops due to commit:
> > > >
> > > >
> > > > commit: a0ac629ebe7b3d248cb93807782a00d9142fdb98 ("x86/copy_mc: Introduce copy_mc_generic()")
> > > > url: https://github.com/0day-ci/linux/commits/Dan-Williams/Renovate-memcpy_mcsafe-with-copy_mc_to_-user-kernel/20200802-014046
> > > >
> > > >
> > > > in testcase: fio-basic
> > > > on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory
> > > > with following parameters:
> > >
> > > So this performance regression, if it isn't a spurious result, looks
> > > concerning. Is this expected?
> >
> > This is not expected and I think delays these patches until I'm back
> > from leave in a few weeks. I know that we might lose some inlining
> > effect due to replacing native memcpy, but I did not expect it would
> > have an impact like this. In my testing I was seeing a performance
> > improvement from replacing the careful / open-coded copy with rep;
> > mov;, which increases the surprise of this result.
>
> It would be nice to double check this on the kernel-test-robot side as
> well, to make sure it's not a false positive.
>

Hi Ingo,

We recompiled the kernels with option "-falign-functions=32", and the
regression still exists:

7476b91d4db369d8 a0ac629ebe7b3d248cb9380778 testcase/testparams/testbox
---------------- -------------------------- ---------------------------
%stddev change %stddev
\ | \
22103 -43% 12551 fio-basic/2M-performance-2pmem-xfs-libaio-dax-50%-200s-read-200G-tb-ucode=0x5002f01/lkp-csl-2sp6
22103 -43% 12551 GEO-MEAN fio.read_iops

Best Regards,
Rong Chen

2020-09-23 01:02:08

by Dan Williams

[permalink] [raw]
Subject: Re: [x86/copy_mc] a0ac629ebe: fio.read_iops -43.3% regression

On Thu, Aug 6, 2020 at 8:35 AM Ingo Molnar <[email protected]> wrote:
>
>
> * Dan Williams <[email protected]> wrote:
>
> > On Thu, Aug 6, 2020 at 6:35 AM Ingo Molnar <[email protected]> wrote:
> > >
> > >
> > > * kernel test robot <[email protected]> wrote:
> > >
> > > > Greeting,
> > > >
> > > > FYI, we noticed a -43.3% regression of fio.read_iops due to commit:
> > > >
> > > >
> > > > commit: a0ac629ebe7b3d248cb93807782a00d9142fdb98 ("x86/copy_mc: Introduce copy_mc_generic()")
> > > > url: https://github.com/0day-ci/linux/commits/Dan-Williams/Renovate-memcpy_mcsafe-with-copy_mc_to_-user-kernel/20200802-014046
> > > >
> > > >
> > > > in testcase: fio-basic
> > > > on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory
> > > > with following parameters:
> > >
> > > So this performance regression, if it isn't a spurious result, looks
> > > concerning. Is this expected?
> >
> > This is not expected and I think delays these patches until I'm back
> > from leave in a few weeks. I know that we might lose some inlining
> > effect due to replacing native memcpy, but I did not expect it would
> > have an impact like this. In my testing I was seeing a performance
> > improvement from replacing the careful / open-coded copy with rep;
> > mov;, which increases the surprise of this result.
>
> It would be nice to double check this on the kernel-test-robot side as
> well, to make sure it's not a false positive.

Circling back to this, I found the bug. This incremental patch nearly
doubles the iops in the case when copy_mc_fragile() is enabled, because
the broken version did the fragile copy and then turned around and
redid the copy with copy_mc_generic(). So this would have been a
regression for existing systems that indicate that "careful/fragile"
copying can avoid some PCC=1 machine checks. My performance testing
compared copy_mc_fragile() and copy_mc_generic() in isolation.
Refreshed patches inbound.

diff --git a/arch/x86/lib/copy_mc.c b/arch/x86/lib/copy_mc.c
index 9e6fac1ab72e..afac844c8f45 100644
--- a/arch/x86/lib/copy_mc.c
+++ b/arch/x86/lib/copy_mc.c
@@ -58,7 +58,8 @@ copy_mc_to_user(void *to, const void *from, unsigned len)
__uaccess_begin();
if (static_branch_unlikely(&copy_mc_fragile_key))
ret = copy_mc_fragile(to, from, len);
- ret = copy_mc_generic(to, from, len);
+ else
+ ret = copy_mc_generic(to, from, len);
__uaccess_end();
return ret;
}
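
With the else in place, copy_mc_to_user() selects exactly one backend
per call. The corrected routine, reconstructed from the two diffs in
this thread (a sketch of the end state, not a verbatim quote):

	unsigned long __must_check
	copy_mc_to_user(void *to, const void *from, unsigned len)
	{
		unsigned long ret;

		__uaccess_begin();
		if (static_branch_unlikely(&copy_mc_fragile_key))
			ret = copy_mc_fragile(to, from, len);
		else
			ret = copy_mc_generic(to, from, len);
		__uaccess_end();
		return ret;
	}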