Less code means less bugs so add a knob to skip the compat stuff.
This is tested on ppc64le top of
https://patchwork.ozlabs.org/cover/1153556/
Changes in v2: saner CONFIG_COMPAT ifdefs
Changes in v3:
- change llseek to 32bit instead of builing it unconditionally in fs
- clanup the makefile conditionals
- remove some ifdefs or convert to IS_DEFINED where possible
Changes in v4:
- cleanup is_32bit_task and current_is_64bit
- more makefile cleanup
Changes in v5:
- more current_is_64bit cleanup
- split off callchain.c 32bit and 64bit parts
Changes in v6:
- cleanup makefile after split
- consolidate read_user_stack_32
- fix some checkpatch warnings
Changes in v7:
- add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
- remove leftover hunk
- add review tags
Michal Suchanek (6):
powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
powerpc: move common register copy functions from signal_32.c to
signal.c
powerpc/perf: consolidate read_user_stack_32
powerpc/64: make buildable without CONFIG_COMPAT
powerpc/64: Make COMPAT user-selectable disabled on littleendian by
default.
powerpc/perf: split callchain.c by bitness
arch/powerpc/Kconfig | 5 +-
arch/powerpc/include/asm/thread_info.h | 4 +-
arch/powerpc/include/asm/unistd.h | 1 +
arch/powerpc/kernel/Makefile | 7 +-
arch/powerpc/kernel/entry_64.S | 2 +
arch/powerpc/kernel/signal.c | 144 +++++++++-
arch/powerpc/kernel/signal_32.c | 140 ---------
arch/powerpc/kernel/syscall_64.c | 6 +-
arch/powerpc/kernel/vdso.c | 5 +-
arch/powerpc/perf/Makefile | 5 +-
arch/powerpc/perf/callchain.c | 377 +------------------------
arch/powerpc/perf/callchain.h | 11 +
arch/powerpc/perf/callchain_32.c | 204 +++++++++++++
arch/powerpc/perf/callchain_64.c | 185 ++++++++++++
fs/read_write.c | 3 +-
15 files changed, 566 insertions(+), 533 deletions(-)
create mode 100644 arch/powerpc/perf/callchain.h
create mode 100644 arch/powerpc/perf/callchain_32.c
create mode 100644 arch/powerpc/perf/callchain_64.c
--
2.22.0
There are two almost identical copies for 32bit and 64bit.
The function is used only in 32bit code which will be split out in next
patch so consolidate to one function.
Signed-off-by: Michal Suchanek <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
---
new patch in v6
---
arch/powerpc/perf/callchain.c | 25 +++++++++----------------
1 file changed, 9 insertions(+), 16 deletions(-)
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index c84bbd4298a0..b7cdcce20280 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -165,22 +165,6 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
return read_user_stack_slow(ptr, ret, 8);
}
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
-{
- if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
- ((unsigned long)ptr & 3))
- return -EFAULT;
-
- pagefault_disable();
- if (!__get_user_inatomic(*ret, ptr)) {
- pagefault_enable();
- return 0;
- }
- pagefault_enable();
-
- return read_user_stack_slow(ptr, ret, 4);
-}
-
static inline int valid_user_sp(unsigned long sp, int is_64)
{
if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
@@ -295,6 +279,12 @@ static inline int current_is_64bit(void)
}
#else /* CONFIG_PPC64 */
+static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
+{
+ return 0;
+}
+#endif /* CONFIG_PPC64 */
+
/*
* On 32-bit we just access the address and let hash_page create a
* HPTE if necessary, so there is no need to fall back to reading
@@ -313,9 +303,12 @@ static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
rc = __get_user_inatomic(*ret, ptr);
pagefault_enable();
+ if (IS_ENABLED(CONFIG_PPC64) && rc)
+ return read_user_stack_slow(ptr, ret, 4);
return rc;
}
+#ifndef CONFIG_PPC64
static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
struct pt_regs *regs)
{
--
2.22.0
On bigendian ppc64 it is common to have 32bit legacy binaries but much
less so on littleendian.
Signed-off-by: Michal Suchanek <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
---
v3: make configurable
---
arch/powerpc/Kconfig | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5bab0bb6b833..b0339e892329 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -264,8 +264,9 @@ config PANIC_TIMEOUT
default 180
config COMPAT
- bool
- default y if PPC64
+ bool "Enable support for 32bit binaries"
+ depends on PPC64
+ default y if !CPU_LITTLE_ENDIAN
select COMPAT_BINFMT_ELF
select ARCH_WANT_OLD_COMPAT_IPC
select COMPAT_OLD_SIGACTION
--
2.22.0
Building callchain.c with !COMPAT proved quite ugly with all the
defines. Splitting out the 32bit and 64bit parts looks better.
No code change intended.
Signed-off-by: Michal Suchanek <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
---
v6:
- move current_is_64bit consolidetaion to earlier patch
- move defines to the top of callchain_32.c
- Makefile cleanup
---
arch/powerpc/perf/Makefile | 5 +-
arch/powerpc/perf/callchain.c | 365 +------------------------------
arch/powerpc/perf/callchain.h | 11 +
arch/powerpc/perf/callchain_32.c | 204 +++++++++++++++++
arch/powerpc/perf/callchain_64.c | 185 ++++++++++++++++
5 files changed, 405 insertions(+), 365 deletions(-)
create mode 100644 arch/powerpc/perf/callchain.h
create mode 100644 arch/powerpc/perf/callchain_32.c
create mode 100644 arch/powerpc/perf/callchain_64.c
diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index c155dcbb8691..53d614e98537 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -1,6 +1,9 @@
# SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_PERF_EVENTS) += callchain.o perf_regs.o
+obj-$(CONFIG_PERF_EVENTS) += callchain.o callchain_$(BITS).o perf_regs.o
+ifdef CONFIG_COMPAT
+obj-$(CONFIG_PERF_EVENTS) += callchain_32.o
+endif
obj-$(CONFIG_PPC_PERF_CTRS) += core-book3s.o bhrb.o
obj64-$(CONFIG_PPC_PERF_CTRS) += ppc970-pmu.o power5-pmu.o \
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 9b8dc822f531..8f30f1b47c78 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -15,11 +15,9 @@
#include <asm/sigcontext.h>
#include <asm/ucontext.h>
#include <asm/vdso.h>
-#ifdef CONFIG_COMPAT
-#include "../kernel/ppc32.h"
-#endif
#include <asm/pte-walk.h>
+#include "callchain.h"
/*
* Is sp valid as the address of the next kernel stack frame after prev_sp?
@@ -102,367 +100,6 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
}
}
-#ifdef CONFIG_PPC64
-/*
- * On 64-bit we don't want to invoke hash_page on user addresses from
- * interrupt context, so if the access faults, we read the page tables
- * to find which page (if any) is mapped and access it directly.
- */
-static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
-{
- int ret = -EFAULT;
- pgd_t *pgdir;
- pte_t *ptep, pte;
- unsigned shift;
- unsigned long addr = (unsigned long) ptr;
- unsigned long offset;
- unsigned long pfn, flags;
- void *kaddr;
-
- pgdir = current->mm->pgd;
- if (!pgdir)
- return -EFAULT;
-
- local_irq_save(flags);
- ptep = find_current_mm_pte(pgdir, addr, NULL, &shift);
- if (!ptep)
- goto err_out;
- if (!shift)
- shift = PAGE_SHIFT;
-
- /* align address to page boundary */
- offset = addr & ((1UL << shift) - 1);
-
- pte = READ_ONCE(*ptep);
- if (!pte_present(pte) || !pte_user(pte))
- goto err_out;
- pfn = pte_pfn(pte);
- if (!page_is_ram(pfn))
- goto err_out;
-
- /* no highmem to worry about here */
- kaddr = pfn_to_kaddr(pfn);
- memcpy(buf, kaddr + offset, nb);
- ret = 0;
-err_out:
- local_irq_restore(flags);
- return ret;
-}
-
-static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
-{
- if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
- ((unsigned long)ptr & 7))
- return -EFAULT;
-
- pagefault_disable();
- if (!__get_user_inatomic(*ret, ptr)) {
- pagefault_enable();
- return 0;
- }
- pagefault_enable();
-
- return read_user_stack_slow(ptr, ret, 8);
-}
-
-static inline int valid_user_sp(unsigned long sp, int is_64)
-{
- if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
- return 0;
- return 1;
-}
-
-/*
- * 64-bit user processes use the same stack frame for RT and non-RT signals.
- */
-struct signal_frame_64 {
- char dummy[__SIGNAL_FRAMESIZE];
- struct ucontext uc;
- unsigned long unused[2];
- unsigned int tramp[6];
- struct siginfo *pinfo;
- void *puc;
- struct siginfo info;
- char abigap[288];
-};
-
-static int is_sigreturn_64_address(unsigned long nip, unsigned long fp)
-{
- if (nip == fp + offsetof(struct signal_frame_64, tramp))
- return 1;
- if (vdso64_rt_sigtramp && current->mm->context.vdso_base &&
- nip == current->mm->context.vdso_base + vdso64_rt_sigtramp)
- return 1;
- return 0;
-}
-
-/*
- * Do some sanity checking on the signal frame pointed to by sp.
- * We check the pinfo and puc pointers in the frame.
- */
-static int sane_signal_64_frame(unsigned long sp)
-{
- struct signal_frame_64 __user *sf;
- unsigned long pinfo, puc;
-
- sf = (struct signal_frame_64 __user *) sp;
- if (read_user_stack_64((unsigned long __user *) &sf->pinfo, &pinfo) ||
- read_user_stack_64((unsigned long __user *) &sf->puc, &puc))
- return 0;
- return pinfo == (unsigned long) &sf->info &&
- puc == (unsigned long) &sf->uc;
-}
-
-static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
- struct pt_regs *regs)
-{
- unsigned long sp, next_sp;
- unsigned long next_ip;
- unsigned long lr;
- long level = 0;
- struct signal_frame_64 __user *sigframe;
- unsigned long __user *fp, *uregs;
-
- next_ip = perf_instruction_pointer(regs);
- lr = regs->link;
- sp = regs->gpr[1];
- perf_callchain_store(entry, next_ip);
-
- while (entry->nr < entry->max_stack) {
- fp = (unsigned long __user *) sp;
- if (!valid_user_sp(sp, 1) || read_user_stack_64(fp, &next_sp))
- return;
- if (level > 0 && read_user_stack_64(&fp[2], &next_ip))
- return;
-
- /*
- * Note: the next_sp - sp >= signal frame size check
- * is true when next_sp < sp, which can happen when
- * transitioning from an alternate signal stack to the
- * normal stack.
- */
- if (next_sp - sp >= sizeof(struct signal_frame_64) &&
- (is_sigreturn_64_address(next_ip, sp) ||
- (level <= 1 && is_sigreturn_64_address(lr, sp))) &&
- sane_signal_64_frame(sp)) {
- /*
- * This looks like an signal frame
- */
- sigframe = (struct signal_frame_64 __user *) sp;
- uregs = sigframe->uc.uc_mcontext.gp_regs;
- if (read_user_stack_64(&uregs[PT_NIP], &next_ip) ||
- read_user_stack_64(&uregs[PT_LNK], &lr) ||
- read_user_stack_64(&uregs[PT_R1], &sp))
- return;
- level = 0;
- perf_callchain_store_context(entry, PERF_CONTEXT_USER);
- perf_callchain_store(entry, next_ip);
- continue;
- }
-
- if (level == 0)
- next_ip = lr;
- perf_callchain_store(entry, next_ip);
- ++level;
- sp = next_sp;
- }
-}
-
-#else /* CONFIG_PPC64 */
-static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
-{
- return 0;
-}
-#endif /* CONFIG_PPC64 */
-
-/*
- * On 32-bit we just access the address and let hash_page create a
- * HPTE if necessary, so there is no need to fall back to reading
- * the page tables. Since this is called at interrupt level,
- * do_page_fault() won't treat a DSI as a page fault.
- */
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
-{
- int rc;
-
- if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
- ((unsigned long)ptr & 3))
- return -EFAULT;
-
- pagefault_disable();
- rc = __get_user_inatomic(*ret, ptr);
- pagefault_enable();
-
- if (IS_ENABLED(CONFIG_PPC64) && rc)
- return read_user_stack_slow(ptr, ret, 4);
- return rc;
-}
-
-#ifndef CONFIG_PPC64
-static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
- struct pt_regs *regs)
-{
-}
-
-static inline int valid_user_sp(unsigned long sp, int is_64)
-{
- if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
- return 0;
- return 1;
-}
-
-#define __SIGNAL_FRAMESIZE32 __SIGNAL_FRAMESIZE
-#define sigcontext32 sigcontext
-#define mcontext32 mcontext
-#define ucontext32 ucontext
-#define compat_siginfo_t struct siginfo
-
-#endif /* CONFIG_PPC64 */
-
-#if defined(CONFIG_PPC32) || defined(CONFIG_COMPAT)
-/*
- * Layout for non-RT signal frames
- */
-struct signal_frame_32 {
- char dummy[__SIGNAL_FRAMESIZE32];
- struct sigcontext32 sctx;
- struct mcontext32 mctx;
- int abigap[56];
-};
-
-/*
- * Layout for RT signal frames
- */
-struct rt_signal_frame_32 {
- char dummy[__SIGNAL_FRAMESIZE32 + 16];
- compat_siginfo_t info;
- struct ucontext32 uc;
- int abigap[56];
-};
-
-static int is_sigreturn_32_address(unsigned int nip, unsigned int fp)
-{
- if (nip == fp + offsetof(struct signal_frame_32, mctx.mc_pad))
- return 1;
- if (vdso32_sigtramp && current->mm->context.vdso_base &&
- nip == current->mm->context.vdso_base + vdso32_sigtramp)
- return 1;
- return 0;
-}
-
-static int is_rt_sigreturn_32_address(unsigned int nip, unsigned int fp)
-{
- if (nip == fp + offsetof(struct rt_signal_frame_32,
- uc.uc_mcontext.mc_pad))
- return 1;
- if (vdso32_rt_sigtramp && current->mm->context.vdso_base &&
- nip == current->mm->context.vdso_base + vdso32_rt_sigtramp)
- return 1;
- return 0;
-}
-
-static int sane_signal_32_frame(unsigned int sp)
-{
- struct signal_frame_32 __user *sf;
- unsigned int regs;
-
- sf = (struct signal_frame_32 __user *) (unsigned long) sp;
- if (read_user_stack_32((unsigned int __user *) &sf->sctx.regs, ®s))
- return 0;
- return regs == (unsigned long) &sf->mctx;
-}
-
-static int sane_rt_signal_32_frame(unsigned int sp)
-{
- struct rt_signal_frame_32 __user *sf;
- unsigned int regs;
-
- sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
- if (read_user_stack_32((unsigned int __user *) &sf->uc.uc_regs, ®s))
- return 0;
- return regs == (unsigned long) &sf->uc.uc_mcontext;
-}
-
-static unsigned int __user *signal_frame_32_regs(unsigned int sp,
- unsigned int next_sp, unsigned int next_ip)
-{
- struct mcontext32 __user *mctx = NULL;
- struct signal_frame_32 __user *sf;
- struct rt_signal_frame_32 __user *rt_sf;
-
- /*
- * Note: the next_sp - sp >= signal frame size check
- * is true when next_sp < sp, for example, when
- * transitioning from an alternate signal stack to the
- * normal stack.
- */
- if (next_sp - sp >= sizeof(struct signal_frame_32) &&
- is_sigreturn_32_address(next_ip, sp) &&
- sane_signal_32_frame(sp)) {
- sf = (struct signal_frame_32 __user *) (unsigned long) sp;
- mctx = &sf->mctx;
- }
-
- if (!mctx && next_sp - sp >= sizeof(struct rt_signal_frame_32) &&
- is_rt_sigreturn_32_address(next_ip, sp) &&
- sane_rt_signal_32_frame(sp)) {
- rt_sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
- mctx = &rt_sf->uc.uc_mcontext;
- }
-
- if (!mctx)
- return NULL;
- return mctx->mc_gregs;
-}
-
-static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
- struct pt_regs *regs)
-{
- unsigned int sp, next_sp;
- unsigned int next_ip;
- unsigned int lr;
- long level = 0;
- unsigned int __user *fp, *uregs;
-
- next_ip = perf_instruction_pointer(regs);
- lr = regs->link;
- sp = regs->gpr[1];
- perf_callchain_store(entry, next_ip);
-
- while (entry->nr < entry->max_stack) {
- fp = (unsigned int __user *) (unsigned long) sp;
- if (!valid_user_sp(sp, 0) || read_user_stack_32(fp, &next_sp))
- return;
- if (level > 0 && read_user_stack_32(&fp[1], &next_ip))
- return;
-
- uregs = signal_frame_32_regs(sp, next_sp, next_ip);
- if (!uregs && level <= 1)
- uregs = signal_frame_32_regs(sp, next_sp, lr);
- if (uregs) {
- /*
- * This looks like an signal frame, so restart
- * the stack trace with the values in it.
- */
- if (read_user_stack_32(&uregs[PT_NIP], &next_ip) ||
- read_user_stack_32(&uregs[PT_LNK], &lr) ||
- read_user_stack_32(&uregs[PT_R1], &sp))
- return;
- level = 0;
- perf_callchain_store_context(entry, PERF_CONTEXT_USER);
- perf_callchain_store(entry, next_ip);
- continue;
- }
-
- if (level == 0)
- next_ip = lr;
- perf_callchain_store(entry, next_ip);
- ++level;
- sp = next_sp;
- }
-}
-#endif /* 32bit */
-
static inline int current_is_64bit(void)
{
if (!IS_ENABLED(CONFIG_COMPAT))
diff --git a/arch/powerpc/perf/callchain.h b/arch/powerpc/perf/callchain.h
new file mode 100644
index 000000000000..63ffb43f3668
--- /dev/null
+++ b/arch/powerpc/perf/callchain.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _POWERPC_PERF_CALLCHAIN_H
+#define _POWERPC_PERF_CALLCHAIN_H
+
+int read_user_stack_slow(void __user *ptr, void *buf, int nb);
+void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
+ struct pt_regs *regs);
+void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
+ struct pt_regs *regs);
+
+#endif /* _POWERPC_PERF_CALLCHAIN_H */
diff --git a/arch/powerpc/perf/callchain_32.c b/arch/powerpc/perf/callchain_32.c
new file mode 100644
index 000000000000..01a38d929078
--- /dev/null
+++ b/arch/powerpc/perf/callchain_32.c
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Performance counter callchain support - powerpc architecture code
+ *
+ * Copyright © 2009 Paul Mackerras, IBM Corporation.
+ */
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/perf_event.h>
+#include <linux/percpu.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <asm/ptrace.h>
+#include <asm/pgtable.h>
+#include <asm/sigcontext.h>
+#include <asm/ucontext.h>
+#include <asm/vdso.h>
+#include <asm/pte-walk.h>
+
+#include "callchain.h"
+
+#ifdef CONFIG_PPC64
+#include "../kernel/ppc32.h"
+#else /* CONFIG_PPC64 */
+
+#define __SIGNAL_FRAMESIZE32 __SIGNAL_FRAMESIZE
+#define sigcontext32 sigcontext
+#define mcontext32 mcontext
+#define ucontext32 ucontext
+#define compat_siginfo_t struct siginfo
+
+#endif /* CONFIG_PPC64 */
+
+/*
+ * On 32-bit we just access the address and let hash_page create a
+ * HPTE if necessary, so there is no need to fall back to reading
+ * the page tables. Since this is called at interrupt level,
+ * do_page_fault() won't treat a DSI as a page fault.
+ */
+static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+{
+ int rc;
+
+ if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
+ ((unsigned long)ptr & 3))
+ return -EFAULT;
+
+ pagefault_disable();
+ rc = __get_user_inatomic(*ret, ptr);
+ pagefault_enable();
+
+ if (IS_ENABLED(CONFIG_PPC64) && rc)
+ return read_user_stack_slow(ptr, ret, 4);
+ return rc;
+}
+
+static inline int valid_user_sp(unsigned long sp, int is_64)
+{
+ if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
+ return 0;
+ return 1;
+}
+
+/*
+ * Layout for non-RT signal frames
+ */
+struct signal_frame_32 {
+ char dummy[__SIGNAL_FRAMESIZE32];
+ struct sigcontext32 sctx;
+ struct mcontext32 mctx;
+ int abigap[56];
+};
+
+/*
+ * Layout for RT signal frames
+ */
+struct rt_signal_frame_32 {
+ char dummy[__SIGNAL_FRAMESIZE32 + 16];
+ compat_siginfo_t info;
+ struct ucontext32 uc;
+ int abigap[56];
+};
+
+static int is_sigreturn_32_address(unsigned int nip, unsigned int fp)
+{
+ if (nip == fp + offsetof(struct signal_frame_32, mctx.mc_pad))
+ return 1;
+ if (vdso32_sigtramp && current->mm->context.vdso_base &&
+ nip == current->mm->context.vdso_base + vdso32_sigtramp)
+ return 1;
+ return 0;
+}
+
+static int is_rt_sigreturn_32_address(unsigned int nip, unsigned int fp)
+{
+ if (nip == fp + offsetof(struct rt_signal_frame_32,
+ uc.uc_mcontext.mc_pad))
+ return 1;
+ if (vdso32_rt_sigtramp && current->mm->context.vdso_base &&
+ nip == current->mm->context.vdso_base + vdso32_rt_sigtramp)
+ return 1;
+ return 0;
+}
+
+static int sane_signal_32_frame(unsigned int sp)
+{
+ struct signal_frame_32 __user *sf;
+ unsigned int regs;
+
+ sf = (struct signal_frame_32 __user *) (unsigned long) sp;
+ if (read_user_stack_32((unsigned int __user *) &sf->sctx.regs, ®s))
+ return 0;
+ return regs == (unsigned long) &sf->mctx;
+}
+
+static int sane_rt_signal_32_frame(unsigned int sp)
+{
+ struct rt_signal_frame_32 __user *sf;
+ unsigned int regs;
+
+ sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
+ if (read_user_stack_32((unsigned int __user *) &sf->uc.uc_regs, ®s))
+ return 0;
+ return regs == (unsigned long) &sf->uc.uc_mcontext;
+}
+
+static unsigned int __user *signal_frame_32_regs(unsigned int sp,
+ unsigned int next_sp, unsigned int next_ip)
+{
+ struct mcontext32 __user *mctx = NULL;
+ struct signal_frame_32 __user *sf;
+ struct rt_signal_frame_32 __user *rt_sf;
+
+ /*
+ * Note: the next_sp - sp >= signal frame size check
+ * is true when next_sp < sp, for example, when
+ * transitioning from an alternate signal stack to the
+ * normal stack.
+ */
+ if (next_sp - sp >= sizeof(struct signal_frame_32) &&
+ is_sigreturn_32_address(next_ip, sp) &&
+ sane_signal_32_frame(sp)) {
+ sf = (struct signal_frame_32 __user *) (unsigned long) sp;
+ mctx = &sf->mctx;
+ }
+
+ if (!mctx && next_sp - sp >= sizeof(struct rt_signal_frame_32) &&
+ is_rt_sigreturn_32_address(next_ip, sp) &&
+ sane_rt_signal_32_frame(sp)) {
+ rt_sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
+ mctx = &rt_sf->uc.uc_mcontext;
+ }
+
+ if (!mctx)
+ return NULL;
+ return mctx->mc_gregs;
+}
+
+void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
+ struct pt_regs *regs)
+{
+ unsigned int sp, next_sp;
+ unsigned int next_ip;
+ unsigned int lr;
+ long level = 0;
+ unsigned int __user *fp, *uregs;
+
+ next_ip = perf_instruction_pointer(regs);
+ lr = regs->link;
+ sp = regs->gpr[1];
+ perf_callchain_store(entry, next_ip);
+
+ while (entry->nr < entry->max_stack) {
+ fp = (unsigned int __user *) (unsigned long) sp;
+ if (!valid_user_sp(sp, 0) || read_user_stack_32(fp, &next_sp))
+ return;
+ if (level > 0 && read_user_stack_32(&fp[1], &next_ip))
+ return;
+
+ uregs = signal_frame_32_regs(sp, next_sp, next_ip);
+ if (!uregs && level <= 1)
+ uregs = signal_frame_32_regs(sp, next_sp, lr);
+ if (uregs) {
+ /*
+ * This looks like an signal frame, so restart
+ * the stack trace with the values in it.
+ */
+ if (read_user_stack_32(&uregs[PT_NIP], &next_ip) ||
+ read_user_stack_32(&uregs[PT_LNK], &lr) ||
+ read_user_stack_32(&uregs[PT_R1], &sp))
+ return;
+ level = 0;
+ perf_callchain_store_context(entry, PERF_CONTEXT_USER);
+ perf_callchain_store(entry, next_ip);
+ continue;
+ }
+
+ if (level == 0)
+ next_ip = lr;
+ perf_callchain_store(entry, next_ip);
+ ++level;
+ sp = next_sp;
+ }
+}
diff --git a/arch/powerpc/perf/callchain_64.c b/arch/powerpc/perf/callchain_64.c
new file mode 100644
index 000000000000..60308c2221a8
--- /dev/null
+++ b/arch/powerpc/perf/callchain_64.c
@@ -0,0 +1,185 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Performance counter callchain support - powerpc architecture code
+ *
+ * Copyright © 2009 Paul Mackerras, IBM Corporation.
+ */
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/perf_event.h>
+#include <linux/percpu.h>
+#include <linux/uaccess.h>
+#include <linux/mm.h>
+#include <asm/ptrace.h>
+#include <asm/pgtable.h>
+#include <asm/sigcontext.h>
+#include <asm/ucontext.h>
+#include <asm/vdso.h>
+#include <asm/pte-walk.h>
+
+#include "callchain.h"
+
+/*
+ * On 64-bit we don't want to invoke hash_page on user addresses from
+ * interrupt context, so if the access faults, we read the page tables
+ * to find which page (if any) is mapped and access it directly.
+ */
+int read_user_stack_slow(void __user *ptr, void *buf, int nb)
+{
+ int ret = -EFAULT;
+ pgd_t *pgdir;
+ pte_t *ptep, pte;
+ unsigned int shift;
+ unsigned long addr = (unsigned long) ptr;
+ unsigned long offset;
+ unsigned long pfn, flags;
+ void *kaddr;
+
+ pgdir = current->mm->pgd;
+ if (!pgdir)
+ return -EFAULT;
+
+ local_irq_save(flags);
+ ptep = find_current_mm_pte(pgdir, addr, NULL, &shift);
+ if (!ptep)
+ goto err_out;
+ if (!shift)
+ shift = PAGE_SHIFT;
+
+ /* align address to page boundary */
+ offset = addr & ((1UL << shift) - 1);
+
+ pte = READ_ONCE(*ptep);
+ if (!pte_present(pte) || !pte_user(pte))
+ goto err_out;
+ pfn = pte_pfn(pte);
+ if (!page_is_ram(pfn))
+ goto err_out;
+
+ /* no highmem to worry about here */
+ kaddr = pfn_to_kaddr(pfn);
+ memcpy(buf, kaddr + offset, nb);
+ ret = 0;
+err_out:
+ local_irq_restore(flags);
+ return ret;
+}
+
+static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
+{
+ if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
+ ((unsigned long)ptr & 7))
+ return -EFAULT;
+
+ pagefault_disable();
+ if (!__get_user_inatomic(*ret, ptr)) {
+ pagefault_enable();
+ return 0;
+ }
+ pagefault_enable();
+
+ return read_user_stack_slow(ptr, ret, 8);
+}
+
+static inline int valid_user_sp(unsigned long sp, int is_64)
+{
+ if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
+ return 0;
+ return 1;
+}
+
+/*
+ * 64-bit user processes use the same stack frame for RT and non-RT signals.
+ */
+struct signal_frame_64 {
+ char dummy[__SIGNAL_FRAMESIZE];
+ struct ucontext uc;
+ unsigned long unused[2];
+ unsigned int tramp[6];
+ struct siginfo *pinfo;
+ void *puc;
+ struct siginfo info;
+ char abigap[288];
+};
+
+static int is_sigreturn_64_address(unsigned long nip, unsigned long fp)
+{
+ if (nip == fp + offsetof(struct signal_frame_64, tramp))
+ return 1;
+ if (vdso64_rt_sigtramp && current->mm->context.vdso_base &&
+ nip == current->mm->context.vdso_base + vdso64_rt_sigtramp)
+ return 1;
+ return 0;
+}
+
+/*
+ * Do some sanity checking on the signal frame pointed to by sp.
+ * We check the pinfo and puc pointers in the frame.
+ */
+static int sane_signal_64_frame(unsigned long sp)
+{
+ struct signal_frame_64 __user *sf;
+ unsigned long pinfo, puc;
+
+ sf = (struct signal_frame_64 __user *) sp;
+ if (read_user_stack_64((unsigned long __user *) &sf->pinfo, &pinfo) ||
+ read_user_stack_64((unsigned long __user *) &sf->puc, &puc))
+ return 0;
+ return pinfo == (unsigned long) &sf->info &&
+ puc == (unsigned long) &sf->uc;
+}
+
+void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
+ struct pt_regs *regs)
+{
+ unsigned long sp, next_sp;
+ unsigned long next_ip;
+ unsigned long lr;
+ long level = 0;
+ struct signal_frame_64 __user *sigframe;
+ unsigned long __user *fp, *uregs;
+
+ next_ip = perf_instruction_pointer(regs);
+ lr = regs->link;
+ sp = regs->gpr[1];
+ perf_callchain_store(entry, next_ip);
+
+ while (entry->nr < entry->max_stack) {
+ fp = (unsigned long __user *) sp;
+ if (!valid_user_sp(sp, 1) || read_user_stack_64(fp, &next_sp))
+ return;
+ if (level > 0 && read_user_stack_64(&fp[2], &next_ip))
+ return;
+
+ /*
+ * Note: the next_sp - sp >= signal frame size check
+ * is true when next_sp < sp, which can happen when
+ * transitioning from an alternate signal stack to the
+ * normal stack.
+ */
+ if (next_sp - sp >= sizeof(struct signal_frame_64) &&
+ (is_sigreturn_64_address(next_ip, sp) ||
+ (level <= 1 && is_sigreturn_64_address(lr, sp))) &&
+ sane_signal_64_frame(sp)) {
+ /*
+ * This looks like an signal frame
+ */
+ sigframe = (struct signal_frame_64 __user *) sp;
+ uregs = sigframe->uc.uc_mcontext.gp_regs;
+ if (read_user_stack_64(&uregs[PT_NIP], &next_ip) ||
+ read_user_stack_64(&uregs[PT_LNK], &lr) ||
+ read_user_stack_64(&uregs[PT_R1], &sp))
+ return;
+ level = 0;
+ perf_callchain_store_context(entry, PERF_CONTEXT_USER);
+ perf_callchain_store(entry, next_ip);
+ continue;
+ }
+
+ if (level == 0)
+ next_ip = lr;
+ perf_callchain_store(entry, next_ip);
+ ++level;
+ sp = next_sp;
+ }
+}
--
2.22.0
There are numerous references to 32bit functions in generic and 64bit
code so ifdef them out.
Signed-off-by: Michal Suchanek <[email protected]>
---
v2:
- fix 32bit ifdef condition in signal.c
- simplify the compat ifdef condition in vdso.c - 64bit is redundant
- simplify the compat ifdef condition in callchain.c - 64bit is redundant
v3:
- use IS_ENABLED and maybe_unused where possible
- do not ifdef declarations
- clean up Makefile
v4:
- further makefile cleanup
- simplify is_32bit_task conditions
- avoid ifdef in condition by using return
v5:
- avoid unreachable code on 32bit
- make is_current_64bit constant on !COMPAT
- add stub perf_callchain_user_32 to avoid some ifdefs
v6:
- consolidate current_is_64bit
v7:
- remove leftover perf_callchain_user_32 stub from previous series version
---
arch/powerpc/include/asm/thread_info.h | 4 ++--
arch/powerpc/kernel/Makefile | 7 +++---
arch/powerpc/kernel/entry_64.S | 2 ++
arch/powerpc/kernel/signal.c | 3 +--
arch/powerpc/kernel/syscall_64.c | 6 ++---
arch/powerpc/kernel/vdso.c | 5 ++---
arch/powerpc/perf/callchain.c | 31 +++++++++++++-------------
7 files changed, 27 insertions(+), 31 deletions(-)
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 8e1d0195ac36..c128d8a48ea3 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -144,10 +144,10 @@ static inline bool test_thread_local_flags(unsigned int flags)
return (ti->local_flags & flags) != 0;
}
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_COMPAT
#define is_32bit_task() (test_thread_flag(TIF_32BIT))
#else
-#define is_32bit_task() (1)
+#define is_32bit_task() (IS_ENABLED(CONFIG_PPC32))
#endif
#if defined(CONFIG_PPC64)
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 1d646a94d96c..9d8772e863b9 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -44,16 +44,15 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
endif
obj-y := cputable.o ptrace.o syscalls.o \
- irq.o align.o signal_32.o pmc.o vdso.o \
+ irq.o align.o signal_$(BITS).o pmc.o vdso.o \
process.o systbl.o idle.o \
signal.o sysfs.o cacheinfo.o time.o \
prom.o traps.o setup-common.o \
udbg.o misc.o io.o misc_$(BITS).o \
of_platform.o prom_parse.o
-obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \
- signal_64.o ptrace32.o \
- paca.o nvram_64.o firmware.o \
+obj-$(CONFIG_PPC64) += setup_64.o paca.o nvram_64.o firmware.o \
syscall_64.o
+obj-$(CONFIG_COMPAT) += sys_ppc32.o ptrace32.o signal_32.o
obj-$(CONFIG_VDSO32) += vdso32/
obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 2ec825a85f5b..a2dbf216f607 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -51,8 +51,10 @@
SYS_CALL_TABLE:
.tc sys_call_table[TC],sys_call_table
+#ifdef CONFIG_COMPAT
COMPAT_SYS_CALL_TABLE:
.tc compat_sys_call_table[TC],compat_sys_call_table
+#endif
/* This value is used to mark exception frames on the stack. */
exception_marker:
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index 60436432399f..61678cb0e6a1 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
sigset_t *oldset = sigmask_to_save();
struct ksignal ksig = { .sig = 0 };
int ret;
- int is32 = is_32bit_task();
BUG_ON(tsk != current);
@@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
rseq_signal_deliver(&ksig, tsk->thread.regs);
- if (is32) {
+ if (is_32bit_task()) {
if (ksig.ka.sa.sa_flags & SA_SIGINFO)
ret = handle_rt_signal32(&ksig, oldset, tsk);
else
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 98ed970796d5..0d5cbbe54cf1 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -38,7 +38,6 @@ typedef long (*syscall_fn)(long, long, long, long, long, long);
long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs)
{
- unsigned long ti_flags;
syscall_fn f;
BUG_ON(!(regs->msr & MSR_PR));
@@ -83,8 +82,7 @@ long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8,
*/
regs->softe = IRQS_ENABLED;
- ti_flags = current_thread_info()->flags;
- if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
+ if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
/*
* We use the return value of do_syscall_trace_enter() as the
* syscall number. If the syscall was rejected for any reason
@@ -100,7 +98,7 @@ long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8,
/* May be faster to do array_index_nospec? */
barrier_nospec();
- if (unlikely(ti_flags & _TIF_32BIT)) {
+ if (unlikely(is_32bit_task())) {
f = (void *)compat_sys_call_table[r0];
r3 &= 0x00000000ffffffffULL;
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index d60598113a9f..6d4a077f74d6 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -667,9 +667,7 @@ static void __init vdso_setup_syscall_map(void)
{
unsigned int i;
extern unsigned long *sys_call_table;
-#ifdef CONFIG_PPC64
extern unsigned long *compat_sys_call_table;
-#endif
extern unsigned long sys_ni_syscall;
@@ -678,7 +676,8 @@ static void __init vdso_setup_syscall_map(void)
if (sys_call_table[i] != sys_ni_syscall)
vdso_data->syscall_map_64[i >> 5] |=
0x80000000UL >> (i & 0x1f);
- if (compat_sys_call_table[i] != sys_ni_syscall)
+ if (IS_ENABLED(CONFIG_COMPAT) &&
+ compat_sys_call_table[i] != sys_ni_syscall)
vdso_data->syscall_map_32[i >> 5] |=
0x80000000UL >> (i & 0x1f);
#else /* CONFIG_PPC64 */
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index b7cdcce20280..9b8dc822f531 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -15,7 +15,7 @@
#include <asm/sigcontext.h>
#include <asm/ucontext.h>
#include <asm/vdso.h>
-#ifdef CONFIG_PPC64
+#ifdef CONFIG_COMPAT
#include "../kernel/ppc32.h"
#endif
#include <asm/pte-walk.h>
@@ -268,16 +268,6 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
}
}
-static inline int current_is_64bit(void)
-{
- /*
- * We can't use test_thread_flag() here because we may be on an
- * interrupt stack, and the thread flags don't get copied over
- * from the thread_info on the main stack to the interrupt stack.
- */
- return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
-}
-
#else /* CONFIG_PPC64 */
static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
{
@@ -314,11 +304,6 @@ static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry
{
}
-static inline int current_is_64bit(void)
-{
- return 0;
-}
-
static inline int valid_user_sp(unsigned long sp, int is_64)
{
if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
@@ -334,6 +319,7 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
#endif /* CONFIG_PPC64 */
+#if defined(CONFIG_PPC32) || defined(CONFIG_COMPAT)
/*
* Layout for non-RT signal frames
*/
@@ -475,6 +461,19 @@ static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
sp = next_sp;
}
}
+#endif /* 32bit */
+
+static inline int current_is_64bit(void)
+{
+ if (!IS_ENABLED(CONFIG_COMPAT))
+ return IS_ENABLED(CONFIG_PPC64);
+ /*
+ * We can't use test_thread_flag() here because we may be on an
+ * interrupt stack, and the thread flags don't get copied over
+ * from the thread_info on the main stack to the interrupt stack.
+ */
+ return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
+}
void
perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
--
2.22.0
These functions are required for 64bit as well.
Signed-off-by: Michal Suchanek <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
---
arch/powerpc/kernel/signal.c | 141 ++++++++++++++++++++++++++++++++
arch/powerpc/kernel/signal_32.c | 140 -------------------------------
2 files changed, 141 insertions(+), 140 deletions(-)
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index e6c30cee6abf..60436432399f 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -18,12 +18,153 @@
#include <linux/syscalls.h>
#include <asm/hw_breakpoint.h>
#include <linux/uaccess.h>
+#include <asm/switch_to.h>
#include <asm/unistd.h>
#include <asm/debug.h>
#include <asm/tm.h>
#include "signal.h"
+#ifdef CONFIG_VSX
+unsigned long copy_fpr_to_user(void __user *to,
+ struct task_struct *task)
+{
+ u64 buf[ELF_NFPREG];
+ int i;
+
+ /* save FPR copy to local buffer then write to the thread_struct */
+ for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+ buf[i] = task->thread.TS_FPR(i);
+ buf[i] = task->thread.fp_state.fpscr;
+ return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
+}
+
+unsigned long copy_fpr_from_user(struct task_struct *task,
+ void __user *from)
+{
+ u64 buf[ELF_NFPREG];
+ int i;
+
+ if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
+ return 1;
+ for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+ task->thread.TS_FPR(i) = buf[i];
+ task->thread.fp_state.fpscr = buf[i];
+
+ return 0;
+}
+
+unsigned long copy_vsx_to_user(void __user *to,
+ struct task_struct *task)
+{
+ u64 buf[ELF_NVSRHALFREG];
+ int i;
+
+ /* save FPR copy to local buffer then write to the thread_struct */
+ for (i = 0; i < ELF_NVSRHALFREG; i++)
+ buf[i] = task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET];
+ return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
+}
+
+unsigned long copy_vsx_from_user(struct task_struct *task,
+ void __user *from)
+{
+ u64 buf[ELF_NVSRHALFREG];
+ int i;
+
+ if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
+ return 1;
+ for (i = 0; i < ELF_NVSRHALFREG ; i++)
+ task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+ return 0;
+}
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+unsigned long copy_ckfpr_to_user(void __user *to,
+ struct task_struct *task)
+{
+ u64 buf[ELF_NFPREG];
+ int i;
+
+ /* save FPR copy to local buffer then write to the thread_struct */
+ for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+ buf[i] = task->thread.TS_CKFPR(i);
+ buf[i] = task->thread.ckfp_state.fpscr;
+ return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
+}
+
+unsigned long copy_ckfpr_from_user(struct task_struct *task,
+ void __user *from)
+{
+ u64 buf[ELF_NFPREG];
+ int i;
+
+ if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
+ return 1;
+ for (i = 0; i < (ELF_NFPREG - 1) ; i++)
+ task->thread.TS_CKFPR(i) = buf[i];
+ task->thread.ckfp_state.fpscr = buf[i];
+
+ return 0;
+}
+
+unsigned long copy_ckvsx_to_user(void __user *to,
+ struct task_struct *task)
+{
+ u64 buf[ELF_NVSRHALFREG];
+ int i;
+
+ /* save FPR copy to local buffer then write to the thread_struct */
+ for (i = 0; i < ELF_NVSRHALFREG; i++)
+ buf[i] = task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET];
+ return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
+}
+
+unsigned long copy_ckvsx_from_user(struct task_struct *task,
+ void __user *from)
+{
+ u64 buf[ELF_NVSRHALFREG];
+ int i;
+
+ if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
+ return 1;
+ for (i = 0; i < ELF_NVSRHALFREG ; i++)
+ task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+ return 0;
+}
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+#else
+inline unsigned long copy_fpr_to_user(void __user *to,
+ struct task_struct *task)
+{
+ return __copy_to_user(to, task->thread.fp_state.fpr,
+ ELF_NFPREG * sizeof(double));
+}
+
+inline unsigned long copy_fpr_from_user(struct task_struct *task,
+ void __user *from)
+{
+ return __copy_from_user(task->thread.fp_state.fpr, from,
+ ELF_NFPREG * sizeof(double));
+}
+
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+inline unsigned long copy_ckfpr_to_user(void __user *to,
+ struct task_struct *task)
+{
+ return __copy_to_user(to, task->thread.ckfp_state.fpr,
+ ELF_NFPREG * sizeof(double));
+}
+
+inline unsigned long copy_ckfpr_from_user(struct task_struct *task,
+ void __user *from)
+{
+ return __copy_from_user(task->thread.ckfp_state.fpr, from,
+ ELF_NFPREG * sizeof(double));
+}
+#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
+#endif
+
/* Log an error when sending an unhandled signal to a process. Controlled
* through debug.exception-trace sysctl.
*/
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 98600b276f76..c93c937ea568 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -235,146 +235,6 @@ struct rt_sigframe {
int abigap[56];
};
-#ifdef CONFIG_VSX
-unsigned long copy_fpr_to_user(void __user *to,
- struct task_struct *task)
-{
- u64 buf[ELF_NFPREG];
- int i;
-
- /* save FPR copy to local buffer then write to the thread_struct */
- for (i = 0; i < (ELF_NFPREG - 1) ; i++)
- buf[i] = task->thread.TS_FPR(i);
- buf[i] = task->thread.fp_state.fpscr;
- return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
-}
-
-unsigned long copy_fpr_from_user(struct task_struct *task,
- void __user *from)
-{
- u64 buf[ELF_NFPREG];
- int i;
-
- if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
- return 1;
- for (i = 0; i < (ELF_NFPREG - 1) ; i++)
- task->thread.TS_FPR(i) = buf[i];
- task->thread.fp_state.fpscr = buf[i];
-
- return 0;
-}
-
-unsigned long copy_vsx_to_user(void __user *to,
- struct task_struct *task)
-{
- u64 buf[ELF_NVSRHALFREG];
- int i;
-
- /* save FPR copy to local buffer then write to the thread_struct */
- for (i = 0; i < ELF_NVSRHALFREG; i++)
- buf[i] = task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET];
- return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
-}
-
-unsigned long copy_vsx_from_user(struct task_struct *task,
- void __user *from)
-{
- u64 buf[ELF_NVSRHALFREG];
- int i;
-
- if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
- return 1;
- for (i = 0; i < ELF_NVSRHALFREG ; i++)
- task->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
- return 0;
-}
-
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-unsigned long copy_ckfpr_to_user(void __user *to,
- struct task_struct *task)
-{
- u64 buf[ELF_NFPREG];
- int i;
-
- /* save FPR copy to local buffer then write to the thread_struct */
- for (i = 0; i < (ELF_NFPREG - 1) ; i++)
- buf[i] = task->thread.TS_CKFPR(i);
- buf[i] = task->thread.ckfp_state.fpscr;
- return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
-}
-
-unsigned long copy_ckfpr_from_user(struct task_struct *task,
- void __user *from)
-{
- u64 buf[ELF_NFPREG];
- int i;
-
- if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
- return 1;
- for (i = 0; i < (ELF_NFPREG - 1) ; i++)
- task->thread.TS_CKFPR(i) = buf[i];
- task->thread.ckfp_state.fpscr = buf[i];
-
- return 0;
-}
-
-unsigned long copy_ckvsx_to_user(void __user *to,
- struct task_struct *task)
-{
- u64 buf[ELF_NVSRHALFREG];
- int i;
-
- /* save FPR copy to local buffer then write to the thread_struct */
- for (i = 0; i < ELF_NVSRHALFREG; i++)
- buf[i] = task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET];
- return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
-}
-
-unsigned long copy_ckvsx_from_user(struct task_struct *task,
- void __user *from)
-{
- u64 buf[ELF_NVSRHALFREG];
- int i;
-
- if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
- return 1;
- for (i = 0; i < ELF_NVSRHALFREG ; i++)
- task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
- return 0;
-}
-#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
-#else
-inline unsigned long copy_fpr_to_user(void __user *to,
- struct task_struct *task)
-{
- return __copy_to_user(to, task->thread.fp_state.fpr,
- ELF_NFPREG * sizeof(double));
-}
-
-inline unsigned long copy_fpr_from_user(struct task_struct *task,
- void __user *from)
-{
- return __copy_from_user(task->thread.fp_state.fpr, from,
- ELF_NFPREG * sizeof(double));
-}
-
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-inline unsigned long copy_ckfpr_to_user(void __user *to,
- struct task_struct *task)
-{
- return __copy_to_user(to, task->thread.ckfp_state.fpr,
- ELF_NFPREG * sizeof(double));
-}
-
-inline unsigned long copy_ckfpr_from_user(struct task_struct *task,
- void __user *from)
-{
- return __copy_from_user(task->thread.ckfp_state.fpr, from,
- ELF_NFPREG * sizeof(double));
-}
-#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
-#endif
-
/*
* Save the current user registers on the user stack.
* We only save the altivec/spe registers if the process has used
--
2.22.0
This partially reverts commit caf6f9c8a326 ("asm-generic: Remove
unneeded __ARCH_WANT_SYS_LLSEEK macro")
When CONFIG_COMPAT is disabled on ppc64 the kernel does not build.
There is resistance to both removing the llseek syscall from the 64bit
syscall tables and building the llseek interface unconditionally.
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/lkml/20190829214319.498c7de2@naga/
Signed-off-by: Michal Suchanek <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
---
arch/powerpc/include/asm/unistd.h | 1 +
fs/read_write.c | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index b0720c7c3fcf..700fcdac2e3c 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -31,6 +31,7 @@
#define __ARCH_WANT_SYS_SOCKETCALL
#define __ARCH_WANT_SYS_FADVISE64
#define __ARCH_WANT_SYS_GETPGRP
+#define __ARCH_WANT_SYS_LLSEEK
#define __ARCH_WANT_SYS_NICE
#define __ARCH_WANT_SYS_OLD_GETRLIMIT
#define __ARCH_WANT_SYS_OLD_UNAME
diff --git a/fs/read_write.c b/fs/read_write.c
index 5bbf587f5bc1..89aa2701dbeb 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -331,7 +331,8 @@ COMPAT_SYSCALL_DEFINE3(lseek, unsigned int, fd, compat_off_t, offset, unsigned i
}
#endif
-#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT)
+#if !defined(CONFIG_64BIT) || defined(CONFIG_COMPAT) || \
+ defined(__ARCH_WANT_SYS_LLSEEK)
SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
unsigned long, offset_low, loff_t __user *, result,
unsigned int, whence)
--
2.22.0
Le 30/08/2019 à 23:03, Michal Suchanek a écrit :
> Less code means less bugs so add a knob to skip the compat stuff.
I guess on PPC64 you have Gigabytes of memory and thousands of bogomips,
hence you focus on bugs.
My main focus usually is kernel size and performance, which makes this
series interesting as well.
Anyway, I was wondering, would it make sense (in a following series, not
in this one) to make it buildable as a module, just like some of binfmt ?
Christophe
>
> This is tested on ppc64le top of
>
> https://patchwork.ozlabs.org/cover/1153556/
>
> Changes in v2: saner CONFIG_COMPAT ifdefs
> Changes in v3:
> - change llseek to 32bit instead of builing it unconditionally in fs
> - clanup the makefile conditionals
> - remove some ifdefs or convert to IS_DEFINED where possible
> Changes in v4:
> - cleanup is_32bit_task and current_is_64bit
> - more makefile cleanup
> Changes in v5:
> - more current_is_64bit cleanup
> - split off callchain.c 32bit and 64bit parts
> Changes in v6:
> - cleanup makefile after split
> - consolidate read_user_stack_32
> - fix some checkpatch warnings
> Changes in v7:
> - add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
> - remove leftover hunk
> - add review tags
>
> Michal Suchanek (6):
> powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
> powerpc: move common register copy functions from signal_32.c to
> signal.c
> powerpc/perf: consolidate read_user_stack_32
> powerpc/64: make buildable without CONFIG_COMPAT
> powerpc/64: Make COMPAT user-selectable disabled on littleendian by
> default.
> powerpc/perf: split callchain.c by bitness
>
> arch/powerpc/Kconfig | 5 +-
> arch/powerpc/include/asm/thread_info.h | 4 +-
> arch/powerpc/include/asm/unistd.h | 1 +
> arch/powerpc/kernel/Makefile | 7 +-
> arch/powerpc/kernel/entry_64.S | 2 +
> arch/powerpc/kernel/signal.c | 144 +++++++++-
> arch/powerpc/kernel/signal_32.c | 140 ---------
> arch/powerpc/kernel/syscall_64.c | 6 +-
> arch/powerpc/kernel/vdso.c | 5 +-
> arch/powerpc/perf/Makefile | 5 +-
> arch/powerpc/perf/callchain.c | 377 +------------------------
> arch/powerpc/perf/callchain.h | 11 +
> arch/powerpc/perf/callchain_32.c | 204 +++++++++++++
> arch/powerpc/perf/callchain_64.c | 185 ++++++++++++
> fs/read_write.c | 3 +-
> 15 files changed, 566 insertions(+), 533 deletions(-)
> create mode 100644 arch/powerpc/perf/callchain.h
> create mode 100644 arch/powerpc/perf/callchain_32.c
> create mode 100644 arch/powerpc/perf/callchain_64.c
>
Le 30/08/2019 à 23:03, Michal Suchanek a écrit :
> There are numerous references to 32bit functions in generic and 64bit
> code so ifdef them out.
>
> Signed-off-by: Michal Suchanek <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
> ---
> v2:
> - fix 32bit ifdef condition in signal.c
> - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> v3:
> - use IS_ENABLED and maybe_unused where possible
> - do not ifdef declarations
> - clean up Makefile
> v4:
> - further makefile cleanup
> - simplify is_32bit_task conditions
> - avoid ifdef in condition by using return
> v5:
> - avoid unreachable code on 32bit
> - make is_current_64bit constant on !COMPAT
> - add stub perf_callchain_user_32 to avoid some ifdefs
> v6:
> - consolidate current_is_64bit
> v7:
> - remove leftover perf_callchain_user_32 stub from previous series version
> ---
> arch/powerpc/include/asm/thread_info.h | 4 ++--
> arch/powerpc/kernel/Makefile | 7 +++---
> arch/powerpc/kernel/entry_64.S | 2 ++
> arch/powerpc/kernel/signal.c | 3 +--
> arch/powerpc/kernel/syscall_64.c | 6 ++---
> arch/powerpc/kernel/vdso.c | 5 ++---
> arch/powerpc/perf/callchain.c | 31 +++++++++++++-------------
> 7 files changed, 27 insertions(+), 31 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
> index 8e1d0195ac36..c128d8a48ea3 100644
> --- a/arch/powerpc/include/asm/thread_info.h
> +++ b/arch/powerpc/include/asm/thread_info.h
> @@ -144,10 +144,10 @@ static inline bool test_thread_local_flags(unsigned int flags)
> return (ti->local_flags & flags) != 0;
> }
>
> -#ifdef CONFIG_PPC64
> +#ifdef CONFIG_COMPAT
> #define is_32bit_task() (test_thread_flag(TIF_32BIT))
> #else
> -#define is_32bit_task() (1)
> +#define is_32bit_task() (IS_ENABLED(CONFIG_PPC32))
> #endif
>
> #if defined(CONFIG_PPC64)
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 1d646a94d96c..9d8772e863b9 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -44,16 +44,15 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
> endif
>
> obj-y := cputable.o ptrace.o syscalls.o \
> - irq.o align.o signal_32.o pmc.o vdso.o \
> + irq.o align.o signal_$(BITS).o pmc.o vdso.o \
> process.o systbl.o idle.o \
> signal.o sysfs.o cacheinfo.o time.o \
> prom.o traps.o setup-common.o \
> udbg.o misc.o io.o misc_$(BITS).o \
> of_platform.o prom_parse.o
> -obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \
> - signal_64.o ptrace32.o \
> - paca.o nvram_64.o firmware.o \
> +obj-$(CONFIG_PPC64) += setup_64.o paca.o nvram_64.o firmware.o \
> syscall_64.o
> +obj-$(CONFIG_COMPAT) += sys_ppc32.o ptrace32.o signal_32.o
> obj-$(CONFIG_VDSO32) += vdso32/
> obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o
> obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 2ec825a85f5b..a2dbf216f607 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -51,8 +51,10 @@
> SYS_CALL_TABLE:
> .tc sys_call_table[TC],sys_call_table
>
> +#ifdef CONFIG_COMPAT
> COMPAT_SYS_CALL_TABLE:
> .tc compat_sys_call_table[TC],compat_sys_call_table
> +#endif
>
> /* This value is used to mark exception frames on the stack. */
> exception_marker:
> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> index 60436432399f..61678cb0e6a1 100644
> --- a/arch/powerpc/kernel/signal.c
> +++ b/arch/powerpc/kernel/signal.c
> @@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
> sigset_t *oldset = sigmask_to_save();
> struct ksignal ksig = { .sig = 0 };
> int ret;
> - int is32 = is_32bit_task();
>
> BUG_ON(tsk != current);
>
> @@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
>
> rseq_signal_deliver(&ksig, tsk->thread.regs);
>
> - if (is32) {
> + if (is_32bit_task()) {
> if (ksig.ka.sa.sa_flags & SA_SIGINFO)
> ret = handle_rt_signal32(&ksig, oldset, tsk);
> else
> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> index 98ed970796d5..0d5cbbe54cf1 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -38,7 +38,6 @@ typedef long (*syscall_fn)(long, long, long, long, long, long);
>
> long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs)
> {
> - unsigned long ti_flags;
> syscall_fn f;
>
> BUG_ON(!(regs->msr & MSR_PR));
> @@ -83,8 +82,7 @@ long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8,
> */
> regs->softe = IRQS_ENABLED;
>
> - ti_flags = current_thread_info()->flags;
> - if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
> + if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
> /*
> * We use the return value of do_syscall_trace_enter() as the
> * syscall number. If the syscall was rejected for any reason
> @@ -100,7 +98,7 @@ long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8,
> /* May be faster to do array_index_nospec? */
> barrier_nospec();
>
> - if (unlikely(ti_flags & _TIF_32BIT)) {
> + if (unlikely(is_32bit_task())) {
> f = (void *)compat_sys_call_table[r0];
>
> r3 &= 0x00000000ffffffffULL;
> diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
> index d60598113a9f..6d4a077f74d6 100644
> --- a/arch/powerpc/kernel/vdso.c
> +++ b/arch/powerpc/kernel/vdso.c
> @@ -667,9 +667,7 @@ static void __init vdso_setup_syscall_map(void)
> {
> unsigned int i;
> extern unsigned long *sys_call_table;
> -#ifdef CONFIG_PPC64
> extern unsigned long *compat_sys_call_table;
> -#endif
> extern unsigned long sys_ni_syscall;
>
>
> @@ -678,7 +676,8 @@ static void __init vdso_setup_syscall_map(void)
> if (sys_call_table[i] != sys_ni_syscall)
> vdso_data->syscall_map_64[i >> 5] |=
> 0x80000000UL >> (i & 0x1f);
> - if (compat_sys_call_table[i] != sys_ni_syscall)
> + if (IS_ENABLED(CONFIG_COMPAT) &&
> + compat_sys_call_table[i] != sys_ni_syscall)
> vdso_data->syscall_map_32[i >> 5] |=
> 0x80000000UL >> (i & 0x1f);
> #else /* CONFIG_PPC64 */
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index b7cdcce20280..9b8dc822f531 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -15,7 +15,7 @@
> #include <asm/sigcontext.h>
> #include <asm/ucontext.h>
> #include <asm/vdso.h>
> -#ifdef CONFIG_PPC64
> +#ifdef CONFIG_COMPAT
> #include "../kernel/ppc32.h"
> #endif
> #include <asm/pte-walk.h>
> @@ -268,16 +268,6 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> }
> }
>
> -static inline int current_is_64bit(void)
> -{
> - /*
> - * We can't use test_thread_flag() here because we may be on an
> - * interrupt stack, and the thread flags don't get copied over
> - * from the thread_info on the main stack to the interrupt stack.
> - */
> - return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> -}
> -
> #else /* CONFIG_PPC64 */
> static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> {
> @@ -314,11 +304,6 @@ static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry
> {
> }
>
> -static inline int current_is_64bit(void)
> -{
> - return 0;
> -}
> -
> static inline int valid_user_sp(unsigned long sp, int is_64)
> {
> if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
> @@ -334,6 +319,7 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
>
> #endif /* CONFIG_PPC64 */
>
> +#if defined(CONFIG_PPC32) || defined(CONFIG_COMPAT)
> /*
> * Layout for non-RT signal frames
> */
> @@ -475,6 +461,19 @@ static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
> sp = next_sp;
> }
> }
> +#endif /* 32bit */
> +
> +static inline int current_is_64bit(void)
> +{
> + if (!IS_ENABLED(CONFIG_COMPAT))
> + return IS_ENABLED(CONFIG_PPC64);
> + /*
> + * We can't use test_thread_flag() here because we may be on an
> + * interrupt stack, and the thread flags don't get copied over
> + * from the thread_info on the main stack to the interrupt stack.
> + */
> + return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> +}
>
> void
> perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
>
On Fri, 30 Aug 2019 23:03:41 +0200
Michal Suchanek <[email protected]> wrote:
> There are numerous references to 32bit functions in generic and 64bit
> code so ifdef them out.
>
> Signed-off-by: Michal Suchanek <[email protected]>
> ---
> v2:
> - fix 32bit ifdef condition in signal.c
> - simplify the compat ifdef condition in vdso.c - 64bit is redundant
> - simplify the compat ifdef condition in callchain.c - 64bit is redundant
> v3:
> - use IS_ENABLED and maybe_unused where possible
> - do not ifdef declarations
> - clean up Makefile
> v4:
> - further makefile cleanup
> - simplify is_32bit_task conditions
> - avoid ifdef in condition by using return
> v5:
> - avoid unreachable code on 32bit
> - make is_current_64bit constant on !COMPAT
> - add stub perf_callchain_user_32 to avoid some ifdefs
> v6:
> - consolidate current_is_64bit
> v7:
> - remove leftover perf_callchain_user_32 stub from previous series version
removed too much stuff here
> ---
> arch/powerpc/include/asm/thread_info.h | 4 ++--
> arch/powerpc/kernel/Makefile | 7 +++---
> arch/powerpc/kernel/entry_64.S | 2 ++
> arch/powerpc/kernel/signal.c | 3 +--
> arch/powerpc/kernel/syscall_64.c | 6 ++---
> arch/powerpc/kernel/vdso.c | 5 ++---
> arch/powerpc/perf/callchain.c | 31 +++++++++++++-------------
> 7 files changed, 27 insertions(+), 31 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
> index 8e1d0195ac36..c128d8a48ea3 100644
> --- a/arch/powerpc/include/asm/thread_info.h
> +++ b/arch/powerpc/include/asm/thread_info.h
> @@ -144,10 +144,10 @@ static inline bool test_thread_local_flags(unsigned int flags)
> return (ti->local_flags & flags) != 0;
> }
>
> -#ifdef CONFIG_PPC64
> +#ifdef CONFIG_COMPAT
> #define is_32bit_task() (test_thread_flag(TIF_32BIT))
> #else
> -#define is_32bit_task() (1)
> +#define is_32bit_task() (IS_ENABLED(CONFIG_PPC32))
> #endif
>
> #if defined(CONFIG_PPC64)
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 1d646a94d96c..9d8772e863b9 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -44,16 +44,15 @@ CFLAGS_btext.o += -DDISABLE_BRANCH_PROFILING
> endif
>
> obj-y := cputable.o ptrace.o syscalls.o \
> - irq.o align.o signal_32.o pmc.o vdso.o \
> + irq.o align.o signal_$(BITS).o pmc.o vdso.o \
> process.o systbl.o idle.o \
> signal.o sysfs.o cacheinfo.o time.o \
> prom.o traps.o setup-common.o \
> udbg.o misc.o io.o misc_$(BITS).o \
> of_platform.o prom_parse.o
> -obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \
> - signal_64.o ptrace32.o \
> - paca.o nvram_64.o firmware.o \
> +obj-$(CONFIG_PPC64) += setup_64.o paca.o nvram_64.o firmware.o \
> syscall_64.o
> +obj-$(CONFIG_COMPAT) += sys_ppc32.o ptrace32.o signal_32.o
> obj-$(CONFIG_VDSO32) += vdso32/
> obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o
> obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 2ec825a85f5b..a2dbf216f607 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -51,8 +51,10 @@
> SYS_CALL_TABLE:
> .tc sys_call_table[TC],sys_call_table
>
> +#ifdef CONFIG_COMPAT
> COMPAT_SYS_CALL_TABLE:
> .tc compat_sys_call_table[TC],compat_sys_call_table
> +#endif
>
> /* This value is used to mark exception frames on the stack. */
> exception_marker:
> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> index 60436432399f..61678cb0e6a1 100644
> --- a/arch/powerpc/kernel/signal.c
> +++ b/arch/powerpc/kernel/signal.c
> @@ -247,7 +247,6 @@ static void do_signal(struct task_struct *tsk)
> sigset_t *oldset = sigmask_to_save();
> struct ksignal ksig = { .sig = 0 };
> int ret;
> - int is32 = is_32bit_task();
>
> BUG_ON(tsk != current);
>
> @@ -277,7 +276,7 @@ static void do_signal(struct task_struct *tsk)
>
> rseq_signal_deliver(&ksig, tsk->thread.regs);
>
> - if (is32) {
> + if (is_32bit_task()) {
> if (ksig.ka.sa.sa_flags & SA_SIGINFO)
> ret = handle_rt_signal32(&ksig, oldset, tsk);
> else
> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> index 98ed970796d5..0d5cbbe54cf1 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -38,7 +38,6 @@ typedef long (*syscall_fn)(long, long, long, long, long, long);
>
> long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs)
> {
> - unsigned long ti_flags;
> syscall_fn f;
>
> BUG_ON(!(regs->msr & MSR_PR));
> @@ -83,8 +82,7 @@ long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8,
> */
> regs->softe = IRQS_ENABLED;
>
> - ti_flags = current_thread_info()->flags;
> - if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
> + if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) {
> /*
> * We use the return value of do_syscall_trace_enter() as the
> * syscall number. If the syscall was rejected for any reason
> @@ -100,7 +98,7 @@ long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8,
> /* May be faster to do array_index_nospec? */
> barrier_nospec();
>
> - if (unlikely(ti_flags & _TIF_32BIT)) {
> + if (unlikely(is_32bit_task())) {
> f = (void *)compat_sys_call_table[r0];
>
> r3 &= 0x00000000ffffffffULL;
> diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
> index d60598113a9f..6d4a077f74d6 100644
> --- a/arch/powerpc/kernel/vdso.c
> +++ b/arch/powerpc/kernel/vdso.c
> @@ -667,9 +667,7 @@ static void __init vdso_setup_syscall_map(void)
> {
> unsigned int i;
> extern unsigned long *sys_call_table;
> -#ifdef CONFIG_PPC64
> extern unsigned long *compat_sys_call_table;
> -#endif
> extern unsigned long sys_ni_syscall;
>
>
> @@ -678,7 +676,8 @@ static void __init vdso_setup_syscall_map(void)
> if (sys_call_table[i] != sys_ni_syscall)
> vdso_data->syscall_map_64[i >> 5] |=
> 0x80000000UL >> (i & 0x1f);
> - if (compat_sys_call_table[i] != sys_ni_syscall)
> + if (IS_ENABLED(CONFIG_COMPAT) &&
> + compat_sys_call_table[i] != sys_ni_syscall)
> vdso_data->syscall_map_32[i >> 5] |=
> 0x80000000UL >> (i & 0x1f);
> #else /* CONFIG_PPC64 */
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index b7cdcce20280..9b8dc822f531 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -15,7 +15,7 @@
> #include <asm/sigcontext.h>
> #include <asm/ucontext.h>
> #include <asm/vdso.h>
> -#ifdef CONFIG_PPC64
> +#ifdef CONFIG_COMPAT
> #include "../kernel/ppc32.h"
> #endif
> #include <asm/pte-walk.h>
> @@ -268,16 +268,6 @@ static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> }
> }
>
> -static inline int current_is_64bit(void)
> -{
> - /*
> - * We can't use test_thread_flag() here because we may be on an
> - * interrupt stack, and the thread flags don't get copied over
> - * from the thread_info on the main stack to the interrupt stack.
> - */
> - return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> -}
> -
> #else /* CONFIG_PPC64 */
> static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> {
> @@ -314,11 +304,6 @@ static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry
> {
> }
>
> -static inline int current_is_64bit(void)
> -{
> - return 0;
> -}
> -
> static inline int valid_user_sp(unsigned long sp, int is_64)
> {
> if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
> @@ -334,6 +319,7 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
>
> #endif /* CONFIG_PPC64 */
>
> +#if defined(CONFIG_PPC32) || defined(CONFIG_COMPAT)
> /*
> * Layout for non-RT signal frames
> */
> @@ -475,6 +461,19 @@ static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
> sp = next_sp;
> }
> }
> +#endif /* 32bit */
> +
> +static inline int current_is_64bit(void)
> +{
> + if (!IS_ENABLED(CONFIG_COMPAT))
> + return IS_ENABLED(CONFIG_PPC64);
> + /*
> + * We can't use test_thread_flag() here because we may be on an
> + * interrupt stack, and the thread flags don't get copied over
> + * from the thread_info on the main stack to the interrupt stack.
> + */
> + return !test_ti_thread_flag(task_thread_info(current), TIF_32BIT);
> +}
>
> void
> perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
On Sat, 31 Aug 2019 08:41:58 +0200
Christophe Leroy <[email protected]> wrote:
> Le 30/08/2019 à 23:03, Michal Suchanek a écrit :
> > Less code means less bugs so add a knob to skip the compat stuff.
>
> I guess on PPC64 you have Gigabytes of memory and thousands of bogomips,
> hence you focus on bugs.
>
> My main focus usually is kernel size and performance, which makes this
> series interesting as well.
>
> Anyway, I was wondering, would it make sense (in a following series, not
> in this one) to make it buildable as a module, just like some of binfmt ?
I think not.
You have the case when 32bit support is not optional because you are
32bit and when it is optional because you are 64bit. These cases either
diverge or the 32bit case suffers the penalty of some indirection that
makes the functionality loadable.
The indirection requires some synchronization so the 32bit code cannot
be unloaded while you are trying to call it, and possibly counting
32bit tasks so you will know when you can unload the 32bit code again.
This would add more code which benefits neither performance nor size
nor reliability.
Also you can presumably run 32bit code with binfmt-misc already but
don't get stuff like perf counters handled in the kernel because it is
not native code anymore.
Thanks
Michal
>
> Christophe
>
> >
> > This is tested on ppc64le top of
> >
> > https://patchwork.ozlabs.org/cover/1153556/
> >
> > Changes in v2: saner CONFIG_COMPAT ifdefs
> > Changes in v3:
> > - change llseek to 32bit instead of builing it unconditionally in fs
> > - clanup the makefile conditionals
> > - remove some ifdefs or convert to IS_DEFINED where possible
> > Changes in v4:
> > - cleanup is_32bit_task and current_is_64bit
> > - more makefile cleanup
> > Changes in v5:
> > - more current_is_64bit cleanup
> > - split off callchain.c 32bit and 64bit parts
> > Changes in v6:
> > - cleanup makefile after split
> > - consolidate read_user_stack_32
> > - fix some checkpatch warnings
> > Changes in v7:
> > - add back __ARCH_WANT_SYS_LLSEEK to fix build with llseek
> > - remove leftover hunk
> > - add review tags
> >
> > Michal Suchanek (6):
> > powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
> > powerpc: move common register copy functions from signal_32.c to
> > signal.c
> > powerpc/perf: consolidate read_user_stack_32
> > powerpc/64: make buildable without CONFIG_COMPAT
> > powerpc/64: Make COMPAT user-selectable disabled on littleendian by
> > default.
> > powerpc/perf: split callchain.c by bitness
> >
> > arch/powerpc/Kconfig | 5 +-
> > arch/powerpc/include/asm/thread_info.h | 4 +-
> > arch/powerpc/include/asm/unistd.h | 1 +
> > arch/powerpc/kernel/Makefile | 7 +-
> > arch/powerpc/kernel/entry_64.S | 2 +
> > arch/powerpc/kernel/signal.c | 144 +++++++++-
> > arch/powerpc/kernel/signal_32.c | 140 ---------
> > arch/powerpc/kernel/syscall_64.c | 6 +-
> > arch/powerpc/kernel/vdso.c | 5 +-
> > arch/powerpc/perf/Makefile | 5 +-
> > arch/powerpc/perf/callchain.c | 377 +------------------------
> > arch/powerpc/perf/callchain.h | 11 +
> > arch/powerpc/perf/callchain_32.c | 204 +++++++++++++
> > arch/powerpc/perf/callchain_64.c | 185 ++++++++++++
> > fs/read_write.c | 3 +-
> > 15 files changed, 566 insertions(+), 533 deletions(-)
> > create mode 100644 arch/powerpc/perf/callchain.h
> > create mode 100644 arch/powerpc/perf/callchain_32.c
> > create mode 100644 arch/powerpc/perf/callchain_64.c
> >
On Fri, 30 Aug 2019 23:03:43 +0200
Michal Suchanek <[email protected]> wrote:
> Building callchain.c with !COMPAT proved quite ugly with all the
> defines. Splitting out the 32bit and 64bit parts looks better.
>
> No code change intended.
valid_user_sp is broken in compat. It needs to be ifdefed for the 32bit
case.
>
> Signed-off-by: Michal Suchanek <[email protected]>
> Reviewed-by: Christophe Leroy <[email protected]>
> ---
> v6:
> - move current_is_64bit consolidetaion to earlier patch
> - move defines to the top of callchain_32.c
> - Makefile cleanup
> ---
> arch/powerpc/perf/Makefile | 5 +-
> arch/powerpc/perf/callchain.c | 365 +------------------------------
> arch/powerpc/perf/callchain.h | 11 +
> arch/powerpc/perf/callchain_32.c | 204 +++++++++++++++++
> arch/powerpc/perf/callchain_64.c | 185 ++++++++++++++++
> 5 files changed, 405 insertions(+), 365 deletions(-)
> create mode 100644 arch/powerpc/perf/callchain.h
> create mode 100644 arch/powerpc/perf/callchain_32.c
> create mode 100644 arch/powerpc/perf/callchain_64.c
>
> diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
> index c155dcbb8691..53d614e98537 100644
> --- a/arch/powerpc/perf/Makefile
> +++ b/arch/powerpc/perf/Makefile
> @@ -1,6 +1,9 @@
> # SPDX-License-Identifier: GPL-2.0
>
> -obj-$(CONFIG_PERF_EVENTS) += callchain.o perf_regs.o
> +obj-$(CONFIG_PERF_EVENTS) += callchain.o callchain_$(BITS).o perf_regs.o
> +ifdef CONFIG_COMPAT
> +obj-$(CONFIG_PERF_EVENTS) += callchain_32.o
> +endif
>
> obj-$(CONFIG_PPC_PERF_CTRS) += core-book3s.o bhrb.o
> obj64-$(CONFIG_PPC_PERF_CTRS) += ppc970-pmu.o power5-pmu.o \
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index 9b8dc822f531..8f30f1b47c78 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -15,11 +15,9 @@
> #include <asm/sigcontext.h>
> #include <asm/ucontext.h>
> #include <asm/vdso.h>
> -#ifdef CONFIG_COMPAT
> -#include "../kernel/ppc32.h"
> -#endif
> #include <asm/pte-walk.h>
>
> +#include "callchain.h"
>
> /*
> * Is sp valid as the address of the next kernel stack frame after prev_sp?
> @@ -102,367 +100,6 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
> }
> }
>
> -#ifdef CONFIG_PPC64
> -/*
> - * On 64-bit we don't want to invoke hash_page on user addresses from
> - * interrupt context, so if the access faults, we read the page tables
> - * to find which page (if any) is mapped and access it directly.
> - */
> -static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> -{
> - int ret = -EFAULT;
> - pgd_t *pgdir;
> - pte_t *ptep, pte;
> - unsigned shift;
> - unsigned long addr = (unsigned long) ptr;
> - unsigned long offset;
> - unsigned long pfn, flags;
> - void *kaddr;
> -
> - pgdir = current->mm->pgd;
> - if (!pgdir)
> - return -EFAULT;
> -
> - local_irq_save(flags);
> - ptep = find_current_mm_pte(pgdir, addr, NULL, &shift);
> - if (!ptep)
> - goto err_out;
> - if (!shift)
> - shift = PAGE_SHIFT;
> -
> - /* align address to page boundary */
> - offset = addr & ((1UL << shift) - 1);
> -
> - pte = READ_ONCE(*ptep);
> - if (!pte_present(pte) || !pte_user(pte))
> - goto err_out;
> - pfn = pte_pfn(pte);
> - if (!page_is_ram(pfn))
> - goto err_out;
> -
> - /* no highmem to worry about here */
> - kaddr = pfn_to_kaddr(pfn);
> - memcpy(buf, kaddr + offset, nb);
> - ret = 0;
> -err_out:
> - local_irq_restore(flags);
> - return ret;
> -}
> -
> -static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
> -{
> - if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
> - ((unsigned long)ptr & 7))
> - return -EFAULT;
> -
> - pagefault_disable();
> - if (!__get_user_inatomic(*ret, ptr)) {
> - pagefault_enable();
> - return 0;
> - }
> - pagefault_enable();
> -
> - return read_user_stack_slow(ptr, ret, 8);
> -}
> -
> -static inline int valid_user_sp(unsigned long sp, int is_64)
> -{
> - if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
> - return 0;
> - return 1;
> -}
> -
> -/*
> - * 64-bit user processes use the same stack frame for RT and non-RT signals.
> - */
> -struct signal_frame_64 {
> - char dummy[__SIGNAL_FRAMESIZE];
> - struct ucontext uc;
> - unsigned long unused[2];
> - unsigned int tramp[6];
> - struct siginfo *pinfo;
> - void *puc;
> - struct siginfo info;
> - char abigap[288];
> -};
> -
> -static int is_sigreturn_64_address(unsigned long nip, unsigned long fp)
> -{
> - if (nip == fp + offsetof(struct signal_frame_64, tramp))
> - return 1;
> - if (vdso64_rt_sigtramp && current->mm->context.vdso_base &&
> - nip == current->mm->context.vdso_base + vdso64_rt_sigtramp)
> - return 1;
> - return 0;
> -}
> -
> -/*
> - * Do some sanity checking on the signal frame pointed to by sp.
> - * We check the pinfo and puc pointers in the frame.
> - */
> -static int sane_signal_64_frame(unsigned long sp)
> -{
> - struct signal_frame_64 __user *sf;
> - unsigned long pinfo, puc;
> -
> - sf = (struct signal_frame_64 __user *) sp;
> - if (read_user_stack_64((unsigned long __user *) &sf->pinfo, &pinfo) ||
> - read_user_stack_64((unsigned long __user *) &sf->puc, &puc))
> - return 0;
> - return pinfo == (unsigned long) &sf->info &&
> - puc == (unsigned long) &sf->uc;
> -}
> -
> -static void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> - struct pt_regs *regs)
> -{
> - unsigned long sp, next_sp;
> - unsigned long next_ip;
> - unsigned long lr;
> - long level = 0;
> - struct signal_frame_64 __user *sigframe;
> - unsigned long __user *fp, *uregs;
> -
> - next_ip = perf_instruction_pointer(regs);
> - lr = regs->link;
> - sp = regs->gpr[1];
> - perf_callchain_store(entry, next_ip);
> -
> - while (entry->nr < entry->max_stack) {
> - fp = (unsigned long __user *) sp;
> - if (!valid_user_sp(sp, 1) || read_user_stack_64(fp, &next_sp))
> - return;
> - if (level > 0 && read_user_stack_64(&fp[2], &next_ip))
> - return;
> -
> - /*
> - * Note: the next_sp - sp >= signal frame size check
> - * is true when next_sp < sp, which can happen when
> - * transitioning from an alternate signal stack to the
> - * normal stack.
> - */
> - if (next_sp - sp >= sizeof(struct signal_frame_64) &&
> - (is_sigreturn_64_address(next_ip, sp) ||
> - (level <= 1 && is_sigreturn_64_address(lr, sp))) &&
> - sane_signal_64_frame(sp)) {
> - /*
> - * This looks like an signal frame
> - */
> - sigframe = (struct signal_frame_64 __user *) sp;
> - uregs = sigframe->uc.uc_mcontext.gp_regs;
> - if (read_user_stack_64(&uregs[PT_NIP], &next_ip) ||
> - read_user_stack_64(&uregs[PT_LNK], &lr) ||
> - read_user_stack_64(&uregs[PT_R1], &sp))
> - return;
> - level = 0;
> - perf_callchain_store_context(entry, PERF_CONTEXT_USER);
> - perf_callchain_store(entry, next_ip);
> - continue;
> - }
> -
> - if (level == 0)
> - next_ip = lr;
> - perf_callchain_store(entry, next_ip);
> - ++level;
> - sp = next_sp;
> - }
> -}
> -
> -#else /* CONFIG_PPC64 */
> -static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> -{
> - return 0;
> -}
> -#endif /* CONFIG_PPC64 */
> -
> -/*
> - * On 32-bit we just access the address and let hash_page create a
> - * HPTE if necessary, so there is no need to fall back to reading
> - * the page tables. Since this is called at interrupt level,
> - * do_page_fault() won't treat a DSI as a page fault.
> - */
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> -{
> - int rc;
> -
> - if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> - ((unsigned long)ptr & 3))
> - return -EFAULT;
> -
> - pagefault_disable();
> - rc = __get_user_inatomic(*ret, ptr);
> - pagefault_enable();
> -
> - if (IS_ENABLED(CONFIG_PPC64) && rc)
> - return read_user_stack_slow(ptr, ret, 4);
> - return rc;
> -}
> -
> -#ifndef CONFIG_PPC64
> -static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> - struct pt_regs *regs)
> -{
> -}
> -
> -static inline int valid_user_sp(unsigned long sp, int is_64)
> -{
> - if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
> - return 0;
> - return 1;
> -}
> -
> -#define __SIGNAL_FRAMESIZE32 __SIGNAL_FRAMESIZE
> -#define sigcontext32 sigcontext
> -#define mcontext32 mcontext
> -#define ucontext32 ucontext
> -#define compat_siginfo_t struct siginfo
> -
> -#endif /* CONFIG_PPC64 */
> -
> -#if defined(CONFIG_PPC32) || defined(CONFIG_COMPAT)
> -/*
> - * Layout for non-RT signal frames
> - */
> -struct signal_frame_32 {
> - char dummy[__SIGNAL_FRAMESIZE32];
> - struct sigcontext32 sctx;
> - struct mcontext32 mctx;
> - int abigap[56];
> -};
> -
> -/*
> - * Layout for RT signal frames
> - */
> -struct rt_signal_frame_32 {
> - char dummy[__SIGNAL_FRAMESIZE32 + 16];
> - compat_siginfo_t info;
> - struct ucontext32 uc;
> - int abigap[56];
> -};
> -
> -static int is_sigreturn_32_address(unsigned int nip, unsigned int fp)
> -{
> - if (nip == fp + offsetof(struct signal_frame_32, mctx.mc_pad))
> - return 1;
> - if (vdso32_sigtramp && current->mm->context.vdso_base &&
> - nip == current->mm->context.vdso_base + vdso32_sigtramp)
> - return 1;
> - return 0;
> -}
> -
> -static int is_rt_sigreturn_32_address(unsigned int nip, unsigned int fp)
> -{
> - if (nip == fp + offsetof(struct rt_signal_frame_32,
> - uc.uc_mcontext.mc_pad))
> - return 1;
> - if (vdso32_rt_sigtramp && current->mm->context.vdso_base &&
> - nip == current->mm->context.vdso_base + vdso32_rt_sigtramp)
> - return 1;
> - return 0;
> -}
> -
> -static int sane_signal_32_frame(unsigned int sp)
> -{
> - struct signal_frame_32 __user *sf;
> - unsigned int regs;
> -
> - sf = (struct signal_frame_32 __user *) (unsigned long) sp;
> - if (read_user_stack_32((unsigned int __user *) &sf->sctx.regs, ®s))
> - return 0;
> - return regs == (unsigned long) &sf->mctx;
> -}
> -
> -static int sane_rt_signal_32_frame(unsigned int sp)
> -{
> - struct rt_signal_frame_32 __user *sf;
> - unsigned int regs;
> -
> - sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
> - if (read_user_stack_32((unsigned int __user *) &sf->uc.uc_regs, ®s))
> - return 0;
> - return regs == (unsigned long) &sf->uc.uc_mcontext;
> -}
> -
> -static unsigned int __user *signal_frame_32_regs(unsigned int sp,
> - unsigned int next_sp, unsigned int next_ip)
> -{
> - struct mcontext32 __user *mctx = NULL;
> - struct signal_frame_32 __user *sf;
> - struct rt_signal_frame_32 __user *rt_sf;
> -
> - /*
> - * Note: the next_sp - sp >= signal frame size check
> - * is true when next_sp < sp, for example, when
> - * transitioning from an alternate signal stack to the
> - * normal stack.
> - */
> - if (next_sp - sp >= sizeof(struct signal_frame_32) &&
> - is_sigreturn_32_address(next_ip, sp) &&
> - sane_signal_32_frame(sp)) {
> - sf = (struct signal_frame_32 __user *) (unsigned long) sp;
> - mctx = &sf->mctx;
> - }
> -
> - if (!mctx && next_sp - sp >= sizeof(struct rt_signal_frame_32) &&
> - is_rt_sigreturn_32_address(next_ip, sp) &&
> - sane_rt_signal_32_frame(sp)) {
> - rt_sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
> - mctx = &rt_sf->uc.uc_mcontext;
> - }
> -
> - if (!mctx)
> - return NULL;
> - return mctx->mc_gregs;
> -}
> -
> -static void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
> - struct pt_regs *regs)
> -{
> - unsigned int sp, next_sp;
> - unsigned int next_ip;
> - unsigned int lr;
> - long level = 0;
> - unsigned int __user *fp, *uregs;
> -
> - next_ip = perf_instruction_pointer(regs);
> - lr = regs->link;
> - sp = regs->gpr[1];
> - perf_callchain_store(entry, next_ip);
> -
> - while (entry->nr < entry->max_stack) {
> - fp = (unsigned int __user *) (unsigned long) sp;
> - if (!valid_user_sp(sp, 0) || read_user_stack_32(fp, &next_sp))
> - return;
> - if (level > 0 && read_user_stack_32(&fp[1], &next_ip))
> - return;
> -
> - uregs = signal_frame_32_regs(sp, next_sp, next_ip);
> - if (!uregs && level <= 1)
> - uregs = signal_frame_32_regs(sp, next_sp, lr);
> - if (uregs) {
> - /*
> - * This looks like an signal frame, so restart
> - * the stack trace with the values in it.
> - */
> - if (read_user_stack_32(&uregs[PT_NIP], &next_ip) ||
> - read_user_stack_32(&uregs[PT_LNK], &lr) ||
> - read_user_stack_32(&uregs[PT_R1], &sp))
> - return;
> - level = 0;
> - perf_callchain_store_context(entry, PERF_CONTEXT_USER);
> - perf_callchain_store(entry, next_ip);
> - continue;
> - }
> -
> - if (level == 0)
> - next_ip = lr;
> - perf_callchain_store(entry, next_ip);
> - ++level;
> - sp = next_sp;
> - }
> -}
> -#endif /* 32bit */
> -
> static inline int current_is_64bit(void)
> {
> if (!IS_ENABLED(CONFIG_COMPAT))
> diff --git a/arch/powerpc/perf/callchain.h b/arch/powerpc/perf/callchain.h
> new file mode 100644
> index 000000000000..63ffb43f3668
> --- /dev/null
> +++ b/arch/powerpc/perf/callchain.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef _POWERPC_PERF_CALLCHAIN_H
> +#define _POWERPC_PERF_CALLCHAIN_H
> +
> +int read_user_stack_slow(void __user *ptr, void *buf, int nb);
> +void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> + struct pt_regs *regs);
> +void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
> + struct pt_regs *regs);
> +
> +#endif /* _POWERPC_PERF_CALLCHAIN_H */
> diff --git a/arch/powerpc/perf/callchain_32.c b/arch/powerpc/perf/callchain_32.c
> new file mode 100644
> index 000000000000..01a38d929078
> --- /dev/null
> +++ b/arch/powerpc/perf/callchain_32.c
> @@ -0,0 +1,204 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Performance counter callchain support - powerpc architecture code
> + *
> + * Copyright © 2009 Paul Mackerras, IBM Corporation.
> + */
> +#include <linux/kernel.h>
> +#include <linux/sched.h>
> +#include <linux/perf_event.h>
> +#include <linux/percpu.h>
> +#include <linux/uaccess.h>
> +#include <linux/mm.h>
> +#include <asm/ptrace.h>
> +#include <asm/pgtable.h>
> +#include <asm/sigcontext.h>
> +#include <asm/ucontext.h>
> +#include <asm/vdso.h>
> +#include <asm/pte-walk.h>
> +
> +#include "callchain.h"
> +
> +#ifdef CONFIG_PPC64
> +#include "../kernel/ppc32.h"
> +#else /* CONFIG_PPC64 */
> +
> +#define __SIGNAL_FRAMESIZE32 __SIGNAL_FRAMESIZE
> +#define sigcontext32 sigcontext
> +#define mcontext32 mcontext
> +#define ucontext32 ucontext
> +#define compat_siginfo_t struct siginfo
> +
> +#endif /* CONFIG_PPC64 */
> +
> +/*
> + * On 32-bit we just access the address and let hash_page create a
> + * HPTE if necessary, so there is no need to fall back to reading
> + * the page tables. Since this is called at interrupt level,
> + * do_page_fault() won't treat a DSI as a page fault.
> + */
> +static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> +{
> + int rc;
> +
> + if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> + ((unsigned long)ptr & 3))
> + return -EFAULT;
> +
> + pagefault_disable();
> + rc = __get_user_inatomic(*ret, ptr);
> + pagefault_enable();
> +
> + if (IS_ENABLED(CONFIG_PPC64) && rc)
> + return read_user_stack_slow(ptr, ret, 4);
> + return rc;
> +}
> +
> +static inline int valid_user_sp(unsigned long sp, int is_64)
> +{
> + if (!sp || (sp & 7) || sp > TASK_SIZE - 32)
> + return 0;
> + return 1;
> +}
> +
> +/*
> + * Layout for non-RT signal frames
> + */
> +struct signal_frame_32 {
> + char dummy[__SIGNAL_FRAMESIZE32];
> + struct sigcontext32 sctx;
> + struct mcontext32 mctx;
> + int abigap[56];
> +};
> +
> +/*
> + * Layout for RT signal frames
> + */
> +struct rt_signal_frame_32 {
> + char dummy[__SIGNAL_FRAMESIZE32 + 16];
> + compat_siginfo_t info;
> + struct ucontext32 uc;
> + int abigap[56];
> +};
> +
> +static int is_sigreturn_32_address(unsigned int nip, unsigned int fp)
> +{
> + if (nip == fp + offsetof(struct signal_frame_32, mctx.mc_pad))
> + return 1;
> + if (vdso32_sigtramp && current->mm->context.vdso_base &&
> + nip == current->mm->context.vdso_base + vdso32_sigtramp)
> + return 1;
> + return 0;
> +}
> +
> +static int is_rt_sigreturn_32_address(unsigned int nip, unsigned int fp)
> +{
> + if (nip == fp + offsetof(struct rt_signal_frame_32,
> + uc.uc_mcontext.mc_pad))
> + return 1;
> + if (vdso32_rt_sigtramp && current->mm->context.vdso_base &&
> + nip == current->mm->context.vdso_base + vdso32_rt_sigtramp)
> + return 1;
> + return 0;
> +}
> +
> +static int sane_signal_32_frame(unsigned int sp)
> +{
> + struct signal_frame_32 __user *sf;
> + unsigned int regs;
> +
> + sf = (struct signal_frame_32 __user *) (unsigned long) sp;
> + if (read_user_stack_32((unsigned int __user *) &sf->sctx.regs, ®s))
> + return 0;
> + return regs == (unsigned long) &sf->mctx;
> +}
> +
> +static int sane_rt_signal_32_frame(unsigned int sp)
> +{
> + struct rt_signal_frame_32 __user *sf;
> + unsigned int regs;
> +
> + sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
> + if (read_user_stack_32((unsigned int __user *) &sf->uc.uc_regs, ®s))
> + return 0;
> + return regs == (unsigned long) &sf->uc.uc_mcontext;
> +}
> +
> +static unsigned int __user *signal_frame_32_regs(unsigned int sp,
> + unsigned int next_sp, unsigned int next_ip)
> +{
> + struct mcontext32 __user *mctx = NULL;
> + struct signal_frame_32 __user *sf;
> + struct rt_signal_frame_32 __user *rt_sf;
> +
> + /*
> + * Note: the next_sp - sp >= signal frame size check
> + * is true when next_sp < sp, for example, when
> + * transitioning from an alternate signal stack to the
> + * normal stack.
> + */
> + if (next_sp - sp >= sizeof(struct signal_frame_32) &&
> + is_sigreturn_32_address(next_ip, sp) &&
> + sane_signal_32_frame(sp)) {
> + sf = (struct signal_frame_32 __user *) (unsigned long) sp;
> + mctx = &sf->mctx;
> + }
> +
> + if (!mctx && next_sp - sp >= sizeof(struct rt_signal_frame_32) &&
> + is_rt_sigreturn_32_address(next_ip, sp) &&
> + sane_rt_signal_32_frame(sp)) {
> + rt_sf = (struct rt_signal_frame_32 __user *) (unsigned long) sp;
> + mctx = &rt_sf->uc.uc_mcontext;
> + }
> +
> + if (!mctx)
> + return NULL;
> + return mctx->mc_gregs;
> +}
> +
> +void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
> + struct pt_regs *regs)
> +{
> + unsigned int sp, next_sp;
> + unsigned int next_ip;
> + unsigned int lr;
> + long level = 0;
> + unsigned int __user *fp, *uregs;
> +
> + next_ip = perf_instruction_pointer(regs);
> + lr = regs->link;
> + sp = regs->gpr[1];
> + perf_callchain_store(entry, next_ip);
> +
> + while (entry->nr < entry->max_stack) {
> + fp = (unsigned int __user *) (unsigned long) sp;
> + if (!valid_user_sp(sp, 0) || read_user_stack_32(fp, &next_sp))
> + return;
> + if (level > 0 && read_user_stack_32(&fp[1], &next_ip))
> + return;
> +
> + uregs = signal_frame_32_regs(sp, next_sp, next_ip);
> + if (!uregs && level <= 1)
> + uregs = signal_frame_32_regs(sp, next_sp, lr);
> + if (uregs) {
> + /*
> + * This looks like an signal frame, so restart
> + * the stack trace with the values in it.
> + */
> + if (read_user_stack_32(&uregs[PT_NIP], &next_ip) ||
> + read_user_stack_32(&uregs[PT_LNK], &lr) ||
> + read_user_stack_32(&uregs[PT_R1], &sp))
> + return;
> + level = 0;
> + perf_callchain_store_context(entry, PERF_CONTEXT_USER);
> + perf_callchain_store(entry, next_ip);
> + continue;
> + }
> +
> + if (level == 0)
> + next_ip = lr;
> + perf_callchain_store(entry, next_ip);
> + ++level;
> + sp = next_sp;
> + }
> +}
> diff --git a/arch/powerpc/perf/callchain_64.c b/arch/powerpc/perf/callchain_64.c
> new file mode 100644
> index 000000000000..60308c2221a8
> --- /dev/null
> +++ b/arch/powerpc/perf/callchain_64.c
> @@ -0,0 +1,185 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Performance counter callchain support - powerpc architecture code
> + *
> + * Copyright © 2009 Paul Mackerras, IBM Corporation.
> + */
> +#include <linux/kernel.h>
> +#include <linux/sched.h>
> +#include <linux/perf_event.h>
> +#include <linux/percpu.h>
> +#include <linux/uaccess.h>
> +#include <linux/mm.h>
> +#include <asm/ptrace.h>
> +#include <asm/pgtable.h>
> +#include <asm/sigcontext.h>
> +#include <asm/ucontext.h>
> +#include <asm/vdso.h>
> +#include <asm/pte-walk.h>
> +
> +#include "callchain.h"
> +
> +/*
> + * On 64-bit we don't want to invoke hash_page on user addresses from
> + * interrupt context, so if the access faults, we read the page tables
> + * to find which page (if any) is mapped and access it directly.
> + */
> +int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> +{
> + int ret = -EFAULT;
> + pgd_t *pgdir;
> + pte_t *ptep, pte;
> + unsigned int shift;
> + unsigned long addr = (unsigned long) ptr;
> + unsigned long offset;
> + unsigned long pfn, flags;
> + void *kaddr;
> +
> + pgdir = current->mm->pgd;
> + if (!pgdir)
> + return -EFAULT;
> +
> + local_irq_save(flags);
> + ptep = find_current_mm_pte(pgdir, addr, NULL, &shift);
> + if (!ptep)
> + goto err_out;
> + if (!shift)
> + shift = PAGE_SHIFT;
> +
> + /* align address to page boundary */
> + offset = addr & ((1UL << shift) - 1);
> +
> + pte = READ_ONCE(*ptep);
> + if (!pte_present(pte) || !pte_user(pte))
> + goto err_out;
> + pfn = pte_pfn(pte);
> + if (!page_is_ram(pfn))
> + goto err_out;
> +
> + /* no highmem to worry about here */
> + kaddr = pfn_to_kaddr(pfn);
> + memcpy(buf, kaddr + offset, nb);
> + ret = 0;
> +err_out:
> + local_irq_restore(flags);
> + return ret;
> +}
> +
> +static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
> +{
> + if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned long) ||
> + ((unsigned long)ptr & 7))
> + return -EFAULT;
> +
> + pagefault_disable();
> + if (!__get_user_inatomic(*ret, ptr)) {
> + pagefault_enable();
> + return 0;
> + }
> + pagefault_enable();
> +
> + return read_user_stack_slow(ptr, ret, 8);
> +}
> +
> +static inline int valid_user_sp(unsigned long sp, int is_64)
> +{
> + if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
> + return 0;
> + return 1;
> +}
> +
> +/*
> + * 64-bit user processes use the same stack frame for RT and non-RT signals.
> + */
> +struct signal_frame_64 {
> + char dummy[__SIGNAL_FRAMESIZE];
> + struct ucontext uc;
> + unsigned long unused[2];
> + unsigned int tramp[6];
> + struct siginfo *pinfo;
> + void *puc;
> + struct siginfo info;
> + char abigap[288];
> +};
> +
> +static int is_sigreturn_64_address(unsigned long nip, unsigned long fp)
> +{
> + if (nip == fp + offsetof(struct signal_frame_64, tramp))
> + return 1;
> + if (vdso64_rt_sigtramp && current->mm->context.vdso_base &&
> + nip == current->mm->context.vdso_base + vdso64_rt_sigtramp)
> + return 1;
> + return 0;
> +}
> +
> +/*
> + * Do some sanity checking on the signal frame pointed to by sp.
> + * We check the pinfo and puc pointers in the frame.
> + */
> +static int sane_signal_64_frame(unsigned long sp)
> +{
> + struct signal_frame_64 __user *sf;
> + unsigned long pinfo, puc;
> +
> + sf = (struct signal_frame_64 __user *) sp;
> + if (read_user_stack_64((unsigned long __user *) &sf->pinfo, &pinfo) ||
> + read_user_stack_64((unsigned long __user *) &sf->puc, &puc))
> + return 0;
> + return pinfo == (unsigned long) &sf->info &&
> + puc == (unsigned long) &sf->uc;
> +}
> +
> +void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> + struct pt_regs *regs)
> +{
> + unsigned long sp, next_sp;
> + unsigned long next_ip;
> + unsigned long lr;
> + long level = 0;
> + struct signal_frame_64 __user *sigframe;
> + unsigned long __user *fp, *uregs;
> +
> + next_ip = perf_instruction_pointer(regs);
> + lr = regs->link;
> + sp = regs->gpr[1];
> + perf_callchain_store(entry, next_ip);
> +
> + while (entry->nr < entry->max_stack) {
> + fp = (unsigned long __user *) sp;
> + if (!valid_user_sp(sp, 1) || read_user_stack_64(fp, &next_sp))
> + return;
> + if (level > 0 && read_user_stack_64(&fp[2], &next_ip))
> + return;
> +
> + /*
> + * Note: the next_sp - sp >= signal frame size check
> + * is true when next_sp < sp, which can happen when
> + * transitioning from an alternate signal stack to the
> + * normal stack.
> + */
> + if (next_sp - sp >= sizeof(struct signal_frame_64) &&
> + (is_sigreturn_64_address(next_ip, sp) ||
> + (level <= 1 && is_sigreturn_64_address(lr, sp))) &&
> + sane_signal_64_frame(sp)) {
> + /*
> + * This looks like an signal frame
> + */
> + sigframe = (struct signal_frame_64 __user *) sp;
> + uregs = sigframe->uc.uc_mcontext.gp_regs;
> + if (read_user_stack_64(&uregs[PT_NIP], &next_ip) ||
> + read_user_stack_64(&uregs[PT_LNK], &lr) ||
> + read_user_stack_64(&uregs[PT_R1], &sp))
> + return;
> + level = 0;
> + perf_callchain_store_context(entry, PERF_CONTEXT_USER);
> + perf_callchain_store(entry, next_ip);
> + continue;
> + }
> +
> + if (level == 0)
> + next_ip = lr;
> + perf_callchain_store(entry, next_ip);
> + ++level;
> + sp = next_sp;
> + }
> +}
Michal Suchanek <[email protected]> writes:
> On bigendian ppc64 it is common to have 32bit legacy binaries but much
> less so on littleendian.
I think the toolchain people will tell you that there is no 32-bit
little endian ABI defined at all, if anything works it's by accident.
So I think we should not make this selectable, unless someone puts their
hand up to say they want it and are willing to test it and keep it
working.
cheers
> v3: make configurable
> ---
> arch/powerpc/Kconfig | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 5bab0bb6b833..b0339e892329 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -264,8 +264,9 @@ config PANIC_TIMEOUT
> default 180
>
> config COMPAT
> - bool
> - default y if PPC64
> + bool "Enable support for 32bit binaries"
> + depends on PPC64
> + default y if !CPU_LITTLE_ENDIAN
> select COMPAT_BINFMT_ELF
> select ARCH_WANT_OLD_COMPAT_IPC
> select COMPAT_OLD_SIGACTION
> --
> 2.22.0
Michal Suchanek <[email protected]> writes:
> There are two almost identical copies for 32bit and 64bit.
>
> The function is used only in 32bit code which will be split out in next
> patch so consolidate to one function.
>
> Signed-off-by: Michal Suchanek <[email protected]>
> Reviewed-by: Christophe Leroy <[email protected]>
> ---
> new patch in v6
> ---
> arch/powerpc/perf/callchain.c | 25 +++++++++----------------
> 1 file changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
> index c84bbd4298a0..b7cdcce20280 100644
> --- a/arch/powerpc/perf/callchain.c
> +++ b/arch/powerpc/perf/callchain.c
> @@ -165,22 +165,6 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
> return read_user_stack_slow(ptr, ret, 8);
> }
>
> -static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> -{
> - if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
> - ((unsigned long)ptr & 3))
> - return -EFAULT;
> -
> - pagefault_disable();
> - if (!__get_user_inatomic(*ret, ptr)) {
> - pagefault_enable();
> - return 0;
> - }
> - pagefault_enable();
> -
> - return read_user_stack_slow(ptr, ret, 4);
> -}
> -
> static inline int valid_user_sp(unsigned long sp, int is_64)
> {
> if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
> @@ -295,6 +279,12 @@ static inline int current_is_64bit(void)
> }
>
> #else /* CONFIG_PPC64 */
> +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> +{
> + return 0;
> +}
> +#endif /* CONFIG_PPC64 */
Ending the PPC64 else case here, and then restarting it below with an
ifndef means we end up with two parts of the file that define 32-bit
code, with a common chunk in the middle, which I dislike.
I'd rather you add the empty read_user_stack_slow() in the existing
#else section and then move read_user_stack_32() below the whole ifdef
PPC64/else/endif section.
Is there some reason that doesn't work?
cheers
> @@ -313,9 +303,12 @@ static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
> rc = __get_user_inatomic(*ret, ptr);
> pagefault_enable();
>
> + if (IS_ENABLED(CONFIG_PPC64) && rc)
> + return read_user_stack_slow(ptr, ret, 4);
> return rc;
> }
>
> +#ifndef CONFIG_PPC64
> static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> struct pt_regs *regs)
> {
> --
> 2.22.0
Michael Ellerman <[email protected]> writes:
> Michal Suchanek <[email protected]> writes:
...
>> @@ -295,6 +279,12 @@ static inline int current_is_64bit(void)
>> }
>>
>> #else /* CONFIG_PPC64 */
>> +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
>> +{
>> + return 0;
>> +}
>> +#endif /* CONFIG_PPC64 */
>
> Ending the PPC64 else case here, and then restarting it below with an
> ifndef means we end up with two parts of the file that define 32-bit
> code, with a common chunk in the middle, which I dislike.
>
> I'd rather you add the empty read_user_stack_slow() in the existing
> #else section and then move read_user_stack_32() below the whole ifdef
> PPC64/else/endif section.
>
> Is there some reason that doesn't work?
Gah, I missed that you split the whole file later in the series. Any
reason you did it in two steps rather than moving patch 6 earlier in the
series?
cheers
On Mon, 02 Sep 2019 14:01:17 +1000
Michael Ellerman <[email protected]> wrote:
> Michael Ellerman <[email protected]> writes:
> > Michal Suchanek <[email protected]> writes:
> ...
> >> @@ -295,6 +279,12 @@ static inline int current_is_64bit(void)
> >> }
> >>
> >> #else /* CONFIG_PPC64 */
> >> +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
> >> +{
> >> + return 0;
> >> +}
> >> +#endif /* CONFIG_PPC64 */
> >
> > Ending the PPC64 else case here, and then restarting it below with an
> > ifndef means we end up with two parts of the file that define 32-bit
> > code, with a common chunk in the middle, which I dislike.
> >
> > I'd rather you add the empty read_user_stack_slow() in the existing
> > #else section and then move read_user_stack_32() below the whole ifdef
> > PPC64/else/endif section.
> >
> > Is there some reason that doesn't work?
>
> Gah, I missed that you split the whole file later in the series. Any
> reason you did it in two steps rather than moving patch 6 earlier in the
> series?
To make this patch readable.
Thanks
Michal
On Mon, 02 Sep 2019 12:03:12 +1000
Michael Ellerman <[email protected]> wrote:
> Michal Suchanek <[email protected]> writes:
> > On bigendian ppc64 it is common to have 32bit legacy binaries but much
> > less so on littleendian.
>
> I think the toolchain people will tell you that there is no 32-bit
> little endian ABI defined at all, if anything works it's by accident.
I have seen a piece of software that workarounds code issues on 64bit
by always compiling 32bit code. So it does work in some way. Also it
has been pointed out that you can still switch to BE even with the
'fast-switch' removed.
>
> So I think we should not make this selectable, unless someone puts their
> hand up to say they want it and are willing to test it and keep it
> working.
I don't really care either way.
Thanks
Michal
>
> cheers
>
> > v3: make configurable
> > ---
> > arch/powerpc/Kconfig | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index 5bab0bb6b833..b0339e892329 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -264,8 +264,9 @@ config PANIC_TIMEOUT
> > default 180
> >
> > config COMPAT
> > - bool
> > - default y if PPC64
> > + bool "Enable support for 32bit binaries"
> > + depends on PPC64
> > + default y if !CPU_LITTLE_ENDIAN
> > select COMPAT_BINFMT_ELF
> > select ARCH_WANT_OLD_COMPAT_IPC
> > select COMPAT_OLD_SIGACTION
> > --
> > 2.22.0
On Mon, Sep 02, 2019 at 12:03:12PM +1000, Michael Ellerman wrote:
> Michal Suchanek <[email protected]> writes:
> > On bigendian ppc64 it is common to have 32bit legacy binaries but much
> > less so on littleendian.
>
> I think the toolchain people will tell you that there is no 32-bit
> little endian ABI defined at all, if anything works it's by accident.
There of course is a lot of powerpcle-* support. The ABI used for it
on linux is the SYSV ABI, just like on BE 32-bit.
There also is specific powerpcle-linux support in GCC, and in binutils,
too. Also, config.guess/config.sub supports it. Half a year ago this
all built fine (no, I don't test it often either).
I don't think glibc supports it though, so I wonder if anyone builds an
actual system with it? Maybe busybox or the like?
> So I think we should not make this selectable, unless someone puts their
> hand up to say they want it and are willing to test it and keep it
> working.
What about actual 32-bit LE systems? Does anyone still use those?
Segher
Michal Suchánek <[email protected]> writes:
> On Mon, 02 Sep 2019 14:01:17 +1000
> Michael Ellerman <[email protected]> wrote:
>> Michael Ellerman <[email protected]> writes:
>> > Michal Suchanek <[email protected]> writes:
>> ...
>> >> @@ -295,6 +279,12 @@ static inline int current_is_64bit(void)
>> >> }
>> >>
>> >> #else /* CONFIG_PPC64 */
>> >> +static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
>> >> +{
>> >> + return 0;
>> >> +}
>> >> +#endif /* CONFIG_PPC64 */
>> >
>> > Ending the PPC64 else case here, and then restarting it below with an
>> > ifndef means we end up with two parts of the file that define 32-bit
>> > code, with a common chunk in the middle, which I dislike.
>> >
>> > I'd rather you add the empty read_user_stack_slow() in the existing
>> > #else section and then move read_user_stack_32() below the whole ifdef
>> > PPC64/else/endif section.
>> >
>> > Is there some reason that doesn't work?
>>
>> Gah, I missed that you split the whole file later in the series. Any
>> reason you did it in two steps rather than moving patch 6 earlier in the
>> series?
>
> To make this patch readable.
But it's not very readable :)
You also retained the comment about the 32-bit behaviour which is now a
bit confusing, because the function is used on both 32 & 64-bit.
I think moving it as I suggested in my first reply makes for a better
diff, something like eg:
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index c84bbd4298a0..82c0f81b89a5 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -165,22 +165,6 @@ static int read_user_stack_64(unsigned long __user *ptr, unsigned long *ret)
return read_user_stack_slow(ptr, ret, 8);
}
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
-{
- if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
- ((unsigned long)ptr & 3))
- return -EFAULT;
-
- pagefault_disable();
- if (!__get_user_inatomic(*ret, ptr)) {
- pagefault_enable();
- return 0;
- }
- pagefault_enable();
-
- return read_user_stack_slow(ptr, ret, 4);
-}
-
static inline int valid_user_sp(unsigned long sp, int is_64)
{
if (!sp || (sp & 7) || sp > (is_64 ? TASK_SIZE : 0x100000000UL) - 32)
@@ -295,27 +279,6 @@ static inline int current_is_64bit(void)
}
#else /* CONFIG_PPC64 */
-/*
- * On 32-bit we just access the address and let hash_page create a
- * HPTE if necessary, so there is no need to fall back to reading
- * the page tables. Since this is called at interrupt level,
- * do_page_fault() won't treat a DSI as a page fault.
- */
-static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
-{
- int rc;
-
- if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
- ((unsigned long)ptr & 3))
- return -EFAULT;
-
- pagefault_disable();
- rc = __get_user_inatomic(*ret, ptr);
- pagefault_enable();
-
- return rc;
-}
-
static inline void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
struct pt_regs *regs)
{
@@ -333,6 +296,11 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
return 1;
}
+static int read_user_stack_slow(void __user *ptr, void *buf, int nb)
+{
+ return 0;
+}
+
#define __SIGNAL_FRAMESIZE32 __SIGNAL_FRAMESIZE
#define sigcontext32 sigcontext
#define mcontext32 mcontext
@@ -341,6 +309,33 @@ static inline int valid_user_sp(unsigned long sp, int is_64)
#endif /* CONFIG_PPC64 */
+static int read_user_stack_32(unsigned int __user *ptr, unsigned int *ret)
+{
+ int rc;
+
+ if ((unsigned long)ptr > TASK_SIZE - sizeof(unsigned int) ||
+ ((unsigned long)ptr & 3))
+ return -EFAULT;
+
+ pagefault_disable();
+ rc = __get_user_inatomic(*ret, ptr);
+ pagefault_enable();
+
+ /*
+ * On 32-bit we just access the address and let hash_page() create a
+ * HPTE if necessary, so there is no need to fall back to reading the
+ * page tables. Since this is called at interrupt level, do_page_fault()
+ * won't treat a DSI as a page fault.
+ *
+ * On 64-bit if the access faults we fall back to
+ * read_user_stack_slow(), see the comment there for more details.
+ */
+ if (IS_ENABLED(CONFIG_PPC64) && rc)
+ return read_user_stack_slow(ptr, ret, 4);
+
+ return rc;
+}
+
/*
* Layout for non-RT signal frames
*/
cheers
Segher Boessenkool <[email protected]> writes:
> On Mon, Sep 02, 2019 at 12:03:12PM +1000, Michael Ellerman wrote:
>> Michal Suchanek <[email protected]> writes:
>> > On bigendian ppc64 it is common to have 32bit legacy binaries but much
>> > less so on littleendian.
>>
>> I think the toolchain people will tell you that there is no 32-bit
>> little endian ABI defined at all, if anything works it's by accident.
^
v2
> There of course is a lot of powerpcle-* support. The ABI used for it
> on linux is the SYSV ABI, just like on BE 32-bit.
I was talking about ELFv2, which is 64-bit only. But that was based on
me thinking we had a hard assumption in the kernel that ppc64le kernels
always expect ELFv2 userland. Looking at the code though I was wrong
about that, it looks like we will run little endian ELFv1 binaries,
though I don't think anyone is testing it.
> There also is specific powerpcle-linux support in GCC, and in binutils,
> too. Also, config.guess/config.sub supports it. Half a year ago this
> all built fine (no, I don't test it often either).
>
> I don't think glibc supports it though, so I wonder if anyone builds an
> actual system with it? Maybe busybox or the like?
>
>> So I think we should not make this selectable, unless someone puts their
>> hand up to say they want it and are willing to test it and keep it
>> working.
>
> What about actual 32-bit LE systems? Does anyone still use those?
Not that I've ever heard of.
cheers
Michal Suchánek <[email protected]> writes:
> On Mon, 02 Sep 2019 12:03:12 +1000
> Michael Ellerman <[email protected]> wrote:
>
>> Michal Suchanek <[email protected]> writes:
>> > On bigendian ppc64 it is common to have 32bit legacy binaries but much
>> > less so on littleendian.
>>
>> I think the toolchain people will tell you that there is no 32-bit
>> little endian ABI defined at all, if anything works it's by accident.
>
> I have seen a piece of software that workarounds code issues on 64bit
> by always compiling 32bit code. So it does work in some way.
What software is that?
> Also it has been pointed out that you can still switch to BE even with
> the 'fast-switch' removed.
Yes we have a proper syscall for endian switching, sys_switch_endian(),
which is definitely supported.
But that *only* switches the endian-ness of the process, it does nothing
to the syscall layer. So any process that switches to the other endian
must endian flip syscall arguments (that aren't in registers), or flip
back to the native endian before calling syscalls.
>> So I think we should not make this selectable, unless someone puts their
>> hand up to say they want it and are willing to test it and keep it
>> working.
>
> I don't really care either way.
Sure. We'll see if anyone else speaks up.
cheers
On 09/02/2019 11:53 PM, Michael Ellerman wrote:
> Segher Boessenkool <[email protected]> writes:
>> On Mon, Sep 02, 2019 at 12:03:12PM +1000, Michael Ellerman wrote:
>>> Michal Suchanek <[email protected]> writes:
>>>> On bigendian ppc64 it is common to have 32bit legacy binaries but much
>>>> less so on littleendian.
>>>
>>> I think the toolchain people will tell you that there is no 32-bit
>>> little endian ABI defined at all, if anything works it's by accident.
> ^
> v2
>
>> There of course is a lot of powerpcle-* support. The ABI used for it
>> on linux is the SYSV ABI, just like on BE 32-bit.
>
> I was talking about ELFv2, which is 64-bit only. But that was based on
> me thinking we had a hard assumption in the kernel that ppc64le kernels
> always expect ELFv2 userland. Looking at the code though I was wrong
> about that, it looks like we will run little endian ELFv1 binaries,
> though I don't think anyone is testing it.
>
>> There also is specific powerpcle-linux support in GCC, and in binutils,
>> too. Also, config.guess/config.sub supports it. Half a year ago this
>> all built fine (no, I don't test it often either).
>>
>> I don't think glibc supports it though, so I wonder if anyone builds an
>> actual system with it? Maybe busybox or the like?
>>
>>> So I think we should not make this selectable, unless someone puts their
>>> hand up to say they want it and are willing to test it and keep it
>>> working.
>>
>> What about actual 32-bit LE systems? Does anyone still use those?
>
> Not that I've ever heard of.
>
We dropped support from 32-bit LE at least with a1f3ae3fe8a1
("powerpc/32: Use stmw/lmw for registers save/restore in asm").
Discussion about it can be found at
https://patchwork.ozlabs.org/patch/899465/
Christophe
On Tue, 03 Sep 2019 10:00:57 +1000
Michael Ellerman <[email protected]> wrote:
> Michal Suchánek <[email protected]> writes:
> > On Mon, 02 Sep 2019 12:03:12 +1000
> > Michael Ellerman <[email protected]> wrote:
> >
> >> Michal Suchanek <[email protected]> writes:
> >> > On bigendian ppc64 it is common to have 32bit legacy binaries but much
> >> > less so on littleendian.
> >>
> >> I think the toolchain people will tell you that there is no 32-bit
> >> little endian ABI defined at all, if anything works it's by accident.
> >
> > I have seen a piece of software that workarounds code issues on 64bit
> > by always compiling 32bit code. So it does work in some way.
>
> What software is that?
The only one I have seen is stockfish (v9)
>
> > Also it has been pointed out that you can still switch to BE even with
> > the 'fast-switch' removed.
>
> Yes we have a proper syscall for endian switching, sys_switch_endian(),
> which is definitely supported.
>
> But that *only* switches the endian-ness of the process, it does nothing
> to the syscall layer. So any process that switches to the other endian
> must endian flip syscall arguments (that aren't in registers), or flip
> back to the native endian before calling syscalls.
In other words just installing a chroot of binaries built for the other
endian won't work. You need something like qemu to do the syscall
translation or run full VM with a kernel that has the swapped endian
syscall ABI.
Thanks
Michal
Michal Suchánek <[email protected]> writes:
> On Tue, 03 Sep 2019 10:00:57 +1000
> Michael Ellerman <[email protected]> wrote:
>> Michal Suchánek <[email protected]> writes:
>> > On Mon, 02 Sep 2019 12:03:12 +1000
>> > Michael Ellerman <[email protected]> wrote:
>> >
>> >> Michal Suchanek <[email protected]> writes:
>> >> > On bigendian ppc64 it is common to have 32bit legacy binaries but much
>> >> > less so on littleendian.
>> >>
>> >> I think the toolchain people will tell you that there is no 32-bit
>> >> little endian ABI defined at all, if anything works it's by accident.
>> >
>> > I have seen a piece of software that workarounds code issues on 64bit
>> > by always compiling 32bit code. So it does work in some way.
>>
>> What software is that?
>
> The only one I have seen is stockfish (v9)
OK, not sure how many people are testing that on powerpc :)
>> > Also it has been pointed out that you can still switch to BE even with
>> > the 'fast-switch' removed.
>>
>> Yes we have a proper syscall for endian switching, sys_switch_endian(),
>> which is definitely supported.
>>
>> But that *only* switches the endian-ness of the process, it does nothing
>> to the syscall layer. So any process that switches to the other endian
>> must endian flip syscall arguments (that aren't in registers), or flip
>> back to the native endian before calling syscalls.
>
> In other words just installing a chroot of binaries built for the other
> endian won't work. You need something like qemu to do the syscall
> translation or run full VM with a kernel that has the swapped endian
> syscall ABI.
Yes that's right.
cheers