2012-08-14 17:53:51

by Catalin Marinas

Subject: [PATCH v2 00/31] AArch64 Linux kernel port

This is the 2nd version of the set of patches implementing Linux kernel
support for the 64-bit ARM architecture (AArch64). Thanks to all who
provided feedback on the previous version.

The Linux kernel patches are available on this tree:

git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64.git upstream

The "master" branch in the above repository tracks the development
history.

Main changes from the previous version (for the full log see the
"master" branch above):

- Kernel port directory and related functions renamed to "arm64".
"uname -m" reports "aarch64" as per the official name.
- NO_BOOTMEM enabled.
- "mem=" is now used for limiting the amount of memory rather than
specifying the memory banks (already done via FDT).
- struct mem_type removed as static definitions are enough for
ioremap().
- Support for ZONE_DMA32.
- Replaced "user_debug" with "/proc/sys/debug/exception-trace".
- Added a generic defconfig file.
- More clean-up (comments, code) and bug-fixes.

The generic patches were dropped from this series as they have been
pushed separately (and most of them have already been merged into
mainline).


Background to the 64-bit ARM architecture:

ARM introduced AArch64 as part of the ARMv8 architecture. It consists
of a substantially revised exception model (with four exception levels:
EL0 - user, EL1 - kernel, EL2 - hypervisor, EL3 - secure monitor), a
new A64 instruction set based on a larger register file, and new
FP/SIMD instructions. The new ABI is LP64 and takes advantage of the
larger register file. It also mandates FP.
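
As a quick, illustration-only example of what the LP64 data model
implies (not part of the patches), the small user space program below
prints the type sizes obtained when compiled for the new ABI; an ILP32
ABI would instead report 4-byte longs and pointers:

#include <stdio.h>

int main(void)
{
        /* LP64: int stays 32-bit, long and pointers become 64-bit */
        printf("int: %zu, long: %zu, void *: %zu\n",
               sizeof(int), sizeof(long), sizeof(void *));
        /* expected output on AArch64/LP64: "int: 4, long: 8, void *: 8" */
        return 0;
}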

AArch64 documentation currently available (publicly, though
click-through agreement required):

- Instruction Set Overview:
http://infocenter.arm.com/help/topic/com.arm.doc.genc010197a/index.html

- ABI (PCS, ELF, DWARF, C++):
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0059a/index.html


Regards,

Catalin


Catalin Marinas (23):
arm64: Assembly macros and definitions
arm64: Kernel booting and initialisation
arm64: Exception handling
arm64: MMU definitions
arm64: MMU initialisation
arm64: MMU fault handling and page table management
arm64: Process management
arm64: CPU support
arm64: Cache maintenance routines
arm64: TLB maintenance functionality
arm64: Atomic operations
arm64: Device specific operations
arm64: DMA mapping API
arm64: SMP support
arm64: ELF definitions
arm64: System calls handling
arm64: Signal handling support
arm64: User access library functions
arm64: Floating point and SIMD
arm64: Add support for /proc/sys/debug/exception-trace
arm64: Miscellaneous header files
arm64: Build infrastructure
arm64: MAINTAINERS update

Marc Zyngier (3):
arm64: IRQ handling
arm64: Miscellaneous library functions
arm64: Generic timers support

Will Deacon (5):
arm64: VDSO support
arm64: 32-bit (compat) applications support
arm64: Debugging support
arm64: Performance counters support
arm64: Loadable modules

Documentation/arm64/booting.txt | 141 +++
Documentation/arm64/memory.txt | 69 ++
MAINTAINERS | 6 +
arch/arm64/Kconfig | 261 +++++
arch/arm64/Kconfig.debug | 27 +
arch/arm64/Makefile | 71 ++
arch/arm64/boot/.gitignore | 2 +
arch/arm64/boot/Makefile | 38 +
arch/arm64/boot/install.sh | 52 +
arch/arm64/configs/generic_defconfig | 85 ++
arch/arm64/include/asm/Kbuild | 51 +
arch/arm64/include/asm/asm-offsets.h | 1 +
arch/arm64/include/asm/assembler.h | 109 ++
arch/arm64/include/asm/atomic.h | 306 ++++++
arch/arm64/include/asm/auxvec.h | 22 +
arch/arm64/include/asm/barrier.h | 52 +
arch/arm64/include/asm/bitops.h | 74 ++
arch/arm64/include/asm/bitsperlong.h | 23 +
arch/arm64/include/asm/byteorder.h | 21 +
arch/arm64/include/asm/cache.h | 32 +
arch/arm64/include/asm/cacheflush.h | 209 ++++
arch/arm64/include/asm/cachetype.h | 48 +
arch/arm64/include/asm/cmpxchg.h | 180 ++++
arch/arm64/include/asm/compat.h | 232 +++++
arch/arm64/include/asm/compiler.h | 30 +
arch/arm64/include/asm/cputype.h | 49 +
arch/arm64/include/asm/debug-monitors.h | 88 ++
arch/arm64/include/asm/device.h | 26 +
arch/arm64/include/asm/dma-mapping.h | 124 +++
arch/arm64/include/asm/elf.h | 176 ++++
arch/arm64/include/asm/exception.h | 23 +
arch/arm64/include/asm/exec.h | 23 +
arch/arm64/include/asm/fb.h | 34 +
arch/arm64/include/asm/fcntl.h | 29 +
arch/arm64/include/asm/fpsimd.h | 64 ++
arch/arm64/include/asm/futex.h | 134 +++
arch/arm64/include/asm/hardirq.h | 52 +
arch/arm64/include/asm/hw_breakpoint.h | 137 +++
arch/arm64/include/asm/hwcap.h | 57 +
arch/arm64/include/asm/io.h | 263 +++++
arch/arm64/include/asm/irq.h | 8 +
arch/arm64/include/asm/irqflags.h | 91 ++
arch/arm64/include/asm/memblock.h | 21 +
arch/arm64/include/asm/memory.h | 144 +++
arch/arm64/include/asm/mmu.h | 27 +
arch/arm64/include/asm/mmu_context.h | 152 +++
arch/arm64/include/asm/module.h | 23 +
arch/arm64/include/asm/page.h | 67 ++
arch/arm64/include/asm/param.h | 23 +
arch/arm64/include/asm/perf_event.h | 22 +
arch/arm64/include/asm/pgalloc.h | 113 ++
arch/arm64/include/asm/pgtable-2level-hwdef.h | 43 +
arch/arm64/include/asm/pgtable-2level-types.h | 60 ++
arch/arm64/include/asm/pgtable-3level-hwdef.h | 50 +
arch/arm64/include/asm/pgtable-3level-types.h | 66 ++
arch/arm64/include/asm/pgtable-hwdef.h | 94 ++
arch/arm64/include/asm/pgtable.h | 328 ++++++
arch/arm64/include/asm/pmu.h | 82 ++
arch/arm64/include/asm/proc-fns.h | 51 +
arch/arm64/include/asm/processor.h | 174 ++++
arch/arm64/include/asm/procinfo.h | 44 +
arch/arm64/include/asm/prom.h | 1 +
arch/arm64/include/asm/ptrace.h | 206 ++++
arch/arm64/include/asm/setup.h | 26 +
arch/arm64/include/asm/shmparam.h | 28 +
arch/arm64/include/asm/sigcontext.h | 69 ++
arch/arm64/include/asm/siginfo.h | 23 +
arch/arm64/include/asm/signal.h | 24 +
arch/arm64/include/asm/signal32.h | 54 +
arch/arm64/include/asm/smp.h | 69 ++
arch/arm64/include/asm/sparsemem.h | 24 +
arch/arm64/include/asm/spinlock.h | 199 ++++
arch/arm64/include/asm/spinlock_types.h | 38 +
arch/arm64/include/asm/stacktrace.h | 29 +
arch/arm64/include/asm/stat.h | 63 ++
arch/arm64/include/asm/statfs.h | 23 +
arch/arm64/include/asm/syscall.h | 101 ++
arch/arm64/include/asm/syscalls.h | 40 +
arch/arm64/include/asm/system_misc.h | 54 +
arch/arm64/include/asm/thread_info.h | 124 +++
arch/arm64/include/asm/timex.h | 32 +
arch/arm64/include/asm/tlb.h | 190 ++++
arch/arm64/include/asm/tlbflush.h | 123 +++
arch/arm64/include/asm/traps.h | 30 +
arch/arm64/include/asm/uaccess.h | 377 +++++++
arch/arm64/include/asm/ucontext.h | 30 +
arch/arm64/include/asm/unistd.h | 27 +
arch/arm64/include/asm/unistd32.h | 758 ++++++++++++++
arch/arm64/include/asm/vdso.h | 41 +
arch/arm64/include/asm/vdso_datapage.h | 43 +
arch/arm64/kernel/.gitignore | 1 +
arch/arm64/kernel/Makefile | 27 +
arch/arm64/kernel/arm64ksyms.c | 55 +
arch/arm64/kernel/asm-offsets.c | 108 ++
arch/arm64/kernel/debug-monitors.c | 288 ++++++
arch/arm64/kernel/elf.c | 41 +
arch/arm64/kernel/entry-fpsimd.S | 80 ++
arch/arm64/kernel/entry.S | 695 +++++++++++++
arch/arm64/kernel/fpsimd.c | 106 ++
arch/arm64/kernel/head.S | 521 ++++++++++
arch/arm64/kernel/hw_breakpoint.c | 880 ++++++++++++++++
arch/arm64/kernel/io.c | 64 ++
arch/arm64/kernel/irq.c | 84 ++
arch/arm64/kernel/kuser32.S | 77 ++
arch/arm64/kernel/module.c | 456 ++++++++
arch/arm64/kernel/perf_event.c | 1368 +++++++++++++++++++++++++
arch/arm64/kernel/process.c | 416 ++++++++
arch/arm64/kernel/ptrace.c | 834 +++++++++++++++
arch/arm64/kernel/setup.c | 357 +++++++
arch/arm64/kernel/signal.c | 436 ++++++++
arch/arm64/kernel/signal32.c | 876 ++++++++++++++++
arch/arm64/kernel/smp.c | 469 +++++++++
arch/arm64/kernel/stacktrace.c | 127 +++
arch/arm64/kernel/sys.c | 138 +++
arch/arm64/kernel/sys32.S | 283 +++++
arch/arm64/kernel/sys_compat.c | 177 ++++
arch/arm64/kernel/time.c | 65 ++
arch/arm64/kernel/traps.c | 357 +++++++
arch/arm64/kernel/vdso.c | 261 +++++
arch/arm64/kernel/vdso/.gitignore | 2 +
arch/arm64/kernel/vdso/Makefile | 63 ++
arch/arm64/kernel/vdso/gen_vdso_offsets.sh | 15 +
arch/arm64/kernel/vdso/gettimeofday.S | 242 +++++
arch/arm64/kernel/vdso/note.S | 28 +
arch/arm64/kernel/vdso/sigreturn.S | 37 +
arch/arm64/kernel/vdso/vdso.S | 33 +
arch/arm64/kernel/vdso/vdso.lds.S | 100 ++
arch/arm64/kernel/vmlinux.lds.S | 146 +++
arch/arm64/lib/Makefile | 5 +
arch/arm64/lib/bitops.c | 25 +
arch/arm64/lib/clear_page.S | 39 +
arch/arm64/lib/clear_user.S | 58 ++
arch/arm64/lib/copy_from_user.S | 66 ++
arch/arm64/lib/copy_in_user.S | 63 ++
arch/arm64/lib/copy_page.S | 46 +
arch/arm64/lib/copy_to_user.S | 61 ++
arch/arm64/lib/delay.c | 55 +
arch/arm64/lib/getuser.S | 75 ++
arch/arm64/lib/putuser.S | 73 ++
arch/arm64/lib/strncpy_from_user.S | 50 +
arch/arm64/lib/strnlen_user.S | 47 +
arch/arm64/mm/Kconfig | 5 +
arch/arm64/mm/Makefile | 6 +
arch/arm64/mm/cache.S | 279 +++++
arch/arm64/mm/context.c | 159 +++
arch/arm64/mm/copypage.c | 34 +
arch/arm64/mm/dma-mapping.c | 208 ++++
arch/arm64/mm/extable.c | 17 +
arch/arm64/mm/fault.c | 534 ++++++++++
arch/arm64/mm/flush.c | 132 +++
arch/arm64/mm/init.c | 416 ++++++++
arch/arm64/mm/ioremap.c | 84 ++
arch/arm64/mm/mm.h | 2 +
arch/arm64/mm/mmap.c | 144 +++
arch/arm64/mm/mmu.c | 395 +++++++
arch/arm64/mm/pgd.c | 49 +
arch/arm64/mm/proc-macros.S | 55 +
arch/arm64/mm/proc-syms.c | 31 +
arch/arm64/mm/proc.S | 193 ++++
arch/arm64/mm/tlb.S | 71 ++
drivers/clocksource/Kconfig | 5 +
drivers/clocksource/Makefile | 1 +
drivers/clocksource/arm_generic.c | 309 ++++++
include/clocksource/arm_generic.h | 21 +
init/Kconfig | 3 +-
kernel/sysctl.c | 2 +-
lib/Kconfig.debug | 6 +-
tools/perf/perf.h | 6 +
168 files changed, 22089 insertions(+), 4 deletions(-)


2012-08-14 17:53:08

by Catalin Marinas

Subject: [PATCH v2 01/31] arm64: Assembly macros and definitions

This patch introduces several assembly macros and definitions used in
the .S files across arch/arm64/, such as IRQ disabling/enabling,
together with asm-offsets.c.
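
For readers new to the asm-offsets mechanism, the sketch below shows
roughly how the DEFINE() invocations in asm-offsets.c become constants
usable from assembly; the macro definitions are paraphrased from the
generic <linux/kbuild.h> and the conversion to a header is done by the
generic Kbuild rules, not by this patch:

/*
 * asm-offsets.c is only ever compiled to assembly, never linked.  Each
 * DEFINE() emits a "->SYMBOL value" marker into that assembly output,
 * which the generic Kbuild rules convert into
 *
 *         #define SYMBOL value
 *
 * lines in include/generated/asm-offsets.h (pulled in via the
 * <asm/asm-offsets.h> wrapper added here).
 */
#define DEFINE(sym, val) \
        asm volatile("\n->" #sym " %0 " #val : : "i" (val))

#define BLANK() asm volatile("\n->" : : )

/*
 * E.g. DEFINE(TI_FLAGS, offsetof(struct thread_info, flags)) lets the
 * .S files read thread_info->flags as "ldr x0, [<ti>, #TI_FLAGS]"
 * without hard-coding the structure layout.
 */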

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/asm-offsets.h | 1 +
arch/arm64/include/asm/assembler.h | 109 ++++++++++++++++++++++++++++++++++
arch/arm64/kernel/asm-offsets.c | 108 +++++++++++++++++++++++++++++++++
arch/arm64/mm/proc-macros.S | 55 +++++++++++++++++
4 files changed, 273 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/asm-offsets.h
create mode 100644 arch/arm64/include/asm/assembler.h
create mode 100644 arch/arm64/kernel/asm-offsets.c
create mode 100644 arch/arm64/mm/proc-macros.S

diff --git a/arch/arm64/include/asm/asm-offsets.h b/arch/arm64/include/asm/asm-offsets.h
new file mode 100644
index 0000000..d370ee3
--- /dev/null
+++ b/arch/arm64/include/asm/asm-offsets.h
@@ -0,0 +1 @@
+#include <generated/asm-offsets.h>
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
new file mode 100644
index 0000000..da2a13e
--- /dev/null
+++ b/arch/arm64/include/asm/assembler.h
@@ -0,0 +1,109 @@
+/*
+ * Based on arch/arm/include/asm/assembler.h
+ *
+ * Copyright (C) 1996-2000 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASSEMBLY__
+#error "Only include this from assembly code"
+#endif
+
+#include <asm/ptrace.h>
+
+/*
+ * Stack pushing/popping (register pairs only). Equivalent to store decrement
+ * before, load increment after.
+ */
+ .macro push, xreg1, xreg2
+ stp \xreg1, \xreg2, [sp, #-16]!
+ .endm
+
+ .macro pop, xreg1, xreg2
+ ldp \xreg1, \xreg2, [sp], #16
+ .endm
+
+/*
+ * Enable and disable interrupts.
+ */
+ .macro disable_irq
+ msr daifset, #2
+ .endm
+
+ .macro enable_irq
+ msr daifclr, #2
+ .endm
+
+/*
+ * Save/disable and restore interrupts.
+ */
+ .macro save_and_disable_irqs, olddaif
+ mrs \olddaif, daif
+ disable_irq
+ .endm
+
+ .macro restore_irqs, olddaif
+ msr daif, \olddaif
+ .endm
+
+/*
+ * Enable and disable debug exceptions.
+ */
+ .macro disable_dbg
+ msr daifset, #8
+ .endm
+
+ .macro enable_dbg
+ msr daifclr, #8
+ .endm
+
+ .macro disable_step, tmp
+ mrs \tmp, mdscr_el1
+ bic \tmp, \tmp, #1
+ msr mdscr_el1, \tmp
+ .endm
+
+ .macro enable_step, tmp
+ mrs \tmp, mdscr_el1
+ orr \tmp, \tmp, #1
+ msr mdscr_el1, \tmp
+ .endm
+
+ .macro enable_dbg_if_not_stepping, tmp
+ mrs \tmp, mdscr_el1
+ tbnz \tmp, #1, 9990f
+ enable_dbg
+9990:
+ .endm
+
+/*
+ * SMP data memory barrier
+ */
+ .macro smp_dmb, opt
+#ifdef CONFIG_SMP
+ dmb \opt
+#endif
+ .endm
+
+#define USER(l, x...) \
+9999: x; \
+ .section __ex_table,"a"; \
+ .align 3; \
+ .quad 9999b,l; \
+ .previous
+
+/*
+ * Register aliases.
+ */
+lr .req x30 // link register
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
new file mode 100644
index 0000000..5120e51
--- /dev/null
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -0,0 +1,108 @@
+/*
+ * Based on arch/arm/kernel/asm-offsets.c
+ *
+ * Copyright (C) 1995-2003 Russell King
+ * 2001-2002 Keith Owens
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/dma-mapping.h>
+#include <asm/thread_info.h>
+#include <asm/memory.h>
+#include <asm/procinfo.h>
+#include <asm/vdso_datapage.h>
+#include <linux/kbuild.h>
+
+int main(void)
+{
+ DEFINE(TSK_ACTIVE_MM, offsetof(struct task_struct, active_mm));
+ BLANK();
+ DEFINE(TI_FLAGS, offsetof(struct thread_info, flags));
+ DEFINE(TI_PREEMPT, offsetof(struct thread_info, preempt_count));
+ DEFINE(TI_ADDR_LIMIT, offsetof(struct thread_info, addr_limit));
+ DEFINE(TI_TASK, offsetof(struct thread_info, task));
+ DEFINE(TI_EXEC_DOMAIN, offsetof(struct thread_info, exec_domain));
+ DEFINE(TI_CPU, offsetof(struct thread_info, cpu));
+ BLANK();
+ DEFINE(THREAD_CPU_CONTEXT, offsetof(struct task_struct, thread.cpu_context));
+ BLANK();
+ DEFINE(S_X0, offsetof(struct pt_regs, regs[0]));
+ DEFINE(S_X1, offsetof(struct pt_regs, regs[1]));
+ DEFINE(S_X2, offsetof(struct pt_regs, regs[2]));
+ DEFINE(S_X3, offsetof(struct pt_regs, regs[3]));
+ DEFINE(S_X4, offsetof(struct pt_regs, regs[4]));
+ DEFINE(S_X5, offsetof(struct pt_regs, regs[5]));
+ DEFINE(S_X6, offsetof(struct pt_regs, regs[6]));
+ DEFINE(S_X7, offsetof(struct pt_regs, regs[7]));
+ DEFINE(S_LR, offsetof(struct pt_regs, regs[30]));
+ DEFINE(S_SP, offsetof(struct pt_regs, sp));
+#ifdef CONFIG_AARCH32_EMULATION
+ DEFINE(S_COMPAT_SP, offsetof(struct pt_regs, compat_sp));
+#endif
+ DEFINE(S_PSTATE, offsetof(struct pt_regs, pstate));
+ DEFINE(S_PC, offsetof(struct pt_regs, pc));
+ DEFINE(S_ORIG_X0, offsetof(struct pt_regs, orig_x0));
+ DEFINE(S_SYSCALLNO, offsetof(struct pt_regs, syscallno));
+ DEFINE(S_FRAME_SIZE, sizeof(struct pt_regs));
+ BLANK();
+ DEFINE(MM_CONTEXT_ID, offsetof(struct mm_struct, context.id));
+ BLANK();
+ DEFINE(VMA_VM_MM, offsetof(struct vm_area_struct, vm_mm));
+ DEFINE(VMA_VM_FLAGS, offsetof(struct vm_area_struct, vm_flags));
+ BLANK();
+ DEFINE(VM_EXEC, VM_EXEC);
+ BLANK();
+ DEFINE(PAGE_SZ, PAGE_SIZE);
+ BLANK();
+ DEFINE(PROC_INFO_SZ, sizeof(struct proc_info_list));
+ DEFINE(PROCINFO_INITFUNC, offsetof(struct proc_info_list, __cpu_flush));
+ BLANK();
+ DEFINE(DMA_BIDIRECTIONAL, DMA_BIDIRECTIONAL);
+ DEFINE(DMA_TO_DEVICE, DMA_TO_DEVICE);
+ DEFINE(DMA_FROM_DEVICE, DMA_FROM_DEVICE);
+ BLANK();
+ DEFINE(CLOCK_REALTIME, CLOCK_REALTIME);
+ DEFINE(CLOCK_MONOTONIC, CLOCK_MONOTONIC);
+ DEFINE(CLOCK_REALTIME_RES, MONOTONIC_RES_NSEC);
+ DEFINE(CLOCK_REALTIME_COARSE, CLOCK_REALTIME_COARSE);
+ DEFINE(CLOCK_MONOTONIC_COARSE,CLOCK_MONOTONIC_COARSE);
+ DEFINE(CLOCK_COARSE_RES, LOW_RES_NSEC);
+ DEFINE(NSEC_PER_SEC, NSEC_PER_SEC);
+ BLANK();
+ DEFINE(VDSO_CS_CYCLE_LAST, offsetof(struct vdso_data, cs_cycle_last));
+ DEFINE(VDSO_XTIME_CLK_SEC, offsetof(struct vdso_data, xtime_clock_sec));
+ DEFINE(VDSO_XTIME_CLK_NSEC, offsetof(struct vdso_data, xtime_clock_nsec));
+ DEFINE(VDSO_XTIME_CRS_SEC, offsetof(struct vdso_data, xtime_coarse_sec));
+ DEFINE(VDSO_XTIME_CRS_NSEC, offsetof(struct vdso_data, xtime_coarse_nsec));
+ DEFINE(VDSO_WTM_CLK_SEC, offsetof(struct vdso_data, wtm_clock_sec));
+ DEFINE(VDSO_WTM_CLK_NSEC, offsetof(struct vdso_data, wtm_clock_nsec));
+ DEFINE(VDSO_TB_SEQ_COUNT, offsetof(struct vdso_data, tb_seq_count));
+ DEFINE(VDSO_CS_MULT, offsetof(struct vdso_data, cs_mult));
+ DEFINE(VDSO_CS_SHIFT, offsetof(struct vdso_data, cs_shift));
+ DEFINE(VDSO_TZ_MINWEST, offsetof(struct vdso_data, tz_minuteswest));
+ DEFINE(VDSO_TZ_DSTTIME, offsetof(struct vdso_data, tz_dsttime));
+ DEFINE(VDSO_USE_SYSCALL, offsetof(struct vdso_data, use_syscall));
+ BLANK();
+ DEFINE(TVAL_TV_SEC, offsetof(struct timeval, tv_sec));
+ DEFINE(TVAL_TV_USEC, offsetof(struct timeval, tv_usec));
+ DEFINE(TSPEC_TV_SEC, offsetof(struct timespec, tv_sec));
+ DEFINE(TSPEC_TV_NSEC, offsetof(struct timespec, tv_nsec));
+ BLANK();
+ DEFINE(TZ_MINWEST, offsetof(struct timezone, tz_minuteswest));
+ DEFINE(TZ_DSTTIME, offsetof(struct timezone, tz_dsttime));
+ return 0;
+}
diff --git a/arch/arm64/mm/proc-macros.S b/arch/arm64/mm/proc-macros.S
new file mode 100644
index 0000000..8957b82
--- /dev/null
+++ b/arch/arm64/mm/proc-macros.S
@@ -0,0 +1,55 @@
+/*
+ * Based on arch/arm/mm/proc-macros.S
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <asm/asm-offsets.h>
+#include <asm/thread_info.h>
+
+/*
+ * vma_vm_mm - get mm pointer from vma pointer (vma->vm_mm)
+ */
+ .macro vma_vm_mm, rd, rn
+ ldr \rd, [\rn, #VMA_VM_MM]
+ .endm
+
+/*
+ * mmid - get context id from mm pointer (mm->context.id)
+ */
+ .macro mmid, rd, rn
+ ldr \rd, [\rn, #MM_CONTEXT_ID]
+ .endm
+
+/*
+ * dcache_line_size - get the minimum D-cache line size from the CTR register.
+ */
+ .macro dcache_line_size, reg, tmp
+ mrs \tmp, ctr_el0 // read CTR
+ lsr \tmp, \tmp, #16
+ and \tmp, \tmp, #0xf // cache line size encoding
+ mov \reg, #4 // bytes per word
+ lsl \reg, \reg, \tmp // actual cache line size
+ .endm
+
+/*
+ * icache_line_size - get the minimum I-cache line size from the CTR register.
+ */
+ .macro icache_line_size, reg, tmp
+ mrs \tmp, ctr_el0 // read CTR
+ and \tmp, \tmp, #0xf // cache line size encoding
+ mov \reg, #4 // bytes per word
+ lsl \reg, \reg, \tmp // actual cache line size
+ .endm

2012-08-14 17:54:20

by Catalin Marinas

Subject: [PATCH v2 20/31] arm64: User access library functions

This patch adds support for various user access functions. These
functions use the standard LDR/STR instructions rather than the
LDRT/STRT variants, in order to allow kernel addresses to be accessed
as well (after set_fs(KERNEL_DS)).
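
For context, the hedged sketch below shows how generic kernel code is
expected to consume these routines (the struct and helper names are
made up for the example); access_ok() reduces to the __range_ok()
check against the current addr_limit, and the copy routines rely on
the exception table fixups added by this patch for faulting user
addresses:

#include <linux/types.h>
#include <linux/errno.h>
#include <linux/uaccess.h>

/* Illustrative argument block; not part of the patch. */
struct example_args {
        u64 addr;
        u64 len;
};

static long example_get_args(struct example_args *kargs,
                             const struct example_args __user *uargs)
{
        /* checks access_ok() and zeroes the uncopied part on fault */
        if (copy_from_user(kargs, uargs, sizeof(*kargs)))
                return -EFAULT;
        return 0;
}

static long example_put_result(u64 __user *uptr, u64 val)
{
        /* put_user() selects the right-sized store from sizeof(*uptr) */
        return put_user(val, uptr) ? -EFAULT : 0;
}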

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/uaccess.h | 377 ++++++++++++++++++++++++++++++++++++
arch/arm64/lib/clear_user.S | 58 ++++++
arch/arm64/lib/copy_from_user.S | 66 +++++++
arch/arm64/lib/copy_in_user.S | 63 ++++++
arch/arm64/lib/copy_to_user.S | 61 ++++++
arch/arm64/lib/getuser.S | 75 +++++++
arch/arm64/lib/putuser.S | 73 +++++++
arch/arm64/lib/strncpy_from_user.S | 50 +++++
arch/arm64/lib/strnlen_user.S | 47 +++++
9 files changed, 870 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/uaccess.h
create mode 100644 arch/arm64/lib/clear_user.S
create mode 100644 arch/arm64/lib/copy_from_user.S
create mode 100644 arch/arm64/lib/copy_in_user.S
create mode 100644 arch/arm64/lib/copy_to_user.S
create mode 100644 arch/arm64/lib/getuser.S
create mode 100644 arch/arm64/lib/putuser.S
create mode 100644 arch/arm64/lib/strncpy_from_user.S
create mode 100644 arch/arm64/lib/strnlen_user.S

diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
new file mode 100644
index 0000000..09d7b53
--- /dev/null
+++ b/arch/arm64/include/asm/uaccess.h
@@ -0,0 +1,377 @@
+/*
+ * Based on arch/arm/include/asm/uaccess.h
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_UACCESS_H
+#define __ASM_UACCESS_H
+
+/*
+ * User space memory access functions
+ */
+#include <linux/string.h>
+#include <linux/thread_info.h>
+
+#include <asm/ptrace.h>
+#include <asm/errno.h>
+#include <asm/memory.h>
+#include <asm/compiler.h>
+
+#define VERIFY_READ 0
+#define VERIFY_WRITE 1
+
+/*
+ * The exception table consists of pairs of addresses: the first is the
+ * address of an instruction that is allowed to fault, and the second is
+ * the address at which the program should continue. No registers are
+ * modified, so it is entirely up to the continuation code to figure out
+ * what to do.
+ *
+ * All the routines below use bits of fixup code that are out of line
+ * with the main instruction path. This means when everything is well,
+ * we don't even have to jump over them. Further, they do not intrude
+ * on our cache or tlb entries.
+ */
+
+struct exception_table_entry
+{
+ unsigned long insn, fixup;
+};
+
+extern int fixup_exception(struct pt_regs *regs);
+
+/*
+ * These two are intentionally not defined anywhere - if the kernel
+ * code generates any references to them, that's a bug.
+ */
+extern long __get_user_bad(void);
+extern long __put_user_bad(void);
+
+#define KERNEL_DS (-1UL)
+#define get_ds() (KERNEL_DS)
+
+#define USER_DS TASK_SIZE_64
+#define get_fs() (current_thread_info()->addr_limit)
+
+static inline void set_fs(mm_segment_t fs)
+{
+ current_thread_info()->addr_limit = fs;
+}
+
+#define segment_eq(a,b) ((a) == (b))
+
+/*
+ * Return 1 if addr < current->addr_limit, 0 otherwise.
+ */
+#define __addr_ok(addr) \
+({ \
+ unsigned long flag; \
+ asm("cmp %1, %0; cset %0, lo" \
+ : "=&r" (flag) \
+ : "r" (addr), "0" (current_thread_info()->addr_limit) \
+ : "cc"); \
+ flag; \
+})
+
+/*
+ * Test whether a block of memory is a valid user space address.
+ * Returns 1 if the range is valid, 0 otherwise.
+ *
+ * This is equivalent to the following test:
+ * (u65)addr + (u65)size < (u65)current->addr_limit
+ *
+ * This needs 65-bit arithmetic.
+ */
+#define __range_ok(addr,size) \
+({ \
+ unsigned long flag, roksum; \
+ __chk_user_ptr(addr); \
+ asm("adds %1, %1, %3; ccmp %1, %4, #2, cc; cset %0, cc" \
+ : "=&r" (flag), "=&r" (roksum) \
+ : "1" (addr), "Ir" (size), \
+ "r" (current_thread_info()->addr_limit) \
+ : "cc"); \
+ flag; \
+})
+
+/*
+ * Single-value transfer routines. They automatically use the right
+ * size if we just have the right pointer type. Note that the functions
+ * which read from user space (*get_*) need to take care not to leak
+ * kernel data even if the calling code is buggy and fails to check
+ * the return value. This means zeroing out the destination variable
+ * or buffer on error. Normally this is done out of line by the
+ * fixup code, but there are a few places where it intrudes on the
+ * main code path. When we only write to user space, there is no
+ * problem.
+ */
+extern long __get_user_1(void *);
+extern long __get_user_2(void *);
+extern long __get_user_4(void *);
+extern long __get_user_8(void *);
+
+#define __get_user_x(__r2,__p,__e,__s,__i...) \
+ asm volatile( \
+ __asmeq("%0", "x0") __asmeq("%1", "x2") \
+ "bl __get_user_" #__s \
+ : "=&r" (__e), "=r" (__r2) \
+ : "0" (__p) \
+ : __i, "cc")
+
+#define get_user(x,p) \
+ ({ \
+ register const typeof(*(p)) __user *__p asm("x0") = (p);\
+ register unsigned long __r2 asm("x2"); \
+ register long __e asm("x0"); \
+ switch (sizeof(*(__p))) { \
+ case 1: \
+ __get_user_x(__r2, __p, __e, 1, "x30"); \
+ break; \
+ case 2: \
+ __get_user_x(__r2, __p, __e, 2, "x3", "x30"); \
+ break; \
+ case 4: \
+ __get_user_x(__r2, __p, __e, 4, "x30"); \
+ break; \
+ case 8: \
+ __get_user_x(__r2, __p, __e, 8, "x30"); \
+ break; \
+ default: __e = __get_user_bad(); break; \
+ } \
+ x = (typeof(*(p))) __r2; \
+ __e; \
+ })
+
+#define __get_user_unaligned __get_user
+
+extern long __put_user_1(void *, unsigned long);
+extern long __put_user_2(void *, unsigned long);
+extern long __put_user_4(void *, unsigned long);
+extern long __put_user_8(void *, unsigned long);
+
+#define __put_user_x(__r2,__p,__e,__s) \
+ asm volatile( \
+ __asmeq("%0", "x0") __asmeq("%2", "x2") \
+ "bl __put_user_" #__s \
+ : "=&r" (__e) \
+ : "0" (__p), "r" (__r2) \
+ : "x8", "x30", "cc")
+
+#define put_user(x,p) \
+ ({ \
+ register const typeof(*(p)) __r2 asm("x2") = (x); \
+ register const typeof(*(p)) __user *__p asm("x0") = (p);\
+ register long __e asm("x0"); \
+ switch (sizeof(*(__p))) { \
+ case 1: \
+ __put_user_x(__r2, __p, __e, 1); \
+ break; \
+ case 2: \
+ __put_user_x(__r2, __p, __e, 2); \
+ break; \
+ case 4: \
+ __put_user_x(__r2, __p, __e, 4); \
+ break; \
+ case 8: \
+ __put_user_x(__r2, __p, __e, 8); \
+ break; \
+ default: __e = __put_user_bad(); break; \
+ } \
+ __e; \
+ })
+
+#define __put_user_unaligned __put_user
+
+#define access_ok(type,addr,size) __range_ok(addr,size)
+
+/*
+ * The "__xxx" versions of the user access functions do not verify the
+ * address space - it must have been done previously with a separate
+ * "access_ok()" call.
+ *
+ * The "xxx_error" versions set the third argument to EFAULT if an
+ * error occurs, and leave it unchanged on success. Note that these
+ * versions are void (ie, don't return a value as such).
+ */
+#define __get_user(x,ptr) \
+({ \
+ long __gu_err = 0; \
+ __get_user_err((x),(ptr),__gu_err); \
+ __gu_err; \
+})
+
+#define __get_user_error(x,ptr,err) \
+({ \
+ __get_user_err((x),(ptr),err); \
+ (void) 0; \
+})
+
+#define __get_user_err(x,ptr,err) \
+do { \
+ unsigned long __gu_addr = (unsigned long)(ptr); \
+ unsigned long __gu_val; \
+ __chk_user_ptr(ptr); \
+ switch (sizeof(*(ptr))) { \
+ case 1: \
+ __get_user_asm("ldrb", "%w", __gu_val, __gu_addr, err); \
+ break; \
+ case 2: \
+ __get_user_asm("ldrh", "%w", __gu_val, __gu_addr, err); \
+ break; \
+ case 4: \
+ __get_user_asm("ldr", "%w", __gu_val, __gu_addr, err); \
+ break; \
+ case 8: \
+ __get_user_asm("ldr", "%", __gu_val, __gu_addr, err); \
+ break; \
+ default: \
+ (__gu_val) = __get_user_bad(); \
+ } \
+ (x) = (__typeof__(*(ptr)))__gu_val; \
+} while (0)
+
+#define __get_user_asm(instr, reg, x, addr, err) \
+ asm volatile( \
+ "1: " instr " " reg "1, [%2]\n" \
+ "2:\n" \
+ " .section .fixup, \"ax\"\n" \
+ " .align 2\n" \
+ "3: mov %0, %3\n" \
+ " mov %1, #0\n" \
+ " b 2b\n" \
+ " .previous\n" \
+ " .section __ex_table,\"a\"\n" \
+ " .align 3\n" \
+ " .quad 1b, 3b\n" \
+ " .previous" \
+ : "+r" (err), "=&r" (x) \
+ : "r" (addr), "i" (-EFAULT) \
+ : "cc")
+
+#define __put_user(x,ptr) \
+({ \
+ long __pu_err = 0; \
+ __put_user_err((x),(ptr),__pu_err); \
+ __pu_err; \
+})
+
+#define __put_user_error(x,ptr,err) \
+({ \
+ __put_user_err((x),(ptr),err); \
+ (void) 0; \
+})
+
+#define __put_user_err(x,ptr,err) \
+do { \
+ unsigned long __pu_addr = (unsigned long)(ptr); \
+ __typeof__(*(ptr)) __pu_val = (x); \
+ __chk_user_ptr(ptr); \
+ switch (sizeof(*(ptr))) { \
+ case 1: \
+ __put_user_asm("strb", "%w", __pu_val, __pu_addr, err); \
+ break; \
+ case 2: \
+ __put_user_asm("strh", "%w", __pu_val, __pu_addr, err); \
+ break; \
+ case 4: \
+ __put_user_asm("str", "%w", __pu_val, __pu_addr, err); \
+ break; \
+ case 8: \
+ __put_user_asm("str", "%", __pu_val, __pu_addr, err); \
+ break; \
+ default: \
+ __put_user_bad(); \
+ } \
+} while (0)
+
+#define __put_user_asm(instr, reg, x, __pu_addr, err) \
+ asm volatile( \
+ "1: " instr " " reg "1, [%2]\n" \
+ "2:\n" \
+ " .section .fixup,\"ax\"\n" \
+ " .align 2\n" \
+ "3: mov %0, %3\n" \
+ " b 2b\n" \
+ " .previous\n" \
+ " .section __ex_table,\"a\"\n" \
+ " .align 3\n" \
+ " .quad 1b, 3b\n" \
+ " .previous" \
+ : "+r" (err) \
+ : "r" (x), "r" (__pu_addr), "i" (-EFAULT) \
+ : "cc")
+
+extern unsigned long __must_check __copy_from_user(void *to, const void __user *from, unsigned long n);
+extern unsigned long __must_check __copy_to_user(void __user *to, const void *from, unsigned long n);
+extern unsigned long __must_check __copy_in_user(void __user *to, const void __user *from, unsigned long n);
+extern unsigned long __must_check __clear_user(void __user *addr, unsigned long n);
+
+extern unsigned long __must_check __strncpy_from_user(char *to, const char __user *from, unsigned long count);
+extern unsigned long __must_check __strnlen_user(const char __user *s, long n);
+
+static inline unsigned long __must_check copy_from_user(void *to, const void __user *from, unsigned long n)
+{
+ if (access_ok(VERIFY_READ, from, n))
+ n = __copy_from_user(to, from, n);
+ else /* security hole - plug it */
+ memset(to, 0, n);
+ return n;
+}
+
+static inline unsigned long __must_check copy_to_user(void __user *to, const void *from, unsigned long n)
+{
+ if (access_ok(VERIFY_WRITE, to, n))
+ n = __copy_to_user(to, from, n);
+ return n;
+}
+
+static inline unsigned long __must_check copy_in_user(void __user *to, const void __user *from, unsigned long n)
+{
+ if (access_ok(VERIFY_READ, from, n) && access_ok(VERIFY_WRITE, to, n))
+ n = __copy_in_user(to, from, n);
+ return n;
+}
+
+#define __copy_to_user_inatomic __copy_to_user
+#define __copy_from_user_inatomic __copy_from_user
+
+static inline unsigned long __must_check clear_user(void __user *to, unsigned long n)
+{
+ if (access_ok(VERIFY_WRITE, to, n))
+ n = __clear_user(to, n);
+ return n;
+}
+
+static inline long __must_check strncpy_from_user(char *dst, const char __user *src, long count)
+{
+ long res = -EFAULT;
+ if (access_ok(VERIFY_READ, src, 1))
+ res = __strncpy_from_user(dst, src, count);
+ return res;
+}
+
+#define strlen_user(s) strnlen_user(s, ~0UL >> 1)
+
+static inline long __must_check strnlen_user(const char __user *s, long n)
+{
+ unsigned long res = 0;
+
+ if (__addr_ok(s))
+ res = __strnlen_user(s, n);
+
+ return res;
+}
+
+#endif /* __ASM_UACCESS_H */
diff --git a/arch/arm64/lib/clear_user.S b/arch/arm64/lib/clear_user.S
new file mode 100644
index 0000000..6e0ed93
--- /dev/null
+++ b/arch/arm64/lib/clear_user.S
@@ -0,0 +1,58 @@
+/*
+ * Based on arch/arm/lib/clear_user.S
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+ .text
+
+/* Prototype: int __clear_user(void *addr, size_t sz)
+ * Purpose : clear some user memory
+ * Params : addr - user memory address to clear
+ * : sz - number of bytes to clear
+ * Returns : number of bytes NOT cleared
+ *
+ * Alignment fixed up by hardware.
+ */
+ENTRY(__clear_user)
+ mov x2, x1 // save the size for fixup return
+ subs x1, x1, #8
+ b.mi 2f
+1:
+USER(9f, str xzr, [x0], #8 )
+ subs x1, x1, #8
+ b.pl 1b
+2: adds x1, x1, #4
+ b.mi 3f
+USER(9f, str wzr, [x0], #4 )
+ sub x1, x1, #4
+3: adds x1, x1, #2
+ b.mi 4f
+USER(9f, strh wzr, [x0], #2 )
+ sub x1, x1, #2
+4: adds x1, x1, #1
+ b.mi 5f
+ strb wzr, [x0]
+5: mov x0, #0
+ ret
+ENDPROC(__clear_user)
+
+ .section .fixup,"ax"
+ .align 2
+9: mov x0, x2 // return the original size
+ ret
+ .previous
diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
new file mode 100644
index 0000000..5e27add
--- /dev/null
+++ b/arch/arm64/lib/copy_from_user.S
@@ -0,0 +1,66 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+/*
+ * Copy from user space to a kernel buffer (alignment handled by the hardware)
+ *
+ * Parameters:
+ * x0 - to
+ * x1 - from
+ * x2 - n
+ * Returns:
+ * x0 - bytes not copied
+ */
+ENTRY(__copy_from_user)
+ add x4, x1, x2 // upper user buffer boundary
+ subs x2, x2, #8
+ b.mi 2f
+1:
+USER(9f, ldr x3, [x1], #8 )
+ subs x2, x2, #8
+ str x3, [x0], #8
+ b.pl 1b
+2: adds x2, x2, #4
+ b.mi 3f
+USER(9f, ldr w3, [x1], #4 )
+ sub x2, x2, #4
+ str w3, [x0], #4
+3: adds x2, x2, #2
+ b.mi 4f
+USER(9f, ldrh w3, [x1], #2 )
+ sub x2, x2, #2
+ strh w3, [x0], #2
+4: adds x2, x2, #1
+ b.mi 5f
+USER(9f, ldrb w3, [x1] )
+ strb w3, [x0]
+5: mov x0, #0
+ ret
+ENDPROC(__copy_from_user)
+
+ .section .fixup,"ax"
+ .align 2
+9: sub x2, x4, x1
+ mov x3, x2
+10: strb wzr, [x0], #1 // zero remaining buffer space
+ subs x3, x3, #1
+ b.ne 10b
+ mov x0, x2 // bytes not copied
+ ret
+ .previous
diff --git a/arch/arm64/lib/copy_in_user.S b/arch/arm64/lib/copy_in_user.S
new file mode 100644
index 0000000..84b6c9b
--- /dev/null
+++ b/arch/arm64/lib/copy_in_user.S
@@ -0,0 +1,63 @@
+/*
+ * Copy from user space to user space
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+/*
+ * Copy from user space to user space (alignment handled by the hardware)
+ *
+ * Parameters:
+ * x0 - to
+ * x1 - from
+ * x2 - n
+ * Returns:
+ * x0 - bytes not copied
+ */
+ENTRY(__copy_in_user)
+ add x4, x0, x2 // upper user buffer boundary
+ subs x2, x2, #8
+ b.mi 2f
+1:
+USER(9f, ldr x3, [x1], #8 )
+ subs x2, x2, #8
+USER(9f, str x3, [x0], #8 )
+ b.pl 1b
+2: adds x2, x2, #4
+ b.mi 3f
+USER(9f, ldr w3, [x1], #4 )
+ sub x2, x2, #4
+USER(9f, str w3, [x0], #4 )
+3: adds x2, x2, #2
+ b.mi 4f
+USER(9f, ldrh w3, [x1], #2 )
+ sub x2, x2, #2
+USER(9f, strh w3, [x0], #2 )
+4: adds x2, x2, #1
+ b.mi 5f
+USER(9f, ldrb w3, [x1] )
+USER(9f, strb w3, [x0] )
+5: mov x0, #0
+ ret
+ENDPROC(__copy_in_user)
+
+ .section .fixup,"ax"
+ .align 2
+9: sub x0, x4, x0 // bytes not copied
+ ret
+ .previous
diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S
new file mode 100644
index 0000000..a0aeeb9
--- /dev/null
+++ b/arch/arm64/lib/copy_to_user.S
@@ -0,0 +1,61 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+/*
+ * Copy to user space from a kernel buffer (alignment handled by the hardware)
+ *
+ * Parameters:
+ * x0 - to
+ * x1 - from
+ * x2 - n
+ * Returns:
+ * x0 - bytes not copied
+ */
+ENTRY(__copy_to_user)
+ add x4, x0, x2 // upper user buffer boundary
+ subs x2, x2, #8
+ b.mi 2f
+1:
+ ldr x3, [x1], #8
+ subs x2, x2, #8
+USER(9f, str x3, [x0], #8 )
+ b.pl 1b
+2: adds x2, x2, #4
+ b.mi 3f
+ ldr w3, [x1], #4
+ sub x2, x2, #4
+USER(9f, str w3, [x0], #4 )
+3: adds x2, x2, #2
+ b.mi 4f
+ ldrh w3, [x1], #2
+ sub x2, x2, #2
+USER(9f, strh w3, [x0], #2 )
+4: adds x2, x2, #1
+ b.mi 5f
+ ldrb w3, [x1]
+USER(9f, strb w3, [x0] )
+5: mov x0, #0
+ ret
+ENDPROC(__copy_to_user)
+
+ .section .fixup,"ax"
+ .align 2
+9: sub x0, x4, x0 // bytes not copied
+ ret
+ .previous
diff --git a/arch/arm64/lib/getuser.S b/arch/arm64/lib/getuser.S
new file mode 100644
index 0000000..1b4da22
--- /dev/null
+++ b/arch/arm64/lib/getuser.S
@@ -0,0 +1,75 @@
+/*
+ * Based on arch/arm/lib/getuser.S
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ * Idea from x86 version, (C) Copyright 1998 Linus Torvalds
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ *
+ * These functions have a non-standard call interface to make them more
+ * efficient, especially as they return an error value in addition to
+ * the "real" return value.
+ *
+ * __get_user_X
+ *
+ * Inputs: x0 contains the address
+ * Outputs: x0 is the error code
+ * x2, x3 contains the zero-extended value
+ * lr corrupted
+ *
+ * No other registers must be altered. (see <asm/uaccess.h>
+ * for specific ASM register usage).
+ *
+ * Note also that it is intended that __get_user_bad is not global.
+ */
+
+#include <linux/linkage.h>
+#include <asm/errno.h>
+
+ENTRY(__get_user_1)
+1: ldrb w2, [x0]
+ mov x0, #0
+ ret
+ENDPROC(__get_user_1)
+
+ENTRY(__get_user_2)
+2: ldrh w2, [x0]
+ mov x0, #0
+ ret
+ENDPROC(__get_user_2)
+
+ENTRY(__get_user_4)
+3: ldr w2, [x0]
+ mov x0, #0
+ ret
+ENDPROC(__get_user_4)
+
+ENTRY(__get_user_8)
+4: ldr x2, [x0]
+ mov x0, #0
+ ret
+ENDPROC(__get_user_8)
+
+__get_user_bad:
+ mov x2, #0
+ mov x0, #-EFAULT
+ ret
+ENDPROC(__get_user_bad)
+
+.section __ex_table, "a"
+ .quad 1b, __get_user_bad
+ .quad 2b, __get_user_bad
+ .quad 3b, __get_user_bad
+ .quad 4b, __get_user_bad
+.previous
diff --git a/arch/arm64/lib/putuser.S b/arch/arm64/lib/putuser.S
new file mode 100644
index 0000000..62d4a42
--- /dev/null
+++ b/arch/arm64/lib/putuser.S
@@ -0,0 +1,73 @@
+/*
+ * Based on arch/arm/lib/putuser.S
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ * Idea from x86 version, (C) Copyright 1998 Linus Torvalds
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * These functions have a non-standard call interface to make
+ * them more efficient, especially as they return an error
+ * value in addition to the "real" return value.
+ *
+ * __put_user_X
+ *
+ * Inputs: x0 contains the address
+ * x2, x3 contains the value
+ * Outputs: x0 is the error code
+ * lr corrupted
+ *
+ * No other registers must be altered. (see <asm/uaccess.h>
+ * for specific ASM register usage).
+ *
+ * Note that it is intended that __put_user_bad is not global.
+ */
+
+#include <linux/linkage.h>
+#include <asm/errno.h>
+
+ENTRY(__put_user_1)
+1: strb w2, [x0]
+ mov x0, #0
+ ret
+ENDPROC(__put_user_1)
+
+ENTRY(__put_user_2)
+2: strh w2, [x0]
+ mov x0, #0
+ ret
+ENDPROC(__put_user_2)
+
+ENTRY(__put_user_4)
+3: str w2, [x0]
+ mov x0, #0
+ ret
+ENDPROC(__put_user_4)
+
+ENTRY(__put_user_8)
+4: str x2, [x0]
+ mov x0, #0
+ ret
+ENDPROC(__put_user_8)
+
+__put_user_bad:
+ mov x0, #-EFAULT
+ ret
+ENDPROC(__put_user_bad)
+
+.section __ex_table, "a"
+ .quad 1b, __put_user_bad
+ .quad 2b, __put_user_bad
+ .quad 3b, __put_user_bad
+ .quad 4b, __put_user_bad
+.previous
diff --git a/arch/arm64/lib/strncpy_from_user.S b/arch/arm64/lib/strncpy_from_user.S
new file mode 100644
index 0000000..56e448a
--- /dev/null
+++ b/arch/arm64/lib/strncpy_from_user.S
@@ -0,0 +1,50 @@
+/*
+ * Based on arch/arm/lib/strncpy_from_user.S
+ *
+ * Copyright (C) 1995-2000 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+#include <asm/errno.h>
+
+ .text
+ .align 5
+
+/*
+ * Copy a string from user space to kernel space.
+ * x0 = dst, x1 = src, x2 = byte length
+ * returns the number of characters copied (strlen of copied string),
+ * -EFAULT on exception, or "len" if we fill the whole buffer
+ */
+ENTRY(__strncpy_from_user)
+ mov x4, x1
+1: subs x2, x2, #1
+ b.mi 2f
+USER(9f, ldrb w3, [x1], #1 )
+ strb w3, [x0], #1
+ cbnz w3, 1b
+ sub x1, x1, #1 // take NUL character out of count
+2: sub x0, x1, x4
+ ret
+ENDPROC(__strncpy_from_user)
+
+ .section .fixup,"ax"
+ .align 0
+9: strb wzr, [x0] // null terminate
+ mov x0, #-EFAULT
+ ret
+ .previous
diff --git a/arch/arm64/lib/strnlen_user.S b/arch/arm64/lib/strnlen_user.S
new file mode 100644
index 0000000..7f7b176
--- /dev/null
+++ b/arch/arm64/lib/strnlen_user.S
@@ -0,0 +1,47 @@
+/*
+ * Based on arch/arm/lib/strnlen_user.S
+ *
+ * Copyright (C) 1995-2000 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+#include <asm/errno.h>
+
+ .text
+ .align 5
+
+/* Prototype: unsigned long __strnlen_user(const char *str, long n)
+ * Purpose : get length of a string in user memory
+ * Params : str - address of string in user memory
+ * Returns : length of string *including terminator*
+ * or zero on exception, or n if too long
+ */
+ENTRY(__strnlen_user)
+ mov x2, x0
+1: subs x1, x1, #1
+ b.mi 2f
+USER(9f, ldrb w3, [x0], #1 )
+ cbnz w3, 1b
+2: sub x0, x0, x2
+ ret
+ENDPROC(__strnlen_user)
+
+ .section .fixup,"ax"
+ .align 0
+9: mov x0, #0
+ ret
+ .previous

2012-08-14 17:54:27

by Catalin Marinas

Subject: [PATCH v2 22/31] arm64: Floating point and SIMD

This patch adds support for saving and restoring the FP/ASIMD register
bank during context switch, together with FP exception handling that
generates SIGFPE. There are 32 128-bit registers and context switching
is currently done non-lazily. Benchmarks on real hardware are required
before implementing lazy FP state saving/restoring.
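
The layout of the register save area is what makes the assembly below
work out: after the write-back on the final
"stp q30, q31, [x0, #16 * 30]!", x0 points at vregs[30], so FPSR/FPCR
can be stored at [x0, #16 * 2], i.e. byte 512 from the original base,
immediately after vregs[31]. The stand-alone sketch below (a mock of
the structure, not the kernel header itself) simply checks that
arithmetic:

#include <stddef.h>
#include <stdint.h>

/* Mock of the fpsimd_state register area used in this patch. */
struct fpsimd_regs_mock {
        __uint128_t vregs[32];
        uint32_t fpsr;
        uint32_t fpcr;
};

int main(void)
{
        /* 32 Q registers * 16 bytes = 512; FPSR/FPCR follow directly */
        _Static_assert(offsetof(struct fpsimd_regs_mock, fpsr) == 32 * 16,
                       "FPSR sits right after the 32 Q registers");
        _Static_assert(offsetof(struct fpsimd_regs_mock, fpcr) == 32 * 16 + 4,
                       "FPCR follows FPSR");
        return 0;
}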

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/fpsimd.h | 64 +++++++++++++++++++++++
arch/arm64/kernel/entry-fpsimd.S | 80 ++++++++++++++++++++++++++++
arch/arm64/kernel/fpsimd.c | 106 ++++++++++++++++++++++++++++++++++++++
3 files changed, 250 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/fpsimd.h
create mode 100644 arch/arm64/kernel/entry-fpsimd.S
create mode 100644 arch/arm64/kernel/fpsimd.c

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
new file mode 100644
index 0000000..7ea4711
--- /dev/null
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -0,0 +1,64 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_FP_H
+#define __ASM_FP_H
+
+#include <asm/ptrace.h>
+
+#ifndef __ASSEMBLY__
+
+/*
+ * FP/SIMD storage area has:
+ * - FPSR and FPCR
+ * - 32 128-bit data registers
+ *
+ * Note that user_fpsimd forms a prefix of this structure, which is
+ * relied upon in the ptrace FP/SIMD accessors: struct
+ * user_fpsimd_state must form a prefix of struct fpsimd_state.
+ */
+struct fpsimd_state {
+ union {
+ struct user_fpsimd_state user_fpsimd;
+ struct {
+ __uint128_t vregs[32];
+ u32 fpsr;
+ u32 fpcr;
+ };
+ };
+};
+
+#if defined(__KERNEL__) && defined(CONFIG_AARCH32_EMULATION)
+/* Masks for extracting the FPSR and FPCR from the FPSCR */
+#define VFP_FPSCR_STAT_MASK 0xf800009f
+#define VFP_FPSCR_CTRL_MASK 0x07f79f00
+/*
+ * The VFP state has 32x64-bit registers and a single 32-bit
+ * control/status register.
+ */
+#define VFP_STATE_SIZE ((32 * 8) + 4)
+#endif
+
+struct task_struct;
+
+extern void fpsimd_save_state(struct fpsimd_state *state);
+extern void fpsimd_load_state(struct fpsimd_state *state);
+
+extern void fpsimd_thread_switch(struct task_struct *next);
+extern void fpsimd_flush_thread(void);
+
+#endif
+
+#endif
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
new file mode 100644
index 0000000..17988a6
--- /dev/null
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -0,0 +1,80 @@
+/*
+ * FP/SIMD state saving and restoring
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Catalin Marinas <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+
+#include <asm/assembler.h>
+
+/*
+ * Save the FP registers.
+ *
+ * x0 - pointer to struct fpsimd_state
+ */
+ENTRY(fpsimd_save_state)
+ stp q0, q1, [x0, #16 * 0]
+ stp q2, q3, [x0, #16 * 2]
+ stp q4, q5, [x0, #16 * 4]
+ stp q6, q7, [x0, #16 * 6]
+ stp q8, q9, [x0, #16 * 8]
+ stp q10, q11, [x0, #16 * 10]
+ stp q12, q13, [x0, #16 * 12]
+ stp q14, q15, [x0, #16 * 14]
+ stp q16, q17, [x0, #16 * 16]
+ stp q18, q19, [x0, #16 * 18]
+ stp q20, q21, [x0, #16 * 20]
+ stp q22, q23, [x0, #16 * 22]
+ stp q24, q25, [x0, #16 * 24]
+ stp q26, q27, [x0, #16 * 26]
+ stp q28, q29, [x0, #16 * 28]
+ stp q30, q31, [x0, #16 * 30]!
+ mrs x8, fpsr
+ str w8, [x0, #16 * 2]
+ mrs x8, fpcr
+ str w8, [x0, #16 * 2 + 4]
+ ret
+ENDPROC(fpsimd_save_state)
+
+/*
+ * Load the FP registers.
+ *
+ * x0 - pointer to struct fpsimd_state
+ */
+ENTRY(fpsimd_load_state)
+ ldp q0, q1, [x0, #16 * 0]
+ ldp q2, q3, [x0, #16 * 2]
+ ldp q4, q5, [x0, #16 * 4]
+ ldp q6, q7, [x0, #16 * 6]
+ ldp q8, q9, [x0, #16 * 8]
+ ldp q10, q11, [x0, #16 * 10]
+ ldp q12, q13, [x0, #16 * 12]
+ ldp q14, q15, [x0, #16 * 14]
+ ldp q16, q17, [x0, #16 * 16]
+ ldp q18, q19, [x0, #16 * 18]
+ ldp q20, q21, [x0, #16 * 20]
+ ldp q22, q23, [x0, #16 * 22]
+ ldp q24, q25, [x0, #16 * 24]
+ ldp q26, q27, [x0, #16 * 26]
+ ldp q28, q29, [x0, #16 * 28]
+ ldp q30, q31, [x0, #16 * 30]!
+ ldr w8, [x0, #16 * 2]
+ ldr w9, [x0, #16 * 2 + 4]
+ msr fpsr, x8
+ msr fpcr, x9
+ ret
+ENDPROC(fpsimd_load_state)
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
new file mode 100644
index 0000000..e8b8357
--- /dev/null
+++ b/arch/arm64/kernel/fpsimd.c
@@ -0,0 +1,106 @@
+/*
+ * FP/SIMD context switching and fault handling
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Catalin Marinas <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/signal.h>
+
+#include <asm/fpsimd.h>
+#include <asm/cputype.h>
+
+#define FPEXC_IOF (1 << 0)
+#define FPEXC_DZF (1 << 1)
+#define FPEXC_OFF (1 << 2)
+#define FPEXC_UFF (1 << 3)
+#define FPEXC_IXF (1 << 4)
+#define FPEXC_IDF (1 << 7)
+
+/*
+ * Trapped FP/ASIMD access.
+ */
+void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs)
+{
+ /* TODO: implement lazy context saving/restoring */
+ WARN_ON(1);
+}
+
+/*
+ * Raise a SIGFPE for the current process.
+ */
+void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
+{
+ siginfo_t info;
+ unsigned int si_code = 0;
+
+ if (esr & FPEXC_IOF)
+ si_code = FPE_FLTINV;
+ else if (esr & FPEXC_DZF)
+ si_code = FPE_FLTDIV;
+ else if (esr & FPEXC_OFF)
+ si_code = FPE_FLTOVF;
+ else if (esr & FPEXC_UFF)
+ si_code = FPE_FLTUND;
+ else if (esr & FPEXC_IXF)
+ si_code = FPE_FLTRES;
+
+ memset(&info, 0, sizeof(info));
+ info.si_signo = SIGFPE;
+ info.si_code = si_code;
+ info.si_addr = (void __user *)instruction_pointer(regs);
+
+ send_sig_info(SIGFPE, &info, current);
+}
+
+void fpsimd_thread_switch(struct task_struct *next)
+{
+ /* check if not kernel threads */
+ if (current->mm)
+ fpsimd_save_state(&current->thread.fpsimd_state);
+ if (next->mm)
+ fpsimd_load_state(&next->thread.fpsimd_state);
+}
+
+void fpsimd_flush_thread(void)
+{
+ memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
+ fpsimd_load_state(&current->thread.fpsimd_state);
+}
+
+/*
+ * FP/SIMD support code initialisation.
+ */
+static int __init fpsimd_init(void)
+{
+ u64 pfr = read_cpuid(ID_AA64PFR0_EL1);
+
+ if (pfr & (0xf << 16)) {
+ pr_notice("Floating-point is not implemented\n");
+ return 0;
+ }
+ elf_hwcap |= HWCAP_FP;
+
+ if (pfr & (0xf << 20))
+ pr_notice("Advanced SIMD is not implemented\n");
+ else
+ elf_hwcap |= HWCAP_ASIMD;
+
+ return 0;
+}
+late_initcall(fpsimd_init);

2012-08-14 17:54:51

by Catalin Marinas

Subject: [PATCH v2 05/31] arm64: MMU initialisation

This patch contains the initialisation of the memory blocks, MMU
attributes and the memory map. Only five memory types are defined:
Device nGnRnE (equivalent to Strongly Ordered), Device nGnRE (classic
Device memory), Device GRE, Normal Non-cacheable and Normal Cacheable.
Cache policies are supported via the memory attributes register
(MAIR_EL1) and only affect the Normal Cacheable mappings.

This patch also adds the SPARSEMEM_VMEMMAP initialisation.
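
For orientation (illustrative only: the attribute indices are example
choices and the encodings are the standard ARMv8 MAIR ones, not
necessarily the exact values programmed by this patch), each page
table entry carries an attribute index that selects one byte of
MAIR_EL1, which is how the five memory types above are expressed:

#include <stdint.h>
#include <stdio.h>

/* Example attribute-index assignment (illustrative only). */
enum mem_type {
        MT_DEVICE_nGnRnE = 0,
        MT_DEVICE_nGnRE  = 1,
        MT_DEVICE_GRE    = 2,
        MT_NORMAL_NC     = 3,
        MT_NORMAL        = 4,
};

/* MAIR_EL1 holds eight 8-bit attribute fields; index i lives at bits [8i+7:8i]. */
static uint64_t mair_set(uint64_t mair, unsigned int idx, uint8_t attr)
{
        mair &= ~(0xffULL << (8 * idx));
        return mair | ((uint64_t)attr << (8 * idx));
}

int main(void)
{
        uint64_t mair = 0;

        mair = mair_set(mair, MT_DEVICE_nGnRnE, 0x00); /* Device-nGnRnE */
        mair = mair_set(mair, MT_DEVICE_nGnRE,  0x04); /* Device-nGnRE */
        mair = mair_set(mair, MT_DEVICE_GRE,    0x0c); /* Device-GRE */
        mair = mair_set(mair, MT_NORMAL_NC,     0x44); /* Normal non-cacheable */
        mair = mair_set(mair, MT_NORMAL,        0xff); /* Normal write-back cacheable */

        printf("MAIR_EL1 = 0x%016llx\n", (unsigned long long)mair);
        return 0;
}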

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/memblock.h | 21 ++
arch/arm64/mm/init.c | 416 +++++++++++++++++++++++++++++++++++++
arch/arm64/mm/mmu.c | 395 +++++++++++++++++++++++++++++++++++
3 files changed, 832 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/memblock.h
create mode 100644 arch/arm64/mm/init.c
create mode 100644 arch/arm64/mm/mmu.c

diff --git a/arch/arm64/include/asm/memblock.h b/arch/arm64/include/asm/memblock.h
new file mode 100644
index 0000000..6afeed2
--- /dev/null
+++ b/arch/arm64/include/asm/memblock.h
@@ -0,0 +1,21 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_MEMBLOCK_H
+#define __ASM_MEMBLOCK_H
+
+extern void arm64_memblock_init(void);
+
+#endif
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
new file mode 100644
index 0000000..c87583d
--- /dev/null
+++ b/arch/arm64/mm/init.c
@@ -0,0 +1,416 @@
+/*
+ * Based on arch/arm/mm/init.c
+ *
+ * Copyright (C) 1995-2005 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/errno.h>
+#include <linux/swap.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/mman.h>
+#include <linux/nodemask.h>
+#include <linux/initrd.h>
+#include <linux/gfp.h>
+#include <linux/memblock.h>
+#include <linux/sort.h>
+#include <linux/of_fdt.h>
+
+#include <asm/prom.h>
+#include <asm/sections.h>
+#include <asm/setup.h>
+#include <asm/sizes.h>
+#include <asm/tlb.h>
+
+#include "mm.h"
+
+static unsigned long phys_initrd_start __initdata = 0;
+static unsigned long phys_initrd_size __initdata = 0;
+
+phys_addr_t memstart_addr __read_mostly = 0;
+
+void __init early_init_dt_setup_initrd_arch(unsigned long start,
+ unsigned long end)
+{
+ phys_initrd_start = start;
+ phys_initrd_size = end - start;
+}
+
+static int __init early_initrd(char *p)
+{
+ unsigned long start, size;
+ char *endp;
+
+ start = memparse(p, &endp);
+ if (*endp == ',') {
+ size = memparse(endp + 1, NULL);
+
+ phys_initrd_start = start;
+ phys_initrd_size = size;
+ }
+ return 0;
+}
+early_param("initrd", early_initrd);
+
+#define MAX_DMA32_PFN ((4UL * 1024 * 1024 * 1024) >> PAGE_SHIFT)
+
+static void __init zone_sizes_init(unsigned long min, unsigned long max)
+{
+ unsigned long zone_size[MAX_NR_ZONES];
+ unsigned long max_dma32 = min;
+
+ memset(zone_size, 0, sizeof(zone_size));
+
+ zone_size[0] = max - min;
+#ifdef CONFIG_ZONE_DMA32
+ /* 4GB maximum for 32-bit only capable devices */
+ max_dma32 = min(max, MAX_DMA32_PFN);
+ zone_size[ZONE_DMA32] = max_dma32 - min;
+#endif
+ zone_size[ZONE_NORMAL] = max - max_dma32;
+
+ free_area_init(zone_size);
+}
+
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+int pfn_valid(unsigned long pfn)
+{
+ return memblock_is_memory(pfn << PAGE_SHIFT);
+}
+EXPORT_SYMBOL(pfn_valid);
+#endif
+
+#ifndef CONFIG_SPARSEMEM
+static void arm64_memory_present(void)
+{
+}
+#else
+static void arm64_memory_present(void)
+{
+ struct memblock_region *reg;
+
+ for_each_memblock(memory, reg)
+ memory_present(0, memblock_region_memory_base_pfn(reg),
+ memblock_region_memory_end_pfn(reg));
+}
+#endif
+
+void __init arm64_memblock_init(void)
+{
+ u64 *reserve_map, base, size;
+
+ /* Register the kernel text, kernel data and initrd with memblock */
+ memblock_reserve(__pa(_text), _end - _text);
+#ifdef CONFIG_BLK_DEV_INITRD
+ if (phys_initrd_size) {
+ memblock_reserve(phys_initrd_start, phys_initrd_size);
+
+ /* Now convert initrd to virtual addresses */
+ initrd_start = __phys_to_virt(phys_initrd_start);
+ initrd_end = initrd_start + phys_initrd_size;
+ }
+#endif
+
+ /*
+ * Reserve the page tables. These are already in use,
+ * and can only be in node 0.
+ */
+ memblock_reserve(__pa(swapper_pg_dir), SWAPPER_DIR_SIZE);
+ memblock_reserve(__pa(idmap_pg_dir), IDMAP_DIR_SIZE);
+
+ /* Reserve the dtb region */
+ memblock_reserve(virt_to_phys(initial_boot_params),
+ be32_to_cpu(initial_boot_params->totalsize));
+
+ /*
+ * Process the reserve map. This will probably overlap the initrd
+ * and dtb locations, which are already reserved, but overlapping
+ * doesn't hurt anything.
+ */
+ reserve_map = ((void*)initial_boot_params) +
+ be32_to_cpu(initial_boot_params->off_mem_rsvmap);
+ while (1) {
+ base = be64_to_cpup(reserve_map++);
+ size = be64_to_cpup(reserve_map++);
+ if (!size)
+ break;
+ memblock_reserve(base, size);
+ }
+
+ memblock_allow_resize();
+ memblock_dump_all();
+}
+
+void __init bootmem_init(void)
+{
+ unsigned long min, max;
+
+ min = PFN_UP(memblock_start_of_DRAM());
+ max = PFN_DOWN(memblock_end_of_DRAM());
+
+ /*
+ * Sparsemem tries to allocate bootmem in memory_present(), so must be
+ * done after the fixed reservations.
+ */
+ arm64_memory_present();
+
+ sparse_init();
+ zone_sizes_init(min, max);
+
+ high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
+ max_pfn = max_low_pfn = max;
+}
+
+static inline int free_area(unsigned long pfn, unsigned long end, char *s)
+{
+ unsigned int pages = 0, size = (end - pfn) << (PAGE_SHIFT - 10);
+
+ for (; pfn < end; pfn++) {
+ struct page *page = pfn_to_page(pfn);
+ ClearPageReserved(page);
+ init_page_count(page);
+ __free_page(page);
+ pages++;
+ }
+
+ if (size && s)
+ pr_info("Freeing %s memory: %dK\n", s, size);
+
+ return pages;
+}
+
+/*
+ * Poison init memory with an undefined instruction (0x0).
+ */
+static inline void poison_init_mem(void *s, size_t count)
+{
+ memset(s, 0, count);
+}
+
+#ifndef CONFIG_SPARSEMEM_VMEMMAP
+static inline void free_memmap(unsigned long start_pfn, unsigned long end_pfn)
+{
+ struct page *start_pg, *end_pg;
+ unsigned long pg, pgend;
+
+ /*
+ * Convert start_pfn/end_pfn to a struct page pointer.
+ */
+ start_pg = pfn_to_page(start_pfn - 1) + 1;
+ end_pg = pfn_to_page(end_pfn - 1) + 1;
+
+ /*
+ * Convert to physical addresses, and round start upwards and end
+ * downwards.
+ */
+ pg = (unsigned long)PAGE_ALIGN(__pa(start_pg));
+ pgend = (unsigned long)__pa(end_pg) & PAGE_MASK;
+
+ /*
+ * If there are free pages between these, free the section of the
+ * memmap array.
+ */
+ if (pg < pgend)
+ free_bootmem(pg, pgend - pg);
+}
+
+/*
+ * The mem_map array can get very big. Free the unused area of the memory map.
+ */
+static void __init free_unused_memmap(void)
+{
+ unsigned long start, prev_end = 0;
+ struct memblock_region *reg;
+
+ for_each_memblock(memory, reg) {
+ start = __phys_to_pfn(reg->base);
+
+#ifdef CONFIG_SPARSEMEM
+ /*
+ * Take care not to free memmap entries that don't exist due
+ * to SPARSEMEM sections which aren't present.
+ */
+ start = min(start, ALIGN(prev_end, PAGES_PER_SECTION));
+#endif
+ /*
+ * If we had a previous bank, and there is a space between the
+ * current bank and the previous, free it.
+ */
+ if (prev_end && prev_end < start)
+ free_memmap(prev_end, start);
+
+ /*
+ * Align up here since the VM subsystem insists that the
+ * memmap entries are valid from the bank end aligned to
+ * MAX_ORDER_NR_PAGES.
+ */
+ prev_end = ALIGN(start + __phys_to_pfn(reg->size),
+ MAX_ORDER_NR_PAGES);
+ }
+
+#ifdef CONFIG_SPARSEMEM
+ if (!IS_ALIGNED(prev_end, PAGES_PER_SECTION))
+ free_memmap(prev_end, ALIGN(prev_end, PAGES_PER_SECTION));
+#endif
+}
+#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
+
+/*
+ * mem_init() marks the free areas in the mem_map and tells us how much memory
+ * is free. This is done after various parts of the system have claimed their
+ * memory after the kernel image.
+ */
+void __init mem_init(void)
+{
+ unsigned long reserved_pages, free_pages;
+ struct memblock_region *reg;
+
+#ifdef CONFIG_SWIOTLB
+ extern void __init arm64_swiotlb_init(size_t max_size);
+ arm64_swiotlb_init(max_pfn << (PAGE_SHIFT - 1));
+#endif
+
+ max_mapnr = pfn_to_page(max_pfn + PHYS_PFN_OFFSET) - mem_map;
+
+#ifndef CONFIG_SPARSEMEM_VMEMMAP
+ /* this will put all unused low memory onto the freelists */
+ free_unused_memmap();
+#endif
+
+ totalram_pages += free_all_bootmem();
+
+ reserved_pages = free_pages = 0;
+
+ for_each_memblock(memory, reg) {
+ unsigned int pfn1, pfn2;
+ struct page *page, *end;
+
+ pfn1 = __phys_to_pfn(reg->base);
+ pfn2 = pfn1 + __phys_to_pfn(reg->size);
+
+ page = pfn_to_page(pfn1);
+ end = pfn_to_page(pfn2 - 1) + 1;
+
+ do {
+ if (PageReserved(page))
+ reserved_pages++;
+ else if (!page_count(page))
+ free_pages++;
+ page++;
+ } while (page < end);
+ }
+
+ /*
+ * Since our memory may not be contiguous, calculate the real number
+ * of pages we have in this system.
+ */
+ pr_info("Memory:");
+ num_physpages = 0;
+ for_each_memblock(memory, reg) {
+ unsigned long pages = memblock_region_memory_end_pfn(reg) -
+ memblock_region_memory_base_pfn(reg);
+ num_physpages += pages;
+ printk(" %ldMB", pages >> (20 - PAGE_SHIFT));
+ }
+ printk(" = %luMB total\n", num_physpages >> (20 - PAGE_SHIFT));
+
+ pr_notice("Memory: %luk/%luk available, %luk reserved\n",
+ nr_free_pages() << (PAGE_SHIFT-10),
+ free_pages << (PAGE_SHIFT-10),
+ reserved_pages << (PAGE_SHIFT-10));
+
+#define MLK(b, t) b, t, ((t) - (b)) >> 10
+#define MLM(b, t) b, t, ((t) - (b)) >> 20
+#define MLK_ROUNDUP(b, t) b, t, DIV_ROUND_UP(((t) - (b)), SZ_1K)
+
+ pr_notice("Virtual kernel memory layout:\n"
+ " vmalloc : 0x%16lx - 0x%16lx (%6ld MB)\n"
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+ " vmemmap : 0x%16lx - 0x%16lx (%6ld MB)\n"
+#endif
+ " modules : 0x%16lx - 0x%16lx (%6ld MB)\n"
+ " memory : 0x%16lx - 0x%16lx (%6ld MB)\n"
+ " .init : 0x%p" " - 0x%p" " (%6ld kB)\n"
+ " .text : 0x%p" " - 0x%p" " (%6ld kB)\n"
+ " .data : 0x%p" " - 0x%p" " (%6ld kB)\n",
+ MLM(VMALLOC_START, VMALLOC_END),
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+ MLM((unsigned long)virt_to_page(PAGE_OFFSET),
+ (unsigned long)virt_to_page(high_memory)),
+#endif
+ MLM(MODULES_VADDR, MODULES_END),
+ MLM(PAGE_OFFSET, (unsigned long)high_memory),
+
+ MLK_ROUNDUP(__init_begin, __init_end),
+ MLK_ROUNDUP(_text, _etext),
+ MLK_ROUNDUP(_sdata, _edata));
+
+#undef MLK
+#undef MLM
+#undef MLK_ROUNDUP
+
+ /*
+ * Check boundaries twice: Some fundamental inconsistencies can be
+ * detected at build time already.
+ */
+#ifdef CONFIG_AARCH32_EMULATION
+ BUILD_BUG_ON(TASK_SIZE_32 > TASK_SIZE_64);
+#endif
+ BUILD_BUG_ON(TASK_SIZE_64 > MODULES_VADDR);
+ BUG_ON(TASK_SIZE_64 > MODULES_VADDR);
+
+ if (PAGE_SIZE >= 16384 && num_physpages <= 128) {
+ extern int sysctl_overcommit_memory;
+ /*
+ * On a machine this small we won't get anywhere without
+ * overcommit, so turn it on by default.
+ */
+ sysctl_overcommit_memory = OVERCOMMIT_ALWAYS;
+ }
+}
+
+void free_initmem(void)
+{
+ poison_init_mem(__init_begin, __init_end - __init_begin);
+ totalram_pages += free_area(__phys_to_pfn(__pa(__init_begin)),
+ __phys_to_pfn(__pa(__init_end)),
+ "init");
+}
+
+#ifdef CONFIG_BLK_DEV_INITRD
+
+static int keep_initrd;
+
+void free_initrd_mem(unsigned long start, unsigned long end)
+{
+ if (!keep_initrd) {
+ poison_init_mem((void *)start, PAGE_ALIGN(end) - start);
+ totalram_pages += free_area(__phys_to_pfn(__pa(start)),
+ __phys_to_pfn(__pa(end)),
+ "initrd");
+ }
+}
+
+static int __init keepinitrd_setup(char *__unused)
+{
+ keep_initrd = 1;
+ return 1;
+}
+
+__setup("keepinitrd", keepinitrd_setup);
+#endif
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
new file mode 100644
index 0000000..d2dd438
--- /dev/null
+++ b/arch/arm64/mm/mmu.c
@@ -0,0 +1,395 @@
+/*
+ * Based on arch/arm/mm/mmu.c
+ *
+ * Copyright (C) 1995-2005 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/export.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/mman.h>
+#include <linux/nodemask.h>
+#include <linux/memblock.h>
+#include <linux/fs.h>
+
+#include <asm/cputype.h>
+#include <asm/sections.h>
+#include <asm/setup.h>
+#include <asm/sizes.h>
+#include <asm/tlb.h>
+#include <asm/mmu_context.h>
+
+#include "mm.h"
+
+/*
+ * Empty_zero_page is a special page that is used for zero-initialized data
+ * and COW.
+ */
+struct page *empty_zero_page;
+EXPORT_SYMBOL(empty_zero_page);
+
+pgprot_t pgprot_default;
+EXPORT_SYMBOL(pgprot_default);
+
+static pmdval_t prot_sect_kernel;
+
+struct cachepolicy {
+ const char policy[16];
+ u64 mair;
+ u64 tcr;
+};
+
+static struct cachepolicy cache_policies[] __initdata = {
+ {
+ .policy = "uncached",
+ .mair = 0x44, /* inner, outer non-cacheable */
+ .tcr = TCR_IRGN_NC | TCR_ORGN_NC,
+ }, {
+ .policy = "writethrough",
+ .mair = 0xaa, /* inner, outer write-through, read-allocate */
+ .tcr = TCR_IRGN_WT | TCR_ORGN_WT,
+ }, {
+ .policy = "writeback",
+ .mair = 0xee, /* inner, outer write-back, read-allocate */
+ .tcr = TCR_IRGN_WBnWA | TCR_ORGN_WBnWA,
+ }
+};
+
+/*
+ * This is useful for identifying cache coherency problems by allowing the
+ * cache to be turned off or its policy downgraded. It changes the Normal
+ * memory caching attributes in the MAIR_EL1 register.
+ */
+static int __init early_cachepolicy(char *p)
+{
+ int i;
+ u64 tmp;
+
+ for (i = 0; i < ARRAY_SIZE(cache_policies); i++) {
+ int len = strlen(cache_policies[i].policy);
+
+ if (memcmp(p, cache_policies[i].policy, len) == 0)
+ break;
+ }
+ if (i == ARRAY_SIZE(cache_policies)) {
+ pr_err("ERROR: unknown or unsupported cache policy: %s\n", p);
+ return 0;
+ }
+
+ flush_cache_all();
+
+ /*
+ * Modify MT_NORMAL attributes in MAIR_EL1.
+ */
+ asm volatile(
+ " mrs %0, mair_el1\n"
+ " bfi %0, %1, #%2, #8\n"
+ " msr mair_el1, %0\n"
+ " isb\n"
+ : "=&r" (tmp)
+ : "r" (cache_policies[i].mair), "i" (MT_NORMAL * 8));
+
+ /*
+ * Modify TCR PTW cacheability attributes.
+ */
+ asm volatile(
+ " mrs %0, tcr_el1\n"
+ " bic %0, %0, %2\n"
+ " orr %0, %0, %1\n"
+ " msr tcr_el1, %0\n"
+ " isb\n"
+ : "=&r" (tmp)
+ : "r" (cache_policies[i].tcr), "r" (TCR_IRGN_MASK | TCR_ORGN_MASK));
+
+ flush_cache_all();
+
+ return 0;
+}
+early_param("cachepolicy", early_cachepolicy);
+
+/*
+ * Adjust the PMD section entries according to the CPU in use.
+ */
+static void __init init_mem_pgprot(void)
+{
+ pteval_t default_pgprot;
+ int i;
+
+ default_pgprot = PTE_ATTRINDX(MT_NORMAL);
+ prot_sect_kernel = PMD_TYPE_SECT | PMD_SECT_AF | PMD_ATTRINDX(MT_NORMAL);
+
+#ifdef CONFIG_SMP
+ /*
+ * Mark memory with the "shared" attribute for SMP systems
+ */
+ default_pgprot |= PTE_SHARED;
+ prot_sect_kernel |= PMD_SECT_S;
+#endif
+
+ for (i = 0; i < 16; i++) {
+ unsigned long v = pgprot_val(protection_map[i]);
+ protection_map[i] = __pgprot(v | default_pgprot);
+ }
+
+ pgprot_default = __pgprot(PTE_TYPE_PAGE | PTE_AF | default_pgprot);
+}
+
+pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+ unsigned long size, pgprot_t vma_prot)
+{
+ if (!pfn_valid(pfn))
+ return pgprot_noncached(vma_prot);
+ else if (file->f_flags & O_SYNC)
+ return pgprot_writecombine(vma_prot);
+ return vma_prot;
+}
+EXPORT_SYMBOL(phys_mem_access_prot);
+
+static void __init *early_alloc(unsigned long sz)
+{
+ void *ptr = __va(memblock_alloc(sz, sz));
+ memset(ptr, 0, sz);
+ return ptr;
+}
+
+static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,
+ unsigned long end, unsigned long pfn)
+{
+ pte_t *pte;
+
+ if (pmd_none(*pmd)) {
+ pte = early_alloc(PTRS_PER_PTE * sizeof(pte_t));
+ __pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
+ }
+ BUG_ON(pmd_bad(*pmd));
+
+ pte = pte_offset_kernel(pmd, addr);
+ do {
+ set_pte(pte, pfn_pte(pfn, PAGE_KERNEL_EXEC));
+ pfn++;
+ } while (pte++, addr += PAGE_SIZE, addr != end);
+}
+
+static void __init alloc_init_pmd(pud_t *pud, unsigned long addr,
+ unsigned long end, phys_addr_t phys)
+{
+ pmd_t *pmd;
+ unsigned long next;
+
+ /*
+ * Check for initial section mappings in the pgd/pud and remove them.
+ */
+ if (pud_none(*pud) || pud_bad(*pud)) {
+ pmd = early_alloc(PTRS_PER_PMD * sizeof(pmd_t));
+ pud_populate(&init_mm, pud, pmd);
+ }
+
+ pmd = pmd_offset(pud, addr);
+ do {
+ next = pmd_addr_end(addr, end);
+ /* try section mapping first */
+ if (((addr | next | phys) & ~SECTION_MASK) == 0)
+ set_pmd(pmd, __pmd(phys | prot_sect_kernel));
+ else
+ alloc_init_pte(pmd, addr, next, __phys_to_pfn(phys));
+ phys += next - addr;
+ } while (pmd++, addr = next, addr != end);
+}
+
+static void __init alloc_init_pud(pgd_t *pgd, unsigned long addr,
+ unsigned long end, unsigned long phys)
+{
+ pud_t *pud = pud_offset(pgd, addr);
+ unsigned long next;
+
+ do {
+ next = pud_addr_end(addr, end);
+ alloc_init_pmd(pud, addr, next, phys);
+ phys += next - addr;
+ } while (pud++, addr = next, addr != end);
+}
+
+/*
+ * Create the page directory entries and any necessary page tables for the
+ * mapping specified by the 'phys', 'virt' and 'size' arguments.
+ */
+static void __init create_mapping(phys_addr_t phys, unsigned long virt,
+ phys_addr_t size)
+{
+ unsigned long addr, length, end, next;
+ pgd_t *pgd;
+
+ if (virt < VMALLOC_START) {
+ pr_warning("BUG: not creating mapping for 0x%016llx at 0x%016lx - outside kernel range\n",
+ phys, virt);
+ return;
+ }
+
+ addr = virt & PAGE_MASK;
+ length = PAGE_ALIGN(size + (virt & ~PAGE_MASK));
+
+ pgd = pgd_offset_k(addr);
+ end = addr + length;
+ do {
+ next = pgd_addr_end(addr, end);
+ alloc_init_pud(pgd, addr, next, phys);
+ phys += next - addr;
+ } while (pgd++, addr = next, addr != end);
+}
+
+static void __init map_mem(void)
+{
+ struct memblock_region *reg;
+
+ /* map all the memory banks */
+ for_each_memblock(memory, reg) {
+ phys_addr_t start = reg->base;
+ phys_addr_t end = start + reg->size;
+
+ if (start >= end)
+ break;
+
+ create_mapping(start, __phys_to_virt(start), end - start);
+ }
+}
+
+/*
+ * paging_init() sets up the page tables, initialises the zone memory
+ * maps and sets up the zero page.
+ */
+void __init paging_init(void)
+{
+ void *zero_page;
+
+ /*
+ * Maximum PGDIR_SIZE addressable via the initial direct kernel
+ * mapping in swapper_pg_dir.
+ */
+ memblock_set_current_limit((PHYS_OFFSET & PGDIR_MASK) + PGDIR_SIZE);
+
+ init_mem_pgprot();
+ map_mem();
+
+ /*
+ * Finally flush the caches and tlb to ensure that we're in a
+ * consistent state.
+ */
+ flush_cache_all();
+ flush_tlb_all();
+
+ /* allocate the zero page. */
+ zero_page = early_alloc(PAGE_SIZE);
+
+ bootmem_init();
+
+ empty_zero_page = virt_to_page(zero_page);
+ __flush_dcache_page(NULL, empty_zero_page);
+
+ /*
+ * TTBR0 is only used for the identity mapping at this stage. Make it
+ * point to zero page to avoid speculatively fetching new entries.
+ */
+ cpu_set_reserved_ttbr0();
+ flush_tlb_all();
+}
+
+/*
+ * Enable the identity mapping to allow the MMU to be disabled.
+ */
+void setup_mm_for_reboot(void)
+{
+ cpu_switch_mm(idmap_pg_dir, &init_mm);
+ flush_tlb_all();
+}
+
+/*
+ * Check whether a kernel address is valid (derived from arch/x86/).
+ */
+int kern_addr_valid(unsigned long addr)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ if ((((long)addr) >> VA_BITS) != -1UL)
+ return 0;
+
+ pgd = pgd_offset_k(addr);
+ if (pgd_none(*pgd))
+ return 0;
+
+ pud = pud_offset(pgd, addr);
+ if (pud_none(*pud))
+ return 0;
+
+ pmd = pmd_offset(pud, addr);
+ if (pmd_none(*pmd))
+ return 0;
+
+ pte = pte_offset_kernel(pmd, addr);
+ if (pte_none(*pte))
+ return 0;
+
+ return pfn_valid(pte_pfn(*pte));
+}
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+#ifdef CONFIG_ARM64_64K_PAGES
+int __meminit vmemmap_populate(struct page *start_page,
+ unsigned long size, int node)
+{
+ return vmemmap_populate_basepages(start_page, size, node);
+}
+#else /* !CONFIG_ARM64_64K_PAGES */
+int __meminit vmemmap_populate(struct page *start_page,
+ unsigned long size, int node)
+{
+ unsigned long addr = (unsigned long)start_page;
+ unsigned long end = (unsigned long)(start_page + size);
+ unsigned long next;
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+
+ do {
+ next = pmd_addr_end(addr, end);
+
+ pgd = vmemmap_pgd_populate(addr, node);
+ if (!pgd)
+ return -ENOMEM;
+
+ pud = vmemmap_pud_populate(pgd, addr, node);
+ if (!pud)
+ return -ENOMEM;
+
+ pmd = pmd_offset(pud, addr);
+ if (pmd_none(*pmd)) {
+ void *p = NULL;
+
+ p = vmemmap_alloc_block_buf(PMD_SIZE, node);
+ if (!p)
+ return -ENOMEM;
+
+ set_pmd(pmd, __pmd(__pa(p) | prot_sect_kernel));
+ } else
+ vmemmap_verify((pte_t *)pmd, node, addr, next);
+ } while (addr = next, addr != end);
+
+ return 0;
+}
+#endif /* CONFIG_ARM64_64K_PAGES */
+#endif /* CONFIG_SPARSEMEM_VMEMMAP */

2012-08-14 17:54:59

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 06/31] arm64: MMU fault handling and page table management

This patch adds support for handling MMU faults (the exception entry
code was introduced by a previous patch) and for page table management.

The user translation table is pointed to by TTBR0 and the kernel one
(swapper_pg_dir) by TTBR1. No translation information is shared and
there is no address space overlap between the user and kernel page tables.
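
As a rough illustration of the split (a sketch only, not code from this
patch): with VA_BITS of virtual address space per translation table, an
address whose upper bits are all zero is translated via TTBR0 (user) and
one whose upper bits are all one via TTBR1 (kernel). This is the same
property the kern_addr_valid() check added in the previous patch relies on:

#include <linux/types.h>

/* Sketch only: classify a virtual address under the TTBR0/TTBR1 split. */
static inline bool addr_uses_ttbr1(unsigned long addr)
{
	/* all bits above VA_BITS set => kernel address (TTBR1) */
	return ((long)addr >> VA_BITS) == -1L;
}

static inline bool addr_uses_ttbr0(unsigned long addr)
{
	/* all bits above VA_BITS clear => user address (TTBR0) */
	return (addr >> VA_BITS) == 0;
}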

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/page.h | 67 +++++
arch/arm64/include/asm/pgalloc.h | 113 ++++++++
arch/arm64/mm/copypage.c | 34 +++
arch/arm64/mm/extable.c | 17 ++
arch/arm64/mm/fault.c | 534 ++++++++++++++++++++++++++++++++++++++
arch/arm64/mm/mm.h | 2 +
arch/arm64/mm/mmap.c | 144 ++++++++++
arch/arm64/mm/pgd.c | 49 ++++
8 files changed, 960 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/page.h
create mode 100644 arch/arm64/include/asm/pgalloc.h
create mode 100644 arch/arm64/mm/copypage.c
create mode 100644 arch/arm64/mm/extable.c
create mode 100644 arch/arm64/mm/fault.c
create mode 100644 arch/arm64/mm/mm.h
create mode 100644 arch/arm64/mm/mmap.c
create mode 100644 arch/arm64/mm/pgd.c

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
new file mode 100644
index 0000000..46bf666
--- /dev/null
+++ b/arch/arm64/include/asm/page.h
@@ -0,0 +1,67 @@
+/*
+ * Based on arch/arm/include/asm/page.h
+ *
+ * Copyright (C) 1995-2003 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PAGE_H
+#define __ASM_PAGE_H
+
+/* PAGE_SHIFT determines the page size */
+#ifdef CONFIG_ARM64_64K_PAGES
+#define PAGE_SHIFT 16
+#else
+#define PAGE_SHIFT 12
+#endif
+#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
+#define PAGE_MASK (~(PAGE_SIZE-1))
+
+/* We do define AT_SYSINFO_EHDR but don't use the gate mechanism */
+#define __HAVE_ARCH_GATE_AREA 1
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_ARM64_64K_PAGES
+#include <asm/pgtable-2level-types.h>
+#else
+#include <asm/pgtable-3level-types.h>
+#endif
+
+extern void __cpu_clear_user_page(void *p, unsigned long user);
+extern void __cpu_copy_user_page(void *to, const void *from,
+ unsigned long user);
+extern void copy_page(void *to, const void *from);
+extern void clear_page(void *to);
+
+#define clear_user_page(addr,vaddr,pg) __cpu_clear_user_page(addr, vaddr)
+#define copy_user_page(to,from,vaddr,pg) __cpu_copy_user_page(to, from, vaddr)
+
+typedef struct page *pgtable_t;
+
+#ifdef CONFIG_HAVE_ARCH_PFN_VALID
+extern int pfn_valid(unsigned long);
+#endif
+
+#include <asm/memory.h>
+
+#endif /* !__ASSEMBLY__ */
+
+#define VM_DATA_DEFAULT_FLAGS \
+ (((current->personality & READ_IMPLIES_EXEC) ? VM_EXEC : 0) | \
+ VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
+
+#include <asm-generic/getorder.h>
+
+#endif
diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
new file mode 100644
index 0000000..f214069
--- /dev/null
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -0,0 +1,113 @@
+/*
+ * Based on arch/arm/include/asm/pgalloc.h
+ *
+ * Copyright (C) 2000-2001 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PGALLOC_H
+#define __ASM_PGALLOC_H
+
+#include <asm/pgtable-hwdef.h>
+#include <asm/processor.h>
+#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
+
+#define check_pgt_cache() do { } while (0)
+
+#ifndef CONFIG_ARM64_64K_PAGES
+
+static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+ return (pmd_t *)get_zeroed_page(GFP_KERNEL | __GFP_REPEAT);
+}
+
+static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
+{
+ BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
+ free_page((unsigned long)pmd);
+}
+
+static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
+{
+ set_pud(pud, __pud(__pa(pmd) | PMD_TYPE_TABLE));
+}
+
+#endif /* CONFIG_ARM64_64K_PAGES */
+
+extern pgd_t *pgd_alloc(struct mm_struct *mm);
+extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
+
+#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO)
+
+static inline pte_t *
+pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr)
+{
+ return (pte_t *)__get_free_page(PGALLOC_GFP);
+}
+
+static inline pgtable_t
+pte_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+ struct page *pte;
+
+ pte = alloc_pages(PGALLOC_GFP, 0);
+ if (pte)
+ pgtable_page_ctor(pte);
+
+ return pte;
+}
+
+/*
+ * Free a PTE table.
+ */
+static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
+{
+ if (pte)
+ free_page((unsigned long)pte);
+}
+
+static inline void pte_free(struct mm_struct *mm, pgtable_t pte)
+{
+ pgtable_page_dtor(pte);
+ __free_page(pte);
+}
+
+static inline void __pmd_populate(pmd_t *pmdp, phys_addr_t pte,
+ pmdval_t prot)
+{
+ set_pmd(pmdp, __pmd(pte | prot));
+}
+
+/*
+ * Populate the pmdp entry with a pointer to the pte. This pmd is part
+ * of the mm address space.
+ */
+static inline void
+pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, pte_t *ptep)
+{
+ /*
+ * The pmd must be loaded with the physical address of the PTE table
+ */
+ __pmd_populate(pmdp, __pa(ptep), PMD_TYPE_TABLE);
+}
+
+static inline void
+pmd_populate(struct mm_struct *mm, pmd_t *pmdp, pgtable_t ptep)
+{
+ __pmd_populate(pmdp, page_to_phys(ptep), PMD_TYPE_TABLE);
+}
+#define pmd_pgtable(pmd) pmd_page(pmd)
+
+#endif
diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
new file mode 100644
index 0000000..9361662
--- /dev/null
+++ b/arch/arm64/mm/copypage.c
@@ -0,0 +1,34 @@
+/*
+ * Based on arch/arm/mm/copypage.c
+ *
+ * Copyright (C) 2002 Deep Blue Solutions Ltd, All Rights Reserved.
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/mm.h>
+
+#include <asm/page.h>
+#include <asm/cacheflush.h>
+
+void __cpu_copy_user_page(void *kto, const void *kfrom, unsigned long vaddr)
+{
+ copy_page(kto, kfrom);
+ __cpuc_flush_dcache_area(kto, PAGE_SIZE);
+}
+
+void __cpu_clear_user_page(void *kaddr, unsigned long vaddr)
+{
+ clear_page(kaddr);
+}
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
new file mode 100644
index 0000000..7944427
--- /dev/null
+++ b/arch/arm64/mm/extable.c
@@ -0,0 +1,17 @@
+/*
+ * Based on arch/arm/mm/extable.c
+ */
+
+#include <linux/module.h>
+#include <linux/uaccess.h>
+
+int fixup_exception(struct pt_regs *regs)
+{
+ const struct exception_table_entry *fixup;
+
+ fixup = search_exception_tables(instruction_pointer(regs));
+ if (fixup)
+ regs->pc = fixup->fixup;
+
+ return fixup != NULL;
+}
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
new file mode 100644
index 0000000..1909a69
--- /dev/null
+++ b/arch/arm64/mm/fault.c
@@ -0,0 +1,534 @@
+/*
+ * Based on arch/arm/mm/fault.c
+ *
+ * Copyright (C) 1995 Linus Torvalds
+ * Copyright (C) 1995-2004 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/module.h>
+#include <linux/signal.h>
+#include <linux/mm.h>
+#include <linux/hardirq.h>
+#include <linux/init.h>
+#include <linux/kprobes.h>
+#include <linux/uaccess.h>
+#include <linux/page-flags.h>
+#include <linux/sched.h>
+#include <linux/highmem.h>
+#include <linux/perf_event.h>
+
+#include <asm/exception.h>
+#include <asm/debug-monitors.h>
+#include <asm/system_misc.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
+
+/*
+ * Dump out the page tables associated with 'addr' in mm 'mm'.
+ */
+void show_pte(struct mm_struct *mm, unsigned long addr)
+{
+ pgd_t *pgd;
+
+ if (!mm)
+ mm = &init_mm;
+
+ pr_alert("pgd = %p\n", mm->pgd);
+ pgd = pgd_offset(mm, addr);
+ pr_alert("[%08lx] *pgd=%016llx", addr, pgd_val(*pgd));
+
+ do {
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ if (pgd_none_or_clear_bad(pgd))
+ break;
+
+ pud = pud_offset(pgd, addr);
+ if (pud_none_or_clear_bad(pud))
+ break;
+
+ pmd = pmd_offset(pud, addr);
+ printk(", *pmd=%016llx", pmd_val(*pmd));
+ if (pmd_none_or_clear_bad(pmd))
+ break;
+
+ pte = pte_offset_map(pmd, addr);
+ printk(", *pte=%016llx", pte_val(*pte));
+ pte_unmap(pte);
+ } while(0);
+
+ printk("\n");
+}
+
+/*
+ * The kernel tried to access some page that wasn't present.
+ */
+static void __do_kernel_fault(struct mm_struct *mm, unsigned long addr,
+ unsigned int esr, struct pt_regs *regs)
+{
+ /*
+ * Are we prepared to handle this kernel fault?
+ */
+ if (fixup_exception(regs))
+ return;
+
+ /*
+ * No handler, we'll have to terminate things with extreme prejudice.
+ */
+ bust_spinlocks(1);
+ pr_alert("Unable to handle kernel %s at virtual address %08lx\n",
+ (addr < PAGE_SIZE) ? "NULL pointer dereference" :
+ "paging request", addr);
+
+ show_pte(mm, addr);
+ die("Oops", regs, esr);
+ bust_spinlocks(0);
+ do_exit(SIGKILL);
+}
+
+/*
+ * Something tried to access memory that isn't in our memory map. User mode
+ * accesses just cause a SIGSEGV.
+ */
+static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
+ unsigned int esr, unsigned int sig, int code,
+ struct pt_regs *regs)
+{
+ struct siginfo si;
+
+ if (show_unhandled_signals) {
+ pr_info("%s[%d]: unhandled page fault (%d) at 0x%08lx, code 0x%03x\n",
+ tsk->comm, task_pid_nr(tsk), sig, addr, esr);
+ show_pte(tsk->mm, addr);
+ show_regs(regs);
+ }
+
+ tsk->thread.fault_address = addr;
+ si.si_signo = sig;
+ si.si_errno = 0;
+ si.si_code = code;
+ si.si_addr = (void __user *)addr;
+ force_sig_info(sig, &si, tsk);
+}
+
+void do_bad_area(unsigned long addr, unsigned int esr, struct pt_regs *regs)
+{
+ struct task_struct *tsk = current;
+ struct mm_struct *mm = tsk->active_mm;
+
+ /*
+ * If we are in kernel mode at this point, we have no context to
+ * handle this fault with.
+ */
+ if (user_mode(regs))
+ __do_user_fault(tsk, addr, esr, SIGSEGV, SEGV_MAPERR, regs);
+ else
+ __do_kernel_fault(mm, addr, esr, regs);
+}
+
+#define VM_FAULT_BADMAP 0x010000
+#define VM_FAULT_BADACCESS 0x020000
+
+#define ESR_WRITE (1 << 6)
+#define ESR_LNX_EXEC (1 << 24)
+
+/*
+ * Check that the permissions on the VMA allow for the fault which occurred.
+ * If we encountered a write fault, we must have write permission, otherwise
+ * we allow any permission.
+ */
+static inline bool access_error(unsigned int esr, struct vm_area_struct *vma)
+{
+ unsigned int mask = VM_READ | VM_WRITE | VM_EXEC;
+
+ if (esr & ESR_WRITE)
+ mask = VM_WRITE;
+ if (esr & ESR_LNX_EXEC)
+ mask = VM_EXEC;
+
+ return vma->vm_flags & mask ? false : true;
+}
+
+static int __do_page_fault(struct mm_struct *mm, unsigned long addr,
+ unsigned int esr, unsigned int flags,
+ struct task_struct *tsk)
+{
+ struct vm_area_struct *vma;
+ int fault;
+
+ vma = find_vma(mm, addr);
+ fault = VM_FAULT_BADMAP;
+ if (unlikely(!vma))
+ goto out;
+ if (unlikely(vma->vm_start > addr))
+ goto check_stack;
+
+ /*
+ * Ok, we have a good vm_area for this memory access, so we can handle
+ * it.
+ */
+good_area:
+ if (access_error(esr, vma)) {
+ fault = VM_FAULT_BADACCESS;
+ goto out;
+ }
+
+ return handle_mm_fault(mm, vma, addr & PAGE_MASK, flags);
+
+check_stack:
+ if (vma->vm_flags & VM_GROWSDOWN && !expand_stack(vma, addr))
+ goto good_area;
+out:
+ return fault;
+}
+
+static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
+ struct pt_regs *regs)
+{
+ struct task_struct *tsk;
+ struct mm_struct *mm;
+ int fault, sig, code;
+ int write = esr & ESR_WRITE;
+ unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE |
+ (write ? FAULT_FLAG_WRITE : 0);
+
+ tsk = current;
+ mm = tsk->mm;
+
+ /* Enable interrupts if they were enabled in the parent context. */
+ if (interrupts_enabled(regs))
+ local_irq_enable();
+
+ /*
+ * If we're in an interrupt or have no user context, we must not take
+ * the fault.
+ */
+ if (in_atomic() || !mm)
+ goto no_context;
+
+ /*
+ * As per x86, we may deadlock here. However, since the kernel only
+ * validly references user space from well defined areas of the code,
+ * we can bug out early if this is from code which shouldn't.
+ */
+ if (!down_read_trylock(&mm->mmap_sem)) {
+ if (!user_mode(regs) && !search_exception_tables(regs->pc))
+ goto no_context;
+retry:
+ down_read(&mm->mmap_sem);
+ } else {
+ /*
+ * The above down_read_trylock() might have succeeded in which
+ * case, we'll have missed the might_sleep() from down_read().
+ */
+ might_sleep();
+#ifdef CONFIG_DEBUG_VM
+ if (!user_mode(regs) && !search_exception_tables(regs->pc))
+ goto no_context;
+#endif
+ }
+
+ fault = __do_page_fault(mm, addr, esr, flags, tsk);
+
+ /*
+ * If we need to retry but a fatal signal is pending, handle the
+ * signal first. We do not need to release the mmap_sem because it
+ * would already be released in __lock_page_or_retry in mm/filemap.c.
+ */
+ if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
+ return 0;
+
+ /*
+ * Major/minor page fault accounting is only done on the initial
+ * attempt. If we go through a retry, it is extremely likely that the
+ * page will be found in page cache at that point.
+ */
+
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
+ if (flags & FAULT_FLAG_ALLOW_RETRY) {
+ if (fault & VM_FAULT_MAJOR) {
+ tsk->maj_flt++;
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs,
+ addr);
+ } else {
+ tsk->min_flt++;
+ perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs,
+ addr);
+ }
+ if (fault & VM_FAULT_RETRY) {
+ /*
+ * Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk of
+ * starvation.
+ */
+ flags &= ~FAULT_FLAG_ALLOW_RETRY;
+ goto retry;
+ }
+ }
+
+ up_read(&mm->mmap_sem);
+
+ /*
+ * Handle the "normal" case first - VM_FAULT_MAJOR / VM_FAULT_MINOR
+ */
+ if (likely(!(fault & (VM_FAULT_ERROR | VM_FAULT_BADMAP |
+ VM_FAULT_BADACCESS))))
+ return 0;
+
+ if (fault & VM_FAULT_OOM) {
+ /*
+ * We ran out of memory, call the OOM killer, and return to
+ * userspace (which will retry the fault, or kill us if we got
+ * oom-killed).
+ */
+ pagefault_out_of_memory();
+ return 0;
+ }
+
+ /*
+ * If we are in kernel mode at this point, we have no context to
+ * handle this fault with.
+ */
+ if (!user_mode(regs))
+ goto no_context;
+
+ if (fault & VM_FAULT_SIGBUS) {
+ /*
+ * We had some memory, but were unable to successfully fix up
+ * this page fault.
+ */
+ sig = SIGBUS;
+ code = BUS_ADRERR;
+ } else {
+ /*
+ * Something tried to access memory that isn't in our memory
+ * map.
+ */
+ sig = SIGSEGV;
+ code = fault == VM_FAULT_BADACCESS ?
+ SEGV_ACCERR : SEGV_MAPERR;
+ }
+
+ __do_user_fault(tsk, addr, esr, sig, code, regs);
+ return 0;
+
+no_context:
+ __do_kernel_fault(mm, addr, esr, regs);
+ return 0;
+}
+
+/*
+ * First Level Translation Fault Handler
+ *
+ * We enter here because the first level page table doesn't contain a valid
+ * entry for the address.
+ *
+ * If the address is in kernel space (>= TASK_SIZE), then we are probably
+ * faulting in the vmalloc() area.
+ *
+ * If the init_task's first level page tables contain the relevant entry, we
+ * copy it to this task. If not, we send the process a signal, fix up the
+ * exception, or oops the kernel.
+ *
+ * NOTE! We MUST NOT take any locks for this case. We may be in an interrupt
+ * or a critical region, and should only copy the information from the master
+ * page table, nothing more.
+ */
+static int __kprobes do_translation_fault(unsigned long addr,
+ unsigned int esr,
+ struct pt_regs *regs)
+{
+ if (addr < TASK_SIZE)
+ return do_page_fault(addr, esr, regs);
+
+ do_bad_area(addr, esr, regs);
+ return 0;
+}
+
+/*
+ * Some section permission faults need to be handled gracefully. They can
+ * happen due to a __{get,put}_user during an oops.
+ */
+static int do_sect_fault(unsigned long addr, unsigned int esr,
+ struct pt_regs *regs)
+{
+ do_bad_area(addr, esr, regs);
+ return 0;
+}
+
+/*
+ * This abort handler always returns "fault".
+ */
+static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
+{
+ return 1;
+}
+
+static struct fault_info {
+ int (*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs);
+ int sig;
+ int code;
+ const char *name;
+} fault_info[] = {
+ { do_bad, SIGBUS, 0, "ttbr address size fault" },
+ { do_bad, SIGBUS, 0, "level 1 address size fault" },
+ { do_bad, SIGBUS, 0, "level 2 address size fault" },
+ { do_bad, SIGBUS, 0, "level 3 address size fault" },
+ { do_translation_fault, SIGSEGV, SEGV_MAPERR, "input address range fault" },
+ { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 1 translation fault" },
+ { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 2 translation fault" },
+ { do_page_fault, SIGSEGV, SEGV_MAPERR, "level 3 translation fault" },
+ { do_bad, SIGBUS, 0, "reserved access flag fault" },
+ { do_bad, SIGSEGV, SEGV_ACCERR, "level 1 access flag fault" },
+ { do_bad, SIGSEGV, SEGV_ACCERR, "level 2 access flag fault" },
+ { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 access flag fault" },
+ { do_bad, SIGBUS, 0, "reserved permission fault" },
+ { do_bad, SIGSEGV, SEGV_ACCERR, "level 1 permission fault" },
+ { do_sect_fault, SIGSEGV, SEGV_ACCERR, "level 2 permission fault" },
+ { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 permission fault" },
+ { do_bad, SIGBUS, 0, "synchronous external abort" },
+ { do_bad, SIGBUS, 0, "asynchronous external abort" },
+ { do_bad, SIGBUS, 0, "unknown 18" },
+ { do_bad, SIGBUS, 0, "unknown 19" },
+ { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" },
+ { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" },
+ { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" },
+ { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" },
+ { do_bad, SIGBUS, 0, "synchronous parity error" },
+ { do_bad, SIGBUS, 0, "asynchronous parity error" },
+ { do_bad, SIGBUS, 0, "unknown 26" },
+ { do_bad, SIGBUS, 0, "unknown 27" },
+ { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk" },
+ { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk" },
+ { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk" },
+ { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk" },
+ { do_bad, SIGBUS, 0, "unknown 32" },
+ { do_bad, SIGBUS, BUS_ADRALN, "alignment fault" },
+ { do_bad, SIGBUS, 0, "debug event" },
+ { do_bad, SIGBUS, 0, "unknown 35" },
+ { do_bad, SIGBUS, 0, "unknown 36" },
+ { do_bad, SIGBUS, 0, "unknown 37" },
+ { do_bad, SIGBUS, 0, "unknown 38" },
+ { do_bad, SIGBUS, 0, "unknown 39" },
+ { do_bad, SIGBUS, 0, "unknown 40" },
+ { do_bad, SIGBUS, 0, "unknown 41" },
+ { do_bad, SIGBUS, 0, "unknown 42" },
+ { do_bad, SIGBUS, 0, "unknown 43" },
+ { do_bad, SIGBUS, 0, "unknown 44" },
+ { do_bad, SIGBUS, 0, "unknown 45" },
+ { do_bad, SIGBUS, 0, "unknown 46" },
+ { do_bad, SIGBUS, 0, "unknown 47" },
+ { do_bad, SIGBUS, 0, "unknown 48" },
+ { do_bad, SIGBUS, 0, "unknown 49" },
+ { do_bad, SIGBUS, 0, "unknown 50" },
+ { do_bad, SIGBUS, 0, "unknown 51" },
+ { do_bad, SIGBUS, 0, "implementation fault (lockdown abort)" },
+ { do_bad, SIGBUS, 0, "unknown 53" },
+ { do_bad, SIGBUS, 0, "unknown 54" },
+ { do_bad, SIGBUS, 0, "unknown 55" },
+ { do_bad, SIGBUS, 0, "unknown 56" },
+ { do_bad, SIGBUS, 0, "unknown 57" },
+ { do_bad, SIGBUS, 0, "implementation fault (coprocessor abort)" },
+ { do_bad, SIGBUS, 0, "unknown 59" },
+ { do_bad, SIGBUS, 0, "unknown 60" },
+ { do_bad, SIGBUS, 0, "unknown 61" },
+ { do_bad, SIGBUS, 0, "unknown 62" },
+ { do_bad, SIGBUS, 0, "unknown 63" },
+};
+
+/*
+ * Dispatch a data abort to the relevant handler.
+ */
+asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int esr,
+ struct pt_regs *regs)
+{
+ const struct fault_info *inf = fault_info + (esr & 63);
+ struct siginfo info;
+
+ if (!inf->fn(addr, esr, regs))
+ return;
+
+ pr_alert("Unhandled fault: %s (0x%08x) at 0x%016lx\n",
+ inf->name, esr, addr);
+
+ info.si_signo = inf->sig;
+ info.si_errno = 0;
+ info.si_code = inf->code;
+ info.si_addr = (void __user *)addr;
+ arm64_notify_die("", regs, &info, esr);
+}
+
+/*
+ * Handle stack alignment exceptions.
+ */
+asmlinkage void __exception do_sp_pc_abort(unsigned long addr,
+ unsigned int esr,
+ struct pt_regs *regs)
+{
+ struct siginfo info;
+
+ info.si_signo = SIGBUS;
+ info.si_errno = 0;
+ info.si_code = BUS_ADRALN;
+ info.si_addr = (void __user *)addr;
+ arm64_notify_die("", regs, &info, esr);
+}
+
+static struct fault_info debug_fault_info[] = {
+ { do_bad, SIGTRAP, TRAP_HWBKPT, "hardware breakpoint" },
+ { do_bad, SIGTRAP, TRAP_HWBKPT, "hardware single-step" },
+ { do_bad, SIGTRAP, TRAP_HWBKPT, "hardware watchpoint" },
+ { do_bad, SIGBUS, 0, "unknown 3" },
+ { do_bad, SIGTRAP, TRAP_BRKPT, "aarch32 BKPT" },
+ { do_bad, SIGTRAP, 0, "aarch32 vector catch" },
+ { do_bad, SIGTRAP, TRAP_BRKPT, "aarch64 BRK" },
+ { do_bad, SIGBUS, 0, "unknown 7" },
+};
+
+void __init hook_debug_fault_code(int nr,
+ int (*fn)(unsigned long, unsigned int, struct pt_regs *),
+ int sig, int code, const char *name)
+{
+ BUG_ON(nr < 0 || nr >= ARRAY_SIZE(debug_fault_info));
+
+ debug_fault_info[nr].fn = fn;
+ debug_fault_info[nr].sig = sig;
+ debug_fault_info[nr].code = code;
+ debug_fault_info[nr].name = name;
+}
+
+asmlinkage int __exception do_debug_exception(unsigned long addr,
+ unsigned int esr,
+ struct pt_regs *regs)
+{
+ const struct fault_info *inf = debug_fault_info + DBG_ESR_EVT(esr);
+ struct siginfo info;
+
+ if (!inf->fn(addr, esr, regs))
+ return 1;
+
+ pr_alert("Unhandled debug exception: %s (0x%08x) at 0x%016lx\n",
+ inf->name, esr, addr);
+
+ info.si_signo = inf->sig;
+ info.si_errno = 0;
+ info.si_code = inf->code;
+ info.si_addr = (void __user *)addr;
+ arm64_notify_die("", regs, &info, esr);
+
+ return 0;
+}
diff --git a/arch/arm64/mm/mm.h b/arch/arm64/mm/mm.h
new file mode 100644
index 0000000..c84f68b
--- /dev/null
+++ b/arch/arm64/mm/mm.h
@@ -0,0 +1,2 @@
+extern void __flush_dcache_page(struct address_space *mapping, struct page *page);
+extern void __init bootmem_init(void);
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
new file mode 100644
index 0000000..7c7be78
--- /dev/null
+++ b/arch/arm64/mm/mmap.c
@@ -0,0 +1,144 @@
+/*
+ * Based on arch/arm/mm/mmap.c
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/elf.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/export.h>
+#include <linux/shm.h>
+#include <linux/sched.h>
+#include <linux/io.h>
+#include <linux/personality.h>
+#include <linux/random.h>
+
+#include <asm/cputype.h>
+
+/*
+ * Leave enough space between the mmap area and the stack to honour ulimit in
+ * the face of randomisation.
+ */
+#define MIN_GAP (SZ_128M + ((STACK_RND_MASK << PAGE_SHIFT) + 1))
+#define MAX_GAP (STACK_TOP/6*5)
+
+static int mmap_is_legacy(void)
+{
+ if (current->personality & ADDR_COMPAT_LAYOUT)
+ return 1;
+
+ if (rlimit(RLIMIT_STACK) == RLIM_INFINITY)
+ return 1;
+
+ return sysctl_legacy_va_layout;
+}
+
+/*
+ * Since get_random_int() returns the same value within a 1 jiffy window, we
+ * will almost always get the same randomisation for the stack and mmap
+ * region. This will mean the relative distance between stack and mmap will be
+ * the same.
+ *
+ * To avoid this we can shift the randomness by 1 bit.
+ */
+static unsigned long mmap_rnd(void)
+{
+ unsigned long rnd = 0;
+
+ if (current->flags & PF_RANDOMIZE)
+ rnd = (long)get_random_int() & (STACK_RND_MASK >> 1);
+
+ return rnd << (PAGE_SHIFT + 1);
+}
+
+static unsigned long mmap_base(void)
+{
+ unsigned long gap = rlimit(RLIMIT_STACK);
+
+ if (gap < MIN_GAP)
+ gap = MIN_GAP;
+ else if (gap > MAX_GAP)
+ gap = MAX_GAP;
+
+ return PAGE_ALIGN(STACK_TOP - gap - mmap_rnd());
+}
+
+/*
+ * This function, called very early during the creation of a new process VM
+ * image, sets up which VM layout function to use:
+ */
+void arch_pick_mmap_layout(struct mm_struct *mm)
+{
+ /*
+ * Fall back to the standard layout if the personality bit is set, or
+ * if the expected stack growth is unlimited:
+ */
+ if (mmap_is_legacy()) {
+ mm->mmap_base = TASK_UNMAPPED_BASE;
+ mm->get_unmapped_area = arch_get_unmapped_area;
+ mm->unmap_area = arch_unmap_area;
+ } else {
+ mm->mmap_base = mmap_base();
+ mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ mm->unmap_area = arch_unmap_area_topdown;
+ }
+}
+EXPORT_SYMBOL_GPL(arch_pick_mmap_layout);
+
+
+/*
+ * You really shouldn't be using read() or write() on /dev/mem. This might go
+ * away in the future.
+ */
+int valid_phys_addr_range(unsigned long addr, size_t size)
+{
+ if (addr < PHYS_OFFSET)
+ return 0;
+ if (addr + size > __pa(high_memory - 1) + 1)
+ return 0;
+
+ return 1;
+}
+
+/*
+ * Do not allow /dev/mem mappings beyond the supported physical range.
+ */
+int valid_mmap_phys_addr_range(unsigned long pfn, size_t size)
+{
+ return !(((pfn << PAGE_SHIFT) + size) & ~PHYS_MASK);
+}
+
+#ifdef CONFIG_STRICT_DEVMEM
+
+#include <linux/ioport.h>
+
+/*
+ * devmem_is_allowed() checks to see if /dev/mem access to a certain address
+ * is valid. The argument is a physical page number. We mimic x86 here by
+ * disallowing access to system RAM as well as device-exclusive MMIO regions.
+ * This effectively disables read()/write() on /dev/mem.
+ */
+int devmem_is_allowed(unsigned long pfn)
+{
+ if (iomem_is_exclusive(pfn << PAGE_SHIFT))
+ return 0;
+ if (!page_is_ram(pfn))
+ return 1;
+ return 0;
+}
+
+#endif
diff --git a/arch/arm64/mm/pgd.c b/arch/arm64/mm/pgd.c
new file mode 100644
index 0000000..7a7b0e9
--- /dev/null
+++ b/arch/arm64/mm/pgd.c
@@ -0,0 +1,49 @@
+/*
+ * PGD allocation/freeing
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Catalin Marinas <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/mm.h>
+#include <linux/gfp.h>
+#include <linux/highmem.h>
+#include <linux/slab.h>
+
+#include <asm/pgalloc.h>
+#include <asm/page.h>
+#include <asm/tlbflush.h>
+
+#include "mm.h"
+
+#define PGD_ORDER 0
+
+pgd_t *pgd_alloc(struct mm_struct *mm)
+{
+ pgd_t *new_pgd;
+
+ new_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD_ORDER);
+ if (!new_pgd)
+ return NULL;
+
+ memset(new_pgd, 0, PAGE_SIZE << PGD_ORDER);
+
+ return new_pgd;
+}
+
+void pgd_free(struct mm_struct *mm, pgd_t *pgd)
+{
+ free_pages((unsigned long)pgd, PGD_ORDER);
+}

2012-08-14 17:54:58

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 26/31] arm64: Miscellaneous library functions

From: Marc Zyngier <[email protected]>

This patch adds udelay, memory and bit operations, together with the
ksyms exports.
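
One implementation detail worth noting: clear_page() zeroes the page with
the DC ZVA instruction and advances by the zero-block size advertised in
DCZID_EL0, whose BS field (bits [3:0]) gives log2 of the block size in
4-byte words, i.e. 4 << BS bytes (BS == 4 means 64-byte blocks). A rough C
equivalent of that calculation (the helper name is illustrative, not part
of this patch):

/* Sketch only: zero-block size used by DC ZVA, as computed in clear_page.S. */
static inline unsigned long dczva_block_size(void)
{
	unsigned long dczid;

	asm volatile("mrs %0, dczid_el0" : "=r" (dczid));
	return 4UL << (dczid & 0xf);	/* 4 << DCZID_EL0.BS bytes */
}

clear_page() then loops DC ZVA over the destination in steps of this size
until the page-offset bits of the pointer wrap back to zero.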

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/bitops.h | 74 ++++++++++++++++++++++++++++
arch/arm64/include/asm/syscall.h | 101 ++++++++++++++++++++++++++++++++++++++
arch/arm64/kernel/arm64ksyms.c | 55 ++++++++++++++++++++
arch/arm64/lib/Makefile | 5 ++
arch/arm64/lib/bitops.c | 25 +++++++++
arch/arm64/lib/clear_page.S | 39 +++++++++++++++
arch/arm64/lib/copy_page.S | 46 +++++++++++++++++
arch/arm64/lib/delay.c | 55 ++++++++++++++++++++
8 files changed, 400 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/bitops.h
create mode 100644 arch/arm64/include/asm/syscall.h
create mode 100644 arch/arm64/kernel/arm64ksyms.c
create mode 100644 arch/arm64/lib/Makefile
create mode 100644 arch/arm64/lib/bitops.c
create mode 100644 arch/arm64/lib/clear_page.S
create mode 100644 arch/arm64/lib/copy_page.S
create mode 100644 arch/arm64/lib/delay.c

diff --git a/arch/arm64/include/asm/bitops.h b/arch/arm64/include/asm/bitops.h
new file mode 100644
index 0000000..67df4d2
--- /dev/null
+++ b/arch/arm64/include/asm/bitops.h
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_BITOPS_H
+#define __ASM_BITOPS_H
+
+#include <linux/compiler.h>
+
+#include <asm/barrier.h>
+
+/*
+ * clear_bit may not imply a memory barrier
+ */
+#ifndef smp_mb__before_clear_bit
+#define smp_mb__before_clear_bit() smp_mb()
+#define smp_mb__after_clear_bit() smp_mb()
+#endif
+
+/*
+ * Use compiler builtins for simple inline operations.
+ */
+static inline unsigned long __ffs(unsigned long word)
+{
+ return __builtin_ffsl(word) - 1;
+}
+
+static inline int ffs(int x)
+{
+ return __builtin_ffs(x);
+}
+
+static inline unsigned long __fls(unsigned long word)
+{
+ return BITS_PER_LONG - 1 - __builtin_clzl(word);
+}
+
+static inline int fls(int x)
+{
+ return x ? sizeof(x) * BITS_PER_BYTE - __builtin_clz(x) : 0;
+}
+
+/*
+ * Mainly use the generic routines for now.
+ */
+#ifndef _LINUX_BITOPS_H
+#error only <linux/bitops.h> can be included directly
+#endif
+
+#include <asm-generic/bitops/ffz.h>
+#include <asm-generic/bitops/fls64.h>
+#include <asm-generic/bitops/find.h>
+
+#include <asm-generic/bitops/sched.h>
+#include <asm-generic/bitops/hweight.h>
+#include <asm-generic/bitops/lock.h>
+
+#include <asm-generic/bitops/atomic.h>
+#include <asm-generic/bitops/non-atomic.h>
+#include <asm-generic/bitops/le.h>
+#include <asm-generic/bitops/ext2-atomic.h>
+
+#endif /* __ASM_BITOPS_H */
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
new file mode 100644
index 0000000..89c047f
--- /dev/null
+++ b/arch/arm64/include/asm/syscall.h
@@ -0,0 +1,101 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SYSCALL_H
+#define __ASM_SYSCALL_H
+
+#include <linux/err.h>
+
+
+static inline int syscall_get_nr(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return regs->syscallno;
+}
+
+static inline void syscall_rollback(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ regs->regs[0] = regs->orig_x0;
+}
+
+
+static inline long syscall_get_error(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ unsigned long error = regs->regs[0];
+ return IS_ERR_VALUE(error) ? error : 0;
+}
+
+static inline long syscall_get_return_value(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return regs->regs[0];
+}
+
+static inline void syscall_set_return_value(struct task_struct *task,
+ struct pt_regs *regs,
+ int error, long val)
+{
+ regs->regs[0] = (long) error ? error : val;
+}
+
+#define SYSCALL_MAX_ARGS 6
+
+static inline void syscall_get_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ unsigned int i, unsigned int n,
+ unsigned long *args)
+{
+ if (i + n > SYSCALL_MAX_ARGS) {
+ unsigned long *args_bad = args + SYSCALL_MAX_ARGS - i;
+ unsigned int n_bad = n + i - SYSCALL_MAX_ARGS;
+ pr_warning("%s called with max args %d, handling only %d\n",
+ __func__, i + n, SYSCALL_MAX_ARGS);
+ memset(args_bad, 0, n_bad * sizeof(args[0]));
+ }
+
+ if (i == 0) {
+ args[0] = regs->orig_x0;
+ args++;
+ i++;
+ n--;
+ }
+
+ memcpy(args, &regs->regs[i], n * sizeof(args[0]));
+}
+
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ unsigned int i, unsigned int n,
+ const unsigned long *args)
+{
+ if (i + n > SYSCALL_MAX_ARGS) {
+ pr_warning("%s called with max args %d, handling only %d\n",
+ __func__, i + n, SYSCALL_MAX_ARGS);
+ n = SYSCALL_MAX_ARGS - i;
+ }
+
+ if (i == 0) {
+ regs->orig_x0 = args[0];
+ args++;
+ i++;
+ n--;
+ }
+
+ memcpy(&regs->regs[i], args, n * sizeof(args[0]));
+}
+
+#endif /* __ASM_SYSCALL_H */
diff --git a/arch/arm64/kernel/arm64ksyms.c b/arch/arm64/kernel/arm64ksyms.c
new file mode 100644
index 0000000..4631573
--- /dev/null
+++ b/arch/arm64/kernel/arm64ksyms.c
@@ -0,0 +1,55 @@
+/*
+ * Based on arch/arm/kernel/armksyms.c
+ *
+ * Copyright (C) 2000 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/export.h>
+#include <linux/sched.h>
+#include <linux/string.h>
+#include <linux/cryptohash.h>
+#include <linux/delay.h>
+#include <linux/in6.h>
+#include <linux/syscalls.h>
+#include <linux/uaccess.h>
+#include <linux/io.h>
+
+#include <asm/checksum.h>
+
+ /* user mem (segment) */
+EXPORT_SYMBOL(__strnlen_user);
+EXPORT_SYMBOL(__strncpy_from_user);
+
+EXPORT_SYMBOL(copy_page);
+
+EXPORT_SYMBOL(__copy_from_user);
+EXPORT_SYMBOL(__copy_to_user);
+EXPORT_SYMBOL(__clear_user);
+
+EXPORT_SYMBOL(__get_user_1);
+EXPORT_SYMBOL(__get_user_2);
+EXPORT_SYMBOL(__get_user_4);
+
+EXPORT_SYMBOL(__put_user_1);
+EXPORT_SYMBOL(__put_user_2);
+EXPORT_SYMBOL(__put_user_4);
+EXPORT_SYMBOL(__put_user_8);
+
+ /* bitops */
+EXPORT_SYMBOL(__atomic_hash);
+
+ /* physical memory */
+EXPORT_SYMBOL(memstart_addr);
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
new file mode 100644
index 0000000..ae71bae
--- /dev/null
+++ b/arch/arm64/lib/Makefile
@@ -0,0 +1,5 @@
+lib-y := bitops.o delay.o \
+ strncpy_from_user.o strnlen_user.o \
+ clear_user.o getuser.o putuser.o \
+ copy_from_user.o copy_to_user.o copy_in_user.o \
+ copy_page.o clear_page.o
diff --git a/arch/arm64/lib/bitops.c b/arch/arm64/lib/bitops.c
new file mode 100644
index 0000000..aa4965e
--- /dev/null
+++ b/arch/arm64/lib/bitops.c
@@ -0,0 +1,25 @@
+/*
+ * Copyright (C) 2012 ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/spinlock.h>
+#include <linux/atomic.h>
+
+#ifdef CONFIG_SMP
+arch_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned = {
+ [0 ... (ATOMIC_HASH_SIZE-1)] = __ARCH_SPIN_LOCK_UNLOCKED
+};
+#endif
diff --git a/arch/arm64/lib/clear_page.S b/arch/arm64/lib/clear_page.S
new file mode 100644
index 0000000..ef08e90
--- /dev/null
+++ b/arch/arm64/lib/clear_page.S
@@ -0,0 +1,39 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+#include <linux/const.h>
+#include <asm/assembler.h>
+#include <asm/page.h>
+
+/*
+ * Clear page @dest
+ *
+ * Parameters:
+ * x0 - dest
+ */
+ENTRY(clear_page)
+ mrs x1, dczid_el0
+ and w1, w1, #0xf
+ mov x2, #4
+ lsl x1, x2, x1
+
+1: dc zva, x0
+ add x0, x0, x1
+ tst x0, #(PAGE_SIZE - 1)
+ b.ne 1b
+ ret
+ENDPROC(clear_page)
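
The loop above sizes its stride from DCZID_EL0: the BS field (bits [3:0])
is the log2 of the DC ZVA block size in words, so the block size in bytes
is 4 << BS. A rough C equivalent, assuming DC ZVA is permitted
(DCZID_EL0.DZP clear) and a PAGE_SIZE-aligned page; illustrative only,
not part of the patch:

    #include <asm/page.h>       /* PAGE_SIZE */

    static inline unsigned long dczva_block_size(void)
    {
        unsigned long dczid;

        asm volatile("mrs %0, dczid_el0" : "=r" (dczid));
        /* BS is the log2 of the block size in 4-byte words. */
        return 4UL << (dczid & 0xf);
    }

    static void clear_page_c(void *page)
    {
        unsigned long step = dczva_block_size();
        void *end = page + PAGE_SIZE;

        do {
            /* Zero one DC ZVA block at a time. */
            asm volatile("dc zva, %0" :: "r" (page) : "memory");
            page += step;
        } while (page < end);
    }
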
diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S
new file mode 100644
index 0000000..512b9a7
--- /dev/null
+++ b/arch/arm64/lib/copy_page.S
@@ -0,0 +1,46 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+#include <linux/const.h>
+#include <asm/assembler.h>
+#include <asm/page.h>
+
+/*
+ * Copy a page from src to dest (both are page aligned)
+ *
+ * Parameters:
+ * x0 - dest
+ * x1 - src
+ */
+ENTRY(copy_page)
+ /* Assume cache line size is 64 bytes. */
+ prfm pldl1strm, [x1, #64]
+1: ldp x2, x3, [x1]
+ ldp x4, x5, [x1, #16]
+ ldp x6, x7, [x1, #32]
+ ldp x8, x9, [x1, #48]
+ add x1, x1, #64
+ prfm pldl1strm, [x1, #64]
+ stnp x2, x3, [x0]
+ stnp x4, x5, [x0, #16]
+ stnp x6, x7, [x0, #32]
+ stnp x8, x9, [x0, #48]
+ add x0, x0, #64
+ tst x1, #(PAGE_SIZE - 1)
+ b.ne 1b
+ ret
+ENDPROC(copy_page)
diff --git a/arch/arm64/lib/delay.c b/arch/arm64/lib/delay.c
new file mode 100644
index 0000000..dad4ec9
--- /dev/null
+++ b/arch/arm64/lib/delay.c
@@ -0,0 +1,55 @@
+/*
+ * Delay loops based on the OpenRISC implementation.
+ *
+ * Copyright (C) 2012 ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Will Deacon <[email protected]>
+ */
+
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/timex.h>
+
+void __delay(unsigned long cycles)
+{
+ cycles_t start = get_cycles();
+
+ while ((get_cycles() - start) < cycles)
+ cpu_relax();
+}
+EXPORT_SYMBOL(__delay);
+
+inline void __const_udelay(unsigned long xloops)
+{
+ unsigned long loops;
+
+ loops = xloops * loops_per_jiffy * HZ;
+ __delay(loops >> 32);
+}
+EXPORT_SYMBOL(__const_udelay);
+
+void __udelay(unsigned long usecs)
+{
+ __const_udelay(usecs * 0x10C7UL); /* 2**32 / 1000000 (rounded up) */
+}
+EXPORT_SYMBOL(__udelay);
+
+void __ndelay(unsigned long nsecs)
+{
+ __const_udelay(nsecs * 0x5UL); /* 2**32 / 1000000000 (rounded up) */
+}
+EXPORT_SYMBOL(__ndelay);
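
The scaling in __const_udelay() is plain 32.32 fixed-point arithmetic:
0x10C7 is 2^32/10^6 rounded up, so xloops carries microseconds scaled by
2^32; multiplying by loops_per_jiffy * HZ (delay-loop iterations per
second) and shifting right by 32 yields the iteration count. A worked
example with made-up calibration values:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t usecs = 10;
        uint64_t lpj   = 2500000;   /* assumed loops_per_jiffy */
        uint64_t hz    = 100;       /* assumed CONFIG_HZ */

        uint64_t xloops = usecs * 0x10C7;              /* us scaled by 2^32/10^6 */
        uint64_t loops  = (xloops * lpj * hz) >> 32;   /* drop the 2^32 factor */

        /* 10us at 250,000,000 loops/s -> 2500 loops */
        printf("delay loops = %llu\n", (unsigned long long)loops);
        return 0;
    }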

2012-08-14 17:54:55

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 25/31] arm64: Performance counters support

From: Will Deacon <[email protected]>

This patch adds support for the AArch64 performance counters.
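
Once this is in place, the counters are exposed through the usual
perf_event_open(2) interface (and the perf tool). A minimal test program
is sketched below; it is illustrative only and not part of the patch:

    /* Count CPU cycles for a workload via perf_event_open() (sketch). */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <linux/perf_event.h>

    int main(void)
    {
        struct perf_event_attr attr;
        long long count;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.disabled = 1;

        /* current task, any CPU, no group leader, no flags */
        fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) {
            perror("perf_event_open");
            return 1;
        }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        /* ... workload under test ... */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        read(fd, &count, sizeof(count));
        printf("cycles: %lld\n", count);
        close(fd);
        return 0;
    }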

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/perf_event.h | 22 +
arch/arm64/include/asm/pmu.h | 82 +++
arch/arm64/kernel/perf_event.c | 1368 +++++++++++++++++++++++++++++++++++
tools/perf/perf.h | 6 +
4 files changed, 1478 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/perf_event.h
create mode 100644 arch/arm64/include/asm/pmu.h
create mode 100644 arch/arm64/kernel/perf_event.c

diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
new file mode 100644
index 0000000..a6fffd5
--- /dev/null
+++ b/arch/arm64/include/asm/perf_event.h
@@ -0,0 +1,22 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __ASM_PERF_EVENT_H
+#define __ASM_PERF_EVENT_H
+
+/* It's quiet around here... */
+
+#endif
diff --git a/arch/arm64/include/asm/pmu.h b/arch/arm64/include/asm/pmu.h
new file mode 100644
index 0000000..e6f0878
--- /dev/null
+++ b/arch/arm64/include/asm/pmu.h
@@ -0,0 +1,82 @@
+/*
+ * Based on arch/arm/include/asm/pmu.h
+ *
+ * Copyright (C) 2009 picoChip Designs Ltd, Jamie Iles
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PMU_H
+#define __ASM_PMU_H
+
+#ifdef CONFIG_HW_PERF_EVENTS
+
+/* The events for a given PMU register set. */
+struct pmu_hw_events {
+ /*
+ * The events that are active on the PMU for the given index.
+ */
+ struct perf_event **events;
+
+ /*
+ * A 1 bit for an index indicates that the counter is being used for
+ * an event. A 0 means that the counter can be used.
+ */
+ unsigned long *used_mask;
+
+ /*
+ * Hardware lock to serialize accesses to PMU registers. Needed for the
+ * read/modify/write sequences.
+ */
+ raw_spinlock_t pmu_lock;
+};
+
+struct arm_pmu {
+ struct pmu pmu;
+ cpumask_t active_irqs;
+ const char *name;
+ irqreturn_t (*handle_irq)(int irq_num, void *dev);
+ void (*enable)(struct hw_perf_event *evt, int idx);
+ void (*disable)(struct hw_perf_event *evt, int idx);
+ int (*get_event_idx)(struct pmu_hw_events *hw_events,
+ struct hw_perf_event *hwc);
+ int (*set_event_filter)(struct hw_perf_event *evt,
+ struct perf_event_attr *attr);
+ u32 (*read_counter)(int idx);
+ void (*write_counter)(int idx, u32 val);
+ void (*start)(void);
+ void (*stop)(void);
+ void (*reset)(void *);
+ int (*map_event)(struct perf_event *event);
+ int num_events;
+ atomic_t active_events;
+ struct mutex reserve_mutex;
+ u64 max_period;
+ struct platform_device *plat_device;
+ struct pmu_hw_events *(*get_hw_events)(void);
+};
+
+#define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
+
+int __init armpmu_register(struct arm_pmu *armpmu, char *name, int type);
+
+u64 armpmu_event_update(struct perf_event *event,
+ struct hw_perf_event *hwc,
+ int idx);
+
+int armpmu_event_set_period(struct perf_event *event,
+ struct hw_perf_event *hwc,
+ int idx);
+
+#endif /* CONFIG_HW_PERF_EVENTS */
+#endif /* __ASM_PMU_H */
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
new file mode 100644
index 0000000..ecbf2d8
--- /dev/null
+++ b/arch/arm64/kernel/perf_event.c
@@ -0,0 +1,1368 @@
+/*
+ * PMU support
+ *
+ * Copyright (C) 2012 ARM Limited
+ * Author: Will Deacon <[email protected]>
+ *
+ * This code is based heavily on the ARMv7 perf event code.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#define pr_fmt(fmt) "hw perfevents: " fmt
+
+#include <linux/bitmap.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <linux/spinlock.h>
+#include <linux/uaccess.h>
+
+#include <asm/cputype.h>
+#include <asm/irq.h>
+#include <asm/irq_regs.h>
+#include <asm/pmu.h>
+#include <asm/stacktrace.h>
+
+/*
+ * ARMv8 supports a maximum of 32 events.
+ * The cycle counter is included in this total.
+ */
+#define ARMPMU_MAX_HWEVENTS 32
+
+static DEFINE_PER_CPU(struct perf_event * [ARMPMU_MAX_HWEVENTS], hw_events);
+static DEFINE_PER_CPU(unsigned long [BITS_TO_LONGS(ARMPMU_MAX_HWEVENTS)], used_mask);
+static DEFINE_PER_CPU(struct pmu_hw_events, cpu_hw_events);
+
+#define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
+
+/* Set at runtime when we know what CPU type we are. */
+static struct arm_pmu *cpu_pmu;
+
+int
+armpmu_get_max_events(void)
+{
+ int max_events = 0;
+
+ if (cpu_pmu != NULL)
+ max_events = cpu_pmu->num_events;
+
+ return max_events;
+}
+EXPORT_SYMBOL_GPL(armpmu_get_max_events);
+
+int perf_num_counters(void)
+{
+ return armpmu_get_max_events();
+}
+EXPORT_SYMBOL_GPL(perf_num_counters);
+
+#define HW_OP_UNSUPPORTED 0xFFFF
+
+#define C(_x) \
+ PERF_COUNT_HW_CACHE_##_x
+
+#define CACHE_OP_UNSUPPORTED 0xFFFF
+
+static int
+armpmu_map_cache_event(const unsigned (*cache_map)
+ [PERF_COUNT_HW_CACHE_MAX]
+ [PERF_COUNT_HW_CACHE_OP_MAX]
+ [PERF_COUNT_HW_CACHE_RESULT_MAX],
+ u64 config)
+{
+ unsigned int cache_type, cache_op, cache_result, ret;
+
+ cache_type = (config >> 0) & 0xff;
+ if (cache_type >= PERF_COUNT_HW_CACHE_MAX)
+ return -EINVAL;
+
+ cache_op = (config >> 8) & 0xff;
+ if (cache_op >= PERF_COUNT_HW_CACHE_OP_MAX)
+ return -EINVAL;
+
+ cache_result = (config >> 16) & 0xff;
+ if (cache_result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
+ return -EINVAL;
+
+ ret = (int)(*cache_map)[cache_type][cache_op][cache_result];
+
+ if (ret == CACHE_OP_UNSUPPORTED)
+ return -ENOENT;
+
+ return ret;
+}
+
+static int
+armpmu_map_event(const unsigned (*event_map)[PERF_COUNT_HW_MAX], u64 config)
+{
+ int mapping = (*event_map)[config];
+ return mapping == HW_OP_UNSUPPORTED ? -ENOENT : mapping;
+}
+
+static int
+armpmu_map_raw_event(u32 raw_event_mask, u64 config)
+{
+ return (int)(config & raw_event_mask);
+}
+
+static int map_cpu_event(struct perf_event *event,
+ const unsigned (*event_map)[PERF_COUNT_HW_MAX],
+ const unsigned (*cache_map)
+ [PERF_COUNT_HW_CACHE_MAX]
+ [PERF_COUNT_HW_CACHE_OP_MAX]
+ [PERF_COUNT_HW_CACHE_RESULT_MAX],
+ u32 raw_event_mask)
+{
+ u64 config = event->attr.config;
+
+ switch (event->attr.type) {
+ case PERF_TYPE_HARDWARE:
+ return armpmu_map_event(event_map, config);
+ case PERF_TYPE_HW_CACHE:
+ return armpmu_map_cache_event(cache_map, config);
+ case PERF_TYPE_RAW:
+ return armpmu_map_raw_event(raw_event_mask, config);
+ }
+
+ return -ENOENT;
+}
+
+int
+armpmu_event_set_period(struct perf_event *event,
+ struct hw_perf_event *hwc,
+ int idx)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
+ s64 left = local64_read(&hwc->period_left);
+ s64 period = hwc->sample_period;
+ int ret = 0;
+
+ if (unlikely(left <= -period)) {
+ left = period;
+ local64_set(&hwc->period_left, left);
+ hwc->last_period = period;
+ ret = 1;
+ }
+
+ if (unlikely(left <= 0)) {
+ left += period;
+ local64_set(&hwc->period_left, left);
+ hwc->last_period = period;
+ ret = 1;
+ }
+
+ if (left > (s64)armpmu->max_period)
+ left = armpmu->max_period;
+
+ local64_set(&hwc->prev_count, (u64)-left);
+
+ armpmu->write_counter(idx, (u64)(-left) & 0xffffffff);
+
+ perf_event_update_userpage(event);
+
+ return ret;
+}
+
+u64
+armpmu_event_update(struct perf_event *event,
+ struct hw_perf_event *hwc,
+ int idx)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
+ u64 delta, prev_raw_count, new_raw_count;
+
+again:
+ prev_raw_count = local64_read(&hwc->prev_count);
+ new_raw_count = armpmu->read_counter(idx);
+
+ if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
+ new_raw_count) != prev_raw_count)
+ goto again;
+
+ delta = (new_raw_count - prev_raw_count) & armpmu->max_period;
+
+ local64_add(delta, &event->count);
+ local64_sub(delta, &hwc->period_left);
+
+ return new_raw_count;
+}
+
+static void
+armpmu_read(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ /* Don't read disabled counters! */
+ if (hwc->idx < 0)
+ return;
+
+ armpmu_event_update(event, hwc, hwc->idx);
+}
+
+static void
+armpmu_stop(struct perf_event *event, int flags)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
+ struct hw_perf_event *hwc = &event->hw;
+
+ /*
+ * ARM pmu always has to update the counter, so ignore
+ * PERF_EF_UPDATE, see comments in armpmu_start().
+ */
+ if (!(hwc->state & PERF_HES_STOPPED)) {
+ armpmu->disable(hwc, hwc->idx);
+ barrier(); /* why? */
+ armpmu_event_update(event, hwc, hwc->idx);
+ hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+ }
+}
+
+static void
+armpmu_start(struct perf_event *event, int flags)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
+ struct hw_perf_event *hwc = &event->hw;
+
+ /*
+ * ARM pmu always has to reprogram the period, so ignore
+ * PERF_EF_RELOAD, see the comment below.
+ */
+ if (flags & PERF_EF_RELOAD)
+ WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
+
+ hwc->state = 0;
+ /*
+ * Set the period again. Some counters can't be stopped, so when we
+ * were stopped we simply disabled the IRQ source and the counter
+ * may have been left counting. If we don't do this step then we may
+ * get an interrupt too soon or *way* too late if the overflow has
+ * happened since disabling.
+ */
+ armpmu_event_set_period(event, hwc, hwc->idx);
+ armpmu->enable(hwc, hwc->idx);
+}
+
+static void
+armpmu_del(struct perf_event *event, int flags)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
+ struct pmu_hw_events *hw_events = armpmu->get_hw_events();
+ struct hw_perf_event *hwc = &event->hw;
+ int idx = hwc->idx;
+
+ WARN_ON(idx < 0);
+
+ armpmu_stop(event, PERF_EF_UPDATE);
+ hw_events->events[idx] = NULL;
+ clear_bit(idx, hw_events->used_mask);
+
+ perf_event_update_userpage(event);
+}
+
+static int
+armpmu_add(struct perf_event *event, int flags)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
+ struct pmu_hw_events *hw_events = armpmu->get_hw_events();
+ struct hw_perf_event *hwc = &event->hw;
+ int idx;
+ int err = 0;
+
+ perf_pmu_disable(event->pmu);
+
+ /* If we don't have a space for the counter then finish early. */
+ idx = armpmu->get_event_idx(hw_events, hwc);
+ if (idx < 0) {
+ err = idx;
+ goto out;
+ }
+
+ /*
+ * If there is an event in the counter we are going to use then make
+ * sure it is disabled.
+ */
+ event->hw.idx = idx;
+ armpmu->disable(hwc, idx);
+ hw_events->events[idx] = event;
+
+ hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+ if (flags & PERF_EF_START)
+ armpmu_start(event, PERF_EF_RELOAD);
+
+ /* Propagate our changes to the userspace mapping. */
+ perf_event_update_userpage(event);
+
+out:
+ perf_pmu_enable(event->pmu);
+ return err;
+}
+
+static int
+validate_event(struct pmu_hw_events *hw_events,
+ struct perf_event *event)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
+ struct hw_perf_event fake_event = event->hw;
+ struct pmu *leader_pmu = event->group_leader->pmu;
+
+ if (event->pmu != leader_pmu || event->state <= PERF_EVENT_STATE_OFF)
+ return 1;
+
+ return armpmu->get_event_idx(hw_events, &fake_event) >= 0;
+}
+
+static int
+validate_group(struct perf_event *event)
+{
+ struct perf_event *sibling, *leader = event->group_leader;
+ struct pmu_hw_events fake_pmu;
+ DECLARE_BITMAP(fake_used_mask, ARMPMU_MAX_HWEVENTS);
+
+ /*
+ * Initialise the fake PMU. We only need to populate the
+ * used_mask for the purposes of validation.
+ */
+ memset(fake_used_mask, 0, sizeof(fake_used_mask));
+ fake_pmu.used_mask = fake_used_mask;
+
+ if (!validate_event(&fake_pmu, leader))
+ return -EINVAL;
+
+ list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
+ if (!validate_event(&fake_pmu, sibling))
+ return -EINVAL;
+ }
+
+ if (!validate_event(&fake_pmu, event))
+ return -EINVAL;
+
+ return 0;
+}
+
+static void
+armpmu_release_hardware(struct arm_pmu *armpmu)
+{
+ int i, irq, irqs;
+ struct platform_device *pmu_device = armpmu->plat_device;
+
+ irqs = min(pmu_device->num_resources, num_possible_cpus());
+
+ for (i = 0; i < irqs; ++i) {
+ if (!cpumask_test_and_clear_cpu(i, &armpmu->active_irqs))
+ continue;
+ irq = platform_get_irq(pmu_device, i);
+ if (irq >= 0)
+ free_irq(irq, armpmu);
+ }
+}
+
+static int
+armpmu_reserve_hardware(struct arm_pmu *armpmu)
+{
+ int i, err, irq, irqs;
+ struct platform_device *pmu_device = armpmu->plat_device;
+
+ if (!pmu_device) {
+ pr_err("no PMU device registered\n");
+ return -ENODEV;
+ }
+
+ irqs = min(pmu_device->num_resources, num_possible_cpus());
+ if (irqs < 1) {
+ pr_err("no irqs for PMUs defined\n");
+ return -ENODEV;
+ }
+
+ for (i = 0; i < irqs; ++i) {
+ err = 0;
+ irq = platform_get_irq(pmu_device, i);
+ if (irq < 0)
+ continue;
+
+ /*
+ * If we have a single PMU interrupt that we can't shift,
+ * assume that we're running on a uniprocessor machine and
+ * continue. Otherwise, continue without this interrupt.
+ */
+ if (irq_set_affinity(irq, cpumask_of(i)) && irqs > 1) {
+ pr_warning("unable to set irq affinity (irq=%d, cpu=%u)\n",
+ irq, i);
+ continue;
+ }
+
+ err = request_irq(irq, armpmu->handle_irq,
+ IRQF_NOBALANCING,
+ "arm-pmu", armpmu);
+ if (err) {
+ pr_err("unable to request IRQ%d for ARM PMU counters\n",
+ irq);
+ armpmu_release_hardware(armpmu);
+ return err;
+ }
+
+ cpumask_set_cpu(i, &armpmu->active_irqs);
+ }
+
+ return 0;
+}
+
+static void
+hw_perf_event_destroy(struct perf_event *event)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
+ atomic_t *active_events = &armpmu->active_events;
+ struct mutex *pmu_reserve_mutex = &armpmu->reserve_mutex;
+
+ if (atomic_dec_and_mutex_lock(active_events, pmu_reserve_mutex)) {
+ armpmu_release_hardware(armpmu);
+ mutex_unlock(pmu_reserve_mutex);
+ }
+}
+
+static int
+event_requires_mode_exclusion(struct perf_event_attr *attr)
+{
+ return attr->exclude_idle || attr->exclude_user ||
+ attr->exclude_kernel || attr->exclude_hv;
+}
+
+static int
+__hw_perf_event_init(struct perf_event *event)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
+ struct hw_perf_event *hwc = &event->hw;
+ int mapping, err;
+
+ mapping = armpmu->map_event(event);
+
+ if (mapping < 0) {
+ pr_debug("event %x:%llx not supported\n", event->attr.type,
+ event->attr.config);
+ return mapping;
+ }
+
+ /*
+ * We don't assign an index until we actually place the event onto
+ * hardware. Use -1 to signify that we haven't decided where to put it
+ * yet. For SMP systems, each core has its own PMU so we can't do any
+ * clever allocation or constraints checking at this point.
+ */
+ hwc->idx = -1;
+ hwc->config_base = 0;
+ hwc->config = 0;
+ hwc->event_base = 0;
+
+ /*
+ * Check whether we need to exclude the counter from certain modes.
+ */
+ if ((!armpmu->set_event_filter ||
+ armpmu->set_event_filter(hwc, &event->attr)) &&
+ event_requires_mode_exclusion(&event->attr)) {
+ pr_debug("ARM performance counters do not support mode exclusion\n");
+ return -EPERM;
+ }
+
+ /*
+ * Store the event encoding into the config_base field.
+ */
+ hwc->config_base |= (unsigned long)mapping;
+
+ if (!hwc->sample_period) {
+ /*
+ * For non-sampling runs, limit the sample_period to half
+ * of the counter width. That way, the new counter value
+ * is far less likely to overtake the previous one unless
+ * you have some serious IRQ latency issues.
+ */
+ hwc->sample_period = armpmu->max_period >> 1;
+ hwc->last_period = hwc->sample_period;
+ local64_set(&hwc->period_left, hwc->sample_period);
+ }
+
+ err = 0;
+ if (event->group_leader != event) {
+ err = validate_group(event);
+ if (err)
+ return -EINVAL;
+ }
+
+ return err;
+}
+
+static int armpmu_event_init(struct perf_event *event)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
+ int err = 0;
+ atomic_t *active_events = &armpmu->active_events;
+
+ if (armpmu->map_event(event) == -ENOENT)
+ return -ENOENT;
+
+ event->destroy = hw_perf_event_destroy;
+
+ if (!atomic_inc_not_zero(active_events)) {
+ mutex_lock(&armpmu->reserve_mutex);
+ if (atomic_read(active_events) == 0)
+ err = armpmu_reserve_hardware(armpmu);
+
+ if (!err)
+ atomic_inc(active_events);
+ mutex_unlock(&armpmu->reserve_mutex);
+ }
+
+ if (err)
+ return err;
+
+ err = __hw_perf_event_init(event);
+ if (err)
+ hw_perf_event_destroy(event);
+
+ return err;
+}
+
+static void armpmu_enable(struct pmu *pmu)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(pmu);
+ struct pmu_hw_events *hw_events = armpmu->get_hw_events();
+ int enabled = bitmap_weight(hw_events->used_mask, armpmu->num_events);
+
+ if (enabled)
+ armpmu->start();
+}
+
+static void armpmu_disable(struct pmu *pmu)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(pmu);
+ armpmu->stop();
+}
+
+static void __init armpmu_init(struct arm_pmu *armpmu)
+{
+ atomic_set(&armpmu->active_events, 0);
+ mutex_init(&armpmu->reserve_mutex);
+
+ armpmu->pmu = (struct pmu) {
+ .pmu_enable = armpmu_enable,
+ .pmu_disable = armpmu_disable,
+ .event_init = armpmu_event_init,
+ .add = armpmu_add,
+ .del = armpmu_del,
+ .start = armpmu_start,
+ .stop = armpmu_stop,
+ .read = armpmu_read,
+ };
+}
+
+int __init armpmu_register(struct arm_pmu *armpmu, char *name, int type)
+{
+ armpmu_init(armpmu);
+ return perf_pmu_register(&armpmu->pmu, name, type);
+}
+
+/*
+ * ARMv8 PMUv3 Performance Events handling code.
+ * Common event types.
+ */
+enum armv8_pmuv3_perf_types {
+ /* Required events. */
+ ARMV8_PMUV3_PERFCTR_PMNC_SW_INCR = 0x00,
+ ARMV8_PMUV3_PERFCTR_L1_DCACHE_REFILL = 0x03,
+ ARMV8_PMUV3_PERFCTR_L1_DCACHE_ACCESS = 0x04,
+ ARMV8_PMUV3_PERFCTR_PC_BRANCH_MIS_PRED = 0x10,
+ ARMV8_PMUV3_PERFCTR_CLOCK_CYCLES = 0x11,
+ ARMV8_PMUV3_PERFCTR_PC_BRANCH_PRED = 0x12,
+
+ /* At least one of the following is required. */
+ ARMV8_PMUV3_PERFCTR_INSTR_EXECUTED = 0x08,
+ ARMV8_PMUV3_PERFCTR_OP_SPEC = 0x1B,
+
+ /* Common architectural events. */
+ ARMV8_PMUV3_PERFCTR_MEM_READ = 0x06,
+ ARMV8_PMUV3_PERFCTR_MEM_WRITE = 0x07,
+ ARMV8_PMUV3_PERFCTR_EXC_TAKEN = 0x09,
+ ARMV8_PMUV3_PERFCTR_EXC_EXECUTED = 0x0A,
+ ARMV8_PMUV3_PERFCTR_CID_WRITE = 0x0B,
+ ARMV8_PMUV3_PERFCTR_PC_WRITE = 0x0C,
+ ARMV8_PMUV3_PERFCTR_PC_IMM_BRANCH = 0x0D,
+ ARMV8_PMUV3_PERFCTR_PC_PROC_RETURN = 0x0E,
+ ARMV8_PMUV3_PERFCTR_MEM_UNALIGNED_ACCESS = 0x0F,
+ ARMV8_PMUV3_PERFCTR_TTBR_WRITE = 0x1C,
+
+ /* Common microarchitectural events. */
+ ARMV8_PMUV3_PERFCTR_L1_ICACHE_REFILL = 0x01,
+ ARMV8_PMUV3_PERFCTR_ITLB_REFILL = 0x02,
+ ARMV8_PMUV3_PERFCTR_DTLB_REFILL = 0x05,
+ ARMV8_PMUV3_PERFCTR_MEM_ACCESS = 0x13,
+ ARMV8_PMUV3_PERFCTR_L1_ICACHE_ACCESS = 0x14,
+ ARMV8_PMUV3_PERFCTR_L1_DCACHE_WB = 0x15,
+ ARMV8_PMUV3_PERFCTR_L2_CACHE_ACCESS = 0x16,
+ ARMV8_PMUV3_PERFCTR_L2_CACHE_REFILL = 0x17,
+ ARMV8_PMUV3_PERFCTR_L2_CACHE_WB = 0x18,
+ ARMV8_PMUV3_PERFCTR_BUS_ACCESS = 0x19,
+ ARMV8_PMUV3_PERFCTR_MEM_ERROR = 0x1A,
+ ARMV8_PMUV3_PERFCTR_BUS_CYCLES = 0x1D,
+
+ /*
+ * This isn't an architected event.
+ * We detect this event number and use the cycle counter instead.
+ */
+ ARMV8_PMUV3_PERFCTR_CPU_CYCLES = 0xFF,
+};
+
+/* PMUv3 HW events mapping. */
+static const unsigned armv8_pmuv3_perf_map[PERF_COUNT_HW_MAX] = {
+ [PERF_COUNT_HW_CPU_CYCLES] = ARMV8_PMUV3_PERFCTR_CPU_CYCLES,
+ [PERF_COUNT_HW_INSTRUCTIONS] = ARMV8_PMUV3_PERFCTR_INSTR_EXECUTED,
+ [PERF_COUNT_HW_CACHE_REFERENCES] = ARMV8_PMUV3_PERFCTR_L1_DCACHE_ACCESS,
+ [PERF_COUNT_HW_CACHE_MISSES] = ARMV8_PMUV3_PERFCTR_L1_DCACHE_REFILL,
+ [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = HW_OP_UNSUPPORTED,
+ [PERF_COUNT_HW_BRANCH_MISSES] = ARMV8_PMUV3_PERFCTR_PC_BRANCH_MIS_PRED,
+ [PERF_COUNT_HW_BUS_CYCLES] = HW_OP_UNSUPPORTED,
+ [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = HW_OP_UNSUPPORTED,
+ [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = HW_OP_UNSUPPORTED,
+};
+
+static const unsigned armv8_pmuv3_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
+ [PERF_COUNT_HW_CACHE_OP_MAX]
+ [PERF_COUNT_HW_CACHE_RESULT_MAX] = {
+ [C(L1D)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1_DCACHE_ACCESS,
+ [C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1_DCACHE_REFILL,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1_DCACHE_ACCESS,
+ [C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1_DCACHE_REFILL,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ },
+ [C(L1I)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ },
+ [C(LL)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ },
+ [C(DTLB)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ },
+ [C(ITLB)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ },
+ [C(BPU)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_PC_BRANCH_PRED,
+ [C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_PC_BRANCH_MIS_PRED,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_PC_BRANCH_PRED,
+ [C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_PC_BRANCH_MIS_PRED,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ },
+ [C(NODE)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = CACHE_OP_UNSUPPORTED,
+ [C(RESULT_MISS)] = CACHE_OP_UNSUPPORTED,
+ },
+ },
+};
+
+/*
+ * Perf Events' indices
+ */
+#define ARMV8_IDX_CYCLE_COUNTER 0
+#define ARMV8_IDX_COUNTER0 1
+#define ARMV8_IDX_COUNTER_LAST (ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
+
+#define ARMV8_MAX_COUNTERS 32
+#define ARMV8_COUNTER_MASK (ARMV8_MAX_COUNTERS - 1)
+
+/*
+ * ARMv8 low level PMU access
+ */
+
+/*
+ * Perf Event to low level counters mapping
+ */
+#define ARMV8_IDX_TO_COUNTER(x) \
+ (((x) - ARMV8_IDX_COUNTER0) & ARMV8_COUNTER_MASK)
+
+/*
+ * Per-CPU PMCR: config reg
+ */
+#define ARMV8_PMCR_E (1 << 0) /* Enable all counters */
+#define ARMV8_PMCR_P (1 << 1) /* Reset all counters */
+#define ARMV8_PMCR_C (1 << 2) /* Cycle counter reset */
+#define ARMV8_PMCR_D (1 << 3) /* CCNT counts every 64th cpu cycle */
+#define ARMV8_PMCR_X (1 << 4) /* Export to ETM */
+#define ARMV8_PMCR_DP (1 << 5) /* Disable CCNT if non-invasive debug*/
+#define ARMV8_PMCR_N_SHIFT 11 /* Number of counters supported */
+#define ARMV8_PMCR_N_MASK 0x1f
+#define ARMV8_PMCR_MASK 0x3f /* Mask for writable bits */
+
+/*
+ * PMOVSR: counters overflow flag status reg
+ */
+#define ARMV8_OVSR_MASK 0xffffffff /* Mask for writable bits */
+#define ARMV8_OVERFLOWED_MASK ARMV8_OVSR_MASK
+
+/*
+ * PMXEVTYPER: Event selection reg
+ */
+#define ARMV8_EVTYPE_MASK 0xc00000ff /* Mask for writable bits */
+#define ARMV8_EVTYPE_EVENT 0xff /* Mask for EVENT bits */
+
+/*
+ * Event filters for PMUv3
+ */
+#define ARMV8_EXCLUDE_EL1 (1 << 31)
+#define ARMV8_EXCLUDE_EL0 (1 << 30)
+#define ARMV8_INCLUDE_EL2 (1 << 27)
+
+static inline u32 armv8pmu_pmcr_read(void)
+{
+ u32 val;
+ asm volatile("mrs %0, pmcr_el0" : "=r" (val));
+ return val;
+}
+
+static inline void armv8pmu_pmcr_write(u32 val)
+{
+ val &= ARMV8_PMCR_MASK;
+ isb();
+ asm volatile("msr pmcr_el0, %0" :: "r" (val));
+}
+
+static inline int armv8pmu_has_overflowed(u32 pmovsr)
+{
+ return pmovsr & ARMV8_OVERFLOWED_MASK;
+}
+
+static inline int armv8pmu_counter_valid(int idx)
+{
+ return idx >= ARMV8_IDX_CYCLE_COUNTER && idx <= ARMV8_IDX_COUNTER_LAST;
+}
+
+static inline int armv8pmu_counter_has_overflowed(u32 pmnc, int idx)
+{
+ int ret = 0;
+ u32 counter;
+
+ if (!armv8pmu_counter_valid(idx)) {
+ pr_err("CPU%u checking wrong counter %d overflow status\n",
+ smp_processor_id(), idx);
+ } else {
+ counter = ARMV8_IDX_TO_COUNTER(idx);
+ ret = pmnc & BIT(counter);
+ }
+
+ return ret;
+}
+
+static inline int armv8pmu_select_counter(int idx)
+{
+ u32 counter;
+
+ if (!armv8pmu_counter_valid(idx)) {
+ pr_err("CPU%u selecting wrong PMNC counter %d\n",
+ smp_processor_id(), idx);
+ return -EINVAL;
+ }
+
+ counter = ARMV8_IDX_TO_COUNTER(idx);
+ asm volatile("msr pmselr_el0, %0" :: "r" (counter));
+ isb();
+
+ return idx;
+}
+
+static inline u32 armv8pmu_read_counter(int idx)
+{
+ u32 value = 0;
+
+ if (!armv8pmu_counter_valid(idx))
+ pr_err("CPU%u reading wrong counter %d\n",
+ smp_processor_id(), idx);
+ else if (idx == ARMV8_IDX_CYCLE_COUNTER)
+ asm volatile("mrs %0, pmccntr_el0" : "=r" (value));
+ else if (armv8pmu_select_counter(idx) == idx)
+ asm volatile("mrs %0, pmxevcntr_el0" : "=r" (value));
+
+ return value;
+}
+
+static inline void armv8pmu_write_counter(int idx, u32 value)
+{
+ if (!armv8pmu_counter_valid(idx))
+ pr_err("CPU%u writing wrong counter %d\n",
+ smp_processor_id(), idx);
+ else if (idx == ARMV8_IDX_CYCLE_COUNTER)
+ asm volatile("msr pmccntr_el0, %0" :: "r" (value));
+ else if (armv8pmu_select_counter(idx) == idx)
+ asm volatile("msr pmxevcntr_el0, %0" :: "r" (value));
+}
+
+static inline void armv8pmu_write_evtype(int idx, u32 val)
+{
+ if (armv8pmu_select_counter(idx) == idx) {
+ val &= ARMV8_EVTYPE_MASK;
+ asm volatile("msr pmxevtyper_el0, %0" :: "r" (val));
+ }
+}
+
+static inline int armv8pmu_enable_counter(int idx)
+{
+ u32 counter;
+
+ if (!armv8pmu_counter_valid(idx)) {
+ pr_err("CPU%u enabling wrong PMNC counter %d\n",
+ smp_processor_id(), idx);
+ return -EINVAL;
+ }
+
+ counter = ARMV8_IDX_TO_COUNTER(idx);
+ asm volatile("msr pmcntenset_el0, %0" :: "r" (BIT(counter)));
+ return idx;
+}
+
+static inline int armv8pmu_disable_counter(int idx)
+{
+ u32 counter;
+
+ if (!armv8pmu_counter_valid(idx)) {
+ pr_err("CPU%u disabling wrong PMNC counter %d\n",
+ smp_processor_id(), idx);
+ return -EINVAL;
+ }
+
+ counter = ARMV8_IDX_TO_COUNTER(idx);
+ asm volatile("msr pmcntenclr_el0, %0" :: "r" (BIT(counter)));
+ return idx;
+}
+
+static inline int armv8pmu_enable_intens(int idx)
+{
+ u32 counter;
+
+ if (!armv8pmu_counter_valid(idx)) {
+ pr_err("CPU%u enabling wrong PMNC counter IRQ enable %d\n",
+ smp_processor_id(), idx);
+ return -EINVAL;
+ }
+
+ counter = ARMV8_IDX_TO_COUNTER(idx);
+ asm volatile("msr pmintenset_el1, %0" :: "r" (BIT(counter)));
+ return idx;
+}
+
+static inline int armv8pmu_disable_intens(int idx)
+{
+ u32 counter;
+
+ if (!armv8pmu_counter_valid(idx)) {
+ pr_err("CPU%u disabling wrong PMNC counter IRQ enable %d\n",
+ smp_processor_id(), idx);
+ return -EINVAL;
+ }
+
+ counter = ARMV8_IDX_TO_COUNTER(idx);
+ asm volatile("msr pmintenclr_el1, %0" :: "r" (BIT(counter)));
+ isb();
+ /* Clear the overflow flag in case an interrupt is pending. */
+ asm volatile("msr pmovsclr_el0, %0" :: "r" (BIT(counter)));
+ isb();
+ return idx;
+}
+
+static inline u32 armv8pmu_getreset_flags(void)
+{
+ u32 value;
+
+ /* Read */
+ asm volatile("mrs %0, pmovsclr_el0" : "=r" (value));
+
+ /* Write to clear flags */
+ value &= ARMV8_OVSR_MASK;
+ asm volatile("msr pmovsclr_el0, %0" :: "r" (value));
+
+ return value;
+}
+
+static void armv8pmu_enable_event(struct hw_perf_event *hwc, int idx)
+{
+ unsigned long flags;
+ struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+
+ /*
+ * Enable counter and interrupt, and set the counter to count
+ * the event that we're interested in.
+ */
+ raw_spin_lock_irqsave(&events->pmu_lock, flags);
+
+ /*
+ * Disable counter
+ */
+ armv8pmu_disable_counter(idx);
+
+ /*
+ * Set event (if destined for PMNx counters).
+ */
+ armv8pmu_write_evtype(idx, hwc->config_base);
+
+ /*
+ * Enable interrupt for this counter
+ */
+ armv8pmu_enable_intens(idx);
+
+ /*
+ * Enable counter
+ */
+ armv8pmu_enable_counter(idx);
+
+ raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
+static void armv8pmu_disable_event(struct hw_perf_event *hwc, int idx)
+{
+ unsigned long flags;
+ struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+
+ /*
+ * Disable counter and interrupt
+ */
+ raw_spin_lock_irqsave(&events->pmu_lock, flags);
+
+ /*
+ * Disable counter
+ */
+ armv8pmu_disable_counter(idx);
+
+ /*
+ * Disable interrupt for this counter
+ */
+ armv8pmu_disable_intens(idx);
+
+ raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
+static irqreturn_t armv8pmu_handle_irq(int irq_num, void *dev)
+{
+ u32 pmovsr;
+ struct perf_sample_data data;
+ struct pmu_hw_events *cpuc;
+ struct pt_regs *regs;
+ int idx;
+
+ /*
+ * Get and reset the IRQ flags
+ */
+ pmovsr = armv8pmu_getreset_flags();
+
+ /*
+ * Did an overflow occur?
+ */
+ if (!armv8pmu_has_overflowed(pmovsr))
+ return IRQ_NONE;
+
+ /*
+ * Handle the counter(s) overflow(s)
+ */
+ regs = get_irq_regs();
+
+ cpuc = &__get_cpu_var(cpu_hw_events);
+ for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
+ struct perf_event *event = cpuc->events[idx];
+ struct hw_perf_event *hwc;
+
+ /* Ignore if we don't have an event. */
+ if (!event)
+ continue;
+
+ /*
+ * We have a single interrupt for all counters. Check that
+ * each counter has overflowed before we process it.
+ */
+ if (!armv8pmu_counter_has_overflowed(pmovsr, idx))
+ continue;
+
+ hwc = &event->hw;
+ armpmu_event_update(event, hwc, idx);
+ perf_sample_data_init(&data, 0, hwc->last_period);
+ if (!armpmu_event_set_period(event, hwc, idx))
+ continue;
+
+ if (perf_event_overflow(event, &data, regs))
+ cpu_pmu->disable(hwc, idx);
+ }
+
+ /*
+ * Handle the pending perf events.
+ *
+ * Note: this call *must* be run with interrupts disabled. For
+ * platforms that can have the PMU interrupts raised as an NMI, this
+ * will not work.
+ */
+ irq_work_run();
+
+ return IRQ_HANDLED;
+}
+
+static void armv8pmu_start(void)
+{
+ unsigned long flags;
+ struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+
+ raw_spin_lock_irqsave(&events->pmu_lock, flags);
+ /* Enable all counters */
+ armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMCR_E);
+ raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
+static void armv8pmu_stop(void)
+{
+ unsigned long flags;
+ struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+
+ raw_spin_lock_irqsave(&events->pmu_lock, flags);
+ /* Disable all counters */
+ armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMCR_E);
+ raw_spin_unlock_irqrestore(&events->pmu_lock, flags);
+}
+
+static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
+ struct hw_perf_event *event)
+{
+ int idx;
+ unsigned long evtype = event->config_base & ARMV8_EVTYPE_EVENT;
+
+ /* Always place a cycle counter event into the cycle counter. */
+ if (evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
+ if (test_and_set_bit(ARMV8_IDX_CYCLE_COUNTER, cpuc->used_mask))
+ return -EAGAIN;
+
+ return ARMV8_IDX_CYCLE_COUNTER;
+ }
+
+ /*
+ * For anything other than a cycle counter, try to use
+ * the event counters
+ */
+ for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; ++idx) {
+ if (!test_and_set_bit(idx, cpuc->used_mask))
+ return idx;
+ }
+
+ /* The counters are all in use. */
+ return -EAGAIN;
+}
+
+/*
+ * Add an event filter to a given event. This will only work for PMUv3 PMUs.
+ */
+static int armv8pmu_set_event_filter(struct hw_perf_event *event,
+ struct perf_event_attr *attr)
+{
+ unsigned long config_base = 0;
+
+ if (attr->exclude_idle)
+ return -EPERM;
+ if (attr->exclude_user)
+ config_base |= ARMV8_EXCLUDE_EL0;
+ if (attr->exclude_kernel)
+ config_base |= ARMV8_EXCLUDE_EL1;
+ if (!attr->exclude_hv)
+ config_base |= ARMV8_INCLUDE_EL2;
+
+ /*
+ * Install the filter into config_base as this is used to
+ * construct the event type.
+ */
+ event->config_base = config_base;
+
+ return 0;
+}
+
+static void armv8pmu_reset(void *info)
+{
+ u32 idx, nb_cnt = cpu_pmu->num_events;
+
+ /* The counter and interrupt enable registers are unknown at reset. */
+ for (idx = ARMV8_IDX_CYCLE_COUNTER; idx < nb_cnt; ++idx)
+ armv8pmu_disable_event(NULL, idx);
+
+ /* Initialize & Reset PMNC: C and P bits. */
+ armv8pmu_pmcr_write(ARMV8_PMCR_P | ARMV8_PMCR_C);
+
+ /* Disable access from userspace. */
+ asm volatile("msr pmuserenr_el0, %0" :: "r" (0));
+}
+
+static int armv8_pmuv3_map_event(struct perf_event *event)
+{
+ return map_cpu_event(event, &armv8_pmuv3_perf_map,
+ &armv8_pmuv3_perf_cache_map, 0xFF);
+}
+
+static struct arm_pmu armv8pmu = {
+ .handle_irq = armv8pmu_handle_irq,
+ .enable = armv8pmu_enable_event,
+ .disable = armv8pmu_disable_event,
+ .read_counter = armv8pmu_read_counter,
+ .write_counter = armv8pmu_write_counter,
+ .get_event_idx = armv8pmu_get_event_idx,
+ .start = armv8pmu_start,
+ .stop = armv8pmu_stop,
+ .reset = armv8pmu_reset,
+ .max_period = (1LLU << 32) - 1,
+};
+
+static u32 __init armv8pmu_read_num_pmnc_events(void)
+{
+ u32 nb_cnt;
+
+ /* Read the nb of CNTx counters supported from PMNC */
+ nb_cnt = (armv8pmu_pmcr_read() >> ARMV8_PMCR_N_SHIFT) & ARMV8_PMCR_N_MASK;
+
+ /* Add the CPU cycles counter and return */
+ return nb_cnt + 1;
+}
+
+static struct arm_pmu *__init armv8_pmuv3_pmu_init(void)
+{
+ armv8pmu.name = "arm/armv8-pmuv3";
+ armv8pmu.map_event = armv8_pmuv3_map_event;
+ armv8pmu.num_events = armv8pmu_read_num_pmnc_events();
+ armv8pmu.set_event_filter = armv8pmu_set_event_filter;
+ return &armv8pmu;
+}
+
+/*
+ * Ensure the PMU has sane values out of reset.
+ * This requires SMP to be available, so exists as a separate initcall.
+ */
+static int __init
+cpu_pmu_reset(void)
+{
+ if (cpu_pmu && cpu_pmu->reset)
+ return on_each_cpu(cpu_pmu->reset, NULL, 1);
+ return 0;
+}
+arch_initcall(cpu_pmu_reset);
+
+/*
+ * PMU platform driver and devicetree bindings.
+ */
+static struct of_device_id armpmu_of_device_ids[] = {
+ {.compatible = "arm,armv8-pmuv3"},
+ {},
+};
+
+static int __devinit armpmu_device_probe(struct platform_device *pdev)
+{
+ if (!cpu_pmu)
+ return -ENODEV;
+
+ cpu_pmu->plat_device = pdev;
+ return 0;
+}
+
+static struct platform_driver armpmu_driver = {
+ .driver = {
+ .name = "arm-pmu",
+ .of_match_table = armpmu_of_device_ids,
+ },
+ .probe = armpmu_device_probe,
+};
+
+static int __init register_pmu_driver(void)
+{
+ return platform_driver_register(&armpmu_driver);
+}
+device_initcall(register_pmu_driver);
+
+static struct pmu_hw_events *armpmu_get_cpu_events(void)
+{
+ return &__get_cpu_var(cpu_hw_events);
+}
+
+static void __init cpu_pmu_init(struct arm_pmu *armpmu)
+{
+ int cpu;
+ for_each_possible_cpu(cpu) {
+ struct pmu_hw_events *events = &per_cpu(cpu_hw_events, cpu);
+ events->events = per_cpu(hw_events, cpu);
+ events->used_mask = per_cpu(used_mask, cpu);
+ raw_spin_lock_init(&events->pmu_lock);
+ }
+ armpmu->get_hw_events = armpmu_get_cpu_events;
+}
+
+static int __init init_hw_perf_events(void)
+{
+ u64 dfr = read_cpuid(ID_AA64DFR0_EL1);
+
+ switch ((dfr >> 8) & 0xf) {
+ case 0x1: /* PMUv3 */
+ cpu_pmu = armv8_pmuv3_pmu_init();
+ break;
+ }
+
+ if (cpu_pmu) {
+ pr_info("enabled with %s PMU driver, %d counters available\n",
+ cpu_pmu->name, cpu_pmu->num_events);
+ cpu_pmu_init(cpu_pmu);
+ armpmu_register(cpu_pmu, "cpu", PERF_TYPE_RAW);
+ } else {
+ pr_info("no hardware support available\n");
+ }
+
+ return 0;
+}
+early_initcall(init_hw_perf_events);
+
+/*
+ * Callchain handling code.
+ */
+struct frame_tail {
+ struct frame_tail __user *fp;
+ unsigned long lr;
+} __attribute__((packed));
+
+/*
+ * Get the return address for a single stackframe and return a pointer to the
+ * next frame tail.
+ */
+static struct frame_tail __user *
+user_backtrace(struct frame_tail __user *tail,
+ struct perf_callchain_entry *entry)
+{
+ struct frame_tail buftail;
+ unsigned long err;
+
+ /* Also check accessibility of one struct frame_tail beyond */
+ if (!access_ok(VERIFY_READ, tail, sizeof(buftail)))
+ return NULL;
+
+ pagefault_disable();
+ err = __copy_from_user_inatomic(&buftail, tail, sizeof(buftail));
+ pagefault_enable();
+
+ if (err)
+ return NULL;
+
+ perf_callchain_store(entry, buftail.lr);
+
+ /*
+ * Frame pointers should strictly progress back up the stack
+ * (towards higher addresses).
+ */
+ if (tail >= buftail.fp)
+ return NULL;
+
+ return buftail.fp;
+}
+
+void perf_callchain_user(struct perf_callchain_entry *entry,
+ struct pt_regs *regs)
+{
+ struct frame_tail __user *tail;
+
+ tail = (struct frame_tail __user *)regs->regs[29];
+
+ while (entry->nr < PERF_MAX_STACK_DEPTH &&
+ tail && !((unsigned long)tail & 0xf))
+ tail = user_backtrace(tail, entry);
+}
+
+/*
+ * Gets called by walk_stackframe() for every stackframe. This will be called
+ * whilst unwinding the stackframe and is like a subroutine return so we use
+ * the PC.
+ */
+static int callchain_trace(struct stackframe *frame, void *data)
+{
+ struct perf_callchain_entry *entry = data;
+ perf_callchain_store(entry, frame->pc);
+ return 0;
+}
+
+void perf_callchain_kernel(struct perf_callchain_entry *entry,
+ struct pt_regs *regs)
+{
+ struct stackframe frame;
+
+ frame.fp = regs->regs[29];
+ frame.sp = regs->sp;
+ frame.pc = regs->pc;
+ walk_stackframe(&frame, callchain_trace, entry);
+}
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index f960ccb..8c36763 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -88,6 +88,12 @@ void get_term_dimensions(struct winsize *ws);
#define CPUINFO_PROC "Processor"
#endif

+#ifdef __aarch64__
+#include "../../arch/arm64/include/asm/unistd.h"
+#define rmb() asm volatile("dmb ld" ::: "memory")
+#define cpu_relax() asm volatile("yield" ::: "memory")
+#endif
+
#ifdef __mips__
#include "../../arch/mips/include/asm/unistd.h"
#define rmb() asm volatile( \

2012-08-14 17:54:50

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 04/31] arm64: MMU definitions

The virtual memory layout is described in
Documentation/arm64/memory.txt. This patch adds the MMU definitions for
the 4KB and 64KB translation table configurations. The SECTION_SIZE is
2MB with the 4KB page configuration and 512MB with the 64KB one.

PHYS_OFFSET is calculated at run-time and stored in a variable (no
run-time code patching at this stage).

In the current implementation, both the user and kernel address spaces
are 512GB (39-bit) each, with a maximum of 256GB for the RAM linear
mapping. Linux uses 3 levels of translation tables with the 4KB page
configuration and 2 levels with the 64KB configuration. Extending the
address space beyond 39 bits with 4KB pages, or 42 bits with 64KB pages,
requires an additional level of translation tables.

The SPARSEMEM configuration is global to all AArch64 platforms and
allows for 1GB sections with SPARSEMEM_VMEMMAP enabled by default.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
Documentation/arm64/memory.txt | 69 +++++
arch/arm64/include/asm/memory.h | 144 +++++++++++
arch/arm64/include/asm/mmu.h | 27 ++
arch/arm64/include/asm/pgtable-2level-hwdef.h | 43 ++++
arch/arm64/include/asm/pgtable-2level-types.h | 60 +++++
arch/arm64/include/asm/pgtable-3level-hwdef.h | 50 ++++
arch/arm64/include/asm/pgtable-3level-types.h | 66 +++++
arch/arm64/include/asm/pgtable-hwdef.h | 94 +++++++
arch/arm64/include/asm/pgtable.h | 328 +++++++++++++++++++++++++
arch/arm64/include/asm/sparsemem.h | 24 ++
10 files changed, 905 insertions(+), 0 deletions(-)
create mode 100644 Documentation/arm64/memory.txt
create mode 100644 arch/arm64/include/asm/memory.h
create mode 100644 arch/arm64/include/asm/mmu.h
create mode 100644 arch/arm64/include/asm/pgtable-2level-hwdef.h
create mode 100644 arch/arm64/include/asm/pgtable-2level-types.h
create mode 100644 arch/arm64/include/asm/pgtable-3level-hwdef.h
create mode 100644 arch/arm64/include/asm/pgtable-3level-types.h
create mode 100644 arch/arm64/include/asm/pgtable-hwdef.h
create mode 100644 arch/arm64/include/asm/pgtable.h
create mode 100644 arch/arm64/include/asm/sparsemem.h

diff --git a/Documentation/arm64/memory.txt b/Documentation/arm64/memory.txt
new file mode 100644
index 0000000..7210af7
--- /dev/null
+++ b/Documentation/arm64/memory.txt
@@ -0,0 +1,69 @@
+ Memory Layout on AArch64 Linux
+ ==============================
+
+Author: Catalin Marinas <[email protected]>
+Date : 20 February 2012
+
+This document describes the virtual memory layout used by the AArch64
+Linux kernel. The architecture allows up to 4 levels of translation
+tables with a 4KB page size and up to 3 levels with a 64KB page size.
+
+AArch64 Linux uses 3 levels of translation tables with the 4KB page
+configuration, allowing 39-bit (512GB) virtual addresses for both user
+and kernel. With 64KB pages, only 2 levels of translation tables are
+used but the memory layout is the same.
+
+User addresses have bits 63:39 set to 0 while the kernel addresses have
+the same bits set to 1. TTBRx selection is given by bit 63 of the
+virtual address. The swapper_pg_dir contains only kernel (global)
+mappings while the user pgd contains only user (non-global) mappings.
+The swapper_pg_dir address is written to TTBR1 and never written to
+TTBR0.
+
+
+AArch64 Linux memory layout:
+
+Start               End                 Size        Use
+-----------------------------------------------------------------------
+0000000000000000    0000007fffffffff    512GB       user
+
+ffffff8000000000    ffffffbbfffeffff    ~240GB      vmalloc
+
+ffffffbbffff0000    ffffffbbffffffff    64KB        [guard page]
+
+ffffffbc00000000    ffffffbdffffffff    8GB         vmemmap
+
+ffffffbe00000000    ffffffbffbffffff    ~8GB        [guard, future vmemmap]
+
+ffffffbffc000000    ffffffbfffffffff    64MB        modules
+
+ffffffc000000000    ffffffffffffffff    256GB       memory
+
+
+Translation table lookup with 4KB pages:
+
++--------+--------+--------+--------+--------+--------+--------+--------+
+|63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
++--------+--------+--------+--------+--------+--------+--------+--------+
+ |                 |         |         |         |         |
+ |                 |         |         |         |         v
+ |                 |         |         |         |   [11:0]  in-page offset
+ |                 |         |         |         +-> [20:12] L3 index
+ |                 |         |         +-----------> [29:21] L2 index
+ |                 |         +---------------------> [38:30] L1 index
+ |                 +-------------------------------> [47:39] L0 index (not used)
+ +-------------------------------------------------> [63] TTBR0/1
+
+
+Translation table lookup with 64KB pages:
+
++--------+--------+--------+--------+--------+--------+--------+--------+
+|63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
++--------+--------+--------+--------+--------+--------+--------+--------+
+ |                 |    |               |              |
+ |                 |    |               |              v
+ |                 |    |               |            [15:0]  in-page offset
+ |                 |    |               +----------> [28:16] L3 index
+ |                 |    +--------------------------> [41:29] L2 index (only 38:29 used)
+ |                 +-------------------------------> [47:42] L1 index (not used)
+ +-------------------------------------------------> [63] TTBR0/1
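
The 4KB-page decomposition in the first diagram can be written directly
as shifts and masks; a quick illustrative computation (the example
address is made up):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t va = 0xffffffc000123456ULL;        /* example address */

        unsigned int l1  = (va >> 30) & 0x1ff;      /* bits [38:30], 512 entries */
        unsigned int l2  = (va >> 21) & 0x1ff;      /* bits [29:21] */
        unsigned int l3  = (va >> 12) & 0x1ff;      /* bits [20:12] */
        unsigned int off = va & 0xfff;              /* bits [11:0]  */

        /* -> L1 256, L2 0, L3 291, offset 0x456 */
        printf("L1 %u, L2 %u, L3 %u, offset 0x%x\n", l1, l2, l3, off);
        return 0;
    }
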
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
new file mode 100644
index 0000000..3cfdc4b
--- /dev/null
+++ b/arch/arm64/include/asm/memory.h
@@ -0,0 +1,144 @@
+/*
+ * Based on arch/arm/include/asm/memory.h
+ *
+ * Copyright (C) 2000-2002 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Note: this file should not be included by non-asm/.h files
+ */
+#ifndef __ASM_MEMORY_H
+#define __ASM_MEMORY_H
+
+#include <linux/compiler.h>
+#include <linux/const.h>
+#include <linux/types.h>
+#include <asm/sizes.h>
+
+/*
+ * Allow for constants defined here to be used from assembly code
+ * by appending the UL suffix only with actual C code compilation.
+ */
+#define UL(x) _AC(x, UL)
+
+/*
+ * PAGE_OFFSET - the virtual address of the start of the kernel image.
+ * VA_BITS - the maximum number of bits for virtual addresses.
+ * TASK_SIZE - the maximum size of a user space task.
+ * TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area.
+ * The module space lives between the addresses given by TASK_SIZE
+ * and PAGE_OFFSET - it must be within 128MB of the kernel text.
+ */
+#define PAGE_OFFSET UL(0xffffffc000000000)
+#define MODULES_END (PAGE_OFFSET)
+#define MODULES_VADDR (MODULES_END - SZ_64M)
+#define VA_BITS (39)
+#define TASK_SIZE_64 (UL(1) << VA_BITS)
+
+#ifdef CONFIG_AARCH32_EMULATION
+#define TASK_SIZE_32 UL(0x100000000)
+#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
+ TASK_SIZE_32 : TASK_SIZE_64)
+#else
+#define TASK_SIZE TASK_SIZE_64
+#endif /* CONFIG_AARCH32_EMULATION */
+
+#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 4))
+
+#if TASK_SIZE_64 > MODULES_VADDR
+#error Top of 64-bit user space clashes with start of module space
+#endif
+
+/*
+ * Physical vs virtual RAM address space conversion. These are
+ * private definitions which should NOT be used outside memory.h
+ * files. Use virt_to_phys/phys_to_virt/__pa/__va instead.
+ */
+#define __virt_to_phys(x) (((phys_addr_t)(x) - PAGE_OFFSET + PHYS_OFFSET))
+#define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
+
+/*
+ * Convert a physical address to a Page Frame Number and back
+ */
+#define __phys_to_pfn(paddr) ((unsigned long)((paddr) >> PAGE_SHIFT))
+#define __pfn_to_phys(pfn) ((phys_addr_t)(pfn) << PAGE_SHIFT)
+
+/*
+ * Convert a page to/from a physical address
+ */
+#define page_to_phys(page) (__pfn_to_phys(page_to_pfn(page)))
+#define phys_to_page(phys) (pfn_to_page(__phys_to_pfn(phys)))
+
+/*
+ * Memory types available.
+ */
+#define MT_DEVICE_nGnRnE 0
+#define MT_DEVICE_nGnRE 1
+#define MT_DEVICE_GRE 2
+#define MT_NORMAL_NC 3
+#define MT_NORMAL 4
+
+#ifndef __ASSEMBLY__
+
+extern phys_addr_t memstart_addr;
+/* PHYS_OFFSET - the physical address of the start of memory. */
+#define PHYS_OFFSET ({ memstart_addr; })
+
+/*
+ * PFNs are used to describe any physical page; this means
+ * PFN 0 == physical address 0.
+ *
+ * This is the PFN of the first RAM page in the kernel
+ * direct-mapped view. We assume this is the first page
+ * of RAM in the mem_map as well.
+ */
+#define PHYS_PFN_OFFSET (PHYS_OFFSET >> PAGE_SHIFT)
+
+/*
+ * Note: Drivers should NOT use these. They are the wrong
+ * translation for translating DMA addresses. Use the driver
+ * DMA support - see dma-mapping.h.
+ */
+static inline phys_addr_t virt_to_phys(const volatile void *x)
+{
+ return __virt_to_phys((unsigned long)(x));
+}
+
+static inline void *phys_to_virt(phys_addr_t x)
+{
+ return (void *)(__phys_to_virt(x));
+}
+
+/*
+ * Drivers should NOT use these either.
+ */
+#define __pa(x) __virt_to_phys((unsigned long)(x))
+#define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x)))
+#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)
+
+/*
+ * virt_to_page(k) convert a _valid_ virtual address to struct page *
+ * virt_addr_valid(k) indicates whether a virtual address is valid
+ */
+#define ARCH_PFN_OFFSET PHYS_PFN_OFFSET
+
+#define virt_to_page(kaddr) pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
+#define virt_addr_valid(kaddr) (((void *)(kaddr) >= (void *)PAGE_OFFSET) && \
+ ((void *)(kaddr) < (void *)high_memory))
+
+#endif
+
+#include <asm-generic/memory_model.h>
+
+#endif
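
The __virt_to_phys()/__phys_to_virt() pair above is a plain linear offset
between PAGE_OFFSET and the PHYS_OFFSET discovered at boot. A tiny
illustration, assuming a platform whose RAM starts at 0x80000000 (the
value is hypothetical):

    #include <stdio.h>
    #include <stdint.h>

    #define EX_PAGE_OFFSET  0xffffffc000000000ULL
    #define EX_PHYS_OFFSET  0x0000000080000000ULL   /* assumed, set at boot */

    int main(void)
    {
        uint64_t va = EX_PAGE_OFFSET + 0x123456;
        uint64_t pa = va - EX_PAGE_OFFSET + EX_PHYS_OFFSET; /* __virt_to_phys() */

        /* -> va 0xffffffc000123456 maps to pa 0x80123456 */
        printf("va 0x%llx -> pa 0x%llx\n",
               (unsigned long long)va, (unsigned long long)pa);
        return 0;
    }
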
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
new file mode 100644
index 0000000..981498a
--- /dev/null
+++ b/arch/arm64/include/asm/mmu.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_MMU_H
+#define __ASM_MMU_H
+
+typedef struct {
+ unsigned int id;
+ spinlock_t id_lock;
+ void *vdso;
+} mm_context_t;
+
+#define ASID(mm) ((mm)->context.id & 0xffff)
+
+#endif
diff --git a/arch/arm64/include/asm/pgtable-2level-hwdef.h b/arch/arm64/include/asm/pgtable-2level-hwdef.h
new file mode 100644
index 0000000..0a8ed3f
--- /dev/null
+++ b/arch/arm64/include/asm/pgtable-2level-hwdef.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PGTABLE_2LEVEL_HWDEF_H
+#define __ASM_PGTABLE_2LEVEL_HWDEF_H
+
+/*
+ * With LPAE and 64KB pages, there are 2 levels of page tables. Each level has
+ * 8192 entries of 8 bytes each, occupying a 64KB page. Levels 0 and 1 are not
+ * used. The 2nd level table (PGD for Linux) can cover a range of 4TB, each
+ * entry representing 512MB. The user and kernel address spaces are limited to
+ * 512GB and therefore we only use 1024 entries in the PGD.
+ */
+#define PTRS_PER_PTE 8192
+#define PTRS_PER_PGD 1024
+
+/*
+ * PGDIR_SHIFT determines the size a top-level page table entry can map.
+ */
+#define PGDIR_SHIFT 29
+#define PGDIR_SIZE (_AC(1, UL) << PGDIR_SHIFT)
+#define PGDIR_MASK (~(PGDIR_SIZE-1))
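+
+/*
+ * Illustrative arithmetic: 8192 entries * 8 bytes = 64KB per table; each PGD
+ * entry therefore maps 8192 * 64KB = 512MB (hence PGDIR_SHIFT == 29), and the
+ * 1024 PGD entries in use map 1024 * 512MB = 512GB of address space.
+ */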
+
+/*
+ * section address mask and size definitions.
+ */
+#define SECTION_SHIFT 29
+#define SECTION_SIZE (_AC(1, UL) << SECTION_SHIFT)
+#define SECTION_MASK (~(SECTION_SIZE-1))
+
+#endif
diff --git a/arch/arm64/include/asm/pgtable-2level-types.h b/arch/arm64/include/asm/pgtable-2level-types.h
new file mode 100644
index 0000000..3c3ca7d
--- /dev/null
+++ b/arch/arm64/include/asm/pgtable-2level-types.h
@@ -0,0 +1,60 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PGTABLE_2LEVEL_TYPES_H
+#define __ASM_PGTABLE_2LEVEL_TYPES_H
+
+typedef u64 pteval_t;
+typedef u64 pgdval_t;
+typedef pgdval_t pmdval_t;
+
+#undef STRICT_MM_TYPECHECKS
+
+#ifdef STRICT_MM_TYPECHECKS
+
+/*
+ * These are used to take advantage of C type-checking.
+ */
+typedef struct { pteval_t pte; } pte_t;
+typedef struct { pgdval_t pgd; } pgd_t;
+typedef struct { pteval_t pgprot; } pgprot_t;
+
+#define pte_val(x) ((x).pte)
+#define pgd_val(x) ((x).pgd)
+#define pgprot_val(x) ((x).pgprot)
+
+#define __pte(x) ((pte_t) { (x) } )
+#define __pgd(x) ((pgd_t) { (x) } )
+#define __pgprot(x) ((pgprot_t) { (x) } )
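+
+/*
+ * With the struct wrappers above, mixing up a raw pteval_t and a pte_t is a
+ * compile-time error; values must be constructed explicitly, e.g. __pte(0)
+ * rather than a bare 0.
+ */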
+
+#else /* !STRICT_MM_TYPECHECKS */
+
+typedef pteval_t pte_t;
+typedef pgdval_t pgd_t;
+typedef pteval_t pgprot_t;
+
+#define pte_val(x) (x)
+#define pgd_val(x) (x)
+#define pgprot_val(x) (x)
+
+#define __pte(x) (x)
+#define __pgd(x) (x)
+#define __pgprot(x) (x)
+
+#endif /* STRICT_MM_TYPECHECKS */
+
+#include <asm-generic/pgtable-nopmd.h>
+
+#endif /* __ASM_PGTABLE_2LEVEL_TYPES_H */
diff --git a/arch/arm64/include/asm/pgtable-3level-hwdef.h b/arch/arm64/include/asm/pgtable-3level-hwdef.h
new file mode 100644
index 0000000..3dbf941
--- /dev/null
+++ b/arch/arm64/include/asm/pgtable-3level-hwdef.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PGTABLE_3LEVEL_HWDEF_H
+#define __ASM_PGTABLE_3LEVEL_HWDEF_H
+
+/*
+ * With LPAE and 4KB pages, there are 3 levels of page tables. Each level has
+ * 512 entries of 8 bytes each, occupying a 4K page. The first level table
+ * covers a range of 512GB, each entry representing 1GB. The user and kernel
+ * address spaces are limited to 512GB each.
+ */
+#define PTRS_PER_PTE 512
+#define PTRS_PER_PMD 512
+#define PTRS_PER_PGD 512
+
+/*
+ * PGDIR_SHIFT determines the size a top-level page table entry can map.
+ */
+#define PGDIR_SHIFT 30
+#define PGDIR_SIZE (_AC(1, UL) << PGDIR_SHIFT)
+#define PGDIR_MASK (~(PGDIR_SIZE-1))
+
+/*
+ * PMD_SHIFT determines the size a middle-level page table entry can map.
+ */
+#define PMD_SHIFT 21
+#define PMD_SIZE (_AC(1, UL) << PMD_SHIFT)
+#define PMD_MASK (~(PMD_SIZE-1))
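+
+/*
+ * Illustrative arithmetic: a PTE table maps 512 * 4KB = 2MB (PMD_SHIFT == 21),
+ * a PMD table maps 512 * 2MB = 1GB (PGDIR_SHIFT == 30) and the 512-entry PGD
+ * therefore covers 512 * 1GB = 512GB.
+ */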
+
+/*
+ * section address mask and size definitions.
+ */
+#define SECTION_SHIFT 21
+#define SECTION_SIZE (_AC(1, UL) << SECTION_SHIFT)
+#define SECTION_MASK (~(SECTION_SIZE-1))
+
+#endif
diff --git a/arch/arm64/include/asm/pgtable-3level-types.h b/arch/arm64/include/asm/pgtable-3level-types.h
new file mode 100644
index 0000000..4489615
--- /dev/null
+++ b/arch/arm64/include/asm/pgtable-3level-types.h
@@ -0,0 +1,66 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PGTABLE_3LEVEL_TYPES_H
+#define __ASM_PGTABLE_3LEVEL_TYPES_H
+
+typedef u64 pteval_t;
+typedef u64 pmdval_t;
+typedef u64 pgdval_t;
+
+#undef STRICT_MM_TYPECHECKS
+
+#ifdef STRICT_MM_TYPECHECKS
+
+/*
+ * These are used to take advantage of C type-checking.
+ */
+typedef struct { pteval_t pte; } pte_t;
+typedef struct { pmdval_t pmd; } pmd_t;
+typedef struct { pgdval_t pgd; } pgd_t;
+typedef struct { pteval_t pgprot; } pgprot_t;
+
+#define pte_val(x) ((x).pte)
+#define pmd_val(x) ((x).pmd)
+#define pgd_val(x) ((x).pgd)
+#define pgprot_val(x) ((x).pgprot)
+
+#define __pte(x) ((pte_t) { (x) } )
+#define __pmd(x) ((pmd_t) { (x) } )
+#define __pgd(x) ((pgd_t) { (x) } )
+#define __pgprot(x) ((pgprot_t) { (x) } )
+
+#else /* !STRICT_MM_TYPECHECKS */
+
+typedef pteval_t pte_t;
+typedef pmdval_t pmd_t;
+typedef pgdval_t pgd_t;
+typedef pteval_t pgprot_t;
+
+#define pte_val(x) (x)
+#define pmd_val(x) (x)
+#define pgd_val(x) (x)
+#define pgprot_val(x) (x)
+
+#define __pte(x) (x)
+#define __pmd(x) (x)
+#define __pgd(x) (x)
+#define __pgprot(x) (x)
+
+#endif /* STRICT_MM_TYPECHECKS */
+
+#include <asm-generic/pgtable-nopud.h>
+
+#endif /* __ASM_PGTABLE_3LEVEL_TYPES_H */
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
new file mode 100644
index 0000000..561fb08
--- /dev/null
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -0,0 +1,94 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PGTABLE_HWDEF_H
+#define __ASM_PGTABLE_HWDEF_H
+
+#ifdef CONFIG_ARM64_64K_PAGES
+#include <asm/pgtable-2level-hwdef.h>
+#else
+#include <asm/pgtable-3level-hwdef.h>
+#endif
+
+/*
+ * Hardware page table definitions.
+ *
+ * Level 2 descriptor (PMD).
+ */
+#define PMD_TYPE_MASK (_AT(pmdval_t, 3) << 0)
+#define PMD_TYPE_FAULT (_AT(pmdval_t, 0) << 0)
+#define PMD_TYPE_TABLE (_AT(pmdval_t, 3) << 0)
+#define PMD_TYPE_SECT (_AT(pmdval_t, 1) << 0)
+
+/*
+ * Section
+ */
+#define PMD_SECT_S (_AT(pmdval_t, 3) << 8)
+#define PMD_SECT_AF (_AT(pmdval_t, 1) << 10)
+#define PMD_SECT_NG (_AT(pmdval_t, 1) << 11)
+#define PMD_SECT_XN (_AT(pmdval_t, 1) << 54)
+
+/*
+ * AttrIndx[2:0] encoding (mapping attributes defined in the MAIR* registers).
+ */
+#define PMD_ATTRINDX(t) (_AT(pmdval_t, (t)) << 2)
+#define PMD_ATTRINDX_MASK (_AT(pmdval_t, 7) << 2)
+
+/*
+ * Level 3 descriptor (PTE).
+ */
+#define PTE_TYPE_MASK (_AT(pteval_t, 3) << 0)
+#define PTE_TYPE_FAULT (_AT(pteval_t, 0) << 0)
+#define PTE_TYPE_PAGE (_AT(pteval_t, 3) << 0)
+#define PTE_USER (_AT(pteval_t, 1) << 6) /* AP[1] */
+#define PTE_RDONLY (_AT(pteval_t, 1) << 7) /* AP[2] */
+#define PTE_SHARED (_AT(pteval_t, 3) << 8) /* SH[1:0], inner shareable */
+#define PTE_AF (_AT(pteval_t, 1) << 10) /* Access Flag */
+#define PTE_NG (_AT(pteval_t, 1) << 11) /* nG */
+#define PTE_XN (_AT(pteval_t, 1) << 54) /* XN */
+
+/*
+ * AttrIndx[2:0] encoding (mapping attributes defined in the MAIR* registers).
+ */
+#define PTE_ATTRINDX(t) (_AT(pteval_t, (t)) << 2)
+#define PTE_ATTRINDX_MASK (_AT(pteval_t, 7) << 2)
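+
+/*
+ * For example, PTE_ATTRINDX(MT_NORMAL) places the MT_NORMAL index from
+ * asm/memory.h into bits [4:2] of the descriptor, selecting the corresponding
+ * attribute field programmed into MAIR_EL1.
+ */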
+
+/*
+ * 40-bit physical address supported.
+ */
+#define PHYS_MASK_SHIFT (40)
+#define PHYS_MASK ((1UL << PHYS_MASK_SHIFT) - 1)
+
+/*
+ * TCR flags.
+ */
+#define TCR_TxSZ(x) (((64 - (x)) << 16) | ((64 - (x)) << 0))
+#define TCR_IRGN_NC ((0 << 8) | (0 << 24))
+#define TCR_IRGN_WBWA ((1 << 8) | (1 << 24))
+#define TCR_IRGN_WT ((2 << 8) | (2 << 24))
+#define TCR_IRGN_WBnWA ((3 << 8) | (3 << 24))
+#define TCR_IRGN_MASK ((3 << 8) | (3 << 24))
+#define TCR_ORGN_NC ((0 << 10) | (0 << 26))
+#define TCR_ORGN_WBWA ((1 << 10) | (1 << 26))
+#define TCR_ORGN_WT ((2 << 10) | (2 << 26))
+#define TCR_ORGN_WBnWA ((3 << 10) | (3 << 26))
+#define TCR_ORGN_MASK ((3 << 10) | (3 << 26))
+#define TCR_SHARED ((3 << 12) | (3 << 28))
+#define TCR_TG0_64K (1 << 14)
+#define TCR_TG1_64K (1 << 30)
+#define TCR_IPS_40BIT (2 << 32)
+#define TCR_ASID16 (1 << 36)
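+
+/*
+ * Illustrative example: TCR_TxSZ(39) sets T0SZ = T1SZ = 64 - 39 = 25, i.e.
+ * 39-bit (512GB) virtual address ranges for both TTBR0 and TTBR1.
+ */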
+
+#endif
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
new file mode 100644
index 0000000..6981da0
--- /dev/null
+++ b/arch/arm64/include/asm/pgtable.h
@@ -0,0 +1,328 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PGTABLE_H
+#define __ASM_PGTABLE_H
+
+#include <asm/proc-fns.h>
+
+#include <asm/memory.h>
+#include <asm/pgtable-hwdef.h>
+
+/*
+ * Software defined PTE bits definition.
+ */
+#define PTE_VALID (_AT(pteval_t, 1) << 0) /* pte_present() check */
+#define PTE_FILE (_AT(pteval_t, 1) << 2) /* only when !pte_present() */
+#define PTE_DIRTY (_AT(pteval_t, 1) << 55)
+#define PTE_SPECIAL (_AT(pteval_t, 1) << 56)
+
+/*
+ * VMALLOC and SPARSEMEM_VMEMMAP ranges.
+ */
+#define VMALLOC_START UL(0xffffff8000000000)
+#define VMALLOC_END (PAGE_OFFSET - UL(0x400000000) - SZ_64K)
+
+#define vmemmap ((struct page *)(VMALLOC_END + SZ_64K))
+
+#define FIRST_USER_ADDRESS 0
+
+#ifndef __ASSEMBLY__
+extern void __pte_error(const char *file, int line, unsigned long val);
+extern void __pmd_error(const char *file, int line, unsigned long val);
+extern void __pgd_error(const char *file, int line, unsigned long val);
+
+#define pte_ERROR(pte) __pte_error(__FILE__, __LINE__, pte_val(pte))
+#ifndef CONFIG_ARM64_64K_PAGES
+#define pmd_ERROR(pmd) __pmd_error(__FILE__, __LINE__, pmd_val(pmd))
+#endif
+#define pgd_ERROR(pgd) __pgd_error(__FILE__, __LINE__, pgd_val(pgd))
+
+/*
+ * The pgprot_* and protection_map entries will be fixed up at runtime to
+ * include the cacheable and bufferable bits based on memory policy, as well as
+ * any architecture dependent bits like global/ASID and SMP shared mapping
+ * bits.
+ */
+#define _PAGE_DEFAULT PTE_TYPE_PAGE | PTE_AF
+
+extern pgprot_t pgprot_default;
+
+#define _MOD_PROT(p, b) __pgprot(pgprot_val(p) | (b))
+
+#define PAGE_NONE _MOD_PROT(pgprot_default, PTE_NG | PTE_XN | PTE_RDONLY)
+#define PAGE_SHARED _MOD_PROT(pgprot_default, PTE_USER | PTE_NG | PTE_XN)
+#define PAGE_SHARED_EXEC _MOD_PROT(pgprot_default, PTE_USER | PTE_NG)
+#define PAGE_COPY _MOD_PROT(pgprot_default, PTE_USER | PTE_NG | PTE_XN | PTE_RDONLY)
+#define PAGE_COPY_EXEC _MOD_PROT(pgprot_default, PTE_USER | PTE_NG | PTE_RDONLY)
+#define PAGE_READONLY _MOD_PROT(pgprot_default, PTE_USER | PTE_NG | PTE_XN | PTE_RDONLY)
+#define PAGE_READONLY_EXEC _MOD_PROT(pgprot_default, PTE_USER | PTE_NG | PTE_RDONLY)
+#define PAGE_KERNEL _MOD_PROT(pgprot_default, PTE_XN | PTE_DIRTY)
+#define PAGE_KERNEL_EXEC _MOD_PROT(pgprot_default, PTE_DIRTY)
+
+#define __PAGE_NONE __pgprot(_PAGE_DEFAULT | PTE_NG | PTE_XN | PTE_RDONLY)
+#define __PAGE_SHARED __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_XN)
+#define __PAGE_SHARED_EXEC __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG)
+#define __PAGE_COPY __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_XN | PTE_RDONLY)
+#define __PAGE_COPY_EXEC __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_RDONLY)
+#define __PAGE_READONLY __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_XN | PTE_RDONLY)
+#define __PAGE_READONLY_EXEC __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_RDONLY)
+
+#endif /* __ASSEMBLY__ */
+
+#define __P000 __PAGE_NONE
+#define __P001 __PAGE_READONLY
+#define __P010 __PAGE_COPY
+#define __P011 __PAGE_COPY
+#define __P100 __PAGE_READONLY_EXEC
+#define __P101 __PAGE_READONLY_EXEC
+#define __P110 __PAGE_COPY_EXEC
+#define __P111 __PAGE_COPY_EXEC
+
+#define __S000 __PAGE_NONE
+#define __S001 __PAGE_READONLY
+#define __S010 __PAGE_SHARED
+#define __S011 __PAGE_SHARED
+#define __S100 __PAGE_READONLY_EXEC
+#define __S101 __PAGE_READONLY_EXEC
+#define __S110 __PAGE_SHARED_EXEC
+#define __S111 __PAGE_SHARED_EXEC
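+
+/*
+ * For example, a private writable mapping (__P010/__P011) resolves to
+ * __PAGE_COPY: the PTE starts out read-only and non-executable, with write
+ * access only granted by copy-on-write at fault time.
+ */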
+
+#ifndef __ASSEMBLY__
+/*
+ * ZERO_PAGE is a global shared page that is always zero: used
+ * for zero-mapped memory areas etc.
+ */
+extern struct page *empty_zero_page;
+#define ZERO_PAGE(vaddr) (empty_zero_page)
+
+#define pte_pfn(pte) ((pte_val(pte) & PHYS_MASK) >> PAGE_SHIFT)
+
+#define pfn_pte(pfn,prot) (__pte(((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)))
+
+#define pte_none(pte) (!pte_val(pte))
+#define pte_clear(mm,addr,ptep) set_pte(ptep, __pte(0))
+#define pte_page(pte) (pfn_to_page(pte_pfn(pte)))
+#define pte_offset_kernel(dir,addr) (pmd_page_vaddr(*(dir)) + __pte_index(addr))
+
+#define pte_offset_map(dir,addr) pte_offset_kernel((dir), (addr))
+#define pte_offset_map_nested(dir,addr) pte_offset_kernel((dir), (addr))
+#define pte_unmap(pte) do { } while (0)
+#define pte_unmap_nested(pte) do { } while (0)
+
+/*
+ * The following only work if pte_present(). Undefined behaviour otherwise.
+ */
+#define pte_present(pte) (pte_val(pte) & PTE_VALID)
+#define pte_dirty(pte) (pte_val(pte) & PTE_DIRTY)
+#define pte_young(pte) (pte_val(pte) & PTE_AF)
+#define pte_special(pte) (pte_val(pte) & PTE_SPECIAL)
+#define pte_write(pte) (!(pte_val(pte) & PTE_RDONLY))
+#define pte_exec(pte) (!(pte_val(pte) & PTE_XN))
+
+#define pte_present_exec_user(pte) \
+ ((pte_val(pte) & (PTE_VALID | PTE_USER | PTE_XN)) == \
+ (PTE_VALID | PTE_USER))
+
+#define PTE_BIT_FUNC(fn,op) \
+static inline pte_t pte_##fn(pte_t pte) { pte_val(pte) op; return pte; }
+
+PTE_BIT_FUNC(wrprotect, |= PTE_RDONLY);
+PTE_BIT_FUNC(mkwrite, &= ~PTE_RDONLY);
+PTE_BIT_FUNC(mkclean, &= ~PTE_DIRTY);
+PTE_BIT_FUNC(mkdirty, |= PTE_DIRTY);
+PTE_BIT_FUNC(mkold, &= ~PTE_AF);
+PTE_BIT_FUNC(mkyoung, |= PTE_AF);
+PTE_BIT_FUNC(mkspecial, |= PTE_SPECIAL);
+
+static inline void set_pte(pte_t *ptep, pte_t pte)
+{
+ *ptep = pte;
+}
+
+extern void __sync_icache_dcache(pte_t pteval);
+
+static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte)
+{
+ if (pte_present_exec_user(pte))
+ __sync_icache_dcache(pte);
+ set_pte(ptep, pte);
+}
+
+/*
+ * Huge pte definitions.
+ */
+#define pte_huge(pte) ((pte_val(pte) & PTE_TYPE_MASK) == PTE_TYPE_HUGEPAGE)
+#define pte_mkhuge(pte) (__pte((pte_val(pte) & ~PTE_TYPE_MASK) | PTE_TYPE_HUGEPAGE))
+
+#define __pgprot_modify(prot,mask,bits) \
+ __pgprot((pgprot_val(prot) & ~(mask)) | (bits))
+
+#define __HAVE_ARCH_PTE_SPECIAL
+
+/*
+ * Mark the prot value as uncacheable and unbufferable.
+ */
+#define pgprot_noncached(prot) \
+ __pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_DEVICE_nGnRnE))
+#define pgprot_writecombine(prot) \
+ __pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_DEVICE_GRE))
+#define pgprot_dmacoherent(prot) \
+ __pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_NORMAL_NC))
+#define __HAVE_PHYS_MEM_ACCESS_PROT
+struct file;
+extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+ unsigned long size, pgprot_t vma_prot);
+
+#define pmd_none(pmd) (!pmd_val(pmd))
+#define pmd_present(pmd) (pmd_val(pmd))
+
+#define pmd_bad(pmd) (!(pmd_val(pmd) & 2))
+
+static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
+{
+ *pmdp = pmd;
+ dsb();
+}
+
+static inline void pmd_clear(pmd_t *pmdp)
+{
+ set_pmd(pmdp, __pmd(0));
+}
+
+static inline pte_t *pmd_page_vaddr(pmd_t pmd)
+{
+ return __va(pmd_val(pmd) & PHYS_MASK & (s32)PAGE_MASK);
+}
+
+#define pmd_page(pmd) pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
+
+/*
+ * Conversion functions: convert a page and protection to a page entry,
+ * and a page entry and page directory to the page they refer to.
+ */
+#define mk_pte(page,prot) pfn_pte(page_to_pfn(page),prot)
+
+#ifndef CONFIG_ARM64_64K_PAGES
+
+#define pud_none(pud) (!pud_val(pud))
+#define pud_bad(pud) (!(pud_val(pud) & 2))
+#define pud_present(pud) (pud_val(pud))
+
+static inline void set_pud(pud_t *pudp, pud_t pud)
+{
+ *pudp = pud;
+ dsb();
+}
+
+static inline void pud_clear(pud_t *pudp)
+{
+ set_pud(pudp, __pud(0));
+}
+
+static inline pmd_t *pud_page_vaddr(pud_t pud)
+{
+ return __va(pud_val(pud) & PHYS_MASK & (s32)PAGE_MASK);
+}
+
+#endif /* CONFIG_ARM64_64K_PAGES */
+
+/* to find an entry in a page-table-directory */
+#define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
+
+#define pgd_offset(mm, addr) ((mm)->pgd+pgd_index(addr))
+
+/* to find an entry in a kernel page-table-directory */
+#define pgd_offset_k(addr) pgd_offset(&init_mm, addr)
+
+/* Find an entry in the second-level page table.. */
+#ifndef CONFIG_ARM64_64K_PAGES
+#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
+static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
+{
+ return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(addr);
+}
+#endif
+
+/* Find an entry in the third-level page table.. */
+#define __pte_index(addr) (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
+
+static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
+{
+ const pteval_t mask = PTE_USER | PTE_XN | PTE_RDONLY;
+ pte_val(pte) = (pte_val(pte) & ~mask) | (pgprot_val(newprot) & mask);
+ return pte;
+}
+
+extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
+extern pgd_t idmap_pg_dir[PTRS_PER_PGD];
+
+#define SWAPPER_DIR_SIZE (3 * PAGE_SIZE)
+#define IDMAP_DIR_SIZE (2 * PAGE_SIZE)
+
+/*
+ * Encode and decode a swap entry:
+ * bits 0-1: present (must be zero)
+ * bit 2: PTE_FILE
+ * bits 3-8: swap type
+ * bits 9-63: swap offset
+ */
+#define __SWP_TYPE_SHIFT 3
+#define __SWP_TYPE_BITS 6
+#define __SWP_TYPE_MASK ((1 << __SWP_TYPE_BITS) - 1)
+#define __SWP_OFFSET_SHIFT (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
+
+#define __swp_type(x) (((x).val >> __SWP_TYPE_SHIFT) & __SWP_TYPE_MASK)
+#define __swp_offset(x) ((x).val >> __SWP_OFFSET_SHIFT)
+#define __swp_entry(type,offset) ((swp_entry_t) { ((type) << __SWP_TYPE_SHIFT) | ((offset) << __SWP_OFFSET_SHIFT) })
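+
+/*
+ * Worked example: __swp_entry(2, 0x1000) encodes to (2 << 3) | (0x1000 << 9)
+ * == 0x200010; bits 0-1 remain zero so the PTE is !pte_present(), and
+ * __swp_type()/__swp_offset() recover 2 and 0x1000 respectively.
+ */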
+
+#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
+#define __swp_entry_to_pte(swp) ((pte_t) { (swp).val })
+
+/*
+ * Ensure that there are not more swap files than can be encoded in the kernel
+ * PTEs.
+ */
+#define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > __SWP_TYPE_BITS)
+
+/*
+ * Encode and decode a file entry:
+ * bits 0-1: present (must be zero)
+ * bit 2: PTE_FILE
+ * bits 3-63: file offset / PAGE_SIZE
+ */
+#define pte_file(pte) (pte_val(pte) & PTE_FILE)
+#define pte_to_pgoff(x) (pte_val(x) >> 3)
+#define pgoff_to_pte(x) __pte(((x) << 3) | PTE_FILE)
+
+#define PTE_FILE_MAX_BITS 61
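+
+/*
+ * Worked example: pgoff_to_pte(0x123) encodes to (0x123 << 3) | PTE_FILE ==
+ * 0x91c and pte_to_pgoff() recovers 0x123; bits 3-63 provide the 61 offset
+ * bits advertised by PTE_FILE_MAX_BITS.
+ */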
+
+extern int kern_addr_valid(unsigned long addr);
+
+#include <asm-generic/pgtable.h>
+
+/*
+ * remap a physical page `pfn' of size `size' with page protection `prot'
+ * into virtual address `from'
+ */
+#define io_remap_pfn_range(vma,from,pfn,size,prot) \
+ remap_pfn_range(vma, from, pfn, size, prot)
+
+#define pgtable_cache_init() do { } while (0)
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __ASM_PGTABLE_H */
diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h
new file mode 100644
index 0000000..1be62bc
--- /dev/null
+++ b/arch/arm64/include/asm/sparsemem.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SPARSEMEM_H
+#define __ASM_SPARSEMEM_H
+
+#ifdef CONFIG_SPARSEMEM
+#define MAX_PHYSMEM_BITS 40
+#define SECTION_SIZE_BITS 30
+#endif
+
+#endif

2012-08-14 17:54:48

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 03/31] arm64: Exception handling

This patch adds the exception entry code (kernel/entry.S), the pt_regs
structure and related accessors, undefined instruction trapping and
stack tracing.

The AArch64 Linux kernel (including kernel threads) runs at EL1 using
the SP_EL1 stack. The exception vectors do not have a fixed address,
only an alignment requirement of 2^11 bytes.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/ptrace.h | 206 +++++++++++
arch/arm64/include/asm/stacktrace.h | 29 ++
arch/arm64/include/asm/traps.h | 30 ++
arch/arm64/kernel/entry.S | 695 +++++++++++++++++++++++++++++++++++
arch/arm64/kernel/stacktrace.c | 127 +++++++
arch/arm64/kernel/traps.c | 357 ++++++++++++++++++
6 files changed, 1444 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/ptrace.h
create mode 100644 arch/arm64/include/asm/stacktrace.h
create mode 100644 arch/arm64/include/asm/traps.h
create mode 100644 arch/arm64/kernel/entry.S
create mode 100644 arch/arm64/kernel/stacktrace.c
create mode 100644 arch/arm64/kernel/traps.c

diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
new file mode 100644
index 0000000..a9abace
--- /dev/null
+++ b/arch/arm64/include/asm/ptrace.h
@@ -0,0 +1,206 @@
+/*
+ * Based on arch/arm/include/asm/ptrace.h
+ *
+ * Copyright (C) 1996-2003 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PTRACE_H
+#define __ASM_PTRACE_H
+
+#include <linux/types.h>
+
+#include <asm/hwcap.h>
+
+#define PTRACE_GETREGS 12
+#define PTRACE_SETREGS 13
+#define PTRACE_GETFPSIMDREGS 14
+#define PTRACE_SETFPSIMDREGS 15
+/* PTRACE_ATTACH is 16 */
+/* PTRACE_DETACH is 17 */
+#define PTRACE_GET_THREAD_AREA 22
+#define PTRACE_SET_SYSCALL 23
+#define PTRACE_GETHBPREGS 29
+#define PTRACE_SETHBPREGS 30
+
+/* AArch32-specific ptrace requests */
+#define COMPAT_PTRACE_GETVFPREGS 27
+#define COMPAT_PTRACE_SETVFPREGS 28
+
+/*
+ * PSR bits
+ */
+#define PSR_MODE_EL0t 0x00000000
+#define PSR_MODE_EL1t 0x00000004
+#define PSR_MODE_EL1h 0x00000005
+#define PSR_MODE_EL2t 0x00000008
+#define PSR_MODE_EL2h 0x00000009
+#define PSR_MODE_EL3t 0x0000000c
+#define PSR_MODE_EL3h 0x0000000d
+#define PSR_MODE_MASK 0x0000000f
+
+/* AArch32 CPSR bits */
+#define PSR_MODE32_BIT 0x00000010
+#define COMPAT_PSR_MODE_USR 0x00000010
+#define COMPAT_PSR_T_BIT 0x00000020
+#define COMPAT_PSR_IT_MASK 0x0600fc00 /* If-Then execution state mask */
+
+/* AArch64 SPSR bits */
+#define PSR_F_BIT 0x00000040
+#define PSR_I_BIT 0x00000080
+#define PSR_A_BIT 0x00000100
+#define PSR_D_BIT 0x00000200
+#define PSR_Q_BIT 0x08000000
+#define PSR_V_BIT 0x10000000
+#define PSR_C_BIT 0x20000000
+#define PSR_Z_BIT 0x40000000
+#define PSR_N_BIT 0x80000000
+
+/*
+ * Groups of PSR bits
+ */
+#define PSR_f 0xff000000 /* Flags */
+#define PSR_s 0x00ff0000 /* Status */
+#define PSR_x 0x0000ff00 /* Extension */
+#define PSR_c 0x000000ff /* Control */
+
+/*
+ * These are 'magic' values for PTRACE_PEEKUSR that return info about where a
+ * process is located in memory.
+ */
+#define PT_TEXT_ADDR 0x10000
+#define PT_DATA_ADDR 0x10004
+#define PT_TEXT_END_ADDR 0x10008
+
+#ifndef __ASSEMBLY__
+
+/*
+ * User structures for general purpose and floating point registers.
+ */
+struct user_pt_regs {
+ __u64 regs[31];
+ __u64 sp;
+ __u64 pc;
+ __u64 pstate;
+};
+
+struct user_fpsimd_state {
+ __uint128_t vregs[32];
+ __u32 fpsr;
+ __u32 fpcr;
+};
+
+#ifdef __KERNEL__
+
+#ifdef CONFIG_AARCH32_EMULATION
+/* sizeof(struct user) for AArch32 */
+#define COMPAT_USER_SZ 296
+/* AArch32 uses x13 as the stack pointer... */
+#define compat_sp regs[13]
+/* ... and x14 as the link register. */
+#define compat_lr regs[14]
+#endif
+
+/*
+ * This struct defines the way the registers are stored on the stack during an
+ * exception. Note that sizeof(struct pt_regs) has to be a multiple of 16 (for
+ * stack alignment). struct user_pt_regs must form a prefix of struct pt_regs.
+ */
+struct pt_regs {
+ union {
+ struct user_pt_regs user_regs;
+ struct {
+ u64 regs[31];
+ u64 sp;
+ u64 pc;
+ u64 pstate;
+ };
+ };
+ u64 orig_x0;
+ u64 syscallno;
+};
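+
+/*
+ * Size check: 31 GP registers + sp + pc + pstate + orig_x0 + syscallno
+ * == 36 * 8 == 288 bytes, satisfying the 16-byte multiple requirement above.
+ */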
+
+#define arch_has_single_step() (1)
+
+#ifdef CONFIG_AARCH32_EMULATION
+#define compat_thumb_mode(regs) \
+ (((regs)->pstate & COMPAT_PSR_T_BIT))
+#else
+#define compat_thumb_mode(regs) (0)
+#endif
+
+#define user_mode(regs) \
+ (((regs)->pstate & PSR_MODE_MASK) == PSR_MODE_EL0t)
+
+#define compat_user_mode(regs) \
+ (((regs)->pstate & (PSR_MODE32_BIT | PSR_MODE_MASK)) == \
+ (PSR_MODE32_BIT | PSR_MODE_EL0t))
+
+#define processor_mode(regs) \
+ ((regs)->pstate & PSR_MODE_MASK)
+
+#define interrupts_enabled(regs) \
+ (!((regs)->pstate & PSR_I_BIT))
+
+#define fast_interrupts_enabled(regs) \
+ (!((regs)->pstate & PSR_F_BIT))
+
+#define user_stack_pointer(regs) \
+ ((regs)->sp)
+
+/*
+ * Are the current registers suitable for user mode? (used to maintain
+ * security in signal handlers)
+ */
+static inline int valid_user_regs(struct user_pt_regs *regs)
+{
+ if (user_mode(regs) && (regs->pstate & PSR_I_BIT) == 0) {
+ regs->pstate &= ~(PSR_F_BIT | PSR_A_BIT);
+
+ /* The T bit is reserved for AArch64 */
+ if (!(regs->pstate & PSR_MODE32_BIT))
+ regs->pstate &= ~COMPAT_PSR_T_BIT;
+
+ return 1;
+ }
+
+ /*
+ * Force PSR to something logical...
+ */
+ regs->pstate &= PSR_f | PSR_s | (PSR_x & ~PSR_A_BIT) | \
+ COMPAT_PSR_T_BIT | PSR_MODE32_BIT;
+
+ if (!(regs->pstate & PSR_MODE32_BIT)) {
+ regs->pstate &= ~COMPAT_PSR_T_BIT;
+ regs->pstate |= PSR_MODE_EL0t;
+ }
+
+ return 0;
+}
+
+#define instruction_pointer(regs) (regs)->pc
+
+#ifdef CONFIG_SMP
+extern unsigned long profile_pc(struct pt_regs *regs);
+#else
+#define profile_pc(regs) instruction_pointer(regs)
+#endif
+
+extern int aarch32_break_trap(struct pt_regs *regs);
+
+#endif /* __KERNEL__ */
+
+#endif /* __ASSEMBLY__ */
+
+#endif
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
new file mode 100644
index 0000000..7318f6d
--- /dev/null
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_STACKTRACE_H
+#define __ASM_STACKTRACE_H
+
+struct stackframe {
+ unsigned long fp;
+ unsigned long sp;
+ unsigned long pc;
+};
+
+extern int unwind_frame(struct stackframe *frame);
+extern void walk_stackframe(struct stackframe *frame,
+ int (*fn)(struct stackframe *, void *), void *data);
+
+#endif /* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
new file mode 100644
index 0000000..10ca8ff
--- /dev/null
+++ b/arch/arm64/include/asm/traps.h
@@ -0,0 +1,30 @@
+/*
+ * Based on arch/arm/include/asm/traps.h
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_TRAP_H
+#define __ASM_TRAP_H
+
+static inline int in_exception_text(unsigned long ptr)
+{
+ extern char __exception_text_start[];
+ extern char __exception_text_end[];
+
+ return ptr >= (unsigned long)&__exception_text_start &&
+ ptr < (unsigned long)&__exception_text_end;
+}
+
+#endif
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
new file mode 100644
index 0000000..32b96ab
--- /dev/null
+++ b/arch/arm64/kernel/entry.S
@@ -0,0 +1,695 @@
+/*
+ * Low-level exception handling code
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ * Authors: Catalin Marinas <[email protected]>
+ * Will Deacon <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/init.h>
+#include <linux/linkage.h>
+
+#include <asm/assembler.h>
+#include <asm/asm-offsets.h>
+#include <asm/errno.h>
+#include <asm/thread_info.h>
+#include <asm/unistd.h>
+
+/*
+ * Bad Abort numbers
+ *-----------------
+ */
+#define BAD_SYNC 0
+#define BAD_IRQ 1
+#define BAD_FIQ 2
+#define BAD_ERROR 3
+
+ .macro kernel_entry, el, regsize = 64
+ sub sp, sp, #S_FRAME_SIZE - S_LR // room for LR, SP, SPSR, ELR
+ .if \regsize == 32
+ mov w0, w0 // zero upper 32 bits of x0
+ .endif
+ push x28, x29
+ push x26, x27
+ push x24, x25
+ push x22, x23
+ push x20, x21
+ push x18, x19
+ push x16, x17
+ push x14, x15
+ push x12, x13
+ push x10, x11
+ push x8, x9
+ push x6, x7
+ push x4, x5
+ push x2, x3
+ push x0, x1
+ .if \el == 0
+ mrs x21, sp_el0
+ .else
+ add x21, sp, #S_FRAME_SIZE
+ .endif
+ mrs x22, elr_el1
+ mrs x23, spsr_el1
+ stp lr, x21, [sp, #S_LR]
+ stp x22, x23, [sp, #S_PC]
+
+ /*
+ * Set syscallno to -1 by default (overridden later if real syscall).
+ */
+ .if \el == 0
+ mvn x21, xzr
+ str x21, [sp, #S_SYSCALLNO]
+ .endif
+
+ /*
+ * Registers that may be useful after this macro is invoked:
+ *
+ * x21 - aborted SP
+ * x22 - aborted PC
+ * x23 - aborted PSTATE
+ */
+ .endm
+
+ .macro kernel_exit, el, ret = 0
+ ldp x21, x22, [sp, #S_PC] // load ELR, SPSR
+ .if \el == 0
+ ldr x23, [sp, #S_SP] // load return stack pointer
+ .endif
+ .if \ret
+ ldr x1, [sp, #S_X1] // preserve x0 (syscall return)
+ add sp, sp, S_X2
+ .else
+ pop x0, x1
+ .endif
+ pop x2, x3 // load the rest of the registers
+ pop x4, x5
+ pop x6, x7
+ pop x8, x9
+ msr elr_el1, x21 // set up the return data
+ msr spsr_el1, x22
+ .if \el == 0
+ msr sp_el0, x23
+ .endif
+ pop x10, x11
+ pop x12, x13
+ pop x14, x15
+ pop x16, x17
+ pop x18, x19
+ pop x20, x21
+ pop x22, x23
+ pop x24, x25
+ pop x26, x27
+ pop x28, x29
+ ldr lr, [sp], #S_FRAME_SIZE - S_LR // load LR and restore SP
+ eret // return to kernel
+ .endm
+
+ .macro get_thread_info, rd
+ mov \rd, sp
+ and \rd, \rd, #~((1 << 13) - 1) // bottom of 8K stack (struct thread_info)
+ .endm
+
+/*
+ * These are the registers used in the syscall handler, and allow us to
+ * have in theory up to 7 arguments to a function - x0 to x6.
+ *
+ * x7 is reserved for the system call number in 32-bit mode.
+ */
+sc_nr .req x25 // number of system calls
+scno .req x26 // syscall number
+stbl .req x27 // syscall table pointer
+tsk .req x28 // current thread_info
+
+/*
+ * Interrupt handling.
+ */
+ .macro irq_handler
+ ldr x1, handle_arch_irq
+ mov x0, sp
+ blr x1
+ .endm
+
+ .text
+
+/*
+ * Exception vectors.
+ */
+ .macro ventry label
+ .align 7
+ b \label
+ .endm
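+
+/*
+ * Each vector entry is padded to 2^7 == 128 bytes by the .align 7 above, so
+ * the 16 entries occupy 2KB in total, matching the 2^11 base alignment
+ * required for VBAR_EL1.
+ */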
+
+ .align 11
+ENTRY(vectors)
+ ventry el1_sync_invalid // Synchronous EL1t
+ ventry el1_irq_invalid // IRQ EL1t
+ ventry el1_fiq_invalid // FIQ EL1t
+ ventry el1_error_invalid // Error EL1t
+
+ ventry el1_sync // Synchronous EL1h
+ ventry el1_irq // IRQ EL1h
+ ventry el1_fiq_invalid // FIQ EL1h
+ ventry el1_error_invalid // Error EL1h
+
+ ventry el0_sync // Synchronous 64-bit EL0
+ ventry el0_irq // IRQ 64-bit EL0
+ ventry el0_fiq_invalid // FIQ 64-bit EL0
+ ventry el0_error_invalid // Error 64-bit EL0
+
+#ifdef CONFIG_AARCH32_EMULATION
+ ventry el0_sync_compat // Synchronous 32-bit EL0
+ ventry el0_irq_compat // IRQ 32-bit EL0
+ ventry el0_fiq_invalid_compat // FIQ 32-bit EL0
+ ventry el0_error_invalid_compat // Error 32-bit EL0
+#else
+ ventry el0_sync_invalid // Synchronous 32-bit EL0
+ ventry el0_irq_invalid // IRQ 32-bit EL0
+ ventry el0_fiq_invalid // FIQ 32-bit EL0
+ ventry el0_error_invalid // Error 32-bit EL0
+#endif
+END(vectors)
+
+/*
+ * Invalid mode handlers
+ */
+ .macro inv_entry, el, reason, regsize = 64
+ kernel_entry el, \regsize
+ mov x0, sp
+ mov x1, #\reason
+ mrs x2, esr_el1
+ b bad_mode
+ .endm
+
+el0_sync_invalid:
+ inv_entry 0, BAD_SYNC
+ENDPROC(el0_sync_invalid)
+
+el0_irq_invalid:
+ inv_entry 0, BAD_IRQ
+ENDPROC(el0_irq_invalid)
+
+el0_fiq_invalid:
+ inv_entry 0, BAD_FIQ
+ENDPROC(el0_fiq_invalid)
+
+el0_error_invalid:
+ inv_entry 0, BAD_ERROR
+ENDPROC(el0_error_invalid)
+
+#ifdef CONFIG_AARCH32_EMULATION
+el0_fiq_invalid_compat:
+ inv_entry 0, BAD_FIQ, 32
+ENDPROC(el0_fiq_invalid_compat)
+
+el0_error_invalid_compat:
+ inv_entry 0, BAD_ERROR, 32
+ENDPROC(el0_error_invalid_compat)
+#endif
+
+el1_sync_invalid:
+ inv_entry 1, BAD_SYNC
+ENDPROC(el1_sync_invalid)
+
+el1_irq_invalid:
+ inv_entry 1, BAD_IRQ
+ENDPROC(el1_irq_invalid)
+
+el1_fiq_invalid:
+ inv_entry 1, BAD_FIQ
+ENDPROC(el1_fiq_invalid)
+
+el1_error_invalid:
+ inv_entry 1, BAD_ERROR
+ENDPROC(el1_error_invalid)
+
+/*
+ * EL1 mode handlers.
+ */
+ .align 6
+el1_sync:
+ kernel_entry 1
+ mrs x1, esr_el1 // read the syndrome register
+ lsr x24, x1, #26 // exception class
+ cmp x24, #0x25 // data abort in EL1
+ b.eq el1_da
+ cmp x24, #0x18 // configurable trap
+ b.eq el1_undef
+ cmp x24, #0x26 // stack alignment exception
+ b.eq el1_sp_pc
+ cmp x24, #0x22 // pc alignment exception
+ b.eq el1_sp_pc
+ cmp x24, #0x00 // unknown exception in EL1
+ b.eq el1_undef
+ cmp x24, #0x30 // debug exception in EL1
+ b.ge el1_dbg
+ b el1_inv
+el1_da:
+ /*
+ * Data abort handling
+ */
+ mrs x0, far_el1
+ enable_dbg_if_not_stepping x2
+ // re-enable interrupts if they were enabled in the aborted context
+ tbnz x23, #7, 1f // PSR_I_BIT
+ enable_irq
+1:
+ mov x2, sp // struct pt_regs
+ bl do_mem_abort
+
+ // disable interrupts before pulling preserved data off the stack
+ disable_irq
+ kernel_exit 1
+el1_sp_pc:
+ /*
+ * Stack or PC alignment exception handling
+ */
+ mrs x0, far_el1
+ mov x1, x25
+ mov x2, sp
+ b do_sp_pc_abort
+el1_undef:
+ /*
+ * Undefined instruction
+ */
+ mov x0, sp
+ b do_undefinstr
+el1_dbg:
+ /*
+ * Debug exception handling
+ */
+ tbz x24, #0, el1_inv // EL1 only
+ mrs x0, far_el1
+ mov x2, sp // struct pt_regs
+ bl do_debug_exception
+
+ kernel_exit 1
+el1_inv:
+ // TODO: add support for undefined instructions in kernel mode
+ mov x0, sp
+ mov x1, #BAD_SYNC
+ mrs x2, esr_el1
+ b bad_mode
+ENDPROC(el1_sync)
+
+ .align 6
+el1_irq:
+ kernel_entry 1
+ enable_dbg_if_not_stepping x0
+#ifdef CONFIG_TRACE_IRQFLAGS
+ bl trace_hardirqs_off
+#endif
+#ifdef CONFIG_PREEMPT
+ get_thread_info tsk
+ ldr x24, [tsk, #TI_PREEMPT] // get preempt count
+ add x0, x24, #1 // increment it
+ str x0, [tsk, #TI_PREEMPT]
+#endif
+ irq_handler
+#ifdef CONFIG_PREEMPT
+ str x24, [tsk, #TI_PREEMPT] // restore preempt count
+ cbnz x24, 1f // preempt count != 0
+ ldr x0, [tsk, #TI_FLAGS] // get flags
+ tbz x0, #TIF_NEED_RESCHED, 1f // needs rescheduling?
+ bl el1_preempt
+1:
+#endif
+#ifdef CONFIG_TRACE_IRQFLAGS
+ bl trace_hardirqs_on
+#endif
+ kernel_exit 1
+ENDPROC(el1_irq)
+
+#ifdef CONFIG_PREEMPT
+el1_preempt:
+ mov x24, lr
+1: enable_dbg
+ bl preempt_schedule_irq // irq en/disable is done inside
+ ldr x0, [tsk, #TI_FLAGS] // get new tasks TI_FLAGS
+ tbnz x0, #TIF_NEED_RESCHED, 1b // needs rescheduling?
+ ret x24
+#endif
+
+/*
+ * EL0 mode handlers.
+ */
+ .align 6
+el0_sync:
+ kernel_entry 0
+ mrs x25, esr_el1 // read the syndrome register
+ lsr x24, x25, #26 // exception class
+ cmp x24, #0x15 // SVC in 64-bit state
+ b.eq el0_svc
+ adr lr, ret_from_exception
+ cmp x24, #0x24 // data abort in EL0
+ b.eq el0_da
+ cmp x24, #0x20 // instruction abort in EL0
+ b.eq el0_ia
+ cmp x24, #0x07 // FP/ASIMD access
+ b.eq el0_fpsimd_acc
+ cmp x24, #0x2c // FP/ASIMD exception
+ b.eq el0_fpsimd_exc
+ cmp x24, #0x18 // configurable trap
+ b.eq el0_undef
+ cmp x24, #0x26 // stack alignment exception
+ b.eq el0_sp_pc
+ cmp x24, #0x22 // pc alignment exception
+ b.eq el0_sp_pc
+ cmp x24, #0x00 // unknown exception in EL0
+ b.eq el0_undef
+ cmp x24, #0x30 // debug exception in EL0
+ b.ge el0_dbg
+ b el0_inv
+
+#ifdef CONFIG_AARCH32_EMULATION
+ .align 6
+el0_sync_compat:
+ kernel_entry 0, 32
+ mrs x25, esr_el1 // read the syndrome register
+ lsr x24, x25, #26 // exception class
+ cmp x24, #0x11 // SVC in 32-bit state
+ b.eq el0_svc_compat
+ adr lr, ret_from_exception
+ cmp x24, #0x24 // data abort in EL0
+ b.eq el0_da
+ cmp x24, #0x20 // instruction abort in EL0
+ b.eq el0_ia
+ cmp x24, #0x07 // FP/ASIMD access
+ b.eq el0_fpsimd_acc
+ cmp x24, #0x28 // FP/ASIMD exception
+ b.eq el0_fpsimd_exc
+ cmp x24, #0x00 // unknown exception in EL0
+ b.eq el0_undef
+ cmp x24, #0x30 // debug exception in EL0
+ b.ge el0_dbg
+ b el0_inv
+el0_svc_compat:
+ /*
+ * AArch32 syscall handling
+ */
+ adr stbl, compat_sys_call_table // load compat syscall table pointer
+ uxtw scno, w7 // syscall number in w7 (r7)
+ mov sc_nr, #__NR_compat_syscalls
+ b el0_svc_naked
+
+ .align 6
+el0_irq_compat:
+ kernel_entry 0, 32
+ b el0_irq_naked
+#endif
+
+el0_da:
+ /*
+ * Data abort handling
+ */
+ mrs x0, far_el1
+ disable_step x1
+ isb
+ enable_dbg
+ // enable interrupts before calling the main handler
+ enable_irq
+ mov x1, x25
+ mov x2, sp
+ b do_mem_abort
+el0_ia:
+ /*
+ * Instruction abort handling
+ */
+ mrs x0, far_el1
+ disable_step x1
+ isb
+ enable_dbg
+ // enable interrupts before calling the main handler
+ enable_irq
+ orr x1, x25, #1 << 24 // use reserved ISS bit for instruction aborts
+ mov x2, sp
+ b do_mem_abort
+el0_fpsimd_acc:
+ /*
+ * Floating Point or Advanced SIMD access
+ */
+ mov x0, x25
+ mov x1, sp
+ b do_fpsimd_acc
+el0_fpsimd_exc:
+ /*
+ * Floating Point or Advanced SIMD exception
+ */
+ mov x0, x25
+ mov x1, sp
+ b do_fpsimd_exc
+el0_sp_pc:
+ /*
+ * Stack or PC alignment exception handling
+ */
+ mrs x0, far_el1
+ disable_step x1
+ isb
+ enable_dbg
+ // enable interrupts before calling the main handler
+ enable_irq
+ mov x1, x25
+ mov x2, sp
+ b do_sp_pc_abort
+el0_undef:
+ /*
+ * Undefined instruction
+ */
+ mov x0, sp
+ b do_undefinstr
+el0_dbg:
+ /*
+ * Debug exception handling
+ */
+ tbnz x24, #0, el0_inv // EL0 only
+ mrs x0, far_el1
+ disable_step x1
+ mov x1, x25
+ mov x2, sp
+ b do_debug_exception
+el0_inv:
+ mov x0, sp
+ mov x1, #BAD_SYNC
+ mrs x2, esr_el1
+ b bad_mode
+ENDPROC(el0_sync)
+
+ .align 6
+el0_irq:
+ kernel_entry 0
+el0_irq_naked:
+ disable_step x1
+ isb
+ enable_dbg
+#ifdef CONFIG_TRACE_IRQFLAGS
+ bl trace_hardirqs_off
+#endif
+ get_thread_info tsk
+#ifdef CONFIG_PREEMPT
+ ldr x24, [tsk, #TI_PREEMPT] // get preempt count
+ add x23, x24, #1 // increment it
+ str x23, [tsk, #TI_PREEMPT]
+#endif
+ irq_handler
+#ifdef CONFIG_PREEMPT
+ ldr x0, [tsk, #TI_PREEMPT]
+ str x24, [tsk, #TI_PREEMPT]
+ cmp x0, x23
+ b.eq 1f
+ mov x1, #0
+ str x1, [x1] // BUG
+1:
+#endif
+#ifdef CONFIG_TRACE_IRQFLAGS
+ bl trace_hardirqs_on
+#endif
+ b ret_to_user
+ENDPROC(el0_irq)
+
+/*
+ * This is the return code to user mode for abort handlers
+ */
+ENTRY(ret_from_exception)
+ get_thread_info tsk
+ b ret_to_user
+ENDPROC(ret_from_exception)
+
+/*
+ * Register switch for AArch64. The callee-saved registers need to be saved
+ * and restored. On entry:
+ * x0 = previous task_struct (must be preserved across the switch)
+ * x1 = next task_struct
+ * Previous and next are guaranteed not to be the same.
+ *
+ */
+ENTRY(cpu_switch_to)
+ add x8, x0, #THREAD_CPU_CONTEXT
+ mov x9, sp
+ stp x19, x20, [x8], #16 // store callee-saved registers
+ stp x21, x22, [x8], #16
+ stp x23, x24, [x8], #16
+ stp x25, x26, [x8], #16
+ stp x27, x28, [x8], #16
+ stp x29, x9, [x8], #16
+ str lr, [x8]
+ add x8, x1, #THREAD_CPU_CONTEXT
+ ldp x19, x20, [x8], #16 // restore callee-saved registers
+ ldp x21, x22, [x8], #16
+ ldp x23, x24, [x8], #16
+ ldp x25, x26, [x8], #16
+ ldp x27, x28, [x8], #16
+ ldp x29, x9, [x8], #16
+ ldr lr, [x8]
+ mov sp, x9
+ ret
+ENDPROC(cpu_switch_to)
+
+/*
+ * This is the fast syscall return path. We do as little as possible here,
+ * and this includes saving x0 back into the kernel stack.
+ */
+ret_fast_syscall:
+ disable_irq // disable interrupts
+ ldr x1, [tsk, #TI_FLAGS]
+ and x2, x1, #_TIF_WORK_MASK
+ cbnz x2, fast_work_pending
+ tbz x1, #TIF_SINGLESTEP, fast_exit
+ disable_dbg
+ enable_step x2
+fast_exit:
+ kernel_exit 0, ret = 1
+
+/*
+ * Ok, we need to do extra processing, enter the slow path.
+ */
+fast_work_pending:
+ str x0, [sp, #S_X0] // returned x0
+work_pending:
+ tbnz x1, #TIF_NEED_RESCHED, work_resched
+ /* TIF_SIGPENDING or TIF_NOTIFY_RESUME case */
+ ldr x2, [sp, #S_PSTATE]
+ mov x0, sp // 'regs'
+ tst x2, #PSR_MODE_MASK // user mode regs?
+ b.ne no_work_pending // returning to kernel
+ bl do_notify_resume
+ b ret_to_user
+work_resched:
+ enable_dbg
+ bl schedule
+
+/*
+ * "slow" syscall return path.
+ */
+ENTRY(ret_to_user)
+ disable_irq // disable interrupts
+ ldr x1, [tsk, #TI_FLAGS]
+ and x2, x1, #_TIF_WORK_MASK
+ cbnz x2, work_pending
+ tbz x1, #TIF_SINGLESTEP, no_work_pending
+ disable_dbg
+ enable_step x2
+no_work_pending:
+ kernel_exit 0, ret = 0
+ENDPROC(ret_to_user)
+
+/*
+ * This is how we return from a fork.
+ */
+ENTRY(ret_from_fork)
+ bl schedule_tail
+ get_thread_info tsk
+ b ret_to_user
+ENDPROC(ret_from_fork)
+
+/*
+ * SVC handler.
+ */
+ .align 6
+ENTRY(el0_svc)
+ adrp stbl, sys_call_table // load syscall table pointer
+ uxtw scno, w8 // syscall number in w8
+ mov sc_nr, #__NR_syscalls
+el0_svc_naked: // compat entry point
+ stp x0, scno, [sp, #S_ORIG_X0] // save the original x0 and syscall number
+ disable_step x16
+ isb
+ enable_dbg
+ enable_irq
+
+ get_thread_info tsk
+ ldr x16, [tsk, #TI_FLAGS] // check for syscall tracing
+ tbnz x16, #TIF_SYSCALL_TRACE, __sys_trace // are we tracing syscalls?
+ adr lr, ret_fast_syscall // return address
+ cmp scno, sc_nr // check upper syscall limit
+ b.hs ni_sys
+ ldr x16, [stbl, scno, lsl #3] // address in the syscall table
+ br x16 // call sys_* routine
+ni_sys:
+ mov x0, sp
+ b do_ni_syscall
+ENDPROC(el0_svc)
+
+ /*
+ * This is the really slow path. We're going to be doing context
+ * switches, and waiting for our parent to respond.
+ */
+__sys_trace:
+ mov x1, sp
+ mov w0, #0 // trace entry
+ bl syscall_trace
+ adr lr, __sys_trace_return // return address
+ uxtw scno, w0 // syscall number (possibly new)
+ mov x1, sp // pointer to regs
+ cmp scno, sc_nr // check upper syscall limit
+ b.hs ni_sys
+ ldp x0, x1, [sp] // restore the syscall args
+ ldp x2, x3, [sp, #S_X2]
+ ldp x4, x5, [sp, #S_X4]
+ ldp x6, x7, [sp, #S_X6]
+ ldr x16, [stbl, scno, lsl #3] // address in the syscall table
+ br x16 // call sys_* routine
+
+__sys_trace_return:
+ str x0, [sp] // save returned x0
+ mov x1, sp
+ mov w0, #1 // trace exit
+ bl syscall_trace
+ b ret_to_user
+
+/*
+ * Special system call wrappers.
+ */
+ENTRY(sys_execve_wrapper)
+ mov x3, sp
+ b sys_execve
+ENDPROC(sys_execve_wrapper)
+
+ENTRY(sys_clone_wrapper)
+ mov x5, sp
+ b sys_clone
+ENDPROC(sys_clone_wrapper)
+
+ENTRY(sys_rt_sigreturn_wrapper)
+ mov x0, sp
+ b sys_rt_sigreturn
+ENDPROC(sys_rt_sigreturn_wrapper)
+
+ENTRY(sys_sigaltstack_wrapper)
+ ldr x2, [sp, #S_SP]
+ b sys_sigaltstack
+ENDPROC(sys_sigaltstack_wrapper)
+
+ENTRY(handle_arch_irq)
+ .quad 0
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
new file mode 100644
index 0000000..d25459f
--- /dev/null
+++ b/arch/arm64/kernel/stacktrace.c
@@ -0,0 +1,127 @@
+/*
+ * Stack tracing support
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/kernel.h>
+#include <linux/export.h>
+#include <linux/sched.h>
+#include <linux/stacktrace.h>
+
+#include <asm/stacktrace.h>
+
+/*
+ * AArch64 PCS assigns the frame pointer to x29.
+ *
+ * A simple function prologue looks like this:
+ * sub sp, sp, #0x10
+ * stp x29, x30, [sp]
+ * mov x29, sp
+ *
+ * A simple function epilogue looks like this:
+ * mov sp, x29
+ * ldp x29, x30, [sp]
+ * add sp, sp, #0x10
+ */
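+/*
+ * The prologue above leaves a frame record of the form:
+ *
+ *    x29 -> +------------------+
+ *           | previous x29     |   frame->fp == *(fp)
+ *           | x30 (return pc)  |   frame->pc == *(fp + 8)
+ *           +------------------+
+ *
+ * unwind_frame() below simply follows this chain of frame records.
+ */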
+int unwind_frame(struct stackframe *frame)
+{
+ unsigned long high, low;
+ unsigned long fp = frame->fp;
+
+ low = frame->sp;
+ high = ALIGN(low, THREAD_SIZE);
+
+ if (fp < low || fp > high || fp & 0xf)
+ return -EINVAL;
+
+ frame->sp = fp + 0x10;
+ frame->fp = *(unsigned long *)(fp);
+ frame->pc = *(unsigned long *)(fp + 8);
+
+ return 0;
+}
+
+void notrace walk_stackframe(struct stackframe *frame,
+ int (*fn)(struct stackframe *, void *), void *data)
+{
+ while (1) {
+ int ret;
+
+ if (fn(frame, data))
+ break;
+ ret = unwind_frame(frame);
+ if (ret < 0)
+ break;
+ }
+}
+EXPORT_SYMBOL(walk_stackframe);
+
+#ifdef CONFIG_STACKTRACE
+struct stack_trace_data {
+ struct stack_trace *trace;
+ unsigned int no_sched_functions;
+ unsigned int skip;
+};
+
+static int save_trace(struct stackframe *frame, void *d)
+{
+ struct stack_trace_data *data = d;
+ struct stack_trace *trace = data->trace;
+ unsigned long addr = frame->pc;
+
+ if (data->no_sched_functions && in_sched_functions(addr))
+ return 0;
+ if (data->skip) {
+ data->skip--;
+ return 0;
+ }
+
+ trace->entries[trace->nr_entries++] = addr;
+
+ return trace->nr_entries >= trace->max_entries;
+}
+
+void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
+{
+ struct stack_trace_data data;
+ struct stackframe frame;
+
+ data.trace = trace;
+ data.skip = trace->skip;
+
+ if (tsk != current) {
+ data.no_sched_functions = 1;
+ frame.fp = thread_saved_fp(tsk);
+ frame.sp = thread_saved_sp(tsk);
+ frame.pc = thread_saved_pc(tsk);
+ } else {
+ register unsigned long current_sp asm("sp");
+ data.no_sched_functions = 0;
+ frame.fp = (unsigned long)__builtin_frame_address(0);
+ frame.sp = current_sp;
+ frame.pc = (unsigned long)save_stack_trace_tsk;
+ }
+
+ walk_stackframe(&frame, save_trace, &data);
+ if (trace->nr_entries < trace->max_entries)
+ trace->entries[trace->nr_entries++] = ULONG_MAX;
+}
+
+void save_stack_trace(struct stack_trace *trace)
+{
+ save_stack_trace_tsk(current, trace);
+}
+EXPORT_SYMBOL_GPL(save_stack_trace);
+#endif
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
new file mode 100644
index 0000000..8712a8e
--- /dev/null
+++ b/arch/arm64/kernel/traps.c
@@ -0,0 +1,357 @@
+/*
+ * Based on arch/arm/kernel/traps.c
+ *
+ * Copyright (C) 1995-2009 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/signal.h>
+#include <linux/personality.h>
+#include <linux/kallsyms.h>
+#include <linux/spinlock.h>
+#include <linux/uaccess.h>
+#include <linux/hardirq.h>
+#include <linux/kdebug.h>
+#include <linux/module.h>
+#include <linux/kexec.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/syscalls.h>
+
+#include <asm/atomic.h>
+#include <asm/traps.h>
+#include <asm/stacktrace.h>
+#include <asm/exception.h>
+#include <asm/system_misc.h>
+
+static const char *handler[]= {
+ "Synchronous Abort",
+ "IRQ",
+ "FIQ",
+ "Error"
+};
+
+int show_unhandled_signals = 1;
+
+/*
+ * Dump out the contents of some memory nicely...
+ */
+static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
+ unsigned long top)
+{
+ unsigned long first;
+ mm_segment_t fs;
+ int i;
+
+ /*
+ * We need to switch to kernel mode so that we can use __get_user
+ * to safely read from kernel space. Note that we now dump the
+ * code first, just in case the backtrace kills us.
+ */
+ fs = get_fs();
+ set_fs(KERNEL_DS);
+
+ printk("%s%s(0x%016lx to 0x%016lx)\n", lvl, str, bottom, top);
+
+ for (first = bottom & ~31; first < top; first += 32) {
+ unsigned long p;
+ char str[sizeof(" 12345678") * 8 + 1];
+
+ memset(str, ' ', sizeof(str));
+ str[sizeof(str) - 1] = '\0';
+
+ for (p = first, i = 0; i < 8 && p < top; i++, p += 4) {
+ if (p >= bottom && p < top) {
+ unsigned int val;
+ if (__get_user(val, (unsigned int *)p) == 0)
+ sprintf(str + i * 9, " %08x", val);
+ else
+ sprintf(str + i * 9, " ????????");
+ }
+ }
+ printk("%s%04lx:%s\n", lvl, first & 0xffff, str);
+ }
+
+ set_fs(fs);
+}
+
+static void dump_backtrace_entry(unsigned long where, unsigned long stack)
+{
+ print_ip_sym(where);
+ if (in_exception_text(where))
+ dump_mem("", "Exception stack", stack,
+ stack + sizeof(struct pt_regs));
+}
+
+static void dump_instr(const char *lvl, struct pt_regs *regs)
+{
+ unsigned long addr = instruction_pointer(regs);
+ mm_segment_t fs;
+ char str[sizeof("00000000 ") * 5 + 2 + 1], *p = str;
+ int i;
+
+ /*
+ * We need to switch to kernel mode so that we can use __get_user
+ * to safely read from kernel space. Note that we now dump the
+ * code first, just in case the backtrace kills us.
+ */
+ fs = get_fs();
+ set_fs(KERNEL_DS);
+
+ for (i = -4; i < 1; i++) {
+ unsigned int val, bad;
+
+ bad = __get_user(val, &((u32 *)addr)[i]);
+
+ if (!bad)
+ p += sprintf(p, i == 0 ? "(%08x) " : "%08x ", val);
+ else {
+ p += sprintf(p, "bad PC value");
+ break;
+ }
+ }
+ printk("%sCode: %s\n", lvl, str);
+
+ set_fs(fs);
+}
+
+static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
+{
+ struct stackframe frame;
+ const register unsigned long current_sp asm ("sp");
+
+ pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
+
+ if (!tsk)
+ tsk = current;
+
+ if (regs) {
+ frame.fp = regs->regs[29];
+ frame.sp = regs->sp;
+ frame.pc = regs->pc;
+ } else if (tsk == current) {
+ frame.fp = (unsigned long)__builtin_frame_address(0);
+ frame.sp = current_sp;
+ frame.pc = (unsigned long)dump_backtrace;
+ } else {
+ /*
+ * task blocked in __switch_to
+ */
+ frame.fp = thread_saved_fp(tsk);
+ frame.sp = thread_saved_sp(tsk);
+ frame.pc = thread_saved_pc(tsk);
+ }
+
+ printk("Call trace:\n");
+ while (1) {
+ unsigned long where = frame.pc;
+ int ret;
+
+ ret = unwind_frame(&frame);
+ if (ret < 0)
+ break;
+ dump_backtrace_entry(where, frame.sp);
+ }
+}
+
+void dump_stack(void)
+{
+ dump_backtrace(NULL, NULL);
+}
+
+EXPORT_SYMBOL(dump_stack);
+
+void show_stack(struct task_struct *tsk, unsigned long *sp)
+{
+ dump_backtrace(NULL, tsk);
+ barrier();
+}
+
+#ifdef CONFIG_PREEMPT
+#define S_PREEMPT " PREEMPT"
+#else
+#define S_PREEMPT ""
+#endif
+#ifdef CONFIG_SMP
+#define S_SMP " SMP"
+#else
+#define S_SMP ""
+#endif
+
+static int __die(const char *str, int err, struct thread_info *thread,
+ struct pt_regs *regs)
+{
+ struct task_struct *tsk = thread->task;
+ static int die_counter;
+ int ret;
+
+ pr_emerg("Internal error: %s: %x [#%d]" S_PREEMPT S_SMP "\n",
+ str, err, ++die_counter);
+
+ /* trap and error numbers are mostly meaningless on ARM */
+ ret = notify_die(DIE_OOPS, str, regs, err, 0, SIGSEGV);
+ if (ret == NOTIFY_STOP)
+ return ret;
+
+ print_modules();
+ __show_regs(regs);
+ pr_emerg("Process %.*s (pid: %d, stack limit = 0x%p)\n",
+ TASK_COMM_LEN, tsk->comm, task_pid_nr(tsk), thread + 1);
+
+ if (!user_mode(regs) || in_interrupt()) {
+ dump_mem(KERN_EMERG, "Stack: ", regs->sp,
+ THREAD_SIZE + (unsigned long)task_stack_page(tsk));
+ dump_backtrace(regs, tsk);
+ dump_instr(KERN_EMERG, regs);
+ }
+
+ return ret;
+}
+
+DEFINE_SPINLOCK(die_lock);
+
+/*
+ * This function is protected against re-entrancy.
+ */
+void die(const char *str, struct pt_regs *regs, int err)
+{
+ struct thread_info *thread = current_thread_info();
+ int ret;
+
+ oops_enter();
+
+ spin_lock_irq(&die_lock);
+ console_verbose();
+ bust_spinlocks(1);
+ ret = __die(str, err, thread, regs);
+
+ if (regs && kexec_should_crash(thread->task))
+ crash_kexec(regs);
+
+ bust_spinlocks(0);
+ add_taint(TAINT_DIE);
+ spin_unlock_irq(&die_lock);
+ oops_exit();
+
+ if (in_interrupt())
+ panic("Fatal exception in interrupt");
+ if (panic_on_oops)
+ panic("Fatal exception");
+ if (ret != NOTIFY_STOP)
+ do_exit(SIGSEGV);
+}
+
+void arm64_notify_die(const char *str, struct pt_regs *regs,
+ struct siginfo *info, int err)
+{
+ if (user_mode(regs))
+ force_sig_info(info->si_signo, info, current);
+ else
+ die(str, regs, err);
+}
+
+asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
+{
+ siginfo_t info;
+ void __user *pc = (void __user *)instruction_pointer(regs);
+
+#ifdef CONFIG_AARCH32_EMULATION
+ /* check for AArch32 breakpoint instructions */
+ if (compat_user_mode(regs) && aarch32_break_trap(regs) == 0)
+ return;
+#endif
+
+ if (show_unhandled_signals) {
+ pr_info("%s[%d]: undefined instruction: pc=%p\n",
+ current->comm, task_pid_nr(current), pc);
+ dump_instr(KERN_INFO, regs);
+ }
+
+ info.si_signo = SIGILL;
+ info.si_errno = 0;
+ info.si_code = ILL_ILLOPC;
+ info.si_addr = pc;
+
+ arm64_notify_die("Oops - undefined instruction", regs, &info, 0);
+}
+
+long compat_arm_syscall(struct pt_regs *regs);
+
+asmlinkage long do_ni_syscall(struct pt_regs *regs)
+{
+#ifdef CONFIG_AARCH32_EMULATION
+ long ret;
+ if (is_compat_task()) {
+ ret = compat_arm_syscall(regs);
+ if (ret != -ENOSYS)
+ return ret;
+ }
+#endif
+
+ if (show_unhandled_signals) {
+ pr_info("%s[%d]: syscall %d\n", current->comm,
+ task_pid_nr(current), (int)regs->syscallno);
+ dump_instr("", regs);
+ if (user_mode(regs))
+ __show_regs(regs);
+ }
+
+ return sys_ni_syscall();
+}
+
+/*
+ * bad_mode handles the impossible case in the exception vector.
+ */
+asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
+{
+ console_verbose();
+
+ pr_crit("Bad mode in %s handler detected, code 0x%08x\n",
+ handler[reason], esr);
+
+ die("Oops - bad mode", regs, 0);
+ local_irq_disable();
+ panic("bad mode");
+}
+
+
+void __bad_xchg(volatile void *ptr, int size)
+{
+ printk("xchg: bad data size: pc 0x%p, ptr 0x%p, size %d\n",
+ __builtin_return_address(0), ptr, size);
+ BUG();
+}
+EXPORT_SYMBOL(__bad_xchg);
+
+void __pte_error(const char *file, int line, unsigned long val)
+{
+ printk("%s:%d: bad pte %016lx.\n", file, line, val);
+}
+
+void __pmd_error(const char *file, int line, unsigned long val)
+{
+ printk("%s:%d: bad pmd %016lx.\n", file, line, val);
+}
+
+void __pgd_error(const char *file, int line, unsigned long val)
+{
+ printk("%s:%d: bad pgd %016lx.\n", file, line, val);
+}
+
+void __init trap_init(void)
+{
+ return;
+}

2012-08-14 17:54:45

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 02/31] arm64: Kernel booting and initialisation

This patch adds the kernel booting and initial setup code.
Documentation/arm64/booting.txt describes the booting protocol for the
AArch64 Linux kernel. This is subject to change following the work on
boot standardisation and ACPI.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
Documentation/arm64/booting.txt | 141 +++++++++++
arch/arm64/include/asm/setup.h | 26 ++
arch/arm64/kernel/head.S | 521 +++++++++++++++++++++++++++++++++++++++
arch/arm64/kernel/setup.c | 357 +++++++++++++++++++++++++++
4 files changed, 1045 insertions(+), 0 deletions(-)
create mode 100644 Documentation/arm64/booting.txt
create mode 100644 arch/arm64/include/asm/setup.h
create mode 100644 arch/arm64/kernel/head.S
create mode 100644 arch/arm64/kernel/setup.c

diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
new file mode 100644
index 0000000..3197820
--- /dev/null
+++ b/Documentation/arm64/booting.txt
@@ -0,0 +1,141 @@
+ Booting AArch64 Linux
+ =====================
+
+Author: Will Deacon <[email protected]>
+Date : 25 April 2012
+
+This document is based on the ARM booting document by Russell King and
+is relevant to all public releases of the AArch64 Linux kernel.
+
+The AArch64 exception model is made up of a number of exception levels
+(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure
+counterpart. EL2 is the hypervisor level and exists only in non-secure
+mode. EL3 is the highest priority level and exists only in secure mode.
+
+For the purposes of this document, we will use the term `boot loader'
+simply to define all software that executes on the CPU(s) before control
+is passed to the Linux kernel. This may include secure monitor and
+hypervisor code, or it may just be a handful of instructions for
+preparing a minimal boot environment.
+
+Essentially, the boot loader should provide (as a minimum) the
+following:
+
+1. Setup and initialise the RAM
+2. Setup the device tree
+3. Decompress the kernel image
+4. Call the kernel image
+
+
+1. Setup and initialise RAM
+---------------------------
+
+Requirement: MANDATORY
+
+The boot loader is expected to find and initialise all RAM that the
+kernel will use for volatile data storage in the system. It performs
+this in a machine dependent manner. (It may use internal algorithms
+to automatically locate and size all RAM, or it may use knowledge of
+the RAM in the machine, or any other method the boot loader designer
+sees fit.)
+
+
+2. Setup the device tree
+-------------------------
+
+Requirement: MANDATORY
+
+The device tree blob (dtb) must be no bigger than 2 megabytes in size
+and placed at a 2-megabyte boundary within the first 512 megabytes from
+the start of the kernel image. This is to allow the kernel to map the
+blob using a single section mapping in the initial page tables.
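+
+As an illustration only (this is not an additional requirement, and the
+names below are made up), a boot loader might pick a dtb location as
+follows:
+
+	#define SZ_2M		0x00200000UL
+	#define SZ_512M		0x20000000UL
+
+	/* ram_start is the 2MB-aligned start of system RAM */
+	static unsigned long pick_dtb_addr(unsigned long ram_start,
+					   unsigned long preferred_offset)
+	{
+		/* round up to the next 2MB boundary */
+		unsigned long off = (preferred_offset + SZ_2M - 1) & ~(SZ_2M - 1);
+
+		if (off >= SZ_512M)
+			return 0;	/* too far from the kernel image */
+		return ram_start + off;
+	}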
+
+
+3. Decompress the kernel image
+------------------------------
+
+Requirement: OPTIONAL
+
+The AArch64 kernel does not provide a decompressor and therefore
+requires gzip decompression to be performed by the boot loader if the
+default Image.gz target is used. For bootloaders that do not implement
+this requirement, the larger Image target is available instead.
+
+
+4. Call the kernel image
+------------------------
+
+Requirement: MANDATORY
+
+The decompressed kernel image contains a 32-byte header as follows:
+
+ u32 magic = 0x14000008; /* branch to stext, little-endian */
+ u32 res0 = 0; /* reserved */
+ u64 text_offset; /* Image load offset */
+ u64 res1 = 0; /* reserved */
+ u64 res2 = 0; /* reserved */
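+
+For illustration only (the kernel emits these fields directly from assembly
+in head.S; this struct is not a kernel API), the header can be viewed as:
+
+	struct arm64_image_header {
+		u32	magic;		/* 0x14000008: "b stext", i.e. a
+					   branch over this 32-byte header */
+		u32	res0;		/* reserved */
+		u64	text_offset;	/* image load offset */
+		u64	res1;		/* reserved */
+		u64	res2;		/* reserved */
+	};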
+
+The image must be placed at the specified offset (currently 0x80000)
+from the start of the system RAM and called there. The start of the
+system RAM must be aligned to 2MB.
+
+Before jumping into the kernel, the following conditions must be met:
+
+- Quiesce all DMA capable devices so that memory does not get
+ corrupted by bogus network packets or disk data. This will save
+ you many hours of debug.
+
+- Primary CPU general-purpose register settings
+ x0 = physical address of device tree blob (dtb) in system RAM.
+
+- CPU mode
+ All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError,
+ IRQ and FIQ).
+ The CPU must be in either EL2 (RECOMMENDED in order to have access to
+ the virtualisation extensions) or non-secure EL1.
+
+- Caches, MMUs
+ The MMU must be off.
+ Instruction cache may be on or off.
+ Data cache must be off and invalidated.
+
+- Architected timers
+ CNTFRQ must be programmed with the timer frequency.
+ If entering the kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0)
+ set where available.
+
+- Coherency
+ All CPUs to be booted by the kernel must be part of the same coherency
+ domain on entry to the kernel. This may require IMPLEMENTATION DEFINED
+ initialisation to enable the receiving of maintenance operations on
+ each CPU.
+
+- System registers
+ All writable architected system registers at the exception level where
+ the kernel image will be entered must be initialised by software at a
+ higher exception level to prevent execution in an UNKNOWN state.
+
+The boot loader is expected to enter the kernel on each CPU in the
+following manner:
+
+- The primary CPU must jump directly to the first instruction of the
+ kernel image. The device tree blob passed by this CPU must contain
+ for each CPU node:
+
+ 1. An 'enable-method' property. Currently, the only supported value
+ for this field is the string "spin-table".
+
+ 2. A 'cpu-release-addr' property identifying a 64-bit,
+ zero-initialised memory location.
+
+ It is expected that the bootloader will generate these device tree
+ properties and insert them into the blob prior to kernel entry.
+
+- Any secondary CPUs must spin outside of the kernel in a reserved area
+ of memory (communicated to the kernel by a /memreserve/ region in the
+ device tree) polling their cpu-release-addr location, which must be
+ contained in the reserved region. A wfe instruction may be inserted
+ to reduce the overhead of the busy-loop and a sev will be issued by
+ the primary CPU. When a read of the location pointed to by the
+ cpu-release-addr returns a non-zero value, the CPU must jump directly
+ to this value.
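+
+  For illustration only, the loop run by each secondary CPU might look
+  roughly like the following sketch (the names are hypothetical and this
+  is not code from the kernel):
+
+	/* polls the 64-bit, zero-initialised cpu-release-addr location */
+	void secondary_spin(volatile u64 *cpu_release_addr)
+	{
+		u64 entry;
+
+		while ((entry = *cpu_release_addr) == 0)
+			asm volatile("wfe" ::: "memory");
+
+		/* jump directly to the value read from cpu-release-addr */
+		((void (*)(void))entry)();
+	}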
diff --git a/arch/arm64/include/asm/setup.h b/arch/arm64/include/asm/setup.h
new file mode 100644
index 0000000..d766493
--- /dev/null
+++ b/arch/arm64/include/asm/setup.h
@@ -0,0 +1,26 @@
+/*
+ * Based on arch/arm/include/asm/setup.h
+ *
+ * Copyright (C) 1997-1999 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SETUP_H
+#define __ASM_SETUP_H
+
+#include <linux/types.h>
+
+#define COMMAND_LINE_SIZE 1024
+
+#endif
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
new file mode 100644
index 0000000..34ccdc0
--- /dev/null
+++ b/arch/arm64/kernel/head.S
@@ -0,0 +1,521 @@
+/*
+ * Low-level CPU initialisation
+ * Based on arch/arm/kernel/head.S
+ *
+ * Copyright (C) 1994-2002 Russell King
+ * Copyright (C) 2003-2012 ARM Ltd.
+ * Authors: Catalin Marinas <[email protected]>
+ * Will Deacon <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+#include <linux/init.h>
+
+#include <asm/assembler.h>
+#include <asm/ptrace.h>
+#include <asm/asm-offsets.h>
+#include <asm/memory.h>
+#include <asm/thread_info.h>
+#include <asm/pgtable-hwdef.h>
+#include <asm/pgtable.h>
+#include <asm/page.h>
+
+/*
+ * swapper_pg_dir is the virtual address of the initial page table. We place
+ * the page tables 3 * PAGE_SIZE below KERNEL_RAM_VADDR. The idmap_pg_dir has
+ * 2 pages and is placed below swapper_pg_dir.
+ */
+#define KERNEL_RAM_VADDR (PAGE_OFFSET + TEXT_OFFSET)
+
+#if (KERNEL_RAM_VADDR & 0xfffff) != 0x80000
+#error KERNEL_RAM_VADDR must start at 0xXXX80000
+#endif
+
+#define SWAPPER_DIR_SIZE (3 * PAGE_SIZE)
+#define IDMAP_DIR_SIZE (2 * PAGE_SIZE)
+
+ .globl swapper_pg_dir
+ .equ swapper_pg_dir, KERNEL_RAM_VADDR - SWAPPER_DIR_SIZE
+
+ .globl idmap_pg_dir
+ .equ idmap_pg_dir, swapper_pg_dir - IDMAP_DIR_SIZE
+
+ .macro pgtbl, ttb0, ttb1, phys
+ add \ttb1, \phys, #TEXT_OFFSET - SWAPPER_DIR_SIZE
+ sub \ttb0, \ttb1, #IDMAP_DIR_SIZE
+ .endm
+
+#ifdef CONFIG_ARM64_64K_PAGES
+#define BLOCK_SHIFT PAGE_SHIFT
+#define BLOCK_SIZE PAGE_SIZE
+#else
+#define BLOCK_SHIFT SECTION_SHIFT
+#define BLOCK_SIZE SECTION_SIZE
+#endif
+
+#define KERNEL_START KERNEL_RAM_VADDR
+#define KERNEL_END _end
+
+/*
+ * Initial memory map attributes.
+ */
+#ifndef CONFIG_SMP
+#define PTE_FLAGS PTE_ATTRINDX(MT_NORMAL) | PTE_AF
+#define PMD_FLAGS PMD_ATTRINDX(MT_NORMAL) | PMD_SECT_AF
+#else
+#define PTE_FLAGS PTE_ATTRINDX(MT_NORMAL) | PTE_AF | PTE_SHARED
+#define PMD_FLAGS PMD_ATTRINDX(MT_NORMAL) | PMD_SECT_AF | PMD_SECT_S
+#endif
+
+#ifdef CONFIG_ARM64_64K_PAGES
+#define MM_MMUFLAGS PTE_TYPE_PAGE | PTE_FLAGS
+#define IO_MMUFLAGS PTE_TYPE_PAGE | PTE_XN | PTE_FLAGS
+#else
+#define MM_MMUFLAGS PMD_TYPE_SECT | PMD_FLAGS
+#define IO_MMUFLAGS PMD_TYPE_SECT | PMD_SECT_XN | PMD_FLAGS
+#endif
+
+/*
+ * Kernel startup entry point.
+ * ---------------------------
+ *
+ * The requirements are:
+ * MMU = off, D-cache = off, I-cache = on or off,
+ * x0 = physical address of the FDT blob.
+ *
+ * This code is mostly position independent so you call this at
+ * __pa(PAGE_OFFSET + TEXT_OFFSET).
+ *
+ * Note that the callee-saved registers are used for storing variables
+ * that are useful before the MMU is enabled. The allocations are described
+ * in the entry routines.
+ */
+ __HEAD
+
+ /*
+ * DO NOT MODIFY. Image header expected by Linux boot-loaders.
+ */
+ b stext // branch to kernel start, magic
+ .long 0 // reserved
+ .quad TEXT_OFFSET // Image load offset from start of RAM
+ .quad 0 // reserved
+ .quad 0 // reserved
+
+ENTRY(stext)
+ mov x21, x0 // x21=FDT
+ bl el2_setup // Drop to EL1
+ mrs x22, midr_el1 // x22=cpuid
+ mov x0, x22
+ bl __lookup_processor_type
+ mov x23, x0 // x23=procinfo
+ cbz x23, __error_p // invalid processor (x23=0)?
+ bl __calc_phys_offset // x24=phys offset
+ bl __vet_fdt
+ bl __create_page_tables // x25=TTBR0, x26=TTBR1
+ /*
+ * The following calls CPU specific code in a position independent
+ * manner. See arch/arm64/mm/proc.S for details. x23 = base of
+ * cpu_proc_info structure selected by __lookup_processor_type above.
+ * On return, the CPU will be ready for the MMU to be turned on and
+ * the TCR will have been set.
+ */
+ ldr x27, __switch_data // address to jump to after
+ // MMU has been enabled
+ adr lr, __enable_mmu // return (PIC) address
+ add x12, x23, #PROCINFO_INITFUNC
+ br x12 // initialise processor
+ENDPROC(stext)
+
+/*
+ * If we're fortunate enough to boot at EL2, ensure that the world is
+ * sane before dropping to EL1.
+ */
+ENTRY(el2_setup)
+ mrs x0, CurrentEL
+ cmp x0, #PSR_MODE_EL2t
+ ccmp x0, #PSR_MODE_EL2h, #0x4, ne
+ b.eq 1f
+ ret
+
+ /* Hyp configuration. */
+1: mov x0, #(1 << 31) // 64-bit EL1
+ msr hcr_el2, x0
+
+ /* Generic timers. */
+ mrs x0, cnthctl_el2
+ orr x0, x0, #3 // Enable EL1 physical timers
+ msr cnthctl_el2, x0
+
+ /* Populate ID registers. */
+ mrs x0, midr_el1
+ mrs x1, mpidr_el1
+ msr vpidr_el2, x0
+ msr vmpidr_el2, x1
+
+ /* sctlr_el1 */
+ mov x0, #0x0800 // Set/clear RES{1,0} bits
+ movk x0, #0x30d0, lsl #16
+ msr sctlr_el1, x0
+
+ /* Coprocessor traps. */
+ mov x0, #0x33ff
+ msr cptr_el2, x0 // Disable copro. traps to EL2
+
+#ifdef CONFIG_AARCH32_EMULATION
+ msr hstr_el2, xzr // Disable CP15 traps to EL2
+#endif
+
+ /* spsr */
+ mov x0, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
+ PSR_MODE_EL1h)
+ msr spsr_el2, x0
+ msr elr_el2, lr
+ eret
+ENDPROC(el2_setup)
+
+ .align 3
+2: .quad .
+ .quad PAGE_OFFSET
+
+#ifdef CONFIG_SMP
+ .pushsection .smp.pen.text, "ax"
+ .align 3
+1: .quad .
+ .quad secondary_holding_pen_release
+
+ /*
+ * This provides a "holding pen" in which all secondary cores are held
+ * until we're ready for them to initialise.
+ */
+ENTRY(secondary_holding_pen)
+ bl el2_setup // Drop to EL1
+ mrs x0, mpidr_el1
+ and x0, x0, #15 // CPU number
+ adr x1, 1b
+ ldp x2, x3, [x1]
+ sub x1, x1, x2
+ add x3, x3, x1
+pen: ldr x4, [x3]
+ cmp x4, x0
+ b.eq secondary_startup
+ wfe
+ b pen
+ENDPROC(secondary_holding_pen)
+ .popsection
+
+ENTRY(secondary_startup)
+ /*
+ * Common entry point for secondary CPUs.
+ */
+ mrs x22, midr_el1 // x22=cpuid
+ mov x0, x22
+ bl __lookup_processor_type
+ mov x23, x0 // x23=procinfo
+ cbz x23, __error_p // invalid processor (x23=0)?
+
+ bl __calc_phys_offset // x24=phys offset
+ pgtbl x25, x26, x24 // x25=TTBR0, x26=TTBR1
+ add x12, x23, #PROCINFO_INITFUNC
+ blr x12 // initialise processor
+
+ ldr x21, =secondary_data
+ ldr x27, =__secondary_switched // address to jump to after enabling the MMU
+ b __enable_mmu
+ENDPROC(secondary_startup)
+
+ENTRY(__secondary_switched)
+ ldr x0, [x21] // get secondary_data.stack
+ mov sp, x0
+ mov x29, #0
+ b secondary_start_kernel
+ENDPROC(__secondary_switched)
+#endif /* CONFIG_SMP */
+
+/*
+ * Setup common bits before finally enabling the MMU. Essentially this is just
+ * loading the page table pointer and vector base registers.
+ *
+ * On entry to this code, x0 must contain the SCTLR_EL1 value for turning on
+ * the MMU.
+ */
+__enable_mmu:
+ ldr x5, =vectors
+ msr vbar_el1, x5
+ msr ttbr0_el1, x25 // load TTBR0
+ msr ttbr1_el1, x26 // load TTBR1
+ isb
+ b __turn_mmu_on
+ENDPROC(__enable_mmu)
+
+/*
+ * Enable the MMU. This completely changes the structure of the visible memory
+ * space. You will not be able to trace execution through this.
+ *
+ * x0 = system control register
+ * x27 = *virtual* address to jump to upon completion
+ *
+ * other registers depend on the function called upon completion
+ */
+ .align 6
+__turn_mmu_on:
+ msr sctlr_el1, x0
+ isb
+ br x27
+ENDPROC(__turn_mmu_on)
+
+/*
+ * Calculate the start of physical memory.
+ */
+__calc_phys_offset:
+ adr x0, 1f
+ ldp x1, x2, [x0]
+ sub x3, x0, x1 // PHYS_OFFSET - PAGE_OFFSET
+ add x24, x2, x3 // x24=PHYS_OFFSET
+ ret
+ENDPROC(__calc_phys_offset)
+
+ .align 3
+1: .quad .
+ .quad PAGE_OFFSET
+
+/*
+ * Macro to populate the PGD for the corresponding block entry in the next
+ * level (tbl) for the given virtual address.
+ *
+ * Preserves: pgd, tbl, virt
+ * Corrupts: tmp1, tmp2
+ */
+ .macro create_pgd_entry, pgd, tbl, virt, tmp1, tmp2
+ lsr \tmp1, \virt, #PGDIR_SHIFT
+ and \tmp1, \tmp1, #PTRS_PER_PGD - 1 // PGD index
+ orr \tmp2, \tbl, #3 // PGD entry table type
+ str \tmp2, [\pgd, \tmp1, lsl #3]
+ .endm
+
+/*
+ * Macro to populate block entries in the page table for the start..end
+ * virtual range (inclusive).
+ *
+ * Preserves: tbl, flags
+ * Corrupts: phys, start, end, pstate
+ */
+ .macro create_block_map, tbl, flags, phys, start, end, idmap=0
+ lsr \phys, \phys, #BLOCK_SHIFT
+ .if \idmap
+ and \start, \phys, #PTRS_PER_PTE - 1 // table index
+ .else
+ lsr \start, \start, #BLOCK_SHIFT
+ and \start, \start, #PTRS_PER_PTE - 1 // table index
+ .endif
+ orr \phys, \flags, \phys, lsl #BLOCK_SHIFT // table entry
+ .ifnc \start,\end
+ lsr \end, \end, #BLOCK_SHIFT
+ and \end, \end, #PTRS_PER_PTE - 1 // table end index
+ .endif
+9999: str \phys, [\tbl, \start, lsl #3] // store the entry
+ .ifnc \start,\end
+ add \start, \start, #1 // next entry
+ add \phys, \phys, #BLOCK_SIZE // next block
+ cmp \start, \end
+ b.ls 9999b
+ .endif
+ .endm
+
+/*
+ * Setup the initial page tables. We only setup the barest amount which is
+ * required to get the kernel running. The following sections are required:
+ * - identity mapping to enable the MMU (low address, TTBR0)
+ * - first few MB of the kernel linear mapping to jump to once the MMU has
+ * been enabled, including the FDT blob (TTBR1)
+ */
+__create_page_tables:
+ pgtbl x25, x26, x24 // idmap_pg_dir and swapper_pg_dir addresses
+
+ /*
+ * Clear the idmap and swapper page tables.
+ */
+ mov x0, x25
+ add x6, x26, #SWAPPER_DIR_SIZE
+1: stp xzr, xzr, [x0], #16
+ stp xzr, xzr, [x0], #16
+ stp xzr, xzr, [x0], #16
+ stp xzr, xzr, [x0], #16
+ cmp x0, x6
+ b.lo 1b
+
+ ldr x7, =MM_MMUFLAGS
+
+ /*
+ * Create the identity mapping.
+ */
+ add x0, x25, #PAGE_SIZE // section table address
+ adr x3, __turn_mmu_on // virtual/physical address
+ create_pgd_entry x25, x0, x3, x5, x6
+ create_block_map x0, x7, x3, x5, x5, idmap=1
+
+ /*
+ * Map the kernel image (starting with PHYS_OFFSET).
+ */
+ add x0, x26, #PAGE_SIZE // section table address
+ mov x5, #PAGE_OFFSET
+ create_pgd_entry x26, x0, x5, x3, x6
+ ldr x6, =KERNEL_END - 1
+ mov x3, x24 // phys offset
+ create_block_map x0, x7, x3, x5, x6
+
+ /*
+ * Map the FDT blob (maximum 2MB; must be within 512MB of
+ * PHYS_OFFSET).
+ */
+ mov x3, x21 // FDT phys address
+ and x3, x3, #~((1 << 21) - 1) // 2MB aligned
+ mov x6, #PAGE_OFFSET
+ sub x5, x3, x24 // subtract PHYS_OFFSET
+ tst x5, #~((1 << 29) - 1) // within 512MB?
+ csel x21, xzr, x21, ne // zero the FDT pointer
+ b.ne 1f
+ add x5, x5, x6 // __va(FDT blob)
+ add x6, x5, #1 << 21 // 2MB for the FDT blob
+ sub x6, x6, #1 // inclusive range
+ create_block_map x0, x7, x3, x5, x6
+1:
+ ret
+ENDPROC(__create_page_tables)
+ .ltorg
+
+ .align 3
+ .type __switch_data, %object
+__switch_data:
+ .quad __mmap_switched
+ .quad __data_loc // x4
+ .quad _data // x5
+ .quad __bss_start // x6
+ .quad _end // x7
+ .quad processor_id // x4
+ .quad __fdt_pointer // x5
+ .quad memstart_addr // x6
+ .quad init_thread_union + THREAD_START_SP // sp
+
+/*
+ * The following fragment of code is executed with the MMU on, and
+ * uses absolute addresses; this is not position independent.
+ */
+__mmap_switched:
+ adr x3, __switch_data + 8
+
+ ldp x4, x5, [x3], #16
+ ldp x6, x7, [x3], #16
+ cmp x4, x5 // Copy data segment if needed
+1: ccmp x5, x6, #4, ne
+ b.eq 2f
+ ldr x16, [x4], #8
+ str x16, [x5], #8
+ b 1b
+2:
+1: cmp x6, x7
+ b.hs 2f
+ str xzr, [x6], #8 // Clear BSS
+ b 1b
+2:
+ ldp x4, x5, [x3], #16
+ ldr x6, [x3], #8
+ ldr x16, [x3]
+ mov sp, x16
+ str x22, [x4] // Save processor ID
+ str x21, [x5] // Save FDT pointer
+ str x24, [x6] // Save PHYS_OFFSET
+ mov x29, #0
+ b start_kernel
+ENDPROC(__mmap_switched)
+
+/*
+ * Exception handling. Something went wrong and we can't proceed. We ought to
+ * tell the user, but since we don't have any guarantee that we're even
+ * running on the right architecture, we do virtually nothing.
+ */
+__error_p:
+ENDPROC(__error_p)
+
+__error:
+1: nop
+ b 1b
+ENDPROC(__error)
+
+/*
+ * Read processor ID register and look up in the linker-built supported
+ * processor list. Note that we can't use the absolute addresses for the
+ * __proc_info lists since we aren't running with the MMU on (and therefore,
+ * we are not in the correct address space). We have to calculate the offset.
+ *
+ * This routine can be called via C code, so to avoid needlessly saving
+ * callee-saved registers, we take the CPUID in x0 and return the physical
+ * proc_info pointer in x0 as well.
+ */
+__lookup_processor_type:
+ adr x1, __lookup_processor_type_data
+ ldr x2, [x1]
+ ldp x3, x4, [x1, #8]
+ sub x1, x1, x2 // get offset between virt&phys
+ add x3, x3, x1 // convert virt addresses to
+ add x4, x4, x1 // physical address space
+1:
+ ldp w5, w6, [x3] // load cpu_val and cpu_mask
+ and x6, x6, x0
+ cmp x5, x6
+ b.eq 2f
+ add x3, x3, #PROC_INFO_SZ
+ cmp x3, x4
+ b.lo 1b
+ mov x3, #0 // unknown processor
+2:
+ mov x0, x3
+ ret
+ENDPROC(__lookup_processor_type)
+
+/*
+ * This provides a C-API version of the above function.
+ */
+ENTRY(lookup_processor_type)
+ mov x8, lr
+ bl __lookup_processor_type
+ ret x8
+ENDPROC(lookup_processor_type)
+
+ .align 3
+ .type __lookup_processor_type_data, %object
+__lookup_processor_type_data:
+ .quad .
+ .quad __proc_info_begin
+ .quad __proc_info_end
+ .size __lookup_processor_type_data, . - __lookup_processor_type_data
+
+/*
+ * Determine validity of the x21 FDT pointer.
+ * The dtb must be 8-byte aligned and live in the first 512M of memory.
+ */
+__vet_fdt:
+ tst x21, #0x7
+ b.ne 1f
+ cmp x21, x24
+ b.lt 1f
+ mov x0, #(1 << 29)
+ add x0, x0, x24
+ cmp x21, x0
+ b.ge 1f
+ ret
+1:
+ mov x21, #0
+ ret
+ENDPROC(__vet_fdt)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
new file mode 100644
index 0000000..f25186f
--- /dev/null
+++ b/arch/arm64/kernel/setup.c
@@ -0,0 +1,357 @@
+/*
+ * Based on arch/arm/kernel/setup.c
+ *
+ * Copyright (C) 1995-2001 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/export.h>
+#include <linux/kernel.h>
+#include <linux/stddef.h>
+#include <linux/ioport.h>
+#include <linux/delay.h>
+#include <linux/utsname.h>
+#include <linux/initrd.h>
+#include <linux/console.h>
+#include <linux/bootmem.h>
+#include <linux/seq_file.h>
+#include <linux/screen_info.h>
+#include <linux/init.h>
+#include <linux/kexec.h>
+#include <linux/crash_dump.h>
+#include <linux/root_dev.h>
+#include <linux/cpu.h>
+#include <linux/interrupt.h>
+#include <linux/smp.h>
+#include <linux/fs.h>
+#include <linux/proc_fs.h>
+#include <linux/memblock.h>
+#include <linux/of_fdt.h>
+
+#include <asm/cputype.h>
+#include <asm/elf.h>
+#include <asm/procinfo.h>
+#include <asm/sections.h>
+#include <asm/setup.h>
+#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
+#include <asm/traps.h>
+#include <asm/memblock.h>
+
+extern void paging_init(void);
+
+unsigned int processor_id;
+EXPORT_SYMBOL(processor_id);
+
+unsigned int elf_hwcap __read_mostly;
+EXPORT_SYMBOL(elf_hwcap);
+
+static const char *cpu_name;
+static const char *machine_name;
+phys_addr_t __fdt_pointer __initdata;
+
+/*
+ * Standard memory resources
+ */
+static struct resource mem_res[] = {
+ {
+ .name = "Kernel code",
+ .start = 0,
+ .end = 0,
+ .flags = IORESOURCE_MEM
+ },
+ {
+ .name = "Kernel data",
+ .start = 0,
+ .end = 0,
+ .flags = IORESOURCE_MEM
+ }
+};
+
+#define kernel_code mem_res[0]
+#define kernel_data mem_res[1]
+
+/*
+ * These functions re-use the assembly code in head.S, which
+ * already provides the required functionality.
+ */
+extern struct proc_info_list *lookup_processor_type(unsigned int);
+
+void __init early_print(const char *str, ...)
+{
+ char buf[256];
+ va_list ap;
+
+ va_start(ap, str);
+ vsnprintf(buf, sizeof(buf), str, ap);
+ va_end(ap);
+
+ printk("%s", buf);
+}
+
+static void __init setup_processor(void)
+{
+ struct proc_info_list *list;
+
+ /*
+ * locate processor in the list of supported processor
+ * types. The linker builds this table for us from the
+ * entries in arch/arm64/mm/proc.S
+ */
+ list = lookup_processor_type(read_cpuid_id());
+ if (!list) {
+ printk("CPU configuration botched (ID %08x), unable to continue.\n",
+ read_cpuid_id());
+ while (1);
+ }
+
+ cpu_name = list->cpu_name;
+
+ printk("CPU: %s [%08x] revision %d\n",
+ cpu_name, read_cpuid_id(), read_cpuid_id() & 15);
+
+ sprintf(init_utsname()->machine, "aarch64");
+ elf_hwcap = 0;
+
+ cpu_proc_init();
+}
+
+static void __init setup_machine_fdt(phys_addr_t dt_phys)
+{
+ struct boot_param_header *devtree;
+ unsigned long dt_root;
+
+ /* Check we have a non-NULL DT pointer */
+ if (!dt_phys) {
+ early_print("\n"
+ "Error: NULL or invalid device tree blob\n"
+ "The dtb must be 8-byte aligned and passed in the first 512MB of memory\n"
+ "\nPlease check your bootloader.\n");
+
+ while (true)
+ cpu_relax();
+
+ }
+
+ devtree = phys_to_virt(dt_phys);
+
+ /* Check device tree validity */
+ if (be32_to_cpu(devtree->magic) != OF_DT_HEADER) {
+ early_print("\n"
+ "Error: invalid device tree blob at physical address 0x%p (virtual address 0x%p)\n"
+ "Expected 0x%x, found 0x%x\n"
+ "\nPlease check your bootloader.\n",
+ dt_phys, devtree, OF_DT_HEADER,
+ be32_to_cpu(devtree->magic));
+
+ while (true)
+ cpu_relax();
+ }
+
+ initial_boot_params = devtree;
+ dt_root = of_get_flat_dt_root();
+
+ machine_name = of_get_flat_dt_prop(dt_root, "model", NULL);
+ if (!machine_name)
+ machine_name = of_get_flat_dt_prop(dt_root, "compatible", NULL);
+ if (!machine_name)
+ machine_name = "<unknown>";
+ pr_info("Machine: %s\n", machine_name);
+
+ /* Retrieve various information from the /chosen node */
+ of_scan_flat_dt(early_init_dt_scan_chosen, boot_command_line);
+ /* Initialize {size,address}-cells info */
+ of_scan_flat_dt(early_init_dt_scan_root, NULL);
+ /* Setup memory, calling early_init_dt_add_memory_arch */
+ of_scan_flat_dt(early_init_dt_scan_memory, NULL);
+}
+
+void __init early_init_dt_add_memory_arch(u64 base, u64 size)
+{
+ size &= PAGE_MASK;
+ memblock_add(base, size);
+}
+
+void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
+{
+ return __va(memblock_alloc(size, align));
+}
+
+/*
+ * Limit the memory size that was specified via FDT.
+ */
+static int __init early_mem(char *p)
+{
+ phys_addr_t limit;
+
+ if (!p)
+ return 1;
+
+ limit = memparse(p, &p) & PAGE_MASK;
+ pr_notice("Memory limited to %lldMB\n", limit >> 20);
+
+ memblock_enforce_memory_limit(limit);
+
+ return 0;
+}
+early_param("mem", early_mem);
+
+static void __init request_standard_resources(void)
+{
+ struct memblock_region *region;
+ struct resource *res;
+
+ kernel_code.start = virt_to_phys(_text);
+ kernel_code.end = virt_to_phys(_etext - 1);
+ kernel_data.start = virt_to_phys(_sdata);
+ kernel_data.end = virt_to_phys(_end - 1);
+
+ for_each_memblock(memory, region) {
+ res = alloc_bootmem_low(sizeof(*res));
+ res->name = "System RAM";
+ res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
+ res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
+ res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+
+ request_resource(&iomem_resource, res);
+
+ if (kernel_code.start >= res->start &&
+ kernel_code.end <= res->end)
+ request_resource(res, &kernel_code);
+ if (kernel_data.start >= res->start &&
+ kernel_data.end <= res->end)
+ request_resource(res, &kernel_data);
+ }
+}
+
+void __init setup_arch(char **cmdline_p)
+{
+ setup_processor();
+
+ setup_machine_fdt(__fdt_pointer);
+
+ init_mm.start_code = (unsigned long) _text;
+ init_mm.end_code = (unsigned long) _etext;
+ init_mm.end_data = (unsigned long) _edata;
+ init_mm.brk = (unsigned long) _end;
+
+ *cmdline_p = boot_command_line;
+
+ parse_early_param();
+
+ arm64_memblock_init();
+
+ paging_init();
+ request_standard_resources();
+
+ unflatten_device_tree();
+
+#ifdef CONFIG_SMP
+ smp_init_cpus();
+#endif
+
+#ifdef CONFIG_VT
+#if defined(CONFIG_VGA_CONSOLE)
+ conswitchp = &vga_con;
+#elif defined(CONFIG_DUMMY_CONSOLE)
+ conswitchp = &dummy_con;
+#endif
+#endif
+}
+
+static DEFINE_PER_CPU(struct cpu, cpu_data);
+
+static int __init topology_init(void)
+{
+ int i;
+
+ for_each_possible_cpu(i) {
+ struct cpu *cpu = &per_cpu(cpu_data, i);
+ cpu->hotpluggable = 1;
+ register_cpu(cpu, i);
+ }
+
+ return 0;
+}
+subsys_initcall(topology_init);
+
+static const char *hwcap_str[] = {
+ "fp",
+ "asimd",
+ NULL
+};
+
+static int c_show(struct seq_file *m, void *v)
+{
+ int i;
+
+ seq_printf(m, "Processor\t: %s rev %d (%s)\n",
+ cpu_name, read_cpuid_id() & 15, ELF_PLATFORM);
+
+ for_each_online_cpu(i) {
+ /*
+ * glibc reads /proc/cpuinfo to determine the number of
+ * online processors, looking for lines beginning with
+ * "processor". Give glibc what it expects.
+ */
+#ifdef CONFIG_SMP
+ seq_printf(m, "processor\t: %d\n", i);
+#endif
+ seq_printf(m, "BogoMIPS\t: %lu.%02lu\n\n",
+ loops_per_jiffy / (500000UL/HZ),
+ loops_per_jiffy / (5000UL/HZ) % 100);
+ }
+
+ /* dump out the processor features */
+ seq_puts(m, "Features\t: ");
+
+ for (i = 0; hwcap_str[i]; i++)
+ if (elf_hwcap & (1 << i))
+ seq_printf(m, "%s ", hwcap_str[i]);
+
+ seq_printf(m, "\nCPU implementer\t: 0x%02x\n", read_cpuid_id() >> 24);
+ seq_printf(m, "CPU architecture: AArch64\n");
+ seq_printf(m, "CPU variant\t: 0x%x\n", (read_cpuid_id() >> 20) & 15);
+ seq_printf(m, "CPU part\t: 0x%03x\n", (read_cpuid_id() >> 4) & 0xfff);
+ seq_printf(m, "CPU revision\t: %d\n", read_cpuid_id() & 15);
+
+ seq_puts(m, "\n");
+
+ seq_printf(m, "Hardware\t: %s\n", machine_name);
+
+ return 0;
+}
+
+static void *c_start(struct seq_file *m, loff_t *pos)
+{
+ return *pos < 1 ? (void *)1 : NULL;
+}
+
+static void *c_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ ++*pos;
+ return NULL;
+}
+
+static void c_stop(struct seq_file *m, void *v)
+{
+}
+
+const struct seq_operations cpuinfo_op = {
+ .start = c_start,
+ .next = c_next,
+ .stop = c_stop,
+ .show = c_show
+};

2012-08-14 17:54:43

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 30/31] arm64: Build infrastructure

This patch adds Makefile and Kconfig files required for building an
AArch64 kernel.
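
With these files in place the kernel can be cross-compiled in the usual
kbuild fashion, e.g. (toolchain prefix depending on your compiler) with
"make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- generic_defconfig"
followed by "make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- Image.gz".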

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/Kconfig | 261 ++++++++++++++++++++++++++++++++++
arch/arm64/Kconfig.debug | 27 ++++
arch/arm64/Makefile | 71 +++++++++
arch/arm64/boot/.gitignore | 2 +
arch/arm64/boot/Makefile | 38 +++++
arch/arm64/boot/install.sh | 52 +++++++
arch/arm64/configs/generic_defconfig | 85 +++++++++++
arch/arm64/include/asm/prom.h | 1 +
arch/arm64/kernel/.gitignore | 1 +
arch/arm64/kernel/Makefile | 27 ++++
arch/arm64/kernel/vmlinux.lds.S | 146 +++++++++++++++++++
arch/arm64/mm/Kconfig | 5 +
arch/arm64/mm/Makefile | 6 +
init/Kconfig | 3 +-
lib/Kconfig.debug | 6 +-
15 files changed, 728 insertions(+), 3 deletions(-)
create mode 100644 arch/arm64/Kconfig
create mode 100644 arch/arm64/Kconfig.debug
create mode 100644 arch/arm64/Makefile
create mode 100644 arch/arm64/boot/.gitignore
create mode 100644 arch/arm64/boot/Makefile
create mode 100644 arch/arm64/boot/install.sh
create mode 100644 arch/arm64/configs/generic_defconfig
create mode 100644 arch/arm64/include/asm/prom.h
create mode 100644 arch/arm64/kernel/.gitignore
create mode 100644 arch/arm64/kernel/Makefile
create mode 100644 arch/arm64/kernel/vmlinux.lds.S
create mode 100644 arch/arm64/mm/Kconfig
create mode 100644 arch/arm64/mm/Makefile

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
new file mode 100644
index 0000000..1ce3d04
--- /dev/null
+++ b/arch/arm64/Kconfig
@@ -0,0 +1,261 @@
+config ARM64
+ def_bool y
+ select OF
+ select OF_EARLY_FLATTREE
+ select IRQ_DOMAIN
+ select HAVE_AOUT
+ select HAVE_DMA_ATTRS
+ select HAVE_DMA_API_DEBUG
+ select HAVE_IDE
+ select HAVE_MEMBLOCK
+ select RTC_LIB
+ select SYS_SUPPORTS_APM_EMULATION
+ select HAVE_GENERIC_DMA_COHERENT
+ select GENERIC_IOMAP
+ select HAVE_IRQ_WORK
+ select HAVE_PERF_EVENTS
+ select HAVE_ARCH_TRACEHOOK
+ select PERF_USE_VMALLOC
+ select HAVE_HW_BREAKPOINT if PERF_EVENTS
+ select HAVE_GENERIC_HARDIRQS
+ select GENERIC_HARDIRQS_NO_DEPRECATED
+ select HAVE_SPARSE_IRQ
+ select SPARSE_IRQ
+ select GENERIC_IRQ_SHOW
+ select GENERIC_SMP_IDLE_THREAD
+ select NO_BOOTMEM
+ help
+ ARM 64-bit (AArch64) Linux support.
+
+config 64BIT
+ def_bool y
+
+config ARCH_PHYS_ADDR_T_64BIT
+ def_bool y
+
+config HAVE_PWM
+ bool
+
+config SYS_SUPPORTS_APM_EMULATION
+ bool
+
+config NO_IOPORT
+ def_bool y
+
+config GENERIC_GPIO
+ bool
+
+config GENERIC_TIME_VSYSCALL
+ def_bool y
+
+config GENERIC_CLOCKEVENTS
+ def_bool y
+
+config STACKTRACE_SUPPORT
+ def_bool y
+
+config LOCKDEP_SUPPORT
+ def_bool y
+
+config TRACE_IRQFLAGS_SUPPORT
+ def_bool y
+
+config HARDIRQS_SW_RESEND
+ def_bool y
+
+config GENERIC_IRQ_PROBE
+ def_bool y
+
+config GENERIC_LOCKBREAK
+ def_bool y
+ depends on SMP && PREEMPT
+
+config RWSEM_GENERIC_SPINLOCK
+ def_bool y
+
+config RWSEM_XCHGADD_ALGORITHM
+ bool
+
+config ARCH_HAS_ILOG2_U32
+ bool
+
+config ARCH_HAS_ILOG2_U64
+ bool
+
+config ARCH_HAS_CPUFREQ
+ bool
+ help
+ Internal node to signify that the ARCH has CPUFREQ support
+ and that the relevant menu configurations are displayed for
+ it.
+
+config GENERIC_HWEIGHT
+ def_bool y
+
+config GENERIC_CSUM
+ def_bool y
+
+config GENERIC_CALIBRATE_DELAY
+ def_bool y
+
+config ZONE_DMA32
+ def_bool y
+
+config ARCH_DMA_ADDR_T_64BIT
+ def_bool y
+
+config NEED_DMA_MAP_STATE
+ def_bool y
+
+config NEED_SG_DMA_LENGTH
+ def_bool y
+
+config SWIOTLB
+ def_bool y
+
+config IOMMU_HELPER
+ def_bool SWIOTLB
+
+source "init/Kconfig"
+
+source "kernel/Kconfig.freezer"
+
+menu "System Type"
+
+source "arch/arm64/mm/Kconfig"
+
+endmenu
+
+menu "Bus support"
+
+config ARM_AMBA
+ bool
+
+endmenu
+
+menu "Kernel Features"
+
+source "kernel/time/Kconfig"
+
+config ARM64_64K_PAGES
+ bool "Enable 64KB pages support"
+ help
+ This feature enables 64KB pages support (4KB by default)
+ allowing only two levels of page tables and faster TLB
+ look-up. AArch32 emulation is not available when this feature
+ is enabled.
+
+config SMP
+ bool "Symmetric Multi-Processing"
+ depends on GENERIC_CLOCKEVENTS
+ select USE_GENERIC_SMP_HELPERS
+ help
+ This enables support for systems with more than one CPU. If
+ you say N here, the kernel will run on single and
+ multiprocessor machines, but will use only one CPU of a
+ multiprocessor machine. If you say Y here, the kernel will run
+ on many, but not all, single processor machines. On a single
+ processor machine, the kernel will run faster if you say N
+ here.
+
+ If you don't know what to do here, say N.
+
+config NR_CPUS
+ int "Maximum number of CPUs (2-32)"
+ range 2 32
+ depends on SMP
+ default "4"
+
+source kernel/Kconfig.preempt
+
+config HZ
+ int
+ default 100
+
+config ARCH_HAS_HOLES_MEMORYMODEL
+ def_bool y if SPARSEMEM
+
+config ARCH_SPARSEMEM_ENABLE
+ def_bool y
+ select SPARSEMEM_VMEMMAP_ENABLE
+
+config ARCH_SPARSEMEM_DEFAULT
+ def_bool ARCH_SPARSEMEM_ENABLE
+
+config ARCH_SELECT_MEMORY_MODEL
+ def_bool ARCH_SPARSEMEM_ENABLE
+
+config HAVE_ARCH_PFN_VALID
+ def_bool ARCH_HAS_HOLES_MEMORYMODEL || !SPARSEMEM
+
+config HW_PERF_EVENTS
+ bool "Enable hardware performance counter support for perf events"
+ depends on PERF_EVENTS
+ default y
+ help
+ Enable hardware performance counter support for perf events. If
+ disabled, perf events will use software events only.
+
+source "mm/Kconfig"
+
+endmenu
+
+menu "Boot options"
+
+config CMDLINE
+ string "Default kernel command string"
+ default ""
+ help
+ Provide a set of default command-line options at build time by
+ entering them here. As a minimum, you should specify the
+ root device (e.g. root=/dev/nfs).
+
+config CMDLINE_FORCE
+ bool "Always use the default kernel command string"
+ help
+ Always use the default kernel command string, even if the boot
+ loader passes other arguments to the kernel.
+ This is useful if you cannot or don't want to change the
+ command-line options your boot loader passes to the kernel.
+
+endmenu
+
+menu "Userspace binary formats"
+
+source "fs/Kconfig.binfmt"
+
+config AARCH32_EMULATION
+ bool "Kernel support for 32-bit EL0"
+ depends on !ARM64_64K_PAGES
+ select COMPAT_BINFMT_ELF
+ help
+ This option enables support for a 32-bit EL0 running under a 64-bit
+ kernel at EL1. AArch32-specific components such as system calls,
+ the user helper functions, VFP support and the ptrace interface are
+ handled appropriately by the kernel.
+
+ If you want to execute 32-bit userspace applications, say Y.
+
+config COMPAT
+ def_bool y
+ depends on AARCH32_EMULATION
+
+config SYSVIPC_COMPAT
+ def_bool y
+ depends on COMPAT && SYSVIPC
+
+endmenu
+
+source "net/Kconfig"
+
+source "drivers/Kconfig"
+
+source "fs/Kconfig"
+
+source "arch/arm64/Kconfig.debug"
+
+source "security/Kconfig"
+
+source "crypto/Kconfig"
+
+source "lib/Kconfig"
diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
new file mode 100644
index 0000000..d7553f2
--- /dev/null
+++ b/arch/arm64/Kconfig.debug
@@ -0,0 +1,27 @@
+menu "Kernel hacking"
+
+source "lib/Kconfig.debug"
+
+config FRAME_POINTER
+ bool
+ default y
+
+config DEBUG_ERRORS
+ bool "Verbose kernel error messages"
+ depends on DEBUG_KERNEL
+ help
+ This option controls verbose debugging information which can be
+ printed when the kernel detects an internal error. This debugging
+ information is useful to kernel hackers when tracking down problems,
+ but mostly meaningless to other people. It's safe to say Y unless
+ you are concerned with the code size or don't want to see these
+ messages.
+
+config DEBUG_STACK_USAGE
+ bool "Enable stack utilization instrumentation"
+ depends on DEBUG_KERNEL
+ help
+ Enables the display of the minimum amount of free stack which each
+ task has ever had available in the sysrq-T output.
+
+endmenu
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
new file mode 100644
index 0000000..831bd41
--- /dev/null
+++ b/arch/arm64/Makefile
@@ -0,0 +1,71 @@
+#
+# arch/arm64/Makefile
+#
+# This file is included by the global makefile so that you can add your own
+# architecture-specific flags and dependencies.
+#
+# This file is subject to the terms and conditions of the GNU General Public
+# License. See the file "COPYING" in the main directory of this archive
+# for more details.
+#
+# Copyright (C) 1995-2001 by Russell King
+
+LDFLAGS_vmlinux :=-p --no-undefined -X
+CPPFLAGS_vmlinux.lds = -DTEXT_OFFSET=$(TEXT_OFFSET)
+OBJCOPYFLAGS :=-O binary -R .note -R .note.gnu.build-id -R .comment -S
+GZFLAGS :=-9
+
+LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)
+
+KBUILD_DEFCONFIG := generic_defconfig
+
+KBUILD_CFLAGS += -mgeneral-regs-only
+KBUILD_CPPFLAGS += -mlittle-endian
+AS += -EL
+LD += -EL
+
+comma = ,
+
+CHECKFLAGS += -D__aarch64__
+
+# Default value
+head-y := arch/arm64/kernel/head.o
+
+# The byte offset of the kernel image in RAM from the start of RAM.
+TEXT_OFFSET := 0x00080000
+
+export TEXT_OFFSET GZFLAGS
+
+core-y += arch/arm64/kernel/ arch/arm64/mm/
+libs-y := arch/arm64/lib/ $(libs-y)
+libs-y += $(LIBGCC)
+
+# Default target when executing plain make
+KBUILD_IMAGE := Image.gz
+
+all: $(KBUILD_IMAGE)
+
+boot := arch/arm64/boot
+
+Image Image.gz: vmlinux
+ $(Q)$(MAKE) $(build)=$(boot) MACHINE=$(MACHINE) $(boot)/$@
+
+zinstall install: vmlinux
+ $(Q)$(MAKE) $(build)=$(boot) MACHINE=$(MACHINE) $@
+
+%.dtb:
+ $(Q)$(MAKE) $(build)=$(boot) MACHINE=$(MACHINE) $(boot)/$@
+
+# We use MRPROPER_FILES and CLEAN_FILES now
+archclean:
+ $(Q)$(MAKE) $(clean)=$(boot)
+
+define archhelp
+ echo '* Image.gz - Compressed kernel image (arch/$(ARCH)/boot/Image.gz)'
+ echo ' Image - Uncompressed kernel image (arch/$(ARCH)/boot/Image)'
+ echo ' install - Install uncompressed kernel'
+ echo ' zinstall - Install compressed kernel'
+ echo ' Install using (your) ~/bin/installkernel or'
+ echo ' (distribution) /sbin/installkernel or'
+ echo ' install to $$(INSTALL_PATH) and run lilo'
+endef
diff --git a/arch/arm64/boot/.gitignore b/arch/arm64/boot/.gitignore
new file mode 100644
index 0000000..8dab0bb
--- /dev/null
+++ b/arch/arm64/boot/.gitignore
@@ -0,0 +1,2 @@
+Image
+Image.gz
diff --git a/arch/arm64/boot/Makefile b/arch/arm64/boot/Makefile
new file mode 100644
index 0000000..15a58a8
--- /dev/null
+++ b/arch/arm64/boot/Makefile
@@ -0,0 +1,38 @@
+#
+# arch/arm64/boot/Makefile
+#
+# This file is included by the global makefile so that you can add your own
+# architecture-specific flags and dependencies.
+#
+# This file is subject to the terms and conditions of the GNU General Public
+# License. See the file "COPYING" in the main directory of this archive
+# for more details.
+#
+# Copyright (C) 2012, ARM Ltd.
+# Author: Will Deacon <[email protected]>
+#
+# Based on the ia64 boot/Makefile.
+#
+
+targets := Image Image.gz
+
+$(obj)/Image: vmlinux FORCE
+ $(call if_changed,objcopy)
+ @echo ' Kernel: $@ is ready'
+
+$(obj)/Image.gz: $(obj)/Image FORCE
+ $(call if_changed,gzip)
+ @echo ' Kernel: $@ is ready'
+
+$(obj)/%.dtb: $(src)/dts/%.dts
+ $(call cmd,dtc)
+
+install: $(obj)/Image
+ $(CONFIG_SHELL) $(srctree)/$(src)/install.sh $(KERNELRELEASE) \
+ $(obj)/Image System.map "$(INSTALL_PATH)"
+
+zinstall: $(obj)/Image.gz
+ $(CONFIG_SHELL) $(srctree)/$(src)/install.sh $(KERNELRELEASE) \
+ $(obj)/Image.gz System.map "$(INSTALL_PATH)"
+
+clean-files += *.dtb
diff --git a/arch/arm64/boot/install.sh b/arch/arm64/boot/install.sh
new file mode 100644
index 0000000..9151e21
--- /dev/null
+++ b/arch/arm64/boot/install.sh
@@ -0,0 +1,52 @@
+#!/bin/sh
+#
+# arch/arm64/boot/install.sh
+#
+# This file is subject to the terms and conditions of the GNU General Public
+# License. See the file "COPYING" in the main directory of this archive
+# for more details.
+#
+# Copyright (C) 1995 by Linus Torvalds
+#
+# Adapted from code in arch/i386/boot/Makefile by H. Peter Anvin
+# Adapted from code in arch/i386/boot/install.sh by Russell King
+#
+# "make install" script for the AArch64 Linux port
+#
+# Arguments:
+# $1 - kernel version
+# $2 - kernel image file
+# $3 - kernel map file
+# $4 - default install path (blank if root directory)
+#
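+#
+# Example (illustrative paths only, matching how arch/arm64/boot/Makefile
+# invokes this script):
+#   sh install.sh <kernel-version> arch/arm64/boot/Image System.map /boot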
+
+# User may have a custom install script
+if [ -x ~/bin/${INSTALLKERNEL} ]; then exec ~/bin/${INSTALLKERNEL} "$@"; fi
+if [ -x /sbin/${INSTALLKERNEL} ]; then exec /sbin/${INSTALLKERNEL} "$@"; fi
+
+if [ "$(basename $2)" = "Image.gz" ]; then
+# Compressed install
+ echo "Installing compressed kernel"
+ base=vmlinuz
+else
+# Normal install
+ echo "Installing normal kernel"
+ base=vmlinux
+fi
+
+if [ -f $4/$base-$1 ]; then
+ mv $4/$base-$1 $4/$base-$1.old
+fi
+cat $2 > $4/$base-$1
+
+# Install system map file
+if [ -f $4/System.map-$1 ]; then
+ mv $4/System.map-$1 $4/System.map-$1.old
+fi
+cp $3 $4/System.map-$1
+
+if [ -x /sbin/loadmap ]; then
+ /sbin/loadmap
+else
+ echo "You have to install it yourself"
+fi
diff --git a/arch/arm64/configs/generic_defconfig b/arch/arm64/configs/generic_defconfig
new file mode 100644
index 0000000..d9aac95
--- /dev/null
+++ b/arch/arm64/configs/generic_defconfig
@@ -0,0 +1,85 @@
+CONFIG_EXPERIMENTAL=y
+# CONFIG_LOCALVERSION_AUTO is not set
+# CONFIG_SWAP is not set
+CONFIG_SYSVIPC=y
+CONFIG_POSIX_MQUEUE=y
+CONFIG_BSD_PROCESS_ACCT=y
+CONFIG_BSD_PROCESS_ACCT_V3=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_LOG_BUF_SHIFT=14
+# CONFIG_UTS_NS is not set
+# CONFIG_IPC_NS is not set
+# CONFIG_PID_NS is not set
+# CONFIG_NET_NS is not set
+CONFIG_SCHED_AUTOGROUP=y
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_KALLSYMS_ALL=y
+# CONFIG_COMPAT_BRK is not set
+CONFIG_PROFILING=y
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+# CONFIG_BLK_DEV_BSG is not set
+# CONFIG_IOSCHED_DEADLINE is not set
+CONFIG_SMP=y
+CONFIG_PREEMPT_VOLUNTARY=y
+CONFIG_CMDLINE="console=ttyAMA0"
+# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
+CONFIG_AARCH32_EMULATION=y
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_INET=y
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+CONFIG_IP_PNP_BOOTP=y
+# CONFIG_INET_LRO is not set
+# CONFIG_IPV6 is not set
+# CONFIG_WIRELESS is not set
+CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
+CONFIG_DEVTMPFS=y
+# CONFIG_BLK_DEV is not set
+CONFIG_SCSI=y
+# CONFIG_SCSI_PROC_FS is not set
+CONFIG_BLK_DEV_SD=y
+# CONFIG_SCSI_LOWLEVEL is not set
+CONFIG_NETDEVICES=y
+CONFIG_MII=y
+# CONFIG_WLAN is not set
+CONFIG_INPUT_EVDEV=y
+# CONFIG_SERIO_I8042 is not set
+# CONFIG_SERIO_SERPORT is not set
+CONFIG_LEGACY_PTY_COUNT=16
+# CONFIG_HW_RANDOM is not set
+# CONFIG_HWMON is not set
+CONFIG_FB=y
+# CONFIG_VGA_CONSOLE is not set
+CONFIG_FRAMEBUFFER_CONSOLE=y
+CONFIG_LOGO=y
+# CONFIG_LOGO_LINUX_MONO is not set
+# CONFIG_LOGO_LINUX_VGA16 is not set
+# CONFIG_USB_SUPPORT is not set
+# CONFIG_IOMMU_SUPPORT is not set
+CONFIG_EXT2_FS=y
+CONFIG_EXT3_FS=y
+# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
+# CONFIG_EXT3_FS_XATTR is not set
+CONFIG_FUSE_FS=y
+CONFIG_CUSE=y
+CONFIG_VFAT_FS=y
+CONFIG_TMPFS=y
+# CONFIG_MISC_FILESYSTEMS is not set
+CONFIG_NFS_FS=y
+CONFIG_ROOT_NFS=y
+CONFIG_NLS_CODEPAGE_437=y
+CONFIG_NLS_ISO8859_1=y
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_DEBUG_FS=y
+CONFIG_DEBUG_KERNEL=y
+# CONFIG_SCHED_DEBUG is not set
+CONFIG_DEBUG_INFO=y
+# CONFIG_FTRACE is not set
+CONFIG_ATOMIC64_SELFTEST=y
+CONFIG_DEBUG_ERRORS=y
diff --git a/arch/arm64/include/asm/prom.h b/arch/arm64/include/asm/prom.h
new file mode 100644
index 0000000..68b90e6
--- /dev/null
+++ b/arch/arm64/include/asm/prom.h
@@ -0,0 +1 @@
+/* Empty for now */
diff --git a/arch/arm64/kernel/.gitignore b/arch/arm64/kernel/.gitignore
new file mode 100644
index 0000000..c5f676c
--- /dev/null
+++ b/arch/arm64/kernel/.gitignore
@@ -0,0 +1 @@
+vmlinux.lds
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
new file mode 100644
index 0000000..59fbdef
--- /dev/null
+++ b/arch/arm64/kernel/Makefile
@@ -0,0 +1,27 @@
+#
+# Makefile for the linux kernel.
+#
+
+CPPFLAGS_vmlinux.lds := -DTEXT_OFFSET=$(TEXT_OFFSET)
+AFLAGS_head.o := -DTEXT_OFFSET=$(TEXT_OFFSET)
+
+# Object file lists.
+arm64-obj-y := debug-monitors.o elf.o entry.o irq.o fpsimd.o \
+ entry-fpsimd.o process.o ptrace.o setup.o signal.o \
+ sys.o stacktrace.o time.o traps.o io.o vdso.o
+
+arm64-obj-$(CONFIG_AARCH32_EMULATION) += sys32.o kuser32.o signal32.o \
+ sys_compat.o
+arm64-obj-$(CONFIG_MODULES) += arm64ksyms.o module.o
+arm64-obj-$(CONFIG_SMP) += smp.o
+arm64-obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o
+arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT)+= hw_breakpoint.o
+
+obj-y += $(arm64-obj-y) vdso/
+obj-m += $(arm64-obj-m)
+head-y := head.o
+extra-y := $(head-y) vmlinux.lds
+
+# vDSO - this must be built first to generate the symbol offsets
+$(call objectify,$(arm64-obj-y)): $(obj)/vdso/vdso-offsets.h
+$(obj)/vdso/vdso-offsets.h: $(obj)/vdso
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
new file mode 100644
index 0000000..5eab87b
--- /dev/null
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -0,0 +1,146 @@
+/*
+ * ld script to make ARM Linux kernel
+ * taken from the i386 version by Russell King
+ * Written by Martin Mares <[email protected]>
+ */
+
+#include <asm-generic/vmlinux.lds.h>
+#include <asm/thread_info.h>
+#include <asm/memory.h>
+#include <asm/page.h>
+
+#define PROC_INFO \
+ VMLINUX_SYMBOL(__proc_info_begin) = .; \
+ *(.proc.info.init) \
+ VMLINUX_SYMBOL(__proc_info_end) = .;
+
+#define ARM_CPU_DISCARD(x) x
+#define ARM_CPU_KEEP(x)
+
+#define ARM_EXIT_KEEP(x)
+#define ARM_EXIT_DISCARD(x) x
+
+OUTPUT_ARCH(aarch64)
+ENTRY(stext)
+
+jiffies = jiffies_64;
+
+SECTIONS
+{
+ /*
+ * XXX: The linker does not define how output sections are
+ * assigned to input sections when there are multiple statements
+ * matching the same input section name. There is no documented
+ * order of matching.
+ */
+ /DISCARD/ : {
+ ARM_EXIT_DISCARD(EXIT_TEXT)
+ ARM_EXIT_DISCARD(EXIT_DATA)
+ EXIT_CALL
+ *(.discard)
+ *(.discard.*)
+ }
+
+ . = PAGE_OFFSET + TEXT_OFFSET;
+
+ .head.text : {
+ _text = .;
+ HEAD_TEXT
+ }
+ .text : { /* Real text segment */
+ _stext = .; /* Text and read-only data */
+ *(.smp.pen.text)
+ __exception_text_start = .;
+ *(.exception.text)
+ __exception_text_end = .;
+ IRQENTRY_TEXT
+ TEXT_TEXT
+ SCHED_TEXT
+ LOCK_TEXT
+ *(.fixup)
+ *(.gnu.warning)
+ . = ALIGN(16);
+ *(.got) /* Global offset table */
+ ARM_CPU_KEEP(PROC_INFO)
+ }
+
+ RO_DATA(PAGE_SIZE)
+
+ _etext = .; /* End of text and rodata section */
+
+ . = ALIGN(PAGE_SIZE);
+ __init_begin = .;
+
+ INIT_TEXT_SECTION(8)
+ .exit.text : {
+ ARM_EXIT_KEEP(EXIT_TEXT)
+ }
+ . = ALIGN(16);
+ .init.proc.info : {
+ ARM_CPU_DISCARD(PROC_INFO)
+ }
+ . = ALIGN(16);
+ .init.data : {
+ INIT_DATA
+ INIT_SETUP(16)
+ INIT_CALLS
+ CON_INITCALL
+ SECURITY_INITCALL
+ INIT_RAM_FS
+ }
+ .exit.data : {
+ ARM_EXIT_KEEP(EXIT_DATA)
+ }
+
+ PERCPU_SECTION(64)
+
+ __init_end = .;
+ . = ALIGN(THREAD_SIZE);
+ __data_loc = .;
+
+ .data : AT(__data_loc) {
+ _data = .; /* address in memory */
+ _sdata = .;
+
+ /*
+ * first, the init task union, aligned
+ * to an 8192 byte boundary.
+ */
+ INIT_TASK_DATA(THREAD_SIZE)
+ NOSAVE_DATA
+ CACHELINE_ALIGNED_DATA(64)
+ READ_MOSTLY_DATA(64)
+
+ /*
+ * The exception fixup table (might need resorting at runtime)
+ */
+ . = ALIGN(32);
+ __start___ex_table = .;
+ *(__ex_table)
+ __stop___ex_table = .;
+
+ /*
+ * and the usual data section
+ */
+ DATA_DATA
+ CONSTRUCTORS
+
+ _edata = .;
+ }
+ _edata_loc = __data_loc + SIZEOF(.data);
+
+ NOTES
+
+ BSS_SECTION(0, 0, 0)
+ _end = .;
+
+ STABS_DEBUG
+ .comment 0 : { *(.comment) }
+}
+
+/*
+ * These must never be empty
+ * The .proc.info.init section must never be empty
+ * If you have to comment this assert statement out, your
+ */
+ASSERT((__proc_info_end - __proc_info_begin), "missing CPU support")
diff --git a/arch/arm64/mm/Kconfig b/arch/arm64/mm/Kconfig
new file mode 100644
index 0000000..8e94e52
--- /dev/null
+++ b/arch/arm64/mm/Kconfig
@@ -0,0 +1,5 @@
+config MMU
+ def_bool y
+
+config CPU_64
+ def_bool y
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
new file mode 100644
index 0000000..81a9d8b
--- /dev/null
+++ b/arch/arm64/mm/Makefile
@@ -0,0 +1,6 @@
+obj-y := dma-mapping.o extable.o fault.o init.o \
+ cache.o copypage.o flush.o \
+ ioremap.o mmap.o pgd.o mmu.o \
+ context.o tlb.o proc.o
+
+obj-$(CONFIG_MODULES) += proc-syms.o
diff --git a/init/Kconfig b/init/Kconfig
index af6c7f8..8bfda46 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1153,7 +1153,8 @@ menuconfig EXPERT

config UID16
bool "Enable 16-bit UID system calls" if EXPERT
- depends on ARM || BLACKFIN || CRIS || FRV || H8300 || X86_32 || M68K || (S390 && !64BIT) || SUPERH || SPARC32 || (SPARC64 && COMPAT) || UML || (X86_64 && IA32_EMULATION)
+ depends on ARM || BLACKFIN || CRIS || FRV || H8300 || X86_32 || M68K || (S390 && !64BIT) || SUPERH || SPARC32 || (SPARC64 && COMPAT) || UML || (X86_64 && IA32_EMULATION) \
+ || AARCH32_EMULATION
default y
help
This enables the legacy 16-bit UID syscall wrappers.
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 2403a63..cfb4578 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -452,7 +452,8 @@ config SLUB_STATS
config DEBUG_KMEMLEAK
bool "Kernel memory leak detector"
depends on DEBUG_KERNEL && EXPERIMENTAL && \
- (X86 || ARM || PPC || MIPS || S390 || SPARC64 || SUPERH || MICROBLAZE || TILE)
+ (X86 || ARM || PPC || MIPS || S390 || SPARC64 || SUPERH || \
+ MICROBLAZE || TILE || ARM64)

select DEBUG_FS
select STACKTRACE if STACKTRACE_SUPPORT
@@ -739,7 +740,8 @@ config DEBUG_BUGVERBOSE
bool "Verbose BUG() reporting (adds 70K)" if DEBUG_KERNEL && EXPERT
depends on BUG
depends on ARM || AVR32 || M32R || M68K || SPARC32 || SPARC64 || \
- FRV || SUPERH || GENERIC_BUG || BLACKFIN || MN10300 || TILE
+ FRV || SUPERH || GENERIC_BUG || BLACKFIN || MN10300 || \
+ TILE || ARM64
default y
help
Say Y here to make BUG() panics output the file name and line number

2012-08-14 17:54:41

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 31/31] arm64: MAINTAINERS update

This patch updates the MAINTAINERS file for the AArch64 Linux kernel
port.

Signed-off-by: Catalin Marinas <[email protected]>
---
MAINTAINERS | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 94b823f..6d7c5f4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1204,6 +1204,12 @@ S: Maintained
F: arch/arm/mach-pxa/z2.c
F: arch/arm/mach-pxa/include/mach/z2.h

+ARM64 PORT (AARCH64 ARCHITECTURE)
+M: Catalin Marinas <[email protected]>
+L: [email protected] (moderated for non-subscribers)
+S: Maintained
+F: arch/arm64/
+
ASC7621 HARDWARE MONITOR DRIVER
M: George Joseph <[email protected]>
L: [email protected]

2012-08-14 17:54:39

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 29/31] arm64: Miscellaneous header files

This patch introduces a few AArch64-specific header files together with
Kbuild entries for generic headers.
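
(The generic-y entries in asm/Kbuild make kbuild generate trivial wrapper
headers that simply include the corresponding <asm-generic/...> versions,
so only the genuinely AArch64-specific headers need to be written by hand.)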

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/Kbuild | 51 ++++++++++
arch/arm64/include/asm/barrier.h | 52 ++++++++++
arch/arm64/include/asm/bitsperlong.h | 23 +++++
arch/arm64/include/asm/byteorder.h | 21 ++++
arch/arm64/include/asm/cmpxchg.h | 180 ++++++++++++++++++++++++++++++++++
arch/arm64/include/asm/compiler.h | 30 ++++++
arch/arm64/include/asm/exception.h | 23 +++++
arch/arm64/include/asm/exec.h | 23 +++++
arch/arm64/include/asm/fcntl.h | 29 ++++++
arch/arm64/include/asm/system_misc.h | 54 ++++++++++
10 files changed, 486 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/Kbuild
create mode 100644 arch/arm64/include/asm/barrier.h
create mode 100644 arch/arm64/include/asm/bitsperlong.h
create mode 100644 arch/arm64/include/asm/byteorder.h
create mode 100644 arch/arm64/include/asm/cmpxchg.h
create mode 100644 arch/arm64/include/asm/compiler.h
create mode 100644 arch/arm64/include/asm/exception.h
create mode 100644 arch/arm64/include/asm/exec.h
create mode 100644 arch/arm64/include/asm/fcntl.h
create mode 100644 arch/arm64/include/asm/system_misc.h

diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
new file mode 100644
index 0000000..35924a5
--- /dev/null
+++ b/arch/arm64/include/asm/Kbuild
@@ -0,0 +1,51 @@
+include include/asm-generic/Kbuild.asm
+
+header-y += hwcap.h
+
+generic-y += bug.h
+generic-y += bugs.h
+generic-y += checksum.h
+generic-y += cputime.h
+generic-y += current.h
+generic-y += delay.h
+generic-y += div64.h
+generic-y += dma.h
+generic-y += emergency-restart.h
+generic-y += errno.h
+generic-y += ftrace.h
+generic-y += hw_irq.h
+generic-y += ioctl.h
+generic-y += ioctls.h
+generic-y += ipcbuf.h
+generic-y += irq_regs.h
+generic-y += kdebug.h
+generic-y += kmap_types.h
+generic-y += linkage.h
+generic-y += local.h
+generic-y += local64.h
+generic-y += mman.h
+generic-y += msgbuf.h
+generic-y += mutex.h
+generic-y += pci.h
+generic-y += percpu.h
+generic-y += poll.h
+generic-y += posix_types.h
+generic-y += resource.h
+generic-y += scatterlist.h
+generic-y += sections.h
+generic-y += segment.h
+generic-y += sembuf.h
+generic-y += serial.h
+generic-y += shmbuf.h
+generic-y += sizes.h
+generic-y += socket.h
+generic-y += sockios.h
+generic-y += string.h
+generic-y += switch_to.h
+generic-y += swab.h
+generic-y += termbits.h
+generic-y += termios.h
+generic-y += topology.h
+generic-y += types.h
+generic-y += unaligned.h
+generic-y += user.h
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
new file mode 100644
index 0000000..d4a6333
--- /dev/null
+++ b/arch/arm64/include/asm/barrier.h
@@ -0,0 +1,52 @@
+/*
+ * Based on arch/arm/include/asm/barrier.h
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_BARRIER_H
+#define __ASM_BARRIER_H
+
+#ifndef __ASSEMBLY__
+
+#define sev() asm volatile("sev" : : : "memory")
+#define wfe() asm volatile("wfe" : : : "memory")
+#define wfi() asm volatile("wfi" : : : "memory")
+
+#define isb() asm volatile("isb" : : : "memory")
+#define dsb() asm volatile("dsb sy" : : : "memory")
+
+#define mb() dsb()
+#define rmb() asm volatile("dsb ld" : : : "memory")
+#define wmb() asm volatile("dsb st" : : : "memory")
+
+#ifndef CONFIG_SMP
+#define smp_mb() barrier()
+#define smp_rmb() barrier()
+#define smp_wmb() barrier()
+#else
+#define smp_mb() asm volatile("dmb ish" : : : "memory")
+#define smp_rmb() asm volatile("dmb ishld" : : : "memory")
+#define smp_wmb() asm volatile("dmb ishst" : : : "memory")
+#endif
+
+#define read_barrier_depends() do { } while(0)
+#define smp_read_barrier_depends() do { } while(0)
+
+#define set_mb(var, value) do { var = value; smp_mb(); } while (0)
+#define nop() asm volatile("nop");
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_BARRIER_H */
diff --git a/arch/arm64/include/asm/bitsperlong.h b/arch/arm64/include/asm/bitsperlong.h
new file mode 100644
index 0000000..fce9c29
--- /dev/null
+++ b/arch/arm64/include/asm/bitsperlong.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_BITSPERLONG_H
+#define __ASM_BITSPERLONG_H
+
+#define __BITS_PER_LONG 64
+
+#include <asm-generic/bitsperlong.h>
+
+#endif /* __ASM_BITSPERLONG_H */
diff --git a/arch/arm64/include/asm/byteorder.h b/arch/arm64/include/asm/byteorder.h
new file mode 100644
index 0000000..2b92046
--- /dev/null
+++ b/arch/arm64/include/asm/byteorder.h
@@ -0,0 +1,21 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_BYTEORDER_H
+#define __ASM_BYTEORDER_H
+
+#include <linux/byteorder/little_endian.h>
+
+#endif /* __ASM_BYTEORDER_H */
diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
new file mode 100644
index 0000000..dc50de7
--- /dev/null
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -0,0 +1,180 @@
+/*
+ * Based on arch/arm/include/asm/cmpxchg.h
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_CMPXCHG_H
+#define __ASM_CMPXCHG_H
+
+#include <linux/irqflags.h>
+#include <asm/barrier.h>
+
+static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
+{
+ extern void __bad_xchg(volatile void *, int);
+ unsigned long ret, tmp;
+
+ switch (size) {
+ case 1:
+ asm volatile("// __xchg1\n"
+ "1: ldaxrb %w0, [%3]\n"
+ " stlxrb %w1, %w2, [%3]\n"
+ " cbnz %w1, 1b\n"
+ : "=&r" (ret), "=&r" (tmp)
+ : "r" (x), "r" (ptr)
+ : "memory", "cc");
+ break;
+ case 2:
+ asm volatile("// __xchg2\n"
+ "1: ldaxrh %w0, [%3]\n"
+ " stlxrh %w1, %w2, [%3]\n"
+ " cbnz %w1, 1b\n"
+ : "=&r" (ret), "=&r" (tmp)
+ : "r" (x), "r" (ptr)
+ : "memory", "cc");
+ break;
+ case 4:
+ asm volatile("// __xchg4\n"
+ "1: ldaxr %w0, [%3]\n"
+ " stlxr %w1, %w2, [%3]\n"
+ " cbnz %w1, 1b\n"
+ : "=&r" (ret), "=&r" (tmp)
+ : "r" (x), "r" (ptr)
+ : "memory", "cc");
+ break;
+ case 8:
+ asm volatile("// __xchg8\n"
+ "1: ldaxr %0, [%3]\n"
+ " stlxr %w1, %2, [%3]\n"
+ " cbnz %w1, 1b\n"
+ : "=&r" (ret), "=&r" (tmp)
+ : "r" (x), "r" (ptr)
+ : "memory", "cc");
+ break;
+ default:
+ __bad_xchg(ptr, size), ret = 0;
+ break;
+ }
+
+ return ret;
+}
+
+#define xchg(ptr,x) \
+ ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
+
+/*
+ * cmpxchg operations.
+ */
+extern void __bad_cmpxchg(volatile void *ptr, int size);
+
+static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
+ unsigned long new, int size)
+{
+ unsigned long oldval, res;
+
+ switch (size) {
+ case 1:
+ do {
+ asm volatile("// __cmpxchg1\n"
+ " ldxrb %w1, [%2]\n"
+ " mov %w0, #0\n"
+ " cmp %w1, %w3\n"
+ " b.ne 1f\n"
+ " stxrb %w0, %w4, [%2]\n"
+ "1:\n"
+ : "=&r" (res), "=&r" (oldval)
+ : "r" (ptr), "Ir" (old), "r" (new)
+ : "cc");
+ } while (res);
+ break;
+
+ case 2:
+ do {
+ asm volatile("// __cmpxchg2\n"
+ " ldxrh %w1, [%2]\n"
+ " mov %w0, #0\n"
+ " cmp %w1, %w3\n"
+ " b.ne 1f\n"
+ " stxrh %w0, %w4, [%2]\n"
+ "1:\n"
+ : "=&r" (res), "=&r" (oldval)
+ : "r" (ptr), "Ir" (old), "r" (new)
+ : "memory", "cc");
+ } while (res);
+ break;
+
+ case 4:
+ do {
+ asm volatile("// __cmpxchg4\n"
+ " ldxr %w1, [%2]\n"
+ " mov %w0, #0\n"
+ " cmp %w1, %w3\n"
+ " b.ne 1f\n"
+ " stxr %w0, %w4, [%2]\n"
+ "1:\n"
+ : "=&r" (res), "=&r" (oldval)
+ : "r" (ptr), "Ir" (old), "r" (new)
+ : "cc");
+ } while (res);
+ break;
+
+ case 8:
+ do {
+ asm volatile("// __cmpxchg8\n"
+ " ldxr %1, [%2]\n"
+ " mov %w0, #0\n"
+ " cmp %1, %3\n"
+ " b.ne 1f\n"
+ " stxr %w0, %4, [%2]\n"
+ "1:\n"
+ : "=&r" (res), "=&r" (oldval)
+ : "r" (ptr), "Ir" (old), "r" (new)
+ : "cc");
+ } while (res);
+ break;
+
+ default:
+ __bad_cmpxchg(ptr, size);
+ oldval = 0;
+ }
+
+ return oldval;
+}
+
+static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
+ unsigned long new, int size)
+{
+ unsigned long ret;
+
+ smp_mb();
+ ret = __cmpxchg(ptr, old, new, size);
+ smp_mb();
+
+ return ret;
+}
+
+#define cmpxchg(ptr,o,n) \
+ ((__typeof__(*(ptr)))__cmpxchg_mb((ptr), \
+ (unsigned long)(o), \
+ (unsigned long)(n), \
+ sizeof(*(ptr))))
+
+#define cmpxchg_local(ptr,o,n) \
+ ((__typeof__(*(ptr)))__cmpxchg((ptr), \
+ (unsigned long)(o), \
+ (unsigned long)(n), \
+ sizeof(*(ptr))))
+
+#endif /* __ASM_CMPXCHG_H */
diff --git a/arch/arm64/include/asm/compiler.h b/arch/arm64/include/asm/compiler.h
new file mode 100644
index 0000000..ee35fd0
--- /dev/null
+++ b/arch/arm64/include/asm/compiler.h
@@ -0,0 +1,30 @@
+/*
+ * Based on arch/arm/include/asm/compiler.h
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_COMPILER_H
+#define __ASM_COMPILER_H
+
+/*
+ * This is used to ensure the compiler did actually allocate the register we
+ * asked it for some inline assembly sequences. Apparently we can't trust the
+ * compiler from one version to another so a bit of paranoia won't hurt. This
+ * string is meant to be concatenated with the inline asm string and will
+ * cause compilation to stop on mismatch. (for details, see gcc PR 15089)
+ */
+#define __asmeq(x, y) ".ifnc " x "," y " ; .err ; .endif\n\t"
+
+#endif /* __ASM_COMPILER_H */
diff --git a/arch/arm64/include/asm/exception.h b/arch/arm64/include/asm/exception.h
new file mode 100644
index 0000000..ac63519
--- /dev/null
+++ b/arch/arm64/include/asm/exception.h
@@ -0,0 +1,23 @@
+/*
+ * Based on arch/arm/include/asm/exception.h
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_EXCEPTION_H
+#define __ASM_EXCEPTION_H
+
+#define __exception __attribute__((section(".exception.text")))
+
+#endif /* __ASM_EXCEPTION_H */
diff --git a/arch/arm64/include/asm/exec.h b/arch/arm64/include/asm/exec.h
new file mode 100644
index 0000000..db0563c
--- /dev/null
+++ b/arch/arm64/include/asm/exec.h
@@ -0,0 +1,23 @@
+/*
+ * Based on arch/arm/include/asm/exec.h
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_EXEC_H
+#define __ASM_EXEC_H
+
+extern unsigned long arch_align_stack(unsigned long sp);
+
+#endif /* __ASM_EXEC_H */
diff --git a/arch/arm64/include/asm/fcntl.h b/arch/arm64/include/asm/fcntl.h
new file mode 100644
index 0000000..cd2e630
--- /dev/null
+++ b/arch/arm64/include/asm/fcntl.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_FCNTL_H
+#define __ASM_FCNTL_H
+
+/*
+ * Using our own definitions for AArch32 (compat) support.
+ */
+#define O_DIRECTORY 040000 /* must be a directory */
+#define O_NOFOLLOW 0100000 /* don't follow links */
+#define O_DIRECT 0200000 /* direct disk access hint - currently ignored */
+#define O_LARGEFILE 0400000
+
+#include <asm-generic/fcntl.h>
+
+#endif
diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
new file mode 100644
index 0000000..95e4072
--- /dev/null
+++ b/arch/arm64/include/asm/system_misc.h
@@ -0,0 +1,54 @@
+/*
+ * Based on arch/arm/include/asm/system_misc.h
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SYSTEM_MISC_H
+#define __ASM_SYSTEM_MISC_H
+
+#ifndef __ASSEMBLY__
+
+#include <linux/compiler.h>
+#include <linux/linkage.h>
+#include <linux/irqflags.h>
+
+struct pt_regs;
+
+void die(const char *msg, struct pt_regs *regs, int err);
+
+struct siginfo;
+void arm64_notify_die(const char *str, struct pt_regs *regs,
+ struct siginfo *info, int err);
+
+void hook_debug_fault_code(int nr, int (*fn)(unsigned long, unsigned int,
+ struct pt_regs *),
+ int sig, int code, const char *name);
+
+struct mm_struct;
+extern void show_pte(struct mm_struct *mm, unsigned long addr);
+extern void __show_regs(struct pt_regs *);
+
+void soft_restart(unsigned long);
+extern void (*pm_restart)(const char *cmd);
+
+#define UDBG_UNDEFINED (1 << 0)
+#define UDBG_SYSCALL (1 << 1)
+#define UDBG_BADABORT (1 << 2)
+#define UDBG_SEGV (1 << 3)
+#define UDBG_BUS (1 << 4)
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_SYSTEM_MISC_H */

2012-08-14 17:54:36

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 28/31] arm64: Generic timers support

From: Marc Zyngier <[email protected]>

This patch adds support for the ARM generic timers, using A64 instructions
to access the timer registers. It uses the physical counter as the clock
source and the virtual counter as sched_clock.

The timer frequency can be specified via DT or read from the CNTFRQ_EL0
register. The physical counter is also accessible from user space, allowing
a fast gettimeofday() implementation.
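
As a rough userspace illustration (editor's sketch, not part of the patch,
assuming an AArch64 toolchain and a kernel with this series applied), the
EL0 counter access enabled below can be used directly, which is what a fast
gettimeofday() implementation would build on:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint64_t cnt, freq;

        /* EL0 access is enabled by arch_counter_enable_user_access(). */
        asm volatile("mrs %0, cntpct_el0" : "=r" (cnt));
        asm volatile("mrs %0, cntfrq_el0" : "=r" (freq));

        printf("counter %llu at %llu Hz (~%llu s since counter start)\n",
               (unsigned long long)cnt, (unsigned long long)freq,
               (unsigned long long)(cnt / freq));
        return 0;
}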

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/timex.h | 32 ++++
arch/arm64/kernel/time.c | 65 ++++++++
drivers/clocksource/Kconfig | 5 +
drivers/clocksource/Makefile | 1 +
drivers/clocksource/arm_generic.c | 309 +++++++++++++++++++++++++++++++++++++
include/clocksource/arm_generic.h | 21 +++
6 files changed, 433 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/timex.h
create mode 100644 arch/arm64/kernel/time.c
create mode 100644 drivers/clocksource/arm_generic.c
create mode 100644 include/clocksource/arm_generic.h

diff --git a/arch/arm64/include/asm/timex.h b/arch/arm64/include/asm/timex.h
new file mode 100644
index 0000000..88f74de
--- /dev/null
+++ b/arch/arm64/include/asm/timex.h
@@ -0,0 +1,32 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_TIMEX_H
+#define __ASM_TIMEX_H
+
+/*
+ * Use the current timer as a cycle counter since this is what we use for
+ * the delay loop.
+ */
+#define get_cycles() ({ cycles_t c; read_current_timer(&c); c; })
+
+#include <asm-generic/timex.h>
+
+#define ARCH_HAS_READ_CURRENT_TIMER
+
+/* This isn't really used any more */
+#define CLOCK_TICK_RATE 1000
+
+#endif
diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
new file mode 100644
index 0000000..3b4b725
--- /dev/null
+++ b/arch/arm64/kernel/time.c
@@ -0,0 +1,65 @@
+/*
+ * Based on arch/arm/kernel/time.c
+ *
+ * Copyright (C) 1991, 1992, 1995 Linus Torvalds
+ * Modifications for ARM (C) 1994-2001 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/export.h>
+#include <linux/kernel.h>
+#include <linux/interrupt.h>
+#include <linux/time.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+#include <linux/timex.h>
+#include <linux/errno.h>
+#include <linux/profile.h>
+#include <linux/syscore_ops.h>
+#include <linux/timer.h>
+#include <linux/irq.h>
+
+#include <clocksource/arm_generic.h>
+
+#include <asm/thread_info.h>
+#include <asm/stacktrace.h>
+
+#ifdef CONFIG_SMP
+unsigned long profile_pc(struct pt_regs *regs)
+{
+ struct stackframe frame;
+
+ if (!in_lock_functions(regs->pc))
+ return regs->pc;
+
+ frame.fp = regs->regs[29];
+ frame.sp = regs->sp;
+ frame.pc = regs->pc;
+ do {
+ int ret = unwind_frame(&frame);
+ if (ret < 0)
+ return 0;
+ } while (in_lock_functions(frame.pc));
+
+ return frame.pc;
+}
+EXPORT_SYMBOL(profile_pc);
+#endif
+
+void __init time_init(void)
+{
+ arm_generic_timer_init();
+}
diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
index d53cd0a..6a78073 100644
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@ -35,3 +35,8 @@ config CLKSRC_DBX500_PRCMU_SCHED_CLOCK
default y
help
Use the always on PRCMU Timer as sched_clock
+
+config CLKSRC_ARM_GENERIC
+ def_bool y if ARM64
+ help
+ This option enables support for the ARM generic timer.
diff --git a/drivers/clocksource/Makefile b/drivers/clocksource/Makefile
index b65d0c5..6591990 100644
--- a/drivers/clocksource/Makefile
+++ b/drivers/clocksource/Makefile
@@ -13,3 +13,4 @@ obj-$(CONFIG_DW_APB_TIMER) += dw_apb_timer.o
obj-$(CONFIG_DW_APB_TIMER_OF) += dw_apb_timer_of.o
obj-$(CONFIG_CLKSRC_DBX500_PRCMU) += clksrc-dbx500-prcmu.o
obj-$(CONFIG_ARMADA_370_XP_TIMER) += time-armada-370-xp.o
+obj-$(CONFIG_CLKSRC_ARM_GENERIC) += arm_generic.o
diff --git a/drivers/clocksource/arm_generic.c b/drivers/clocksource/arm_generic.c
new file mode 100644
index 0000000..05c898c
--- /dev/null
+++ b/drivers/clocksource/arm_generic.c
@@ -0,0 +1,309 @@
+/*
+ * Generic timers support
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Marc Zyngier <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/smp.h>
+#include <linux/cpu.h>
+#include <linux/jiffies.h>
+#include <linux/interrupt.h>
+#include <linux/clockchips.h>
+#include <linux/of_irq.h>
+#include <linux/io.h>
+
+#include <clocksource/arm_generic.h>
+
+static u32 arch_timer_rate;
+static u64 sched_clock_mult __read_mostly;
+static DEFINE_PER_CPU(struct clock_event_device, arch_timer_evt);
+static int arch_timer_ppi;
+
+/*
+ * Architected system timer support.
+ */
+
+#define ARCH_TIMER_CTRL_ENABLE (1 << 0)
+#define ARCH_TIMER_CTRL_IT_MASK (1 << 1)
+
+#define ARCH_TIMER_REG_CTRL 0
+#define ARCH_TIMER_REG_FREQ 1
+#define ARCH_TIMER_REG_TVAL 2
+
+static void arch_timer_reg_write(int reg, u32 val)
+{
+ switch (reg) {
+ case ARCH_TIMER_REG_CTRL:
+ asm volatile("msr cntp_ctl_el0, %0" : : "r" (val));
+ break;
+ case ARCH_TIMER_REG_TVAL:
+ asm volatile("msr cntp_tval_el0, %0" : : "r" (val));
+ break;
+ default:
+ BUG();
+ }
+
+ isb();
+}
+
+static u32 arch_timer_reg_read(int reg)
+{
+ u32 val;
+
+ switch (reg) {
+ case ARCH_TIMER_REG_CTRL:
+ asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
+ break;
+ case ARCH_TIMER_REG_FREQ:
+ asm volatile("mrs %0, cntfrq_el0" : "=r" (val));
+ break;
+ case ARCH_TIMER_REG_TVAL:
+ asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
+ break;
+ default:
+ BUG();
+ }
+
+ return val;
+}
+
+static irqreturn_t arch_timer_handle_irq(int irq, void *dev_id)
+{
+ struct clock_event_device *evt = dev_id;
+ unsigned long ctrl;
+
+ ctrl = arch_timer_reg_read(ARCH_TIMER_REG_CTRL);
+ if (ctrl & 0x4) {
+ ctrl |= ARCH_TIMER_CTRL_IT_MASK;
+ arch_timer_reg_write(ARCH_TIMER_REG_CTRL, ctrl);
+ evt->event_handler(evt);
+ return IRQ_HANDLED;
+ }
+
+ return IRQ_NONE;
+}
+
+static void arch_timer_stop(void)
+{
+ unsigned long ctrl;
+
+ ctrl = arch_timer_reg_read(ARCH_TIMER_REG_CTRL);
+ ctrl &= ~ARCH_TIMER_CTRL_ENABLE;
+ arch_timer_reg_write(ARCH_TIMER_REG_CTRL, ctrl);
+}
+
+static void arch_timer_set_mode(enum clock_event_mode mode,
+ struct clock_event_device *clk)
+{
+ switch (mode) {
+ case CLOCK_EVT_MODE_UNUSED:
+ case CLOCK_EVT_MODE_SHUTDOWN:
+ arch_timer_stop();
+ break;
+ default:
+ break;
+ }
+}
+
+static int arch_timer_set_next_event(unsigned long evt,
+ struct clock_event_device *unused)
+{
+ unsigned long ctrl;
+
+ ctrl = arch_timer_reg_read(ARCH_TIMER_REG_CTRL);
+ ctrl |= ARCH_TIMER_CTRL_ENABLE;
+ ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
+
+ arch_timer_reg_write(ARCH_TIMER_REG_TVAL, evt);
+ arch_timer_reg_write(ARCH_TIMER_REG_CTRL, ctrl);
+
+ return 0;
+}
+
+static void __cpuinit arch_counter_enable_user_access(void)
+{
+ u32 cntkctl;
+
+ /* Disable user access to the timers and the virtual counter. */
+ asm volatile("mrs %0, cntkctl_el1" : "=r" (cntkctl));
+ cntkctl &= ~((3 << 8) | (1 << 1));
+
+ /* Enable user access to the physical counter and frequency. */
+ cntkctl |= 1;
+ asm volatile("msr cntkctl_el1, %0" : : "r" (cntkctl));
+}
+
+static void __cpuinit arch_timer_setup(struct clock_event_device *clk)
+{
+ /* Let's make sure the timer is off before doing anything else */
+ arch_timer_stop();
+
+ clk->features = CLOCK_EVT_FEAT_ONESHOT;
+ clk->name = "arch_sys_timer";
+ clk->rating = 400;
+ clk->set_mode = arch_timer_set_mode;
+ clk->set_next_event = arch_timer_set_next_event;
+ clk->irq = arch_timer_ppi;
+ clk->cpumask = cpumask_of(smp_processor_id());
+
+ clockevents_config_and_register(clk, arch_timer_rate,
+ 0xf, 0x7fffffff);
+
+ enable_percpu_irq(clk->irq, 0);
+
+ /* Ensure the physical counter is visible to userspace for the vDSO. */
+ arch_counter_enable_user_access();
+}
+
+static void __init arch_timer_calibrate(void)
+{
+ if (arch_timer_rate == 0) {
+ arch_timer_reg_write(ARCH_TIMER_REG_CTRL, 0);
+ arch_timer_rate = arch_timer_reg_read(ARCH_TIMER_REG_FREQ);
+
+ /* Check the timer frequency. */
+ if (arch_timer_rate == 0)
+ panic("Architected timer frequency is set to zero.\n"
+ "You must set this in your .dts file\n");
+ }
+
+ /* Cache the sched_clock multiplier to save a divide in the hot path. */
+
+ sched_clock_mult = NSEC_PER_SEC / arch_timer_rate;
+
+ pr_info("Architected local timer running at %u.%02uMHz.\n",
+ arch_timer_rate / 1000000, (arch_timer_rate / 10000) % 100);
+}
+
+static inline cycle_t arch_counter_get_cntpct(void)
+{
+ cycle_t cval;
+
+ asm volatile("mrs %0, cntpct_el0" : "=r" (cval));
+
+ return cval;
+}
+
+static inline cycle_t arch_counter_get_cntvct(void)
+{
+ cycle_t cval;
+
+ asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
+
+ return cval;
+}
+
+static cycle_t arch_counter_read(struct clocksource *cs)
+{
+ return arch_counter_get_cntpct();
+}
+
+static struct clocksource clocksource_counter = {
+ .name = "arch_sys_counter",
+ .rating = 400,
+ .read = arch_counter_read,
+ .mask = CLOCKSOURCE_MASK(56),
+ .flags = (CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_VALID_FOR_HRES),
+};
+
+int read_current_timer(unsigned long *timer_value)
+{
+ *timer_value = arch_counter_get_cntpct();
+ return 0;
+}
+
+unsigned long long notrace sched_clock(void)
+{
+ return arch_counter_get_cntvct() * sched_clock_mult;
+}
+
+static int __cpuinit arch_timer_cpu_notify(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+{
+ int cpu = (long)hcpu;
+ struct clock_event_device *clk = per_cpu_ptr(&arch_timer_evt, cpu);
+
+ switch(action) {
+ case CPU_STARTING:
+ case CPU_STARTING_FROZEN:
+ arch_timer_setup(clk);
+ break;
+
+ case CPU_DYING:
+ case CPU_DYING_FROZEN:
+ pr_debug("arch_timer_teardown disable IRQ%d cpu #%d\n",
+ clk->irq, cpu);
+ disable_percpu_irq(clk->irq);
+ arch_timer_set_mode(CLOCK_EVT_MODE_UNUSED, clk);
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata arch_timer_cpu_nb = {
+ .notifier_call = arch_timer_cpu_notify,
+};
+
+static const struct of_device_id arch_timer_of_match[] __initconst = {
+ { .compatible = "arm,armv8-timer" },
+ {},
+};
+
+int __init arm_generic_timer_init(void)
+{
+ struct device_node *np;
+ int err;
+ u32 freq;
+
+ np = of_find_matching_node(NULL, arch_timer_of_match);
+ if (!np) {
+ pr_err("arch_timer: can't find DT node\n");
+ return -ENODEV;
+ }
+
+ /* Try to determine the frequency from the device tree or CNTFRQ */
+ if (!of_property_read_u32(np, "clock-frequency", &freq))
+ arch_timer_rate = freq;
+ arch_timer_calibrate();
+
+ arch_timer_ppi = irq_of_parse_and_map(np, 0);
+ pr_info("arch_timer: found %s irq %d\n", np->name, arch_timer_ppi);
+
+ err = request_percpu_irq(arch_timer_ppi, arch_timer_handle_irq,
+ np->name, &arch_timer_evt);
+ if (err) {
+ pr_err("arch_timer: can't register interrupt %d (%d)\n",
+ arch_timer_ppi, err);
+ return err;
+ }
+
+ clocksource_register_hz(&clocksource_counter, arch_timer_rate);
+
+ /* Calibrate the delay loop directly */
+ lpj_fine = arch_timer_rate / HZ;
+
+ /* Immediately configure the timer on the boot CPU */
+ arch_timer_setup(per_cpu_ptr(&arch_timer_evt, smp_processor_id()));
+
+ register_cpu_notifier(&arch_timer_cpu_nb);
+
+ return 0;
+}
diff --git a/include/clocksource/arm_generic.h b/include/clocksource/arm_generic.h
new file mode 100644
index 0000000..5b41b0d
--- /dev/null
+++ b/include/clocksource/arm_generic.h
@@ -0,0 +1,21 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __CLKSOURCE_ARM_GENERIC_H
+#define __CLKSOURCE_ARM_GENERIC_H
+
+extern int arm_generic_timer_init(void);
+
+#endif

2012-08-14 17:54:34

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 27/31] arm64: Loadable modules

From: Will Deacon <[email protected]>

This patch adds support for loadable modules. Modules are loaded into a
64MB region below the kernel image due to branch relocation range
restrictions (see Documentation/arm64/memory.txt).
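
For context (editor's sketch, not part of the patch): the A64 B/BL
instructions encode a signed 26-bit word offset, so a direct branch can
reach +/-128MB and a 64MB module area below the kernel stays well within
range:

#include <stdio.h>

int main(void)
{
        /* B/BL immediate: signed 26 bits, scaled by the 4-byte insn size. */
        long long reach = (1LL << 25) * 4;

        printf("direct branch reach: +/- %lld MB\n", reach >> 20);
        return 0;
}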

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/module.h | 23 ++
arch/arm64/kernel/module.c | 456 +++++++++++++++++++++++++++++++++++++++
2 files changed, 479 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/module.h
create mode 100644 arch/arm64/kernel/module.c

diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h
new file mode 100644
index 0000000..e80e232
--- /dev/null
+++ b/arch/arm64/include/asm/module.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_MODULE_H
+#define __ASM_MODULE_H
+
+#include <asm-generic/module.h>
+
+#define MODULE_ARCH_VERMAGIC "aarch64"
+
+#endif /* __ASM_MODULE_H */
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
new file mode 100644
index 0000000..ca0e3d5
--- /dev/null
+++ b/arch/arm64/kernel/module.c
@@ -0,0 +1,456 @@
+/*
+ * AArch64 loadable module support.
+ *
+ * Copyright (C) 2012 ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Will Deacon <[email protected]>
+ */
+
+#include <linux/bitops.h>
+#include <linux/elf.h>
+#include <linux/gfp.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/moduleloader.h>
+#include <linux/vmalloc.h>
+
+void *module_alloc(unsigned long size)
+{
+ return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+ GFP_KERNEL, PAGE_KERNEL_EXEC, -1,
+ __builtin_return_address(0));
+}
+
+enum aarch64_reloc_op {
+ RELOC_OP_NONE,
+ RELOC_OP_ABS,
+ RELOC_OP_PREL,
+ RELOC_OP_PAGE,
+};
+
+static u64 do_reloc(enum aarch64_reloc_op reloc_op, void *place, u64 val)
+{
+ switch (reloc_op) {
+ case RELOC_OP_ABS:
+ return val;
+ case RELOC_OP_PREL:
+ return val - (u64)place;
+ case RELOC_OP_PAGE:
+ return (val & ~0xfff) - ((u64)place & ~0xfff);
+ case RELOC_OP_NONE:
+ return 0;
+ }
+
+ pr_err("do_reloc: unknown relocation operation %d\n", reloc_op);
+ return 0;
+}
+
+static int reloc_data(enum aarch64_reloc_op op, void *place, u64 val, int len)
+{
+ u64 imm_mask = (1 << len) - 1;
+ s64 sval = do_reloc(op, place, val);
+
+ switch (len) {
+ case 16:
+ *(s16 *)place = sval;
+ break;
+ case 32:
+ *(s32 *)place = sval;
+ break;
+ case 64:
+ *(s64 *)place = sval;
+ break;
+ default:
+ pr_err("Invalid length (%d) for data relocation\n", len);
+ return 0;
+ }
+
+ /*
+ * Extract the upper value bits (including the sign bit) and
+ * shift them to bit 0.
+ */
+ sval = (s64)(sval & ~(imm_mask >> 1)) >> (len - 1);
+
+ /*
+ * Overflow has occurred if the value is not representable in
+ * len bits (i.e the bottom len bits are not sign-extended and
+ * the top bits are not all zero).
+ */
+ if ((u64)(sval + 1) > 2)
+ return -ERANGE;
+
+ return 0;
+}
+
+enum aarch64_imm_type {
+ INSN_IMM_MOVNZ,
+ INSN_IMM_MOVK,
+ INSN_IMM_ADR,
+ INSN_IMM_26,
+ INSN_IMM_19,
+ INSN_IMM_16,
+ INSN_IMM_14,
+ INSN_IMM_12,
+ INSN_IMM_9,
+};
+
+static u32 encode_insn_immediate(enum aarch64_imm_type type, u32 insn, u64 imm)
+{
+ u32 immlo, immhi, lomask, himask, mask;
+ int shift;
+
+ switch (type) {
+ case INSN_IMM_MOVNZ:
+ /*
+ * For signed MOVW relocations, we have to manipulate the
+ * instruction encoding depending on whether or not the
+ * immediate is less than zero.
+ */
+ insn &= ~(3 << 29);
+ if ((s64)imm >= 0) {
+ /* >=0: Set the instruction to MOVZ (opcode 10b). */
+ insn |= 2 << 29;
+ } else {
+ /*
+ * <0: Set the instruction to MOVN (opcode 00b).
+ * Since we've masked the opcode already, we
+ * don't need to do anything other than
+ * inverting the new immediate field.
+ */
+ imm = ~imm;
+ }
+ case INSN_IMM_MOVK:
+ mask = BIT(16) - 1;
+ shift = 5;
+ break;
+ case INSN_IMM_ADR:
+ lomask = 0x3;
+ himask = 0x7ffff;
+ immlo = imm & lomask;
+ imm >>= 2;
+ immhi = imm & himask;
+ imm = (immlo << 24) | (immhi);
+ mask = (lomask << 24) | (himask);
+ shift = 5;
+ break;
+ case INSN_IMM_26:
+ mask = BIT(26) - 1;
+ shift = 0;
+ break;
+ case INSN_IMM_19:
+ mask = BIT(19) - 1;
+ shift = 5;
+ break;
+ case INSN_IMM_16:
+ mask = BIT(16) - 1;
+ shift = 5;
+ break;
+ case INSN_IMM_14:
+ mask = BIT(14) - 1;
+ shift = 5;
+ break;
+ case INSN_IMM_12:
+ mask = BIT(12) - 1;
+ shift = 10;
+ break;
+ case INSN_IMM_9:
+ mask = BIT(9) - 1;
+ shift = 12;
+ break;
+ default:
+ pr_err("encode_insn_immediate: unknown immediate encoding %d\n",
+ type);
+ return 0;
+ }
+
+ /* Update the immediate field. */
+ insn &= ~(mask << shift);
+ insn |= (imm & mask) << shift;
+
+ return insn;
+}
+
+static int reloc_insn_movw(enum aarch64_reloc_op op, void *place, u64 val,
+ int lsb, enum aarch64_imm_type imm_type)
+{
+ u64 imm, limit = 0;
+ s64 sval;
+ u32 insn = *(u32 *)place;
+
+ sval = do_reloc(op, place, val);
+ sval >>= lsb;
+ imm = sval & 0xffff;
+
+ /* Update the instruction with the new encoding. */
+ *(u32 *)place = encode_insn_immediate(imm_type, insn, imm);
+
+ /* Shift out the immediate field. */
+ sval >>= 16;
+
+ /*
+ * For unsigned immediates, the overflow check is straightforward.
+ * For signed immediates, the sign bit is actually the bit past the
+ * most significant bit of the field.
+ * The INSN_IMM_16 immediate type is unsigned.
+ */
+ if (imm_type != INSN_IMM_16) {
+ sval++;
+ limit++;
+ }
+
+ /* Check the upper bits depending on the sign of the immediate. */
+ if ((u64)sval > limit)
+ return -ERANGE;
+
+ return 0;
+}
+
+static int reloc_insn_imm(enum aarch64_reloc_op op, void *place, u64 val,
+ int lsb, int len, enum aarch64_imm_type imm_type)
+{
+ u64 imm, imm_mask;
+ s64 sval;
+ u32 insn = *(u32 *)place;
+
+ /* Calculate the relocation value. */
+ sval = do_reloc(op, place, val);
+ sval >>= lsb;
+
+ /* Extract the value bits and shift them to bit 0. */
+ imm_mask = (BIT(lsb + len) - 1) >> lsb;
+ imm = sval & imm_mask;
+
+ /* Update the instruction's immediate field. */
+ *(u32 *)place = encode_insn_immediate(imm_type, insn, imm);
+
+ /*
+ * Extract the upper value bits (including the sign bit) and
+ * shift them to bit 0.
+ */
+ sval = (s64)(sval & ~(imm_mask >> 1)) >> (len - 1);
+
+ /*
+ * Overflow has occurred if the upper bits are not all equal to
+ * the sign bit of the value.
+ */
+ if ((u64)(sval + 1) >= 2)
+ return -ERANGE;
+
+ return 0;
+}
+
+int apply_relocate_add(Elf64_Shdr *sechdrs,
+ const char *strtab,
+ unsigned int symindex,
+ unsigned int relsec,
+ struct module *me)
+{
+ unsigned int i;
+ int ovf;
+ bool overflow_check;
+ Elf64_Sym *sym;
+ void *loc;
+ u64 val;
+ Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
+
+ for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rel); i++) {
+ /* loc corresponds to P in the AArch64 ELF document. */
+ loc = (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr
+ + rel[i].r_offset;
+
+ /* sym is the ELF symbol we're referring to. */
+ sym = (Elf64_Sym *)sechdrs[symindex].sh_addr
+ + ELF64_R_SYM(rel[i].r_info);
+
+ /* val corresponds to (S + A) in the AArch64 ELF document. */
+ val = sym->st_value + rel[i].r_addend;
+
+ /* Check for overflow by default. */
+ overflow_check = true;
+
+ /* Perform the static relocation. */
+ switch (ELF64_R_TYPE(rel[i].r_info)) {
+ /* Null relocations. */
+ case R_ARM_NONE:
+ case R_AARCH64_NONE:
+ ovf = 0;
+ break;
+
+ /* Data relocations. */
+ case R_AARCH64_ABS64:
+ overflow_check = false;
+ ovf = reloc_data(RELOC_OP_ABS, loc, val, 64);
+ break;
+ case R_AARCH64_ABS32:
+ ovf = reloc_data(RELOC_OP_ABS, loc, val, 32);
+ break;
+ case R_AARCH64_ABS16:
+ ovf = reloc_data(RELOC_OP_ABS, loc, val, 16);
+ break;
+ case R_AARCH64_PREL64:
+ overflow_check = false;
+ ovf = reloc_data(RELOC_OP_PREL, loc, val, 64);
+ break;
+ case R_AARCH64_PREL32:
+ ovf = reloc_data(RELOC_OP_PREL, loc, val, 32);
+ break;
+ case R_AARCH64_PREL16:
+ ovf = reloc_data(RELOC_OP_PREL, loc, val, 16);
+ break;
+
+ /* MOVW instruction relocations. */
+ case R_AARCH64_MOVW_UABS_G0_NC:
+ overflow_check = false;
+ case R_AARCH64_MOVW_UABS_G0:
+ ovf = reloc_insn_movw(RELOC_OP_ABS, loc, val, 0,
+ INSN_IMM_16);
+ break;
+ case R_AARCH64_MOVW_UABS_G1_NC:
+ overflow_check = false;
+ case R_AARCH64_MOVW_UABS_G1:
+ ovf = reloc_insn_movw(RELOC_OP_ABS, loc, val, 16,
+ INSN_IMM_16);
+ break;
+ case R_AARCH64_MOVW_UABS_G2_NC:
+ overflow_check = false;
+ case R_AARCH64_MOVW_UABS_G2:
+ ovf = reloc_insn_movw(RELOC_OP_ABS, loc, val, 32,
+ INSN_IMM_16);
+ break;
+ case R_AARCH64_MOVW_UABS_G3:
+ /* We're using the top bits so we can't overflow. */
+ overflow_check = false;
+ ovf = reloc_insn_movw(RELOC_OP_ABS, loc, val, 48,
+ INSN_IMM_16);
+ break;
+ case R_AARCH64_MOVW_SABS_G0:
+ ovf = reloc_insn_movw(RELOC_OP_ABS, loc, val, 0,
+ INSN_IMM_MOVNZ);
+ break;
+ case R_AARCH64_MOVW_SABS_G1:
+ ovf = reloc_insn_movw(RELOC_OP_ABS, loc, val, 16,
+ INSN_IMM_MOVNZ);
+ break;
+ case R_AARCH64_MOVW_SABS_G2:
+ ovf = reloc_insn_movw(RELOC_OP_ABS, loc, val, 32,
+ INSN_IMM_MOVNZ);
+ break;
+ case R_AARCH64_MOVW_PREL_G0_NC:
+ overflow_check = false;
+ ovf = reloc_insn_movw(RELOC_OP_PREL, loc, val, 0,
+ INSN_IMM_MOVK);
+ break;
+ case R_AARCH64_MOVW_PREL_G0:
+ ovf = reloc_insn_movw(RELOC_OP_PREL, loc, val, 0,
+ INSN_IMM_MOVNZ);
+ break;
+ case R_AARCH64_MOVW_PREL_G1_NC:
+ overflow_check = false;
+ ovf = reloc_insn_movw(RELOC_OP_PREL, loc, val, 16,
+ INSN_IMM_MOVK);
+ break;
+ case R_AARCH64_MOVW_PREL_G1:
+ ovf = reloc_insn_movw(RELOC_OP_PREL, loc, val, 16,
+ INSN_IMM_MOVNZ);
+ break;
+ case R_AARCH64_MOVW_PREL_G2_NC:
+ overflow_check = false;
+ ovf = reloc_insn_movw(RELOC_OP_PREL, loc, val, 32,
+ INSN_IMM_MOVK);
+ break;
+ case R_AARCH64_MOVW_PREL_G2:
+ ovf = reloc_insn_movw(RELOC_OP_PREL, loc, val, 32,
+ INSN_IMM_MOVNZ);
+ break;
+ case R_AARCH64_MOVW_PREL_G3:
+ /* We're using the top bits so we can't overflow. */
+ overflow_check = false;
+ ovf = reloc_insn_movw(RELOC_OP_PREL, loc, val, 48,
+ INSN_IMM_MOVNZ);
+ break;
+
+ /* Immediate instruction relocations. */
+ case R_AARCH64_LD_PREL_LO19:
+ ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2, 19,
+ INSN_IMM_19);
+ break;
+ case R_AARCH64_ADR_PREL_LO21:
+ ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 0, 21,
+ INSN_IMM_ADR);
+ break;
+ case R_AARCH64_ADR_PREL_PG_HI21_NC:
+ overflow_check = false;
+ case R_AARCH64_ADR_PREL_PG_HI21:
+ ovf = reloc_insn_imm(RELOC_OP_PAGE, loc, val, 12, 21,
+ INSN_IMM_ADR);
+ break;
+ case R_AARCH64_ADD_ABS_LO12_NC:
+ case R_AARCH64_LDST8_ABS_LO12_NC:
+ overflow_check = false;
+ ovf = reloc_insn_imm(RELOC_OP_ABS, loc, val, 0, 12,
+ INSN_IMM_12);
+ break;
+ case R_AARCH64_LDST16_ABS_LO12_NC:
+ overflow_check = false;
+ ovf = reloc_insn_imm(RELOC_OP_ABS, loc, val, 1, 11,
+ INSN_IMM_12);
+ break;
+ case R_AARCH64_LDST32_ABS_LO12_NC:
+ overflow_check = false;
+ ovf = reloc_insn_imm(RELOC_OP_ABS, loc, val, 2, 10,
+ INSN_IMM_12);
+ break;
+ case R_AARCH64_LDST64_ABS_LO12_NC:
+ overflow_check = false;
+ ovf = reloc_insn_imm(RELOC_OP_ABS, loc, val, 3, 9,
+ INSN_IMM_12);
+ break;
+ case R_AARCH64_LDST128_ABS_LO12_NC:
+ overflow_check = false;
+ ovf = reloc_insn_imm(RELOC_OP_ABS, loc, val, 4, 8,
+ INSN_IMM_12);
+ break;
+ case R_AARCH64_TSTBR14:
+ ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2, 14,
+ INSN_IMM_14);
+ break;
+ case R_AARCH64_CONDBR19:
+ ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2, 19,
+ INSN_IMM_19);
+ break;
+ case R_AARCH64_JUMP26:
+ case R_AARCH64_CALL26:
+ ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2, 26,
+ INSN_IMM_26);
+ break;
+
+ default:
+ pr_err("module %s: unsupported RELA relocation: %llu\n",
+ me->name, ELF64_R_TYPE(rel[i].r_info));
+ return -ENOEXEC;
+ }
+
+ if (overflow_check && ovf == -ERANGE)
+ goto overflow;
+
+ }
+
+ return 0;
+
+overflow:
+ pr_err("module %s: overflow in relocation type %d val %Lx\n",
+ me->name, (int)ELF64_R_TYPE(rel[i].r_info), val);
+ return -ENOEXEC;
+}

2012-08-14 17:54:32

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 24/31] arm64: Add support for /proc/sys/debug/exception-trace

This patch allows setting the show_unhandled_signals variable via
/proc/sys/debug/exception-trace. The default value is currently 1, which
reports unhandled user faults (undefined instructions, data aborts) and
invalid signal stack frames.
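
Usage is the same as on the architectures already listed; as a minimal
sketch (not part of the patch), the setting can be flipped from a program,
equivalent to "echo 1 > /proc/sys/debug/exception-trace":

#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/debug/exception-trace", "w");

        if (!f) {
                perror("exception-trace");
                return 1;
        }
        fputs("1\n", f);        /* 1: report unhandled user faults, 0: silent */
        fclose(f);
        return 0;
}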

Signed-off-by: Catalin Marinas <[email protected]>
---
kernel/sysctl.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 87174ef..79dcb00 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1544,7 +1544,7 @@ static struct ctl_table fs_table[] = {

static struct ctl_table debug_table[] = {
#if defined(CONFIG_X86) || defined(CONFIG_PPC) || defined(CONFIG_SPARC) || \
- defined(CONFIG_S390) || defined(CONFIG_TILE)
+ defined(CONFIG_S390) || defined(CONFIG_TILE) || defined(CONFIG_ARM64)
{
.procname = "exception-trace",
.data = &show_unhandled_signals,

2012-08-14 17:54:30

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 21/31] arm64: 32-bit (compat) applications support

From: Will Deacon <[email protected]>

This patch adds support for 32-bit applications. The vectors page is
provided as a binary blob mapped into the application's user space at
0xffff0000, since the AArch64 toolchain cannot compile AArch32 code. Full
compatibility with ARMv7 user space is provided. The deprecated ARMv7
functionality (SWP, CP15 barriers) is disabled by default on AArch64
kernels, and unaligned LDM/STM is not supported.

Please note that only the ARM 32-bit EABI is supported; there is no OABI
compatibility.
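
To illustrate the pointer-width handling in the compat layer (editor's
sketch, not part of the patch): a 32-bit user pointer is carried as a u32
and zero-extended before use, mirroring compat_ptr()/ptr_to_compat() in
compat.h below:

#include <stdio.h>
#include <stdint.h>

typedef uint32_t compat_uptr_t;

static void *widen_compat_ptr(compat_uptr_t uptr)
{
        return (void *)(unsigned long)uptr;     /* zero-extend to 64 bits */
}

int main(void)
{
        compat_uptr_t vectors = 0xffff0000;     /* the 32-bit vectors page */

        printf("compat pointer %#x widens to %p\n", (unsigned int)vectors,
               widen_compat_ptr(vectors));
        return 0;
}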

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/compat.h | 232 ++++++++++
arch/arm64/include/asm/signal32.h | 54 +++
arch/arm64/include/asm/unistd32.h | 758 ++++++++++++++++++++++++++++++++
arch/arm64/kernel/kuser32.S | 77 ++++
arch/arm64/kernel/signal32.c | 876 +++++++++++++++++++++++++++++++++++++
arch/arm64/kernel/sys32.S | 283 ++++++++++++
arch/arm64/kernel/sys_compat.c | 177 ++++++++
7 files changed, 2457 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/compat.h
create mode 100644 arch/arm64/include/asm/signal32.h
create mode 100644 arch/arm64/include/asm/unistd32.h
create mode 100644 arch/arm64/kernel/kuser32.S
create mode 100644 arch/arm64/kernel/signal32.c
create mode 100644 arch/arm64/kernel/sys32.S
create mode 100644 arch/arm64/kernel/sys_compat.c

diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h
new file mode 100644
index 0000000..91e72b7
--- /dev/null
+++ b/arch/arm64/include/asm/compat.h
@@ -0,0 +1,232 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_COMPAT_H
+#define __ASM_COMPAT_H
+#ifdef __KERNEL__
+#ifdef CONFIG_COMPAT
+
+/*
+ * Architecture specific compatibility types
+ */
+#include <linux/types.h>
+#include <linux/sched.h>
+
+#define COMPAT_USER_HZ 100
+#define COMPAT_UTS_MACHINE "armv8l\0\0"
+
+typedef u32 compat_size_t;
+typedef s32 compat_ssize_t;
+typedef s32 compat_time_t;
+typedef s32 compat_clock_t;
+typedef s32 compat_pid_t;
+typedef u32 __compat_uid_t;
+typedef u32 __compat_gid_t;
+typedef u32 __compat_uid32_t;
+typedef u32 __compat_gid32_t;
+typedef u32 compat_mode_t;
+typedef u32 compat_ino_t;
+typedef u32 compat_dev_t;
+typedef s32 compat_off_t;
+typedef s64 compat_loff_t;
+typedef s16 compat_nlink_t;
+typedef u16 compat_ipc_pid_t;
+typedef s32 compat_daddr_t;
+typedef u32 compat_caddr_t;
+typedef __kernel_fsid_t compat_fsid_t;
+typedef s32 compat_key_t;
+typedef s32 compat_timer_t;
+
+typedef s32 compat_int_t;
+typedef s32 compat_long_t;
+typedef s64 compat_s64;
+typedef u32 compat_uint_t;
+typedef u32 compat_ulong_t;
+typedef u64 compat_u64;
+
+struct compat_timespec {
+ compat_time_t tv_sec;
+ s32 tv_nsec;
+};
+
+struct compat_timeval {
+ compat_time_t tv_sec;
+ s32 tv_usec;
+};
+
+struct compat_stat {
+ compat_dev_t st_dev;
+ compat_ino_t st_ino;
+ compat_mode_t st_mode;
+ compat_nlink_t st_nlink;
+ __compat_uid32_t st_uid;
+ __compat_gid32_t st_gid;
+ compat_dev_t st_rdev;
+ compat_off_t st_size;
+ compat_off_t st_blksize;
+ compat_off_t st_blocks;
+ compat_time_t st_atime;
+ u32 st_atime_nsec;
+ compat_time_t st_mtime;
+ u32 st_mtime_nsec;
+ compat_time_t st_ctime;
+ u32 st_ctime_nsec;
+ u32 __unused4[2];
+};
+
+struct compat_flock {
+ short l_type;
+ short l_whence;
+ compat_off_t l_start;
+ compat_off_t l_len;
+ compat_pid_t l_pid;
+};
+
+#define F_GETLK64 12 /* using 'struct flock64' */
+#define F_SETLK64 13
+#define F_SETLKW64 14
+
+struct compat_flock64 {
+ short l_type;
+ short l_whence;
+ compat_loff_t l_start;
+ compat_loff_t l_len;
+ compat_pid_t l_pid;
+};
+
+struct compat_statfs {
+ int f_type;
+ int f_bsize;
+ int f_blocks;
+ int f_bfree;
+ int f_bavail;
+ int f_files;
+ int f_ffree;
+ compat_fsid_t f_fsid;
+ int f_namelen; /* SunOS ignores this field. */
+ int f_frsize;
+ int f_flags;
+ int f_spare[4];
+};
+
+#define COMPAT_RLIM_INFINITY 0xffffffff
+
+typedef u32 compat_old_sigset_t;
+
+#define _COMPAT_NSIG 64
+#define _COMPAT_NSIG_BPW 32
+
+typedef u32 compat_sigset_word;
+
+#define COMPAT_OFF_T_MAX 0x7fffffff
+#define COMPAT_LOFF_T_MAX 0x7fffffffffffffffL
+
+/*
+ * A pointer passed in from user mode. This should not
+ * be used for syscall parameters, just declare them
+ * as pointers because the syscall entry code will have
+ * appropriately converted them already.
+ */
+typedef u32 compat_uptr_t;
+
+static inline void __user *compat_ptr(compat_uptr_t uptr)
+{
+ return (void __user *)(unsigned long)uptr;
+}
+
+static inline compat_uptr_t ptr_to_compat(void __user *uptr)
+{
+ return (u32)(unsigned long)uptr;
+}
+
+static inline void __user *arch_compat_alloc_user_space(long len)
+{
+ struct pt_regs *regs = task_pt_regs(current);
+ return (void __user *)regs->compat_sp - len;
+}
+
+struct compat_ipc64_perm {
+ compat_key_t key;
+ __compat_uid32_t uid;
+ __compat_gid32_t gid;
+ __compat_uid32_t cuid;
+ __compat_gid32_t cgid;
+ unsigned short mode;
+ unsigned short __pad1;
+ unsigned short seq;
+ unsigned short __pad2;
+ compat_ulong_t unused1;
+ compat_ulong_t unused2;
+};
+
+struct compat_semid64_ds {
+ struct compat_ipc64_perm sem_perm;
+ compat_time_t sem_otime;
+ compat_ulong_t __unused1;
+ compat_time_t sem_ctime;
+ compat_ulong_t __unused2;
+ compat_ulong_t sem_nsems;
+ compat_ulong_t __unused3;
+ compat_ulong_t __unused4;
+};
+
+struct compat_msqid64_ds {
+ struct compat_ipc64_perm msg_perm;
+ compat_time_t msg_stime;
+ compat_ulong_t __unused1;
+ compat_time_t msg_rtime;
+ compat_ulong_t __unused2;
+ compat_time_t msg_ctime;
+ compat_ulong_t __unused3;
+ compat_ulong_t msg_cbytes;
+ compat_ulong_t msg_qnum;
+ compat_ulong_t msg_qbytes;
+ compat_pid_t msg_lspid;
+ compat_pid_t msg_lrpid;
+ compat_ulong_t __unused4;
+ compat_ulong_t __unused5;
+};
+
+struct compat_shmid64_ds {
+ struct compat_ipc64_perm shm_perm;
+ compat_size_t shm_segsz;
+ compat_time_t shm_atime;
+ compat_ulong_t __unused1;
+ compat_time_t shm_dtime;
+ compat_ulong_t __unused2;
+ compat_time_t shm_ctime;
+ compat_ulong_t __unused3;
+ compat_pid_t shm_cpid;
+ compat_pid_t shm_lpid;
+ compat_ulong_t shm_nattch;
+ compat_ulong_t __unused4;
+ compat_ulong_t __unused5;
+};
+
+static inline int is_compat_task(void)
+{
+ return test_thread_flag(TIF_32BIT);
+}
+
+#else /* !CONFIG_COMPAT */
+
+static inline int is_compat_task(void)
+{
+ return 0;
+}
+
+#endif /* CONFIG_COMPAT */
+#endif /* __KERNEL__ */
+#endif /* __ASM_COMPAT_H */
diff --git a/arch/arm64/include/asm/signal32.h b/arch/arm64/include/asm/signal32.h
new file mode 100644
index 0000000..f9cf9e1
--- /dev/null
+++ b/arch/arm64/include/asm/signal32.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SIGNAL32_H
+#define __ASM_SIGNAL32_H
+
+#ifdef __KERNEL__
+#ifdef CONFIG_AARCH32_EMULATION
+#include <linux/compat.h>
+
+#define AARCH32_KERN_SIGRET_CODE_OFFSET 0x500
+
+extern const compat_ulong_t aarch32_sigret_code[6];
+
+int compat_setup_frame(int usig, struct k_sigaction *ka, sigset_t *set,
+ struct pt_regs *regs);
+int compat_setup_rt_frame(int usig, struct k_sigaction *ka, siginfo_t *info,
+ sigset_t *set, struct pt_regs *regs);
+
+void compat_setup_restart_syscall(struct pt_regs *regs);
+#else
+
+static inline int compat_setup_frame(int usid, struct k_sigaction *ka,
+ sigset_t *set, struct pt_regs *regs)
+{
+ BUG();
+}
+
+static inline int compat_setup_rt_frame(int usig, struct k_sigaction *ka,
+ siginfo_t *info, sigset_t *set,
+ struct pt_regs *regs)
+{
+ BUG();
+}
+
+static inline void compat_setup_restart_syscall(struct pt_regs *regs)
+{
+ BUG();
+}
+#endif /* CONFIG_AARCH32_EMULATION */
+#endif /* __KERNEL__ */
+#endif /* __ASM_SIGNAL32_H */
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
new file mode 100644
index 0000000..a50405f
--- /dev/null
+++ b/arch/arm64/include/asm/unistd32.h
@@ -0,0 +1,758 @@
+/*
+ * Based on arch/arm/include/asm/unistd.h
+ *
+ * Copyright (C) 2001-2005 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#if !defined(__ASM_UNISTD32_H) || defined(__SYSCALL)
+#define __ASM_UNISTD32_H
+
+#ifndef __SYSCALL
+#define __SYSCALL(x, y)
+#endif
+
+/*
+ * This file contains the system call numbers.
+ */
+
+#ifdef __SYSCALL_COMPAT
+
+#define __NR_restart_syscall 0
+__SYSCALL(__NR_restart_syscall, sys_restart_syscall)
+#define __NR_exit 1
+__SYSCALL(__NR_exit, sys_exit)
+#define __NR_fork 2
+__SYSCALL(__NR_fork, sys_fork)
+#define __NR_read 3
+__SYSCALL(__NR_read, sys_read)
+#define __NR_write 4
+__SYSCALL(__NR_write, sys_write)
+#define __NR_open 5
+__SYSCALL(__NR_open, sys_open)
+#define __NR_close 6
+__SYSCALL(__NR_close, sys_close)
+__SYSCALL(7, sys_ni_syscall) /* 7 was sys_waitpid */
+#define __NR_creat 8
+__SYSCALL(__NR_creat, sys_creat)
+#define __NR_link 9
+__SYSCALL(__NR_link, sys_link)
+#define __NR_unlink 10
+__SYSCALL(__NR_unlink, sys_unlink)
+#define __NR_execve 11
+__SYSCALL(__NR_execve, sys_execve)
+#define __NR_chdir 12
+__SYSCALL(__NR_chdir, sys_chdir)
+__SYSCALL(13, sys_ni_syscall) /* 13 was sys_time */
+#define __NR_mknod 14
+__SYSCALL(__NR_mknod, sys_mknod)
+#define __NR_chmod 15
+__SYSCALL(__NR_chmod, sys_chmod)
+#define __NR_lchown 16
+__SYSCALL(__NR_lchown, sys_lchown16)
+__SYSCALL(17, sys_ni_syscall) /* 17 was sys_break */
+__SYSCALL(18, sys_ni_syscall) /* 18 was sys_stat */
+#define __NR_lseek 19
+__SYSCALL(__NR_lseek, sys_lseek)
+#define __NR_getpid 20
+__SYSCALL(__NR_getpid, sys_getpid)
+#define __NR_mount 21
+__SYSCALL(__NR_mount, sys_mount)
+__SYSCALL(22, sys_ni_syscall) /* 22 was sys_umount */
+#define __NR_setuid 23
+__SYSCALL(__NR_setuid, sys_setuid16)
+#define __NR_getuid 24
+__SYSCALL(__NR_getuid, sys_getuid16)
+__SYSCALL(25, sys_ni_syscall) /* 25 was sys_stime */
+#define __NR_ptrace 26
+__SYSCALL(__NR_ptrace, sys_ptrace)
+__SYSCALL(27, sys_ni_syscall) /* 27 was sys_alarm */
+__SYSCALL(28, sys_ni_syscall) /* 28 was sys_fstat */
+#define __NR_pause 29
+__SYSCALL(__NR_pause, sys_pause)
+__SYSCALL(30, sys_ni_syscall) /* 30 was sys_utime */
+__SYSCALL(31, sys_ni_syscall) /* 31 was sys_stty */
+__SYSCALL(32, sys_ni_syscall) /* 32 was sys_gtty */
+#define __NR_access 33
+__SYSCALL(__NR_access, sys_access)
+#define __NR_nice 34
+__SYSCALL(__NR_nice, sys_nice)
+__SYSCALL(35, sys_ni_syscall) /* 35 was sys_ftime */
+#define __NR_sync 36
+__SYSCALL(__NR_sync, sys_sync)
+#define __NR_kill 37
+__SYSCALL(__NR_kill, sys_kill)
+#define __NR_rename 38
+__SYSCALL(__NR_rename, sys_rename)
+#define __NR_mkdir 39
+__SYSCALL(__NR_mkdir, sys_mkdir)
+#define __NR_rmdir 40
+__SYSCALL(__NR_rmdir, sys_rmdir)
+#define __NR_dup 41
+__SYSCALL(__NR_dup, sys_dup)
+#define __NR_pipe 42
+__SYSCALL(__NR_pipe, sys_pipe)
+#define __NR_times 43
+__SYSCALL(__NR_times, sys_times)
+__SYSCALL(44, sys_ni_syscall) /* 44 was sys_prof */
+#define __NR_brk 45
+__SYSCALL(__NR_brk, sys_brk)
+#define __NR_setgid 46
+__SYSCALL(__NR_setgid, sys_setgid16)
+#define __NR_getgid 47
+__SYSCALL(__NR_getgid, sys_getgid16)
+__SYSCALL(48, sys_ni_syscall) /* 48 was sys_signal */
+#define __NR_geteuid 49
+__SYSCALL(__NR_geteuid, sys_geteuid16)
+#define __NR_getegid 50
+__SYSCALL(__NR_getegid, sys_getegid16)
+#define __NR_acct 51
+__SYSCALL(__NR_acct, sys_acct)
+#define __NR_umount2 52
+__SYSCALL(__NR_umount2, sys_umount)
+__SYSCALL(53, sys_ni_syscall) /* 53 was sys_lock */
+#define __NR_ioctl 54
+__SYSCALL(__NR_ioctl, sys_ioctl)
+#define __NR_fcntl 55
+__SYSCALL(__NR_fcntl, sys_fcntl)
+__SYSCALL(56, sys_ni_syscall) /* 56 was sys_mpx */
+#define __NR_setpgid 57
+__SYSCALL(__NR_setpgid, sys_setpgid)
+__SYSCALL(58, sys_ni_syscall) /* 58 was sys_ulimit */
+__SYSCALL(59, sys_ni_syscall) /* 59 was sys_olduname */
+#define __NR_umask 60
+__SYSCALL(__NR_umask, sys_umask)
+#define __NR_chroot 61
+__SYSCALL(__NR_chroot, sys_chroot)
+#define __NR_ustat 62
+__SYSCALL(__NR_ustat, sys_ustat)
+#define __NR_dup2 63
+__SYSCALL(__NR_dup2, sys_dup2)
+#define __NR_getppid 64
+__SYSCALL(__NR_getppid, sys_getppid)
+#define __NR_getpgrp 65
+__SYSCALL(__NR_getpgrp, sys_getpgrp)
+#define __NR_setsid 66
+__SYSCALL(__NR_setsid, sys_setsid)
+#define __NR_sigaction 67
+__SYSCALL(__NR_sigaction, sys_sigaction)
+__SYSCALL(68, sys_ni_syscall) /* 68 was sys_sgetmask */
+__SYSCALL(69, sys_ni_syscall) /* 69 was sys_ssetmask */
+#define __NR_setreuid 70
+__SYSCALL(__NR_setreuid, sys_setreuid16)
+#define __NR_setregid 71
+__SYSCALL(__NR_setregid, sys_setregid16)
+#define __NR_sigsuspend 72
+__SYSCALL(__NR_sigsuspend, sys_sigsuspend)
+#define __NR_sigpending 73
+__SYSCALL(__NR_sigpending, sys_sigpending)
+#define __NR_sethostname 74
+__SYSCALL(__NR_sethostname, sys_sethostname)
+#define __NR_setrlimit 75
+__SYSCALL(__NR_setrlimit, sys_setrlimit)
+__SYSCALL(76, sys_ni_syscall) /* 76 was sys_getrlimit */
+#define __NR_getrusage 77
+__SYSCALL(__NR_getrusage, sys_getrusage)
+#define __NR_gettimeofday 78
+__SYSCALL(__NR_gettimeofday, sys_gettimeofday)
+#define __NR_settimeofday 79
+__SYSCALL(__NR_settimeofday, sys_settimeofday)
+#define __NR_getgroups 80
+__SYSCALL(__NR_getgroups, sys_getgroups16)
+#define __NR_setgroups 81
+__SYSCALL(__NR_setgroups, sys_setgroups16)
+__SYSCALL(82, sys_ni_syscall) /* 82 was sys_select */
+#define __NR_symlink 83
+__SYSCALL(__NR_symlink, sys_symlink)
+__SYSCALL(84, sys_ni_syscall) /* 84 was sys_lstat */
+#define __NR_readlink 85
+__SYSCALL(__NR_readlink, sys_readlink)
+#define __NR_uselib 86
+__SYSCALL(__NR_uselib, sys_uselib)
+#define __NR_swapon 87
+__SYSCALL(__NR_swapon, sys_swapon)
+#define __NR_reboot 88
+__SYSCALL(__NR_reboot, sys_reboot)
+__SYSCALL(89, sys_ni_syscall) /* 89 was sys_readdir */
+__SYSCALL(90, sys_ni_syscall) /* 90 was sys_mmap */
+#define __NR_munmap 91
+__SYSCALL(__NR_munmap, sys_munmap)
+#define __NR_truncate 92
+__SYSCALL(__NR_truncate, sys_truncate)
+#define __NR_ftruncate 93
+__SYSCALL(__NR_ftruncate, sys_ftruncate)
+#define __NR_fchmod 94
+__SYSCALL(__NR_fchmod, sys_fchmod)
+#define __NR_fchown 95
+__SYSCALL(__NR_fchown, sys_fchown16)
+#define __NR_getpriority 96
+__SYSCALL(__NR_getpriority, sys_getpriority)
+#define __NR_setpriority 97
+__SYSCALL(__NR_setpriority, sys_setpriority)
+__SYSCALL(98, sys_ni_syscall) /* 98 was sys_profil */
+#define __NR_statfs 99
+__SYSCALL(__NR_statfs, sys_statfs)
+#define __NR_fstatfs 100
+__SYSCALL(__NR_fstatfs, sys_fstatfs)
+__SYSCALL(101, sys_ni_syscall) /* 101 was sys_ioperm */
+__SYSCALL(102, sys_ni_syscall) /* 102 was sys_socketcall */
+#define __NR_syslog 103
+__SYSCALL(__NR_syslog, sys_syslog)
+#define __NR_setitimer 104
+__SYSCALL(__NR_setitimer, sys_setitimer)
+#define __NR_getitimer 105
+__SYSCALL(__NR_getitimer, sys_getitimer)
+#define __NR_stat 106
+__SYSCALL(__NR_stat, sys_newstat)
+#define __NR_lstat 107
+__SYSCALL(__NR_lstat, sys_newlstat)
+#define __NR_fstat 108
+__SYSCALL(__NR_fstat, sys_newfstat)
+__SYSCALL(109, sys_ni_syscall) /* 109 was sys_uname */
+__SYSCALL(110, sys_ni_syscall) /* 110 was sys_iopl */
+#define __NR_vhangup 111
+__SYSCALL(__NR_vhangup, sys_vhangup)
+__SYSCALL(112, sys_ni_syscall) /* 112 was sys_idle */
+__SYSCALL(113, sys_ni_syscall) /* 113 was sys_syscall */
+#define __NR_wait4 114
+__SYSCALL(__NR_wait4, sys_wait4)
+#define __NR_swapoff 115
+__SYSCALL(__NR_swapoff, sys_swapoff)
+#define __NR_sysinfo 116
+__SYSCALL(__NR_sysinfo, sys_sysinfo)
+__SYSCALL(117, sys_ni_syscall) /* 117 was sys_ipc */
+#define __NR_fsync 118
+__SYSCALL(__NR_fsync, sys_fsync)
+#define __NR_sigreturn 119
+__SYSCALL(__NR_sigreturn, sys_sigreturn)
+#define __NR_clone 120
+__SYSCALL(__NR_clone, sys_clone)
+#define __NR_setdomainname 121
+__SYSCALL(__NR_setdomainname, sys_setdomainname)
+#define __NR_uname 122
+__SYSCALL(__NR_uname, sys_newuname)
+__SYSCALL(123, sys_ni_syscall) /* 123 was sys_modify_ldt */
+#define __NR_adjtimex 124
+__SYSCALL(__NR_adjtimex, sys_adjtimex)
+#define __NR_mprotect 125
+__SYSCALL(__NR_mprotect, sys_mprotect)
+#define __NR_sigprocmask 126
+__SYSCALL(__NR_sigprocmask, sys_sigprocmask)
+__SYSCALL(127, sys_ni_syscall) /* 127 was sys_create_module */
+#define __NR_init_module 128
+__SYSCALL(__NR_init_module, sys_init_module)
+#define __NR_delete_module 129
+__SYSCALL(__NR_delete_module, sys_delete_module)
+__SYSCALL(130, sys_ni_syscall) /* 130 was sys_get_kernel_syms */
+#define __NR_quotactl 131
+__SYSCALL(__NR_quotactl, sys_quotactl)
+#define __NR_getpgid 132
+__SYSCALL(__NR_getpgid, sys_getpgid)
+#define __NR_fchdir 133
+__SYSCALL(__NR_fchdir, sys_fchdir)
+#define __NR_bdflush 134
+__SYSCALL(__NR_bdflush, sys_bdflush)
+#define __NR_sysfs 135
+__SYSCALL(__NR_sysfs, sys_sysfs)
+#define __NR_personality 136
+__SYSCALL(__NR_personality, sys_personality)
+__SYSCALL(137, sys_ni_syscall) /* 137 was sys_afs_syscall */
+#define __NR_setfsuid 138
+__SYSCALL(__NR_setfsuid, sys_setfsuid16)
+#define __NR_setfsgid 139
+__SYSCALL(__NR_setfsgid, sys_setfsgid16)
+#define __NR__llseek 140
+__SYSCALL(__NR__llseek, sys_llseek)
+#define __NR_getdents 141
+__SYSCALL(__NR_getdents, sys_getdents)
+#define __NR__newselect 142
+__SYSCALL(__NR__newselect, sys_select)
+#define __NR_flock 143
+__SYSCALL(__NR_flock, sys_flock)
+#define __NR_msync 144
+__SYSCALL(__NR_msync, sys_msync)
+#define __NR_readv 145
+__SYSCALL(__NR_readv, sys_readv)
+#define __NR_writev 146
+__SYSCALL(__NR_writev, sys_writev)
+#define __NR_getsid 147
+__SYSCALL(__NR_getsid, sys_getsid)
+#define __NR_fdatasync 148
+__SYSCALL(__NR_fdatasync, sys_fdatasync)
+#define __NR__sysctl 149
+__SYSCALL(__NR__sysctl, sys_sysctl)
+#define __NR_mlock 150
+__SYSCALL(__NR_mlock, sys_mlock)
+#define __NR_munlock 151
+__SYSCALL(__NR_munlock, sys_munlock)
+#define __NR_mlockall 152
+__SYSCALL(__NR_mlockall, sys_mlockall)
+#define __NR_munlockall 153
+__SYSCALL(__NR_munlockall, sys_munlockall)
+#define __NR_sched_setparam 154
+__SYSCALL(__NR_sched_setparam, sys_sched_setparam)
+#define __NR_sched_getparam 155
+__SYSCALL(__NR_sched_getparam, sys_sched_getparam)
+#define __NR_sched_setscheduler 156
+__SYSCALL(__NR_sched_setscheduler, sys_sched_setscheduler)
+#define __NR_sched_getscheduler 157
+__SYSCALL(__NR_sched_getscheduler, sys_sched_getscheduler)
+#define __NR_sched_yield 158
+__SYSCALL(__NR_sched_yield, sys_sched_yield)
+#define __NR_sched_get_priority_max 159
+__SYSCALL(__NR_sched_get_priority_max, sys_sched_get_priority_max)
+#define __NR_sched_get_priority_min 160
+__SYSCALL(__NR_sched_get_priority_min, sys_sched_get_priority_min)
+#define __NR_sched_rr_get_interval 161
+__SYSCALL(__NR_sched_rr_get_interval, sys_sched_rr_get_interval)
+#define __NR_nanosleep 162
+__SYSCALL(__NR_nanosleep, sys_nanosleep)
+#define __NR_mremap 163
+__SYSCALL(__NR_mremap, sys_mremap)
+#define __NR_setresuid 164
+__SYSCALL(__NR_setresuid, sys_setresuid16)
+#define __NR_getresuid 165
+__SYSCALL(__NR_getresuid, sys_getresuid16)
+__SYSCALL(166, sys_ni_syscall) /* 166 was sys_vm86 */
+__SYSCALL(167, sys_ni_syscall) /* 167 was sys_query_module */
+#define __NR_poll 168
+__SYSCALL(__NR_poll, sys_poll)
+#define __NR_nfsservctl 169
+__SYSCALL(__NR_nfsservctl, sys_ni_syscall)
+#define __NR_setresgid 170
+__SYSCALL(__NR_setresgid, sys_setresgid16)
+#define __NR_getresgid 171
+__SYSCALL(__NR_getresgid, sys_getresgid16)
+#define __NR_prctl 172
+__SYSCALL(__NR_prctl, sys_prctl)
+#define __NR_rt_sigreturn 173
+__SYSCALL(__NR_rt_sigreturn, sys_rt_sigreturn)
+#define __NR_rt_sigaction 174
+__SYSCALL(__NR_rt_sigaction, sys_rt_sigaction)
+#define __NR_rt_sigprocmask 175
+__SYSCALL(__NR_rt_sigprocmask, sys_rt_sigprocmask)
+#define __NR_rt_sigpending 176
+__SYSCALL(__NR_rt_sigpending, sys_rt_sigpending)
+#define __NR_rt_sigtimedwait 177
+__SYSCALL(__NR_rt_sigtimedwait, sys_rt_sigtimedwait)
+#define __NR_rt_sigqueueinfo 178
+__SYSCALL(__NR_rt_sigqueueinfo, sys_rt_sigqueueinfo)
+#define __NR_rt_sigsuspend 179
+__SYSCALL(__NR_rt_sigsuspend, sys_rt_sigsuspend)
+#define __NR_pread64 180
+__SYSCALL(__NR_pread64, sys_pread64)
+#define __NR_pwrite64 181
+__SYSCALL(__NR_pwrite64, sys_pwrite64)
+#define __NR_chown 182
+__SYSCALL(__NR_chown, sys_chown16)
+#define __NR_getcwd 183
+__SYSCALL(__NR_getcwd, sys_getcwd)
+#define __NR_capget 184
+__SYSCALL(__NR_capget, sys_capget)
+#define __NR_capset 185
+__SYSCALL(__NR_capset, sys_capset)
+#define __NR_sigaltstack 186
+__SYSCALL(__NR_sigaltstack, sys_sigaltstack)
+#define __NR_sendfile 187
+__SYSCALL(__NR_sendfile, sys_sendfile)
+__SYSCALL(188, sys_ni_syscall) /* 188 reserved */
+__SYSCALL(189, sys_ni_syscall) /* 189 reserved */
+#define __NR_vfork 190
+__SYSCALL(__NR_vfork, sys_vfork)
+#define __NR_ugetrlimit 191 /* SuS compliant getrlimit */
+__SYSCALL(__NR_ugetrlimit, sys_getrlimit)
+#define __NR_mmap2 192
+__SYSCALL(__NR_mmap2, sys_mmap2)
+#define __NR_truncate64 193
+__SYSCALL(__NR_truncate64, sys_truncate64)
+#define __NR_ftruncate64 194
+__SYSCALL(__NR_ftruncate64, sys_ftruncate64)
+#define __NR_stat64 195
+__SYSCALL(__NR_stat64, sys_stat64)
+#define __NR_lstat64 196
+__SYSCALL(__NR_lstat64, sys_lstat64)
+#define __NR_fstat64 197
+__SYSCALL(__NR_fstat64, sys_fstat64)
+#define __NR_lchown32 198
+__SYSCALL(__NR_lchown32, sys_lchown)
+#define __NR_getuid32 199
+__SYSCALL(__NR_getuid32, sys_getuid)
+#define __NR_getgid32 200
+__SYSCALL(__NR_getgid32, sys_getgid)
+#define __NR_geteuid32 201
+__SYSCALL(__NR_geteuid32, sys_geteuid)
+#define __NR_getegid32 202
+__SYSCALL(__NR_getegid32, sys_getegid)
+#define __NR_setreuid32 203
+__SYSCALL(__NR_setreuid32, sys_setreuid)
+#define __NR_setregid32 204
+__SYSCALL(__NR_setregid32, sys_setregid)
+#define __NR_getgroups32 205
+__SYSCALL(__NR_getgroups32, sys_getgroups)
+#define __NR_setgroups32 206
+__SYSCALL(__NR_setgroups32, sys_setgroups)
+#define __NR_fchown32 207
+__SYSCALL(__NR_fchown32, sys_fchown)
+#define __NR_setresuid32 208
+__SYSCALL(__NR_setresuid32, sys_setresuid)
+#define __NR_getresuid32 209
+__SYSCALL(__NR_getresuid32, sys_getresuid)
+#define __NR_setresgid32 210
+__SYSCALL(__NR_setresgid32, sys_setresgid)
+#define __NR_getresgid32 211
+__SYSCALL(__NR_getresgid32, sys_getresgid)
+#define __NR_chown32 212
+__SYSCALL(__NR_chown32, sys_chown)
+#define __NR_setuid32 213
+__SYSCALL(__NR_setuid32, sys_setuid)
+#define __NR_setgid32 214
+__SYSCALL(__NR_setgid32, sys_setgid)
+#define __NR_setfsuid32 215
+__SYSCALL(__NR_setfsuid32, sys_setfsuid)
+#define __NR_setfsgid32 216
+__SYSCALL(__NR_setfsgid32, sys_setfsgid)
+#define __NR_getdents64 217
+__SYSCALL(__NR_getdents64, sys_getdents64)
+#define __NR_pivot_root 218
+__SYSCALL(__NR_pivot_root, sys_pivot_root)
+#define __NR_mincore 219
+__SYSCALL(__NR_mincore, sys_mincore)
+#define __NR_madvise 220
+__SYSCALL(__NR_madvise, sys_madvise)
+#define __NR_fcntl64 221
+__SYSCALL(__NR_fcntl64, sys_fcntl64)
+__SYSCALL(222, sys_ni_syscall) /* 222 for tux */
+__SYSCALL(223, sys_ni_syscall) /* 223 is unused */
+#define __NR_gettid 224
+__SYSCALL(__NR_gettid, sys_gettid)
+#define __NR_readahead 225
+__SYSCALL(__NR_readahead, sys_readahead)
+#define __NR_setxattr 226
+__SYSCALL(__NR_setxattr, sys_setxattr)
+#define __NR_lsetxattr 227
+__SYSCALL(__NR_lsetxattr, sys_lsetxattr)
+#define __NR_fsetxattr 228
+__SYSCALL(__NR_fsetxattr, sys_fsetxattr)
+#define __NR_getxattr 229
+__SYSCALL(__NR_getxattr, sys_getxattr)
+#define __NR_lgetxattr 230
+__SYSCALL(__NR_lgetxattr, sys_lgetxattr)
+#define __NR_fgetxattr 231
+__SYSCALL(__NR_fgetxattr, sys_fgetxattr)
+#define __NR_listxattr 232
+__SYSCALL(__NR_listxattr, sys_listxattr)
+#define __NR_llistxattr 233
+__SYSCALL(__NR_llistxattr, sys_llistxattr)
+#define __NR_flistxattr 234
+__SYSCALL(__NR_flistxattr, sys_flistxattr)
+#define __NR_removexattr 235
+__SYSCALL(__NR_removexattr, sys_removexattr)
+#define __NR_lremovexattr 236
+__SYSCALL(__NR_lremovexattr, sys_lremovexattr)
+#define __NR_fremovexattr 237
+__SYSCALL(__NR_fremovexattr, sys_fremovexattr)
+#define __NR_tkill 238
+__SYSCALL(__NR_tkill, sys_tkill)
+#define __NR_sendfile64 239
+__SYSCALL(__NR_sendfile64, sys_sendfile64)
+#define __NR_futex 240
+__SYSCALL(__NR_futex, sys_futex)
+#define __NR_sched_setaffinity 241
+__SYSCALL(__NR_sched_setaffinity, sys_sched_setaffinity)
+#define __NR_sched_getaffinity 242
+__SYSCALL(__NR_sched_getaffinity, sys_sched_getaffinity)
+#define __NR_io_setup 243
+__SYSCALL(__NR_io_setup, sys_io_setup)
+#define __NR_io_destroy 244
+__SYSCALL(__NR_io_destroy, sys_io_destroy)
+#define __NR_io_getevents 245
+__SYSCALL(__NR_io_getevents, sys_io_getevents)
+#define __NR_io_submit 246
+__SYSCALL(__NR_io_submit, sys_io_submit)
+#define __NR_io_cancel 247
+__SYSCALL(__NR_io_cancel, sys_io_cancel)
+#define __NR_exit_group 248
+__SYSCALL(__NR_exit_group, sys_exit_group)
+#define __NR_lookup_dcookie 249
+__SYSCALL(__NR_lookup_dcookie, sys_lookup_dcookie)
+#define __NR_epoll_create 250
+__SYSCALL(__NR_epoll_create, sys_epoll_create)
+#define __NR_epoll_ctl 251
+__SYSCALL(__NR_epoll_ctl, sys_epoll_ctl)
+#define __NR_epoll_wait 252
+__SYSCALL(__NR_epoll_wait, sys_epoll_wait)
+#define __NR_remap_file_pages 253
+__SYSCALL(__NR_remap_file_pages, sys_remap_file_pages)
+__SYSCALL(254, sys_ni_syscall) /* 254 for set_thread_area */
+__SYSCALL(255, sys_ni_syscall) /* 255 for get_thread_area */
+#define __NR_set_tid_address 256
+__SYSCALL(__NR_set_tid_address, sys_set_tid_address)
+#define __NR_timer_create 257
+__SYSCALL(__NR_timer_create, sys_timer_create)
+#define __NR_timer_settime 258
+__SYSCALL(__NR_timer_settime, sys_timer_settime)
+#define __NR_timer_gettime 259
+__SYSCALL(__NR_timer_gettime, sys_timer_gettime)
+#define __NR_timer_getoverrun 260
+__SYSCALL(__NR_timer_getoverrun, sys_timer_getoverrun)
+#define __NR_timer_delete 261
+__SYSCALL(__NR_timer_delete, sys_timer_delete)
+#define __NR_clock_settime 262
+__SYSCALL(__NR_clock_settime, sys_clock_settime)
+#define __NR_clock_gettime 263
+__SYSCALL(__NR_clock_gettime, sys_clock_gettime)
+#define __NR_clock_getres 264
+__SYSCALL(__NR_clock_getres, sys_clock_getres)
+#define __NR_clock_nanosleep 265
+__SYSCALL(__NR_clock_nanosleep, sys_clock_nanosleep)
+#define __NR_statfs64 266
+__SYSCALL(__NR_statfs64, sys_statfs64)
+#define __NR_fstatfs64 267
+__SYSCALL(__NR_fstatfs64, sys_fstatfs64)
+#define __NR_tgkill 268
+__SYSCALL(__NR_tgkill, sys_tgkill)
+#define __NR_utimes 269
+__SYSCALL(__NR_utimes, sys_utimes)
+#define __NR_fadvise64 270
+__SYSCALL(__NR_fadvise64, sys_fadvise64_64)
+#define __NR_pciconfig_iobase 271
+__SYSCALL(__NR_pciconfig_iobase, sys_pciconfig_iobase)
+#define __NR_pciconfig_read 272
+__SYSCALL(__NR_pciconfig_read, sys_pciconfig_read)
+#define __NR_pciconfig_write 273
+__SYSCALL(__NR_pciconfig_write, sys_pciconfig_write)
+#define __NR_mq_open 274
+__SYSCALL(__NR_mq_open, sys_mq_open)
+#define __NR_mq_unlink 275
+__SYSCALL(__NR_mq_unlink, sys_mq_unlink)
+#define __NR_mq_timedsend 276
+__SYSCALL(__NR_mq_timedsend, sys_mq_timedsend)
+#define __NR_mq_timedreceive 277
+__SYSCALL(__NR_mq_timedreceive, sys_mq_timedreceive)
+#define __NR_mq_notify 278
+__SYSCALL(__NR_mq_notify, sys_mq_notify)
+#define __NR_mq_getsetattr 279
+__SYSCALL(__NR_mq_getsetattr, sys_mq_getsetattr)
+#define __NR_waitid 280
+__SYSCALL(__NR_waitid, sys_waitid)
+#define __NR_socket 281
+__SYSCALL(__NR_socket, sys_socket)
+#define __NR_bind 282
+__SYSCALL(__NR_bind, sys_bind)
+#define __NR_connect 283
+__SYSCALL(__NR_connect, sys_connect)
+#define __NR_listen 284
+__SYSCALL(__NR_listen, sys_listen)
+#define __NR_accept 285
+__SYSCALL(__NR_accept, sys_accept)
+#define __NR_getsockname 286
+__SYSCALL(__NR_getsockname, sys_getsockname)
+#define __NR_getpeername 287
+__SYSCALL(__NR_getpeername, sys_getpeername)
+#define __NR_socketpair 288
+__SYSCALL(__NR_socketpair, sys_socketpair)
+#define __NR_send 289
+__SYSCALL(__NR_send, sys_send)
+#define __NR_sendto 290
+__SYSCALL(__NR_sendto, sys_sendto)
+#define __NR_recv 291
+__SYSCALL(__NR_recv, sys_recv)
+#define __NR_recvfrom 292
+__SYSCALL(__NR_recvfrom, sys_recvfrom)
+#define __NR_shutdown 293
+__SYSCALL(__NR_shutdown, sys_shutdown)
+#define __NR_setsockopt 294
+__SYSCALL(__NR_setsockopt, sys_setsockopt)
+#define __NR_getsockopt 295
+__SYSCALL(__NR_getsockopt, sys_getsockopt)
+#define __NR_sendmsg 296
+__SYSCALL(__NR_sendmsg, sys_sendmsg)
+#define __NR_recvmsg 297
+__SYSCALL(__NR_recvmsg, sys_recvmsg)
+#define __NR_semop 298
+__SYSCALL(__NR_semop, sys_semop)
+#define __NR_semget 299
+__SYSCALL(__NR_semget, sys_semget)
+#define __NR_semctl 300
+__SYSCALL(__NR_semctl, sys_semctl)
+#define __NR_msgsnd 301
+__SYSCALL(__NR_msgsnd, sys_msgsnd)
+#define __NR_msgrcv 302
+__SYSCALL(__NR_msgrcv, sys_msgrcv)
+#define __NR_msgget 303
+__SYSCALL(__NR_msgget, sys_msgget)
+#define __NR_msgctl 304
+__SYSCALL(__NR_msgctl, sys_msgctl)
+#define __NR_shmat 305
+__SYSCALL(__NR_shmat, sys_shmat)
+#define __NR_shmdt 306
+__SYSCALL(__NR_shmdt, sys_shmdt)
+#define __NR_shmget 307
+__SYSCALL(__NR_shmget, sys_shmget)
+#define __NR_shmctl 308
+__SYSCALL(__NR_shmctl, sys_shmctl)
+#define __NR_add_key 309
+__SYSCALL(__NR_add_key, sys_add_key)
+#define __NR_request_key 310
+__SYSCALL(__NR_request_key, sys_request_key)
+#define __NR_keyctl 311
+__SYSCALL(__NR_keyctl, sys_keyctl)
+#define __NR_semtimedop 312
+__SYSCALL(__NR_semtimedop, sys_semtimedop)
+#define __NR_vserver 313
+__SYSCALL(__NR_vserver, sys_ni_syscall)
+#define __NR_ioprio_set 314
+__SYSCALL(__NR_ioprio_set, sys_ioprio_set)
+#define __NR_ioprio_get 315
+__SYSCALL(__NR_ioprio_get, sys_ioprio_get)
+#define __NR_inotify_init 316
+__SYSCALL(__NR_inotify_init, sys_inotify_init)
+#define __NR_inotify_add_watch 317
+__SYSCALL(__NR_inotify_add_watch, sys_inotify_add_watch)
+#define __NR_inotify_rm_watch 318
+__SYSCALL(__NR_inotify_rm_watch, sys_inotify_rm_watch)
+#define __NR_mbind 319
+__SYSCALL(__NR_mbind, sys_mbind)
+#define __NR_get_mempolicy 320
+__SYSCALL(__NR_get_mempolicy, sys_get_mempolicy)
+#define __NR_set_mempolicy 321
+__SYSCALL(__NR_set_mempolicy, sys_set_mempolicy)
+#define __NR_openat 322
+__SYSCALL(__NR_openat, sys_openat)
+#define __NR_mkdirat 323
+__SYSCALL(__NR_mkdirat, sys_mkdirat)
+#define __NR_mknodat 324
+__SYSCALL(__NR_mknodat, sys_mknodat)
+#define __NR_fchownat 325
+__SYSCALL(__NR_fchownat, sys_fchownat)
+#define __NR_futimesat 326
+__SYSCALL(__NR_futimesat, sys_futimesat)
+#define __NR_fstatat64 327
+__SYSCALL(__NR_fstatat64, sys_fstatat64)
+#define __NR_unlinkat 328
+__SYSCALL(__NR_unlinkat, sys_unlinkat)
+#define __NR_renameat 329
+__SYSCALL(__NR_renameat, sys_renameat)
+#define __NR_linkat 330
+__SYSCALL(__NR_linkat, sys_linkat)
+#define __NR_symlinkat 331
+__SYSCALL(__NR_symlinkat, sys_symlinkat)
+#define __NR_readlinkat 332
+__SYSCALL(__NR_readlinkat, sys_readlinkat)
+#define __NR_fchmodat 333
+__SYSCALL(__NR_fchmodat, sys_fchmodat)
+#define __NR_faccessat 334
+__SYSCALL(__NR_faccessat, sys_faccessat)
+#define __NR_pselect6 335
+__SYSCALL(__NR_pselect6, sys_pselect6)
+#define __NR_ppoll 336
+__SYSCALL(__NR_ppoll, sys_ppoll)
+#define __NR_unshare 337
+__SYSCALL(__NR_unshare, sys_unshare)
+#define __NR_set_robust_list 338
+__SYSCALL(__NR_set_robust_list, sys_set_robust_list)
+#define __NR_get_robust_list 339
+__SYSCALL(__NR_get_robust_list, sys_get_robust_list)
+#define __NR_splice 340
+__SYSCALL(__NR_splice, sys_splice)
+#define __NR_sync_file_range2 341
+__SYSCALL(__NR_sync_file_range2, sys_sync_file_range2)
+#define __NR_tee 342
+__SYSCALL(__NR_tee, sys_tee)
+#define __NR_vmsplice 343
+__SYSCALL(__NR_vmsplice, sys_vmsplice)
+#define __NR_move_pages 344
+__SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_getcpu 345
+__SYSCALL(__NR_getcpu, sys_getcpu)
+#define __NR_epoll_pwait 346
+__SYSCALL(__NR_epoll_pwait, sys_epoll_pwait)
+#define __NR_kexec_load 347
+__SYSCALL(__NR_kexec_load, sys_kexec_load)
+#define __NR_utimensat 348
+__SYSCALL(__NR_utimensat, sys_utimensat)
+#define __NR_signalfd 349
+__SYSCALL(__NR_signalfd, sys_signalfd)
+#define __NR_timerfd_create 350
+__SYSCALL(__NR_timerfd_create, sys_timerfd_create)
+#define __NR_eventfd 351
+__SYSCALL(__NR_eventfd, sys_eventfd)
+#define __NR_fallocate 352
+__SYSCALL(__NR_fallocate, sys_fallocate)
+#define __NR_timerfd_settime 353
+__SYSCALL(__NR_timerfd_settime, sys_timerfd_settime)
+#define __NR_timerfd_gettime 354
+__SYSCALL(__NR_timerfd_gettime, sys_timerfd_gettime)
+#define __NR_signalfd4 355
+__SYSCALL(__NR_signalfd4, sys_signalfd4)
+#define __NR_eventfd2 356
+__SYSCALL(__NR_eventfd2, sys_eventfd2)
+#define __NR_epoll_create1 357
+__SYSCALL(__NR_epoll_create1, sys_epoll_create1)
+#define __NR_dup3 358
+__SYSCALL(__NR_dup3, sys_dup3)
+#define __NR_pipe2 359
+__SYSCALL(__NR_pipe2, sys_pipe2)
+#define __NR_inotify_init1 360
+__SYSCALL(__NR_inotify_init1, sys_inotify_init1)
+#define __NR_preadv 361
+__SYSCALL(__NR_preadv, sys_preadv)
+#define __NR_pwritev 362
+__SYSCALL(__NR_pwritev, sys_pwritev)
+#define __NR_rt_tgsigqueueinfo 363
+__SYSCALL(__NR_rt_tgsigqueueinfo, sys_rt_tgsigqueueinfo)
+#define __NR_perf_event_open 364
+__SYSCALL(__NR_perf_event_open, sys_perf_event_open)
+#define __NR_recvmmsg 365
+__SYSCALL(__NR_recvmmsg, sys_recvmmsg)
+#define __NR_accept4 366
+__SYSCALL(__NR_accept4, sys_accept4)
+#define __NR_fanotify_init 367
+__SYSCALL(__NR_fanotify_init, sys_fanotify_init)
+#define __NR_fanotify_mark 368
+__SYSCALL(__NR_fanotify_mark, sys_fanotify_mark)
+#define __NR_prlimit64 369
+__SYSCALL(__NR_prlimit64, sys_prlimit64)
+#define __NR_name_to_handle_at 370
+__SYSCALL(__NR_name_to_handle_at, sys_name_to_handle_at)
+#define __NR_open_by_handle_at 371
+__SYSCALL(__NR_open_by_handle_at, sys_open_by_handle_at)
+#define __NR_clock_adjtime 372
+__SYSCALL(__NR_clock_adjtime, sys_clock_adjtime)
+#define __NR_syncfs 373
+__SYSCALL(__NR_syncfs, sys_syncfs)
+
+/*
+ * The following SVCs are ARM private.
+ */
+#define __ARM_NR_COMPAT_BASE 0x0f0000
+#define __ARM_NR_compat_cacheflush (__ARM_NR_COMPAT_BASE+2)
+#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE+5)
+
+#endif /* __SYSCALL_COMPAT */
+
+#define __NR_compat_syscalls 374
+
+#define __ARCH_WANT_COMPAT_IPC_PARSE_VERSION
+#define __ARCH_WANT_COMPAT_STAT64
+#define __ARCH_WANT_SYS_GETHOSTNAME
+#define __ARCH_WANT_SYS_PAUSE
+#define __ARCH_WANT_SYS_GETPGRP
+#define __ARCH_WANT_SYS_LLSEEK
+#define __ARCH_WANT_SYS_NICE
+#define __ARCH_WANT_SYS_SIGPENDING
+#define __ARCH_WANT_SYS_SIGPROCMASK
+#define __ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND
+
+#endif /* __ASM_UNISTD32_H */
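
The __NR_*/__SYSCALL pairs above only assign numbers; the compat syscall
table itself is produced by re-including this header from sys32.S (later in
this series) with __SYSCALL redefined to emit one ".quad" per entry. As a
rough user-space C sketch of the same macro trick (illustrative only: the
stub handlers and table size below are made up, not part of the patch):

  #include <stdio.h>

  typedef long (*syscall_fn_t)(void);

  /* Hypothetical stand-ins for the real kernel entry points. */
  static long sys_open_stub(void)  { return 0; }
  static long sys_close_stub(void) { return 1; }

  /* Same pattern as unistd32.h: a syscall number plus a table-slot macro. */
  #define __NR_open  5
  #define __NR_close 6
  #define __SYSCALL(nr, entry) [nr] = entry,

  static syscall_fn_t compat_table[64] = {
          __SYSCALL(__NR_open,  sys_open_stub)
          __SYSCALL(__NR_close, sys_close_stub)
  };

  int main(void)
  {
          /* Slot 5 dispatches to the open stub, slot 6 to the close stub. */
          printf("%ld %ld\n", compat_table[__NR_open](),
                              compat_table[__NR_close]());
          return 0;
  }
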
diff --git a/arch/arm64/kernel/kuser32.S b/arch/arm64/kernel/kuser32.S
new file mode 100644
index 0000000..c3bab17
--- /dev/null
+++ b/arch/arm64/kernel/kuser32.S
@@ -0,0 +1,77 @@
+/*
+ * Low-level user helpers placed in the vectors page for AArch32.
+ *
+ * Copyright (C) 1996-2000 Russell King.
+ * Copyright (C) 2012 ARM Ltd
+ * Author: Will Deacon <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ *
+ * AArch32 user helpers.
+ *
+ * Each segment is 32-byte aligned and will be moved to the top of the high
+ * vector page. New segments (if ever needed) must be added in front of
+ * existing ones. This mechanism should be used only for things that are
+ * really small and justified, and not be abused freely.
+ *
+ * See Documentation/arm/kernel_user_helpers.txt for formal definitions.
+ */
+ .align 5
+ .globl __kuser_helper_start
+__kuser_helper_start:
+
+__kuser_cmpxchg64: // 0xffff0f60
+ .inst 0xe92d00f0 // push {r4, r5, r6, r7}
+ .inst 0xe1c040d0 // ldrd r4, r5, [r0]
+ .inst 0xe1c160d0 // ldrd r6, r7, [r1]
+ .inst 0xf57ff05f // dmb sy
+ .inst 0xe1b20f9f // 1: ldrexd r0, r1, [r2]
+ .inst 0xe0303004 // eors r3, r0, r4
+ .inst 0x00313005 // eoreqs r3, r1, r5
+ .inst 0x01a23f96 // strexdeq r3, r6, [r2]
+ .inst 0x03330001 // teqeq r3, #1
+ .inst 0x0afffff9 // beq 1b
+ .inst 0xf57ff05f // dmb sy
+ .inst 0xe2730000 // rsbs r0, r3, #0
+ .inst 0xe8bd00f0 // pop {r4, r5, r6, r7}
+ .inst 0xe12fff1e // bx lr
+
+ .align 5
+__kuser_memory_barrier: // 0xffff0fa0
+ .inst 0xf57ff05f // dmb sy
+ .inst 0xe12fff1e // bx lr
+
+ .align 5
+__kuser_cmpxchg: // 0xffff0fc0
+ .inst 0xf57ff05f // dmb sy
+ .inst 0xe1923f9f // 1: ldrex r3, [r2]
+ .inst 0xe0533000 // subs r3, r3, r0
+ .inst 0x01823f91 // strexeq r3, r1, [r2]
+ .inst 0x03330001 // teqeq r3, #1
+ .inst 0x0afffffa // beq 1b
+ .inst 0xe2730000 // rsbs r0, r3, #0
+ .inst 0xeaffffef // b <__kuser_memory_barrier>
+
+ .align 5
+__kuser_get_tls: // 0xffff0fe0
+ .inst 0xee1d0f70 // mrc p15, 0, r0, c13, c0, 3
+ .inst 0xe12fff1e // bx lr
+ .rep 5
+ .word 0
+ .endr
+
+__kuser_helper_version: // 0xffff0ffc
+ .word ((__kuser_helper_end - __kuser_helper_start) >> 5)
+ .globl __kuser_helper_end
+__kuser_helper_end:
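
For reference, 32-bit user code reaches these helpers at the fixed addresses
noted in the comments above (see Documentation/arm/kernel_user_helpers.txt).
A hedged AArch32 user-space sketch of the cmpxchg helper; the function name
and counter below are made up for illustration:

  /*
   * Illustration only: call the kuser cmpxchg helper at its documented
   * address. It returns 0 when *ptr held 'oldval' and has been atomically
   * replaced with 'newval', non-zero otherwise.
   */
  typedef int (*kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
  #define __kuser_cmpxchg ((kuser_cmpxchg_t)0xffff0fc0)

  int atomic_increment(volatile int *counter)
  {
          int old, new;

          do {
                  old = *counter;
                  new = old + 1;
          } while (__kuser_cmpxchg(old, new, counter) != 0);

          return new;
  }
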
diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
new file mode 100644
index 0000000..4bb754c
--- /dev/null
+++ b/arch/arm64/kernel/signal32.c
@@ -0,0 +1,876 @@
+/*
+ * Based on arch/arm/kernel/signal.c
+ *
+ * Copyright (C) 1995-2009 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ * Modified by Will Deacon <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define __SYSCALL_COMPAT
+
+#include <linux/compat.h>
+#include <linux/signal.h>
+#include <linux/syscalls.h>
+#include <linux/ratelimit.h>
+
+#include <asm/fpsimd.h>
+#include <asm/signal32.h>
+#include <asm/uaccess.h>
+#include <asm/unistd.h>
+
+typedef struct compat_siginfo {
+ int si_signo;
+ int si_errno;
+ int si_code;
+
+ union {
+ /* The padding is the same size as AArch64. */
+ int _pad[SI_PAD_SIZE];
+
+ /* kill() */
+ struct {
+ compat_pid_t _pid; /* sender's pid */
+ __compat_uid32_t _uid; /* sender's uid */
+ } _kill;
+
+ /* POSIX.1b timers */
+ struct {
+ compat_timer_t _tid; /* timer id */
+ int _overrun; /* overrun count */
+ compat_sigval_t _sigval; /* same as below */
+ int _sys_private; /* not to be passed to user */
+ } _timer;
+
+ /* POSIX.1b signals */
+ struct {
+ compat_pid_t _pid; /* sender's pid */
+ __compat_uid32_t _uid; /* sender's uid */
+ compat_sigval_t _sigval;
+ } _rt;
+
+ /* SIGCHLD */
+ struct {
+ compat_pid_t _pid; /* which child */
+ __compat_uid32_t _uid; /* sender's uid */
+ int _status; /* exit code */
+ compat_clock_t _utime;
+ compat_clock_t _stime;
+ } _sigchld;
+
+ /* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
+ struct {
+ compat_uptr_t _addr; /* faulting insn/memory ref. */
+ short _addr_lsb; /* LSB of the reported address */
+ } _sigfault;
+
+ /* SIGPOLL */
+ struct {
+ compat_long_t _band; /* POLL_IN, POLL_OUT, POLL_MSG */
+ int _fd;
+ } _sigpoll;
+ } _sifields;
+} compat_siginfo_t;
+
+struct compat_sigaction {
+ compat_uptr_t sa_handler;
+ compat_ulong_t sa_flags;
+ compat_uptr_t sa_restorer;
+ compat_sigset_t sa_mask;
+};
+
+struct compat_old_sigaction {
+ compat_uptr_t sa_handler;
+ compat_old_sigset_t sa_mask;
+ compat_ulong_t sa_flags;
+ compat_uptr_t sa_restorer;
+};
+
+typedef struct compat_sigaltstack {
+ compat_uptr_t ss_sp;
+ int ss_flags;
+ compat_size_t ss_size;
+} compat_stack_t;
+
+struct compat_sigcontext {
+ /* We always set these two fields to 0 */
+ compat_ulong_t trap_no;
+ compat_ulong_t error_code;
+
+ compat_ulong_t oldmask;
+ compat_ulong_t arm_r0;
+ compat_ulong_t arm_r1;
+ compat_ulong_t arm_r2;
+ compat_ulong_t arm_r3;
+ compat_ulong_t arm_r4;
+ compat_ulong_t arm_r5;
+ compat_ulong_t arm_r6;
+ compat_ulong_t arm_r7;
+ compat_ulong_t arm_r8;
+ compat_ulong_t arm_r9;
+ compat_ulong_t arm_r10;
+ compat_ulong_t arm_fp;
+ compat_ulong_t arm_ip;
+ compat_ulong_t arm_sp;
+ compat_ulong_t arm_lr;
+ compat_ulong_t arm_pc;
+ compat_ulong_t arm_cpsr;
+ compat_ulong_t fault_address;
+};
+
+struct compat_ucontext {
+ compat_ulong_t uc_flags;
+ struct compat_ucontext *uc_link;
+ compat_stack_t uc_stack;
+ struct compat_sigcontext uc_mcontext;
+ compat_sigset_t uc_sigmask;
+ int __unused[32 - (sizeof (compat_sigset_t) / sizeof (int))];
+ compat_ulong_t uc_regspace[128] __attribute__((__aligned__(8)));
+};
+
+struct compat_vfp_sigframe {
+ compat_ulong_t magic;
+ compat_ulong_t size;
+ struct compat_user_vfp {
+ compat_u64 fpregs[32];
+ compat_ulong_t fpscr;
+ } ufp;
+ struct compat_user_vfp_exc {
+ compat_ulong_t fpexc;
+ compat_ulong_t fpinst;
+ compat_ulong_t fpinst2;
+ } ufp_exc;
+} __attribute__((__aligned__(8)));
+
+#define VFP_MAGIC 0x56465001
+#define VFP_STORAGE_SIZE sizeof(struct compat_vfp_sigframe)
+
+struct compat_aux_sigframe {
+ struct compat_vfp_sigframe vfp;
+
+ /* Something that isn't a valid magic number for any coprocessor. */
+ unsigned long end_magic;
+} __attribute__((__aligned__(8)));
+
+struct compat_sigframe {
+ struct compat_ucontext uc;
+ compat_ulong_t retcode[2];
+};
+
+struct compat_rt_sigframe {
+ struct compat_siginfo info;
+ struct compat_sigframe sig;
+};
+
+#define _BLOCKABLE (~(sigmask(SIGKILL) | sigmask(SIGSTOP)))
+
+/*
+ * For ARM syscalls, the syscall number has to be loaded into r7.
+ * We do not support an OABI userspace.
+ */
+#define MOV_R7_NR_SIGRETURN (0xe3a07000 | __NR_sigreturn)
+#define SVC_SYS_SIGRETURN (0xef000000 | __NR_sigreturn)
+#define MOV_R7_NR_RT_SIGRETURN (0xe3a07000 | __NR_rt_sigreturn)
+#define SVC_SYS_RT_SIGRETURN (0xef000000 | __NR_rt_sigreturn)
+
+/*
+ * For Thumb syscalls, we also pass the syscall number via r7. We therefore
+ * need two 16-bit instructions.
+ */
+#define SVC_THUMB_SIGRETURN (((0xdf00 | __NR_sigreturn) << 16) | \
+ 0x2700 | __NR_sigreturn)
+#define SVC_THUMB_RT_SIGRETURN (((0xdf00 | __NR_rt_sigreturn) << 16) | \
+ 0x2700 | __NR_rt_sigreturn)
+
+const compat_ulong_t aarch32_sigret_code[6] = {
+ /*
+ * AArch32 sigreturn code.
+ * We don't construct an OABI SWI - instead we just set the imm24 field
+ * to the EABI syscall number so that we create a sane disassembly.
+ */
+ MOV_R7_NR_SIGRETURN, SVC_SYS_SIGRETURN, SVC_THUMB_SIGRETURN,
+ MOV_R7_NR_RT_SIGRETURN, SVC_SYS_RT_SIGRETURN, SVC_THUMB_RT_SIGRETURN,
+};
+
+static inline int put_sigset_t(compat_sigset_t __user *uset, sigset_t *set)
+{
+ compat_sigset_t cset;
+
+ cset.sig[0] = set->sig[0] & 0xffffffffull;
+ cset.sig[1] = set->sig[0] >> 32;
+
+ return copy_to_user(uset, &cset, sizeof(*uset));
+}
+
+static inline int get_sigset_t(sigset_t *set,
+ const compat_sigset_t __user *uset)
+{
+ compat_sigset_t s32;
+
+ if (copy_from_user(&s32, uset, sizeof(*uset)))
+ return -EFAULT;
+
+ set->sig[0] = s32.sig[0] | (((long)s32.sig[1]) << 32);
+ return 0;
+}
+
+int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
+{
+ int err;
+
+ if (!access_ok(VERIFY_WRITE, to, sizeof(*to)))
+ return -EFAULT;
+
+ /* If you change siginfo_t structure, please be sure
+ * this code is fixed accordingly.
+ * It should never copy any pad contained in the structure
+ * to avoid security leaks, but must copy the generic
+ * 3 ints plus the relevant union member.
+ * This routine must convert siginfo from 64bit to 32bit as well
+ * at the same time.
+ */
+ err = __put_user(from->si_signo, &to->si_signo);
+ err |= __put_user(from->si_errno, &to->si_errno);
+ err |= __put_user((short)from->si_code, &to->si_code);
+ if (from->si_code < 0)
+ err |= __copy_to_user(&to->_sifields._pad, &from->_sifields._pad,
+ SI_PAD_SIZE);
+ else switch (from->si_code & __SI_MASK) {
+ case __SI_KILL:
+ err |= __put_user(from->si_pid, &to->si_pid);
+ err |= __put_user(from->si_uid, &to->si_uid);
+ break;
+ case __SI_TIMER:
+ err |= __put_user(from->si_tid, &to->si_tid);
+ err |= __put_user(from->si_overrun, &to->si_overrun);
+ err |= __put_user((compat_uptr_t)(unsigned long)from->si_ptr,
+ &to->si_ptr);
+ break;
+ case __SI_POLL:
+ err |= __put_user(from->si_band, &to->si_band);
+ err |= __put_user(from->si_fd, &to->si_fd);
+ break;
+ case __SI_FAULT:
+ err |= __put_user((compat_uptr_t)(unsigned long)from->si_addr,
+ &to->si_addr);
+#ifdef BUS_MCEERR_AO
+ /*
+ * Other callers might not initialize the si_lsb field,
+	 * so check explicitly for the right codes here.
+ */
+ if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
+ err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
+#endif
+ break;
+ case __SI_CHLD:
+ err |= __put_user(from->si_pid, &to->si_pid);
+ err |= __put_user(from->si_uid, &to->si_uid);
+ err |= __put_user(from->si_status, &to->si_status);
+ err |= __put_user(from->si_utime, &to->si_utime);
+ err |= __put_user(from->si_stime, &to->si_stime);
+ break;
+ case __SI_RT: /* This is not generated by the kernel as of now. */
+ case __SI_MESGQ: /* But this is */
+ err |= __put_user(from->si_pid, &to->si_pid);
+ err |= __put_user(from->si_uid, &to->si_uid);
+ err |= __put_user((compat_uptr_t)(unsigned long)from->si_ptr, &to->si_ptr);
+ break;
+ default: /* this is just in case for now ... */
+ err |= __put_user(from->si_pid, &to->si_pid);
+ err |= __put_user(from->si_uid, &to->si_uid);
+ break;
+ }
+ return err;
+}
+
+int copy_siginfo_from_user32(siginfo_t *to, compat_siginfo_t __user *from)
+{
+ memset(to, 0, sizeof *to);
+
+ if (copy_from_user(to, from, __ARCH_SI_PREAMBLE_SIZE) ||
+ copy_from_user(to->_sifields._pad,
+ from->_sifields._pad, SI_PAD_SIZE))
+ return -EFAULT;
+
+ return 0;
+}
+
+/*
+ * VFP save/restore code.
+ */
+static int compat_preserve_vfp_context(struct compat_vfp_sigframe __user *frame)
+{
+ struct fpsimd_state *fpsimd = &current->thread.fpsimd_state;
+ compat_ulong_t magic = VFP_MAGIC;
+ compat_ulong_t size = VFP_STORAGE_SIZE;
+ compat_ulong_t fpscr, fpexc;
+ int err = 0;
+
+ /*
+ * Save the hardware registers to the fpsimd_state structure.
+ * Note that this also saves V16-31, which aren't visible
+ * in AArch32.
+ */
+ fpsimd_save_state(fpsimd);
+
+ /* Place structure header on the stack */
+ __put_user_error(magic, &frame->magic, err);
+ __put_user_error(size, &frame->size, err);
+
+ /*
+ * Now copy the FP registers. Since the registers are packed,
+ * we can copy the prefix we want (V0-V15) as it is.
+ * FIXME: Won't work if big endian.
+ */
+ err |= __copy_to_user(&frame->ufp.fpregs, fpsimd->vregs,
+ sizeof(frame->ufp.fpregs));
+
+ /* Create an AArch32 fpscr from the fpsr and the fpcr. */
+ fpscr = (fpsimd->fpsr & VFP_FPSCR_STAT_MASK) |
+ (fpsimd->fpcr & VFP_FPSCR_CTRL_MASK);
+ __put_user_error(fpscr, &frame->ufp.fpscr, err);
+
+ /*
+	 * The exception registers aren't available, so we fake up a
+	 * basic FPEXC and zero everything else.
+ */
+ fpexc = (1 << 30);
+ __put_user_error(fpexc, &frame->ufp_exc.fpexc, err);
+ __put_user_error(0, &frame->ufp_exc.fpinst, err);
+ __put_user_error(0, &frame->ufp_exc.fpinst2, err);
+
+ return err ? -EFAULT : 0;
+}
+
+static int compat_restore_vfp_context(struct compat_vfp_sigframe __user *frame)
+{
+ struct fpsimd_state fpsimd;
+ compat_ulong_t magic = VFP_MAGIC;
+ compat_ulong_t size = VFP_STORAGE_SIZE;
+ compat_ulong_t fpscr;
+ int err = 0;
+
+ __get_user_error(magic, &frame->magic, err);
+ __get_user_error(size, &frame->size, err);
+
+ if (err)
+ return -EFAULT;
+ if (magic != VFP_MAGIC || size != VFP_STORAGE_SIZE)
+ return -EINVAL;
+
+ /*
+ * Copy the FP registers into the start of the fpsimd_state.
+ * FIXME: Won't work if big endian.
+ */
+ err |= __copy_from_user(fpsimd.vregs, frame->ufp.fpregs,
+ sizeof(frame->ufp.fpregs));
+
+ /* Extract the fpsr and the fpcr from the fpscr */
+ __get_user_error(fpscr, &frame->ufp.fpscr, err);
+ fpsimd.fpsr = fpscr & VFP_FPSCR_STAT_MASK;
+ fpsimd.fpcr = fpscr & VFP_FPSCR_CTRL_MASK;
+
+ /*
+	 * We don't need to touch the exception registers, so
+ * reload the hardware state.
+ */
+ if (!err) {
+ preempt_disable();
+ fpsimd_load_state(&fpsimd);
+ preempt_enable();
+ }
+
+ return err ? -EFAULT : 0;
+}
+
+/*
+ * atomically swap in the new signal mask, and wait for a signal.
+ */
+asmlinkage int compat_sys_sigsuspend(int restart, compat_ulong_t oldmask,
+ compat_old_sigset_t mask)
+{
+ sigset_t blocked;
+
+	siginitset(&blocked, mask);
+ return sigsuspend(&blocked);
+}
+
+asmlinkage int compat_sys_sigaction(int sig,
+ const struct compat_old_sigaction __user *act,
+ struct compat_old_sigaction __user *oact)
+{
+ struct k_sigaction new_ka, old_ka;
+ int ret;
+ compat_old_sigset_t mask;
+ compat_uptr_t handler, restorer;
+
+ if (act) {
+ if (!access_ok(VERIFY_READ, act, sizeof(*act)) ||
+ __get_user(handler, &act->sa_handler) ||
+ __get_user(restorer, &act->sa_restorer) ||
+ __get_user(new_ka.sa.sa_flags, &act->sa_flags) ||
+ __get_user(mask, &act->sa_mask))
+ return -EFAULT;
+
+ new_ka.sa.sa_handler = compat_ptr(handler);
+ new_ka.sa.sa_restorer = compat_ptr(restorer);
+ siginitset(&new_ka.sa.sa_mask, mask);
+ }
+
+ ret = do_sigaction(sig, act ? &new_ka : NULL, oact ? &old_ka : NULL);
+
+ if (!ret && oact) {
+ if (!access_ok(VERIFY_WRITE, oact, sizeof(*oact)) ||
+ __put_user(ptr_to_compat(old_ka.sa.sa_handler),
+ &oact->sa_handler) ||
+ __put_user(ptr_to_compat(old_ka.sa.sa_restorer),
+ &oact->sa_restorer) ||
+ __put_user(old_ka.sa.sa_flags, &oact->sa_flags) ||
+ __put_user(old_ka.sa.sa_mask.sig[0], &oact->sa_mask))
+ return -EFAULT;
+ }
+
+ return ret;
+}
+
+asmlinkage int compat_sys_rt_sigaction(int sig,
+ const struct compat_sigaction __user *act,
+ struct compat_sigaction __user *oact,
+ compat_size_t sigsetsize)
+{
+ struct k_sigaction new_ka, old_ka;
+ int ret;
+
+ /* XXX: Don't preclude handling different sized sigset_t's. */
+ if (sigsetsize != sizeof(compat_sigset_t))
+ return -EINVAL;
+
+ if (act) {
+ compat_uptr_t handler, restorer;
+
+ ret = get_user(handler, &act->sa_handler);
+ new_ka.sa.sa_handler = compat_ptr(handler);
+ ret |= get_user(restorer, &act->sa_restorer);
+ new_ka.sa.sa_restorer = compat_ptr(restorer);
+ ret |= get_sigset_t(&new_ka.sa.sa_mask, &act->sa_mask);
+ ret |= __get_user(new_ka.sa.sa_flags, &act->sa_flags);
+ if (ret)
+ return -EFAULT;
+ }
+
+ ret = do_sigaction(sig, act ? &new_ka : NULL, oact ? &old_ka : NULL);
+ if (!ret && oact) {
+ ret = put_user(ptr_to_compat(old_ka.sa.sa_handler), &oact->sa_handler);
+ ret |= put_sigset_t(&oact->sa_mask, &old_ka.sa.sa_mask);
+ ret |= __put_user(old_ka.sa.sa_flags, &oact->sa_flags);
+ }
+ return ret;
+}
+
+int compat_do_sigaltstack(compat_uptr_t compat_uss, compat_uptr_t compat_uoss,
+ compat_ulong_t sp)
+{
+ compat_stack_t __user *newstack = compat_ptr(compat_uss);
+ compat_stack_t __user *oldstack = compat_ptr(compat_uoss);
+ compat_uptr_t ss_sp;
+ int ret;
+ mm_segment_t old_fs;
+ stack_t uss, uoss;
+
+	/* Marshal the new compat stack into a stack_t */
+ if (newstack) {
+ if (get_user(ss_sp, &newstack->ss_sp) ||
+ __get_user(uss.ss_flags, &newstack->ss_flags) ||
+ __get_user(uss.ss_size, &newstack->ss_size))
+ return -EFAULT;
+ uss.ss_sp = compat_ptr(ss_sp);
+ }
+
+ old_fs = get_fs();
+ set_fs(KERNEL_DS);
+ /* The __user pointer casts are valid because of the set_fs() */
+ ret = do_sigaltstack(
+ newstack ? (stack_t __user *) &uss : NULL,
+ oldstack ? (stack_t __user *) &uoss : NULL,
+ (unsigned long)sp);
+ set_fs(old_fs);
+
+ /* Convert the old stack_t into a compat stack. */
+ if (!ret && oldstack &&
+ (put_user(ptr_to_compat(uoss.ss_sp), &oldstack->ss_sp) ||
+ __put_user(uoss.ss_flags, &oldstack->ss_flags) ||
+ __put_user(uoss.ss_size, &oldstack->ss_size)))
+ return -EFAULT;
+ return ret;
+}
+
+static int compat_restore_sigframe(struct pt_regs *regs,
+ struct compat_sigframe __user *sf)
+{
+ int err;
+ sigset_t set;
+ struct compat_aux_sigframe __user *aux;
+
+ err = get_sigset_t(&set, &sf->uc.uc_sigmask);
+ if (err == 0) {
+ sigdelsetmask(&set, ~_BLOCKABLE);
+ set_current_blocked(&set);
+ }
+
+ __get_user_error(regs->regs[0], &sf->uc.uc_mcontext.arm_r0, err);
+ __get_user_error(regs->regs[1], &sf->uc.uc_mcontext.arm_r1, err);
+ __get_user_error(regs->regs[2], &sf->uc.uc_mcontext.arm_r2, err);
+ __get_user_error(regs->regs[3], &sf->uc.uc_mcontext.arm_r3, err);
+ __get_user_error(regs->regs[4], &sf->uc.uc_mcontext.arm_r4, err);
+ __get_user_error(regs->regs[5], &sf->uc.uc_mcontext.arm_r5, err);
+ __get_user_error(regs->regs[6], &sf->uc.uc_mcontext.arm_r6, err);
+ __get_user_error(regs->regs[7], &sf->uc.uc_mcontext.arm_r7, err);
+ __get_user_error(regs->regs[8], &sf->uc.uc_mcontext.arm_r8, err);
+ __get_user_error(regs->regs[9], &sf->uc.uc_mcontext.arm_r9, err);
+ __get_user_error(regs->regs[10], &sf->uc.uc_mcontext.arm_r10, err);
+ __get_user_error(regs->regs[11], &sf->uc.uc_mcontext.arm_fp, err);
+ __get_user_error(regs->regs[12], &sf->uc.uc_mcontext.arm_ip, err);
+ __get_user_error(regs->compat_sp, &sf->uc.uc_mcontext.arm_sp, err);
+ __get_user_error(regs->compat_lr, &sf->uc.uc_mcontext.arm_lr, err);
+ __get_user_error(regs->pc, &sf->uc.uc_mcontext.arm_pc, err);
+ __get_user_error(regs->pstate, &sf->uc.uc_mcontext.arm_cpsr, err);
+
+ /*
+ * Avoid compat_sys_sigreturn() restarting.
+ */
+ regs->syscallno = ~0UL;
+
+ err |= !valid_user_regs(&regs->user_regs);
+
+ aux = (struct compat_aux_sigframe __user *) sf->uc.uc_regspace;
+ if (err == 0)
+ err |= compat_restore_vfp_context(&aux->vfp);
+
+ return err;
+}
+
+asmlinkage int compat_sys_sigreturn(struct pt_regs *regs)
+{
+ struct compat_sigframe __user *frame;
+
+ /* Always make any pending restarted system calls return -EINTR */
+ current_thread_info()->restart_block.fn = do_no_restart_syscall;
+
+ /*
+	 * Since we stacked the signal frame on a 64-bit boundary,
+	 * 'sp' should be word aligned here. If it's not, then the
+	 * user is trying to mess with us.
+ */
+ if (regs->compat_sp & 7)
+ goto badframe;
+
+ frame = (struct compat_sigframe __user *)regs->compat_sp;
+
+ if (!access_ok(VERIFY_READ, frame, sizeof (*frame)))
+ goto badframe;
+
+ if (compat_restore_sigframe(regs, frame))
+ goto badframe;
+
+ return regs->regs[0];
+
+badframe:
+ if (show_unhandled_signals)
+ printk_ratelimited(KERN_INFO "%s[%d]: bad frame in %s: pc=%08llx sp=%08llx\n",
+ current->comm, task_pid_nr(current), __func__,
+ regs->pc, regs->sp);
+ force_sig(SIGSEGV, current);
+ return 0;
+}
+
+asmlinkage int compat_sys_rt_sigreturn(struct pt_regs *regs)
+{
+ struct compat_rt_sigframe __user *frame;
+
+ /* Always make any pending restarted system calls return -EINTR */
+ current_thread_info()->restart_block.fn = do_no_restart_syscall;
+
+ /*
+	 * Since we stacked the signal frame on a 64-bit boundary,
+	 * 'sp' should be word aligned here. If it's not, then the
+	 * user is trying to mess with us.
+ */
+ if (regs->compat_sp & 7)
+ goto badframe;
+
+ frame = (struct compat_rt_sigframe __user *)regs->compat_sp;
+
+ if (!access_ok(VERIFY_READ, frame, sizeof (*frame)))
+ goto badframe;
+
+ if (compat_restore_sigframe(regs, &frame->sig))
+ goto badframe;
+
+ if (compat_do_sigaltstack(ptr_to_compat(&frame->sig.uc.uc_stack),
+ ptr_to_compat((void __user *)NULL),
+ regs->compat_sp) == -EFAULT)
+ goto badframe;
+
+ return regs->regs[0];
+
+badframe:
+ if (show_unhandled_signals)
+ printk_ratelimited(KERN_INFO "%s[%d]: bad frame in %s: pc=%08llx sp=%08llx\n",
+ current->comm, task_pid_nr(current), __func__,
+ regs->pc, regs->sp);
+ force_sig(SIGSEGV, current);
+ return 0;
+}
+
+static inline void __user *compat_get_sigframe(struct k_sigaction *ka,
+ struct pt_regs *regs,
+ int framesize)
+{
+ compat_ulong_t sp = regs->compat_sp;
+ void __user *frame;
+
+ /*
+ * This is the X/Open sanctioned signal stack switching.
+ */
+ if ((ka->sa.sa_flags & SA_ONSTACK) && !sas_ss_flags(sp))
+ sp = current->sas_ss_sp + current->sas_ss_size;
+
+ /*
+ * ATPCS B01 mandates 8-byte alignment
+ */
+ frame = compat_ptr((compat_uptr_t)((sp - framesize) & ~7));
+
+ /*
+ * Check that we can actually write to the signal frame.
+ */
+ if (!access_ok(VERIFY_WRITE, frame, framesize))
+ frame = NULL;
+
+ return frame;
+}
+
+static int compat_setup_return(struct pt_regs *regs, struct k_sigaction *ka,
+ compat_ulong_t __user *rc, void __user *frame,
+ int usig)
+{
+ compat_ulong_t handler = ptr_to_compat(ka->sa.sa_handler);
+ compat_ulong_t retcode;
+ compat_ulong_t spsr = regs->pstate & ~PSR_f;
+ int thumb;
+
+ /* Check if the handler is written for ARM or Thumb */
+ thumb = handler & 1;
+
+ if (thumb) {
+ spsr |= COMPAT_PSR_T_BIT;
+ spsr &= ~COMPAT_PSR_IT_MASK;
+ } else {
+ spsr &= ~COMPAT_PSR_T_BIT;
+ }
+
+ if (ka->sa.sa_flags & SA_RESTORER) {
+ retcode = ptr_to_compat(ka->sa.sa_restorer);
+ } else {
+ /* Set up sigreturn pointer */
+ unsigned int idx = thumb << 1;
+
+ if (ka->sa.sa_flags & SA_SIGINFO)
+ idx += 3;
+
+ retcode = AARCH32_VECTORS_BASE +
+ AARCH32_KERN_SIGRET_CODE_OFFSET +
+ (idx << 2) + thumb;
+ }
+
+ regs->regs[0] = usig;
+ regs->compat_sp = ptr_to_compat(frame);
+ regs->compat_lr = retcode;
+ regs->pc = handler;
+ regs->pstate = spsr;
+
+ return 0;
+}
+
+static int compat_setup_sigframe(struct compat_sigframe __user *sf,
+ struct pt_regs *regs, sigset_t *set)
+{
+ struct compat_aux_sigframe __user *aux;
+ int err = 0;
+
+ __put_user_error(regs->regs[0], &sf->uc.uc_mcontext.arm_r0, err);
+ __put_user_error(regs->regs[1], &sf->uc.uc_mcontext.arm_r1, err);
+ __put_user_error(regs->regs[2], &sf->uc.uc_mcontext.arm_r2, err);
+ __put_user_error(regs->regs[3], &sf->uc.uc_mcontext.arm_r3, err);
+ __put_user_error(regs->regs[4], &sf->uc.uc_mcontext.arm_r4, err);
+ __put_user_error(regs->regs[5], &sf->uc.uc_mcontext.arm_r5, err);
+ __put_user_error(regs->regs[6], &sf->uc.uc_mcontext.arm_r6, err);
+ __put_user_error(regs->regs[7], &sf->uc.uc_mcontext.arm_r7, err);
+ __put_user_error(regs->regs[8], &sf->uc.uc_mcontext.arm_r8, err);
+ __put_user_error(regs->regs[9], &sf->uc.uc_mcontext.arm_r9, err);
+ __put_user_error(regs->regs[10], &sf->uc.uc_mcontext.arm_r10, err);
+ __put_user_error(regs->regs[11], &sf->uc.uc_mcontext.arm_fp, err);
+ __put_user_error(regs->regs[12], &sf->uc.uc_mcontext.arm_ip, err);
+ __put_user_error(regs->compat_sp, &sf->uc.uc_mcontext.arm_sp, err);
+ __put_user_error(regs->compat_lr, &sf->uc.uc_mcontext.arm_lr, err);
+ __put_user_error(regs->pc, &sf->uc.uc_mcontext.arm_pc, err);
+ __put_user_error(regs->pstate, &sf->uc.uc_mcontext.arm_cpsr, err);
+
+ __put_user_error((compat_ulong_t)0, &sf->uc.uc_mcontext.trap_no, err);
+ __put_user_error((compat_ulong_t)0, &sf->uc.uc_mcontext.error_code, err);
+ __put_user_error(current->thread.fault_address, &sf->uc.uc_mcontext.fault_address, err);
+ __put_user_error(set->sig[0], &sf->uc.uc_mcontext.oldmask, err);
+
+ err |= put_sigset_t(&sf->uc.uc_sigmask, set);
+
+ aux = (struct compat_aux_sigframe __user *) sf->uc.uc_regspace;
+
+ if (err == 0)
+ err |= compat_preserve_vfp_context(&aux->vfp);
+ __put_user_error(0, &aux->end_magic, err);
+
+ return err;
+}
+
+/*
+ * 32-bit signal handling routines called from signal.c
+ */
+int compat_setup_rt_frame(int usig, struct k_sigaction *ka, siginfo_t *info,
+ sigset_t *set, struct pt_regs *regs)
+{
+ struct compat_rt_sigframe __user *frame;
+ compat_stack_t stack;
+ int err = 0;
+
+ frame = compat_get_sigframe(ka, regs, sizeof(*frame));
+
+ if (!frame)
+ return 1;
+
+ err |= copy_siginfo_to_user32(&frame->info, info);
+
+ __put_user_error(0, &frame->sig.uc.uc_flags, err);
+ __put_user_error(NULL, &frame->sig.uc.uc_link, err);
+
+ memset(&stack, 0, sizeof(stack));
+ stack.ss_sp = (compat_uptr_t)current->sas_ss_sp;
+ stack.ss_flags = sas_ss_flags(regs->compat_sp);
+ stack.ss_size = current->sas_ss_size;
+ err |= __copy_to_user(&frame->sig.uc.uc_stack, &stack, sizeof(stack));
+
+ err |= compat_setup_sigframe(&frame->sig, regs, set);
+ if (err == 0)
+ err = compat_setup_return(regs, ka, frame->sig.retcode, frame,
+ usig);
+
+ if (err == 0) {
+ regs->regs[1] = (compat_ulong_t)(unsigned long)&frame->info;
+ regs->regs[2] = (compat_ulong_t)(unsigned long)&frame->sig.uc;
+ }
+
+ return err;
+}
+
+int compat_setup_frame(int usig, struct k_sigaction *ka, sigset_t *set,
+ struct pt_regs *regs)
+{
+ struct compat_sigframe __user *frame;
+ int err = 0;
+
+ frame = compat_get_sigframe(ka, regs, sizeof(*frame));
+
+ if (!frame)
+ return 1;
+
+ __put_user_error(0x5ac3c35a, &frame->uc.uc_flags, err);
+
+ err |= compat_setup_sigframe(frame, regs, set);
+ if (err == 0)
+ err = compat_setup_return(regs, ka, frame->retcode, frame, usig);
+
+ return err;
+}
+
+/*
+ * RT signals don't have generic compat wrappers.
+ * See arch/powerpc/kernel/signal_32.c
+ */
+asmlinkage int compat_sys_rt_sigprocmask(int how, compat_sigset_t __user *set,
+ compat_sigset_t __user *oset,
+ compat_size_t sigsetsize)
+{
+ sigset_t s;
+ sigset_t __user *up;
+ int ret;
+ mm_segment_t old_fs = get_fs();
+
+ if (set) {
+ if (get_sigset_t(&s, set))
+ return -EFAULT;
+ }
+
+ set_fs(KERNEL_DS);
+ /* This is valid because of the set_fs() */
+ up = (sigset_t __user *) &s;
+ ret = sys_rt_sigprocmask(how, set ? up : NULL, oset ? up : NULL,
+ sigsetsize);
+ set_fs(old_fs);
+ if (ret)
+ return ret;
+ if (oset) {
+ if (put_sigset_t(oset, &s))
+ return -EFAULT;
+ }
+ return 0;
+}
+
+asmlinkage int compat_sys_rt_sigpending(compat_sigset_t __user *set,
+ compat_size_t sigsetsize)
+{
+ sigset_t s;
+ int ret;
+ mm_segment_t old_fs = get_fs();
+
+ set_fs(KERNEL_DS);
+ /* The __user pointer cast is valid because of the set_fs() */
+ ret = sys_rt_sigpending((sigset_t __user *) &s, sigsetsize);
+ set_fs(old_fs);
+ if (!ret) {
+ if (put_sigset_t(set, &s))
+ return -EFAULT;
+ }
+ return ret;
+}
+
+asmlinkage int compat_sys_rt_sigqueueinfo(int pid, int sig,
+ compat_siginfo_t __user *uinfo)
+{
+ siginfo_t info;
+ int ret;
+ mm_segment_t old_fs = get_fs();
+
+ ret = copy_siginfo_from_user32(&info, uinfo);
+ if (unlikely(ret))
+ return ret;
+
+ set_fs (KERNEL_DS);
+ /* The __user pointer cast is valid because of the set_fs() */
+ ret = sys_rt_sigqueueinfo(pid, sig, (siginfo_t __user *) &info);
+ set_fs (old_fs);
+ return ret;
+}
+
+void compat_setup_restart_syscall(struct pt_regs *regs)
+{
+ regs->regs[7] = __NR_restart_syscall;
+}
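
As a worked example of the trampoline selection in compat_setup_return()
above: aarch32_sigret_code[] holds six words, {ARM mov r7, ARM svc, Thumb}
for sigreturn followed by the same three for rt_sigreturn, and the index
arithmetic picks the right entry and sets bit 0 for Thumb. A stand-alone
sketch (the base/offset constants are placeholders for AARCH32_VECTORS_BASE
and AARCH32_KERN_SIGRET_CODE_OFFSET, not the real values):

  #include <stdio.h>

  #define VECTORS_BASE   0xffff0000UL     /* placeholder */
  #define SIGRET_OFFSET  0x500UL          /* placeholder */

  static unsigned long sigret_trampoline(int thumb, int rt)
  {
          unsigned int idx = thumb << 1;  /* 0 = ARM pair, 2 = Thumb word */

          if (rt)
                  idx += 3;               /* rt variants live in words 3..5 */

          /* Each slot is one 32-bit word; bit 0 selects the Thumb state. */
          return VECTORS_BASE + SIGRET_OFFSET + (idx << 2) + thumb;
  }

  int main(void)
  {
          printf("ARM   sigreturn    : %#lx\n", sigret_trampoline(0, 0));
          printf("Thumb sigreturn    : %#lx\n", sigret_trampoline(1, 0));
          printf("ARM   rt_sigreturn : %#lx\n", sigret_trampoline(0, 1));
          printf("Thumb rt_sigreturn : %#lx\n", sigret_trampoline(1, 1));
          return 0;
  }
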
diff --git a/arch/arm64/kernel/sys32.S b/arch/arm64/kernel/sys32.S
new file mode 100644
index 0000000..fc764c1
--- /dev/null
+++ b/arch/arm64/kernel/sys32.S
@@ -0,0 +1,283 @@
+/*
+ * Compat system call wrappers
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ * Authors: Will Deacon <[email protected]>
+ * Catalin Marinas <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+
+#include <asm/assembler.h>
+#include <asm/asm-offsets.h>
+
+/*
+ * System call wrappers for the AArch32 compatibility layer.
+ */
+compat_sys_fork_wrapper:
+ mov x0, sp
+ b compat_sys_fork
+ENDPROC(compat_sys_fork_wrapper)
+
+compat_sys_vfork_wrapper:
+ mov x0, sp
+ b compat_sys_vfork
+ENDPROC(compat_sys_vfork_wrapper)
+
+compat_sys_execve_wrapper:
+ mov x3, sp
+ b compat_sys_execve
+ENDPROC(compat_sys_execve_wrapper)
+
+compat_sys_clone_wrapper:
+ mov x5, sp
+ b compat_sys_clone
+ENDPROC(compat_sys_clone_wrapper)
+
+compat_sys_sigreturn_wrapper:
+ mov x0, sp
+ mov x27, #0 // prevent syscall restart handling (why)
+ b compat_sys_sigreturn
+ENDPROC(compat_sys_sigreturn_wrapper)
+
+compat_sys_rt_sigreturn_wrapper:
+ mov x0, sp
+ mov x27, #0 // prevent syscall restart handling (why)
+ b compat_sys_rt_sigreturn
+ENDPROC(compat_sys_rt_sigreturn_wrapper)
+
+compat_sys_sigaltstack_wrapper:
+ ldr x2, [sp, #S_COMPAT_SP]
+ b compat_do_sigaltstack
+ENDPROC(compat_sys_sigaltstack_wrapper)
+
+compat_sys_statfs64_wrapper:
+ mov w3, #84
+ cmp w1, #88
+ csel w1, w3, w1, eq
+ b compat_sys_statfs64
+ENDPROC(compat_sys_statfs64_wrapper)
+
+compat_sys_fstatfs64_wrapper:
+ mov w3, #84
+ cmp w1, #88
+ csel w1, w3, w1, eq
+ b compat_sys_fstatfs64
+ENDPROC(compat_sys_fstatfs64_wrapper)
+
+/*
+ * Wrappers for AArch32 syscalls that either take 64-bit parameters
+ * in registers or that take 32-bit parameters which require sign
+ * extension.
+ */
+compat_sys_lseek_wrapper:
+ sxtw x1, w1
+ b sys_lseek
+ENDPROC(compat_sys_lseek_wrapper)
+
+compat_sys_pread64_wrapper:
+ orr x3, x4, x5, lsl #32
+ b sys_pread64
+ENDPROC(compat_sys_pread64_wrapper)
+
+compat_sys_pwrite64_wrapper:
+ orr x3, x4, x5, lsl #32
+ b sys_pwrite64
+ENDPROC(compat_sys_pwrite64_wrapper)
+
+compat_sys_truncate64_wrapper:
+ orr x1, x2, x3, lsl #32
+ b sys_truncate
+ENDPROC(compat_sys_truncate64_wrapper)
+
+compat_sys_ftruncate64_wrapper:
+ orr x1, x2, x3, lsl #32
+ b sys_ftruncate
+ENDPROC(compat_sys_ftruncate64_wrapper)
+
+compat_sys_readahead_wrapper:
+ orr x1, x2, x3, lsl #32
+ mov w2, w4
+ b sys_readahead
+ENDPROC(compat_sys_readahead_wrapper)
+
+compat_sys_lookup_dcookie:
+ orr x0, x0, x1, lsl #32
+ mov w1, w2
+ mov w2, w3
+ b sys_lookup_dcookie
+ENDPROC(compat_sys_lookup_dcookie)
+
+compat_sys_fadvise64_64_wrapper:
+ mov w6, w1
+ orr x1, x2, x3, lsl #32
+ orr x2, x4, x5, lsl #32
+ mov w3, w6
+ b sys_fadvise64_64
+ENDPROC(compat_sys_fadvise64_64_wrapper)
+
+compat_sys_sync_file_range2_wrapper:
+ orr x2, x2, x3, lsl #32
+ orr x3, x4, x5, lsl #32
+ b sys_sync_file_range2
+ENDPROC(compat_sys_sync_file_range2_wrapper)
+
+compat_sys_fallocate_wrapper:
+ orr x2, x2, x3, lsl #32
+ orr x3, x4, x5, lsl #32
+ b sys_fallocate
+ENDPROC(compat_sys_fallocate_wrapper)
+
+compat_sys_fanotify_mark_wrapper:
+ orr x2, x2, x3, lsl #32
+ mov w3, w4
+ mov w4, w5
+ b sys_fanotify_mark
+ENDPROC(compat_sys_fanotify_mark_wrapper)
+
+/*
+ * Use the compat system call wrappers.
+ */
+#define sys_fork compat_sys_fork_wrapper
+#define sys_open compat_sys_open
+#define sys_execve compat_sys_execve_wrapper
+#define sys_lseek compat_sys_lseek_wrapper
+#define sys_mount compat_sys_mount
+#define sys_ptrace compat_sys_ptrace
+#define sys_times compat_sys_times
+#define sys_ioctl compat_sys_ioctl
+#define sys_fcntl compat_sys_fcntl
+#define sys_ustat compat_sys_ustat
+#define sys_sigaction compat_sys_sigaction
+#define sys_sigsuspend compat_sys_sigsuspend
+#define sys_sigpending compat_sys_sigpending
+#define sys_setrlimit compat_sys_setrlimit
+#define sys_getrusage compat_sys_getrusage
+#define sys_gettimeofday compat_sys_gettimeofday
+#define sys_settimeofday compat_sys_settimeofday
+#define sys_statfs compat_sys_statfs
+#define sys_fstatfs compat_sys_fstatfs
+#define sys_setitimer compat_sys_setitimer
+#define sys_getitimer compat_sys_getitimer
+#define sys_newstat compat_sys_newstat
+#define sys_newlstat compat_sys_newlstat
+#define sys_newfstat compat_sys_newfstat
+#define sys_wait4 compat_sys_wait4
+#define sys_sysinfo compat_sys_sysinfo
+#define sys_sigreturn compat_sys_sigreturn_wrapper
+#define sys_clone compat_sys_clone_wrapper
+#define sys_adjtimex compat_sys_adjtimex
+#define sys_sigprocmask compat_sys_sigprocmask
+#define sys_personality compat_sys_personality
+#define sys_getdents compat_sys_getdents
+#define sys_select compat_sys_select
+#define sys_readv compat_sys_readv
+#define sys_writev compat_sys_writev
+#define sys_sysctl compat_sys_sysctl
+#define sys_sched_rr_get_interval compat_sys_sched_rr_get_interval
+#define sys_nanosleep compat_sys_nanosleep
+#define sys_rt_sigreturn compat_sys_rt_sigreturn_wrapper
+#define sys_rt_sigaction compat_sys_rt_sigaction
+#define sys_rt_sigprocmask compat_sys_rt_sigprocmask
+#define sys_rt_sigpending compat_sys_rt_sigpending
+#define sys_rt_sigtimedwait compat_sys_rt_sigtimedwait
+#define sys_rt_sigqueueinfo compat_sys_rt_sigqueueinfo
+#define sys_rt_sigsuspend compat_sys_rt_sigsuspend
+#define sys_pread64 compat_sys_pread64_wrapper
+#define sys_pwrite64 compat_sys_pwrite64_wrapper
+#define sys_sigaltstack compat_sys_sigaltstack_wrapper
+#define sys_sendfile compat_sys_sendfile
+#define sys_vfork compat_sys_vfork_wrapper
+#define sys_getrlimit compat_sys_getrlimit
+#define sys_mmap2 sys_mmap_pgoff
+#define sys_truncate64 compat_sys_truncate64_wrapper
+#define sys_ftruncate64 compat_sys_ftruncate64_wrapper
+#define sys_getdents64 compat_sys_getdents64
+#define sys_fcntl64 compat_sys_fcntl64
+#define sys_readahead compat_sys_readahead_wrapper
+#define sys_futex compat_sys_futex
+#define sys_sched_setaffinity compat_sys_sched_setaffinity
+#define sys_sched_getaffinity compat_sys_sched_getaffinity
+#define sys_io_setup compat_sys_io_setup
+#define sys_io_getevents compat_sys_io_getevents
+#define sys_io_submit compat_sys_io_submit
+#define sys_lookup_dcookie compat_sys_lookup_dcookie
+#define sys_timer_create compat_sys_timer_create
+#define sys_timer_settime compat_sys_timer_settime
+#define sys_timer_gettime compat_sys_timer_gettime
+#define sys_clock_settime compat_sys_clock_settime
+#define sys_clock_gettime compat_sys_clock_gettime
+#define sys_clock_getres compat_sys_clock_getres
+#define sys_clock_nanosleep compat_sys_clock_nanosleep
+#define sys_statfs64 compat_sys_statfs64_wrapper
+#define sys_fstatfs64 compat_sys_fstatfs64_wrapper
+#define sys_utimes compat_sys_utimes
+#define sys_fadvise64_64 compat_sys_fadvise64_64_wrapper
+#define sys_mq_open compat_sys_mq_open
+#define sys_mq_timedsend compat_sys_mq_timedsend
+#define sys_mq_timedreceive compat_sys_mq_timedreceive
+#define sys_mq_notify compat_sys_mq_notify
+#define sys_mq_getsetattr compat_sys_mq_getsetattr
+#define sys_waitid compat_sys_waitid
+#define sys_recv compat_sys_recv
+#define sys_recvfrom compat_sys_recvfrom
+#define sys_setsockopt compat_sys_setsockopt
+#define sys_getsockopt compat_sys_getsockopt
+#define sys_sendmsg compat_sys_sendmsg
+#define sys_recvmsg compat_sys_recvmsg
+#define sys_semctl compat_sys_semctl
+#define sys_msgsnd compat_sys_msgsnd
+#define sys_msgrcv compat_sys_msgrcv
+#define sys_msgctl compat_sys_msgctl
+#define sys_shmat compat_sys_shmat
+#define sys_shmctl compat_sys_shmctl
+#define sys_keyctl compat_sys_keyctl
+#define sys_semtimedop compat_sys_semtimedop
+#define sys_mbind compat_sys_mbind
+#define sys_get_mempolicy compat_sys_get_mempolicy
+#define sys_set_mempolicy compat_sys_set_mempolicy
+#define sys_openat compat_sys_openat
+#define sys_futimesat compat_sys_futimesat
+#define sys_pselect6 compat_sys_pselect6
+#define sys_ppoll compat_sys_ppoll
+#define sys_set_robust_list compat_sys_set_robust_list
+#define sys_get_robust_list compat_sys_get_robust_list
+#define sys_sync_file_range2 compat_sys_sync_file_range2_wrapper
+#define sys_vmsplice compat_sys_vmsplice
+#define sys_move_pages compat_sys_move_pages
+#define sys_epoll_pwait compat_sys_epoll_pwait
+#define sys_kexec_load compat_sys_kexec_load
+#define sys_utimensat compat_sys_utimensat
+#define sys_signalfd compat_sys_signalfd
+#define sys_fallocate compat_sys_fallocate_wrapper
+#define sys_timerfd_settime compat_sys_timerfd_settime
+#define sys_timerfd_gettime compat_sys_timerfd_gettime
+#define sys_signalfd4 compat_sys_signalfd4
+#define sys_preadv compat_sys_preadv
+#define sys_pwritev compat_sys_pwritev
+#define sys_rt_tgsigqueueinfo compat_sys_rt_tgsigqueueinfo
+#define sys_recvmmsg compat_sys_recvmmsg
+#define sys_fanotify_mark compat_sys_fanotify_mark_wrapper
+
+#undef __SYSCALL
+#define __SYSCALL(x, y) .quad y // x
+#define __SYSCALL_COMPAT
+
+/*
+ * The system call table must be 4KB aligned.
+ */
+ .align 12
+ENTRY(compat_sys_call_table)
+#include <asm/unistd.h>
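
For reference, the 64-bit argument re-assembly done by the wrappers above can be read as plain C: AArch32 passes each 64-bit syscall argument in a pair of 32-bit registers, and the "orr xN, xN, xM, lsl #32" instructions glue the halves back together before tail-calling the native handler. A sketch of what compat_sys_fallocate_wrapper amounts to (the C helper name is made up for illustration only, it is not part of the patch):

    #include <linux/syscalls.h>
    #include <linux/types.h>

    /* C equivalent of compat_sys_fallocate_wrapper above: offset arrives
     * in (w2, w3) and len in (w4, w5), low half first, and each pair is
     * merged into one 64-bit value before calling sys_fallocate(). */
    asmlinkage long compat_fallocate_glue(int fd, int mode,
                                          u32 off_lo, u32 off_hi,
                                          u32 len_lo, u32 len_hi)
    {
        return sys_fallocate(fd, mode,
                             ((u64)off_hi << 32) | off_lo,
                             ((u64)len_hi << 32) | len_lo);
    }
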
diff --git a/arch/arm64/kernel/sys_compat.c b/arch/arm64/kernel/sys_compat.c
new file mode 100644
index 0000000..025ec0a
--- /dev/null
+++ b/arch/arm64/kernel/sys_compat.c
@@ -0,0 +1,177 @@
+/*
+ * Based on arch/arm/kernel/sys_arm.c
+ *
+ * Copyright (C) People who wrote linux/arch/i386/kernel/sys_i386.c
+ * Copyright (C) 1995, 1996 Russell King.
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define __SYSCALL_COMPAT
+
+#include <linux/compat.h>
+#include <linux/personality.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+#include <linux/uaccess.h>
+
+#include <asm/cacheflush.h>
+#include <asm/unistd.h>
+
+asmlinkage int compat_sys_fork(struct pt_regs *regs)
+{
+ return do_fork(SIGCHLD, regs->compat_sp, regs, 0, NULL, NULL);
+}
+
+asmlinkage int compat_sys_clone(unsigned long clone_flags, unsigned long newsp,
+ int __user *parent_tidptr, int tls_val,
+ int __user *child_tidptr, struct pt_regs *regs)
+{
+ if (!newsp)
+ newsp = regs->compat_sp;
+
+ return do_fork(clone_flags, newsp, regs, 0, parent_tidptr, child_tidptr);
+}
+
+asmlinkage int compat_sys_vfork(struct pt_regs *regs)
+{
+ return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs->compat_sp,
+ regs, 0, NULL, NULL);
+}
+
+asmlinkage int compat_sys_execve(const char __user *filenamei,
+ compat_uptr_t argv, compat_uptr_t envp,
+ struct pt_regs *regs)
+{
+ int error;
+ char * filename;
+
+ filename = getname(filenamei);
+ error = PTR_ERR(filename);
+ if (IS_ERR(filename))
+ goto out;
+ error = compat_do_execve(filename, compat_ptr(argv), compat_ptr(envp),
+ regs);
+ putname(filename);
+out:
+ return error;
+}
+
+asmlinkage int compat_sys_sched_rr_get_interval(compat_pid_t pid,
+ struct compat_timespec __user *interval)
+{
+ struct timespec t;
+ int ret;
+ mm_segment_t old_fs = get_fs();
+
+ set_fs(KERNEL_DS);
+ ret = sys_sched_rr_get_interval(pid, (struct timespec __user *)&t);
+ set_fs(old_fs);
+ if (put_compat_timespec(&t, interval))
+ return -EFAULT;
+ return ret;
+}
+
+asmlinkage int compat_sys_personality(compat_ulong_t personality)
+{
+ int ret;
+
+ if (personality(current->personality) == PER_LINUX32 &&
+ personality == PER_LINUX)
+ personality = PER_LINUX32;
+ ret = sys_personality(personality);
+ if (ret == PER_LINUX32)
+ ret = PER_LINUX;
+ return ret;
+}
+
+asmlinkage int compat_sys_sendfile(int out_fd, int in_fd,
+ compat_off_t __user *offset, s32 count)
+{
+ mm_segment_t old_fs = get_fs();
+ int ret;
+ off_t of;
+
+ if (offset && get_user(of, offset))
+ return -EFAULT;
+
+ set_fs(KERNEL_DS);
+ ret = sys_sendfile(out_fd, in_fd, offset ? (off_t __user *)&of : NULL,
+ count);
+ set_fs(old_fs);
+
+ if (offset && put_user(of, offset))
+ return -EFAULT;
+ return ret;
+}
+
+static inline void
+do_compat_cache_op(unsigned long start, unsigned long end, int flags)
+{
+ struct mm_struct *mm = current->active_mm;
+ struct vm_area_struct *vma;
+
+ if (end < start || flags)
+ return;
+
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, start);
+ if (vma && vma->vm_start < end) {
+ if (start < vma->vm_start)
+ start = vma->vm_start;
+ if (end > vma->vm_end)
+ end = vma->vm_end;
+ up_read(&mm->mmap_sem);
+ flush_cache_user_range(start, end);
+ return;
+ }
+ up_read(&mm->mmap_sem);
+}
+
+/*
+ * Handle all unrecognised system calls.
+ */
+long compat_arm_syscall(struct pt_regs *regs)
+{
+ unsigned int no = regs->regs[7];
+
+ switch (no) {
+ /*
+ * Flush a region from virtual address 'r0' to virtual address 'r1'
+ * _exclusive_. There is no alignment requirement on either address;
+ * user space does not need to know the hardware cache layout.
+ *
+ * r2 contains flags. It should ALWAYS be passed as ZERO until it
+ * is defined to be something else. For now we ignore it, but may
+ * the fires of hell burn in your belly if you break this rule. ;)
+ *
+ * (at a later date, we may want to allow this call to not flush
+ * various aspects of the cache. Passing '0' will guarantee that
+ * everything necessary gets flushed to maintain consistency in
+ * the specified region).
+ */
+ case __ARM_NR_compat_cacheflush:
+ do_compat_cache_op(regs->regs[0], regs->regs[1], regs->regs[2]);
+ return 0;
+
+ case __ARM_NR_compat_set_tls:
+ current->thread.tp_value = regs->regs[0];
+ asm ("msr tpidrro_el0, %0" : : "r" (regs->regs[0]));
+ return 0;
+
+ default:
+ return -ENOSYS;
+ }
+}
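
As a user-level illustration of the private calls handled by compat_arm_syscall() above, a 32-bit application typically flushes freshly written code with the cacheflush call before jumping to it. A hedged sketch follows; the __ARM_NR_cacheflush number is assumed to match the private value used by 32-bit ARM and would normally come from the 32-bit <asm/unistd.h>:

    #include <unistd.h>
    #include <sys/syscall.h>

    #ifndef __ARM_NR_cacheflush
    #define __ARM_NR_cacheflush 0x0f0002  /* assumed, mirrors 32-bit ARM */
    #endif

    /* Make [start, end) coherent for instruction fetch: r0 = start,
     * r1 = end (exclusive), r2 = flags and must be zero. */
    static void sync_caches(void *start, void *end)
    {
        syscall(__ARM_NR_cacheflush, start, end, 0);
    }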

2012-08-14 17:54:23

by Catalin Marinas

Subject: [PATCH v2 23/31] arm64: Debugging support

From: Will Deacon <[email protected]>

This patch adds ptrace, debug monitor and hardware breakpoint support.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/debug-monitors.h | 88 +++
arch/arm64/include/asm/hw_breakpoint.h | 137 +++++
arch/arm64/kernel/debug-monitors.c | 288 ++++++++++
arch/arm64/kernel/hw_breakpoint.c | 880 +++++++++++++++++++++++++++++++
arch/arm64/kernel/ptrace.c | 834 +++++++++++++++++++++++++++++
5 files changed, 2227 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/debug-monitors.h
create mode 100644 arch/arm64/include/asm/hw_breakpoint.h
create mode 100644 arch/arm64/kernel/debug-monitors.c
create mode 100644 arch/arm64/kernel/hw_breakpoint.c
create mode 100644 arch/arm64/kernel/ptrace.c
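
From user space, the ptrace side of this patch is reached through the PTRACE_GETHBPREGS/PTRACE_SETHBPREGS requests handled in ptrace.c below; virtual register 0 reads back the resource word assembled by ptrace_get_hbp_resource_info(). A hedged sketch of querying it (the PTRACE_GETHBPREGS value is assumed to mirror 32-bit ARM and would normally come from the exported ptrace header added elsewhere in this series):

    #include <stdio.h>
    #include <stdint.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>

    #ifndef PTRACE_GETHBPREGS
    #define PTRACE_GETHBPREGS 29  /* assumed value, mirrors 32-bit ARM */
    #endif

    /* Register 0 packs, low byte first: number of breakpoints, number of
     * watchpoints, maximum watchpoint length, debug architecture version. */
    static void print_hbp_resources(pid_t child)
    {
        uint32_t info = 0;

        if (ptrace(PTRACE_GETHBPREGS, child, 0, &info) == 0)
            printf("%u breakpoints, %u watchpoints, wp len %u\n",
                   info & 0xff, (info >> 8) & 0xff, (info >> 16) & 0xff);
    }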

diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
new file mode 100644
index 0000000..7eaa0b3
--- /dev/null
+++ b/arch/arm64/include/asm/debug-monitors.h
@@ -0,0 +1,88 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_DEBUG_MONITORS_H
+#define __ASM_DEBUG_MONITORS_H
+
+#ifdef __KERNEL__
+
+#define DBG_ESR_EVT(x) (((x) >> 27) & 0x7)
+
+/* AArch64 */
+#define DBG_ESR_EVT_HWBP 0x0
+#define DBG_ESR_EVT_HWSS 0x1
+#define DBG_ESR_EVT_HWWP 0x2
+#define DBG_ESR_EVT_BRK 0x6
+
+enum debug_el {
+ DBG_ACTIVE_EL0 = 0,
+ DBG_ACTIVE_EL1,
+};
+
+/* AArch32 */
+#define DBG_ESR_EVT_BKPT 0x4
+#define DBG_ESR_EVT_VECC 0x5
+
+#define AARCH32_BREAK_ARM 0x07f001f0
+#define AARCH32_BREAK_THUMB 0xde01
+#define AARCH32_BREAK_THUMB2_LO 0xf7f0
+#define AARCH32_BREAK_THUMB2_HI 0xa000
+
+#ifndef __ASSEMBLY__
+struct task_struct;
+
+#define local_dbg_save(flags) \
+ do { \
+ typecheck(unsigned long, flags); \
+ asm volatile( \
+ "mrs %0, daif // local_dbg_save\n" \
+ "msr daifset, #8" \
+ : "=r" (flags) : : "memory"); \
+ } while (0)
+
+#define local_dbg_restore(flags) \
+ do { \
+ typecheck(unsigned long, flags); \
+ asm volatile( \
+ "msr daif, %0 // local_dbg_restore\n" \
+ : : "r" (flags) : "memory"); \
+ } while (0)
+
+#define DBG_ARCH_ID_RESERVED 0 /* In case of ptrace ABI updates. */
+
+u8 debug_monitors_arch(void);
+
+void enable_debug_monitors(enum debug_el el);
+void disable_debug_monitors(enum debug_el el);
+
+void user_rewind_single_step(struct task_struct *task);
+void user_fastforward_single_step(struct task_struct *task);
+
+void kernel_enable_single_step(struct pt_regs *regs);
+void kernel_disable_single_step(void);
+int kernel_active_single_step(void);
+
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+int reinstall_suspended_bps(struct pt_regs *regs);
+#else
+static inline int reinstall_suspended_bps(struct pt_regs *regs)
+{
+ return -ENODEV;
+}
+#endif
+
+#endif /* __ASSEMBLY__ */
+#endif /* __KERNEL__ */
+#endif /* __ASM_DEBUG_MONITORS_H */
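
The local_dbg_save()/local_dbg_restore() pair above masks debug exceptions through the DAIF.D bit while the self-hosted debug registers are being touched. The intended usage pattern, mirrored by mdscr_write() in debug-monitors.c below, looks like this (a sketch only; the helper name is hypothetical and the header above is assumed to be included):

    /* Sketch only: update MDSCR_EL1 with debug exceptions masked. */
    static void set_mdscr(u32 mdscr)
    {
        unsigned long flags;

        local_dbg_save(flags);       /* msr daifset, #8: debug masked */
        asm volatile("msr mdscr_el1, %0" :: "r" (mdscr));
        local_dbg_restore(flags);    /* previous DAIF value restored */
    }
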
diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h
new file mode 100644
index 0000000..d064047
--- /dev/null
+++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -0,0 +1,137 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_HW_BREAKPOINT_H
+#define __ASM_HW_BREAKPOINT_H
+
+#ifdef __KERNEL__
+
+struct arch_hw_breakpoint_ctrl {
+ u32 __reserved : 19,
+ len : 8,
+ type : 2,
+ privilege : 2,
+ enabled : 1;
+};
+
+struct arch_hw_breakpoint {
+ u64 address;
+ u64 trigger;
+ struct arch_hw_breakpoint_ctrl ctrl;
+};
+
+static inline u32 encode_ctrl_reg(struct arch_hw_breakpoint_ctrl ctrl)
+{
+ return (ctrl.len << 5) | (ctrl.type << 3) | (ctrl.privilege << 1) |
+ ctrl.enabled;
+}
+
+static inline void decode_ctrl_reg(u32 reg,
+ struct arch_hw_breakpoint_ctrl *ctrl)
+{
+ ctrl->enabled = reg & 0x1;
+ reg >>= 1;
+ ctrl->privilege = reg & 0x3;
+ reg >>= 2;
+ ctrl->type = reg & 0x3;
+ reg >>= 2;
+ ctrl->len = reg & 0xff;
+}
+
+/* Breakpoint */
+#define ARM_BREAKPOINT_EXECUTE 0
+
+/* Watchpoints */
+#define ARM_BREAKPOINT_LOAD 1
+#define ARM_BREAKPOINT_STORE 2
+#define AARCH64_ESR_ACCESS_MASK (1 << 6)
+
+/* Privilege Levels */
+#define AARCH64_BREAKPOINT_EL1 1
+#define AARCH64_BREAKPOINT_EL0 2
+
+/* Lengths */
+#define ARM_BREAKPOINT_LEN_1 0x1
+#define ARM_BREAKPOINT_LEN_2 0x3
+#define ARM_BREAKPOINT_LEN_4 0xf
+#define ARM_BREAKPOINT_LEN_8 0xff
+
+/* Kernel stepping */
+#define ARM_KERNEL_STEP_NONE 0
+#define ARM_KERNEL_STEP_ACTIVE 1
+#define ARM_KERNEL_STEP_SUSPEND 2
+
+/*
+ * Limits.
+ * Changing these will require modifications to the register accessors.
+ */
+#define ARM_MAX_BRP 16
+#define ARM_MAX_WRP 16
+#define ARM_MAX_HBP_SLOTS (ARM_MAX_BRP + ARM_MAX_WRP)
+
+/* Virtual debug register bases. */
+#define AARCH64_DBG_REG_BVR 0
+#define AARCH64_DBG_REG_BCR (AARCH64_DBG_REG_BVR + ARM_MAX_BRP)
+#define AARCH64_DBG_REG_WVR (AARCH64_DBG_REG_BCR + ARM_MAX_BRP)
+#define AARCH64_DBG_REG_WCR (AARCH64_DBG_REG_WVR + ARM_MAX_WRP)
+
+/* Debug register names. */
+#define AARCH64_DBG_REG_NAME_BVR "bvr"
+#define AARCH64_DBG_REG_NAME_BCR "bcr"
+#define AARCH64_DBG_REG_NAME_WVR "wvr"
+#define AARCH64_DBG_REG_NAME_WCR "wcr"
+
+/* Accessor macros for the debug registers. */
+#define AARCH64_DBG_READ(N, REG, VAL) do {\
+ asm volatile("mrs %0, dbg" REG #N "_el1" : "=r" (VAL));\
+} while (0)
+
+#define AARCH64_DBG_WRITE(N, REG, VAL) do {\
+ asm volatile("msr dbg" REG #N "_el1, %0" :: "r" (VAL));\
+} while (0)
+
+struct task_struct;
+struct notifier_block;
+struct perf_event;
+struct pmu;
+
+extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
+ int *gen_len, int *gen_type);
+extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
+extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
+ unsigned long val, void *data);
+
+extern int arch_install_hw_breakpoint(struct perf_event *bp);
+extern void arch_uninstall_hw_breakpoint(struct perf_event *bp);
+extern void hw_breakpoint_pmu_read(struct perf_event *bp);
+extern int hw_breakpoint_slots(int type);
+
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+extern void hw_breakpoint_thread_switch(struct task_struct *next);
+extern void ptrace_hw_copy_thread(struct task_struct *task);
+#else
+static inline void hw_breakpoint_thread_switch(struct task_struct *next)
+{
+}
+static inline void ptrace_hw_copy_thread(struct task_struct *task)
+{
+}
+#endif
+
+extern struct pmu perf_ops_bp;
+
+#endif /* __KERNEL__ */
+#endif /* __ASM_HW_BREAKPOINT_H */
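
As a worked example of the control-register packing defined above: a 4-byte execute breakpoint for EL0 encodes to 0x1e5, and decode_ctrl_reg() recovers the same fields from that value. The function below is illustrative only and not part of the patch:

    /* Illustration of encode_ctrl_reg() for a 4-byte EL0 execute breakpoint. */
    static u32 example_bcr_value(void)
    {
        struct arch_hw_breakpoint_ctrl ctrl = {
            .len       = ARM_BREAKPOINT_LEN_4,      /* 0xf */
            .type      = ARM_BREAKPOINT_EXECUTE,    /* 0x0 */
            .privilege = AARCH64_BREAKPOINT_EL0,    /* 0x2 */
            .enabled   = 1,
        };

        /* (0xf << 5) | (0x0 << 3) | (0x2 << 1) | 1 == 0x1e5 */
        return encode_ctrl_reg(ctrl);
    }
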
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
new file mode 100644
index 0000000..0c3ba9f
--- /dev/null
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -0,0 +1,288 @@
+/*
+ * ARMv8 single-step debug support and mdscr context switching.
+ *
+ * Copyright (C) 2012 ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Will Deacon <[email protected]>
+ */
+
+#include <linux/cpu.h>
+#include <linux/debugfs.h>
+#include <linux/hardirq.h>
+#include <linux/init.h>
+#include <linux/ptrace.h>
+#include <linux/stat.h>
+
+#include <asm/debug-monitors.h>
+#include <asm/local.h>
+#include <asm/cputype.h>
+#include <asm/system_misc.h>
+
+/* Low-level stepping controls. */
+#define DBG_MDSCR_SS (1 << 0)
+#define DBG_SPSR_SS (1 << 21)
+
+/* MDSCR_EL1 enabling bits */
+#define DBG_MDSCR_KDE (1 << 13)
+#define DBG_MDSCR_MDE (1 << 15)
+#define DBG_MDSCR_MASK ~(DBG_MDSCR_KDE | DBG_MDSCR_MDE)
+
+/* Determine debug architecture. */
+u8 debug_monitors_arch(void)
+{
+ return read_cpuid(ID_AA64DFR0_EL1) & 0xf;
+}
+
+/*
+ * MDSCR access routines.
+ */
+static void mdscr_write(u32 mdscr)
+{
+ unsigned long flags;
+ local_dbg_save(flags);
+ asm volatile("msr mdscr_el1, %0" :: "r" (mdscr));
+ local_dbg_restore(flags);
+}
+
+static u32 mdscr_read(void)
+{
+ u32 mdscr;
+ asm volatile("mrs %0, mdscr_el1" : "=r" (mdscr));
+ return mdscr;
+}
+
+/*
+ * Allow root to disable self-hosted debug from userspace.
+ * This is useful if you want to connect an external JTAG debugger.
+ */
+static u32 debug_enabled = 1;
+
+static int create_debug_debugfs_entry(void)
+{
+ debugfs_create_bool("debug_enabled", 0644, NULL, &debug_enabled);
+ return 0;
+}
+fs_initcall(create_debug_debugfs_entry);
+
+static int __init early_debug_disable(char *buf)
+{
+ debug_enabled = 0;
+ return 0;
+}
+
+early_param("nodebugmon", early_debug_disable);
+
+/*
+ * Keep track of debug users on each core.
+ * The ref counts are per-cpu so we use a local_t type.
+ */
+static DEFINE_PER_CPU(local_t, mde_ref_count);
+static DEFINE_PER_CPU(local_t, kde_ref_count);
+
+void enable_debug_monitors(enum debug_el el)
+{
+ u32 mdscr, enable = 0;
+
+ WARN_ON(preemptible());
+
+ if (local_inc_return(&__get_cpu_var(mde_ref_count)) == 1)
+ enable = DBG_MDSCR_MDE;
+
+ if (el == DBG_ACTIVE_EL1 &&
+ local_inc_return(&__get_cpu_var(kde_ref_count)) == 1)
+ enable |= DBG_MDSCR_KDE;
+
+ if (enable && debug_enabled) {
+ mdscr = mdscr_read();
+ mdscr |= enable;
+ mdscr_write(mdscr);
+ }
+}
+
+void disable_debug_monitors(enum debug_el el)
+{
+ u32 mdscr, disable = 0;
+
+ WARN_ON(preemptible());
+
+ if (local_dec_and_test(&__get_cpu_var(mde_ref_count)))
+ disable = ~DBG_MDSCR_MDE;
+
+ if (el == DBG_ACTIVE_EL1 &&
+ local_dec_and_test(&__get_cpu_var(kde_ref_count)))
+ disable &= ~DBG_MDSCR_KDE;
+
+ if (disable) {
+ mdscr = mdscr_read();
+ mdscr &= disable;
+ mdscr_write(mdscr);
+ }
+}
+
+/*
+ * OS lock clearing.
+ */
+static void clear_os_lock(void *unused)
+{
+ asm volatile("msr mdscr_el1, %0" : : "r" (0));
+ isb();
+ asm volatile("msr oslar_el1, %0" : : "r" (0));
+ isb();
+}
+
+static int __cpuinit os_lock_notify(struct notifier_block *self,
+ unsigned long action, void *data)
+{
+ int cpu = (unsigned long)data;
+ if (action == CPU_ONLINE)
+ smp_call_function_single(cpu, clear_os_lock, NULL, 1);
+ return NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata os_lock_nb = {
+ .notifier_call = os_lock_notify,
+};
+
+static int __cpuinit debug_monitors_init(void)
+{
+ /* Clear the OS lock. */
+ smp_call_function(clear_os_lock, NULL, 1);
+ clear_os_lock(NULL);
+
+ /* Register hotplug handler. */
+ register_cpu_notifier(&os_lock_nb);
+ return 0;
+}
+postcore_initcall(debug_monitors_init);
+
+/*
+ * Single step API and exception handling.
+ */
+static void set_regs_spsr_ss(struct pt_regs *regs)
+{
+ unsigned long spsr;
+
+ spsr = regs->pstate;
+ spsr &= ~DBG_SPSR_SS;
+ spsr |= DBG_SPSR_SS;
+ regs->pstate = spsr;
+}
+
+static void clear_regs_spsr_ss(struct pt_regs *regs)
+{
+ unsigned long spsr;
+
+ spsr = regs->pstate;
+ spsr &= ~DBG_SPSR_SS;
+ regs->pstate = spsr;
+}
+
+static int single_step_handler(unsigned long addr, unsigned int esr,
+ struct pt_regs *regs)
+{
+ siginfo_t info;
+
+ /*
+ * If we are stepping a pending breakpoint, call the hw_breakpoint
+ * handler first.
+ */
+ if (!reinstall_suspended_bps(regs))
+ return 0;
+
+ if (user_mode(regs)) {
+ info.si_signo = SIGTRAP;
+ info.si_errno = 0;
+ info.si_code = TRAP_HWBKPT;
+ info.si_addr = (void __user *)instruction_pointer(regs);
+ force_sig_info(SIGTRAP, &info, current);
+
+ /*
+ * ptrace will disable single step unless explicitly
+ * asked to re-enable it. For other clients, it makes
+ * sense to leave it enabled (i.e. rewind the controls
+ * to the active-not-pending state).
+ */
+ user_rewind_single_step(current);
+ } else {
+ /* TODO: route to KGDB */
+ pr_warning("Unexpected kernel single-step exception at EL1\n");
+ /*
+ * Re-enable stepping since we know that we will be
+ * returning to regs.
+ */
+ set_regs_spsr_ss(regs);
+ }
+
+ return 0;
+}
+
+static int __init single_step_init(void)
+{
+ hook_debug_fault_code(DBG_ESR_EVT_HWSS, single_step_handler, SIGTRAP,
+ TRAP_HWBKPT, "single-step handler");
+ return 0;
+}
+arch_initcall(single_step_init);
+
+/* Re-enable single step for syscall restarting. */
+void user_rewind_single_step(struct task_struct *task)
+{
+ /*
+ * If single step is active for this thread, then set SPSR.SS
+ * to 1 to avoid returning to the active-pending state.
+ */
+ if (test_ti_thread_flag(task_thread_info(task), TIF_SINGLESTEP))
+ set_regs_spsr_ss(task_pt_regs(task));
+}
+
+void user_fastforward_single_step(struct task_struct *task)
+{
+ if (test_ti_thread_flag(task_thread_info(task), TIF_SINGLESTEP))
+ clear_regs_spsr_ss(task_pt_regs(task));
+}
+
+/* Kernel API */
+void kernel_enable_single_step(struct pt_regs *regs)
+{
+ WARN_ON(!irqs_disabled());
+ set_regs_spsr_ss(regs);
+ mdscr_write(mdscr_read() | DBG_MDSCR_SS);
+ enable_debug_monitors(DBG_ACTIVE_EL1);
+}
+
+void kernel_disable_single_step(void)
+{
+ WARN_ON(!irqs_disabled());
+ mdscr_write(mdscr_read() & ~DBG_MDSCR_SS);
+ disable_debug_monitors(DBG_ACTIVE_EL1);
+}
+
+int kernel_active_single_step(void)
+{
+ WARN_ON(!irqs_disabled());
+ return mdscr_read() & DBG_MDSCR_SS;
+}
+
+/* ptrace API */
+void user_enable_single_step(struct task_struct *task)
+{
+ set_ti_thread_flag(task_thread_info(task), TIF_SINGLESTEP);
+ set_regs_spsr_ss(task_pt_regs(task));
+}
+
+void user_disable_single_step(struct task_struct *task)
+{
+ clear_ti_thread_flag(task_thread_info(task), TIF_SINGLESTEP);
+}
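
The kernel-side single-step API above is consumed by the hardware breakpoint code in the next file. The expected pattern for an exception handler that wants to step over exactly one instruction before resuming is roughly the following sketch (the function name is hypothetical; the real caller is breakpoint_handler() in hw_breakpoint.c):

    /* Sketch: arm single-stepping before returning to 'regs'.  The resulting
     * step exception is acknowledged via reinstall_suspended_bps(), which
     * ends up calling kernel_disable_single_step(). */
    static void arm_kernel_step(struct pt_regs *regs)
    {
        WARN_ON(!irqs_disabled());

        if (!kernel_active_single_step())
            kernel_enable_single_step(regs);  /* sets MDSCR_EL1.SS and SPSR.SS */
    }
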
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
new file mode 100644
index 0000000..5ab825c
--- /dev/null
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -0,0 +1,880 @@
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ *
+ * Copyright (C) 2012 ARM Limited
+ * Author: Will Deacon <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define pr_fmt(fmt) "hw-breakpoint: " fmt
+
+#include <linux/errno.h>
+#include <linux/hw_breakpoint.h>
+#include <linux/perf_event.h>
+#include <linux/ptrace.h>
+#include <linux/smp.h>
+
+#include <asm/compat.h>
+#include <asm/current.h>
+#include <asm/debug-monitors.h>
+#include <asm/hw_breakpoint.h>
+#include <asm/kdebug.h>
+#include <asm/traps.h>
+#include <asm/cputype.h>
+#include <asm/system_misc.h>
+
+/* Breakpoint currently in use for each BRP. */
+static DEFINE_PER_CPU(struct perf_event *, bp_on_reg[ARM_MAX_BRP]);
+
+/* Watchpoint currently in use for each WRP. */
+static DEFINE_PER_CPU(struct perf_event *, wp_on_reg[ARM_MAX_WRP]);
+
+/* Currently stepping a per-CPU kernel breakpoint. */
+static DEFINE_PER_CPU(int, stepping_kernel_bp);
+
+/* Number of BRP/WRP registers on this CPU. */
+static int core_num_brps;
+static int core_num_wrps;
+
+/* Determine number of BRP registers available. */
+static int get_num_brps(void)
+{
+ return ((read_cpuid(ID_AA64DFR0_EL1) >> 12) & 0xf) + 1;
+}
+
+/* Determine number of WRP registers available. */
+static int get_num_wrps(void)
+{
+ return ((read_cpuid(ID_AA64DFR0_EL1) >> 20) & 0xf) + 1;
+}
+
+int hw_breakpoint_slots(int type)
+{
+ /*
+ * We can be called early, so don't rely on
+ * our static variables being initialised.
+ */
+ switch (type) {
+ case TYPE_INST:
+ return get_num_brps();
+ case TYPE_DATA:
+ return get_num_wrps();
+ default:
+ pr_warning("unknown slot type: %d\n", type);
+ return 0;
+ }
+}
+
+#define READ_WB_REG_CASE(OFF, N, REG, VAL) \
+ case (OFF + N): \
+ AARCH64_DBG_READ(N, REG, VAL); \
+ break
+
+#define WRITE_WB_REG_CASE(OFF, N, REG, VAL) \
+ case (OFF + N): \
+ AARCH64_DBG_WRITE(N, REG, VAL); \
+ break
+
+#define GEN_READ_WB_REG_CASES(OFF, REG, VAL) \
+ READ_WB_REG_CASE(OFF, 0, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 1, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 2, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 3, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 4, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 5, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 6, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 7, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 8, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 9, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 10, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 11, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 12, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 13, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 14, REG, VAL); \
+ READ_WB_REG_CASE(OFF, 15, REG, VAL)
+
+#define GEN_WRITE_WB_REG_CASES(OFF, REG, VAL) \
+ WRITE_WB_REG_CASE(OFF, 0, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 1, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 2, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 3, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 4, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 5, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 6, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 7, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 8, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 9, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 10, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 11, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 12, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 13, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 14, REG, VAL); \
+ WRITE_WB_REG_CASE(OFF, 15, REG, VAL)
+
+static u64 read_wb_reg(int reg, int n)
+{
+ u64 val = 0;
+
+ switch (reg + n) {
+ GEN_READ_WB_REG_CASES(AARCH64_DBG_REG_BVR, AARCH64_DBG_REG_NAME_BVR, val);
+ GEN_READ_WB_REG_CASES(AARCH64_DBG_REG_BCR, AARCH64_DBG_REG_NAME_BCR, val);
+ GEN_READ_WB_REG_CASES(AARCH64_DBG_REG_WVR, AARCH64_DBG_REG_NAME_WVR, val);
+ GEN_READ_WB_REG_CASES(AARCH64_DBG_REG_WCR, AARCH64_DBG_REG_NAME_WCR, val);
+ default:
+ pr_warning("attempt to read from unknown breakpoint register %d\n", n);
+ }
+
+ return val;
+}
+
+static void write_wb_reg(int reg, int n, u64 val)
+{
+ switch (reg + n) {
+ GEN_WRITE_WB_REG_CASES(AARCH64_DBG_REG_BVR, AARCH64_DBG_REG_NAME_BVR, val);
+ GEN_WRITE_WB_REG_CASES(AARCH64_DBG_REG_BCR, AARCH64_DBG_REG_NAME_BCR, val);
+ GEN_WRITE_WB_REG_CASES(AARCH64_DBG_REG_WVR, AARCH64_DBG_REG_NAME_WVR, val);
+ GEN_WRITE_WB_REG_CASES(AARCH64_DBG_REG_WCR, AARCH64_DBG_REG_NAME_WCR, val);
+ default:
+ pr_warning("attempt to write to unknown breakpoint register %d\n", n);
+ }
+ isb();
+}
+
+/*
+ * Convert a breakpoint privilege level to the corresponding exception
+ * level.
+ */
+static enum debug_el debug_exception_level(int privilege)
+{
+ switch (privilege) {
+ case AARCH64_BREAKPOINT_EL0:
+ return DBG_ACTIVE_EL0;
+ case AARCH64_BREAKPOINT_EL1:
+ return DBG_ACTIVE_EL1;
+ default:
+ pr_warning("invalid breakpoint privilege level %d\n", privilege);
+ return -EINVAL;
+ }
+}
+
+/*
+ * Install a perf counter breakpoint.
+ */
+int arch_install_hw_breakpoint(struct perf_event *bp)
+{
+ struct arch_hw_breakpoint *info = counter_arch_bp(bp);
+ struct perf_event **slot, **slots;
+ struct debug_info *debug_info = &current->thread.debug;
+ int i, max_slots, ctrl_reg, val_reg, reg_enable;
+ u32 ctrl;
+
+ if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE) {
+ /* Breakpoint */
+ ctrl_reg = AARCH64_DBG_REG_BCR;
+ val_reg = AARCH64_DBG_REG_BVR;
+ slots = __get_cpu_var(bp_on_reg);
+ max_slots = core_num_brps;
+ reg_enable = !debug_info->bps_disabled;
+ } else {
+ /* Watchpoint */
+ ctrl_reg = AARCH64_DBG_REG_WCR;
+ val_reg = AARCH64_DBG_REG_WVR;
+ slots = __get_cpu_var(wp_on_reg);
+ max_slots = core_num_wrps;
+ reg_enable = !debug_info->wps_disabled;
+ }
+
+ for (i = 0; i < max_slots; ++i) {
+ slot = &slots[i];
+
+ if (!*slot) {
+ *slot = bp;
+ break;
+ }
+ }
+
+ if (WARN_ONCE(i == max_slots, "Can't find any breakpoint slot"))
+ return -ENOSPC;
+
+ /* Ensure debug monitors are enabled at the correct exception level. */
+ enable_debug_monitors(debug_exception_level(info->ctrl.privilege));
+
+ /* Setup the address register. */
+ write_wb_reg(val_reg, i, info->address);
+
+ /* Setup the control register. */
+ ctrl = encode_ctrl_reg(info->ctrl);
+ write_wb_reg(ctrl_reg, i, reg_enable ? ctrl | 0x1 : ctrl & ~0x1);
+
+ return 0;
+}
+
+void arch_uninstall_hw_breakpoint(struct perf_event *bp)
+{
+ struct arch_hw_breakpoint *info = counter_arch_bp(bp);
+ struct perf_event **slot, **slots;
+ int i, max_slots, base;
+
+ if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE) {
+ /* Breakpoint */
+ base = AARCH64_DBG_REG_BCR;
+ slots = __get_cpu_var(bp_on_reg);
+ max_slots = core_num_brps;
+ } else {
+ /* Watchpoint */
+ base = AARCH64_DBG_REG_WCR;
+ slots = __get_cpu_var(wp_on_reg);
+ max_slots = core_num_wrps;
+ }
+
+ /* Remove the breakpoint. */
+ for (i = 0; i < max_slots; ++i) {
+ slot = &slots[i];
+
+ if (*slot == bp) {
+ *slot = NULL;
+ break;
+ }
+ }
+
+ if (WARN_ONCE(i == max_slots, "Can't find any breakpoint slot"))
+ return;
+
+ /* Reset the control register. */
+ write_wb_reg(base, i, 0);
+
+ /* Release the debug monitors for the correct exception level. */
+ disable_debug_monitors(debug_exception_level(info->ctrl.privilege));
+}
+
+static int get_hbp_len(u8 hbp_len)
+{
+ unsigned int len_in_bytes = 0;
+
+ switch (hbp_len) {
+ case ARM_BREAKPOINT_LEN_1:
+ len_in_bytes = 1;
+ break;
+ case ARM_BREAKPOINT_LEN_2:
+ len_in_bytes = 2;
+ break;
+ case ARM_BREAKPOINT_LEN_4:
+ len_in_bytes = 4;
+ break;
+ case ARM_BREAKPOINT_LEN_8:
+ len_in_bytes = 8;
+ break;
+ }
+
+ return len_in_bytes;
+}
+
+/*
+ * Check whether bp virtual address is in kernel space.
+ */
+int arch_check_bp_in_kernelspace(struct perf_event *bp)
+{
+ unsigned int len;
+ unsigned long va;
+ struct arch_hw_breakpoint *info = counter_arch_bp(bp);
+
+ va = info->address;
+ len = get_hbp_len(info->ctrl.len);
+
+ return (va >= TASK_SIZE) && ((va + len - 1) >= TASK_SIZE);
+}
+
+/*
+ * Extract generic type and length encodings from an arch_hw_breakpoint_ctrl.
+ * Hopefully this will disappear when ptrace can bypass the conversion
+ * to generic breakpoint descriptions.
+ */
+int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
+ int *gen_len, int *gen_type)
+{
+ /* Type */
+ switch (ctrl.type) {
+ case ARM_BREAKPOINT_EXECUTE:
+ *gen_type = HW_BREAKPOINT_X;
+ break;
+ case ARM_BREAKPOINT_LOAD:
+ *gen_type = HW_BREAKPOINT_R;
+ break;
+ case ARM_BREAKPOINT_STORE:
+ *gen_type = HW_BREAKPOINT_W;
+ break;
+ case ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE:
+ *gen_type = HW_BREAKPOINT_RW;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ /* Len */
+ switch (ctrl.len) {
+ case ARM_BREAKPOINT_LEN_1:
+ *gen_len = HW_BREAKPOINT_LEN_1;
+ break;
+ case ARM_BREAKPOINT_LEN_2:
+ *gen_len = HW_BREAKPOINT_LEN_2;
+ break;
+ case ARM_BREAKPOINT_LEN_4:
+ *gen_len = HW_BREAKPOINT_LEN_4;
+ break;
+ case ARM_BREAKPOINT_LEN_8:
+ *gen_len = HW_BREAKPOINT_LEN_8;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/*
+ * Construct an arch_hw_breakpoint from a perf_event.
+ */
+static int arch_build_bp_info(struct perf_event *bp)
+{
+ struct arch_hw_breakpoint *info = counter_arch_bp(bp);
+
+ /* Type */
+ switch (bp->attr.bp_type) {
+ case HW_BREAKPOINT_X:
+ info->ctrl.type = ARM_BREAKPOINT_EXECUTE;
+ break;
+ case HW_BREAKPOINT_R:
+ info->ctrl.type = ARM_BREAKPOINT_LOAD;
+ break;
+ case HW_BREAKPOINT_W:
+ info->ctrl.type = ARM_BREAKPOINT_STORE;
+ break;
+ case HW_BREAKPOINT_RW:
+ info->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ /* Len */
+ switch (bp->attr.bp_len) {
+ case HW_BREAKPOINT_LEN_1:
+ info->ctrl.len = ARM_BREAKPOINT_LEN_1;
+ break;
+ case HW_BREAKPOINT_LEN_2:
+ info->ctrl.len = ARM_BREAKPOINT_LEN_2;
+ break;
+ case HW_BREAKPOINT_LEN_4:
+ info->ctrl.len = ARM_BREAKPOINT_LEN_4;
+ break;
+ case HW_BREAKPOINT_LEN_8:
+ info->ctrl.len = ARM_BREAKPOINT_LEN_8;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ /*
+ * On AArch64, we only permit breakpoints of length 4, whereas
+ * AArch32 also requires breakpoints of length 2 for Thumb.
+ * Watchpoints can be of length 1, 2, 4 or 8 bytes.
+ */
+ if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE) {
+ if (is_compat_task()) {
+ if (info->ctrl.len != ARM_BREAKPOINT_LEN_2 &&
+ info->ctrl.len != ARM_BREAKPOINT_LEN_4)
+ return -EINVAL;
+ } else if (info->ctrl.len != ARM_BREAKPOINT_LEN_4) {
+ /*
+ * FIXME: Some tools (I'm looking at you perf) assume
+ * that breakpoints should be sizeof(long). This
+ * is nonsense. For now, we fix up the parameter
+ * but we should probably return -EINVAL instead.
+ */
+ info->ctrl.len = ARM_BREAKPOINT_LEN_4;
+ }
+ }
+
+ /* Address */
+ info->address = bp->attr.bp_addr;
+
+ /*
+ * Privilege
+ * Note that we disallow combined EL0/EL1 breakpoints because
+ * that would complicate the stepping code.
+ */
+ if (arch_check_bp_in_kernelspace(bp))
+ info->ctrl.privilege = AARCH64_BREAKPOINT_EL1;
+ else
+ info->ctrl.privilege = AARCH64_BREAKPOINT_EL0;
+
+ /* Enabled? */
+ info->ctrl.enabled = !bp->attr.disabled;
+
+ return 0;
+}
+
+/*
+ * Validate the arch-specific HW Breakpoint register settings.
+ */
+int arch_validate_hwbkpt_settings(struct perf_event *bp)
+{
+ struct arch_hw_breakpoint *info = counter_arch_bp(bp);
+ int ret;
+ u64 alignment_mask, offset;
+
+ /* Build the arch_hw_breakpoint. */
+ ret = arch_build_bp_info(bp);
+ if (ret)
+ return ret;
+
+ /*
+ * Check address alignment.
+ * We don't do any clever alignment correction for watchpoints
+ * because using 64-bit unaligned addresses is deprecated for
+ * AArch64.
+ *
+ * AArch32 tasks expect some simple alignment fixups, so emulate
+ * that here.
+ */
+ if (is_compat_task()) {
+ if (info->ctrl.len == ARM_BREAKPOINT_LEN_8)
+ alignment_mask = 0x7;
+ else
+ alignment_mask = 0x3;
+ offset = info->address & alignment_mask;
+ switch (offset) {
+ case 0:
+ /* Aligned */
+ break;
+ case 1:
+ /* Allow single byte watchpoint. */
+ if (info->ctrl.len == ARM_BREAKPOINT_LEN_1)
+ break;
+ case 2:
+ /* Allow halfword watchpoints and breakpoints. */
+ if (info->ctrl.len == ARM_BREAKPOINT_LEN_2)
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ info->address &= ~alignment_mask;
+ info->ctrl.len <<= offset;
+ } else {
+ if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE)
+ alignment_mask = 0x3;
+ else
+ alignment_mask = 0x7;
+ if (info->address & alignment_mask)
+ return -EINVAL;
+ }
+
+ /*
+ * Disallow per-task kernel breakpoints since these would
+ * complicate the stepping code.
+ */
+ if (info->ctrl.privilege == AARCH64_BREAKPOINT_EL1 && bp->hw.bp_target)
+ return -EINVAL;
+
+ return 0;
+}
+
+/*
+ * Enable/disable all of the breakpoints active at the specified
+ * exception level at the register level.
+ * This is used when single-stepping after a breakpoint exception.
+ */
+static void toggle_bp_registers(int reg, enum debug_el el, int enable)
+{
+ int i, max_slots, privilege;
+ u32 ctrl;
+ struct perf_event **slots;
+
+ switch (reg) {
+ case AARCH64_DBG_REG_BCR:
+ slots = __get_cpu_var(bp_on_reg);
+ max_slots = core_num_brps;
+ break;
+ case AARCH64_DBG_REG_WCR:
+ slots = __get_cpu_var(wp_on_reg);
+ max_slots = core_num_wrps;
+ break;
+ default:
+ return;
+ }
+
+ for (i = 0; i < max_slots; ++i) {
+ if (!slots[i])
+ continue;
+
+ privilege = counter_arch_bp(slots[i])->ctrl.privilege;
+ if (debug_exception_level(privilege) != el)
+ continue;
+
+ ctrl = read_wb_reg(reg, i);
+ if (enable)
+ ctrl |= 0x1;
+ else
+ ctrl &= ~0x1;
+ write_wb_reg(reg, i, ctrl);
+ }
+}
+
+/*
+ * Debug exception handlers.
+ */
+static int breakpoint_handler(unsigned long unused, unsigned int esr,
+ struct pt_regs *regs)
+{
+ int i, step = 0, *kernel_step;
+ u32 ctrl_reg;
+ u64 addr, val;
+ struct perf_event *bp, **slots;
+ struct debug_info *debug_info;
+ struct arch_hw_breakpoint_ctrl ctrl;
+
+ slots = (struct perf_event **)__get_cpu_var(bp_on_reg);
+ addr = instruction_pointer(regs);
+ debug_info = &current->thread.debug;
+
+ for (i = 0; i < core_num_brps; ++i) {
+ rcu_read_lock();
+
+ bp = slots[i];
+
+ if (bp == NULL)
+ goto unlock;
+
+ /* Check if the breakpoint value matches. */
+ val = read_wb_reg(AARCH64_DBG_REG_BVR, i);
+ if (val != (addr & ~0x3))
+ goto unlock;
+
+ /* Possible match, check the byte address select to confirm. */
+ ctrl_reg = read_wb_reg(AARCH64_DBG_REG_BCR, i);
+ decode_ctrl_reg(ctrl_reg, &ctrl);
+ if (!((1 << (addr & 0x3)) & ctrl.len))
+ goto unlock;
+
+ counter_arch_bp(bp)->trigger = addr;
+ perf_bp_event(bp, regs);
+
+ /* Do we need to handle the stepping? */
+ if (!bp->overflow_handler)
+ step = 1;
+unlock:
+ rcu_read_unlock();
+ }
+
+ if (!step)
+ return 0;
+
+ if (user_mode(regs)) {
+ debug_info->bps_disabled = 1;
+ toggle_bp_registers(AARCH64_DBG_REG_BCR, DBG_ACTIVE_EL0, 0);
+
+ /* If we're already stepping a watchpoint, just return. */
+ if (debug_info->wps_disabled)
+ return 0;
+
+ if (test_thread_flag(TIF_SINGLESTEP))
+ debug_info->suspended_step = 1;
+ else
+ user_enable_single_step(current);
+ } else {
+ toggle_bp_registers(AARCH64_DBG_REG_BCR, DBG_ACTIVE_EL1, 0);
+ kernel_step = &__get_cpu_var(stepping_kernel_bp);
+
+ if (*kernel_step != ARM_KERNEL_STEP_NONE)
+ return 0;
+
+ if (kernel_active_single_step()) {
+ *kernel_step = ARM_KERNEL_STEP_SUSPEND;
+ } else {
+ *kernel_step = ARM_KERNEL_STEP_ACTIVE;
+ kernel_enable_single_step(regs);
+ }
+ }
+
+ return 0;
+}
+
+static int watchpoint_handler(unsigned long addr, unsigned int esr,
+ struct pt_regs *regs)
+{
+ int i, step = 0, *kernel_step, access;
+ u32 ctrl_reg;
+ u64 val, alignment_mask;
+ struct perf_event *wp, **slots;
+ struct debug_info *debug_info;
+ struct arch_hw_breakpoint *info;
+ struct arch_hw_breakpoint_ctrl ctrl;
+
+ slots = (struct perf_event **)__get_cpu_var(wp_on_reg);
+ debug_info = &current->thread.debug;
+
+ for (i = 0; i < core_num_wrps; ++i) {
+ rcu_read_lock();
+
+ wp = slots[i];
+
+ if (wp == NULL)
+ goto unlock;
+
+ info = counter_arch_bp(wp);
+ /* AArch32 watchpoints are either 4 or 8 bytes aligned. */
+ if (is_compat_task()) {
+ if (info->ctrl.len == ARM_BREAKPOINT_LEN_8)
+ alignment_mask = 0x7;
+ else
+ alignment_mask = 0x3;
+ } else {
+ alignment_mask = 0x7;
+ }
+
+ /* Check if the watchpoint value matches. */
+ val = read_wb_reg(AARCH64_DBG_REG_WVR, i);
+ if (val != (addr & ~alignment_mask))
+ goto unlock;
+
+ /* Possible match, check the byte address select to confirm. */
+ ctrl_reg = read_wb_reg(AARCH64_DBG_REG_WCR, i);
+ decode_ctrl_reg(ctrl_reg, &ctrl);
+ if (!((1 << (addr & alignment_mask)) & ctrl.len))
+ goto unlock;
+
+ /*
+ * Check that the access type matches.
+ * 0 => load, otherwise => store
+ */
+ access = (esr & AARCH64_ESR_ACCESS_MASK) ? HW_BREAKPOINT_W :
+ HW_BREAKPOINT_R;
+ if (!(access & hw_breakpoint_type(wp)))
+ goto unlock;
+
+ info->trigger = addr;
+ perf_bp_event(wp, regs);
+
+ /* Do we need to handle the stepping? */
+ if (!wp->overflow_handler)
+ step = 1;
+
+unlock:
+ rcu_read_unlock();
+ }
+
+ if (!step)
+ return 0;
+
+ /*
+ * We always disable EL0 watchpoints because the kernel can
+ * cause these to fire via an unprivileged access.
+ */
+ toggle_bp_registers(AARCH64_DBG_REG_WCR, DBG_ACTIVE_EL0, 0);
+
+ if (user_mode(regs)) {
+ debug_info->wps_disabled = 1;
+
+ /* If we're already stepping a breakpoint, just return. */
+ if (debug_info->bps_disabled)
+ return 0;
+
+ if (test_thread_flag(TIF_SINGLESTEP))
+ debug_info->suspended_step = 1;
+ else
+ user_enable_single_step(current);
+ } else {
+ toggle_bp_registers(AARCH64_DBG_REG_WCR, DBG_ACTIVE_EL1, 0);
+ kernel_step = &__get_cpu_var(stepping_kernel_bp);
+
+ if (*kernel_step != ARM_KERNEL_STEP_NONE)
+ return 0;
+
+ if (kernel_active_single_step()) {
+ *kernel_step = ARM_KERNEL_STEP_SUSPEND;
+ } else {
+ *kernel_step = ARM_KERNEL_STEP_ACTIVE;
+ kernel_enable_single_step(regs);
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * Handle single-step exception.
+ */
+int reinstall_suspended_bps(struct pt_regs *regs)
+{
+ struct debug_info *debug_info = &current->thread.debug;
+ int handled_exception = 0, *kernel_step;
+
+ kernel_step = &__get_cpu_var(stepping_kernel_bp);
+
+ /*
+ * Called from single-step exception handler.
+ * Return 0 if execution can resume, 1 if a SIGTRAP should be
+ * reported.
+ */
+ if (user_mode(regs)) {
+ if (debug_info->bps_disabled) {
+ debug_info->bps_disabled = 0;
+ toggle_bp_registers(AARCH64_DBG_REG_BCR, DBG_ACTIVE_EL0, 1);
+ handled_exception = 1;
+ }
+
+ if (debug_info->wps_disabled) {
+ debug_info->wps_disabled = 0;
+ toggle_bp_registers(AARCH64_DBG_REG_WCR, DBG_ACTIVE_EL0, 1);
+ handled_exception = 1;
+ }
+
+ if (handled_exception) {
+ if (debug_info->suspended_step) {
+ debug_info->suspended_step = 0;
+ /* Allow exception handling to fall-through. */
+ handled_exception = 0;
+ } else {
+ user_disable_single_step(current);
+ }
+ }
+ } else if (*kernel_step != ARM_KERNEL_STEP_NONE) {
+ toggle_bp_registers(AARCH64_DBG_REG_BCR, DBG_ACTIVE_EL1, 1);
+ toggle_bp_registers(AARCH64_DBG_REG_WCR, DBG_ACTIVE_EL1, 1);
+
+ if (!debug_info->wps_disabled)
+ toggle_bp_registers(AARCH64_DBG_REG_WCR, DBG_ACTIVE_EL0, 1);
+
+ if (*kernel_step != ARM_KERNEL_STEP_SUSPEND) {
+ kernel_disable_single_step();
+ handled_exception = 1;
+ } else {
+ handled_exception = 0;
+ }
+
+ *kernel_step = ARM_KERNEL_STEP_NONE;
+ }
+
+ return !handled_exception;
+}
+
+/*
+ * Context-switcher for restoring suspended breakpoints.
+ */
+void hw_breakpoint_thread_switch(struct task_struct *next)
+{
+ /*
+ * current next
+ * disabled: 0 0 => The usual case, NOTIFY_DONE
+ * 0 1 => Disable the registers
+ * 1 0 => Enable the registers
+ * 1 1 => NOTIFY_DONE. per-task bps will
+ * get taken care of by perf.
+ */
+
+ struct debug_info *current_debug_info, *next_debug_info;
+
+ current_debug_info = &current->thread.debug;
+ next_debug_info = &next->thread.debug;
+
+ /* Update breakpoints. */
+ if (current_debug_info->bps_disabled != next_debug_info->bps_disabled)
+ toggle_bp_registers(AARCH64_DBG_REG_BCR,
+ DBG_ACTIVE_EL0,
+ !next_debug_info->bps_disabled);
+
+ /* Update watchpoints. */
+ if (current_debug_info->wps_disabled != next_debug_info->wps_disabled)
+ toggle_bp_registers(AARCH64_DBG_REG_WCR,
+ DBG_ACTIVE_EL0,
+ !next_debug_info->wps_disabled);
+}
+
+/*
+ * CPU initialisation.
+ */
+static void reset_ctrl_regs(void *unused)
+{
+ int i;
+
+ for (i = 0; i < core_num_brps; ++i) {
+ write_wb_reg(AARCH64_DBG_REG_BCR, i, 0UL);
+ write_wb_reg(AARCH64_DBG_REG_BVR, i, 0UL);
+ }
+
+ for (i = 0; i < core_num_wrps; ++i) {
+ write_wb_reg(AARCH64_DBG_REG_WCR, i, 0UL);
+ write_wb_reg(AARCH64_DBG_REG_WVR, i, 0UL);
+ }
+}
+
+static int __cpuinit hw_breakpoint_reset_notify(struct notifier_block *self,
+ unsigned long action,
+ void *hcpu)
+{
+ int cpu = (long)hcpu;
+ if (action == CPU_ONLINE)
+ smp_call_function_single(cpu, reset_ctrl_regs, NULL, 1);
+ return NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata hw_breakpoint_reset_nb = {
+ .notifier_call = hw_breakpoint_reset_notify,
+};
+
+/*
+ * One-time initialisation.
+ */
+static int __init arch_hw_breakpoint_init(void)
+{
+ core_num_brps = get_num_brps();
+ core_num_wrps = get_num_wrps();
+
+ pr_info("found %d breakpoint and %d watchpoint registers.\n",
+ core_num_brps, core_num_wrps);
+
+ /*
+ * Reset the breakpoint resources. We assume that a halting
+ * debugger will leave the world in a nice state for us.
+ */
+ smp_call_function(reset_ctrl_regs, NULL, 1);
+ reset_ctrl_regs(NULL);
+
+ /* Register debug fault handlers. */
+ hook_debug_fault_code(DBG_ESR_EVT_HWBP, breakpoint_handler, SIGTRAP,
+ TRAP_HWBKPT, "hw-breakpoint handler");
+ hook_debug_fault_code(DBG_ESR_EVT_HWWP, watchpoint_handler, SIGTRAP,
+ TRAP_HWBKPT, "hw-watchpoint handler");
+
+ /* Register hotplug notifier. */
+ register_cpu_notifier(&hw_breakpoint_reset_nb);
+
+ return 0;
+}
+arch_initcall(arch_hw_breakpoint_init);
+
+void hw_breakpoint_pmu_read(struct perf_event *bp)
+{
+}
+
+/*
+ * Dummy function to register with die_notifier.
+ */
+int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
+ unsigned long val, void *data)
+{
+ return NOTIFY_DONE;
+}
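
Since the hooks above plug into the generic perf hw_breakpoint layer, an in-kernel user would place a watchpoint much like samples/hw_breakpoint/data_breakpoint.c does. The module below is a hedged sketch: the watched symbol and module boilerplate are illustrative, and the generic API is assumed to behave as elsewhere in the 3.x tree.

    #include <linux/hw_breakpoint.h>
    #include <linux/kallsyms.h>
    #include <linux/module.h>
    #include <linux/perf_event.h>
    #include <linux/ptrace.h>

    static struct perf_event * __percpu *wp;

    static void wp_handler(struct perf_event *bp, struct perf_sample_data *data,
                           struct pt_regs *regs)
    {
        pr_info("write to watched location, pc = %pS\n",
                (void *)instruction_pointer(regs));
    }

    static int __init wp_example_init(void)
    {
        struct perf_event_attr attr;

        hw_breakpoint_init(&attr);
        attr.bp_addr = kallsyms_lookup_name("jiffies");  /* illustrative target */
        attr.bp_len = HW_BREAKPOINT_LEN_8;
        attr.bp_type = HW_BREAKPOINT_W;

        wp = register_wide_hw_breakpoint(&attr, wp_handler, NULL);
        return IS_ERR((void __force *)wp) ? PTR_ERR((void __force *)wp) : 0;
    }

    static void __exit wp_example_exit(void)
    {
        unregister_wide_hw_breakpoint(wp);
    }

    module_init(wp_example_init);
    module_exit(wp_example_exit);
    MODULE_LICENSE("GPL");
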
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
new file mode 100644
index 0000000..816b1b5
--- /dev/null
+++ b/arch/arm64/kernel/ptrace.c
@@ -0,0 +1,834 @@
+/*
+ * Based on arch/arm/kernel/ptrace.c
+ *
+ * By Ross Biro 1/23/92
+ * edited by Linus Torvalds
+ * ARM modifications Copyright (C) 2000 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/smp.h>
+#include <linux/ptrace.h>
+#include <linux/user.h>
+#include <linux/security.h>
+#include <linux/init.h>
+#include <linux/signal.h>
+#include <linux/uaccess.h>
+#include <linux/perf_event.h>
+#include <linux/hw_breakpoint.h>
+#include <linux/regset.h>
+#include <linux/tracehook.h>
+#include <linux/elf.h>
+
+#include <asm/compat.h>
+#include <asm/debug-monitors.h>
+#include <asm/pgtable.h>
+#include <asm/traps.h>
+#include <asm/system_misc.h>
+
+/*
+ * TODO: does not yet catch signals sent when the child dies,
+ * in exit.c or in signal.c.
+ */
+
+/*
+ * Called by kernel/ptrace.c when detaching..
+ */
+void ptrace_disable(struct task_struct *child)
+{
+}
+
+/*
+ * Handle hitting a breakpoint.
+ */
+static int ptrace_break(struct pt_regs *regs)
+{
+ siginfo_t info;
+
+ info.si_signo = SIGTRAP;
+ info.si_errno = 0;
+ info.si_code = TRAP_BRKPT;
+ info.si_addr = (void __user *)instruction_pointer(regs);
+
+ force_sig_info(SIGTRAP, &info, current);
+ return 0;
+}
+
+static int arm64_break_trap(unsigned long addr, unsigned int esr,
+ struct pt_regs *regs)
+{
+ return ptrace_break(regs);
+}
+
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+/*
+ * Convert a virtual register number into an index for a thread_info
+ * breakpoint array. Breakpoints are identified using positive numbers
+ * whilst watchpoints are negative. The registers are laid out as pairs
+ * of (address, control), each pair mapping to a unique hw_breakpoint struct.
+ * Register 0 is reserved for describing resource information.
+ */
+static int ptrace_hbp_num_to_idx(long num)
+{
+ if (num < 0)
+ num = (ARM_MAX_BRP << 1) - num;
+ return (num - 1) >> 1;
+}
+
+/*
+ * Returns the virtual register number for the address of the
+ * breakpoint at index idx.
+ */
+static long ptrace_hbp_idx_to_num(int idx)
+{
+ long mid = ARM_MAX_BRP << 1;
+ long num = (idx << 1) + 1;
+ return num > mid ? mid - num : num;
+}
+
+/*
+ * Handle hitting a HW-breakpoint.
+ */
+static void ptrace_hbptriggered(struct perf_event *bp,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ struct arch_hw_breakpoint *bkpt = counter_arch_bp(bp);
+ long num;
+ int i;
+ siginfo_t info;
+
+ for (i = 0; i < ARM_MAX_HBP_SLOTS; ++i)
+ if (current->thread.debug.hbp[i] == bp)
+ break;
+
+ num = (i == ARM_MAX_HBP_SLOTS) ? 0 : ptrace_hbp_idx_to_num(i);
+
+ info.si_signo = SIGTRAP;
+ info.si_errno = (int)num;
+ info.si_code = TRAP_HWBKPT;
+ info.si_addr = (void __user *)(bkpt->trigger);
+
+ force_sig_info(SIGTRAP, &info, current);
+}
+
+/*
+ * Unregister breakpoints from this task and reset the pointers in
+ * the thread_struct.
+ */
+void flush_ptrace_hw_breakpoint(struct task_struct *tsk)
+{
+ int i;
+ struct thread_struct *t = &tsk->thread;
+
+ for (i = 0; i < ARM_MAX_HBP_SLOTS; i++) {
+ if (t->debug.hbp[i]) {
+ unregister_hw_breakpoint(t->debug.hbp[i]);
+ t->debug.hbp[i] = NULL;
+ }
+ }
+}
+
+void ptrace_hw_copy_thread(struct task_struct *task)
+{
+ memset(&task->thread.debug, 0, sizeof(struct debug_info));
+}
+
+static u32 ptrace_get_hbp_resource_info(void)
+{
+ u8 num_brps, num_wrps, debug_arch, wp_len;
+ u32 reg = 0;
+
+ num_brps = hw_breakpoint_slots(TYPE_INST);
+ num_wrps = hw_breakpoint_slots(TYPE_DATA);
+
+ debug_arch = debug_monitors_arch();
+ wp_len = 8; /* Reserved on AArch64 */
+ reg |= debug_arch;
+ reg <<= 8;
+ reg |= wp_len;
+ reg <<= 8;
+ reg |= num_wrps;
+ reg <<= 8;
+ reg |= num_brps;
+
+ return reg;
+}
+
+static struct perf_event *ptrace_hbp_create(struct task_struct *tsk, int type)
+{
+ struct perf_event_attr attr;
+
+ ptrace_breakpoint_init(&attr);
+
+ /*
+ * Initialise fields to sane defaults
+ * (i.e. values that will pass validation).
+ */
+ attr.bp_addr = 0;
+ attr.bp_len = HW_BREAKPOINT_LEN_4;
+ attr.bp_type = type;
+ attr.disabled = 1;
+
+ return register_user_hw_breakpoint(&attr, ptrace_hbptriggered, NULL,
+ tsk);
+}
+
+static int ptrace_gethbpregs(struct task_struct *tsk, long num,
+ unsigned long __user *data)
+{
+ u64 addr_reg;
+ u32 ctrl_reg;
+ int idx, ret = 0;
+ struct perf_event *bp;
+ struct arch_hw_breakpoint_ctrl arch_ctrl;
+
+ if (num == 0) {
+ ctrl_reg = ptrace_get_hbp_resource_info();
+ if (put_user(ctrl_reg, (u32 __user *)data))
+ ret = -EFAULT;
+ } else {
+ idx = ptrace_hbp_num_to_idx(num);
+ if (idx < 0 || idx >= ARM_MAX_HBP_SLOTS)
+ return -EINVAL;
+
+ bp = tsk->thread.debug.hbp[idx];
+ /* The slot may be empty, so only dereference bp when it is set. */
+ if (bp) {
+ arch_ctrl = counter_arch_bp(bp)->ctrl;
+
+ if (is_compat_task()) {
+ /*
+ * Fix up the len because we may have adjusted
+ * it to compensate for an unaligned address.
+ */
+ while (!(arch_ctrl.len & 0x1))
+ arch_ctrl.len >>= 1;
+ }
+ }
+
+ if (num & 0x1) {
+ addr_reg = bp ? bp->attr.bp_addr : 0;
+ if (put_user(addr_reg, data))
+ ret = -EFAULT;
+ } else {
+ ctrl_reg = bp ? encode_ctrl_reg(arch_ctrl) : 0;
+ if (put_user(ctrl_reg, (u32 __user *)data))
+ ret = -EFAULT;
+ }
+ }
+
+ return ret;
+}
+
+static int ptrace_sethbpregs(struct task_struct *tsk, long num,
+ unsigned long __user *data)
+{
+ int idx, gen_len, gen_type, implied_type, ret;
+ u64 user_addr;
+ u32 user_ctrl;
+ struct perf_event *bp;
+ struct arch_hw_breakpoint_ctrl ctrl;
+ struct perf_event_attr attr;
+
+ if (num == 0)
+ return 0;
+ else if (num < 0)
+ implied_type = HW_BREAKPOINT_RW;
+ else
+ implied_type = HW_BREAKPOINT_X;
+
+ idx = ptrace_hbp_num_to_idx(num);
+ if (idx < 0 || idx >= ARM_MAX_HBP_SLOTS)
+ return -EFAULT;
+
+ bp = tsk->thread.debug.hbp[idx];
+ if (!bp) {
+ bp = ptrace_hbp_create(tsk, implied_type);
+ if (IS_ERR(bp))
+ return PTR_ERR(bp);
+ tsk->thread.debug.hbp[idx] = bp;
+ }
+
+ attr = bp->attr;
+
+ if (num & 0x1) {
+ /* Address */
+ if (get_user(user_addr, data))
+ return -EFAULT;
+ attr.bp_addr = user_addr;
+ } else {
+ /* Control */
+ if (get_user(user_ctrl, (u32 __user *)data))
+ return -EFAULT;
+ decode_ctrl_reg(user_ctrl, &ctrl);
+ ret = arch_bp_generic_fields(ctrl, &gen_len, &gen_type);
+ if (ret)
+ return ret;
+
+ if ((gen_type & implied_type) != gen_type)
+ return -EINVAL;
+
+ attr.bp_len = gen_len;
+ attr.bp_type = gen_type;
+ attr.disabled = !ctrl.enabled;
+ }
+
+ return modify_user_hw_breakpoint(bp, &attr);
+}
+#endif /* CONFIG_HAVE_HW_BREAKPOINT */
+
+static int gpr_get(struct task_struct *target,
+ const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf)
+{
+ struct user_pt_regs *uregs = &task_pt_regs(target)->user_regs;
+ return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs, 0, -1);
+}
+
+static int gpr_set(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ const void *kbuf, const void __user *ubuf)
+{
+ int ret;
+ struct user_pt_regs newregs;
+
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newregs, 0, -1);
+ if (ret)
+ return ret;
+
+ if (!valid_user_regs(&newregs))
+ return -EINVAL;
+
+ task_pt_regs(target)->user_regs = newregs;
+ return 0;
+}
+
+/*
+ * TODO: update fp accessors for lazy context switching (sync/flush hwstate)
+ */
+static int fpr_get(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf)
+{
+ struct user_fpsimd_state *uregs;
+ uregs = &target->thread.fpsimd_state.user_fpsimd;
+ return user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs, 0, -1);
+}
+
+static int fpr_set(struct task_struct *target, const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ const void *kbuf, const void __user *ubuf)
+{
+ int ret;
+ struct user_fpsimd_state newstate;
+
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &newstate, 0, -1);
+ if (ret)
+ return ret;
+
+ target->thread.fpsimd_state.user_fpsimd = newstate;
+ return ret;
+}
+
+enum aarch64_regset {
+ REGSET_GPR,
+ REGSET_FPR,
+};
+
+static const struct user_regset aarch64_regsets[] = {
+ [REGSET_GPR] = {
+ .core_note_type = NT_PRSTATUS,
+ .n = sizeof(struct user_pt_regs) / sizeof(u64),
+ .size = sizeof(u64),
+ .align = sizeof(u64),
+ .get = gpr_get,
+ .set = gpr_set
+ },
+ [REGSET_FPR] = {
+ .core_note_type = NT_PRFPREG,
+ .n = sizeof(struct user_fpsimd_state) / sizeof(u32),
+ /*
+ * We pretend we have 32-bit registers because the fpsr and
+ * fpcr are 32 bits wide.
+ */
+ .size = sizeof(u32),
+ .align = sizeof(u32),
+ .get = fpr_get,
+ .set = fpr_set
+ },
+};
+
+static const struct user_regset_view user_aarch64_view = {
+ .name = "aarch64", .e_machine = EM_AARCH64,
+ .regsets = aarch64_regsets, .n = ARRAY_SIZE(aarch64_regsets)
+};
+
+#ifdef CONFIG_AARCH32_EMULATION
+enum compat_regset {
+ REGSET_COMPAT_GPR,
+ REGSET_COMPAT_VFP,
+};
+
+static int compat_gpr_get(struct task_struct *target,
+ const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf)
+{
+ int ret = 0;
+ unsigned int i, start, num_regs;
+
+ /* Calculate the number of AArch32 registers contained in count */
+ num_regs = count / regset->size;
+
+ /* Convert pos into a register number */
+ start = pos / regset->size;
+
+ if (start + num_regs > regset->n)
+ return -EIO;
+
+ for (i = 0; i < num_regs; ++i) {
+ unsigned int idx = start + i;
+ void *reg;
+
+ switch (idx) {
+ case 15:
+ reg = (void *)&task_pt_regs(target)->pc;
+ break;
+ case 16:
+ reg = (void *)&task_pt_regs(target)->pstate;
+ break;
+ case 17:
+ reg = (void *)&task_pt_regs(target)->orig_x0;
+ break;
+ default:
+ reg = (void *)&task_pt_regs(target)->regs[idx];
+ }
+
+ ret = copy_to_user(ubuf, reg, sizeof(compat_ulong_t));
+
+ if (ret)
+ break;
+ else
+ ubuf += sizeof(compat_ulong_t);
+ }
+
+ return ret;
+}
+
+static int compat_gpr_set(struct task_struct *target,
+ const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ const void *kbuf, const void __user *ubuf)
+{
+ struct pt_regs newregs;
+ int ret = 0;
+ unsigned int i, start, num_regs;
+
+ /* Calculate the number of AArch32 registers contained in count */
+ num_regs = count / regset->size;
+
+ /* Convert pos into a register number */
+ start = pos / regset->size;
+
+ if (start + num_regs > regset->n)
+ return -EIO;
+
+ newregs = *task_pt_regs(target);
+
+ for (i = 0; i < num_regs; ++i) {
+ unsigned int idx = start + i;
+ void *reg;
+
+ switch (idx) {
+ case 15:
+ reg = (void *)&newregs.pc;
+ break;
+ case 16:
+ reg = (void *)&newregs.pstate;
+ break;
+ case 17:
+ reg = (void *)&newregs.orig_x0;
+ break;
+ default:
+ reg = (void *)&newregs.regs[idx];
+ }
+
+ ret = copy_from_user(reg, ubuf, sizeof(compat_ulong_t));
+
+ if (ret)
+ goto out;
+ else
+ ubuf += sizeof(compat_ulong_t);
+ }
+
+ if (valid_user_regs(&newregs.user_regs))
+ *task_pt_regs(target) = newregs;
+ else
+ ret = -EINVAL;
+
+out:
+ return ret;
+}
+
+static int compat_vfp_get(struct task_struct *target,
+ const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ void *kbuf, void __user *ubuf)
+{
+ struct user_fpsimd_state *uregs;
+ compat_ulong_t fpscr;
+ int ret;
+
+ uregs = &target->thread.fpsimd_state.user_fpsimd;
+
+ /*
+ * The VFP registers are packed into the fpsimd_state, so they all sit
+ * nicely together for us. We just need to create the fpscr separately.
+ */
+ ret = user_regset_copyout(&pos, &count, &kbuf, &ubuf, uregs, 0,
+ VFP_STATE_SIZE - sizeof(compat_ulong_t));
+
+ if (count && !ret) {
+ fpscr = (uregs->fpsr & VFP_FPSCR_STAT_MASK) |
+ (uregs->fpcr & VFP_FPSCR_CTRL_MASK);
+ ret = put_user(fpscr, (compat_ulong_t *)ubuf);
+ }
+
+ return ret;
+}
+
+static int compat_vfp_set(struct task_struct *target,
+ const struct user_regset *regset,
+ unsigned int pos, unsigned int count,
+ const void *kbuf, const void __user *ubuf)
+{
+ struct user_fpsimd_state *uregs;
+ compat_ulong_t fpscr;
+ int ret;
+
+ if (pos + count > VFP_STATE_SIZE)
+ return -EIO;
+
+ uregs = &target->thread.fpsimd_state.user_fpsimd;
+
+ ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, uregs, 0,
+ VFP_STATE_SIZE - sizeof(compat_ulong_t));
+
+ if (count && !ret) {
+ ret = get_user(fpscr, (compat_ulong_t *)ubuf);
+ uregs->fpsr = fpscr & VFP_FPSCR_STAT_MASK;
+ uregs->fpcr = fpscr & VFP_FPSCR_CTRL_MASK;
+ }
+
+ return ret;
+}
+
+static const struct user_regset aarch32_regsets[] = {
+ [REGSET_COMPAT_GPR] = {
+ .core_note_type = NT_PRSTATUS,
+ .n = COMPAT_ELF_NGREG,
+ .size = sizeof(compat_elf_greg_t),
+ .align = sizeof(compat_elf_greg_t),
+ .get = compat_gpr_get,
+ .set = compat_gpr_set
+ },
+ [REGSET_COMPAT_VFP] = {
+ .core_note_type = NT_ARM_VFP,
+ .n = VFP_STATE_SIZE / sizeof(compat_ulong_t),
+ .size = sizeof(compat_ulong_t),
+ .align = sizeof(compat_ulong_t),
+ .get = compat_vfp_get,
+ .set = compat_vfp_set
+ },
+};
+
+static const struct user_regset_view user_aarch32_view = {
+ .name = "aarch32", .e_machine = EM_ARM,
+ .regsets = aarch32_regsets, .n = ARRAY_SIZE(aarch32_regsets)
+};
+#endif /* CONFIG_AARCH32_EMULATION */
+
+const struct user_regset_view *task_user_regset_view(struct task_struct *task)
+{
+#ifdef CONFIG_AARCH32_EMULATION
+ if (test_tsk_thread_flag(task, TIF_32BIT))
+ return &user_aarch32_view;
+#endif
+ return &user_aarch64_view;
+}
+
+long arch_ptrace(struct task_struct *child, long request,
+ unsigned long addr, unsigned long data)
+{
+ int ret;
+ unsigned long __user *datap = (unsigned long __user *)data;
+
+ switch (request) {
+ case PTRACE_GET_THREAD_AREA:
+ ret = put_user(child->thread.tp_value, datap);
+ break;
+
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+ case PTRACE_GETHBPREGS:
+ ret = ptrace_gethbpregs(child, addr, datap);
+ break;
+
+ case PTRACE_SETHBPREGS:
+ ret = ptrace_sethbpregs(child, addr, datap);
+ break;
+#endif
+
+ default:
+ ret = ptrace_request(child, request, addr, data);
+ break;
+ }
+
+ return ret;
+}
+
+#ifdef CONFIG_AARCH32_EMULATION
+
+#include <linux/compat.h>
+
+int aarch32_break_trap(struct pt_regs *regs)
+{
+ unsigned int instr;
+ bool bp = false;
+ void __user *pc = (void __user *)instruction_pointer(regs);
+
+ if (compat_thumb_mode(regs)) {
+ /* get 16-bit Thumb instruction */
+ get_user(instr, (u16 __user *)pc);
+ if (instr == AARCH32_BREAK_THUMB2_LO) {
+ /* get second half of 32-bit Thumb-2 instruction */
+ get_user(instr, (u16 __user *)(pc + 2));
+ bp = instr == AARCH32_BREAK_THUMB2_HI;
+ } else {
+ bp = instr == AARCH32_BREAK_THUMB;
+ }
+ } else {
+ /* 32-bit ARM instruction */
+ get_user(instr, (u32 __user *)pc);
+ bp = (instr & ~0xf0000000) == AARCH32_BREAK_ARM;
+ }
+
+ if (bp)
+ return ptrace_break(regs);
+ return 1;
+}
+
+static int compat_ptrace_read_user(struct task_struct *tsk, compat_ulong_t off,
+ compat_ulong_t __user *ret)
+{
+ compat_ulong_t tmp;
+
+ if (off & 3)
+ return -EIO;
+
+ if (off == PT_TEXT_ADDR)
+ tmp = tsk->mm->start_code;
+ else if (off == PT_DATA_ADDR)
+ tmp = tsk->mm->start_data;
+ else if (off == PT_TEXT_END_ADDR)
+ tmp = tsk->mm->end_code;
+ else if (off < sizeof(compat_elf_gregset_t))
+ return copy_regset_to_user(tsk, &user_aarch32_view,
+ REGSET_COMPAT_GPR, off,
+ sizeof(compat_ulong_t), ret);
+ else if (off >= COMPAT_USER_SZ)
+ return -EIO;
+ else
+ tmp = 0;
+
+ return put_user(tmp, ret);
+}
+
+static int compat_ptrace_write_user(struct task_struct *tsk, compat_ulong_t off,
+ compat_ulong_t val)
+{
+ int ret;
+
+ if (off & 3 || off >= COMPAT_USER_SZ)
+ return -EIO;
+
+ if (off >= sizeof(compat_elf_gregset_t))
+ return 0;
+
+ ret = copy_regset_from_user(tsk, &user_aarch32_view,
+ REGSET_COMPAT_GPR, off,
+ sizeof(compat_ulong_t),
+ &val);
+ return ret;
+}
+
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+static int compat_ptrace_gethbpregs(struct task_struct *tsk, compat_long_t num,
+ compat_ulong_t __user *data)
+{
+ int ret;
+ unsigned long kdata;
+
+ mm_segment_t old_fs = get_fs();
+ set_fs(KERNEL_DS);
+ ret = ptrace_gethbpregs(tsk, (long)num, &kdata);
+ set_fs(old_fs);
+
+ if (!ret)
+ ret = put_user(kdata, data);
+
+ return ret;
+}
+
+static int compat_ptrace_sethbpregs(struct task_struct *tsk, compat_long_t num,
+ compat_ulong_t __user *data)
+{
+ int ret;
+ unsigned long kdata = 0;
+ mm_segment_t old_fs = get_fs();
+
+ ret = get_user(kdata, data);
+
+ if (!ret) {
+ set_fs(KERNEL_DS);
+ ret = ptrace_sethbpregs(tsk, (long)num, &kdata);
+ set_fs(old_fs);
+ }
+
+ return ret;
+}
+#endif /* CONFIG_HAVE_HW_BREAKPOINT */
+
+long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
+ compat_ulong_t caddr, compat_ulong_t cdata)
+{
+ unsigned long addr = caddr;
+ unsigned long data = cdata;
+ void __user *datap = compat_ptr(data);
+ int ret;
+
+ switch (request) {
+ case PTRACE_PEEKUSR:
+ ret = compat_ptrace_read_user(child, addr, datap);
+ break;
+
+ case PTRACE_POKEUSR:
+ ret = compat_ptrace_write_user(child, addr, data);
+ break;
+
+ case PTRACE_GETREGS:
+ ret = copy_regset_to_user(child,
+ &user_aarch32_view,
+ REGSET_COMPAT_GPR,
+ 0, sizeof(compat_elf_gregset_t),
+ datap);
+ break;
+
+ case PTRACE_SETREGS:
+ ret = copy_regset_from_user(child,
+ &user_aarch32_view,
+ REGSET_COMPAT_GPR,
+ 0, sizeof(compat_elf_gregset_t),
+ datap);
+ break;
+
+ case PTRACE_GET_THREAD_AREA:
+ ret = put_user((compat_ulong_t)child->thread.tp_value,
+ (compat_ulong_t __user *)datap);
+ break;
+
+ case PTRACE_SET_SYSCALL:
+ task_pt_regs(child)->syscallno = data;
+ ret = 0;
+ break;
+
+ case COMPAT_PTRACE_GETVFPREGS:
+ ret = copy_regset_to_user(child,
+ &user_aarch32_view,
+ REGSET_COMPAT_VFP,
+ 0, VFP_STATE_SIZE,
+ datap);
+ break;
+
+ case COMPAT_PTRACE_SETVFPREGS:
+ ret = copy_regset_from_user(child,
+ &user_aarch32_view,
+ REGSET_COMPAT_VFP,
+ 0, VFP_STATE_SIZE,
+ datap);
+ break;
+
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+ case PTRACE_GETHBPREGS:
+ ret = compat_ptrace_gethbpregs(child, addr, datap);
+ break;
+
+ case PTRACE_SETHBPREGS:
+ ret = compat_ptrace_sethbpregs(child, addr, datap);
+ break;
+#endif
+
+ default:
+ ret = compat_ptrace_request(child, request, addr,
+ data);
+ break;
+ }
+
+ return ret;
+}
+#endif /* CONFIG_AARCH32_EMULATION */
+
+static int __init ptrace_break_init(void)
+{
+ hook_debug_fault_code(DBG_ESR_EVT_BRK, arm64_break_trap, SIGTRAP,
+ TRAP_BRKPT, "ptrace BRK handler");
+ return 0;
+}
+core_initcall(ptrace_break_init);
+
+asmlinkage int syscall_trace(int dir, struct pt_regs *regs)
+{
+ unsigned long saved_reg;
+
+ if (!test_thread_flag(TIF_SYSCALL_TRACE))
+ return regs->syscallno;
+
+ if (test_thread_flag(TIF_32BIT)) {
+ /* AArch32 uses ip (r12) for scratch */
+ saved_reg = regs->regs[12];
+ regs->regs[12] = dir;
+ } else {
+ /*
+ * Save X7. X7 is used to denote syscall entry/exit:
+ * X7 = 0 -> entry, = 1 -> exit
+ */
+ saved_reg = regs->regs[7];
+ regs->regs[7] = dir;
+ }
+
+ if (dir)
+ tracehook_report_syscall_exit(regs, 0);
+ else if (tracehook_report_syscall_entry(regs))
+ regs->syscallno = ~0UL;
+
+ if (test_thread_flag(TIF_32BIT))
+ regs->regs[12] = saved_reg;
+ else
+ regs->regs[7] = saved_reg;
+
+ return regs->syscallno;
+}
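
As a rough user-space illustration (not part of the patch above), a tracer
could fetch the AArch64 general-purpose registers exposed by gpr_get()
through the generic PTRACE_GETREGSET request. The function name and error
handling are hypothetical; the sketch assumes the tracee is ptrace-stopped
and that AArch64 headers providing struct user_pt_regs are available:

#include <elf.h>		/* NT_PRSTATUS */
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>		/* struct iovec */
#include <asm/ptrace.h>		/* struct user_pt_regs */

static int dump_gprs(pid_t pid)
{
	struct user_pt_regs regs;
	struct iovec iov = { .iov_base = &regs, .iov_len = sizeof(regs) };

	/* Routed by the kernel to gpr_get() for a 64-bit tracee. */
	if (ptrace(PTRACE_GETREGSET, pid, NT_PRSTATUS, &iov) == -1)
		return -1;

	printf("pc=%016llx sp=%016llx\n",
	       (unsigned long long)regs.pc, (unsigned long long)regs.sp);
	return 0;
}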

2012-08-14 17:54:18

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 14/31] arm64: DMA mapping API

This patch adds support for the DMA mapping API. It uses dma_map_ops for
flexibility and currently supports swiotlb. The patch could be simplified
further if DMA accesses were coherent (coherency is not mandated by the
architecture) or if corresponding hooks were placed in the generic
swiotlb code to handle cache maintenance.
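
As a minimal driver-side sketch (the device and function names are
hypothetical, not part of this patch), a streaming mapping set up through
this API ends up in the arm64_swiotlb_map_page()/arm64_swiotlb_unmap_page()
hooks below, which wrap the generic swiotlb calls with the required cache
maintenance:

#include <linux/dma-mapping.h>
#include <linux/errno.h>

static int foo_send(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* ... program the device with 'dma' and start the transfer ... */

	dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
	return 0;
}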

Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/dma-mapping.h | 124 ++++++++++++++++++++
arch/arm64/mm/dma-mapping.c | 208 ++++++++++++++++++++++++++++++++++
2 files changed, 332 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/dma-mapping.h
create mode 100644 arch/arm64/mm/dma-mapping.c

diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
new file mode 100644
index 0000000..538f4b4
--- /dev/null
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -0,0 +1,124 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_DMA_MAPPING_H
+#define __ASM_DMA_MAPPING_H
+
+#ifdef __KERNEL__
+
+#include <linux/types.h>
+#include <linux/vmalloc.h>
+
+#include <asm-generic/dma-coherent.h>
+
+#define ARCH_HAS_DMA_GET_REQUIRED_MASK
+
+extern struct dma_map_ops *dma_ops;
+
+static inline struct dma_map_ops *get_dma_ops(struct device *dev)
+{
+ if (unlikely(!dev) || !dev->archdata.dma_ops)
+ return dma_ops;
+ else
+ return dev->archdata.dma_ops;
+}
+
+#include <asm-generic/dma-mapping-common.h>
+
+static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
+{
+ return (dma_addr_t)paddr;
+}
+
+static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
+{
+ return (phys_addr_t)dev_addr;
+}
+
+static inline int dma_mapping_error(struct device *dev, dma_addr_t dev_addr)
+{
+ struct dma_map_ops *ops = get_dma_ops(dev);
+ return ops->mapping_error(dev, dev_addr);
+}
+
+static inline int dma_supported(struct device *dev, u64 mask)
+{
+ struct dma_map_ops *ops = get_dma_ops(dev);
+ return ops->dma_supported(dev, mask);
+}
+
+static inline int dma_set_mask(struct device *dev, u64 mask)
+{
+ if (!dev->dma_mask || !dma_supported(dev, mask))
+ return -EIO;
+ *dev->dma_mask = mask;
+
+ return 0;
+}
+
+static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
+{
+ if (!dev->dma_mask)
+ return 0;
+
+ return addr + size - 1 <= *dev->dma_mask;
+}
+
+static inline void dma_mark_clean(void *addr, size_t size)
+{
+}
+
+static inline void *dma_alloc_coherent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flags)
+{
+ struct dma_map_ops *ops = get_dma_ops(dev);
+ void *vaddr;
+
+ if (dma_alloc_from_coherent(dev, size, dma_handle, &vaddr))
+ return vaddr;
+
+ vaddr = ops->alloc(dev, size, dma_handle, flags, NULL);
+ debug_dma_alloc_coherent(dev, size, *dma_handle, vaddr);
+ return vaddr;
+}
+
+static inline void dma_free_coherent(struct device *dev, size_t size,
+ void *vaddr, dma_addr_t dev_addr)
+{
+ struct dma_map_ops *ops = get_dma_ops(dev);
+
+ if (dma_release_from_coherent(dev, get_order(size), vaddr))
+ return;
+
+ debug_dma_free_coherent(dev, size, vaddr, dev_addr);
+ ops->free(dev, size, vaddr, dev_addr, NULL);
+}
+
+/*
+ * There is no dma_cache_sync() implementation, so just return NULL here.
+ */
+static inline void *dma_alloc_noncoherent(struct device *dev, size_t size,
+ dma_addr_t *handle, gfp_t flags)
+{
+ return NULL;
+}
+
+static inline void dma_free_noncoherent(struct device *dev, size_t size,
+ void *cpu_addr, dma_addr_t handle)
+{
+}
+
+#endif /* __KERNEL__ */
+#endif /* __ASM_DMA_MAPPING_H */
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
new file mode 100644
index 0000000..4e5871d
--- /dev/null
+++ b/arch/arm64/mm/dma-mapping.c
@@ -0,0 +1,208 @@
+/*
+ * SWIOTLB-based DMA API implementation
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Catalin Marinas <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/gfp.h>
+#include <linux/export.h>
+#include <linux/slab.h>
+#include <linux/dma-mapping.h>
+#include <linux/vmalloc.h>
+#include <linux/swiotlb.h>
+
+#include <asm/cacheflush.h>
+
+struct dma_map_ops *dma_ops;
+EXPORT_SYMBOL(dma_ops);
+
+static void *arm64_swiotlb_alloc_coherent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flags,
+ struct dma_attrs *attrs)
+{
+ struct page *page, **map;
+ void *ptr;
+ int order = get_order(size);
+ int i;
+
+ if (dev->coherent_dma_mask != DMA_BIT_MASK(64))
+ flags |= GFP_DMA;
+
+ ptr = swiotlb_alloc_coherent(dev, size, dma_handle, flags);
+ if (!ptr)
+ goto no_mem;
+ map = kmalloc(sizeof(struct page *) << order, flags & ~GFP_DMA);
+ if (!map)
+ goto no_map;
+
+ /* remove any dirty cache lines on the kernel alias */
+ dmac_flush_range(ptr, ptr + size);
+
+ /* create a coherent mapping */
+ page = virt_to_page(ptr);
+ for (i = 0; i < (size >> PAGE_SHIFT); i++)
+ map[i] = page + i;
+ ptr = vmap(map, size >> PAGE_SHIFT, VM_MAP,
+ pgprot_dmacoherent(pgprot_default));
+ kfree(map);
+ if (!ptr)
+ goto no_map;
+
+ return ptr;
+
+no_map:
+ swiotlb_free_coherent(dev, size, ptr, *dma_handle);
+no_mem:
+ *dma_handle = ~0;
+ return NULL;
+}
+
+static void arm64_swiotlb_free_coherent(struct device *dev, size_t size,
+ void *vaddr, dma_addr_t dma_handle,
+ struct dma_attrs *attrs)
+{
+ vunmap(vaddr);
+ swiotlb_free_coherent(dev, size, vaddr, dma_handle);
+}
+
+static dma_addr_t arm64_swiotlb_map_page(struct device *dev,
+ struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+{
+ dma_addr_t dev_addr;
+
+ dev_addr = swiotlb_map_page(dev, page, offset, size, dir, attrs);
+ dmac_map_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
+
+ return dev_addr;
+}
+
+static void arm64_swiotlb_unmap_page(struct device *dev, dma_addr_t dev_addr,
+ size_t size, enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+{
+ dmac_unmap_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
+ swiotlb_unmap_page(dev, dev_addr, size, dir, attrs);
+}
+
+static int arm64_swiotlb_map_sg_attrs(struct device *dev,
+ struct scatterlist *sgl, int nelems,
+ enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+{
+ struct scatterlist *sg;
+ int i, ret;
+
+ ret = swiotlb_map_sg_attrs(dev, sgl, nelems, dir, attrs);
+ for_each_sg(sgl, sg, ret, i)
+ dmac_map_area(phys_to_virt(dma_to_phys(dev, sg->dma_address)),
+ sg->length, dir);
+
+ return ret;
+}
+
+static void arm64_swiotlb_unmap_sg_attrs(struct device *dev,
+ struct scatterlist *sgl, int nelems,
+ enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+{
+ struct scatterlist *sg;
+ int i;
+
+ for_each_sg(sgl, sg, nelems, i)
+ dmac_unmap_area(phys_to_virt(dma_to_phys(dev, sg->dma_address)),
+ sg->length, dir);
+ swiotlb_unmap_sg_attrs(dev, sgl, nelems, dir, attrs);
+}
+
+static void arm64_swiotlb_sync_single_for_cpu(struct device *dev,
+ dma_addr_t dev_addr,
+ size_t size,
+ enum dma_data_direction dir)
+{
+ dmac_unmap_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
+ swiotlb_sync_single_for_cpu(dev, dev_addr, size, dir);
+}
+
+static void arm64_swiotlb_sync_single_for_device(struct device *dev,
+ dma_addr_t dev_addr,
+ size_t size,
+ enum dma_data_direction dir)
+{
+ swiotlb_sync_single_for_device(dev, dev_addr, size, dir);
+ dmac_map_area(phys_to_virt(dma_to_phys(dev, dev_addr)), size, dir);
+}
+
+static void arm64_swiotlb_sync_sg_for_cpu(struct device *dev,
+ struct scatterlist *sgl, int nelems,
+ enum dma_data_direction dir)
+{
+ struct scatterlist *sg;
+ int i;
+
+ for_each_sg(sgl, sg, nelems, i)
+ dmac_unmap_area(phys_to_virt(dma_to_phys(dev, sg->dma_address)),
+ sg->length, dir);
+ swiotlb_sync_sg_for_cpu(dev, sgl, nelems, dir);
+}
+
+static void arm64_swiotlb_sync_sg_for_device(struct device *dev,
+ struct scatterlist *sgl,
+ int nelems,
+ enum dma_data_direction dir)
+{
+ struct scatterlist *sg;
+ int i;
+
+ swiotlb_sync_sg_for_device(dev, sgl, nelems, dir);
+ for_each_sg(sgl, sg, nelems, i)
+ dmac_map_area(phys_to_virt(dma_to_phys(dev, sg->dma_address)),
+ sg->length, dir);
+}
+
+static struct dma_map_ops arm64_swiotlb_dma_ops = {
+ .alloc = arm64_swiotlb_alloc_coherent,
+ .free = arm64_swiotlb_free_coherent,
+ .map_page = arm64_swiotlb_map_page,
+ .unmap_page = arm64_swiotlb_unmap_page,
+ .map_sg = arm64_swiotlb_map_sg_attrs,
+ .unmap_sg = arm64_swiotlb_unmap_sg_attrs,
+ .sync_single_for_cpu = arm64_swiotlb_sync_single_for_cpu,
+ .sync_single_for_device = arm64_swiotlb_sync_single_for_device,
+ .sync_sg_for_cpu = arm64_swiotlb_sync_sg_for_cpu,
+ .sync_sg_for_device = arm64_swiotlb_sync_sg_for_device,
+ .dma_supported = swiotlb_dma_supported,
+ .mapping_error = swiotlb_dma_mapping_error,
+};
+
+void __init swiotlb_init_with_default_size(size_t default_size, int verbose);
+
+void __init arm64_swiotlb_init(size_t max_size)
+{
+ dma_ops = &arm64_swiotlb_dma_ops;
+ swiotlb_init_with_default_size(min((size_t)SZ_64M, max_size), 1);
+}
+
+#define PREALLOC_DMA_DEBUG_ENTRIES 4096
+
+static int __init dma_debug_do_init(void)
+{
+ dma_debug_init(PREALLOC_DMA_DEBUG_ENTRIES);
+ return 0;
+}
+fs_initcall(dma_debug_do_init);

2012-08-14 17:54:16

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 13/31] arm64: Device specific operations

This patch adds several definitions for device communication, including
the I/O accessors and ioremap(). The __raw_* accessors are implemented as
inline asm to prevent the compiler from generating post-indexed accesses,
which are less efficient to emulate in a virtualised environment.
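
As a brief illustration (the device, register offsets and function names
are hypothetical, not part of this patch), a driver would map its
registers with ioremap() and then use the ordered and relaxed accessors
defined below:

#include <linux/errno.h>
#include <linux/io.h>
#include <linux/types.h>

#define FOO_CTRL	0x00
#define FOO_STATUS	0x04

static int foo_enable(phys_addr_t base)
{
	void __iomem *regs = ioremap(base, 0x1000);
	int ret = 0;

	if (!regs)
		return -ENOMEM;

	/* writel() is ordered after any prior Normal memory accesses */
	writel(1, regs + FOO_CTRL);

	/* readl_relaxed() only follows the Device memory ordering rules */
	if (!(readl_relaxed(regs + FOO_STATUS) & 1))
		ret = -EIO;

	iounmap(regs);
	return ret;
}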

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/device.h | 26 ++++
arch/arm64/include/asm/fb.h | 34 +++++
arch/arm64/include/asm/io.h | 263 +++++++++++++++++++++++++++++++++++++++
arch/arm64/kernel/io.c | 64 ++++++++++
arch/arm64/mm/ioremap.c | 84 +++++++++++++
5 files changed, 471 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/device.h
create mode 100644 arch/arm64/include/asm/fb.h
create mode 100644 arch/arm64/include/asm/io.h
create mode 100644 arch/arm64/kernel/io.c
create mode 100644 arch/arm64/mm/ioremap.c

diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
new file mode 100644
index 0000000..0d8453c
--- /dev/null
+++ b/arch/arm64/include/asm/device.h
@@ -0,0 +1,26 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_DEVICE_H
+#define __ASM_DEVICE_H
+
+struct dev_archdata {
+ struct dma_map_ops *dma_ops;
+};
+
+struct pdev_archdata {
+};
+
+#endif
diff --git a/arch/arm64/include/asm/fb.h b/arch/arm64/include/asm/fb.h
new file mode 100644
index 0000000..adb88a6
--- /dev/null
+++ b/arch/arm64/include/asm/fb.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_FB_H_
+#define __ASM_FB_H_
+
+#include <linux/fb.h>
+#include <linux/fs.h>
+#include <asm/page.h>
+
+static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
+ unsigned long off)
+{
+ vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+}
+
+static inline int fb_is_primary_device(struct fb_info *info)
+{
+ return 0;
+}
+
+#endif /* __ASM_FB_H_ */
diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
new file mode 100644
index 0000000..48fa83f
--- /dev/null
+++ b/arch/arm64/include/asm/io.h
@@ -0,0 +1,263 @@
+/*
+ * Based on arch/arm/include/asm/io.h
+ *
+ * Copyright (C) 1996-2000 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_IO_H
+#define __ASM_IO_H
+
+#ifdef __KERNEL__
+
+#include <linux/types.h>
+
+#include <asm/byteorder.h>
+#include <asm/barrier.h>
+#include <asm/pgtable.h>
+
+/*
+ * Generic IO read/write. These perform native-endian accesses.
+ */
+static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
+{
+ asm volatile("strb %w0, [%1]" : : "r" (val), "r" (addr));
+}
+
+static inline void __raw_writew(u16 val, volatile void __iomem *addr)
+{
+ asm volatile("strh %w0, [%1]" : : "r" (val), "r" (addr));
+}
+
+static inline void __raw_writel(u32 val, volatile void __iomem *addr)
+{
+ asm volatile("str %w0, [%1]" : : "r" (val), "r" (addr));
+}
+
+static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
+{
+ asm volatile("str %0, [%1]" : : "r" (val), "r" (addr));
+}
+
+static inline u8 __raw_readb(const volatile void __iomem *addr)
+{
+ u8 val;
+ asm volatile("ldrb %w0, [%1]" : "=r" (val) : "r" (addr));
+ return val;
+}
+
+static inline u16 __raw_readw(const volatile void __iomem *addr)
+{
+ u16 val;
+ asm volatile("ldrh %w0, [%1]" : "=r" (val) : "r" (addr));
+ return val;
+}
+
+static inline u32 __raw_readl(const volatile void __iomem *addr)
+{
+ u32 val;
+ asm volatile("ldr %w0, [%1]" : "=r" (val) : "r" (addr));
+ return val;
+}
+
+static inline u64 __raw_readq(const volatile void __iomem *addr)
+{
+ u64 val;
+ asm volatile("ldr %0, [%1]" : "=r" (val) : "r" (addr));
+ return val;
+}
+
+/* IO barriers */
+#define __iormb() rmb()
+#define __iowmb() wmb()
+
+#define mmiowb() do { } while (0)
+
+/*
+ * Relaxed I/O memory access primitives. These follow the Device memory
+ * ordering rules but do not guarantee any ordering relative to Normal memory
+ * accesses.
+ */
+#define readb_relaxed(c) ({ u8 __v = __raw_readb(c); __v; })
+#define readw_relaxed(c) ({ u16 __v = le16_to_cpu((__force __le16)__raw_readw(c)); __v; })
+#define readl_relaxed(c) ({ u32 __v = le32_to_cpu((__force __le32)__raw_readl(c)); __v; })
+
+#define writeb_relaxed(v,c) ((void)__raw_writeb((v),(c)))
+#define writew_relaxed(v,c) ((void)__raw_writew((__force u16)cpu_to_le16(v),(c)))
+#define writel_relaxed(v,c) ((void)__raw_writel((__force u32)cpu_to_le32(v),(c)))
+
+/*
+ * I/O memory access primitives. Reads are ordered relative to any
+ * following Normal memory access. Writes are ordered relative to any prior
+ * Normal memory access.
+ */
+#define readb(c) ({ u8 __v = readb_relaxed(c); __iormb(); __v; })
+#define readw(c) ({ u16 __v = readw_relaxed(c); __iormb(); __v; })
+#define readl(c) ({ u32 __v = readl_relaxed(c); __iormb(); __v; })
+
+#define writeb(v,c) ({ __iowmb(); writeb_relaxed((v),(c)); })
+#define writew(v,c) ({ __iowmb(); writew_relaxed((v),(c)); })
+#define writel(v,c) ({ __iowmb(); writel_relaxed((v),(c)); })
+
+/*
+ * I/O port access primitives.
+ */
+#define IO_SPACE_LIMIT 0xffff
+
+/*
+ * We currently don't have any platform with PCI support, so just leave this
+ * defined to 0 until needed.
+ */
+#define PCI_IOBASE ((void __iomem *)0)
+
+static inline u8 inb(unsigned long addr)
+{
+ return readb(addr + PCI_IOBASE);
+}
+
+static inline u16 inw(unsigned long addr)
+{
+ return readw(addr + PCI_IOBASE);
+}
+
+static inline u32 inl(unsigned long addr)
+{
+ return readl(addr + PCI_IOBASE);
+}
+
+static inline void outb(u8 b, unsigned long addr)
+{
+ writeb(b, addr + PCI_IOBASE);
+}
+
+static inline void outw(u16 b, unsigned long addr)
+{
+ writew(b, addr + PCI_IOBASE);
+}
+
+static inline void outl(u32 b, unsigned long addr)
+{
+ writel(b, addr + PCI_IOBASE);
+}
+
+#define inb_p(addr) inb(addr)
+#define inw_p(addr) inw(addr)
+#define inl_p(addr) inl(addr)
+
+#define outb_p(x, addr) outb((x), (addr))
+#define outw_p(x, addr) outw((x), (addr))
+#define outl_p(x, addr) outl((x), (addr))
+
+static inline void insb(unsigned long addr, void *buffer, int count)
+{
+ u8 *buf = buffer;
+ while (count--)
+ *buf++ = __raw_readb(addr + PCI_IOBASE);
+}
+
+static inline void insw(unsigned long addr, void *buffer, int count)
+{
+ u16 *buf = buffer;
+ while (count--)
+ *buf++ = __raw_readw(addr + PCI_IOBASE);
+}
+
+static inline void insl(unsigned long addr, void *buffer, int count)
+{
+ u32 *buf = buffer;
+ while (count--)
+ *buf++ = __raw_readl(addr + PCI_IOBASE);
+}
+
+static inline void outsb(unsigned long addr, const void *buffer, int count)
+{
+ const u8 *buf = buffer;
+ while (count--)
+ __raw_writeb(*buf++, addr + PCI_IOBASE);
+}
+
+static inline void outsw(unsigned long addr, const void *buffer, int count)
+{
+ const u16 *buf = buffer;
+ while (count--)
+ __raw_writew(*buf++, addr + PCI_IOBASE);
+}
+
+static inline void outsl(unsigned long addr, const void *buffer, int count)
+{
+ const u32 *buf = buffer;
+ while (count--)
+ __raw_writel(*buf++, addr + PCI_IOBASE);
+}
+
+#define insb_p(port,to,len) insb(port,to,len)
+#define insw_p(port,to,len) insw(port,to,len)
+#define insl_p(port,to,len) insl(port,to,len)
+
+#define outsb_p(port,from,len) outsb(port,from,len)
+#define outsw_p(port,from,len) outsw(port,from,len)
+#define outsl_p(port,from,len) outsl(port,from,len)
+
+/*
+ * String version of I/O memory access operations.
+ */
+extern void __memcpy_fromio(void *, const volatile void __iomem *, size_t);
+extern void __memcpy_toio(volatile void __iomem *, const void *, size_t);
+extern void __memset_io(volatile void __iomem *, int, size_t);
+
+#define memset_io(c,v,l) __memset_io((c),(v),(l))
+#define memcpy_fromio(a,c,l) __memcpy_fromio((a),(c),(l))
+#define memcpy_toio(c,a,l) __memcpy_toio((c),(a),(l))
+
+/*
+ * I/O memory mapping functions.
+ */
+extern void __iomem *__ioremap(phys_addr_t phys_addr, size_t size, pgprot_t prot);
+extern void __iounmap(volatile void __iomem *addr);
+
+#define PROT_DEFAULT (PTE_TYPE_PAGE | PTE_AF | PTE_DIRTY)
+#define PROT_DEVICE_nGnRE (PROT_DEFAULT | PTE_XN | PTE_ATTRINDX(MT_DEVICE_nGnRE))
+#define PROT_NORMAL_NC (PROT_DEFAULT | PTE_ATTRINDX(MT_NORMAL_NC))
+
+#define ioremap(addr, size) __ioremap((addr), (size), PROT_DEVICE_nGnRE)
+#define ioremap_nocache(addr, size) __ioremap((addr), (size), PROT_DEVICE_nGnRE)
+#define ioremap_wc(addr, size) __ioremap((addr), (size), PROT_NORMAL_NC)
+#define iounmap __iounmap
+
+#define ARCH_HAS_IOREMAP_WC
+#include <asm-generic/iomap.h>
+
+/*
+ * More restrictive address range checking than the default implementation
+ * (PHYS_OFFSET and PHYS_MASK taken into account).
+ */
+#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
+extern int valid_phys_addr_range(unsigned long addr, size_t size);
+extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t size);
+
+extern int devmem_is_allowed(unsigned long pfn);
+
+/*
+ * Convert a physical pointer to a virtual kernel pointer for /dev/mem
+ * access
+ */
+#define xlate_dev_mem_ptr(p) __va(p)
+
+/*
+ * Convert a virtual cached pointer to an uncached pointer
+ */
+#define xlate_dev_kmem_ptr(p) p
+
+#endif /* __KERNEL__ */
+#endif /* __ASM_IO_H */
diff --git a/arch/arm64/kernel/io.c b/arch/arm64/kernel/io.c
new file mode 100644
index 0000000..7d37ead
--- /dev/null
+++ b/arch/arm64/kernel/io.c
@@ -0,0 +1,64 @@
+/*
+ * Based on arch/arm/kernel/io.c
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/export.h>
+#include <linux/types.h>
+#include <linux/io.h>
+
+/*
+ * Copy data from IO memory space to "real" memory space.
+ */
+void __memcpy_fromio(void *to, const volatile void __iomem *from, size_t count)
+{
+ unsigned char *t = to;
+ while (count) {
+ count--;
+ *t = readb(from);
+ t++;
+ from++;
+ }
+}
+EXPORT_SYMBOL(__memcpy_fromio);
+
+/*
+ * Copy data from "real" memory space to IO memory space.
+ */
+void __memcpy_toio(volatile void __iomem *to, const void *from, size_t count)
+{
+ const unsigned char *f = from;
+ while (count) {
+ count--;
+ writeb(*f, to);
+ f++;
+ to++;
+ }
+}
+EXPORT_SYMBOL(__memcpy_toio);
+
+/*
+ * "memset" on IO memory space.
+ */
+void __memset_io(volatile void __iomem *dst, int c, size_t count)
+{
+ while (count) {
+ count--;
+ writeb(c, dst);
+ dst++;
+ }
+}
+EXPORT_SYMBOL(__memset_io);
diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
new file mode 100644
index 0000000..1725cd6
--- /dev/null
+++ b/arch/arm64/mm/ioremap.c
@@ -0,0 +1,84 @@
+/*
+ * Based on arch/arm/mm/ioremap.c
+ *
+ * (C) Copyright 1995 1996 Linus Torvalds
+ * Hacked for ARM by Phil Blundell <[email protected]>
+ * Hacked to allow all architectures to build, and various cleanups
+ * by Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/export.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/io.h>
+
+static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size,
+ pgprot_t prot, void *caller)
+{
+ unsigned long last_addr;
+ unsigned long offset = phys_addr & ~PAGE_MASK;
+ int err;
+ unsigned long addr;
+ struct vm_struct *area;
+
+ /*
+ * Page align the mapping address and size, taking account of any
+ * offset.
+ */
+ phys_addr &= PAGE_MASK;
+ size = PAGE_ALIGN(size + offset);
+
+ /*
+ * Don't allow wraparound, zero size or outside PHYS_MASK.
+ */
+ last_addr = phys_addr + size - 1;
+ if (!size || last_addr < phys_addr || (last_addr & ~PHYS_MASK))
+ return NULL;
+
+ /*
+ * Don't allow RAM to be mapped.
+ */
+ if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
+ return NULL;
+
+ area = get_vm_area_caller(size, VM_IOREMAP, caller);
+ if (!area)
+ return NULL;
+ addr = (unsigned long)area->addr;
+
+ err = ioremap_page_range(addr, addr + size, phys_addr, prot);
+ if (err) {
+ vunmap((void *)addr);
+ return NULL;
+ }
+
+ return (void __iomem *)(offset + addr);
+}
+
+void __iomem *__ioremap(phys_addr_t phys_addr, size_t size, pgprot_t prot)
+{
+ return __ioremap_caller(phys_addr, size, prot,
+ __builtin_return_address(0));
+}
+EXPORT_SYMBOL(__ioremap);
+
+void __iounmap(volatile void __iomem *io_addr)
+{
+ void *addr = (void *)(PAGE_MASK & (unsigned long)io_addr);
+
+ vunmap(addr);
+}
+EXPORT_SYMBOL(__iounmap);

2012-08-14 17:54:15

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 12/31] arm64: Atomic operations

This patch introduces the atomic, mutex and futex operations. Many
atomic operations use the load-acquire (LDAXR) and store-release (STLXR)
instructions, which imply barriers and avoid the need for an explicit DMB.
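
As a minimal usage sketch (the counter and function names are
hypothetical, not part of this patch): the *_return and cmpxchg variants
below are built on LDAXR/STLXR and therefore provide acquire/release
semantics, while plain atomic_add()/atomic_sub() use LDXR/STXR and do not:

#include <linux/atomic.h>
#include <linux/types.h>

static atomic_t foo_users = ATOMIC_INIT(0);

/* Returns true for the first user, so the caller can power up the device. */
static bool foo_get(void)
{
	return atomic_add_return(1, &foo_users) == 1;
}

/* Returns true when the last user is gone and the device can be powered down. */
static bool foo_put(void)
{
	return atomic_dec_and_test(&foo_users);
}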

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/atomic.h | 306 +++++++++++++++++++++++++++++++++++++++
arch/arm64/include/asm/futex.h | 134 +++++++++++++++++
2 files changed, 440 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/atomic.h
create mode 100644 arch/arm64/include/asm/futex.h

diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
new file mode 100644
index 0000000..fa60c8b
--- /dev/null
+++ b/arch/arm64/include/asm/atomic.h
@@ -0,0 +1,306 @@
+/*
+ * Based on arch/arm/include/asm/atomic.h
+ *
+ * Copyright (C) 1996 Russell King.
+ * Copyright (C) 2002 Deep Blue Solutions Ltd.
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_ATOMIC_H
+#define __ASM_ATOMIC_H
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+#include <asm/barrier.h>
+#include <asm/cmpxchg.h>
+
+#define ATOMIC_INIT(i) { (i) }
+
+#ifdef __KERNEL__
+
+/*
+ * On ARM, ordinary assignment (str instruction) doesn't clear the local
+ * strex/ldrex monitor on some implementations. The reason we can use it for
+ * atomic_set() is the clrex or dummy strex done on every exception return.
+ */
+#define atomic_read(v) (*(volatile int *)&(v)->counter)
+#define atomic_set(v,i) (((v)->counter) = (i))
+
+/*
+ * AArch64 UP and SMP safe atomic ops. We use load exclusive and
+ * store exclusive to ensure that these are atomic. We may loop
+ * to ensure that the update happens.
+ */
+static inline void atomic_add(int i, atomic_t *v)
+{
+ unsigned long tmp;
+ int result;
+
+ asm volatile("// atomic_add\n"
+"1: ldxr %w0, [%3]\n"
+" add %w0, %w0, %w4\n"
+" stxr %w1, %w0, [%3]\n"
+" cbnz %w1,1b"
+ : "=&r" (result), "=&r" (tmp), "+o" (v->counter)
+ : "r" (&v->counter), "Ir" (i)
+ : "cc");
+}
+
+static inline int atomic_add_return(int i, atomic_t *v)
+{
+ unsigned long tmp;
+ int result;
+
+ asm volatile("// atomic_add_return\n"
+"1: ldaxr %w0, [%3]\n"
+" add %w0, %w0, %w4\n"
+" stlxr %w1, %w0, [%3]\n"
+" cbnz %w1, 1b"
+ : "=&r" (result), "=&r" (tmp), "+o" (v->counter)
+ : "r" (&v->counter), "Ir" (i)
+ : "cc");
+
+ return result;
+}
+
+static inline void atomic_sub(int i, atomic_t *v)
+{
+ unsigned long tmp;
+ int result;
+
+ asm volatile("// atomic_sub\n"
+"1: ldxr %w0, [%3]\n"
+" sub %w0, %w0, %w4\n"
+" stxr %w1, %w0, [%3]\n"
+" cbnz %w1, 1b"
+ : "=&r" (result), "=&r" (tmp), "+o" (v->counter)
+ : "r" (&v->counter), "Ir" (i)
+ : "cc");
+}
+
+static inline int atomic_sub_return(int i, atomic_t *v)
+{
+ unsigned long tmp;
+ int result;
+
+ asm volatile("// atomic_sub_return\n"
+"1: ldaxr %w0, [%3]\n"
+" sub %w0, %w0, %w4\n"
+" stlxr %w1, %w0, [%3]\n"
+" cbnz %w1, 1b"
+ : "=&r" (result), "=&r" (tmp), "+o" (v->counter)
+ : "r" (&v->counter), "Ir" (i)
+ : "cc");
+
+ return result;
+}
+
+static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
+{
+ unsigned long tmp;
+ int oldval;
+
+ asm volatile("// atomic_cmpxchg\n"
+"1: ldaxr %w1, [%3]\n"
+" cmp %w1, %w4\n"
+" b.ne 2f\n"
+" stlxr %w0, %w5, [%3]\n"
+" cbnz %w0, 1b\n"
+"2:"
+ : "=&r" (tmp), "=&r" (oldval), "+o" (ptr->counter)
+ : "r" (&ptr->counter), "Ir" (old), "r" (new)
+ : "cc");
+
+ return oldval;
+}
+
+static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
+{
+ unsigned long tmp, tmp2;
+
+ asm volatile("// atomic_clear_mask\n"
+"1: ldxr %0, [%3]\n"
+" bic %0, %0, %4\n"
+" stxr %w1, %0, [%3]\n"
+" cbnz %w1, 1b"
+ : "=&r" (tmp), "=&r" (tmp2), "+o" (*addr)
+ : "r" (addr), "Ir" (mask)
+ : "cc");
+}
+
+#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
+
+static inline int __atomic_add_unless(atomic_t *v, int a, int u)
+{
+ int c, old;
+
+ c = atomic_read(v);
+ while (c != u && (old = atomic_cmpxchg((v), c, c + a)) != c)
+ c = old;
+ return c;
+}
+
+#define atomic_inc(v) atomic_add(1, v)
+#define atomic_dec(v) atomic_sub(1, v)
+
+#define atomic_inc_and_test(v) (atomic_add_return(1, v) == 0)
+#define atomic_dec_and_test(v) (atomic_sub_return(1, v) == 0)
+#define atomic_inc_return(v) (atomic_add_return(1, v))
+#define atomic_dec_return(v) (atomic_sub_return(1, v))
+#define atomic_sub_and_test(i, v) (atomic_sub_return(i, v) == 0)
+
+#define atomic_add_negative(i,v) (atomic_add_return(i, v) < 0)
+
+#define smp_mb__before_atomic_dec() smp_mb()
+#define smp_mb__after_atomic_dec() smp_mb()
+#define smp_mb__before_atomic_inc() smp_mb()
+#define smp_mb__after_atomic_inc() smp_mb()
+
+/*
+ * 64-bit atomic operations.
+ */
+#define ATOMIC64_INIT(i) { (i) }
+
+#define atomic64_read(v) (*(volatile long long *)&(v)->counter)
+#define atomic64_set(v,i) (((v)->counter) = (i))
+
+static inline void atomic64_add(u64 i, atomic64_t *v)
+{
+ long result;
+ unsigned long tmp;
+
+ asm volatile("// atomic64_add\n"
+"1: ldxr %0, [%3]\n"
+" add %0, %0, %4\n"
+" stxr %w1, %0, [%3]\n"
+" cbnz %w1, 1b"
+ : "=&r" (result), "=&r" (tmp), "+o" (v->counter)
+ : "r" (&v->counter), "Ir" (i)
+ : "cc");
+}
+
+static inline long atomic64_add_return(long i, atomic64_t *v)
+{
+ long result;
+ unsigned long tmp;
+
+ asm volatile("// atomic64_add_return\n"
+"1: ldaxr %0, [%3]\n"
+" add %0, %0, %4\n"
+" stlxr %w1, %0, [%3]\n"
+" cbnz %w1, 1b"
+ : "=&r" (result), "=&r" (tmp), "+o" (v->counter)
+ : "r" (&v->counter), "Ir" (i)
+ : "cc");
+
+ return result;
+}
+
+static inline void atomic64_sub(u64 i, atomic64_t *v)
+{
+ long result;
+ unsigned long tmp;
+
+ asm volatile("// atomic64_sub\n"
+"1: ldxr %0, [%3]\n"
+" sub %0, %0, %4\n"
+" stxr %w1, %0, [%3]\n"
+" cbnz %w1, 1b"
+ : "=&r" (result), "=&r" (tmp), "+o" (v->counter)
+ : "r" (&v->counter), "Ir" (i)
+ : "cc");
+}
+
+static inline long atomic64_sub_return(long i, atomic64_t *v)
+{
+ long result;
+ unsigned long tmp;
+
+ asm volatile("// atomic64_sub_return\n"
+"1: ldaxr %0, [%3]\n"
+" sub %0, %0, %4\n"
+" stlxr %w1, %0, [%3]\n"
+" cbnz %w1, 1b"
+ : "=&r" (result), "=&r" (tmp), "+o" (v->counter)
+ : "r" (&v->counter), "Ir" (i)
+ : "cc");
+
+ return result;
+}
+
+static inline long atomic64_cmpxchg(atomic64_t *ptr, long old, long new)
+{
+ long oldval;
+ unsigned long res;
+
+ asm volatile("// atomic64_cmpxchg\n"
+"1: ldaxr %1, [%3]\n"
+" cmp %1, %4\n"
+" b.ne 2f\n"
+" stlxr %w0, %5, [%3]\n"
+" cbnz %w0, 1b\n"
+"2:"
+ : "=&r" (res), "=&r" (oldval), "+o" (ptr->counter)
+ : "r" (&ptr->counter), "Ir" (old), "r" (new)
+ : "cc");
+
+ return oldval;
+}
+
+#define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
+
+#define ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
+static inline long atomic64_dec_if_positive(atomic64_t *v)
+{
+ long result;
+ unsigned long tmp;
+
+ asm volatile("// atomic64_dec_if_positive\n"
+"1: ldaxr %0, [%3]\n"
+" subs %0, %0, #1\n"
+" b.mi 2f\n"
+" stlxr %w1, %0, [%3]\n"
+" cbnz %w1, 1b\n"
+"2:"
+ : "=&r" (result), "=&r" (tmp), "+o" (v->counter)
+ : "r" (&v->counter)
+ : "cc");
+
+ return result;
+}
+
+static inline int atomic64_add_unless(atomic64_t *v, long a, long u)
+{
+ long c, old;
+
+ c = atomic64_read(v);
+ while (c != u && (old = atomic64_cmpxchg((v), c, c + a)) != c)
+ c = old;
+
+ return c != u;
+}
+
+#define atomic64_add_negative(a, v) (atomic64_add_return((a), (v)) < 0)
+#define atomic64_inc(v) atomic64_add(1LL, (v))
+#define atomic64_inc_return(v) atomic64_add_return(1LL, (v))
+#define atomic64_inc_and_test(v) (atomic64_inc_return(v) == 0)
+#define atomic64_sub_and_test(a, v) (atomic64_sub_return((a), (v)) == 0)
+#define atomic64_dec(v) atomic64_sub(1LL, (v))
+#define atomic64_dec_return(v) atomic64_sub_return(1LL, (v))
+#define atomic64_dec_and_test(v) (atomic64_dec_return((v)) == 0)
+#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1LL, 0LL)
+
+#endif
+#endif
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
new file mode 100644
index 0000000..0745e82
--- /dev/null
+++ b/arch/arm64/include/asm/futex.h
@@ -0,0 +1,134 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_FUTEX_H
+#define __ASM_FUTEX_H
+
+#ifdef __KERNEL__
+
+#include <linux/futex.h>
+#include <linux/uaccess.h>
+#include <asm/errno.h>
+
+#define __futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \
+ asm volatile( \
+"1: ldaxr %w1, %2\n" \
+ insn "\n" \
+"2: stlxr %w3, %w0, %2\n" \
+" cbnz %w3, 1b\n" \
+"3: .pushsection __ex_table,\"a\"\n" \
+" .align 3\n" \
+" .quad 1b, 4f, 2b, 4f\n" \
+" .popsection\n" \
+" .pushsection .fixup,\"ax\"\n" \
+"4: mov %w0, %w5\n" \
+" b 3b\n" \
+" .popsection" \
+ : "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp) \
+ : "r" (oparg), "Ir" (-EFAULT) \
+ : "cc")
+
+static inline int
+futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
+{
+ int op = (encoded_op >> 28) & 7;
+ int cmp = (encoded_op >> 24) & 15;
+ int oparg = (encoded_op << 8) >> 20;
+ int cmparg = (encoded_op << 20) >> 20;
+ int oldval = 0, ret, tmp;
+
+ if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
+ oparg = 1 << oparg;
+
+ if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
+ return -EFAULT;
+
+ pagefault_disable(); /* implies preempt_disable() */
+
+ switch (op) {
+ case FUTEX_OP_SET:
+ __futex_atomic_op("mov %w0, %w4",
+ ret, oldval, uaddr, tmp, oparg);
+ break;
+ case FUTEX_OP_ADD:
+ __futex_atomic_op("add %w0, %w1, %w4",
+ ret, oldval, uaddr, tmp, oparg);
+ break;
+ case FUTEX_OP_OR:
+ __futex_atomic_op("orr %w0, %w1, %w4",
+ ret, oldval, uaddr, tmp, oparg);
+ break;
+ case FUTEX_OP_ANDN:
+ __futex_atomic_op("and %w0, %w1, %w4",
+ ret, oldval, uaddr, tmp, ~oparg);
+ break;
+ case FUTEX_OP_XOR:
+ __futex_atomic_op("eor %w0, %w1, %w4",
+ ret, oldval, uaddr, tmp, oparg);
+ break;
+ default:
+ ret = -ENOSYS;
+ }
+
+ pagefault_enable(); /* subsumes preempt_enable() */
+
+ if (!ret) {
+ switch (cmp) {
+ case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
+ case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
+ case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
+ case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
+ case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
+ case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
+ default: ret = -ENOSYS;
+ }
+ }
+ return ret;
+}
+
+static inline int
+futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
+ u32 oldval, u32 newval)
+{
+ int ret = 0;
+ u32 val, tmp;
+
+ if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
+ return -EFAULT;
+
+ asm volatile("// futex_atomic_cmpxchg_inatomic\n"
+"1: ldaxr %w1, %2\n"
+" sub %w3, %w1, %w4\n"
+" cbnz %w3, 3f\n"
+"2: stlxr %w3, %w5, %2\n"
+" cbnz %w3, 1b\n"
+"3: .pushsection __ex_table,\"a\"\n"
+" .align 3\n"
+" .quad 1b, 4f, 2b, 4f\n"
+" .popsection\n"
+" .pushsection .fixup,\"ax\"\n"
+"4: mov %w0, %w6\n"
+" b 3b\n"
+" .popsection"
+ : "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp)
+ : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
+ : "cc", "memory");
+
+ *uval = val;
+ return ret;
+}
+
+#endif /* __KERNEL__ */
+#endif /* __ASM_FUTEX_H */

2012-08-14 17:54:14

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 19/31] arm64: Signal handling support

This patch adds support for signal handling. Sigreturn is done via the
VDSO introduced by a previous patch. SA_RESTORER is still defined, as it
is required for 32-bit (compat) support, but it is not used for 64-bit
applications.
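
As a hypothetical user-space sketch (not part of this patch), a
SA_SIGINFO handler can locate the FP/SIMD record in the signal frame by
walking the magic/size headers stored in uc_mcontext.__reserved, as
described by struct _aarch64_ctx below; the function name and the
C-library ucontext_t layout are assumptions:

#include <stddef.h>
#include <ucontext.h>
#include <asm/sigcontext.h>	/* struct _aarch64_ctx, FPSIMD_MAGIC */

static struct fpsimd_context *find_fpsimd(ucontext_t *uc)
{
	unsigned char *p = (unsigned char *)uc->uc_mcontext.__reserved;
	unsigned char *end = p + sizeof(uc->uc_mcontext.__reserved);

	while (p + sizeof(struct _aarch64_ctx) <= end) {
		struct _aarch64_ctx *head = (struct _aarch64_ctx *)p;

		if (!head->magic || !head->size)
			break;		/* terminating (or malformed) record */
		if (head->magic == FPSIMD_MAGIC)
			return (struct fpsimd_context *)head;
		p += head->size;
	}
	return NULL;
}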

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/sigcontext.h | 69 ++++++
arch/arm64/include/asm/siginfo.h | 23 ++
arch/arm64/include/asm/signal.h | 24 ++
arch/arm64/include/asm/ucontext.h | 30 +++
arch/arm64/kernel/signal.c | 436 +++++++++++++++++++++++++++++++++++
5 files changed, 582 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/sigcontext.h
create mode 100644 arch/arm64/include/asm/siginfo.h
create mode 100644 arch/arm64/include/asm/signal.h
create mode 100644 arch/arm64/include/asm/ucontext.h
create mode 100644 arch/arm64/kernel/signal.c

diff --git a/arch/arm64/include/asm/sigcontext.h b/arch/arm64/include/asm/sigcontext.h
new file mode 100644
index 0000000..573cec7
--- /dev/null
+++ b/arch/arm64/include/asm/sigcontext.h
@@ -0,0 +1,69 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SIGCONTEXT_H
+#define __ASM_SIGCONTEXT_H
+
+#include <linux/types.h>
+
+/*
+ * Signal context structure - contains all info to do with the state
+ * before the signal handler was invoked.
+ */
+struct sigcontext {
+ __u64 fault_address;
+ /* AArch64 registers */
+ __u64 regs[31];
+ __u64 sp;
+ __u64 pc;
+ __u64 pstate;
+ /* 4K reserved for FP/SIMD state and future expansion */
+ __u8 __reserved[4096] __attribute__((__aligned__(16)));
+};
+
+/*
+ * Header to be used at the beginning of structures extending the user
+ * context. Such structures must be placed after the rt_sigframe on the stack
+ * and be 16-byte aligned. The last structure must be a dummy one with the
+ * magic and size set to 0.
+ */
+struct _aarch64_ctx {
+ __u32 magic;
+ __u32 size;
+};
+
+#define FPSIMD_MAGIC 0x46508001
+
+struct fpsimd_context {
+ struct _aarch64_ctx head;
+ __u32 fpsr;
+ __u32 fpcr;
+ __uint128_t vregs[32];
+};
+
+#ifdef __KERNEL__
+/*
+ * Auxiliary context saved in the sigcontext.__reserved array. Not exported to
+ * user space as it will change with the addition of new context. User space
+ * should check the magic/size information.
+ */
+struct aux_context {
+ struct fpsimd_context fpsimd;
+ /* additional context to be added before "end" */
+ struct _aarch64_ctx end;
+};
+#endif
+
+#endif
diff --git a/arch/arm64/include/asm/siginfo.h b/arch/arm64/include/asm/siginfo.h
new file mode 100644
index 0000000..5a74a08
--- /dev/null
+++ b/arch/arm64/include/asm/siginfo.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SIGINFO_H
+#define __ASM_SIGINFO_H
+
+#define __ARCH_SI_PREAMBLE_SIZE (4 * sizeof(int))
+
+#include <asm-generic/siginfo.h>
+
+#endif
diff --git a/arch/arm64/include/asm/signal.h b/arch/arm64/include/asm/signal.h
new file mode 100644
index 0000000..8d1e723
--- /dev/null
+++ b/arch/arm64/include/asm/signal.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SIGNAL_H
+#define __ASM_SIGNAL_H
+
+/* Required for AArch32 compatibility. */
+#define SA_RESTORER 0x04000000
+
+#include <asm-generic/signal.h>
+
+#endif
diff --git a/arch/arm64/include/asm/ucontext.h b/arch/arm64/include/asm/ucontext.h
new file mode 100644
index 0000000..bde9607
--- /dev/null
+++ b/arch/arm64/include/asm/ucontext.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_UCONTEXT_H
+#define __ASM_UCONTEXT_H
+
+struct ucontext {
+ unsigned long uc_flags;
+ struct ucontext *uc_link;
+ stack_t uc_stack;
+ sigset_t uc_sigmask;
+ /* glibc uses a 1024-bit sigset_t */
+ __u8 __unused[(1024 - sizeof(sigset_t)) / 8];
+ /* last for future expansion */
+ struct sigcontext uc_mcontext;
+};
+
+#endif /* __ASM_UCONTEXT_H */
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
new file mode 100644
index 0000000..a8f29d2
--- /dev/null
+++ b/arch/arm64/kernel/signal.c
@@ -0,0 +1,436 @@
+/*
+ * Based on arch/arm/kernel/signal.c
+ *
+ * Copyright (C) 1995-2009 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/errno.h>
+#include <linux/signal.h>
+#include <linux/personality.h>
+#include <linux/freezer.h>
+#include <linux/uaccess.h>
+#include <linux/tracehook.h>
+#include <linux/ratelimit.h>
+
+#include <asm/debug-monitors.h>
+#include <asm/elf.h>
+#include <asm/cacheflush.h>
+#include <asm/ucontext.h>
+#include <asm/unistd.h>
+#include <asm/fpsimd.h>
+#include <asm/signal32.h>
+#include <asm/vdso.h>
+
+/*
+ * Do a signal return; undo the signal stack. These are aligned to 128 bits.
+ */
+struct rt_sigframe {
+ struct siginfo info;
+ struct ucontext uc;
+};
+
+static int preserve_fpsimd_context(struct fpsimd_context __user *ctx)
+{
+ struct fpsimd_state *fpsimd = &current->thread.fpsimd_state;
+ int err;
+
+ /* dump the hardware registers to the fpsimd_state structure */
+ fpsimd_save_state(fpsimd);
+
+ /* copy the FP and status/control registers */
+ err = __copy_to_user(ctx->vregs, fpsimd->vregs, sizeof(fpsimd->vregs));
+ __put_user_error(fpsimd->fpsr, &ctx->fpsr, err);
+ __put_user_error(fpsimd->fpcr, &ctx->fpcr, err);
+
+ /* copy the magic/size information */
+ __put_user_error(FPSIMD_MAGIC, &ctx->head.magic, err);
+ __put_user_error(sizeof(struct fpsimd_context), &ctx->head.size, err);
+
+ return err ? -EFAULT : 0;
+}
+
+static int restore_fpsimd_context(struct fpsimd_context __user *ctx)
+{
+ struct fpsimd_state fpsimd;
+ __u32 magic, size;
+ int err = 0;
+
+ /* check the magic/size information */
+ __get_user_error(magic, &ctx->head.magic, err);
+ __get_user_error(size, &ctx->head.size, err);
+ if (err)
+ return -EFAULT;
+ if (magic != FPSIMD_MAGIC || size != sizeof(struct fpsimd_context))
+ return -EINVAL;
+
+ /* copy the FP and status/control registers */
+ err = __copy_from_user(fpsimd.vregs, ctx->vregs,
+ sizeof(fpsimd.vregs));
+ __get_user_error(fpsimd.fpsr, &ctx->fpsr, err);
+ __get_user_error(fpsimd.fpcr, &ctx->fpcr, err);
+
+ /* load the hardware registers from the fpsimd_state structure */
+ if (!err) {
+ preempt_disable();
+ fpsimd_load_state(&fpsimd);
+ preempt_enable();
+ }
+
+ return err ? -EFAULT : 0;
+}
+
+static int restore_sigframe(struct pt_regs *regs,
+ struct rt_sigframe __user *sf)
+{
+ sigset_t set;
+ int i, err;
+ struct aux_context __user *aux =
+ (struct aux_context __user *)sf->uc.uc_mcontext.__reserved;
+
+ err = __copy_from_user(&set, &sf->uc.uc_sigmask, sizeof(set));
+ if (err == 0)
+ set_current_blocked(&set);
+
+ for (i = 0; i < 31; i++)
+ __get_user_error(regs->regs[i], &sf->uc.uc_mcontext.regs[i],
+ err);
+ __get_user_error(regs->sp, &sf->uc.uc_mcontext.sp, err);
+ __get_user_error(regs->pc, &sf->uc.uc_mcontext.pc, err);
+ __get_user_error(regs->pstate, &sf->uc.uc_mcontext.pstate, err);
+
+ /*
+ * Avoid sys_rt_sigreturn() restarting.
+ */
+ regs->syscallno = ~0UL;
+
+ err |= !valid_user_regs(&regs->user_regs);
+
+ if (err == 0)
+ err |= restore_fpsimd_context(&aux->fpsimd);
+
+ return err;
+}
+
+asmlinkage long sys_rt_sigreturn(struct pt_regs *regs)
+{
+ struct rt_sigframe __user *frame;
+
+ /* Always make any pending restarted system calls return -EINTR */
+ current_thread_info()->restart_block.fn = do_no_restart_syscall;
+
+ /*
+ * Since we stacked the signal frame on a 128-bit boundary, 'sp' should
+ * be 16-byte aligned here.
+ */
+ if (regs->sp & 15)
+ goto badframe;
+
+ frame = (struct rt_sigframe __user *)regs->sp;
+
+ if (!access_ok(VERIFY_READ, frame, sizeof (*frame)))
+ goto badframe;
+
+ if (restore_sigframe(regs, frame))
+ goto badframe;
+
+ if (do_sigaltstack(&frame->uc.uc_stack,
+ NULL, regs->sp) == -EFAULT)
+ goto badframe;
+
+ return regs->regs[0];
+
+badframe:
+ if (show_unhandled_signals)
+ printk_ratelimited(KERN_INFO "%s[%d]: bad frame in %s: pc=%08llx sp=%08llx\n",
+ current->comm, task_pid_nr(current), __func__,
+ regs->pc, regs->sp);
+ force_sig(SIGSEGV, current);
+ return 0;
+}
+
+asmlinkage long sys_sigaltstack(const stack_t __user *uss, stack_t __user *uoss,
+ unsigned long sp)
+{
+ return do_sigaltstack(uss, uoss, sp);
+}
+
+static int setup_sigframe(struct rt_sigframe __user *sf,
+ struct pt_regs *regs, sigset_t *set)
+{
+ int i, err = 0;
+ struct aux_context __user *aux =
+ (struct aux_context __user *)sf->uc.uc_mcontext.__reserved;
+
+ for (i = 0; i < 31; i++)
+ __put_user_error(regs->regs[i], &sf->uc.uc_mcontext.regs[i],
+ err);
+ __put_user_error(regs->sp, &sf->uc.uc_mcontext.sp, err);
+ __put_user_error(regs->pc, &sf->uc.uc_mcontext.pc, err);
+ __put_user_error(regs->pstate, &sf->uc.uc_mcontext.pstate, err);
+
+ __put_user_error(current->thread.fault_address, &sf->uc.uc_mcontext.fault_address, err);
+
+ err |= __copy_to_user(&sf->uc.uc_sigmask, set, sizeof(*set));
+
+ if (err == 0)
+ err |= preserve_fpsimd_context(&aux->fpsimd);
+
+ /* set the "end" magic */
+ __put_user_error(0, &aux->end.magic, err);
+ __put_user_error(0, &aux->end.size, err);
+
+ return err;
+}
+
+static void __user *get_sigframe(struct k_sigaction *ka, struct pt_regs *regs,
+ int framesize)
+{
+ unsigned long sp, sp_top;
+ void __user *frame;
+
+ sp = sp_top = regs->sp;
+
+ /*
+ * This is the X/Open sanctioned signal stack switching.
+ */
+ if ((ka->sa.sa_flags & SA_ONSTACK) && !sas_ss_flags(sp))
+ sp = sp_top = current->sas_ss_sp + current->sas_ss_size;
+
+ /* room for stack frame (FP, LR) */
+ sp -= 16;
+
+ sp = (sp - framesize) & ~15;
+ frame = (void __user *)sp;
+
+ /*
+ * Check that we can actually write to the signal frame.
+ */
+ if (!access_ok(VERIFY_WRITE, frame, sp_top - sp))
+ frame = NULL;
+
+ return frame;
+}
+
+static int setup_return(struct pt_regs *regs, struct k_sigaction *ka,
+ void __user *frame, int usig)
+{
+ int err = 0;
+ __sigrestore_t sigtramp;
+ unsigned long __user *sp = (unsigned long __user *)regs->sp;
+
+ /* set up the stack frame */
+ __put_user_error(regs->regs[29], sp - 2, err);
+ __put_user_error(regs->regs[30], sp - 1, err);
+
+ regs->regs[0] = usig;
+ regs->regs[29] = regs->sp - 16;
+ regs->sp = (unsigned long)frame;
+ regs->pc = (unsigned long)ka->sa.sa_handler;
+
+ if (ka->sa.sa_flags & SA_RESTORER)
+ sigtramp = ka->sa.sa_restorer;
+ else
+ sigtramp = VDSO_SYMBOL(current->mm->context.vdso, sigtramp);
+
+ regs->regs[30] = (unsigned long)sigtramp;
+
+ return err;
+}
+
+static int setup_rt_frame(int usig, struct k_sigaction *ka, siginfo_t *info,
+ sigset_t *set, struct pt_regs *regs)
+{
+ struct rt_sigframe __user *frame;
+ stack_t stack;
+ int err = 0;
+
+ frame = get_sigframe(ka, regs, sizeof(*frame));
+ if (!frame)
+ return 1;
+
+ __put_user_error(0, &frame->uc.uc_flags, err);
+ __put_user_error(NULL, &frame->uc.uc_link, err);
+
+ memset(&stack, 0, sizeof(stack));
+ stack.ss_sp = (void __user *)current->sas_ss_sp;
+ stack.ss_flags = sas_ss_flags(regs->sp);
+ stack.ss_size = current->sas_ss_size;
+ err |= __copy_to_user(&frame->uc.uc_stack, &stack, sizeof(stack));
+
+ err |= setup_sigframe(frame, regs, set);
+ if (err == 0)
+ err = setup_return(regs, ka, frame, usig);
+
+ if (err == 0 && ka->sa.sa_flags & SA_SIGINFO) {
+ err |= copy_siginfo_to_user(&frame->info, info);
+ regs->regs[1] = (unsigned long)&frame->info;
+ regs->regs[2] = (unsigned long)&frame->uc;
+ }
+
+ return err;
+}
+
+static void setup_restart_syscall(struct pt_regs *regs)
+{
+ if (test_thread_flag(TIF_32BIT))
+ compat_setup_restart_syscall(regs);
+ else
+ regs->regs[8] = __NR_restart_syscall;
+}
+
+/*
+ * OK, we're invoking a handler
+ */
+static void handle_signal(unsigned long sig, struct k_sigaction *ka,
+ siginfo_t *info, struct pt_regs *regs)
+{
+ struct thread_info *thread = current_thread_info();
+ struct task_struct *tsk = current;
+ sigset_t *oldset = sigmask_to_save();
+ int usig = sig;
+ int ret;
+
+ /*
+ * translate the signal
+ */
+ if (usig < 32 && thread->exec_domain && thread->exec_domain->signal_invmap)
+ usig = thread->exec_domain->signal_invmap[usig];
+
+ /*
+ * Set up the stack frame
+ */
+ if (test_thread_flag(TIF_32BIT)) {
+ if (ka->sa.sa_flags & SA_SIGINFO)
+ ret = compat_setup_rt_frame(usig, ka, info, oldset,
+ regs);
+ else
+ ret = compat_setup_frame(usig, ka, oldset, regs);
+ } else {
+ ret = setup_rt_frame(usig, ka, info, oldset, regs);
+ }
+
+ /*
+ * Check that the resulting registers are actually sane.
+ */
+ ret |= !valid_user_regs(&regs->user_regs);
+
+ if (ret != 0) {
+ force_sigsegv(sig, tsk);
+ return;
+ }
+
+ /*
+ * Fast forward the stepping logic so we step into the signal
+ * handler.
+ */
+ user_fastforward_single_step(tsk);
+
+ signal_delivered(sig, info, ka, regs, 0);
+}
+
+/*
+ * Note that 'init' is a special process: it doesn't get signals it doesn't
+ * want to handle. Thus you cannot kill init even with a SIGKILL, even by
+ * mistake.
+ *
+ * Note that we go through the signals twice: once to check the signals that
+ * the kernel can handle, and then we build all the user-level signal handling
+ * stack-frames in one go after that.
+ */
+static void do_signal(struct pt_regs *regs)
+{
+ unsigned long continue_addr = 0, restart_addr = 0;
+ struct k_sigaction ka;
+ siginfo_t info;
+ int signr, retval = 0;
+ int syscall = (int)regs->syscallno;
+
+ /*
+ * If we were from a system call, check for system call restarting...
+ */
+ if (syscall >= 0) {
+ continue_addr = regs->pc;
+ restart_addr = continue_addr - (compat_thumb_mode(regs) ? 2 : 4);
+ retval = regs->regs[0];
+
+ /*
+ * Avoid additional syscall restarting via ret_to_user.
+ */
+ regs->syscallno = ~0UL;
+
+ /*
+ * Prepare for system call restart. We do this here so that a
+ * debugger will see the already changed PC.
+ */
+ switch (retval) {
+ case -ERESTARTNOHAND:
+ case -ERESTARTSYS:
+ case -ERESTARTNOINTR:
+ case -ERESTART_RESTARTBLOCK:
+ regs->regs[0] = regs->orig_x0;
+ regs->pc = restart_addr;
+ break;
+ }
+ }
+
+ /*
+ * Get the signal to deliver. When running under ptrace, at this point
+ * the debugger may change all of our registers.
+ */
+ signr = get_signal_to_deliver(&info, &ka, regs, NULL);
+ if (signr > 0) {
+ /*
+ * Depending on the signal settings, we may need to revert the
+ * decision to restart the system call, but skip this if a
+ * debugger has chosen to restart at a different PC.
+ */
+ if (regs->pc == restart_addr &&
+ (retval == -ERESTARTNOHAND ||
+ retval == -ERESTART_RESTARTBLOCK ||
+ (retval == -ERESTARTSYS &&
+ !(ka.sa.sa_flags & SA_RESTART)))) {
+ regs->regs[0] = -EINTR;
+ regs->pc = continue_addr;
+ }
+
+ handle_signal(signr, &ka, &info, regs);
+ return;
+ }
+
+ /*
+ * Handle restarting a different system call. As above, if a debugger
+ * has chosen to restart at a different PC, ignore the restart.
+ */
+ if (syscall >= 0 && regs->pc == restart_addr) {
+ if (retval == -ERESTART_RESTARTBLOCK)
+ setup_restart_syscall(regs);
+ user_rewind_single_step(current);
+ }
+
+ restore_saved_sigmask();
+}
+
+asmlinkage void do_notify_resume(struct pt_regs *regs,
+ unsigned int thread_flags)
+{
+ if (thread_flags & _TIF_SIGPENDING)
+ do_signal(regs);
+
+ if (thread_flags & _TIF_NOTIFY_RESUME) {
+ clear_thread_flag(TIF_NOTIFY_RESUME);
+ tracehook_notify_resume(regs);
+ }
+}

2012-08-14 17:54:10

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 11/31] arm64: IRQ handling

From: Marc Zyngier <[email protected]>

This patch adds the support for IRQ handling. The actual interrupt
controller will be part of a separate patch (going into
drivers/irqchip/).
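
For orientation, the expectation is that an interrupt controller driver
(matched through the intctrl_of_match table added below) installs its own
first-level decoder in the handle_arch_irq hook and then feeds decoded
interrupt numbers into handle_IRQ(). A minimal sketch of such a driver is
shown here; the example_* names, the register access and the IRQ domain
are placeholders for illustration only, not code from this series:

#include <linux/io.h>
#include <linux/irqdomain.h>
#include <linux/of.h>
#include <asm/hardirq.h>
#include <asm/irq.h>

static struct irq_domain *example_domain;	/* hypothetical controller domain */
static void __iomem *example_base;		/* hypothetical register base */

/* First-level decoder: ask the (made-up) controller what is pending. */
static void example_handle_irq(struct pt_regs *regs)
{
	u32 hwirq = readl_relaxed(example_base + 0x0c);	/* hypothetical ACK register */

	handle_IRQ(irq_find_mapping(example_domain, hwirq), regs);
}

static int __init example_intc_init(struct device_node *node,
				    struct device_node *parent)
{
	/* ... ioremap the controller, create example_domain ... */
	handle_arch_irq = example_handle_irq;
	return 0;
}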

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/hardirq.h | 47 +++++++++++++++++++
arch/arm64/include/asm/irq.h | 8 +++
arch/arm64/include/asm/irqflags.h | 91 +++++++++++++++++++++++++++++++++++++
arch/arm64/kernel/irq.c | 84 ++++++++++++++++++++++++++++++++++
4 files changed, 230 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/hardirq.h
create mode 100644 arch/arm64/include/asm/irq.h
create mode 100644 arch/arm64/include/asm/irqflags.h
create mode 100644 arch/arm64/kernel/irq.c

diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
new file mode 100644
index 0000000..c6c9514
--- /dev/null
+++ b/arch/arm64/include/asm/hardirq.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_HARDIRQ_H
+#define __ASM_HARDIRQ_H
+
+#include <linux/cache.h>
+#include <linux/threads.h>
+#include <asm/irq.h>
+
+typedef struct {
+ unsigned int __softirq_pending;
+} ____cacheline_aligned irq_cpustat_t;
+
+#include <linux/irq_cpustat.h> /* Standard mappings for irq_cpustat_t above */
+
+#define __inc_irq_stat(cpu, member) __IRQ_STAT(cpu, member)++
+#define __get_irq_stat(cpu, member) __IRQ_STAT(cpu, member)
+
+#ifdef CONFIG_SMP
+u64 smp_irq_stat_cpu(unsigned int cpu);
+#define arch_irq_stat_cpu smp_irq_stat_cpu
+#endif
+
+#define __ARCH_IRQ_EXIT_IRQS_DISABLED 1
+
+static inline void ack_bad_irq(unsigned int irq)
+{
+ extern unsigned long irq_err_count;
+ irq_err_count++;
+}
+
+extern void handle_IRQ(unsigned int, struct pt_regs *);
+
+#endif /* __ASM_HARDIRQ_H */
diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
new file mode 100644
index 0000000..a4e1cad
--- /dev/null
+++ b/arch/arm64/include/asm/irq.h
@@ -0,0 +1,8 @@
+#ifndef __ASM_IRQ_H
+#define __ASM_IRQ_H
+
+#include <asm-generic/irq.h>
+
+extern void (*handle_arch_irq)(struct pt_regs *);
+
+#endif
diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
new file mode 100644
index 0000000..aa11943
--- /dev/null
+++ b/arch/arm64/include/asm/irqflags.h
@@ -0,0 +1,91 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_IRQFLAGS_H
+#define __ASM_IRQFLAGS_H
+
+#ifdef __KERNEL__
+
+#include <asm/ptrace.h>
+
+/*
+ * CPU interrupt mask handling.
+ */
+static inline unsigned long arch_local_irq_save(void)
+{
+ unsigned long flags;
+ asm volatile(
+ "mrs %0, daif // arch_local_irq_save\n"
+ "msr daifset, #2"
+ : "=r" (flags)
+ :
+ : "memory");
+ return flags;
+}
+
+static inline void arch_local_irq_enable(void)
+{
+ asm volatile(
+ "msr daifclr, #2 // arch_local_irq_enable"
+ :
+ :
+ : "memory");
+}
+
+static inline void arch_local_irq_disable(void)
+{
+ asm volatile(
+ "msr daifset, #2 // arch_local_irq_disable"
+ :
+ :
+ : "memory");
+}
+
+#define local_fiq_enable() asm("msr daifclr, #1" : : : "memory")
+#define local_fiq_disable() asm("msr daifset, #1" : : : "memory")
+
+/*
+ * Save the current interrupt enable state.
+ */
+static inline unsigned long arch_local_save_flags(void)
+{
+ unsigned long flags;
+ asm volatile(
+ "mrs %0, daif // arch_local_save_flags"
+ : "=r" (flags)
+ :
+ : "memory");
+ return flags;
+}
+
+/*
+ * restore saved IRQ state
+ */
+static inline void arch_local_irq_restore(unsigned long flags)
+{
+ asm volatile(
+ "msr daif, %0 // arch_local_irq_restore"
+ :
+ : "r" (flags)
+ : "memory");
+}
+
+static inline int arch_irqs_disabled_flags(unsigned long flags)
+{
+ return flags & PSR_I_BIT;
+}
+
+#endif
+#endif
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
new file mode 100644
index 0000000..d346241
--- /dev/null
+++ b/arch/arm64/kernel/irq.c
@@ -0,0 +1,84 @@
+/*
+ * Based on arch/arm/kernel/irq.c
+ *
+ * Copyright (C) 1992 Linus Torvalds
+ * Modifications for ARM processor Copyright (C) 1995-2000 Russell King.
+ * Support for Dynamic Tick Timer Copyright (C) 2004-2005 Nokia Corporation.
+ * Dynamic Tick Timer written by Tony Lindgren <[email protected]> and
+ * Tuukka Tikkanen <[email protected]>.
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kernel_stat.h>
+#include <linux/irq.h>
+#include <linux/smp.h>
+#include <linux/init.h>
+#include <linux/of_irq.h>
+#include <linux/seq_file.h>
+
+unsigned long irq_err_count;
+
+int arch_show_interrupts(struct seq_file *p, int prec)
+{
+#ifdef CONFIG_SMP
+ show_ipi_list(p, prec);
+#endif
+ seq_printf(p, "%*s: %10lu\n", prec, "Err", irq_err_count);
+ return 0;
+}
+
+/*
+ * handle_IRQ handles all hardware IRQs. Decoded IRQs should
+ * not come via this function. Instead, they should provide their
+ * own 'handler'. Used by platform code implementing C-based 1st
+ * level decoding.
+ */
+void handle_IRQ(unsigned int irq, struct pt_regs *regs)
+{
+ struct pt_regs *old_regs = set_irq_regs(regs);
+
+ irq_enter();
+
+ /*
+ * Some hardware gives randomly wrong interrupts. Rather
+ * than crashing, do something sensible.
+ */
+ if (unlikely(irq >= nr_irqs)) {
+ if (printk_ratelimit())
+ pr_warning("Bad IRQ%u\n", irq);
+ ack_bad_irq(irq);
+ } else {
+ generic_handle_irq(irq);
+ }
+
+ irq_exit();
+ set_irq_regs(old_regs);
+}
+
+/*
+ * Interrupt controllers supported by the kernel.
+ */
+static const struct of_device_id intctrl_of_match[] __initconst = {
+ /* IRQ controllers { .compatible, .data } info to go here */
+ {}
+};
+
+void __init init_IRQ(void)
+{
+ of_irq_init(intctrl_of_match);
+
+ if (!handle_arch_irq)
+ panic("No interrupt controller found.");
+}

2012-08-14 17:54:08

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 16/31] arm64: ELF definitions

This patch adds definitions for the ELF format, including personality
setting and EXEC_PAGESIZE. There are only two hwcap definitions for
64-bit applications - HWCAP_FP and HWCAP_ASIMD.
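
For reference, a 64-bit application sees these capabilities through the
ELF auxiliary vector. A minimal userspace sketch (assuming a libc that
provides getauxval(); the two flag values are copied from the asm/hwcap.h
added below rather than taken from a uapi header):

#include <elf.h>
#include <stdio.h>
#include <sys/auxv.h>

#define HWCAP_FP	(1 << 0)	/* as defined in asm/hwcap.h */
#define HWCAP_ASIMD	(1 << 1)

int main(void)
{
	unsigned long hwcap = getauxval(AT_HWCAP);

	printf("FP:    %s\n", (hwcap & HWCAP_FP) ? "yes" : "no");
	printf("ASIMD: %s\n", (hwcap & HWCAP_ASIMD) ? "yes" : "no");
	return 0;
}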

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/auxvec.h | 22 +++++
arch/arm64/include/asm/elf.h | 176 +++++++++++++++++++++++++++++++++++++
arch/arm64/include/asm/hwcap.h | 57 ++++++++++++
arch/arm64/include/asm/param.h | 23 +++++
arch/arm64/include/asm/shmparam.h | 28 ++++++
arch/arm64/kernel/elf.c | 41 +++++++++
6 files changed, 347 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/auxvec.h
create mode 100644 arch/arm64/include/asm/elf.h
create mode 100644 arch/arm64/include/asm/hwcap.h
create mode 100644 arch/arm64/include/asm/param.h
create mode 100644 arch/arm64/include/asm/shmparam.h
create mode 100644 arch/arm64/kernel/elf.c

diff --git a/arch/arm64/include/asm/auxvec.h b/arch/arm64/include/asm/auxvec.h
new file mode 100644
index 0000000..22d6d88
--- /dev/null
+++ b/arch/arm64/include/asm/auxvec.h
@@ -0,0 +1,22 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_AUXVEC_H
+#define __ASM_AUXVEC_H
+
+/* vDSO location */
+#define AT_SYSINFO_EHDR 33
+
+#endif
diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
new file mode 100644
index 0000000..9d62a7a
--- /dev/null
+++ b/arch/arm64/include/asm/elf.h
@@ -0,0 +1,176 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_ELF_H
+#define __ASM_ELF_H
+
+#include <asm/hwcap.h>
+
+/*
+ * ELF register definitions..
+ */
+#include <asm/ptrace.h>
+#include <asm/user.h>
+
+typedef unsigned long elf_greg_t;
+typedef unsigned long elf_freg_t[3];
+
+#define ELF_NGREG (sizeof (struct pt_regs) / sizeof(elf_greg_t))
+typedef elf_greg_t elf_gregset_t[ELF_NGREG];
+
+typedef struct user_fp elf_fpregset_t;
+
+#define EM_AARCH64 183
+
+/*
+ * AArch64 static relocation types.
+ */
+
+/* Miscellaneous. */
+#define R_ARM_NONE 0
+#define R_AARCH64_NONE 256
+
+/* Data. */
+#define R_AARCH64_ABS64 257
+#define R_AARCH64_ABS32 258
+#define R_AARCH64_ABS16 259
+#define R_AARCH64_PREL64 260
+#define R_AARCH64_PREL32 261
+#define R_AARCH64_PREL16 262
+
+/* Instructions. */
+#define R_AARCH64_MOVW_UABS_G0 263
+#define R_AARCH64_MOVW_UABS_G0_NC 264
+#define R_AARCH64_MOVW_UABS_G1 265
+#define R_AARCH64_MOVW_UABS_G1_NC 266
+#define R_AARCH64_MOVW_UABS_G2 267
+#define R_AARCH64_MOVW_UABS_G2_NC 268
+#define R_AARCH64_MOVW_UABS_G3 269
+
+#define R_AARCH64_MOVW_SABS_G0 270
+#define R_AARCH64_MOVW_SABS_G1 271
+#define R_AARCH64_MOVW_SABS_G2 272
+
+#define R_AARCH64_LD_PREL_LO19 273
+#define R_AARCH64_ADR_PREL_LO21 274
+#define R_AARCH64_ADR_PREL_PG_HI21 275
+#define R_AARCH64_ADR_PREL_PG_HI21_NC 276
+#define R_AARCH64_ADD_ABS_LO12_NC 277
+#define R_AARCH64_LDST8_ABS_LO12_NC 278
+
+#define R_AARCH64_TSTBR14 279
+#define R_AARCH64_CONDBR19 280
+#define R_AARCH64_JUMP26 282
+#define R_AARCH64_CALL26 283
+#define R_AARCH64_LDST16_ABS_LO12_NC 284
+#define R_AARCH64_LDST32_ABS_LO12_NC 285
+#define R_AARCH64_LDST64_ABS_LO12_NC 286
+#define R_AARCH64_LDST128_ABS_LO12_NC 299
+
+#define R_AARCH64_MOVW_PREL_G0 287
+#define R_AARCH64_MOVW_PREL_G0_NC 288
+#define R_AARCH64_MOVW_PREL_G1 289
+#define R_AARCH64_MOVW_PREL_G1_NC 290
+#define R_AARCH64_MOVW_PREL_G2 291
+#define R_AARCH64_MOVW_PREL_G2_NC 292
+#define R_AARCH64_MOVW_PREL_G3 293
+
+
+/*
+ * These are used to set parameters in the core dumps.
+ */
+#define ELF_CLASS ELFCLASS64
+#define ELF_DATA ELFDATA2LSB
+#define ELF_ARCH EM_AARCH64
+
+#define ELF_PLATFORM_SIZE 16
+#define ELF_PLATFORM ("aarch64")
+
+/*
+ * This is used to ensure we don't load something for the wrong architecture.
+ */
+#define elf_check_arch(x) ((x)->e_machine == EM_AARCH64)
+
+#define elf_read_implies_exec(ex,stk) (stk != EXSTACK_DISABLE_X)
+
+#define CORE_DUMP_USE_REGSET
+#define ELF_EXEC_PAGESIZE PAGE_SIZE
+
+/*
+ * This is the location that an ET_DYN program is loaded if exec'ed. Typical
+ * use of this is to invoke "./ld.so someprog" to test out a new version of
+ * the loader. We need to make sure that it is out of the way of the program
+ * that it will "exec", and that there is sufficient room for the brk.
+ */
+extern unsigned long randomize_et_dyn(unsigned long base);
+#define ELF_ET_DYN_BASE (randomize_et_dyn(2 * TASK_SIZE_64 / 3))
+
+/*
+ * When the program starts, x0 contains a pointer to a function to be
+ * registered with atexit, as per the SVR4 ABI. A value of 0 means we have no
+ * such handler.
+ */
+#define ELF_PLAT_INIT(_r, load_addr) (_r)->regs[0] = 0
+
+extern void elf_set_personality(int personality);
+#define SET_PERSONALITY(ex) elf_set_personality(PER_LINUX)
+
+#define ARCH_DLINFO \
+do { \
+ NEW_AUX_ENT(AT_SYSINFO_EHDR, \
+ (elf_addr_t)current->mm->context.vdso); \
+} while (0)
+
+#define ARCH_HAS_SETUP_ADDITIONAL_PAGES
+struct linux_binprm;
+extern int arch_setup_additional_pages(struct linux_binprm *bprm,
+ int uses_interp);
+
+/* 1GB of VA */
+#define STACK_RND_MASK (test_thread_flag(TIF_32BIT) ? \
+ 0x7ff >> (PAGE_SHIFT - 12) : \
+ 0x3ffff >> (PAGE_SHIFT - 12))
+
+struct mm_struct;
+extern unsigned long arch_randomize_brk(struct mm_struct *mm);
+#define arch_randomize_brk arch_randomize_brk
+
+#ifdef CONFIG_AARCH32_EMULATION
+#define EM_ARM 40
+#define COMPAT_ELF_PLATFORM ("v8l")
+
+#define COMPAT_ELF_ET_DYN_BASE (randomize_et_dyn(2 * TASK_SIZE_32 / 3))
+
+/* AArch32 registers. */
+#define COMPAT_ELF_NGREG 18
+typedef unsigned int compat_elf_greg_t;
+typedef compat_elf_greg_t compat_elf_gregset_t[COMPAT_ELF_NGREG];
+
+/* AArch32 EABI. */
+#define EF_ARM_EABI_MASK 0xff000000
+#define compat_elf_check_arch(x) (((x)->e_machine == EM_ARM) && \
+ ((x)->e_flags & EF_ARM_EABI_MASK))
+
+#define compat_start_thread compat_start_thread
+#define COMPAT_SET_PERSONALITY(ex) elf_set_personality(PER_LINUX32)
+#define COMPAT_ARCH_DLINFO
+extern int aarch32_setup_vectors_page(struct linux_binprm *bprm,
+ int uses_interp);
+#define compat_arch_setup_additional_pages \
+ aarch32_setup_vectors_page
+
+#endif /* CONFIG_AARCH32_EMULATION */
+
+#endif
diff --git a/arch/arm64/include/asm/hwcap.h b/arch/arm64/include/asm/hwcap.h
new file mode 100644
index 0000000..0cc7c03
--- /dev/null
+++ b/arch/arm64/include/asm/hwcap.h
@@ -0,0 +1,57 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_HWCAP_H
+#define __ASM_HWCAP_H
+
+/*
+ * HWCAP flags - for elf_hwcap (in kernel) and AT_HWCAP
+ */
+#define HWCAP_FP (1 << 0)
+#define HWCAP_ASIMD (1 << 1)
+
+#ifdef CONFIG_AARCH32_EMULATION
+#define COMPAT_HWCAP_HALF (1 << 1)
+#define COMPAT_HWCAP_THUMB (1 << 2)
+#define COMPAT_HWCAP_FAST_MULT (1 << 4)
+#define COMPAT_HWCAP_VFP (1 << 6)
+#define COMPAT_HWCAP_EDSP (1 << 7)
+#define COMPAT_HWCAP_NEON (1 << 12)
+#define COMPAT_HWCAP_VFPv3 (1 << 13)
+#define COMPAT_HWCAP_TLS (1 << 15)
+#define COMPAT_HWCAP_VFPv4 (1 << 16)
+#define COMPAT_HWCAP_IDIVA (1 << 17)
+#define COMPAT_HWCAP_IDIVT (1 << 18)
+#define COMPAT_HWCAP_IDIV (COMPAT_HWCAP_IDIVA|COMPAT_HWCAP_IDIVT)
+
+#endif /* CONFIG_AARCH32_EMULATION */
+
+#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
+/*
+ * This yields a mask that user programs can use to figure out what
+ * instruction set this cpu supports.
+ */
+#define ELF_HWCAP (elf_hwcap)
+#ifdef CONFIG_AARCH32_EMULATION
+#define COMPAT_ELF_HWCAP (COMPAT_HWCAP_HALF|COMPAT_HWCAP_THUMB|\
+ COMPAT_HWCAP_FAST_MULT|COMPAT_HWCAP_EDSP|\
+ COMPAT_HWCAP_TLS|COMPAT_HWCAP_VFP|\
+ COMPAT_HWCAP_VFPv3|COMPAT_HWCAP_VFPv4|\
+ COMPAT_HWCAP_NEON|COMPAT_HWCAP_IDIV)
+#endif
+extern unsigned int elf_hwcap;
+#endif
+
+#endif
diff --git a/arch/arm64/include/asm/param.h b/arch/arm64/include/asm/param.h
new file mode 100644
index 0000000..8e3a281
--- /dev/null
+++ b/arch/arm64/include/asm/param.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PARAM_H
+#define __ASM_PARAM_H
+
+#define EXEC_PAGESIZE 65536
+
+#include <asm-generic/param.h>
+
+#endif
diff --git a/arch/arm64/include/asm/shmparam.h b/arch/arm64/include/asm/shmparam.h
new file mode 100644
index 0000000..4df608a
--- /dev/null
+++ b/arch/arm64/include/asm/shmparam.h
@@ -0,0 +1,28 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SHMPARAM_H
+#define __ASM_SHMPARAM_H
+
+/*
+ * For IPC syscalls from compat tasks, we need to use the legacy 16k
+ * alignment value. Since we don't have aliasing D-caches, the rest of
+ * the time we can safely use PAGE_SIZE.
+ */
+#define COMPAT_SHMLBA 0x4000
+
+#include <asm-generic/shmparam.h>
+
+#endif /* __ASM_SHMPARAM_H */
diff --git a/arch/arm64/kernel/elf.c b/arch/arm64/kernel/elf.c
new file mode 100644
index 0000000..6f98076
--- /dev/null
+++ b/arch/arm64/kernel/elf.c
@@ -0,0 +1,41 @@
+/*
+ * ELF personality setting
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/export.h>
+#include <linux/sched.h>
+#include <linux/personality.h>
+#include <linux/binfmts.h>
+#include <linux/elf.h>
+
+void elf_set_personality(int personality)
+{
+ switch (personality & PER_MASK) {
+ case PER_LINUX:
+ clear_thread_flag(TIF_32BIT);
+ break;
+ case PER_LINUX32:
+ set_thread_flag(TIF_32BIT);
+ break;
+ default:
+ pr_warning("Process %s tried to assume unknown personality %d\n",
+ current->comm, personality);
+ return;
+ }
+
+ current->personality = personality;
+}
+EXPORT_SYMBOL(elf_set_personality);

2012-08-14 17:54:06

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 17/31] arm64: System calls handling

This patch adds support for system calls coming from 64-bit
applications. It uses the asm-generic/unistd.h definitions with the
canonical set of system calls. The private system calls are only used
for 32-bit (compat) applications as 64-bit ones can set the TLS and
flush the caches entirely from user space.

The sys_call_table is just an array defined in a C file and it contains
pointers to the syscall functions. The array is 4KB aligned to allow the
use of the ADRP instruction (longer range ADR) in entry.S.
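
To make the dispatch concrete, this is roughly what the entry.S code does
once ADRP has produced the (page-aligned) table base, rendered as C. This
is an illustrative sketch only: the real assembly passes the arguments in
registers rather than through an array, and syscall_fn_t/invoke_syscall
are names made up here, not symbols from this series.

#include <linux/errno.h>
#include <asm/unistd.h>

typedef long (*syscall_fn_t)(unsigned long, unsigned long, unsigned long,
			     unsigned long, unsigned long, unsigned long);

extern void *sys_call_table[];		/* __NR_syscalls entries, 4K aligned */

static long invoke_syscall(unsigned int scno, const unsigned long args[6])
{
	syscall_fn_t fn;

	if (scno >= __NR_syscalls)
		return -ENOSYS;		/* unused slots default to sys_ni_syscall */

	fn = (syscall_fn_t)sys_call_table[scno];
	return fn(args[0], args[1], args[2], args[3], args[4], args[5]);
}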

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/stat.h | 63 +++++++++++++++++
arch/arm64/include/asm/statfs.h | 23 ++++++
arch/arm64/include/asm/syscalls.h | 40 +++++++++++
arch/arm64/include/asm/unistd.h | 27 +++++++
arch/arm64/kernel/sys.c | 138 +++++++++++++++++++++++++++++++++++++
5 files changed, 291 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/stat.h
create mode 100644 arch/arm64/include/asm/statfs.h
create mode 100644 arch/arm64/include/asm/syscalls.h
create mode 100644 arch/arm64/include/asm/unistd.h
create mode 100644 arch/arm64/kernel/sys.c

diff --git a/arch/arm64/include/asm/stat.h b/arch/arm64/include/asm/stat.h
new file mode 100644
index 0000000..f63a680
--- /dev/null
+++ b/arch/arm64/include/asm/stat.h
@@ -0,0 +1,63 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_STAT_H
+#define __ASM_STAT_H
+
+#include <asm-generic/stat.h>
+
+#if defined(__KERNEL__) && defined(CONFIG_AARCH32_EMULATION)
+
+#include <asm/compat.h>
+
+/* This matches struct stat64 in glibc2.1, hence the absolutely
+ * insane amounts of padding around dev_t's.
+ * Note: The kernel zeroes the padded region because glibc might read it
+ * in the hope that the kernel has stretched to using larger sizes.
+ */
+struct stat64 {
+ compat_u64 st_dev;
+ unsigned char __pad0[4];
+
+#define STAT64_HAS_BROKEN_ST_INO 1
+ compat_ulong_t __st_ino;
+ compat_uint_t st_mode;
+ compat_uint_t st_nlink;
+
+ compat_ulong_t st_uid;
+ compat_ulong_t st_gid;
+
+ compat_u64 st_rdev;
+ unsigned char __pad3[4];
+
+ compat_s64 st_size;
+ compat_ulong_t st_blksize;
+ compat_u64 st_blocks; /* Number of 512-byte blocks allocated. */
+
+ compat_ulong_t st_atime;
+ compat_ulong_t st_atime_nsec;
+
+ compat_ulong_t st_mtime;
+ compat_ulong_t st_mtime_nsec;
+
+ compat_ulong_t st_ctime;
+ compat_ulong_t st_ctime_nsec;
+
+ compat_u64 st_ino;
+};
+
+#endif
+
+#endif
diff --git a/arch/arm64/include/asm/statfs.h b/arch/arm64/include/asm/statfs.h
new file mode 100644
index 0000000..6f62190
--- /dev/null
+++ b/arch/arm64/include/asm/statfs.h
@@ -0,0 +1,23 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_STATFS_H
+#define __ASM_STATFS_H
+
+#define ARCH_PACK_COMPAT_STATFS64 __attribute__((packed,aligned(4)))
+
+#include <asm-generic/statfs.h>
+
+#endif
diff --git a/arch/arm64/include/asm/syscalls.h b/arch/arm64/include/asm/syscalls.h
new file mode 100644
index 0000000..09ff335
--- /dev/null
+++ b/arch/arm64/include/asm/syscalls.h
@@ -0,0 +1,40 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SYSCALLS_H
+#define __ASM_SYSCALLS_H
+
+#include <linux/linkage.h>
+#include <linux/compiler.h>
+#include <linux/signal.h>
+
+/*
+ * System call wrappers implemented in kernel/entry.S.
+ */
+asmlinkage long sys_execve_wrapper(const char __user *filename,
+ const char __user *const __user *argv,
+ const char __user *const __user *envp);
+asmlinkage long sys_clone_wrapper(unsigned long clone_flags,
+ unsigned long newsp,
+ void __user *parent_tid,
+ unsigned long tls_val,
+ void __user *child_tid);
+asmlinkage long sys_rt_sigreturn_wrapper(void);
+asmlinkage long sys_sigaltstack_wrapper(const stack_t __user *uss,
+ stack_t __user *uoss);
+
+#include <asm-generic/syscalls.h>
+
+#endif /* __ASM_SYSCALLS_H */
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
new file mode 100644
index 0000000..b00718c
--- /dev/null
+++ b/arch/arm64/include/asm/unistd.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#if !defined(__ASM_UNISTD_H) || defined(__SYSCALL)
+#define __ASM_UNISTD_H
+
+#ifndef __SYSCALL_COMPAT
+#include <asm-generic/unistd.h>
+#endif
+
+#if defined(__KERNEL__) && defined(CONFIG_AARCH32_EMULATION)
+#include <asm/unistd32.h>
+#endif
+
+#endif /* __ASM_UNISTD_H */
diff --git a/arch/arm64/kernel/sys.c b/arch/arm64/kernel/sys.c
new file mode 100644
index 0000000..905fcfb
--- /dev/null
+++ b/arch/arm64/kernel/sys.c
@@ -0,0 +1,138 @@
+/*
+ * AArch64-specific system calls implementation
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Catalin Marinas <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/compiler.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/export.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+
+/*
+ * Clone a task - this clones the calling program thread.
+ */
+asmlinkage long sys_clone(unsigned long clone_flags, unsigned long newsp,
+ int __user *parent_tidptr, unsigned long tls_val,
+ int __user *child_tidptr, struct pt_regs *regs)
+{
+ if (!newsp)
+ newsp = regs->sp;
+ /* 16-byte aligned stack mandatory on AArch64 */
+ if (newsp & 15)
+ return -EINVAL;
+ return do_fork(clone_flags, newsp, regs, 0, parent_tidptr, child_tidptr);
+}
+
+/*
+ * sys_execve() executes a new program.
+ */
+asmlinkage long sys_execve(const char __user *filenamei,
+ const char __user *const __user *argv,
+ const char __user *const __user *envp,
+ struct pt_regs *regs)
+{
+ long error;
+ char * filename;
+
+ filename = getname(filenamei);
+ error = PTR_ERR(filename);
+ if (IS_ERR(filename))
+ goto out;
+ error = do_execve(filename, argv, envp, regs);
+ putname(filename);
+out:
+ return error;
+}
+
+int kernel_execve(const char *filename,
+ const char *const argv[],
+ const char *const envp[])
+{
+ struct pt_regs regs;
+ int ret;
+
+ memset(&regs, 0, sizeof(struct pt_regs));
+ ret = do_execve(filename,
+ (const char __user *const __user *)argv,
+ (const char __user *const __user *)envp, &regs);
+ if (ret < 0)
+ goto out;
+
+ /*
+ * Save argc to the register structure for userspace.
+ */
+ regs.regs[0] = ret;
+
+ /*
+ * We were successful. We won't be returning to our caller, but
+ * instead to user space by manipulating the kernel stack.
+ */
+ asm( "add x0, %0, %1\n\t"
+ "mov x1, %2\n\t"
+ "mov x2, %3\n\t"
+ "bl memmove\n\t" /* copy regs to top of stack */
+ "mov x27, #0\n\t" /* not a syscall */
+ "mov x28, %0\n\t" /* thread structure */
+ "mov sp, x0\n\t" /* reposition stack pointer */
+ "b ret_to_user"
+ :
+ : "r" (current_thread_info()),
+ "Ir" (THREAD_START_SP - sizeof(regs)),
+ "r" (&regs),
+ "Ir" (sizeof(regs))
+ : "x0", "x1", "x2", "x27", "x28", "x30", "memory");
+
+ out:
+ return ret;
+}
+EXPORT_SYMBOL(kernel_execve);
+
+asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
+ unsigned long prot, unsigned long flags,
+ unsigned long fd, off_t off)
+{
+ if (offset_in_page(off) != 0)
+ return -EINVAL;
+
+ return sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+}
+
+/*
+ * Wrappers to pass the pt_regs argument.
+ */
+#define sys_execve sys_execve_wrapper
+#define sys_clone sys_clone_wrapper
+#define sys_rt_sigreturn sys_rt_sigreturn_wrapper
+#define sys_sigaltstack sys_sigaltstack_wrapper
+
+#include <asm/syscalls.h>
+
+#undef __SYSCALL
+#define __SYSCALL(nr, sym) [nr] = sym,
+
+/*
+ * The sys_call_table array must be 4K aligned to be accessible from
+ * kernel/entry.S.
+ */
+void *sys_call_table[__NR_syscalls] __aligned(4096) = {
+ [0 ... __NR_syscalls - 1] = sys_ni_syscall,
+#include <asm/unistd.h>
+};

2012-08-14 17:54:04

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 18/31] arm64: VDSO support

From: Will Deacon <[email protected]>

This patch adds VDSO support for 64-bit applications. The VDSO code is
currently used for sys_rt_sigreturn() and optimised gettimeofday()
(using the user-accessible generic counter).
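
From userspace the vDSO is largely transparent: a libc that knows about
the AArch64 vDSO resolves gettimeofday()/clock_gettime() to the
__kernel_* entry points, and those entry points fall back to the real
syscall themselves when the architected counter cannot be used. A quick
sketch showing the mapping being located via the auxiliary vector
(assumes getauxval() from the C library):

#include <elf.h>
#include <stdio.h>
#include <sys/auxv.h>
#include <time.h>

int main(void)
{
	Elf64_Ehdr *vdso = (Elf64_Ehdr *)getauxval(AT_SYSINFO_EHDR);
	struct timespec ts;

	printf("vDSO mapped at %p\n", (void *)vdso);

	/* Typically serviced by __kernel_clock_gettime without trapping. */
	clock_gettime(CLOCK_MONOTONIC, &ts);
	printf("monotonic: %ld.%09ld\n", (long)ts.tv_sec, ts.tv_nsec);
	return 0;
}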

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/vdso.h | 41 +++++
arch/arm64/include/asm/vdso_datapage.h | 43 +++++
arch/arm64/kernel/vdso.c | 261 ++++++++++++++++++++++++++++
arch/arm64/kernel/vdso/.gitignore | 2 +
arch/arm64/kernel/vdso/Makefile | 63 +++++++
arch/arm64/kernel/vdso/gen_vdso_offsets.sh | 15 ++
arch/arm64/kernel/vdso/gettimeofday.S | 242 ++++++++++++++++++++++++++
arch/arm64/kernel/vdso/note.S | 28 +++
arch/arm64/kernel/vdso/sigreturn.S | 37 ++++
arch/arm64/kernel/vdso/vdso.S | 33 ++++
arch/arm64/kernel/vdso/vdso.lds.S | 100 +++++++++++
11 files changed, 865 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/vdso.h
create mode 100644 arch/arm64/include/asm/vdso_datapage.h
create mode 100644 arch/arm64/kernel/vdso.c
create mode 100644 arch/arm64/kernel/vdso/.gitignore
create mode 100644 arch/arm64/kernel/vdso/Makefile
create mode 100755 arch/arm64/kernel/vdso/gen_vdso_offsets.sh
create mode 100644 arch/arm64/kernel/vdso/gettimeofday.S
create mode 100644 arch/arm64/kernel/vdso/note.S
create mode 100644 arch/arm64/kernel/vdso/sigreturn.S
create mode 100644 arch/arm64/kernel/vdso/vdso.S
create mode 100644 arch/arm64/kernel/vdso/vdso.lds.S

diff --git a/arch/arm64/include/asm/vdso.h b/arch/arm64/include/asm/vdso.h
new file mode 100644
index 0000000..839ce00
--- /dev/null
+++ b/arch/arm64/include/asm/vdso.h
@@ -0,0 +1,41 @@
+/*
+ * Copyright (C) 2012 ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_VDSO_H
+#define __ASM_VDSO_H
+
+#ifdef __KERNEL__
+
+/*
+ * Default link address for the vDSO.
+ * Since we randomise the VDSO mapping, there's little point in trying
+ * to prelink this.
+ */
+#define VDSO_LBASE 0x0
+
+#ifndef __ASSEMBLY__
+
+#include <generated/vdso-offsets.h>
+
+#define VDSO_SYMBOL(base, name) \
+({ \
+ (void *)(vdso_offset_##name - VDSO_LBASE + (unsigned long)(base)); \
+})
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __KERNEL__ */
+
+#endif /* __ASM_VDSO_H */
diff --git a/arch/arm64/include/asm/vdso_datapage.h b/arch/arm64/include/asm/vdso_datapage.h
new file mode 100644
index 0000000..de66199
--- /dev/null
+++ b/arch/arm64/include/asm/vdso_datapage.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright (C) 2012 ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_VDSO_DATAPAGE_H
+#define __ASM_VDSO_DATAPAGE_H
+
+#ifdef __KERNEL__
+
+#ifndef __ASSEMBLY__
+
+struct vdso_data {
+ __u64 cs_cycle_last; /* Timebase at clocksource init */
+ __u64 xtime_clock_sec; /* Kernel time */
+ __u64 xtime_clock_nsec;
+ __u64 xtime_coarse_sec; /* Coarse time */
+ __u64 xtime_coarse_nsec;
+ __u64 wtm_clock_sec; /* Wall to monotonic time */
+ __u64 wtm_clock_nsec;
+ __u32 tb_seq_count; /* Timebase sequence counter */
+ __u32 cs_mult; /* Clocksource multiplier */
+ __u32 cs_shift; /* Clocksource shift */
+ __u32 tz_minuteswest; /* Whacky timezone stuff */
+ __u32 tz_dsttime;
+ __u32 use_syscall;
+};
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __KERNEL__ */
+
+#endif /* __ASM_VDSO_DATAPAGE_H */
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
new file mode 100644
index 0000000..8d8a365
--- /dev/null
+++ b/arch/arm64/kernel/vdso.c
@@ -0,0 +1,261 @@
+/*
+ * VDSO implementation for AArch64 and vector page setup for AArch32.
+ *
+ * Copyright (C) 2012 ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Will Deacon <[email protected]>
+ */
+
+#include <linux/kernel.h>
+#include <linux/clocksource.h>
+#include <linux/elf.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/sched.h>
+#include <linux/signal.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+
+#include <asm/cacheflush.h>
+#include <asm/signal32.h>
+#include <asm/vdso.h>
+#include <asm/vdso_datapage.h>
+
+extern char vdso_start, vdso_end;
+static unsigned long vdso_pages;
+static struct page **vdso_pagelist;
+
+/*
+ * The vDSO data page.
+ */
+static union {
+ struct vdso_data data;
+ u8 page[PAGE_SIZE];
+} vdso_data_store __page_aligned_data;
+struct vdso_data *vdso_data = &vdso_data_store.data;
+
+#ifdef CONFIG_AARCH32_EMULATION
+/*
+ * Create and map the vectors page for AArch32 tasks.
+ */
+static struct page *vectors_page[1];
+
+static int alloc_vectors_page(void)
+{
+ extern char __kuser_helper_start[], __kuser_helper_end[];
+ int kuser_sz = __kuser_helper_end - __kuser_helper_start;
+ unsigned long vpage;
+
+ vpage = get_zeroed_page(GFP_ATOMIC);
+
+ if (!vpage)
+ return -ENOMEM;
+
+ /* kuser helpers */
+ memcpy((void *)vpage + 0x1000 - kuser_sz, __kuser_helper_start,
+ kuser_sz);
+
+ /* sigreturn code */
+ memcpy((void *)vpage + AARCH32_KERN_SIGRET_CODE_OFFSET,
+ aarch32_sigret_code, sizeof(aarch32_sigret_code));
+
+ flush_icache_range(vpage, vpage + PAGE_SIZE);
+ vectors_page[0] = virt_to_page(vpage);
+
+ return 0;
+}
+arch_initcall(alloc_vectors_page);
+
+int aarch32_setup_vectors_page(struct linux_binprm *bprm, int uses_interp)
+{
+ struct mm_struct *mm = current->mm;
+ unsigned long addr = AARCH32_VECTORS_BASE;
+ int ret;
+
+ down_write(&mm->mmap_sem);
+ current->mm->context.vdso = (void *)addr;
+
+ /* Map vectors page at the high address. */
+ ret = install_special_mapping(mm, addr, PAGE_SIZE,
+ VM_READ|VM_EXEC|VM_MAYREAD|VM_MAYEXEC,
+ vectors_page);
+
+ up_write(&mm->mmap_sem);
+
+ return ret;
+}
+#endif /* CONFIG_AARCH32_EMULATION */
+
+static int __init vdso_init(void)
+{
+ struct page *pg;
+ char *vbase;
+ int i, ret = 0;
+
+ vdso_pages = (&vdso_end - &vdso_start) >> PAGE_SHIFT;
+ pr_info("vdso: %ld pages (%ld code, %ld data) at base %p\n",
+ vdso_pages + 1, vdso_pages, 1L, &vdso_start);
+
+ /* Allocate the vDSO pagelist, plus a page for the data. */
+ vdso_pagelist = kzalloc(sizeof(struct page *) * (vdso_pages + 1),
+ GFP_KERNEL);
+ if (vdso_pagelist == NULL) {
+ pr_err("Failed to allocate vDSO pagelist!\n");
+ return -ENOMEM;
+ }
+
+ /* Grab the vDSO code pages. */
+ for (i = 0; i < vdso_pages; i++) {
+ pg = virt_to_page(&vdso_start + i*PAGE_SIZE);
+ ClearPageReserved(pg);
+ get_page(pg);
+ vdso_pagelist[i] = pg;
+ }
+
+ /* Sanity check the shared object header. */
+ vbase = vmap(vdso_pagelist, 1, 0, PAGE_KERNEL);
+ if (vbase == NULL) {
+ pr_err("Failed to map vDSO pagelist!\n");
+ return -ENOMEM;
+ } else if (memcmp(vbase, "\177ELF", 4)) {
+ pr_err("vDSO is not a valid ELF object!\n");
+ ret = -EINVAL;
+ goto unmap;
+ }
+
+ /* Grab the vDSO data page. */
+ pg = virt_to_page(vdso_data);
+ get_page(pg);
+ vdso_pagelist[i] = pg;
+
+unmap:
+ vunmap(vbase);
+ return ret;
+}
+arch_initcall(vdso_init);
+
+int arch_setup_additional_pages(struct linux_binprm *bprm,
+ int uses_interp)
+{
+ struct mm_struct *mm = current->mm;
+ unsigned long vdso_base, vdso_mapping_len;
+ int ret;
+
+ /* Be sure to map the data page */
+ vdso_mapping_len = (vdso_pages + 1) << PAGE_SHIFT;
+
+ down_write(&mm->mmap_sem);
+ vdso_base = get_unmapped_area(NULL, 0, vdso_mapping_len, 0, 0);
+ if (IS_ERR_VALUE(vdso_base)) {
+ ret = vdso_base;
+ goto up_fail;
+ }
+ mm->context.vdso = (void *)vdso_base;
+
+ ret = install_special_mapping(mm, vdso_base, vdso_mapping_len,
+ VM_READ|VM_EXEC|
+ VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ vdso_pagelist);
+ if (ret) {
+ mm->context.vdso = NULL;
+ goto up_fail;
+ }
+
+up_fail:
+ up_write(&mm->mmap_sem);
+
+ return ret;
+}
+
+const char *arch_vma_name(struct vm_area_struct *vma)
+{
+ /*
+ * We can re-use the vdso pointer in mm_context_t for identifying
+ * the vectors page for compat applications. The vDSO will always
+ * sit above TASK_UNMAPPED_BASE and so we don't need to worry about
+ * it conflicting with the vectors base.
+ */
+ if (vma->vm_mm && vma->vm_start == (long)vma->vm_mm->context.vdso) {
+#ifdef CONFIG_AARCH32_EMULATION
+ if (vma->vm_start == AARCH32_VECTORS_BASE)
+ return "[vectors]";
+#endif
+ return "[vdso]";
+ }
+
+ return NULL;
+}
+
+/*
+ * We define AT_SYSINFO_EHDR, so we need these function stubs to keep
+ * Linux happy.
+ */
+int in_gate_area_no_mm(unsigned long addr)
+{
+ return 0;
+}
+
+int in_gate_area(struct mm_struct *mm, unsigned long addr)
+{
+ return 0;
+}
+
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
+{
+ return NULL;
+}
+
+/*
+ * Update the vDSO data page to keep in sync with kernel timekeeping.
+ */
+void update_vsyscall(struct timespec *ts, struct timespec *wtm,
+ struct clocksource *clock, u32 mult)
+{
+ struct timespec xtime_coarse;
+ u32 use_syscall = strcmp(clock->name, "arch_sys_counter");
+
+ ++vdso_data->tb_seq_count;
+ smp_wmb();
+
+ xtime_coarse = __current_kernel_time();
+ vdso_data->use_syscall = use_syscall;
+ vdso_data->xtime_coarse_sec = xtime_coarse.tv_sec;
+ vdso_data->xtime_coarse_nsec = xtime_coarse.tv_nsec;
+
+ if (!use_syscall) {
+ vdso_data->cs_cycle_last = clock->cycle_last;
+ vdso_data->xtime_clock_sec = ts->tv_sec;
+ vdso_data->xtime_clock_nsec = ts->tv_nsec;
+ vdso_data->cs_mult = mult;
+ vdso_data->cs_shift = clock->shift;
+ vdso_data->wtm_clock_sec = wtm->tv_sec;
+ vdso_data->wtm_clock_nsec = wtm->tv_nsec;
+ }
+
+ smp_wmb();
+ ++vdso_data->tb_seq_count;
+}
+
+void update_vsyscall_tz(void)
+{
+ ++vdso_data->tb_seq_count;
+ smp_wmb();
+ vdso_data->tz_minuteswest = sys_tz.tz_minuteswest;
+ vdso_data->tz_dsttime = sys_tz.tz_dsttime;
+ smp_wmb();
+ ++vdso_data->tb_seq_count;
+}
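
Note that update_vsyscall() above is the writer half of a seqlock-style
protocol on tb_seq_count (the count is odd while an update is in flight);
the reader half is implemented by the seqcnt_* macros in gettimeofday.S
further down in this patch. A rough C rendition of the reader side
(illustrative only; the function name is made up and the real code is
the assembly below):

static void example_read_timekeeping(const volatile struct vdso_data *vd,
				     u64 *cycle_last, u64 *sec, u64 *nsec)
{
	u32 seq;

	do {
		/* Wait for an even count: no update in progress. */
		while ((seq = vd->tb_seq_count) & 1)
			;
		smp_rmb();		/* pairs with the writer's smp_wmb() */
		*cycle_last = vd->cs_cycle_last;
		*sec = vd->xtime_clock_sec;
		*nsec = vd->xtime_clock_nsec;
		smp_rmb();
	} while (vd->tb_seq_count != seq);	/* retry if the writer ran */
}
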
diff --git a/arch/arm64/kernel/vdso/.gitignore b/arch/arm64/kernel/vdso/.gitignore
new file mode 100644
index 0000000..b8cc94e
--- /dev/null
+++ b/arch/arm64/kernel/vdso/.gitignore
@@ -0,0 +1,2 @@
+vdso.lds
+vdso-offsets.h
diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
new file mode 100644
index 0000000..d8064af
--- /dev/null
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -0,0 +1,63 @@
+#
+# Building a vDSO image for AArch64.
+#
+# Author: Will Deacon <[email protected]>
+# Heavily based on the vDSO Makefiles for other archs.
+#
+
+obj-vdso := gettimeofday.o note.o sigreturn.o
+
+# Build rules
+targets := $(obj-vdso) vdso.so vdso.so.dbg
+obj-vdso := $(addprefix $(obj)/, $(obj-vdso))
+
+ccflags-y := -shared -fno-common -fno-builtin
+ccflags-y += -nostdlib -Wl,-soname=linux-vdso.so.1 \
+ $(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
+
+obj-y += vdso.o
+extra-y += vdso.lds vdso-offsets.h
+CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
+
+# Force dependency (incbin is bad)
+$(obj)/vdso.o : $(obj)/vdso.so
+
+# Link rule for the .so file, .lds has to be first
+$(obj)/vdso.so.dbg: $(src)/vdso.lds $(obj-vdso)
+ $(call if_changed,vdsold)
+
+# Strip rule for the .so file
+$(obj)/%.so: OBJCOPYFLAGS := -S
+$(obj)/%.so: $(obj)/%.so.dbg FORCE
+ $(call if_changed,objcopy)
+
+# Generate VDSO offsets using helper script
+gen-vdsosym := $(srctree)/$(src)/gen_vdso_offsets.sh
+quiet_cmd_vdsosym = VDSOSYM $@
+define cmd_vdsosym
+ $(NM) $< | $(gen-vdsosym) | LC_ALL=C sort > $@ && \
+ cp $@ include/generated/
+endef
+
+$(obj)/vdso-offsets.h: $(obj)/vdso.so.dbg FORCE
+ $(call if_changed,vdsosym)
+
+# Assembly rules for the .S files
+$(obj-vdso): %.o: %.S
+ $(call if_changed_dep,vdsoas)
+
+# Actual build commands
+quiet_cmd_vdsold = VDSOL $@
+ cmd_vdsold = $(CC) $(c_flags) -Wl,-T $^ -o $@
+quiet_cmd_vdsoas = VDSOA $@
+ cmd_vdsoas = $(CC) $(a_flags) -c -o $@ $<
+
+# Install commands for the unstripped file
+quiet_cmd_vdso_install = INSTALL $@
cmd_vdso_install = cp $(obj)/$@.dbg $(MODLIB)/vdso/$@
+
+vdso.so: $(obj)/vdso.so.dbg
+ @mkdir -p $(MODLIB)/vdso
+ $(call cmd,vdso_install)
+
+vdso_install: vdso.so
diff --git a/arch/arm64/kernel/vdso/gen_vdso_offsets.sh b/arch/arm64/kernel/vdso/gen_vdso_offsets.sh
new file mode 100755
index 0000000..01924ff
--- /dev/null
+++ b/arch/arm64/kernel/vdso/gen_vdso_offsets.sh
@@ -0,0 +1,15 @@
+#!/bin/sh
+
+#
+# Match symbols in the DSO that look like VDSO_*; produce a header file
+# of constant offsets into the shared object.
+#
+# Doing this inside the Makefile will break the $(filter-out) function,
+# causing Kbuild to rebuild the vdso-offsets header file every time.
+#
+# Author: Will Deacon <[email protected]>
+#
+
+LC_ALL=C
+sed -n -e 's/^00*/0/' -e \
+'s/^\([0-9a-fA-F]*\) . VDSO_\([a-zA-Z0-9_]*\)$/\#define vdso_offset_\2\t0x\1/p'
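
The output lands in include/generated/vdso-offsets.h and is consumed by
the VDSO_SYMBOL() macro in asm/vdso.h; for the sigreturn trampoline it
produces a line of the following shape (the offset value shown is
illustrative and depends on the final link):

#define vdso_offset_sigtramp	0x0440
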
diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S
new file mode 100644
index 0000000..dcb8c20
--- /dev/null
+++ b/arch/arm64/kernel/vdso/gettimeofday.S
@@ -0,0 +1,242 @@
+/*
+ * Userspace implementations of gettimeofday() and friends.
+ *
+ * Copyright (C) 2012 ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Will Deacon <[email protected]>
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm-offsets.h>
+#include <asm/unistd.h>
+
+#define NSEC_PER_SEC_LO16 0xca00
+#define NSEC_PER_SEC_HI16 0x3b9a
+
+vdso_data .req x6
+use_syscall .req w7
+seqcnt .req w8
+
+ .macro seqcnt_acquire
+9999: ldr seqcnt, [vdso_data, #VDSO_TB_SEQ_COUNT]
+ tbnz seqcnt, #0, 9999b
+ dmb ishld
+ ldr use_syscall, [vdso_data, #VDSO_USE_SYSCALL]
+ .endm
+
+ .macro seqcnt_read, cnt
+ dmb ishld
+ ldr \cnt, [vdso_data, #VDSO_TB_SEQ_COUNT]
+ .endm
+
+ .macro seqcnt_check, cnt, fail
+ cmp \cnt, seqcnt
+ b.ne \fail
+ .endm
+
+ .text
+
+/* int __kernel_gettimeofday(struct timeval *tv, struct timezone *tz); */
+ENTRY(__kernel_gettimeofday)
+ .cfi_startproc
+ mov x2, x30
+ .cfi_register x30, x2
+
+ /* Acquire the sequence counter and get the timespec. */
+ adr vdso_data, _vdso_data
+1: seqcnt_acquire
+ cbnz use_syscall, 4f
+
+ /* If tv is NULL, skip to the timezone code. */
+ cbz x0, 2f
+ bl __do_get_tspec
+ seqcnt_check w13, 1b
+
+ /* Convert ns to us. */
+ mov x11, #1000
+ udiv x10, x10, x11
+ stp x9, x10, [x0, #TVAL_TV_SEC]
+2:
+ /* If tz is NULL, return 0. */
+ cbz x1, 3f
+ ldp w4, w5, [vdso_data, #VDSO_TZ_MINWEST]
+ seqcnt_read w13
+ seqcnt_check w13, 1b
+ stp w4, w5, [x1, #TZ_MINWEST]
+3:
+ mov x0, xzr
+ ret x2
+4:
+ /* Syscall fallback. */
+ mov x8, #__NR_gettimeofday
+ svc #0
+ ret x2
+ .cfi_endproc
+ENDPROC(__kernel_gettimeofday)
+
+/* int __kernel_clock_gettime(clockid_t clock_id, struct timespec *tp); */
+ENTRY(__kernel_clock_gettime)
+ .cfi_startproc
+ cmp w0, #CLOCK_REALTIME
+ ccmp w0, #CLOCK_MONOTONIC, #0x4, ne
+ b.ne 2f
+
+ mov x2, x30
+ .cfi_register x30, x2
+
+ /* Get kernel timespec. */
+ adr vdso_data, _vdso_data
+1: seqcnt_acquire
+ cbnz use_syscall, 7f
+
+ bl __do_get_tspec
+ seqcnt_check w13, 1b
+
+ cmp w0, #CLOCK_MONOTONIC
+ b.ne 6f
+
+ /* Get wtm timespec. */
+ ldp x14, x15, [vdso_data, #VDSO_WTM_CLK_SEC]
+
+ /* Check the sequence counter. */
+ seqcnt_read w13
+ seqcnt_check w13, 1b
+ b 4f
+2:
+ cmp w0, #CLOCK_REALTIME_COARSE
+ ccmp w0, #CLOCK_MONOTONIC_COARSE, #0x4, ne
+ b.ne 8f
+
+ /* Get coarse timespec. */
+ adr vdso_data, _vdso_data
+3: seqcnt_acquire
+ ldp x9, x10, [vdso_data, #VDSO_XTIME_CRS_SEC]
+
+ cmp w0, #CLOCK_MONOTONIC_COARSE
+ b.ne 6f
+
+ /* Get wtm timespec. */
+ ldp x14, x15, [vdso_data, #VDSO_WTM_CLK_SEC]
+
+ /* Check the sequence counter. */
+ seqcnt_read w13
+ seqcnt_check w13, 3b
+4:
+ /* Add on wtm timespec. */
+ add x9, x9, x14
+ add x10, x10, x15
+
+ /* Normalise the new timespec. */
+ mov x14, #NSEC_PER_SEC_LO16
+ movk x14, #NSEC_PER_SEC_HI16, lsl #16
+ cmp x10, x14
+ b.lt 5f
+ sub x10, x10, x14
+ add x9, x9, #1
+5:
+ cmp x10, #0
+ b.ge 6f
+ add x10, x10, x14
+ sub x9, x9, #1
+
+6: /* Store to the user timespec. */
+ stp x9, x10, [x1, #TSPEC_TV_SEC]
+ mov x0, xzr
+ ret x2
+7:
+ mov x30, x2
+8: /* Syscall fallback. */
+ mov x8, #__NR_clock_gettime
+ svc #0
+ ret
+ .cfi_endproc
+ENDPROC(__kernel_clock_gettime)
+
+/* int __kernel_clock_getres(clockid_t clock_id, struct timespec *res); */
+ENTRY(__kernel_clock_getres)
+ .cfi_startproc
+ cbz w1, 3f
+
+ cmp w0, #CLOCK_REALTIME
+ ccmp w0, #CLOCK_MONOTONIC, #0x4, ne
+ b.ne 1f
+
+ ldr x2, 5f
+ b 2f
+1:
+ cmp w0, #CLOCK_REALTIME_COARSE
+ ccmp w0, #CLOCK_MONOTONIC_COARSE, #0x4, ne
+ b.ne 4f
+ ldr x2, 6f
+2:
+ stp xzr, x2, [x1]
+
+3: /* res == NULL. */
+ mov w0, wzr
+ ret
+
+4: /* Syscall fallback. */
+ mov x8, #__NR_clock_getres
+ svc #0
+ ret
+5:
+ .quad CLOCK_REALTIME_RES
+6:
+ .quad CLOCK_COARSE_RES
+ .cfi_endproc
+ENDPROC(__kernel_clock_getres)
+
+/*
+ * Read the current time from the architected counter.
+ * Expects vdso_data to be initialised.
+ * Clobbers the temporary registers (x9 - x15).
+ * Returns:
+ * - (x9, x10) = (ts->tv_sec, ts->tv_nsec)
+ * - (x11, x12) = (xtime->tv_sec, xtime->tv_nsec)
+ * - w13 = vDSO sequence counter
+ */
+ENTRY(__do_get_tspec)
+ .cfi_startproc
+
+ /* Read from the vDSO data page. */
+ ldr x10, [vdso_data, #VDSO_CS_CYCLE_LAST]
+ ldp x11, x12, [vdso_data, #VDSO_XTIME_CLK_SEC]
+ ldp w14, w15, [vdso_data, #VDSO_CS_MULT]
+ seqcnt_read w13
+
+ /* Read the physical counter. */
+ isb
+ mrs x9, cntpct_el0
+
+ /* Calculate cycle delta and convert to ns. */
+ sub x10, x9, x10
+ /* We can only guarantee 56 bits of precision. */
+ movn x9, #0xff00, lsl #48
+ and x10, x9, x10
+ mul x10, x10, x14
+ lsr x10, x10, x15
+
+ /* Use the kernel time to calculate the new timespec. */
+ add x10, x12, x10
+ mov x14, #NSEC_PER_SEC_LO16
+ movk x14, #NSEC_PER_SEC_HI16, lsl #16
+ udiv x15, x10, x14
+ add x9, x15, x11
+ mul x14, x14, x15
+ sub x10, x10, x14
+
+ ret
+ .cfi_endproc
+ENDPROC(__do_get_tspec)
diff --git a/arch/arm64/kernel/vdso/note.S b/arch/arm64/kernel/vdso/note.S
new file mode 100644
index 0000000..b82c85e
--- /dev/null
+++ b/arch/arm64/kernel/vdso/note.S
@@ -0,0 +1,28 @@
+/*
+ * Copyright (C) 2012 ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Will Deacon <[email protected]>
+ *
+ * This supplies .note.* sections to go into the PT_NOTE inside the vDSO text.
+ * Here we can supply some information useful to userland.
+ */
+
+#include <linux/uts.h>
+#include <linux/version.h>
+#include <linux/elfnote.h>
+
+ELFNOTE_START(Linux, 0, "a")
+ .long LINUX_VERSION_CODE
+ELFNOTE_END
diff --git a/arch/arm64/kernel/vdso/sigreturn.S b/arch/arm64/kernel/vdso/sigreturn.S
new file mode 100644
index 0000000..20d98ef
--- /dev/null
+++ b/arch/arm64/kernel/vdso/sigreturn.S
@@ -0,0 +1,37 @@
+/*
+ * Sigreturn trampoline for returning from a signal when the SA_RESTORER
+ * flag is not set.
+ *
+ * Copyright (C) 2012 ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Will Deacon <[email protected]>
+ */
+
+#include <linux/linkage.h>
+#include <asm/unistd.h>
+
+ .text
+
+ nop
+ENTRY(__kernel_rt_sigreturn)
+ .cfi_startproc
+ .cfi_signal_frame
+ .cfi_def_cfa x29, 0
+ .cfi_offset x29, 0 * 8
+ .cfi_offset x30, 1 * 8
+ mov x8, #__NR_rt_sigreturn
+ svc #0
+ .cfi_endproc
+ENDPROC(__kernel_rt_sigreturn)
diff --git a/arch/arm64/kernel/vdso/vdso.S b/arch/arm64/kernel/vdso/vdso.S
new file mode 100644
index 0000000..60c1db5
--- /dev/null
+++ b/arch/arm64/kernel/vdso/vdso.S
@@ -0,0 +1,33 @@
+/*
+ * Copyright (C) 2012 ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Will Deacon <[email protected]>
+ */
+
+#include <linux/init.h>
+#include <linux/linkage.h>
+#include <linux/const.h>
+#include <asm/page.h>
+
+ __PAGE_ALIGNED_DATA
+
+ .globl vdso_start, vdso_end
+ .balign PAGE_SIZE
+vdso_start:
+ .incbin "arch/arm64/kernel/vdso/vdso.so"
+ .balign PAGE_SIZE
+vdso_end:
+
+ .previous
diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S
new file mode 100644
index 0000000..8154b8d
--- /dev/null
+++ b/arch/arm64/kernel/vdso/vdso.lds.S
@@ -0,0 +1,100 @@
+/*
+ * GNU linker script for the VDSO library.
+ *
+ * Copyright (C) 2012 ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Author: Will Deacon <[email protected]>
+ * Heavily based on the vDSO linker scripts for other archs.
+ */
+
+#include <linux/const.h>
+#include <asm/page.h>
+#include <asm/vdso.h>
+
+OUTPUT_FORMAT("elf64-littleaarch64", "elf64-bigaarch64", "elf64-littleaarch64")
+OUTPUT_ARCH(aarch64)
+
+SECTIONS
+{
+ . = VDSO_LBASE + SIZEOF_HEADERS;
+
+ .hash : { *(.hash) } :text
+ .gnu.hash : { *(.gnu.hash) }
+ .dynsym : { *(.dynsym) }
+ .dynstr : { *(.dynstr) }
+ .gnu.version : { *(.gnu.version) }
+ .gnu.version_d : { *(.gnu.version_d) }
+ .gnu.version_r : { *(.gnu.version_r) }
+
+ .note : { *(.note.*) } :text :note
+
+ . = ALIGN(16);
+
+ .text : { *(.text*) } :text =0xd503201f
+ PROVIDE (__etext = .);
+ PROVIDE (_etext = .);
+ PROVIDE (etext = .);
+
+ .eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr
+ .eh_frame : { KEEP (*(.eh_frame)) } :text
+
+ .dynamic : { *(.dynamic) } :text :dynamic
+
+ .rodata : { *(.rodata*) } :text
+
+ _end = .;
+ PROVIDE(end = .);
+
+ . = ALIGN(PAGE_SIZE);
+ PROVIDE(_vdso_data = .);
+
+ /DISCARD/ : {
+ *(.note.GNU-stack)
+ *(.data .data.* .gnu.linkonce.d.* .sdata*)
+ *(.bss .sbss .dynbss .dynsbss)
+ }
+}
+
+/*
+ * We must supply the ELF program headers explicitly to get just one
+ * PT_LOAD segment, and set the flags explicitly to make segments read-only.
+ */
+PHDRS
+{
+ text PT_LOAD FLAGS(5) FILEHDR PHDRS; /* PF_R|PF_X */
+ dynamic PT_DYNAMIC FLAGS(4); /* PF_R */
+ note PT_NOTE FLAGS(4); /* PF_R */
+ eh_frame_hdr PT_GNU_EH_FRAME;
+}
+
+/*
+ * This controls what symbols we export from the DSO.
+ */
+VERSION
+{
+ LINUX_2.6.39 {
+ global:
+ __kernel_rt_sigreturn;
+ __kernel_gettimeofday;
+ __kernel_clock_gettime;
+ __kernel_clock_getres;
+ local: *;
+ };
+}
+
+/*
+ * Make the sigreturn code visible to the kernel.
+ */
+VDSO_sigtramp = __kernel_rt_sigreturn;

2012-08-14 17:54:01

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 15/31] arm64: SMP support

This patch adds SMP initialisation and the spinlock implementation for
AArch64. The spinlock support uses the new load-acquire/store-release
instructions to avoid explicit barriers. The architecture also specifies
that an event is automatically generated when the exclusive monitor is
cleared, waking up any processors waiting in WFE, so no explicit DSB/SEV
instruction sequence is needed. The SEVL instruction is used to prime the
local event register before the first WFE, since there is no conditional
WFE and a branch would be more expensive.

For the SMP booting protocol, see Documentation/arm64/booting.txt.
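
As an aside (not part of the patch), the spin-table handshake used below
can be pictured from the secondary CPU's side roughly as follows. This is
only an illustrative C sketch under the assumption of a firmware/boot-loader
parking loop; the names are placeholders, not kernel or firmware APIs:

  /*
   * Illustrative sketch only: what a secondary CPU parked by the boot
   * loader is assumed to do. It polls the 64-bit release address that
   * the device tree advertises via "cpu-release-addr" until the kernel
   * (smp_prepare_cpus() below) writes the physical address of
   * secondary_holding_pen there, then branches to it. WFE keeps the
   * CPU quiescent between polls; the kernel's sev() wakes it up.
   */
  typedef void (*secondary_entry_fn)(void);

  static void spin_table_secondary_wait(volatile unsigned long *release_addr)
  {
          unsigned long entry;

          while ((entry = *release_addr) == 0)
                  asm volatile("wfe" ::: "memory");

          ((secondary_entry_fn)entry)();
  }

In the patch itself the same handshake continues in secondary_holding_pen
and in boot_secondary()/write_pen_release() in smp.c.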

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/hardirq.h | 5 +
arch/arm64/include/asm/smp.h | 69 +++++
arch/arm64/include/asm/spinlock.h | 199 +++++++++++++
arch/arm64/include/asm/spinlock_types.h | 38 +++
arch/arm64/kernel/smp.c | 469 +++++++++++++++++++++++++++++++
5 files changed, 780 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/smp.h
create mode 100644 arch/arm64/include/asm/spinlock.h
create mode 100644 arch/arm64/include/asm/spinlock_types.h
create mode 100644 arch/arm64/kernel/smp.c

diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
index c6c9514..5075463 100644
--- a/arch/arm64/include/asm/hardirq.h
+++ b/arch/arm64/include/asm/hardirq.h
@@ -20,8 +20,13 @@
#include <linux/threads.h>
#include <asm/irq.h>

+#define NR_IPI 4
+
typedef struct {
unsigned int __softirq_pending;
+#ifdef CONFIG_SMP
+ unsigned int ipi_irqs[NR_IPI];
+#endif
} ____cacheline_aligned irq_cpustat_t;

#include <linux/irq_cpustat.h> /* Standard mappings for irq_cpustat_t above */
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
new file mode 100644
index 0000000..7e34295
--- /dev/null
+++ b/arch/arm64/include/asm/smp.h
@@ -0,0 +1,69 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SMP_H
+#define __ASM_SMP_H
+
+#include <linux/threads.h>
+#include <linux/cpumask.h>
+#include <linux/thread_info.h>
+
+#ifndef CONFIG_SMP
+# error "<asm/smp.h> included in non-SMP build"
+#endif
+
+#define raw_smp_processor_id() (current_thread_info()->cpu)
+
+struct seq_file;
+
+/*
+ * generate IPI list text
+ */
+extern void show_ipi_list(struct seq_file *p, int prec);
+
+/*
+ * Called from C code, this handles an IPI.
+ */
+extern void handle_IPI(int ipinr, struct pt_regs *regs);
+
+/*
+ * Setup the set of possible CPUs (via set_cpu_possible)
+ */
+extern void smp_init_cpus(void);
+
+/*
+ * Provide a function to raise an IPI cross call on CPUs in callmap.
+ */
+extern void set_smp_cross_call(void (*)(const struct cpumask *, unsigned int));
+
+/*
+ * Called from the secondary holding pen, this is the secondary CPU entry point.
+ */
+asmlinkage void secondary_start_kernel(void);
+
+/*
+ * Initial data for bringing up a secondary CPU.
+ */
+struct secondary_data {
+ void *stack;
+};
+extern struct secondary_data secondary_data;
+extern void secondary_holding_pen(void);
+extern volatile unsigned long secondary_holding_pen_release;
+
+extern void arch_send_call_function_single_ipi(int cpu);
+extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
+
+#endif /* ifndef __ASM_SMP_H */
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
new file mode 100644
index 0000000..34a37fb
--- /dev/null
+++ b/arch/arm64/include/asm/spinlock.h
@@ -0,0 +1,199 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SPINLOCK_H
+#define __ASM_SPINLOCK_H
+
+#include <asm/spinlock_types.h>
+#include <asm/processor.h>
+
+/*
+ * AArch64 Spin-locking.
+ *
+ * We exclusively read the old value. If it is zero, we may have
+ * won the lock, so we try exclusively storing it. AArch64 memory is
+ * weakly ordered, so the load-acquire/store-release instructions used
+ * here provide the ordering needed when taking and releasing the lock.
+ *
+ * Unlocked value: 0
+ * Locked value: 1
+ */
+
+#define arch_spin_is_locked(x) ((x)->lock != 0)
+#define arch_spin_unlock_wait(lock) \
+ do { while (arch_spin_is_locked(lock)) cpu_relax(); } while (0)
+
+#define arch_spin_lock_flags(lock, flags) arch_spin_lock(lock)
+
+static inline void arch_spin_lock(arch_spinlock_t *lock)
+{
+ unsigned int tmp;
+
+ asm volatile(
+ " sevl\n"
+ "1: wfe\n"
+ "2: ldaxr %w0, [%1]\n"
+ " cbnz %w0, 1b\n"
+ " stxr %w0, %w2, [%1]\n"
+ " cbnz %w0, 2b\n"
+ : "=&r" (tmp)
+ : "r" (&lock->lock), "r" (1)
+ : "memory");
+}
+
+static inline int arch_spin_trylock(arch_spinlock_t *lock)
+{
+ unsigned int tmp;
+
+ asm volatile(
+ " ldaxr %w0, [%1]\n"
+ " cbnz %w0, 1f\n"
+ " stxr %w0, %w2, [%1]\n"
+ "1:\n"
+ : "=&r" (tmp)
+ : "r" (&lock->lock), "r" (1)
+ : "memory");
+
+ return !tmp;
+}
+
+static inline void arch_spin_unlock(arch_spinlock_t *lock)
+{
+ asm volatile(
+ " stlr %w1, [%0]\n"
+ : : "r" (&lock->lock), "r" (0) : "memory");
+}
+
+/*
+ * RWLOCKS
+ *
+ *
+ * Write locks are easy - we just set bit 31. When unlocking, we can
+ * just write zero since the lock is exclusively held.
+ */
+
+static inline void arch_write_lock(arch_rwlock_t *rw)
+{
+ unsigned int tmp;
+
+ asm volatile(
+ " sevl\n"
+ "1: wfe\n"
+ "2: ldaxr %w0, [%1]\n"
+ " cbnz %w0, 1b\n"
+ " stxr %w0, %w2, [%1]\n"
+ " cbnz %w0, 2b\n"
+ : "=&r" (tmp)
+ : "r" (&rw->lock), "r" (0x80000000)
+ : "memory");
+}
+
+static inline int arch_write_trylock(arch_rwlock_t *rw)
+{
+ unsigned int tmp;
+
+ asm volatile(
+ " ldaxr %w0, [%1]\n"
+ " cbnz %w0, 1f\n"
+ " stxr %w0, %w2, [%1]\n"
+ "1:\n"
+ : "=&r" (tmp)
+ : "r" (&rw->lock), "r" (0x80000000)
+ : "memory");
+
+ return !tmp;
+}
+
+static inline void arch_write_unlock(arch_rwlock_t *rw)
+{
+ asm volatile(
+ " stlr %w1, [%0]\n"
+ : : "r" (&rw->lock), "r" (0) : "memory");
+}
+
+/* write_can_lock - would write_trylock() succeed? */
+#define arch_write_can_lock(x) ((x)->lock == 0)
+
+/*
+ * Read locks are a bit more hairy:
+ * - Exclusively load the lock value.
+ * - Increment it.
+ * - Store new lock value if positive, and we still own this location.
+ * If the value is negative, we've already failed.
+ * - If we failed to store the value, we want a negative result.
+ * - If we failed, try again.
+ * Unlocking is similarly hairy. We may have multiple read locks
+ * currently active. However, we know we won't have any write
+ * locks.
+ */
+static inline void arch_read_lock(arch_rwlock_t *rw)
+{
+ unsigned int tmp, tmp2;
+
+ asm volatile(
+ " sevl\n"
+ "1: wfe\n"
+ "2: ldaxr %w0, [%2]\n"
+ " add %w0, %w0, #1\n"
+ " tbnz %w0, #31, 1b\n"
+ " stxr %w1, %w0, [%2]\n"
+ " cbnz %w1, 2b\n"
+ : "=&r" (tmp), "=&r" (tmp2)
+ : "r" (&rw->lock)
+ : "memory");
+}
+
+static inline void arch_read_unlock(arch_rwlock_t *rw)
+{
+ unsigned int tmp, tmp2;
+
+ asm volatile(
+ "1: ldxr %w0, [%2]\n"
+ " sub %w0, %w0, #1\n"
+ " stlxr %w1, %w0, [%2]\n"
+ " cbnz %w1, 1b\n"
+ : "=&r" (tmp), "=&r" (tmp2)
+ : "r" (&rw->lock)
+ : "memory");
+}
+
+static inline int arch_read_trylock(arch_rwlock_t *rw)
+{
+ unsigned int tmp, tmp2 = 1;
+
+ asm volatile(
+ " ldaxr %w0, [%2]\n"
+ " add %w0, %w0, #1\n"
+ " tbnz %w0, #31, 1f\n"
+ " stxr %w1, %w0, [%2]\n"
+ "1:\n"
+ : "=&r" (tmp), "+r" (tmp2)
+ : "r" (&rw->lock)
+ : "memory");
+
+ return !tmp2;
+}
+
+/* read_can_lock - would read_trylock() succeed? */
+#define arch_read_can_lock(x) ((x)->lock < 0x80000000)
+
+#define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
+#define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
+
+#define arch_spin_relax(lock) cpu_relax()
+#define arch_read_relax(lock) cpu_relax()
+#define arch_write_relax(lock) cpu_relax()
+
+#endif /* __ASM_SPINLOCK_H */
diff --git a/arch/arm64/include/asm/spinlock_types.h b/arch/arm64/include/asm/spinlock_types.h
new file mode 100644
index 0000000..9a49434
--- /dev/null
+++ b/arch/arm64/include/asm/spinlock_types.h
@@ -0,0 +1,38 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SPINLOCK_TYPES_H
+#define __ASM_SPINLOCK_TYPES_H
+
+#if !defined(__LINUX_SPINLOCK_TYPES_H) && !defined(__ASM_SPINLOCK_H)
+# error "please don't include this file directly"
+#endif
+
+/* We only require natural alignment for exclusive accesses. */
+#define __lock_aligned
+
+typedef struct {
+ volatile unsigned int lock;
+} arch_spinlock_t;
+
+#define __ARCH_SPIN_LOCK_UNLOCKED { 0 }
+
+typedef struct {
+ volatile unsigned int lock;
+} arch_rwlock_t;
+
+#define __ARCH_RW_LOCK_UNLOCKED { 0 }
+
+#endif
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
new file mode 100644
index 0000000..0b03e77
--- /dev/null
+++ b/arch/arm64/kernel/smp.c
@@ -0,0 +1,469 @@
+/*
+ * SMP initialisation and IPI support
+ * Based on arch/arm/kernel/smp.c
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/sched.h>
+#include <linux/interrupt.h>
+#include <linux/cache.h>
+#include <linux/profile.h>
+#include <linux/errno.h>
+#include <linux/mm.h>
+#include <linux/err.h>
+#include <linux/cpu.h>
+#include <linux/smp.h>
+#include <linux/seq_file.h>
+#include <linux/irq.h>
+#include <linux/percpu.h>
+#include <linux/clockchips.h>
+#include <linux/completion.h>
+#include <linux/of.h>
+
+#include <asm/atomic.h>
+#include <asm/cacheflush.h>
+#include <asm/cputype.h>
+#include <asm/mmu_context.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/processor.h>
+#include <asm/sections.h>
+#include <asm/tlbflush.h>
+#include <asm/ptrace.h>
+#include <asm/mmu_context.h>
+
+/*
+ * as from 2.5, kernels no longer have an init_tasks structure
+ * so we need some other way of telling a new secondary core
+ * where to place its kernel stack
+ */
+struct secondary_data secondary_data;
+volatile unsigned long secondary_holding_pen_release;
+
+enum ipi_msg_type {
+ IPI_RESCHEDULE,
+ IPI_CALL_FUNC,
+ IPI_CALL_FUNC_SINGLE,
+ IPI_CPU_STOP,
+};
+
+static DEFINE_SPINLOCK(boot_lock);
+
+/*
+ * Write secondary_holding_pen_release in a way that is guaranteed to be
+ * visible to all observers, irrespective of whether they're taking part
+ * in coherency or not. This is necessary for the hotplug code to work
+ * reliably.
+ */
+static void __cpuinit write_pen_release(int val)
+{
+ void *start = (void *)&secondary_holding_pen_release;
+ unsigned long size = sizeof(secondary_holding_pen_release);
+
+ secondary_holding_pen_release = val;
+ __cpuc_flush_dcache_area(start, size);
+}
+
+/*
+ * Boot a secondary CPU, and assign it the specified idle task.
+ * This also gives us the initial stack to use for this CPU.
+ */
+static int __cpuinit boot_secondary(unsigned int cpu, struct task_struct *idle)
+{
+ unsigned long timeout;
+
+ /*
+ * Set synchronisation state between this boot processor
+ * and the secondary one
+ */
+ spin_lock(&boot_lock);
+
+ /*
+ * Update the pen release flag.
+ */
+ write_pen_release(cpu);
+
+ /*
+ * Send an event, causing the secondaries to read pen_release.
+ */
+ sev();
+
+ timeout = jiffies + (1 * HZ);
+ while (time_before(jiffies, timeout)) {
+ if (secondary_holding_pen_release == -1UL)
+ break;
+ udelay(10);
+ }
+
+ /*
+ * Now the secondary core is starting up let it run its
+ * calibrations, then wait for it to finish
+ */
+ spin_unlock(&boot_lock);
+
+ return secondary_holding_pen_release != -1 ? -ENOSYS : 0;
+}
+
+static DECLARE_COMPLETION(cpu_running);
+
+int __cpuinit __cpu_up(unsigned int cpu, struct task_struct *idle)
+{
+ int ret;
+
+ /*
+ * We need to tell the secondary core where to find its stack and the
+ * page tables.
+ */
+ secondary_data.stack = task_stack_page(idle) + THREAD_START_SP;
+ __cpuc_flush_dcache_area(&secondary_data, sizeof(secondary_data));
+
+ /*
+ * Now bring the CPU into our world.
+ */
+ ret = boot_secondary(cpu, idle);
+ if (ret == 0) {
+ /*
+ * CPU was successfully started, wait for it to come online or
+ * time out.
+ */
+ wait_for_completion_timeout(&cpu_running,
+ msecs_to_jiffies(1000));
+
+ if (!cpu_online(cpu)) {
+ pr_crit("CPU%u: failed to come online\n", cpu);
+ ret = -EIO;
+ }
+ } else {
+ pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
+ }
+
+ secondary_data.stack = NULL;
+
+ return ret;
+}
+
+/*
+ * This is the secondary CPU boot entry. We're using this CPUs
+ * idle thread stack, but a set of temporary page tables.
+ */
+asmlinkage void __cpuinit secondary_start_kernel(void)
+{
+ struct mm_struct *mm = &init_mm;
+ unsigned int cpu = smp_processor_id();
+
+ printk("CPU%u: Booted secondary processor\n", cpu);
+
+ /*
+ * All kernel threads share the same mm context; grab a
+ * reference and switch to it.
+ */
+ atomic_inc(&mm->mm_count);
+ current->active_mm = mm;
+ cpumask_set_cpu(cpu, mm_cpumask(mm));
+
+ /*
+ * TTBR0 is only used for the identity mapping at this stage. Make it
+ * point to zero page to avoid speculatively fetching new entries.
+ */
+ cpu_set_reserved_ttbr0();
+ flush_tlb_all();
+
+ preempt_disable();
+ trace_hardirqs_off();
+
+ /*
+ * Let the primary processor know we're out of the
+ * pen, then head off into the C entry point
+ */
+ write_pen_release(-1);
+
+ /*
+ * Synchronise with the boot thread.
+ */
+ spin_lock(&boot_lock);
+ spin_unlock(&boot_lock);
+
+ /*
+ * Enable local interrupts.
+ */
+ notify_cpu_starting(cpu);
+ local_irq_enable();
+ local_fiq_enable();
+
+ /*
+ * OK, now it's safe to let the boot CPU continue. Wait for
+ * the CPU migration code to notice that the CPU is online
+ * before we continue.
+ */
+ set_cpu_online(cpu, true);
+ while (!cpu_active(cpu))
+ cpu_relax();
+
+ /*
+ * OK, it's off to the idle thread for us
+ */
+ cpu_idle();
+}
+
+void __init smp_cpus_done(unsigned int max_cpus)
+{
+ unsigned long bogosum = loops_per_jiffy * num_online_cpus();
+
+ pr_info("SMP: Total of %d processors activated (%lu.%02lu BogoMIPS).\n",
+ num_online_cpus(), bogosum / (500000/HZ),
+ (bogosum / (5000/HZ)) % 100);
+}
+
+void __init smp_prepare_boot_cpu(void)
+{
+}
+
+static void (*smp_cross_call)(const struct cpumask *, unsigned int);
+static phys_addr_t cpu_release_addr[NR_CPUS];
+
+/*
+ * Enumerate the possible CPU set from the device tree.
+ */
+void __init smp_init_cpus(void)
+{
+ const char *enable_method;
+ struct device_node *dn = NULL;
+ int cpu = 0;
+
+ while ((dn = of_find_node_by_type(dn, "cpu"))) {
+ if (cpu >= NR_CPUS)
+ goto next;
+
+ /*
+ * We currently support only the "spin-table" enable-method.
+ */
+ enable_method = of_get_property(dn, "enable-method", NULL);
+ if (!enable_method || strcmp(enable_method, "spin-table")) {
+ pr_err("CPU %d: missing or invalid enable-method property: %s\n",
+ cpu, enable_method);
+ goto next;
+ }
+
+ /*
+ * Determine the address from which the CPU is polling.
+ */
+ if (of_property_read_u64(dn, "cpu-release-addr",
+ &cpu_release_addr[cpu])) {
+ pr_err("CPU %d: missing or invalid cpu-release-addr property\n",
+ cpu);
+ goto next;
+ }
+
+ set_cpu_possible(cpu, true);
+next:
+ cpu++;
+ }
+
+ /* sanity check */
+ if (cpu > NR_CPUS)
+ pr_warning("no. of cores (%d) greater than configured maximum of %d - clipping\n",
+ cpu, NR_CPUS);
+}
+
+void __init smp_prepare_cpus(unsigned int max_cpus)
+{
+ int cpu;
+ void **release_addr;
+ unsigned int ncores = num_possible_cpus();
+
+ /*
+ * are we trying to boot more cores than exist?
+ */
+ if (max_cpus > ncores)
+ max_cpus = ncores;
+
+ /*
+ * Initialise the present map (which describes the set of CPUs
+ * actually populated at the present time) and release the
+ * secondaries from the bootloader.
+ */
+ for_each_possible_cpu(cpu) {
+ if (max_cpus == 0)
+ break;
+
+ if (!cpu_release_addr[cpu])
+ continue;
+
+ release_addr = __va(cpu_release_addr[cpu]);
+ release_addr[0] = (void *)__pa(secondary_holding_pen);
+ __cpuc_flush_dcache_area(release_addr, sizeof(release_addr[0]));
+
+ set_cpu_present(cpu, true);
+ max_cpus--;
+ }
+
+ /*
+ * Send an event to wake up the secondaries.
+ */
+ sev();
+}
+
+
+void __init set_smp_cross_call(void (*fn)(const struct cpumask *, unsigned int))
+{
+ smp_cross_call = fn;
+}
+
+void arch_send_call_function_ipi_mask(const struct cpumask *mask)
+{
+ smp_cross_call(mask, IPI_CALL_FUNC);
+}
+
+void arch_send_call_function_single_ipi(int cpu)
+{
+ smp_cross_call(cpumask_of(cpu), IPI_CALL_FUNC_SINGLE);
+}
+
+static const char *ipi_types[NR_IPI] = {
+#define S(x,s) [x - IPI_RESCHEDULE] = s
+ S(IPI_RESCHEDULE, "Rescheduling interrupts"),
+ S(IPI_CALL_FUNC, "Function call interrupts"),
+ S(IPI_CALL_FUNC_SINGLE, "Single function call interrupts"),
+ S(IPI_CPU_STOP, "CPU stop interrupts"),
+};
+
+void show_ipi_list(struct seq_file *p, int prec)
+{
+ unsigned int cpu, i;
+
+ for (i = 0; i < NR_IPI; i++) {
+ seq_printf(p, "%*s%u:%s", prec - 1, "IPI", i + IPI_RESCHEDULE,
+ prec >= 4 ? " " : "");
+ for_each_present_cpu(cpu)
+ seq_printf(p, "%10u ",
+ __get_irq_stat(cpu, ipi_irqs[i]));
+ seq_printf(p, " %s\n", ipi_types[i]);
+ }
+}
+
+u64 smp_irq_stat_cpu(unsigned int cpu)
+{
+ u64 sum = 0;
+ int i;
+
+ for (i = 0; i < NR_IPI; i++)
+ sum += __get_irq_stat(cpu, ipi_irqs[i]);
+
+ return sum;
+}
+
+static DEFINE_SPINLOCK(stop_lock);
+
+/*
+ * ipi_cpu_stop - handle IPI from smp_send_stop()
+ */
+static void ipi_cpu_stop(unsigned int cpu)
+{
+ if (system_state == SYSTEM_BOOTING ||
+ system_state == SYSTEM_RUNNING) {
+ spin_lock(&stop_lock);
+ pr_crit("CPU%u: stopping\n", cpu);
+ dump_stack();
+ spin_unlock(&stop_lock);
+ }
+
+ set_cpu_online(cpu, false);
+
+ local_fiq_disable();
+ local_irq_disable();
+
+ while (1)
+ cpu_relax();
+}
+
+/*
+ * Main handler for inter-processor interrupts
+ */
+void handle_IPI(int ipinr, struct pt_regs *regs)
+{
+ unsigned int cpu = smp_processor_id();
+ struct pt_regs *old_regs = set_irq_regs(regs);
+
+ if (ipinr >= IPI_RESCHEDULE && ipinr < IPI_RESCHEDULE + NR_IPI)
+ __inc_irq_stat(cpu, ipi_irqs[ipinr - IPI_RESCHEDULE]);
+
+ switch (ipinr) {
+ case IPI_RESCHEDULE:
+ scheduler_ipi();
+ break;
+
+ case IPI_CALL_FUNC:
+ irq_enter();
+ generic_smp_call_function_interrupt();
+ irq_exit();
+ break;
+
+ case IPI_CALL_FUNC_SINGLE:
+ irq_enter();
+ generic_smp_call_function_single_interrupt();
+ irq_exit();
+ break;
+
+ case IPI_CPU_STOP:
+ irq_enter();
+ ipi_cpu_stop(cpu);
+ irq_exit();
+ break;
+
+ default:
+ pr_crit("CPU%u: Unknown IPI message 0x%x\n", cpu, ipinr);
+ break;
+ }
+ set_irq_regs(old_regs);
+}
+
+void smp_send_reschedule(int cpu)
+{
+ smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
+}
+
+void smp_send_stop(void)
+{
+ unsigned long timeout;
+
+ if (num_online_cpus() > 1) {
+ cpumask_t mask;
+
+ cpumask_copy(&mask, cpu_online_mask);
+ cpu_clear(smp_processor_id(), mask);
+
+ smp_cross_call(&mask, IPI_CPU_STOP);
+ }
+
+ /* Wait up to one second for other CPUs to stop */
+ timeout = USEC_PER_SEC;
+ while (num_online_cpus() > 1 && timeout--)
+ udelay(1);
+
+ if (num_online_cpus() > 1)
+ pr_warning("SMP: failed to stop secondary CPUs\n");
+}
+
+/*
+ * not supported here
+ */
+int setup_profiling_timer(unsigned int multiplier)
+{
+ return -EINVAL;
+}

2012-08-14 17:53:59

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 10/31] arm64: TLB maintenance functionality

This patch adds the TLB maintenance functions. There is no distinction
made between the I and D TLBs. TLB maintenance operations are
automatically broadcast between CPUs in hardware. The inner-shareable
operations are always present, even on UP systems.

NOTE: A large part of this patch will be dropped once Peter Z's generic
mmu_gather patches are merged.
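
For readers unfamiliar with the TLBI operand encoding used throughout this
patch, here is a minimal, illustrative C sketch (not part of the patch) of
how a per-page, per-ASID invalidation operand is formed; it mirrors
flush_tlb_page() and __cpu_flush_user_tlb_range() below:

  /*
   * Illustrative sketch only: invalidate a single user page by VA+ASID.
   * The TLBI VAE1IS operand packs VA[55:12] into the low bits and the
   * 16-bit ASID into bits [63:48], matching the encoding used below.
   */
  static inline void tlbi_user_page(unsigned long vaddr, unsigned long asid)
  {
          unsigned long arg = (vaddr >> 12) | (asid << 48);

          asm volatile(
          "       dsb     sy\n"
          "       tlbi    vae1is, %0\n"
          "       dsb     sy\n"
          : : "r" (arg) : "memory");
  }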

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/tlb.h | 190 +++++++++++++++++++++++++++++++++++++
arch/arm64/include/asm/tlbflush.h | 123 ++++++++++++++++++++++++
arch/arm64/mm/tlb.S | 71 ++++++++++++++
3 files changed, 384 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/tlb.h
create mode 100644 arch/arm64/include/asm/tlbflush.h
create mode 100644 arch/arm64/mm/tlb.S

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
new file mode 100644
index 0000000..654f096
--- /dev/null
+++ b/arch/arm64/include/asm/tlb.h
@@ -0,0 +1,190 @@
+/*
+ * Based on arch/arm/include/asm/tlb.h
+ *
+ * Copyright (C) 2002 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_TLB_H
+#define __ASM_TLB_H
+
+#include <linux/pagemap.h>
+#include <linux/swap.h>
+
+#include <asm/pgalloc.h>
+#include <asm/tlbflush.h>
+
+#define MMU_GATHER_BUNDLE 8
+
+/*
+ * TLB handling. This allows us to remove pages from the page
+ * tables, and efficiently handle the TLB issues.
+ */
+struct mmu_gather {
+ struct mm_struct *mm;
+ unsigned int fullmm;
+ struct vm_area_struct *vma;
+ unsigned long range_start;
+ unsigned long range_end;
+ unsigned int nr;
+ unsigned int max;
+ struct page **pages;
+ struct page *local[MMU_GATHER_BUNDLE];
+};
+
+/*
+ * This is unnecessarily complex. There are three ways the TLB shootdown
+ * code is used:
+ * 1. Unmapping a range of vmas. See zap_page_range(), unmap_region().
+ * tlb->fullmm = 0, and tlb_start_vma/tlb_end_vma will be called.
+ * tlb->vma will be non-NULL.
+ * 2. Unmapping all vmas. See exit_mmap().
+ * tlb->fullmm = 1, and tlb_start_vma/tlb_end_vma will be called.
+ * tlb->vma will be non-NULL. Additionally, page tables will be freed.
+ * 3. Unmapping argument pages. See shift_arg_pages().
+ * tlb->fullmm = 0, but tlb_start_vma/tlb_end_vma will not be called.
+ * tlb->vma will be NULL.
+ */
+static inline void tlb_flush(struct mmu_gather *tlb)
+{
+ if (tlb->fullmm || !tlb->vma)
+ flush_tlb_mm(tlb->mm);
+ else if (tlb->range_end > 0) {
+ flush_tlb_range(tlb->vma, tlb->range_start, tlb->range_end);
+ tlb->range_start = TASK_SIZE;
+ tlb->range_end = 0;
+ }
+}
+
+static inline void tlb_add_flush(struct mmu_gather *tlb, unsigned long addr)
+{
+ if (!tlb->fullmm) {
+ if (addr < tlb->range_start)
+ tlb->range_start = addr;
+ if (addr + PAGE_SIZE > tlb->range_end)
+ tlb->range_end = addr + PAGE_SIZE;
+ }
+}
+
+static inline void __tlb_alloc_page(struct mmu_gather *tlb)
+{
+ unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+
+ if (addr) {
+ tlb->pages = (void *)addr;
+ tlb->max = PAGE_SIZE / sizeof(struct page *);
+ }
+}
+
+static inline void tlb_flush_mmu(struct mmu_gather *tlb)
+{
+ tlb_flush(tlb);
+ free_pages_and_swap_cache(tlb->pages, tlb->nr);
+ tlb->nr = 0;
+ if (tlb->pages == tlb->local)
+ __tlb_alloc_page(tlb);
+}
+
+static inline void
+tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int fullmm)
+{
+ tlb->mm = mm;
+ tlb->fullmm = fullmm;
+ tlb->vma = NULL;
+ tlb->max = ARRAY_SIZE(tlb->local);
+ tlb->pages = tlb->local;
+ tlb->nr = 0;
+ __tlb_alloc_page(tlb);
+}
+
+static inline void
+tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
+{
+ tlb_flush_mmu(tlb);
+
+ /* keep the page table cache within bounds */
+ check_pgt_cache();
+
+ if (tlb->pages != tlb->local)
+ free_pages((unsigned long)tlb->pages, 0);
+}
+
+/*
+ * Memorize the range for the TLB flush.
+ */
+static inline void
+tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep, unsigned long addr)
+{
+ tlb_add_flush(tlb, addr);
+}
+
+/*
+ * In the case of tlb vma handling, we can optimise these away in the
+ * case where we're doing a full MM flush. When we're doing a munmap,
+ * the vmas are adjusted to only cover the region to be torn down.
+ */
+static inline void
+tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+ if (!tlb->fullmm) {
+ tlb->vma = vma;
+ tlb->range_start = TASK_SIZE;
+ tlb->range_end = 0;
+ }
+}
+
+static inline void
+tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
+{
+ if (!tlb->fullmm)
+ tlb_flush(tlb);
+}
+
+static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ tlb->pages[tlb->nr++] = page;
+ VM_BUG_ON(tlb->nr > tlb->max);
+ return tlb->max - tlb->nr;
+}
+
+static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
+{
+ if (!__tlb_remove_page(tlb, page))
+ tlb_flush_mmu(tlb);
+}
+
+static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
+ unsigned long addr)
+{
+ pgtable_page_dtor(pte);
+ tlb_add_flush(tlb, addr);
+ tlb_remove_page(tlb, pte);
+}
+
+#ifndef CONFIG_ARM64_64K_PAGES
+static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
+ unsigned long addr)
+{
+ tlb_add_flush(tlb, addr);
+ tlb_remove_page(tlb, virt_to_page(pmdp));
+}
+#endif
+
+#define pte_free_tlb(tlb, ptep, addr) __pte_free_tlb(tlb, ptep, addr)
+#define pmd_free_tlb(tlb, pmdp, addr) __pmd_free_tlb(tlb, pmdp, addr)
+#define pud_free_tlb(tlb, pudp, addr) pud_free((tlb)->mm, pudp)
+
+#define tlb_migrate_finish(mm) do { } while (0)
+
+#endif
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
new file mode 100644
index 0000000..615d131
--- /dev/null
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -0,0 +1,123 @@
+/*
+ * Based on arch/arm/include/asm/tlbflush.h
+ *
+ * Copyright (C) 1999-2003 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_TLBFLUSH_H
+#define __ASM_TLBFLUSH_H
+
+#ifndef __ASSEMBLY__
+
+#include <linux/sched.h>
+#include <asm/cputype.h>
+
+extern void __cpu_flush_user_tlb_range(unsigned long, unsigned long, struct vm_area_struct *);
+extern void __cpu_flush_kern_tlb_range(unsigned long, unsigned long);
+
+extern struct cpu_tlb_fns cpu_tlb;
+
+/*
+ * TLB Management
+ * ==============
+ *
+ * The arch/arm64/mm/tlb.S file implements these methods.
+ *
+ * The TLB specific code is expected to perform whatever tests it
+ * needs to determine if it should invalidate the TLB for each
+ * call. Start addresses are inclusive and end addresses are
+ * exclusive; it is safe to round these addresses down.
+ *
+ * flush_tlb_all()
+ *
+ * Invalidate the entire TLB.
+ *
+ * flush_tlb_mm(mm)
+ *
+ * Invalidate all TLB entries in a particular address
+ * space.
+ * - mm - mm_struct describing address space
+ *
+ * flush_tlb_range(mm,start,end)
+ *
+ * Invalidate a range of TLB entries in the specified
+ * address space.
+ * - mm - mm_struct describing address space
+ * - start - start address (may not be aligned)
+ * - end - end address (exclusive, may not be aligned)
+ *
+ * flush_tlb_page(vaddr,vma)
+ *
+ * Invalidate the specified page in the specified address range.
+ * - vaddr - virtual address (may not be aligned)
+ * - vma - vma_struct describing address range
+ *
+ * flush_kern_tlb_page(kaddr)
+ *
+ * Invalidate the TLB entry for the specified page. The address
+ * will be in the kernel's virtual memory space. Current uses
+ * only require the D-TLB to be invalidated.
+ * - kaddr - Kernel virtual memory address
+ */
+static inline void flush_tlb_all(void)
+{
+ dsb();
+ asm("tlbi vmalle1is");
+ dsb();
+ isb();
+}
+
+static inline void flush_tlb_mm(struct mm_struct *mm)
+{
+ unsigned long asid = (unsigned long)ASID(mm) << 48;
+
+ dsb();
+ asm("tlbi aside1is, %0" : : "r" (asid));
+ dsb();
+}
+
+static inline void flush_tlb_page(struct vm_area_struct *vma,
+ unsigned long uaddr)
+{
+ unsigned long addr = uaddr >> 12 |
+ ((unsigned long)ASID(vma->vm_mm) << 48);
+
+ dsb();
+ asm("tlbi vae1is, %0" : : "r" (addr));
+ dsb();
+}
+
+/*
+ * Convert calls to our calling convention.
+ */
+#define flush_tlb_range(vma,start,end) __cpu_flush_user_tlb_range(start,end,vma)
+#define flush_tlb_kernel_range(s,e) __cpu_flush_kern_tlb_range(s,e)
+
+/*
+ * On AArch64, the cache coherency is handled via the set_pte_at() function.
+ */
+static inline void update_mmu_cache(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+ /*
+ * set_pte() does not have a DSB, so make sure that the page table
+ * write is visible.
+ */
+ dsb();
+}
+
+#endif
+
+#endif
diff --git a/arch/arm64/mm/tlb.S b/arch/arm64/mm/tlb.S
new file mode 100644
index 0000000..8ae80a1
--- /dev/null
+++ b/arch/arm64/mm/tlb.S
@@ -0,0 +1,71 @@
+/*
+ * Based on arch/arm/mm/tlb.S
+ *
+ * Copyright (C) 1997-2002 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ * Written by Catalin Marinas <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+#include <asm/asm-offsets.h>
+#include <asm/page.h>
+#include <asm/tlbflush.h>
+#include "proc-macros.S"
+
+/*
+ * __cpu_flush_user_tlb_range(start, end, vma)
+ *
+ * Invalidate a range of TLB entries in the specified address space.
+ *
+ * - start - start address (may not be aligned)
+ * - end - end address (exclusive, may not be aligned)
+ * - vma - vma_struct describing address range
+ */
+ENTRY(__cpu_flush_user_tlb_range)
+ vma_vm_mm x3, x2 // get vma->vm_mm
+ mmid x3, x3 // get vm_mm->context.id
+ dsb sy
+ lsr x0, x0, #12 // align address
+ lsr x1, x1, #12
+ bfi x0, x3, #48, #16 // start VA and ASID
+ bfi x1, x3, #48, #16 // end VA and ASID
+1: tlbi vae1is, x0 // TLB invalidate by address and ASID
+ add x0, x0, #1
+ cmp x0, x1
+ b.lo 1b
+ dsb sy
+ ret
+ENDPROC(__cpu_flush_user_tlb_range)
+
+/*
+ * __cpu_flush_kern_tlb_range(start,end)
+ *
+ * Invalidate a range of kernel TLB entries.
+ *
+ * - start - start address (may not be aligned)
+ * - end - end address (exclusive, may not be aligned)
+ */
+ENTRY(__cpu_flush_kern_tlb_range)
+ dsb sy
+ lsr x0, x0, #12 // align address
+ lsr x1, x1, #12
+1: tlbi vaae1is, x0 // TLB invalidate by address
+ add x0, x0, #1
+ cmp x0, x1
+ b.lo 1b
+ dsb sy
+ isb
+ ret
+ENDPROC(__cpu_flush_kern_tlb_range)

2012-08-14 17:53:57

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 07/31] arm64: Process management

The patch adds support for thread creation and context switching. The
CPU-specific context switching code is introduced with the CPU support
patch (as part of the arch/arm64/mm/proc.S file). AArch64 supports
ASID-tagged TLBs and the ASID can be either 8 or 16 bits wide (detectable
via the ID_AA64MMFR0_EL1 register).
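
As a side note (not part of the patch), the ASID width can be probed from
the ASIDBits field, bits [7:4], of ID_AA64MMFR0_EL1; a minimal C sketch,
assuming the ARMv8 encoding of 0b0000 for 8 bits and 0b0010 for 16 bits:

  /*
   * Illustrative sketch only: report the ASID width of the current CPU.
   * ID_AA64MMFR0_EL1.ASIDBits (bits [7:4]): 0b0000 => 8 bits,
   * 0b0010 => 16 bits.
   */
  static inline unsigned int asid_width(void)
  {
          unsigned long mmfr0;

          asm("mrs %0, id_aa64mmfr0_el1" : "=r" (mmfr0));
          return ((mmfr0 >> 4) & 0xf) == 0x2 ? 16 : 8;
  }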

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/mmu_context.h | 152 +++++++++++++
arch/arm64/include/asm/thread_info.h | 124 ++++++++++
arch/arm64/kernel/process.c | 416 ++++++++++++++++++++++++++++++++++
arch/arm64/mm/context.c | 159 +++++++++++++
4 files changed, 851 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/mmu_context.h
create mode 100644 arch/arm64/include/asm/thread_info.h
create mode 100644 arch/arm64/kernel/process.c
create mode 100644 arch/arm64/mm/context.c

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
new file mode 100644
index 0000000..f68465d
--- /dev/null
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -0,0 +1,152 @@
+/*
+ * Based on arch/arm/include/asm/mmu_context.h
+ *
+ * Copyright (C) 1996 Russell King.
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_MMU_CONTEXT_H
+#define __ASM_MMU_CONTEXT_H
+
+#include <linux/compiler.h>
+#include <linux/sched.h>
+
+#include <asm/cacheflush.h>
+#include <asm/proc-fns.h>
+#include <asm-generic/mm_hooks.h>
+#include <asm/cputype.h>
+#include <asm/pgtable.h>
+
+#define MAX_ASID_BITS 16
+
+extern unsigned int cpu_last_asid;
+
+void __init_new_context(struct task_struct *tsk, struct mm_struct *mm);
+void __new_context(struct mm_struct *mm);
+
+/*
+ * Set TTBR0 to empty_zero_page. No translations will be possible via TTBR0.
+ */
+static inline void cpu_set_reserved_ttbr0(void)
+{
+ unsigned long ttbr = page_to_phys(empty_zero_page);
+
+ asm(
+ " msr ttbr0_el1, %0 // set TTBR0\n"
+ " isb"
+ :
+ : "r" (ttbr));
+}
+
+static inline void switch_new_context(struct mm_struct *mm)
+{
+ unsigned long flags;
+
+ __new_context(mm);
+
+ local_irq_save(flags);
+ cpu_switch_mm(mm->pgd, mm);
+ local_irq_restore(flags);
+}
+
+static inline void check_and_switch_context(struct mm_struct *mm,
+ struct task_struct *tsk)
+{
+ /*
+ * Required during context switch to avoid speculative page table
+ * walking with the wrong TTBR.
+ */
+ cpu_set_reserved_ttbr0();
+
+ if (!((mm->context.id ^ cpu_last_asid) >> MAX_ASID_BITS))
+ /*
+ * The ASID is from the current generation, just switch to the
+ * new pgd. This condition is only true for calls from
+ * context_switch() and interrupts are already disabled.
+ */
+ cpu_switch_mm(mm->pgd, mm);
+ else if (irqs_disabled())
+ /*
+ * Defer the new ASID allocation until after the context
+ * switch critical region since __new_context() cannot be
+ * called with interrupts disabled.
+ */
+ set_ti_thread_flag(task_thread_info(tsk), TIF_SWITCH_MM);
+ else
+ /*
+ * That is a direct call to switch_mm() or activate_mm() with
+ * interrupts enabled and a new context.
+ */
+ switch_new_context(mm);
+}
+
+#define init_new_context(tsk,mm) (__init_new_context(tsk,mm),0)
+#define destroy_context(mm) do { } while(0)
+
+#define finish_arch_post_lock_switch \
+ finish_arch_post_lock_switch
+static inline void finish_arch_post_lock_switch(void)
+{
+ if (test_and_clear_thread_flag(TIF_SWITCH_MM)) {
+ struct mm_struct *mm = current->mm;
+ unsigned long flags;
+
+ __new_context(mm);
+
+ local_irq_save(flags);
+ cpu_switch_mm(mm->pgd, mm);
+ local_irq_restore(flags);
+ }
+}
+
+/*
+ * This is called when "tsk" is about to enter lazy TLB mode.
+ *
+ * mm: describes the currently active mm context
+ * tsk: task which is entering lazy tlb
+ * cpu: cpu number which is entering lazy tlb
+ *
+ * tsk->mm will be NULL
+ */
+static inline void
+enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
+{
+}
+
+/*
+ * This is the actual mm switch as far as the scheduler
+ * is concerned. No registers are touched. We avoid
+ * calling the CPU specific function when the mm hasn't
+ * actually changed.
+ */
+static inline void
+switch_mm(struct mm_struct *prev, struct mm_struct *next,
+ struct task_struct *tsk)
+{
+ unsigned int cpu = smp_processor_id();
+
+#ifdef CONFIG_SMP
+ /* check for possible thread migration */
+ if (!cpumask_empty(mm_cpumask(next)) &&
+ !cpumask_test_cpu(cpu, mm_cpumask(next)))
+ __flush_icache_all();
+#endif
+ if (!cpumask_test_and_set_cpu(cpu, mm_cpumask(next)) || prev != next)
+ check_and_switch_context(next, tsk);
+}
+
+#define deactivate_mm(tsk,mm) do { } while (0)
+#define activate_mm(prev,next) switch_mm(prev, next, NULL)
+
+#endif
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
new file mode 100644
index 0000000..4f909a3
--- /dev/null
+++ b/arch/arm64/include/asm/thread_info.h
@@ -0,0 +1,124 @@
+/*
+ * Based on arch/arm/include/asm/thread_info.h
+ *
+ * Copyright (C) 2002 Russell King.
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_THREAD_INFO_H
+#define __ASM_THREAD_INFO_H
+
+#ifdef __KERNEL__
+
+#include <linux/compiler.h>
+
+#define THREAD_SIZE_ORDER 1
+#define THREAD_SIZE 8192
+#define THREAD_START_SP (THREAD_SIZE - 16)
+
+#ifndef __ASSEMBLY__
+
+struct task_struct;
+struct exec_domain;
+
+#include <asm/types.h>
+
+typedef unsigned long mm_segment_t;
+
+/*
+ * low level task data that entry.S needs immediate access to.
+ * __switch_to() assumes cpu_context follows immediately after cpu_domain.
+ */
+struct thread_info {
+ unsigned long flags; /* low level flags */
+ mm_segment_t addr_limit; /* address limit */
+ struct task_struct *task; /* main task structure */
+ struct exec_domain *exec_domain; /* execution domain */
+ struct restart_block restart_block;
+ int preempt_count; /* 0 => preemptable, <0 => bug */
+ int cpu; /* cpu */
+};
+
+#define INIT_THREAD_INFO(tsk) \
+{ \
+ .task = &tsk, \
+ .exec_domain = &default_exec_domain, \
+ .flags = 0, \
+ .preempt_count = INIT_PREEMPT_COUNT, \
+ .addr_limit = KERNEL_DS, \
+ .restart_block = { \
+ .fn = do_no_restart_syscall, \
+ }, \
+}
+
+#define init_thread_info (init_thread_union.thread_info)
+#define init_stack (init_thread_union.stack)
+
+/*
+ * how to get the thread information struct from C
+ */
+static inline struct thread_info *current_thread_info(void) __attribute_const__;
+
+static inline struct thread_info *current_thread_info(void)
+{
+ register unsigned long sp asm ("sp");
+ return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
+}
+
+#define thread_saved_pc(tsk) \
+ ((unsigned long)(tsk->thread.cpu_context.pc))
+#define thread_saved_sp(tsk) \
+ ((unsigned long)(tsk->thread.cpu_context.sp))
+#define thread_saved_fp(tsk) \
+ ((unsigned long)(tsk->thread.cpu_context.fp))
+
+#endif
+
+/*
+ * We use bit 30 of the preempt_count to indicate that kernel
+ * preemption is occurring. See <asm/hardirq.h>.
+ */
+#define PREEMPT_ACTIVE 0x40000000
+
+/*
+ * thread information flags:
+ * TIF_SYSCALL_TRACE - syscall trace active
+ * TIF_SIGPENDING - signal pending
+ * TIF_NEED_RESCHED - rescheduling necessary
+ * TIF_NOTIFY_RESUME - callback before returning to user
+ * TIF_USEDFPU - FPU was used by this task this quantum (SMP)
+ * TIF_POLLING_NRFLAG - true if poll_idle() is polling TIF_NEED_RESCHED
+ */
+#define TIF_SIGPENDING 0
+#define TIF_NEED_RESCHED 1
+#define TIF_NOTIFY_RESUME 2 /* callback before returning to user */
+#define TIF_SYSCALL_TRACE 8
+#define TIF_POLLING_NRFLAG 16
+#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
+#define TIF_FREEZE 19
+#define TIF_RESTORE_SIGMASK 20
+#define TIF_SINGLESTEP 21
+#define TIF_32BIT 22 /* 32bit process */
+#define TIF_SWITCH_MM 23 /* deferred switch_mm */
+
+#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
+#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
+#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
+#define _TIF_32BIT (1 << TIF_32BIT)
+
+#define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
+ _TIF_NOTIFY_RESUME)
+
+#endif /* __KERNEL__ */
+#endif /* __ASM_THREAD_INFO_H */
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
new file mode 100644
index 0000000..c4a4e1c
--- /dev/null
+++ b/arch/arm64/kernel/process.c
@@ -0,0 +1,416 @@
+/*
+ * Based on arch/arm/kernel/process.c
+ *
+ * Original Copyright (C) 1995 Linus Torvalds
+ * Copyright (C) 1996-2000 Russell King - Converted to ARM.
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdarg.h>
+
+#include <linux/export.h>
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/stddef.h>
+#include <linux/unistd.h>
+#include <linux/user.h>
+#include <linux/delay.h>
+#include <linux/reboot.h>
+#include <linux/interrupt.h>
+#include <linux/kallsyms.h>
+#include <linux/init.h>
+#include <linux/cpu.h>
+#include <linux/elfcore.h>
+#include <linux/pm.h>
+#include <linux/tick.h>
+#include <linux/utsname.h>
+#include <linux/uaccess.h>
+#include <linux/random.h>
+#include <linux/hw_breakpoint.h>
+#include <linux/personality.h>
+#include <linux/notifier.h>
+
+#include <asm/cacheflush.h>
+#include <asm/processor.h>
+#include <asm/stacktrace.h>
+#include <asm/fpsimd.h>
+
+extern void setup_mm_for_reboot(void);
+
+static void setup_restart(void)
+{
+ /*
+ * Tell the mm system that we are going to reboot -
+ * we may need it to insert some 1:1 mappings so that
+ * soft boot works.
+ */
+ setup_mm_for_reboot();
+
+ /* Clean and invalidate caches */
+ flush_cache_all();
+
+ /* Turn off caching */
+ cpu_proc_fin();
+
+ /* Push out any further dirty data, and ensure cache is empty */
+ flush_cache_all();
+}
+
+void soft_restart(unsigned long addr)
+{
+ setup_restart();
+ cpu_reset(addr);
+}
+
+/*
+ * Function pointers to optional machine specific functions
+ */
+void (*pm_power_off)(void);
+EXPORT_SYMBOL(pm_power_off);
+
+void (*pm_restart)(const char *cmd);
+EXPORT_SYMBOL_GPL(pm_restart);
+
+
+/*
+ * This is our default idle handler.
+ */
+static void default_idle(void)
+{
+ /*
+ * This should do all the clock switching and wait for interrupt
+ * tricks
+ */
+ cpu_do_idle();
+ local_irq_enable();
+}
+
+void (*pm_idle)(void) = default_idle;
+EXPORT_SYMBOL(pm_idle);
+
+/*
+ * The idle thread, has rather strange semantics for calling pm_idle,
+ * but this is what x86 does and we need to do the same, so that
+ * things like cpuidle get called in the same way. The only difference
+ * is that we always respect 'hlt_counter' to prevent low power idle.
+ */
+void cpu_idle(void)
+{
+ local_fiq_enable();
+
+ /* endless idle loop with no priority at all */
+ while (1) {
+ tick_nohz_idle_enter();
+ rcu_idle_enter();
+ while (!need_resched()) {
+ /*
+ * We need to disable interrupts here to ensure
+ * we don't miss a wakeup call.
+ */
+ local_irq_disable();
+ if (!need_resched()) {
+ stop_critical_timings();
+ pm_idle();
+ start_critical_timings();
+ /*
+ * pm_idle functions should always return
+ * with IRQs enabled.
+ */
+ WARN_ON(irqs_disabled());
+ } else {
+ local_irq_enable();
+ }
+ }
+ rcu_idle_exit();
+ tick_nohz_idle_exit();
+ preempt_enable_no_resched();
+ schedule();
+ preempt_disable();
+ }
+}
+
+void machine_shutdown(void)
+{
+#ifdef CONFIG_SMP
+ smp_send_stop();
+#endif
+}
+
+void machine_halt(void)
+{
+ machine_shutdown();
+ while (1);
+}
+
+void machine_power_off(void)
+{
+ machine_shutdown();
+ if (pm_power_off)
+ pm_power_off();
+}
+
+void machine_restart(char *cmd)
+{
+ machine_shutdown();
+
+ /* Disable interrupts first */
+ local_irq_disable();
+ local_fiq_disable();
+
+ /* Now call the architecture specific reboot code. */
+ if (pm_restart)
+ pm_restart(cmd);
+
+ /*
+ * Whoops - the architecture was unable to reboot.
+ * Tell the user!
+ */
+ mdelay(1000);
+ printk("Reboot failed -- System halted\n");
+ while (1);
+}
+
+void __show_regs(struct pt_regs *regs)
+{
+ int i;
+
+ printk("CPU: %d %s (%s %.*s)\n",
+ raw_smp_processor_id(), print_tainted(),
+ init_utsname()->release,
+ (int)strcspn(init_utsname()->version, " "),
+ init_utsname()->version);
+ print_symbol("PC is at %s\n", instruction_pointer(regs));
+ print_symbol("LR is at %s\n", regs->regs[30]);
+ printk("pc : [<%016llx>] lr : [<%016llx>] pstate: %08llx\n",
+ regs->pc, regs->regs[30], regs->pstate);
+ printk("sp : %016llx\n", regs->sp);
+ for (i = 29; i >= 0; i--) {
+ printk("x%-2d: %016llx ", i, regs->regs[i]);
+ if (i % 2 == 0)
+ printk("\n");
+ }
+ printk("\n");
+}
+
+void show_regs(struct pt_regs * regs)
+{
+ printk("\n");
+ printk("Pid: %d, comm: %20s\n", task_pid_nr(current), current->comm);
+ __show_regs(regs);
+}
+
+/*
+ * Free current thread data structures etc..
+ */
+void exit_thread(void)
+{
+}
+
+void flush_thread(void)
+{
+ fpsimd_flush_thread();
+ flush_ptrace_hw_breakpoint(current);
+}
+
+void release_thread(struct task_struct *dead_task)
+{
+}
+
+int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
+{
+ fpsimd_save_state(&current->thread.fpsimd_state);
+ *dst = *src;
+ return 0;
+}
+
+asmlinkage void ret_from_fork(void) asm("ret_from_fork");
+
+int copy_thread(unsigned long clone_flags, unsigned long stack_start,
+ unsigned long stk_sz, struct task_struct *p,
+ struct pt_regs *regs)
+{
+ struct pt_regs *childregs = task_pt_regs(p);
+ unsigned long tls = p->thread.tp_value;
+
+ *childregs = *regs;
+ childregs->regs[0] = 0;
+
+#ifdef CONFIG_AARCH32_EMULATION
+ if (test_ti_thread_flag(task_thread_info(p), TIF_32BIT))
+ childregs->compat_sp = stack_start;
+ else
+#endif
+ {
+ /*
+ * Read the current TLS pointer from tpidr_el0 as it may be
+ * out-of-sync with the saved value.
+ */
+ asm("mrs %0, tpidr_el0" : "=r" (tls));
+ childregs->sp = stack_start;
+ }
+
+ memset(&p->thread.cpu_context, 0, sizeof(struct cpu_context));
+ p->thread.cpu_context.sp = (unsigned long)childregs;
+ p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
+
+ /* If a TLS pointer was passed to clone, use that for the new thread. */
+ if (clone_flags & CLONE_SETTLS)
+ tls = regs->regs[3];
+ p->thread.tp_value = tls;
+
+ ptrace_hw_copy_thread(p);
+
+ return 0;
+}
+
+static void tls_thread_switch(struct task_struct *next)
+{
+ unsigned long tpidr, tpidrro;
+
+ if (!test_thread_flag(TIF_32BIT)) {
+ asm("mrs %0, tpidr_el0" : "=r" (tpidr));
+ current->thread.tp_value = tpidr;
+ }
+
+ if (test_ti_thread_flag(task_thread_info(next), TIF_32BIT)) {
+ tpidr = 0;
+ tpidrro = next->thread.tp_value;
+ } else {
+ tpidr = next->thread.tp_value;
+ tpidrro = 0;
+ }
+
+ asm(
+ " msr tpidr_el0, %0\n"
+ " msr tpidrro_el0, %1"
+ : : "r" (tpidr), "r" (tpidrro));
+}
+
+/*
+ * Thread switching.
+ */
+struct task_struct *__switch_to(struct task_struct *prev,
+ struct task_struct *next)
+{
+ struct task_struct *last;
+
+ fpsimd_thread_switch(next);
+ tls_thread_switch(next);
+ hw_breakpoint_thread_switch(next);
+
+ /* the actual thread switch */
+ last = cpu_switch_to(prev, next);
+
+ return last;
+}
+
+/*
+ * Fill in the task's elfregs structure for a core dump.
+ */
+int dump_task_regs(struct task_struct *t, elf_gregset_t *elfregs)
+{
+ elf_core_copy_regs(elfregs, task_pt_regs(t));
+ return 1;
+}
+
+/*
+ * fill in the fpe structure for a core dump...
+ */
+int dump_fpu (struct pt_regs *regs, struct user_fp *fp)
+{
+ return 0;
+}
+EXPORT_SYMBOL(dump_fpu);
+
+/*
+ * Shuffle the argument into the correct register before calling the
+ * thread function. x1 is the thread argument, x2 is the pointer to
+ * the thread function, and x3 points to the exit function.
+ */
+extern void kernel_thread_helper(void);
+asm( ".section .text\n"
+" .align\n"
+" .type kernel_thread_helper, #function\n"
+"kernel_thread_helper:\n"
+" mov x0, x1\n"
+" mov x30, x3\n"
+" br x2\n"
+" .size kernel_thread_helper, . - kernel_thread_helper\n"
+" .previous");
+
+#define kernel_thread_exit do_exit
+
+/*
+ * Create a kernel thread.
+ */
+pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
+{
+ struct pt_regs regs;
+
+ memset(&regs, 0, sizeof(regs));
+
+ regs.regs[1] = (unsigned long)arg;
+ regs.regs[2] = (unsigned long)fn;
+ regs.regs[3] = (unsigned long)kernel_thread_exit;
+ regs.pc = (unsigned long)kernel_thread_helper;
+ regs.pstate = PSR_MODE_EL1h;
+
+ return do_fork(flags|CLONE_VM|CLONE_UNTRACED, 0, &regs, 0, NULL, NULL);
+}
+EXPORT_SYMBOL(kernel_thread);
+
+unsigned long get_wchan(struct task_struct *p)
+{
+ struct stackframe frame;
+ int count = 0;
+ if (!p || p == current || p->state == TASK_RUNNING)
+ return 0;
+
+ frame.fp = thread_saved_fp(p);
+ frame.sp = thread_saved_sp(p);
+ frame.pc = thread_saved_pc(p);
+ do {
+ int ret = unwind_frame(&frame);
+ if (ret < 0)
+ return 0;
+ if (!in_sched_functions(frame.pc))
+ return frame.pc;
+ } while (count++ < 16);
+ return 0;
+}
+
+unsigned long arch_align_stack(unsigned long sp)
+{
+ if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
+ sp -= get_random_int() & ~PAGE_MASK;
+ return sp & ~0xf;
+}
+
+static unsigned long randomize_base(unsigned long base)
+{
+ unsigned long range_end = base + (STACK_RND_MASK << PAGE_SHIFT) + 1;
+ return randomize_range(base, range_end, 0) ? : base;
+}
+
+unsigned long arch_randomize_brk(struct mm_struct *mm)
+{
+ return randomize_base(mm->brk);
+}
+
+unsigned long randomize_et_dyn(unsigned long base)
+{
+ return randomize_base(base);
+}
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
new file mode 100644
index 0000000..e06f47a
--- /dev/null
+++ b/arch/arm64/mm/context.c
@@ -0,0 +1,159 @@
+/*
+ * Based on arch/arm/mm/context.c
+ *
+ * Copyright (C) 2002-2003 Deep Blue Solutions Ltd, all rights reserved.
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/smp.h>
+#include <linux/percpu.h>
+
+#include <asm/mmu_context.h>
+#include <asm/tlbflush.h>
+#include <asm/cachetype.h>
+
+#define asid_bits() \
+ (((read_cpuid(ID_AA64MMFR0_EL1) & 0xf0) >> 2) + 8)
+
+#define ASID_FIRST_VERSION (1 << MAX_ASID_BITS)
+
+static DEFINE_SPINLOCK(cpu_asid_lock);
+unsigned int cpu_last_asid = ASID_FIRST_VERSION;
+
+/*
+ * We fork()ed a process, and we need a new context for the child to run in.
+ */
+void __init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+{
+ mm->context.id = 0;
+ spin_lock_init(&mm->context.id_lock);
+}
+
+static void flush_context(void)
+{
+ /* set the reserved TTBR0 before flushing the TLB */
+ cpu_set_reserved_ttbr0();
+ flush_tlb_all();
+ if (icache_is_aivivt())
+ __flush_icache_all();
+}
+
+#ifdef CONFIG_SMP
+
+static void set_mm_context(struct mm_struct *mm, unsigned int asid)
+{
+ unsigned long flags;
+
+ /*
+ * Locking needed for multi-threaded applications where the same
+ * mm->context.id could be set from different CPUs during the
+ * broadcast. This function is also called via IPI so the
+ * mm->context.id_lock has to be IRQ-safe.
+ */
+ spin_lock_irqsave(&mm->context.id_lock, flags);
+ if (likely((mm->context.id ^ cpu_last_asid) >> MAX_ASID_BITS)) {
+ /*
+ * Old version of ASID found. Set the new one and reset
+ * mm_cpumask(mm).
+ */
+ mm->context.id = asid;
+ cpumask_clear(mm_cpumask(mm));
+ }
+ spin_unlock_irqrestore(&mm->context.id_lock, flags);
+
+ /*
+ * Set the mm_cpumask(mm) bit for the current CPU.
+ */
+ cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
+}
+
+/*
+ * Reset the ASID on the current CPU. This function call is broadcast from the
+ * CPU handling the ASID rollover and holding cpu_asid_lock.
+ */
+static void reset_context(void *info)
+{
+ unsigned int asid;
+ unsigned int cpu = smp_processor_id();
+ struct mm_struct *mm = current->active_mm;
+
+ smp_rmb();
+ asid = cpu_last_asid + cpu;
+
+ flush_context();
+ set_mm_context(mm, asid);
+
+ /* set the new ASID */
+ cpu_switch_mm(mm->pgd, mm);
+}
+
+#else
+
+static inline void set_mm_context(struct mm_struct *mm, unsigned int asid)
+{
+ mm->context.id = asid;
+ cpumask_copy(mm_cpumask(mm), cpumask_of(smp_processor_id()));
+}
+
+#endif
+
+void __new_context(struct mm_struct *mm)
+{
+ unsigned int asid;
+ unsigned int bits = asid_bits();
+
+ spin_lock(&cpu_asid_lock);
+#ifdef CONFIG_SMP
+ /*
+ * Check the ASID again, in case the change was broadcast from another
+ * CPU before we acquired the lock.
+ */
+ if (!unlikely((mm->context.id ^ cpu_last_asid) >> MAX_ASID_BITS)) {
+ cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
+ spin_unlock(&cpu_asid_lock);
+ return;
+ }
+#endif
+ /*
+ * At this point, it is guaranteed that the current mm (with an old
+ * ASID) isn't active on any other CPU since the ASIDs are changed
+ * simultaneously via IPI.
+ */
+ asid = ++cpu_last_asid;
+
+ /*
+ * If we've used up all our ASIDs, we need to start a new version and
+ * flush the TLB.
+ */
+ if (unlikely((asid & ((1 << bits) - 1)) == 0)) {
+ /* increment the ASID version */
+ cpu_last_asid += (1 << MAX_ASID_BITS) - (1 << bits);
+ if (cpu_last_asid == 0)
+ cpu_last_asid = ASID_FIRST_VERSION;
+ asid = cpu_last_asid + smp_processor_id();
+ flush_context();
+#ifdef CONFIG_SMP
+ smp_wmb();
+ smp_call_function(reset_context, NULL, 1);
+#endif
+ cpu_last_asid += NR_CPUS - 1;
+ }
+
+ set_mm_context(mm, asid);
+ spin_unlock(&cpu_asid_lock);
+}
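
An illustrative note (added for this write-up, not part of the patch): the
allocator above packs an ASID "version" (generation) into the bits of
mm->context.id above the hardware ASID field, so the staleness test used in
set_mm_context() and __new_context() boils down to the following sketch
(MAX_ASID_BITS is defined elsewhere in this series):

	/*
	 * context.id layout:  [ version | hardware ASID ]
	 *                                  ^-- low MAX_ASID_BITS bits
	 *
	 * A non-zero result means the mm's ASID was allocated in an
	 * older generation and must be re-allocated before use.
	 */
	static inline unsigned int asid_is_stale(unsigned int id,
						 unsigned int last)
	{
		return (id ^ last) >> MAX_ASID_BITS;
	}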

2012-08-14 17:53:55

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 09/31] arm64: Cache maintenance routines

The patch adds functionality required for cache maintenance. The AArch64
architecture mandates non-aliasing VIPT or PIPT D-cache and VIPT (may
have aliases) or ASID-tagged VIVT I-cache. Cache maintenance operations
are automatically broadcast in hardware between CPUs.
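
As a usage sketch (illustrative only, not part of the patch), a caller that
writes instructions to memory and then executes them would use the
flush_icache_range() interface added below to make the new code visible to
the instruction side:

	/* hypothetical JIT/loader-style caller */
	memcpy(code_buf, insns, size);		/* write the new instructions */
	flush_icache_range((unsigned long)code_buf,
			   (unsigned long)code_buf + size);
	/* D-cache cleaned to PoU and I-cache invalidated for the range */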

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/cache.h | 32 ++++
arch/arm64/include/asm/cacheflush.h | 209 ++++++++++++++++++++++++++
arch/arm64/include/asm/cachetype.h | 48 ++++++
arch/arm64/mm/cache.S | 279 +++++++++++++++++++++++++++++++++++
arch/arm64/mm/flush.c | 132 +++++++++++++++++
5 files changed, 700 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/cache.h
create mode 100644 arch/arm64/include/asm/cacheflush.h
create mode 100644 arch/arm64/include/asm/cachetype.h
create mode 100644 arch/arm64/mm/cache.S
create mode 100644 arch/arm64/mm/flush.c

diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
new file mode 100644
index 0000000..390308a
--- /dev/null
+++ b/arch/arm64/include/asm/cache.h
@@ -0,0 +1,32 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_CACHE_H
+#define __ASM_CACHE_H
+
+#define L1_CACHE_SHIFT 6
+#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
+
+/*
+ * Memory returned by kmalloc() may be used for DMA, so we must make
+ * sure that all such allocations are cache aligned. Otherwise,
+ * unrelated code may cause parts of the buffer to be read into the
+ * cache before the transfer is done, causing old data to be seen by
+ * the CPU.
+ */
+#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
+#define ARCH_SLAB_MINALIGN 8
+
+#endif
diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
new file mode 100644
index 0000000..93b5590
--- /dev/null
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -0,0 +1,209 @@
+/*
+ * Based on arch/arm/include/asm/cacheflush.h
+ *
+ * Copyright (C) 1999-2002 Russell King.
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_CACHEFLUSH_H
+#define __ASM_CACHEFLUSH_H
+
+#include <linux/mm.h>
+
+/*
+ * This flag is used to indicate that the page pointed to by a pte is clean
+ * and does not require cleaning before returning it to the user.
+ */
+#define PG_dcache_clean PG_arch_1
+
+/*
+ * MM Cache Management
+ * ===================
+ *
+ * The arch/arm64/mm/cache.S and arch/arm64/mm/proc.S files
+ * implement these methods.
+ *
+ * Start addresses are inclusive and end addresses are exclusive;
+ * start addresses should be rounded down, end addresses up.
+ *
+ * See Documentation/cachetlb.txt for more information.
+ * Please note that the implementation of these, and the required
+ * effects are cache-type (VIVT/VIPT/PIPT) specific.
+ *
+ * flush_cache_kern_all()
+ *
+ * Unconditionally clean and invalidate the entire cache.
+ *
+ * flush_cache_user_mm(mm)
+ *
+ * Clean and invalidate all user space cache entries
+ * before a change of page tables.
+ *
+ * flush_cache_user_range(start, end, flags)
+ *
+ * Clean and invalidate a range of cache entries in the
+ * specified address space before a change of page tables.
+ * - start - user start address (inclusive, page aligned)
+ * - end - user end address (exclusive, page aligned)
+ * - flags - vma->vm_flags field
+ *
+ * coherent_kern_range(start, end)
+ *
+ * Ensure coherency between the Icache and the Dcache in the
+ * region described by start, end. If you have non-snooping
+ * Harvard caches, you need to implement this function.
+ * - start - virtual start address
+ * - end - virtual end address
+ *
+ * coherent_user_range(start, end)
+ *
+ * Ensure coherency between the Icache and the Dcache in the
+ * region described by start, end. If you have non-snooping
+ * Harvard caches, you need to implement this function.
+ * - start - virtual start address
+ * - end - virtual end address
+ *
+ * flush_kern_dcache_area(kaddr, size)
+ *
+ * Ensure that the data held in page is written back.
+ * - kaddr - page address
+ * - size - region size
+ *
+ * DMA Cache Coherency
+ * ===================
+ *
+ * dma_flush_range(start, end)
+ *
+ * Clean and invalidate the specified virtual address range.
+ * - start - virtual start address
+ * - end - virtual end address
+ */
+extern void __cpuc_flush_kern_all(void);
+extern void __cpuc_flush_user_all(void);
+extern void __cpuc_flush_user_range(unsigned long, unsigned long, unsigned int);
+extern void __cpuc_coherent_kern_range(unsigned long, unsigned long);
+extern void __cpuc_coherent_user_range(unsigned long, unsigned long);
+extern void __cpuc_flush_dcache_area(void *, size_t);
+
+/*
+ * These are private to the dma-mapping API. Do not use directly.
+ * Their sole purpose is to ensure that data held in the cache
+ * is visible to DMA, or data written by DMA to system memory is
+ * visible to the CPU.
+ */
+extern void dmac_map_area(const void *, size_t, int);
+extern void dmac_unmap_area(const void *, size_t, int);
+extern void dmac_flush_range(const void *, const void *);
+
+/*
+ * Copy user data from/to a page which is mapped into a different
+ * process's address space. Really, we want to allow our "user
+ * space" model to handle this.
+ */
+extern void copy_to_user_page(struct vm_area_struct *, struct page *,
+ unsigned long, void *, const void *, unsigned long);
+#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
+ do { \
+ memcpy(dst, src, len); \
+ } while (0)
+
+/*
+ * Convert calls to our calling convention.
+ */
+#define flush_cache_all() __cpuc_flush_kern_all()
+extern void flush_cache_mm(struct mm_struct *mm);
+extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
+extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn);
+
+#define flush_cache_dup_mm(mm) flush_cache_mm(mm)
+
+/*
+ * flush_cache_user_range is used when we want to ensure that the
+ * Harvard caches are synchronised for the user space address range.
+ * This is used for the ARM private sys_cacheflush system call.
+ */
+#define flush_cache_user_range(start, end) \
+ __cpuc_coherent_user_range((start) & PAGE_MASK, PAGE_ALIGN(end))
+
+/*
+ * Perform necessary cache operations to ensure that data previously
+ * stored within this range of addresses can be executed by the CPU.
+ */
+#define flush_icache_range(s,e) __cpuc_coherent_kern_range(s,e)
+
+/*
+ * flush_dcache_page is used when the kernel has written to the page
+ * cache page at virtual address page->virtual.
+ *
+ * If this page isn't mapped (ie, page_mapping == NULL), or it might
+ * have userspace mappings, then we _must_ always clean + invalidate
+ * the dcache entries associated with the kernel mapping.
+ *
+ * Otherwise we can defer the operation, and clean the cache when we are
+ * about to change to user space. This is the same method as used on SPARC64.
+ * See update_mmu_cache for the user space part.
+ */
+#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
+extern void flush_dcache_page(struct page *);
+
+static inline void __flush_icache_all(void)
+{
+ asm("ic ialluis");
+}
+
+#define ARCH_HAS_FLUSH_ANON_PAGE
+static inline void flush_anon_page(struct vm_area_struct *vma,
+ struct page *page, unsigned long vmaddr)
+{
+ extern void __flush_anon_page(struct vm_area_struct *vma,
+ struct page *, unsigned long);
+ if (PageAnon(page))
+ __flush_anon_page(vma, page, vmaddr);
+}
+
+#define flush_dcache_mmap_lock(mapping) \
+ spin_lock_irq(&(mapping)->tree_lock)
+#define flush_dcache_mmap_unlock(mapping) \
+ spin_unlock_irq(&(mapping)->tree_lock)
+
+#define flush_icache_user_range(vma,page,addr,len) \
+ flush_dcache_page(page)
+
+/*
+ * We don't appear to need to do anything here. In fact, if we did, we'd
+ * duplicate cache flushing elsewhere performed by flush_dcache_page().
+ */
+#define flush_icache_page(vma,page) do { } while (0)
+
+/*
+ * flush_cache_vmap() is used when creating mappings (eg, via vmap,
+ * vmalloc, ioremap etc) in kernel space for pages. On non-VIPT
+ * caches, since the direct-mappings of these pages may contain cached
+ * data, we need to do a full cache flush to ensure that writebacks
+ * don't corrupt data placed into these pages via the new mappings.
+ */
+static inline void flush_cache_vmap(unsigned long start, unsigned long end)
+{
+ /*
+ * set_pte_at() called from vmap_pte_range() does not
+ * have a DSB after cleaning the cache line.
+ */
+ dsb();
+}
+
+static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
+{
+}
+
+#endif
diff --git a/arch/arm64/include/asm/cachetype.h b/arch/arm64/include/asm/cachetype.h
new file mode 100644
index 0000000..85f5f51
--- /dev/null
+++ b/arch/arm64/include/asm/cachetype.h
@@ -0,0 +1,48 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_CACHETYPE_H
+#define __ASM_CACHETYPE_H
+
+#include <asm/cputype.h>
+
+#define CTR_L1IP_SHIFT 14
+#define CTR_L1IP_MASK 3
+
+#define ICACHE_POLICY_RESERVED 0
+#define ICACHE_POLICY_AIVIVT 1
+#define ICACHE_POLICY_VIPT 2
+#define ICACHE_POLICY_PIPT 3
+
+static inline u32 icache_policy(void)
+{
+ return (read_cpuid_cachetype() >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK;
+}
+
+/*
+ * Whilst the D-side always behaves as PIPT on AArch64, aliasing is
+ * permitted in the I-cache.
+ */
+static inline int icache_is_aliasing(void)
+{
+ return icache_policy() != ICACHE_POLICY_PIPT;
+}
+
+static inline int icache_is_aivivt(void)
+{
+ return icache_policy() == ICACHE_POLICY_AIVIVT;
+}
+
+#endif /* __ASM_CACHETYPE_H */
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
new file mode 100644
index 0000000..f4efa04
--- /dev/null
+++ b/arch/arm64/mm/cache.S
@@ -0,0 +1,279 @@
+/*
+ * Cache maintenance
+ *
+ * Copyright (C) 2001 Deep Blue Solutions Ltd.
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/linkage.h>
+#include <linux/init.h>
+#include <asm/assembler.h>
+
+#include "proc-macros.S"
+
+/*
+ * __cpuc_flush_dcache_all()
+ *
+ * Flush the whole D-cache.
+ *
+ * Corrupted registers: x0-x7, x9-x11
+ */
+ENTRY(__cpuc_flush_dcache_all)
+ dsb sy // ensure ordering with previous memory accesses
+ mrs x0, clidr_el1 // read clidr
+ and x3, x0, #0x7000000 // extract loc from clidr
+ lsr x3, x3, #23 // extract loc * 2 for the level loop below
+ cbz x3, finished // if loc is 0, then no need to clean
+ mov x10, #0 // start clean at cache level 0
+loop1:
+ add x2, x10, x10, lsr #1 // work out 3x current cache level
+ lsr x1, x0, x2 // extract cache type bits from clidr
+ and x1, x1, #7 // mask of the bits for current cache only
+ cmp x1, #2 // see what cache we have at this level
+ b.lt skip // skip if no cache, or just i-cache
+ save_and_disable_irqs x9 // make CSSELR and CCSIDR access atomic
+ msr csselr_el1, x10 // select current cache level in csselr
+ isb // isb to sync the new csselr & ccsidr
+ mrs x1, ccsidr_el1 // read the new ccsidr
+ restore_irqs x9
+ and x2, x1, #7 // extract the length of the cache lines
+ add x2, x2, #4 // add 4 (line length offset)
+ mov x4, #0x3ff
+ and x4, x4, x1, lsr #3 // find maximum way number (associativity - 1)
+ clz x5, x4 // find bit position of way size increment
+ mov x7, #0x7fff
+ and x7, x7, x1, lsr #13 // extract maximum set (index) number
+loop2:
+ mov x9, x4 // create working copy of max way size
+loop3:
+ lsl x6, x9, x5
+ orr x11, x10, x6 // factor way and cache number into x11
+ lsl x6, x7, x2
+ orr x11, x11, x6 // factor index number into x11
+ dc cisw, x11 // clean & invalidate by set/way
+ subs x9, x9, #1 // decrement the way
+ b.ge loop3
+ subs x7, x7, #1 // decrement the index
+ b.ge loop2
+skip:
+ add x10, x10, #2 // increment cache number
+ cmp x3, x10
+ b.gt loop1
+finished:
+ mov x10, #0 // switch back to cache level 0
+ msr csselr_el1, x10 // select current cache level in csselr
+ dsb sy
+ isb
+ ret
+ENDPROC(__cpuc_flush_dcache_all)
+
+/*
+ * __cpuc_flush_kern_all()
+ *
+ * Flush the entire cache system. The data cache flush is now achieved
+ * using atomic clean / invalidates working outwards from L1 cache. This
+ * is done using Set/Way based cache maintenance instructions. The
+ * instruction cache can still be invalidated back to the point of
+ * unification in a single instruction.
+ */
+ENTRY(__cpuc_flush_kern_all)
+ mov x12, lr
+ bl __cpuc_flush_dcache_all
+ mov x0, #0
+ ic ialluis // I+BTB cache invalidate
+ ret x12
+ENDPROC(__cpuc_flush_kern_all)
+
+/*
+ * __cpuc_flush_user_all()
+ *
+ * Flush all user space cache entries in a particular address space.
+ */
+ENTRY(__cpuc_flush_user_all)
+ /*FALLTHROUGH*/
+
+/*
+ * __cpuc_flush_user_range(start, end, flags)
+ *
+ * Flush a range of user space cache entries in the specified address space.
+ *
+ * - start - start address (may not be aligned)
+ * - end - end address (exclusive, may not be aligned)
+ * - flags - vm_area_struct flags describing address space
+ */
+ENTRY(__cpuc_flush_user_range)
+ ret
+ENDPROC(__cpuc_flush_user_all)
+ENDPROC(__cpuc_flush_user_range)
+
+/*
+ * __cpuc_coherent_kern_range(start,end)
+ *
+ * Ensure that the I and D caches are coherent within specified region.
+ * This is typically used when code has been written to a memory region,
+ * and will be executed.
+ *
+ * - start - virtual start address of region
+ * - end - virtual end address of region
+ */
+ENTRY(__cpuc_coherent_kern_range)
+ /* FALLTHROUGH */
+
+/*
+ * __cpuc_coherent_user_range(start,end)
+ *
+ * Ensure that the I and D caches are coherent within specified region.
+ * This is typically used when code has been written to a memory region,
+ * and will be executed.
+ *
+ * - start - virtual start address of region
+ * - end - virtual end address of region
+ */
+ENTRY(__cpuc_coherent_user_range)
+ dcache_line_size x2, x3
+ sub x3, x2, #1
+ bic x4, x0, x3
+1:
+USER(9f, dc cvau, x4 ) // clean D line to PoU
+ add x4, x4, x2
+ cmp x4, x1
+ b.lo 1b
+ dsb sy
+
+ icache_line_size x2, x3
+ sub x3, x2, #1
+ bic x4, x0, x3
+1:
+USER(9f, ic ivau, x4 ) // invalidate I line to PoU
+ add x4, x4, x2
+ cmp x4, x1
+ b.lo 1b
+9: // ignore any faulting cache operation
+ dsb sy
+ isb
+ ret
+ENDPROC(__cpuc_coherent_kern_range)
+ENDPROC(__cpuc_coherent_user_range)
+
+ .section .fixup,"ax"
+ .align 0
+9001: ret
+ .previous
+
+
+/*
+ * __cpuc_flush_dcache_area(kaddr, size)
+ *
+ * Ensure that the data held in the region starting at kaddr is written
+ * back and invalidated.
+ *
+ * - kaddr - kernel start address
+ * - size - size of the region
+ */
+ENTRY(__cpuc_flush_dcache_area)
+ dcache_line_size x2, x3
+ add x1, x0, x1
+ sub x3, x2, #1
+ bic x0, x0, x3
+1: dc civac, x0 // clean & invalidate D line / unified line
+ add x0, x0, x2
+ cmp x0, x1
+ b.lo 1b
+ dsb sy
+ ret
+ENDPROC(__cpuc_flush_dcache_area)
+
+/*
+ * dmac_inv_range(start,end)
+ *
+ * Invalidate the data cache within the specified region; we will be
+ * performing a DMA operation in this region and we want to purge old
+ * data in the cache.
+ *
+ * - start - virtual start address of region
+ * - end - virtual end address of region
+ */
+ENTRY(dmac_inv_range)
+ dcache_line_size x2, x3
+ sub x3, x2, #1
+ bic x0, x0, x3
+ bic x1, x1, x3
+1: dc ivac, x0 // invalidate D / U line
+ add x0, x0, x2
+ cmp x0, x1
+ b.lo 1b
+ dsb sy
+ ret
+ENDPROC(dmac_inv_range)
+
+/*
+ * dmac_clean_range(start,end)
+ * - start - virtual start address of region
+ * - end - virtual end address of region
+ */
+ENTRY(dmac_clean_range)
+ dcache_line_size x2, x3
+ sub x3, x2, #1
+ bic x0, x0, x3
+1: dc cvac, x0 // clean D / U line
+ add x0, x0, x2
+ cmp x0, x1
+ b.lo 1b
+ dsb sy
+ ret
+ENDPROC(dmac_clean_range)
+
+/*
+ * dmac_flush_range(start,end)
+ * - start - virtual start address of region
+ * - end - virtual end address of region
+ */
+ENTRY(dmac_flush_range)
+ dcache_line_size x2, x3
+ sub x3, x2, #1
+ bic x0, x0, x3
+1: dc civac, x0 // clean & invalidate D / U line
+ add x0, x0, x2
+ cmp x0, x1
+ b.lo 1b
+ dsb sy
+ ret
+ENDPROC(dmac_flush_range)
+
+/*
+ * dmac_map_area(start, size, dir)
+ * - start - kernel virtual start address
+ * - size - size of region
+ * - dir - DMA direction
+ */
+ENTRY(dmac_map_area)
+ add x1, x1, x0
+ cmp x2, #DMA_FROM_DEVICE
+ b.eq dmac_inv_range
+ b dmac_clean_range
+ENDPROC(dmac_map_area)
+
+/*
+ * dmac_unmap_area(start, size, dir)
+ * - start - kernel virtual start address
+ * - size - size of region
+ * - dir - DMA direction
+ */
+ENTRY(dmac_unmap_area)
+ add x1, x1, x0
+ cmp x2, #DMA_TO_DEVICE
+ b.ne dmac_inv_range
+ ret
+ENDPROC(dmac_unmap_area)
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
new file mode 100644
index 0000000..44f9e5c
--- /dev/null
+++ b/arch/arm64/mm/flush.c
@@ -0,0 +1,132 @@
+/*
+ * Based on arch/arm/mm/flush.c
+ *
+ * Copyright (C) 1995-2002 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/export.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+
+#include <asm/cacheflush.h>
+#include <asm/cachetype.h>
+#include <asm/tlbflush.h>
+
+#include "mm.h"
+
+void flush_cache_mm(struct mm_struct *mm)
+{
+}
+
+void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
+ unsigned long end)
+{
+ if (vma->vm_flags & VM_EXEC)
+ __flush_icache_all();
+}
+
+void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr,
+ unsigned long pfn)
+{
+}
+
+static void flush_ptrace_access(struct vm_area_struct *vma, struct page *page,
+ unsigned long uaddr, void *kaddr,
+ unsigned long len)
+{
+ if (vma->vm_flags & VM_EXEC) {
+ unsigned long addr = (unsigned long)kaddr;
+ if (icache_is_aliasing()) {
+ __cpuc_flush_dcache_area(kaddr, len);
+ __flush_icache_all();
+ } else {
+ __cpuc_coherent_kern_range(addr, addr + len);
+ }
+ }
+}
+
+/*
+ * Copy user data from/to a page which is mapped into a different process's
+ * address space. Really, we want to allow our "user space" model to handle
+ * this.
+ *
+ * Note that this code needs to run on the current CPU.
+ */
+void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
+ unsigned long uaddr, void *dst, const void *src,
+ unsigned long len)
+{
+#ifdef CONFIG_SMP
+ preempt_disable();
+#endif
+ memcpy(dst, src, len);
+ flush_ptrace_access(vma, page, uaddr, dst, len);
+#ifdef CONFIG_SMP
+ preempt_enable();
+#endif
+}
+
+void __flush_dcache_page(struct address_space *mapping, struct page *page)
+{
+ __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
+}
+
+void __sync_icache_dcache(pte_t pte)
+{
+ unsigned long pfn;
+ struct page *page;
+
+ pfn = pte_pfn(pte);
+ if (!pfn_valid(pfn))
+ return;
+
+ page = pfn_to_page(pfn);
+ if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+ __flush_dcache_page(NULL, page);
+ __flush_icache_all();
+}
+
+/*
+ * Ensure cache coherency between kernel mapping and userspace mapping of this
+ * page.
+ */
+void flush_dcache_page(struct page *page)
+{
+ struct address_space *mapping;
+
+ /*
+ * The zero page is never written to, so never has any dirty cache
+ * lines, and therefore never needs to be flushed.
+ */
+ if (page == ZERO_PAGE(0))
+ return;
+
+ mapping = page_mapping(page);
+
+ if (mapping && !mapping_mapped(mapping))
+ clear_bit(PG_dcache_clean, &page->flags);
+ else {
+ __flush_dcache_page(mapping, page);
+ if (mapping)
+ __flush_icache_all();
+ set_bit(PG_dcache_clean, &page->flags);
+ }
+}
+EXPORT_SYMBOL(flush_dcache_page);
+
+void __flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vmaddr)
+{
+}

2012-08-14 17:53:53

by Catalin Marinas

[permalink] [raw]
Subject: [PATCH v2 08/31] arm64: CPU support

This patch adds AArch64 CPU-specific functionality. It assumes that the
implementation is generic to AArch64 and does not require specific
identification. Different CPU implementations may require the setting of
various ACTLR_EL1 bits, but such information is not currently available
and that configuration should ideally be pushed to firmware.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/cputype.h | 49 +++++++++
arch/arm64/include/asm/proc-fns.h | 51 ++++++++++
arch/arm64/include/asm/processor.h | 174 ++++++++++++++++++++++++++++++++
arch/arm64/include/asm/procinfo.h | 44 ++++++++
arch/arm64/mm/proc-syms.c | 31 ++++++
arch/arm64/mm/proc.S | 193 ++++++++++++++++++++++++++++++++++++
6 files changed, 542 insertions(+), 0 deletions(-)
create mode 100644 arch/arm64/include/asm/cputype.h
create mode 100644 arch/arm64/include/asm/proc-fns.h
create mode 100644 arch/arm64/include/asm/processor.h
create mode 100644 arch/arm64/include/asm/procinfo.h
create mode 100644 arch/arm64/mm/proc-syms.c
create mode 100644 arch/arm64/mm/proc.S

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
new file mode 100644
index 0000000..ef54125
--- /dev/null
+++ b/arch/arm64/include/asm/cputype.h
@@ -0,0 +1,49 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_CPUTYPE_H
+#define __ASM_CPUTYPE_H
+
+#define ID_MIDR_EL1 "midr_el1"
+#define ID_CTR_EL0 "ctr_el0"
+
+#define ID_AA64PFR0_EL1 "id_aa64pfr0_el1"
+#define ID_AA64DFR0_EL1 "id_aa64dfr0_el1"
+#define ID_AA64AFR0_EL1 "id_aa64afr0_el1"
+#define ID_AA64ISAR0_EL1 "id_aa64isar0_el1"
+#define ID_AA64MMFR0_EL1 "id_aa64mmfr0_el1"
+
+#define read_cpuid(reg) ({ \
+ u64 __val; \
+ asm("mrs %0, " reg : "=r" (__val)); \
+ __val; \
+})
+
+/*
+ * The CPU ID never changes at run time, so we might as well tell the
+ * compiler that it's constant. Use this function to read the CPU ID
+ * rather than directly reading processor_id or read_cpuid() directly.
+ */
+static inline u32 __attribute_const__ read_cpuid_id(void)
+{
+ return read_cpuid(ID_MIDR_EL1);
+}
+
+static inline u32 __attribute_const__ read_cpuid_cachetype(void)
+{
+ return read_cpuid(ID_CTR_EL0);
+}
+
+#endif
diff --git a/arch/arm64/include/asm/proc-fns.h b/arch/arm64/include/asm/proc-fns.h
new file mode 100644
index 0000000..520331b
--- /dev/null
+++ b/arch/arm64/include/asm/proc-fns.h
@@ -0,0 +1,51 @@
+/*
+ * Based on arch/arm/include/asm/proc-fns.h
+ *
+ * Copyright (C) 1997-1999 Russell King
+ * Copyright (C) 2000 Deep Blue Solutions Ltd
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PROCFNS_H
+#define __ASM_PROCFNS_H
+
+#ifdef __KERNEL__
+#ifndef __ASSEMBLY__
+
+#include <asm/page.h>
+
+struct mm_struct;
+
+extern void cpu_proc_init(void);
+extern void cpu_proc_fin(void);
+extern void cpu_do_idle(void);
+extern void cpu_do_switch_mm(unsigned long pgd_phys, struct mm_struct *mm);
+extern void cpu_reset(unsigned long addr) __attribute__((noreturn));
+
+#include <asm/memory.h>
+
+#define cpu_switch_mm(pgd,mm) cpu_do_switch_mm(virt_to_phys(pgd),mm)
+
+#define cpu_get_pgd() \
+({ \
+ unsigned long pg; \
+ asm("mrs %0, ttbr0_el1\n" \
+ : "=r" (pg)); \
+ pg &= ~0xffff000000003ffful; \
+ (pgd_t *)phys_to_virt(pg); \
+})
+
+#endif /* __ASSEMBLY__ */
+#endif /* __KERNEL__ */
+#endif /* __ASM_PROCFNS_H */
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
new file mode 100644
index 0000000..ebf2b22
--- /dev/null
+++ b/arch/arm64/include/asm/processor.h
@@ -0,0 +1,174 @@
+/*
+ * Based on arch/arm/include/asm/processor.h
+ *
+ * Copyright (C) 1995-1999 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PROCESSOR_H
+#define __ASM_PROCESSOR_H
+
+/*
+ * Default implementation of macro that returns current
+ * instruction pointer ("program counter").
+ */
+#define current_text_addr() ({ __label__ _l; _l: &&_l;})
+
+#ifdef __KERNEL__
+
+#include <linux/string.h>
+
+#include <asm/fpsimd.h>
+#include <asm/hw_breakpoint.h>
+#include <asm/ptrace.h>
+#include <asm/types.h>
+
+#ifdef __KERNEL__
+#define STACK_TOP_MAX TASK_SIZE_64
+#ifdef CONFIG_AARCH32_EMULATION
+#define AARCH32_VECTORS_BASE 0xffff0000
+#define STACK_TOP (test_thread_flag(TIF_32BIT) ? \
+ AARCH32_VECTORS_BASE : STACK_TOP_MAX)
+#else
+#define STACK_TOP STACK_TOP_MAX
+#endif /* CONFIG_AARCH32_EMULATION */
+#endif /* __KERNEL__ */
+
+struct debug_info {
+ /* Have we suspended stepping by a debugger? */
+ int suspended_step;
+ /* Allow breakpoints and watchpoints to be disabled for this thread. */
+ int bps_disabled;
+ int wps_disabled;
+ /* Hardware breakpoints pinned to this task. */
+ struct perf_event *hbp[ARM_MAX_HBP_SLOTS];
+};
+
+struct cpu_context {
+ unsigned long x19;
+ unsigned long x20;
+ unsigned long x21;
+ unsigned long x22;
+ unsigned long x23;
+ unsigned long x24;
+ unsigned long x25;
+ unsigned long x26;
+ unsigned long x27;
+ unsigned long x28;
+ unsigned long fp;
+ unsigned long sp;
+ unsigned long pc;
+};
+
+struct thread_struct {
+ struct cpu_context cpu_context; /* cpu context */
+ unsigned long tp_value;
+ struct fpsimd_state fpsimd_state;
+ unsigned long fault_address; /* fault info */
+ struct debug_info debug; /* debugging */
+};
+
+#define INIT_THREAD { }
+
+static inline void start_thread_common(struct pt_regs *regs, unsigned long pc)
+{
+ memset(regs, 0, sizeof(*regs));
+ regs->syscallno = ~0UL;
+ regs->pc = pc;
+}
+
+static inline void start_thread(struct pt_regs *regs, unsigned long pc,
+ unsigned long sp)
+{
+ unsigned long *stack = (unsigned long *)sp;
+
+ start_thread_common(regs, pc);
+ regs->pstate = PSR_MODE_EL0t;
+ regs->sp = sp;
+ regs->regs[2] = stack[2]; /* x2 (envp) */
+ regs->regs[1] = stack[1]; /* x1 (argv) */
+ regs->regs[0] = stack[0]; /* x0 (argc) */
+}
+
+#ifdef CONFIG_AARCH32_EMULATION
+static inline void compat_start_thread(struct pt_regs *regs, unsigned long pc,
+ unsigned long sp)
+{
+ unsigned int *stack = (unsigned int *)sp;
+
+ start_thread_common(regs, pc);
+ regs->pstate = COMPAT_PSR_MODE_USR;
+ if (pc & 1)
+ regs->pstate |= COMPAT_PSR_T_BIT;
+ regs->compat_sp = sp;
+ regs->regs[2] = stack[2]; /* x2 (envp) */
+ regs->regs[1] = stack[1]; /* x1 (argv) */
+ regs->regs[0] = stack[0]; /* x0 (argc) */
+}
+#endif
+
+/* Forward declaration, a strange C thing */
+struct task_struct;
+
+/* Free all resources held by a thread. */
+extern void release_thread(struct task_struct *);
+
+/* Prepare to copy thread state - unlazy all lazy status */
+#define prepare_to_copy(tsk) do { } while (0)
+
+unsigned long get_wchan(struct task_struct *p);
+
+#define cpu_relax() barrier()
+
+/* Thread switching */
+extern struct task_struct *cpu_switch_to(struct task_struct *prev,
+ struct task_struct *next);
+
+/*
+ * Create a new kernel thread
+ */
+extern int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
+
+#define task_pt_regs(p) \
+ ((struct pt_regs *)(THREAD_START_SP + task_stack_page(p)) - 1)
+
+#define KSTK_EIP(tsk) task_pt_regs(tsk)->pc
+#define KSTK_ESP(tsk) task_pt_regs(tsk)->sp
+
+/*
+ * Prefetching support
+ */
+#define ARCH_HAS_PREFETCH
+static inline void prefetch(const void *ptr)
+{
+ asm volatile("prfm pldl1keep, %a0\n" : : "p" (ptr));
+}
+
+#define ARCH_HAS_PREFETCHW
+static inline void prefetchw(const void *ptr)
+{
+ asm volatile("prfm pstl1keep, %a0\n" : : "p" (ptr));
+}
+
+#define ARCH_HAS_SPINLOCK_PREFETCH
+static inline void spin_lock_prefetch(const void *x)
+{
+ prefetchw(x);
+}
+
+#define HAVE_ARCH_PICK_MMAP_LAYOUT
+
+#endif
+
+#endif /* __ASM_PROCESSOR_H */
diff --git a/arch/arm64/include/asm/procinfo.h b/arch/arm64/include/asm/procinfo.h
new file mode 100644
index 0000000..81fece9
--- /dev/null
+++ b/arch/arm64/include/asm/procinfo.h
@@ -0,0 +1,44 @@
+/*
+ * Based on arch/arm/include/asm/procinfo.h
+ *
+ * Copyright (C) 1996-1999 Russell King
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PROCINFO_H
+#define __ASM_PROCINFO_H
+
+#ifdef __KERNEL__
+
+/*
+ * Note! struct processor is always defined if we're
+ * using MULTI_CPU, otherwise this entry is unused,
+ * but still exists.
+ *
+ * NOTE! The following structure is defined by assembly
+ * language, NOT C code. For more information, check:
+ * arch/arm64/mm/proc.S and arch/arm64/kernel/head.S
+ */
+struct proc_info_list {
+ unsigned int cpu_val;
+ unsigned int cpu_mask;
+ unsigned long __cpu_flush; /* used by head.S */
+ const char *cpu_name;
+};
+
+#else /* __KERNEL__ */
+#include <asm/elf.h>
+#warning "Please include asm/elf.h instead"
+#endif /* __KERNEL__ */
+#endif
diff --git a/arch/arm64/mm/proc-syms.c b/arch/arm64/mm/proc-syms.c
new file mode 100644
index 0000000..2d99ef9
--- /dev/null
+++ b/arch/arm64/mm/proc-syms.c
@@ -0,0 +1,31 @@
+/*
+ * Based on arch/arm/mm/proc-syms.c
+ *
+ * Copyright (C) 2000-2002 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/export.h>
+#include <linux/mm.h>
+
+#include <asm/cacheflush.h>
+#include <asm/proc-fns.h>
+#include <asm/tlbflush.h>
+#include <asm/page.h>
+
+EXPORT_SYMBOL(__cpuc_flush_kern_all);
+EXPORT_SYMBOL(__cpuc_flush_user_all);
+EXPORT_SYMBOL(__cpuc_flush_user_range);
+EXPORT_SYMBOL(__cpuc_coherent_kern_range);
+EXPORT_SYMBOL(__cpuc_flush_dcache_area);
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
new file mode 100644
index 0000000..453f517
--- /dev/null
+++ b/arch/arm64/mm/proc.S
@@ -0,0 +1,193 @@
+/*
+ * Based on arch/arm/mm/proc.S
+ *
+ * Copyright (C) 2001 Deep Blue Solutions Ltd.
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Catalin Marinas <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/init.h>
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+#include <asm/asm-offsets.h>
+#include <asm/hwcap.h>
+#include <asm/pgtable-hwdef.h>
+#include <asm/pgtable.h>
+
+#include "proc-macros.S"
+
+#ifndef CONFIG_SMP
+/* PTWs cacheable, inner/outer WBWA not shareable */
+#define TCR_FLAGS TCR_IRGN_WBWA | TCR_ORGN_WBWA
+#else
+/* PTWs cacheable, inner/outer WBWA shareable */
+#define TCR_FLAGS TCR_IRGN_WBWA | TCR_ORGN_WBWA | TCR_SHARED
+#endif
+
+#define MAIR(attr, mt) ((attr) << ((mt) * 8))
+
+ENTRY(cpu_proc_init)
+ ret
+ENDPROC(cpu_proc_init)
+
+ENTRY(cpu_proc_fin)
+ ret
+ENDPROC(cpu_proc_fin)
+
+/*
+ * cpu_reset(loc)
+ *
+ * Perform a soft reset of the system. Put the CPU into the same state
+ * as it would be if it had been reset, and branch to what would be the
+ * reset vector. It must be executed with the flat identity mapping.
+ *
+ * - loc - location to jump to for soft reset
+ */
+ .align 5
+ENTRY(cpu_reset)
+ mrs x1, sctlr_el1
+ bic x1, x1, #1
+ msr sctlr_el1, x1 // disable the MMU
+ isb
+ ret x0
+ENDPROC(cpu_reset)
+
+/*
+ * cpu_do_idle()
+ *
+ * Idle the processor (wait for interrupt).
+ */
+ENTRY(cpu_do_idle)
+ dsb sy // WFI may enter a low-power mode
+ wfi
+ ret
+ENDPROC(cpu_do_idle)
+
+/*
+ * cpu_switch_mm(pgd_phys, tsk)
+ *
+ * Set the translation table base pointer to be pgd_phys.
+ *
+ * - pgd_phys - physical address of new TTB
+ */
+ENTRY(cpu_do_switch_mm)
+ mmid w1, x1 // get mm->context.id
+ bfi x0, x1, #48, #16 // set the ASID
+ msr ttbr0_el1, x0 // set TTBR0
+ isb
+ ret
+ENDPROC(cpu_do_switch_mm)
+
+cpu_name:
+ .ascii "AArch64 Processor"
+ .align
+
+ .section ".text.init", #alloc, #execinstr
+
+/*
+ * __cpu_setup
+ *
+ * Initialise the processor for turning the MMU on. Return in x0 the
+ * value of the SCTLR_EL1 register.
+ */
+__cpu_setup:
+#ifdef CONFIG_SMP
+ /* TODO: only do this for certain CPUs */
+ /*
+ * Enable SMP/nAMP mode.
+ */
+ mrs x0, actlr_el1
+ tbnz x0, #6, 1f // already enabled?
+ orr x0, x0, #1 << 6
+ msr actlr_el1, x0
+1:
+#endif
+ /*
+ * Preserve the link register across the function call.
+ */
+ mov x28, lr
+ bl __cpuc_flush_dcache_all
+ mov lr, x28
+ ic iallu // I+BTB cache invalidate
+ dsb sy
+
+ mov x0, #3 << 20
+ msr cpacr_el1, x0 // Enable FP/ASIMD
+ mov x0, #1
+ msr oslar_el1, x0 // Set the debug OS lock
+ tlbi vmalle1is // invalidate I + D TLBs
+ /*
+ * Memory region attributes for LPAE:
+ *
+ * n = AttrIndx[2:0]
+ * n MAIR
+ * DEVICE_nGnRnE 000 00000000
+ * DEVICE_nGnRE 001 00000100
+ * DEVICE_GRE 010 00001100
+ * NORMAL_NC 011 01000100
+ * NORMAL 100 11111111
+ */
+ ldr x5, =MAIR(0x00, MT_DEVICE_nGnRnE) | \
+ MAIR(0x04, MT_DEVICE_nGnRE) | \
+ MAIR(0x0c, MT_DEVICE_GRE) | \
+ MAIR(0x44, MT_NORMAL_NC) | \
+ MAIR(0xff, MT_NORMAL)
+ msr mair_el1, x5
+ /*
+ * Prepare SCTLR
+ */
+ adr x5, crval
+ ldp w5, w6, [x5]
+ mrs x0, sctlr_el1
+ bic x0, x0, x5 // clear bits
+ orr x0, x0, x6 // set bits
+ /*
+ * Set/prepare TCR and TTBR. We use 512GB (39-bit) address range for
+ * both user and kernel.
+ */
+ ldr x10, =TCR_TxSZ(VA_BITS) | TCR_FLAGS | TCR_IPS_40BIT | \
+ TCR_ASID16 | (1 << 31)
+#ifdef CONFIG_ARM64_64K_PAGES
+ orr x10, x10, TCR_TG0_64K
+ orr x10, x10, TCR_TG1_64K
+#endif
+ msr tcr_el1, x10
+ ret // return to head.S
+ENDPROC(__cpu_setup)
+
+ /*
+ * n n T
+ * U E WT T UD US IHBS
+ * CE0 XWHW CZ ME TEEA S
+ * .... .IEE .... NEAI TE.I ..AD DEN0 ACAM
+ * 0011 0... 1101 ..0. ..0. 10.. .... .... < hardware reserved
+ * .... .100 .... 01.1 11.1 ..01 0001 1101 < software settings
+ */
+ .type crval, #object
+crval:
+ .word 0x030802e2 // clear
+ .word 0x0405d11d // set
+
+ .section ".proc.info.init", #alloc, #execinstr
+
+ .type __v8_proc_info, #object
+__v8_proc_info:
+ .long 0x000f0000 // Required ID value
+ .long 0x000f0000 // Mask for ID
+ b __cpu_setup
+ nop
+ .quad cpu_name
+ .long 0
+ .size __v8_proc_info, . - __v8_proc_info

2012-08-14 21:01:08

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [PATCH v2 30/31] arm64: Build infrastructure

>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> new file mode 100644
> index 0000000..1ce3d04
> --- /dev/null
> +++ b/arch/arm64/Kconfig
> @@ -0,0 +1,261 @@
> +config ARM64
> + def_bool y
> + select OF
> + select OF_EARLY_FLATTREE
> + select IRQ_DOMAIN
> + select HAVE_AOUT
> + select HAVE_DMA_ATTRS
> + select HAVE_DMA_API_DEBUG
> + select HAVE_IDE
> + select HAVE_MEMBLOCK
> + select RTC_LIB
> + select SYS_SUPPORTS_APM_EMULATION
> + select HAVE_GENERIC_DMA_COHERENT
> + select GENERIC_IOMAP
> + select HAVE_IRQ_WORK
> + select HAVE_PERF_EVENTS
> + select HAVE_ARCH_TRACEHOOK
> + select PERF_USE_VMALLOC
> + select HAVE_HW_BREAKPOINT if PERF_EVENTS
> + select HAVE_GENERIC_HARDIRQS
> + select GENERIC_HARDIRQS_NO_DEPRECATED
> + select HAVE_SPARSE_IRQ
> + select SPARSE_IRQ
> + select GENERIC_IRQ_SHOW
> + select GENERIC_SMP_IDLE_THREAD
> + select NO_BOOTMEM

If you keep this list sorted then merge conflicts are less likely.
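
For example (an illustrative excerpt only, using the symbols quoted above):

	select GENERIC_HARDIRQS_NO_DEPRECATED
	select GENERIC_IOMAP
	select GENERIC_IRQ_SHOW
	select GENERIC_SMP_IDLE_THREAD
	select HAVE_AOUT
	select HAVE_ARCH_TRACEHOOK
	...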


> + help
> + ARM 64-bit (AArch64) Linux support.
> +
> +config 64BIT
> + def_bool y
> +
> +config ARCH_PHYS_ADDR_T_64BIT
> + def_bool y
> +
> +config HAVE_PWM
> + bool
> +
> +config SYS_SUPPORTS_APM_EMULATION
> + bool
> +
> +config NO_IOPORT
> + def_bool y
> +
> +config GENERIC_GPIO
> + bool
> +
> +config GENERIC_TIME_VSYSCALL
> + def_bool y
Please use select like all other archs do.
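
That is, drop the def_bool entry here and select the symbol from the main
ARM64 entry instead, e.g. (sketch, assuming the symbol is defined in the
generic Kconfig):

	config ARM64
		def_bool y
		...
		select GENERIC_TIME_VSYSCALL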


> +
> +config GENERIC_CLOCKEVENTS
> + def_bool y
Again - please use select.

> +
> +config STACKTRACE_SUPPORT
> + def_bool y
> +
> +config LOCKDEP_SUPPORT
> + def_bool y
> +
> +config TRACE_IRQFLAGS_SUPPORT
> + def_bool y
> +
> +config HARDIRQS_SW_RESEND
> + def_bool y
Please use select.

> +
> +config GENERIC_IRQ_PROBE
> + def_bool y
Please use select.

> +
> +config GENERIC_LOCKBREAK
> + def_bool y
> + depends on SMP && PREEMPT
> +
> +config RWSEM_GENERIC_SPINLOCK
> + def_bool y
> +
> +config RWSEM_XCHGADD_ALGORITHM
> + bool
> +
> +config ARCH_HAS_ILOG2_U32
> + bool
> +
> +config ARCH_HAS_ILOG2_U64
> + bool
> +
> +config ARCH_HAS_CPUFREQ
> + bool
> + help
> + Internal node to signify that the ARCH has CPUFREQ support
> + and that the relevant menu configurations are displayed for
> + it.
> +
> +config GENERIC_HWEIGHT
> + def_bool y
> +
> +config GENERIC_CSUM
> + def_bool y
> +
> +config GENERIC_CALIBRATE_DELAY
> + def_bool y
> +
> +config ZONE_DMA32
> + def_bool y
> +
> +config ARCH_DMA_ADDR_T_64BIT
> + def_bool y
> +
> +config NEED_DMA_MAP_STATE
> + def_bool y
> +
> +config NEED_SG_DMA_LENGTH
> + def_bool y
> +
> +config SWIOTLB
> + def_bool y
> +
> +config IOMMU_HELPER
> + def_bool SWIOTLB
> +


Sam

2012-08-14 23:06:52

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

Hi,


On Tue, Aug 14, 2012 at 06:52:03PM +0100, Catalin Marinas wrote:

> +Before jumping into the kernel, the following conditions must be met:
> +
> +- Quiesce all DMA capable devices so that memory does not get
> + corrupted by bogus network packets or disk data. This will save
> + you many hours of debug.
> +
> +- Primary CPU general-purpose register settings
> + x0 = physical address of device tree blob (dtb) in system RAM.
> +
> +- CPU mode
> + All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError,
> + IRQ and FIQ).
> + The CPU must be in either EL2 (RECOMMENDED in order to have access to
> + the virtualisation extensions) or non-secure EL1.
> +
> +- Caches, MMUs
> + The MMU must be off.
> + Instruction cache may be on or off.
> + Data cache must be off and invalidated.
> +
> +- Architected timers
> + CNTFRQ must be programmed with the timer frequency.
> + If entering the kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0)
> + set where available.
> +
> +- Coherency
> + All CPUs to be booted by the kernel must be part of the same coherency
> + domain on entry to the kernel. This may require IMPLEMENTATION DEFINED
> + initialisation to enable the receiving of maintenance operations on
> + each CPU.
> +
> +- System registers
> + All writable architected system registers at the exception level where
> + the kernel image will be entered must be initialised by software at a
> + higher exception level to prevent execution in an UNKNOWN state.

Given the recent development of ARM platforms, you might want to mandate
the state of IOMMUs as well (they should probably be off, since there
should be no active DMA activity). Graphics would be the exception to
this, since if you want to keep scanning out a splash screen, you'll
have to keep doing DMA...

> +- The primary CPU must jump directly to the first instruction of the
> + kernel image. The device tree blob passed by this CPU must contain
> + for each CPU node:
> +
> + 1. An 'enable-method' property. Currently, the only supported value
> + for this field is the string "spin-table".
> +
> + 2. A 'cpu-release-addr' property identifying a 64-bit,
> + zero-initialised memory location.

These would be good to have documented in the
Documentation/devicetree/bindings hierarchy as well.

> index 0000000..d766493
> --- /dev/null
> +++ b/arch/arm64/include/asm/setup.h
> @@ -0,0 +1,26 @@
> +/*
> + * Based on arch/arm/include/asm/setup.h
> + *
> + * Copyright (C) 1997-1999 Russell King
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_SETUP_H
> +#define __ASM_SETUP_H
> +
> +#include <linux/types.h>
> +
> +#define COMMAND_LINE_SIZE 1024

Probably not a huge deal, and other architectures seem to be all over
the map on this, but you might want to go with a larger value now rather
than later. 2048 or 4096 perhaps?
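
i.e. something like:

	#define COMMAND_LINE_SIZE	2048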

> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> new file mode 100644
> index 0000000..34ccdc0
> --- /dev/null
> +++ b/arch/arm64/kernel/head.S

[...]

> +/*
> + * Setup common bits before finally enabling the MMU. Essentially this is just
> + * loading the page table pointer and vector base registers.
> + *
> + * On entry to this code, x0 must contain the SCTLR_EL1 value for turning on
> + * the MMU.
> + */
> +__enable_mmu:

ENTRY()?

> + ldr x5, =vectors
> + msr vbar_el1, x5
> + msr ttbr0_el1, x25 // load TTBR0
> + msr ttbr1_el1, x26 // load TTBR1
> + isb
> + b __turn_mmu_on
> +ENDPROC(__enable_mmu)

...or just END()? Same for a few of the other functions below.
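
i.e. either (sketch):

	ENTRY(__enable_mmu)
		...
	ENDPROC(__enable_mmu)

or keep the local label and just close it with END(__enable_mmu).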

> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> new file mode 100644
> index 0000000..f25186f
> --- /dev/null
> +++ b/arch/arm64/kernel/setup.c

[...]

> +static void __init setup_processor(void)
> +{
> + struct proc_info_list *list;
> +
> + /*
> + * locate processor in the list of supported processor
> + * types. The linker builds this table for us from the
> + * entries in arch/arm/mm/proc.S
> + */

Probably from arch/arm64/... somewhere?


[...]

> + printk("CPU: %s [%08x] revision %d\n",
> + cpu_name, read_cpuid_id(), read_cpuid_id() & 15);
> +
> + sprintf(init_utsname()->machine, "aarch64");

> + initial_boot_params = devtree;
> + dt_root = of_get_flat_dt_root();
> +
> + machine_name = of_get_flat_dt_prop(dt_root, "model", NULL);
> + if (!machine_name)
> + machine_name = of_get_flat_dt_prop(dt_root, "compatible", NULL);
> + if (!machine_name)
> + machine_name = "<unknown>";
> + pr_info("Machine: %s\n", machine_name);

This property is an array of strings. It would be more valuable to print out
the entry that was matched for a platform instead of the provided one from the
device tree.
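
A sketch of that approach (the table of supported compatible strings below
is purely hypothetical):

	static const char * const machines[] __initconst = {
		"arm,foundation-aarch64",	/* hypothetical entries */
		"arm,vexpress",
		NULL
	};

	const char *machine_name = "<unknown>";
	int i;

	for (i = 0; machines[i]; i++)
		if (of_flat_dt_is_compatible(dt_root, machines[i])) {
			machine_name = machines[i];
			break;
		}
	pr_info("Machine: %s\n", machine_name);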


-Olof

2012-08-14 23:23:23

by Aaro Koskinen

[permalink] [raw]
Subject: Re: [PATCH v2 11/31] arm64: IRQ handling

Hi,

On Tue, Aug 14, 2012 at 06:52:12PM +0100, Catalin Marinas wrote:
> +void handle_IRQ(unsigned int irq, struct pt_regs *regs)
> +{
> + struct pt_regs *old_regs = set_irq_regs(regs);
> +
> + irq_enter();
> +
> + /*
> + * Some hardware gives randomly wrong interrupts. Rather
> + * than crashing, do something sensible.
> + */
> + if (unlikely(irq >= nr_irqs)) {
> + if (printk_ratelimit())
> + pr_warning("Bad IRQ%u\n", irq);

I guess pr_warn_ratelimited() should be used for new code.

(See include/linux/printk.h, "Please don't use printk_ratelimit()...")
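
i.e. the two lines above become simply:

	pr_warn_ratelimited("Bad IRQ%u\n", irq);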

A.

2012-08-14 23:29:10

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH v2 03/31] arm64: Exception handling

Hi,

This one is a bit denser, so just a quick first pass with a couple of minor
comments. I'll revisit the rest.

On Tue, Aug 14, 2012 at 06:52:04PM +0100, Catalin Marinas wrote:

> +el1_sp_pc:
> + /*
> + *Stack or PC alignment exception handling
> + */
> + mrs x0, far_el1
> + mov x1, x25
> + mov x2, sp
> + b do_sp_pc_abort
> +el1_undef:
> + /*
> + *Undefined instruction
> + */

Nit: Missing spaces in the comment here and the one above.

> +el0_undef:
> + /*
> + *Undefined instruction
> + */
> + mov x0, sp
> + b do_undefinstr

Here too.

> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> new file mode 100644
> index 0000000..8712a8e
> --- /dev/null
> +++ b/arch/arm64/kernel/traps.c
[...]
> +DEFINE_SPINLOCK(die_lock);

Should probably be static.

2012-08-14 23:47:19

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v2 03/31] arm64: Exception handling

On Tue, 14 Aug 2012, Olof Johansson wrote:
> > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> > new file mode 100644
> > index 0000000..8712a8e
> > --- /dev/null
> > +++ b/arch/arm64/kernel/traps.c
> [...]
> > +DEFINE_SPINLOCK(die_lock);
>
> Should probably be static.

And RAW_
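
i.e. (with the lock/unlock calls switched to the raw_spin_lock_* variants):

	static DEFINE_RAW_SPINLOCK(die_lock);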

2012-08-14 23:50:16

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH v2 07/31] arm64: Process management

Hi,

On Tue, Aug 14, 2012 at 06:52:08PM +0100, Catalin Marinas wrote:

> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> new file mode 100644
> index 0000000..c4a4e1c
> --- /dev/null
> +++ b/arch/arm64/kernel/process.c
> @@ -0,0 +1,416 @@

[...]
> +/*
> + * Function pointers to optional machine specific functions
> + */
> +void (*pm_power_off)(void);
> +EXPORT_SYMBOL(pm_power_off);
> +
> +void (*pm_restart)(const char *cmd);
> +EXPORT_SYMBOL_GPL(pm_restart);
[...]
> +void (*pm_idle)(void) = default_idle;
> +EXPORT_SYMBOL(pm_idle);

Does it really make sense to export these to modules?

I find the powerpc way of having a machine descriptor structure with these
(and other) function pointers in it a bit cleaner, since it gives you
one place to plug it all in. I'd recommend that you consider doing that
here as well, for these three and potentially other cases in the future.

(See arch/powerpc/include/asm/machdep.h, struct machdep_calls).
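
For illustration only (names entirely made up), an arm64 equivalent
could start out as small as:

struct machine_ops {
	void	(*power_off)(void);
	void	(*restart)(const char *cmd);
	void	(*idle)(void);
};

extern struct machine_ops machine_ops;

with the platform code filling in the pointers at boot, instead of each
hook being a separately exported global function pointer.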

> +void machine_halt(void)
> +{
> + machine_shutdown();
> + while (1);
> +}
> +
> +void machine_power_off(void)
> +{
> + machine_shutdown();
> + if (pm_power_off)
> + pm_power_off();
> +}

Printing something here along the lines of "System halted, OK to power off"
is useful.
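
E.g. (untested):

void machine_halt(void)
{
	machine_shutdown();
	pr_notice("System halted, OK to power off\n");
	while (1);
}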


-Olof

2012-08-15 00:10:45

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH v2 08/31] arm64: CPU support

Hi,

On Tue, Aug 14, 2012 at 06:52:09PM +0100, Catalin Marinas wrote:

> diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
> new file mode 100644
> index 0000000..ef54125
> --- /dev/null
> +++ b/arch/arm64/include/asm/cputype.h
> @@ -0,0 +1,49 @@
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_CPUTYPE_H
> +#define __ASM_CPUTYPE_H
> +
> +#define ID_MIDR_EL1 "midr_el1"
> +#define ID_CTR_EL0 "ctr_el0"
> +
> +#define ID_AA64PFR0_EL1 "id_aa64pfr0_el1"
> +#define ID_AA64DFR0_EL1 "id_aa64dfr0_el1"
> +#define ID_AA64AFR0_EL1 "id_aa64afr0_el1"
> +#define ID_AA64ISAR0_EL1 "id_aa64isar0_el1"
> +#define ID_AA64MMFR0_EL1 "id_aa64mmfr0_el1"
> +
> +#define read_cpuid(reg) ({ \
> + u64 __val; \
> + asm("mrs %0, " reg : "=r" (__val)); \
> + __val; \
> +})
> +
> +/*
> + * The CPU ID never changes at run time, so we might as well tell the
> + * compiler that it's constant. Use this function to read the CPU ID
> + * rather than directly reading processor_id or read_cpuid() directly.
> + */
> +static inline u32 __attribute_const__ read_cpuid_id(void)
> +{
> + return read_cpuid(ID_MIDR_EL1);
> +}
> +
> +static inline u32 __attribute_const__ read_cpuid_cachetype(void)
> +{
> + return read_cpuid(ID_CTR_EL0);
> +}

Is this perhaps a carry-over from arch/arm? Abstracting out read_cpuid()
doesn't seem to buy anything here; just open-code the one-line assembly
in each.

Might as well clean up the naming a little too while you're at it, i.e.
read_cpu_id() and read_cpu_cachetype().
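
I.e. something like (untested):

static inline u32 __attribute_const__ read_cpu_id(void)
{
	u64 val;

	asm("mrs %0, midr_el1" : "=r" (val));
	return val;
}

and the same pattern with ctr_el0 for the cache type.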


> diff --git a/arch/arm64/include/asm/procinfo.h b/arch/arm64/include/asm/procinfo.h
> new file mode 100644
> index 0000000..81fece9
> --- /dev/null
> +++ b/arch/arm64/include/asm/procinfo.h
> @@ -0,0 +1,44 @@
> +/*
> + * Based on arch/arm/include/asm/procinfo.h
> + *
> + * Copyright (C) 1996-1999 Russell King
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_PROCINFO_H
> +#define __ASM_PROCINFO_H
> +
> +#ifdef __KERNEL__
> +
> +/*
> + * Note! struct processor is always defined if we're
> + * using MULTI_CPU, otherwise this entry is unused,
> + * but still exists.

Stale comment?

> + *
> + * NOTE! The following structure is defined by assembly
> + * language, NOT C code. For more information, check:
> + * arch/arm/mm/proc-*.S and arch/arm/kernel/head.S

Stale references. Also, no current arm64 implementation uses this. Premature
abstraction perhaps?

> +struct proc_info_list {
> + unsigned int cpu_val;
> + unsigned int cpu_mask;
> + unsigned long __cpu_flush; /* used by head.S */
> + const char *cpu_name;
> +};
> +
> +#else /* __KERNEL__ */
> +#include <asm/elf.h>
> +#warning "Please include asm/elf.h instead"
> +#endif /* __KERNEL__ */
> +#endif
> diff --git a/arch/arm64/mm/proc-syms.c b/arch/arm64/mm/proc-syms.c
> new file mode 100644
> index 0000000..2d99ef9
> --- /dev/null
> +++ b/arch/arm64/mm/proc-syms.c
> @@ -0,0 +1,31 @@
> +/*
> + * Based on arch/arm/mm/proc-syms.c
> + *
> + * Copyright (C) 2000-2002 Russell King
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/export.h>
> +#include <linux/mm.h>
> +
> +#include <asm/cacheflush.h>
> +#include <asm/proc-fns.h>
> +#include <asm/tlbflush.h>
> +#include <asm/page.h>
> +
> +EXPORT_SYMBOL(__cpuc_flush_kern_all);
> +EXPORT_SYMBOL(__cpuc_flush_user_all);
> +EXPORT_SYMBOL(__cpuc_flush_user_range);
> +EXPORT_SYMBOL(__cpuc_coherent_kern_range);
> +EXPORT_SYMBOL(__cpuc_flush_dcache_area);

See comment on other email about putting function pointers in a struct
instead.

> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> new file mode 100644
> index 0000000..453f517
> --- /dev/null
> +++ b/arch/arm64/mm/proc.S
> @@ -0,0 +1,193 @@
> + .section ".proc.info.init", #alloc, #execinstr
> +
> + .type __v8_proc_info, #object
> +__v8_proc_info:
> + .long 0x000f0000 // Required ID value
> + .long 0x000f0000 // Mask for ID
> + b __cpu_setup
> + nop
> + .quad cpu_name
> + .long 0
> + .size __v8_proc_info, . - __v8_proc_info

I know this is a carry-over from arch/arm, but how about moving this
to more of a C construct similar to arch/powerpc/kernel/cputable.c
instead? It's considerably easier to read that way, and it's convenient
to have the definitions all in one place, making it easier to share some
of the functions, etc.
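
Roughly (illustration only; the struct, field names and cpu_name string
are made up, untested):

struct cpu_info {
	unsigned int	cpu_id_val;
	unsigned int	cpu_id_mask;
	const char	*cpu_name;
	int		(*cpu_setup)(void);
};

static const struct cpu_info cpu_table[] = {
	{
		.cpu_id_val	= 0x000f0000,
		.cpu_id_mask	= 0x000f0000,
		.cpu_name	= "ARMv8 Processor",
		.cpu_setup	= __cpu_setup,
	},
};

(head.S would still need to reach the setup hook before the MMU is up,
so this only sketches the direction, not a drop-in replacement.)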


-Olof

2012-08-15 00:21:32

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH v2 12/31] arm64: Atomic operations

Hi,

On Tue, Aug 14, 2012 at 06:52:13PM +0100, Catalin Marinas wrote:
> This patch introduces the atomic, mutex and futex operations. Many
> atomic operations use the load-acquire and store-release operations
> which imply barriers, avoiding the need for explicit DMB.
>
> Signed-off-by: Will Deacon <[email protected]>
> Signed-off-by: Catalin Marinas <[email protected]>
> ---
> arch/arm64/include/asm/atomic.h | 306 +++++++++++++++++++++++++++++++++++++++
> arch/arm64/include/asm/futex.h | 134 +++++++++++++++++
> 2 files changed, 440 insertions(+), 0 deletions(-)
> create mode 100644 arch/arm64/include/asm/atomic.h
> create mode 100644 arch/arm64/include/asm/futex.h
>
> diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
> new file mode 100644
> index 0000000..fa60c8b
> --- /dev/null
> +++ b/arch/arm64/include/asm/atomic.h
> @@ -0,0 +1,306 @@
> +/*
> + * Based on arch/arm/include/asm/atomic.h
> + *
> + * Copyright (C) 1996 Russell King.
> + * Copyright (C) 2002 Deep Blue Solutions Ltd.
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_ATOMIC_H
> +#define __ASM_ATOMIC_H
> +
> +#include <linux/compiler.h>
> +#include <linux/types.h>
> +
> +#include <asm/barrier.h>
> +#include <asm/cmpxchg.h>
> +
> +#define ATOMIC_INIT(i) { (i) }
> +
> +#ifdef __KERNEL__
> +
> +/*
> + * On ARM, ordinary assignment (str instruction) doesn't clear the local
> + * strex/ldrex monitor on some implementations. The reason we can use it for
> + * atomic_set() is the clrex or dummy strex done on every exception return.
> + */
> +#define atomic_read(v) (*(volatile int *)&(v)->counter)
> +#define atomic_set(v,i) (((v)->counter) = (i))
> +
> +/*
> + * AArch64 UP and SMP safe atomic ops. We use load exclusive and
> + * store exclusive to ensure that these are atomic. We may loop
> + * to ensure that the update happens.
> + */
> +static inline void atomic_add(int i, atomic_t *v)
> +{
> + unsigned long tmp;
> + int result;
> +
> + asm volatile("// atomic_add\n"
> +"1: ldxr %w0, [%3]\n"
> +" add %w0, %w0, %w4\n"
> +" stxr %w1, %w0, [%3]\n"
> +" cbnz %w1,1b"

Nit: space before 1b

[...]

> diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
> new file mode 100644
> index 0000000..0745e82
> --- /dev/null
> +++ b/arch/arm64/include/asm/futex.h
> @@ -0,0 +1,134 @@
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_FUTEX_H
> +#define __ASM_FUTEX_H
> +
> +#ifdef __KERNEL__
> +
> +#include <linux/futex.h>
> +#include <linux/uaccess.h>
> +#include <asm/errno.h>
> +
> +#define __futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \
> + asm volatile( \
> +"1: ldaxr %w1, %2\n" \
> + insn "\n" \
> +"2: stlxr %w3, %w0, %2\n" \
> +" cbnz %w3, 1b\n" \
> +"3: .pushsection __ex_table,\"a\"\n" \
> +" .align 3\n" \
> +" .quad 1b, 4f, 2b, 4f\n" \
> +" .popsection\n" \
> +" .pushsection .fixup,\"ax\"\n" \

Moving the exception table below the body of the code makes the flow easier to
read; please do that.

Also, don't you need a barrier here?
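
For the reordering, something like this keeps the flow linear (same
instructions, just moved; note that "4f" becomes "4b" once the fixup is
emitted first; continuation backslashes omitted, untested):

"1:	ldaxr	%w1, %2\n"
	insn "\n"
"2:	stlxr	%w3, %w0, %2\n"
"	cbnz	%w3, 1b\n"
"3:\n"
"	.pushsection .fixup,\"ax\"\n"
"4:	mov	%w0, %w5\n"
"	b	3b\n"
"	.popsection\n"
"	.pushsection __ex_table,\"a\"\n"
"	.align	3\n"
"	.quad	1b, 4b, 2b, 4b\n"
"	.popsection"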

> +"4: mov %w0, %w5\n" \
> +" b 3b\n" \
> +" .popsection" \
> + : "=&r" (ret), "=&r" (oldval), "+Q" (*uaddr), "=&r" (tmp) \
> + : "r" (oparg), "Ir" (-EFAULT) \
> + : "cc")
> +
> +static inline int
> +futex_atomic_op_inuser (int encoded_op, u32 __user *uaddr)
> +{
> + int op = (encoded_op >> 28) & 7;
> + int cmp = (encoded_op >> 24) & 15;
> + int oparg = (encoded_op << 8) >> 20;
> + int cmparg = (encoded_op << 20) >> 20;
> + int oldval = 0, ret, tmp;
> +
> + if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28))
> + oparg = 1 << oparg;
> +
> + if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
> + return -EFAULT;
> +
> + pagefault_disable(); /* implies preempt_disable() */
> +
> + switch (op) {
> + case FUTEX_OP_SET:
> + __futex_atomic_op("mov %w0, %w4",
> + ret, oldval, uaddr, tmp, oparg);
> + break;
> + case FUTEX_OP_ADD:
> + __futex_atomic_op("add %w0, %w1, %w4",
> + ret, oldval, uaddr, tmp, oparg);
> + break;
> + case FUTEX_OP_OR:
> + __futex_atomic_op("orr %w0, %w1, %w4",
> + ret, oldval, uaddr, tmp, oparg);
> + break;
> + case FUTEX_OP_ANDN:
> + __futex_atomic_op("and %w0, %w1, %w4",
> + ret, oldval, uaddr, tmp, ~oparg);
> + break;
> + case FUTEX_OP_XOR:
> + __futex_atomic_op("eor %w0, %w1, %w4",
> + ret, oldval, uaddr, tmp, oparg);
> + break;
> + default:
> + ret = -ENOSYS;
> + }
> +
> + pagefault_enable(); /* subsumes preempt_enable() */
> +
> + if (!ret) {
> + switch (cmp) {
> + case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break;
> + case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break;
> + case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break;
> + case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break;
> + case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break;
> + case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break;
> + default: ret = -ENOSYS;
> + }
> + }
> + return ret;
> +}
> +
> +static inline int
> +futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
> + u32 oldval, u32 newval)
> +{
> + int ret = 0;
> + u32 val, tmp;
> +
> + if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
> + return -EFAULT;
> +
> + asm volatile("// futex_atomic_cmpxchg_inatomic\n"
> +"1: ldaxr %w1, %2\n"
> +" sub %w3, %w1, %w4\n"
> +" cbnz %w3, 3f\n"
> +"2: stlxr %w3, %w5, %2\n"
> +" cbnz %w3, 1b\n"
> +"3: .pushsection __ex_table,\"a\"\n"
> +" .align 3\n"
> +" .quad 1b, 4f, 2b, 4f\n"
> +" .popsection\n"
> +" .pushsection .fixup,\"ax\"\n"

Same here w.r.t. exception table location and barrier.

> +"4: mov %w0, %w6\n"
> +" b 3b\n"
> +" .popsection"
> + : "+r" (ret), "=&r" (val), "+Q" (*uaddr), "=&r" (tmp)
> + : "r" (oldval), "r" (newval), "Ir" (-EFAULT)
> + : "cc", "memory");
> +
> + *uval = val;
> + return ret;
> +}
> +
> +#endif /* __KERNEL__ */
> +#endif /* __ASM_FUTEX_H */


-Olof

2012-08-15 00:33:58

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH v2 13/31] arm64: Device specific operations

On Tue, Aug 14, 2012 at 06:52:14PM +0100, Catalin Marinas wrote:
> This patch adds several definitions for device communication, including
> I/O accessors and ioremap(). The __raw_* accessors are implemented as
> inline asm to avoid compiler generation of post-indexed accesses (less
> efficient to emulate in a virtualised environment).
>
> Signed-off-by: Will Deacon <[email protected]>
> Signed-off-by: Catalin Marinas <[email protected]>
> ---
> arch/arm64/include/asm/device.h | 26 ++++
> arch/arm64/include/asm/fb.h | 34 +++++
> arch/arm64/include/asm/io.h | 263 +++++++++++++++++++++++++++++++++++++++
> arch/arm64/kernel/io.c | 64 ++++++++++
> arch/arm64/mm/ioremap.c | 84 +++++++++++++
> 5 files changed, 471 insertions(+), 0 deletions(-)
> create mode 100644 arch/arm64/include/asm/device.h
> create mode 100644 arch/arm64/include/asm/fb.h
> create mode 100644 arch/arm64/include/asm/io.h
> create mode 100644 arch/arm64/kernel/io.c
> create mode 100644 arch/arm64/mm/ioremap.c
>
> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> new file mode 100644
> index 0000000..48fa83f
> --- /dev/null
> +++ b/arch/arm64/include/asm/io.h

[...]

> +/*
> + * I/O port access primitives.
> + */
> +#define IO_SPACE_LIMIT 0xffff
> +
> +/*
> + * We currently don't have any platform with PCI support, so just leave this
> + * defined to 0 until needed.
> + */
> +#define PCI_IOBASE ((void __iomem *)0)

You could just leave out the PCI / I/O code altogether instead.

> diff --git a/arch/arm64/kernel/io.c b/arch/arm64/kernel/io.c
> new file mode 100644
> index 0000000..7d37ead
> --- /dev/null
> +++ b/arch/arm64/kernel/io.c
> @@ -0,0 +1,64 @@
> +/*
> + * Based on arch/arm/kernel/io.c
> + *
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/export.h>
> +#include <linux/types.h>
> +#include <linux/io.h>
> +
> +/*
> + * Copy data from IO memory space to "real" memory space.
> + */
> +void __memcpy_fromio(void *to, const volatile void __iomem *from, size_t count)
> +{
> + unsigned char *t = to;
> + while (count) {
> + count--;
> + *t = readb(from);
> + t++;
> + from++;
> + }
> +}
> +EXPORT_SYMBOL(__memcpy_fromio);
> +
> +/*
> + * Copy data from "real" memory space to IO memory space.
> + */
> +void __memcpy_toio(volatile void __iomem *to, const void *from, size_t count)
> +{
> + const unsigned char *f = from;
> + while (count) {
> + count--;
> + writeb(*f, to);
> + f++;
> + to++;
> + }
> +}
> +EXPORT_SYMBOL(__memcpy_toio);
> +
> +/*
> + * "memset" on IO memory space.
> + */
> +void __memset_io(volatile void __iomem *dst, int c, size_t count)
> +{
> + while (count) {
> + count--;
> + writeb(c, dst);
> + dst++;
> + }
> +}
> +EXPORT_SYMBOL(__memset_io);

Doing all of the above a byte at a time is horribly inefficient. Feel
free to borrow the implementations from arch/powerpc/kernel/io.c instead
of from ARM.
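
A rough 64-bit-at-a-time sketch (untested, and assuming the io.h in
this series grows a readq() accessor):

void __memcpy_fromio(void *to, const volatile void __iomem *from, size_t count)
{
	while (count && !IS_ALIGNED((unsigned long)from, 8)) {
		*(u8 *)to = readb(from);
		from++;
		to++;
		count--;
	}

	while (count >= 8) {
		*(u64 *)to = readq(from);
		from += 8;
		to += 8;
		count -= 8;
	}

	while (count) {
		*(u8 *)to = readb(from);
		from++;
		to++;
		count--;
	}
}

with __memcpy_toio and __memset_io following the same pattern.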


-Olof

2012-08-15 00:40:09

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH v2 14/31] arm64: DMA mapping API

Hi,


On Tue, Aug 14, 2012 at 06:52:15PM +0100, Catalin Marinas wrote:
> This patch adds support for the DMA mapping API. It uses dma_map_ops for
> flexibility and it currently supports swiotlb. This patch could be
> simplified further if the DMA accesses are coherent (not mandated by the
> architecture) or if corresponding hooks are placed in the generic
> swiotlb code to deal with cache maintenance.
>
> Signed-off-by: Catalin Marinas <[email protected]>
> ---
> arch/arm64/include/asm/dma-mapping.h | 124 ++++++++++++++++++++
> arch/arm64/mm/dma-mapping.c | 208 ++++++++++++++++++++++++++++++++++
> 2 files changed, 332 insertions(+), 0 deletions(-)
> create mode 100644 arch/arm64/include/asm/dma-mapping.h
> create mode 100644 arch/arm64/mm/dma-mapping.c
>
> diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
> new file mode 100644
> index 0000000..538f4b4
> --- /dev/null
> +++ b/arch/arm64/include/asm/dma-mapping.h
> @@ -0,0 +1,124 @@
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_DMA_MAPPING_H
> +#define __ASM_DMA_MAPPING_H
> +
> +#ifdef __KERNEL__
> +
> +#include <linux/types.h>
> +#include <linux/vmalloc.h>
> +
> +#include <asm-generic/dma-coherent.h>
> +
> +#define ARCH_HAS_DMA_GET_REQUIRED_MASK
> +
> +extern struct dma_map_ops *dma_ops;
>
> +static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> +{
> + if (unlikely(!dev) || !dev->archdata.dma_ops)
> + return dma_ops;
> + else
> + return dev->archdata.dma_ops;
> +}

Does it make sense to add the concept of global dma ops on arm64, instead
of requiring a dma ops pointer per device, similar to how some other
platforms (including powerpc) do it? For devices that lack
archdata.dma_ops, dma_supported() should return 0 (and the other ops
should return an error).
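
A rough sketch of what that could look like in asm/dma-mapping.h
(untested, purely illustrative):

static inline struct dma_map_ops *get_dma_ops(struct device *dev)
{
	/* no global fallback: DMA-capable devices get their ops set explicitly */
	return dev ? dev->archdata.dma_ops : NULL;
}

static inline int dma_supported(struct device *dev, u64 mask)
{
	struct dma_map_ops *ops = get_dma_ops(dev);

	if (!ops)
		return 0;
	return ops->dma_supported ? ops->dma_supported(dev, mask) : 1;
}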



-Olof

2012-08-15 00:54:25

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH v2 15/31] arm64: SMP support

Hi,

On Tue, Aug 14, 2012 at 06:52:16PM +0100, Catalin Marinas wrote:
> This patch adds SMP initialisation and spinlocks implementation for
> AArch64. The spinlock support uses the new load-acquire/store-release
> instructions to avoid explicit barriers. The architecture also specifies
> that an event is automatically generated when clearing the exclusive
> monitor state to wake up processors in WFE, so there is no need for an
> explicit DSB/SEV instruction sequence. The SEVL instruction is used to
> set the exclusive monitor locally as there is no conditional WFE and a
> branch is more expensive.
>
> For the SMP booting protocol, see Documentation/arm64/booting.txt.
>
> Signed-off-by: Will Deacon <[email protected]>
> Signed-off-by: Marc Zyngier <[email protected]>
> Signed-off-by: Catalin Marinas <[email protected]>
> ---

> diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
> new file mode 100644
> index 0000000..34a37fb
> --- /dev/null
> +++ b/arch/arm64/include/asm/spinlock.h
> @@ -0,0 +1,199 @@
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_SPINLOCK_H
> +#define __ASM_SPINLOCK_H
> +
> +#include <asm/spinlock_types.h>
> +#include <asm/processor.h>
> +
> +/*
> + * AArch64 Spin-locking.
> + *
> + * We exclusively read the old value. If it is zero, we may have
> + * won the lock, so we try exclusively storing it. A memory barrier
> + * is required after we get a lock, and before we release it, because
> + * V6 CPUs are assumed to have weakly ordered memory.

This comment should be updated to mention the implicit ordering provided by
the load-acquire/store-release instructions, and to remove the reference to
V6?

Also, ignore my earlier questions in another reply about the need for
barriers; they are obviously not needed given the load-acquire/store-release
semantics.



-Olof

2012-08-15 12:57:38

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 01/31] arm64: Assembly macros and definitions

On Tuesday 14 August 2012, Catalin Marinas wrote:
> This patch introduces several assembly macros and definitions used in
> the .S files across arch/arm64/ like IRQ disabling/enabling, together
> with asm-offsets.c.
>
> Signed-off-by: Will Deacon <[email protected]>
> Signed-off-by: Catalin Marinas <[email protected]>

Acked-by: Arnd Bergmann <[email protected]>

2012-08-15 13:03:52

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 03/31] arm64: Exception handling

On Tuesday 14 August 2012, Catalin Marinas wrote:

> +#ifdef CONFIG_AARCH32_EMULATION
> +#define compat_thumb_mode(regs) \
> + (((regs)->pstate & COMPAT_PSR_T_BIT))
> +#else
> +#define compat_thumb_mode(regs) (0)
> +#endif

The symbol we use on other platforms is CONFIG_COMPAT. I don't think you
need to have a separate CONFIG_AARCH32_EMULATION.

> +void __bad_xchg(volatile void *ptr, int size)
> +{
> + printk("xchg: bad data size: pc 0x%p, ptr 0x%p, size %d\n",
> + __builtin_return_address(0), ptr, size);
> + BUG();
> +}
> +EXPORT_SYMBOL(__bad_xchg);
> +

I think we're better off not defining this function. My guess is that
initially the idea on ARM was that it was meant as a BUILD_BUG_ON
replacement, but then someone added this function. And you copied it.

Microblaze has the same declaration but (correctly) omits the
definition, which produces a much more helpful link failure than
a run-time BUG(). Using BUILD_BUG_ON would be even better.
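
I.e. the default case of the switch in the __xchg() helper could simply
be (untested):

	default:
		BUILD_BUG();
	}

rather than calling an out-of-line __bad_xchg().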

Arnd

2012-08-15 13:04:20

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 15/31] arm64: SMP support

On Tuesday 14 August 2012, Catalin Marinas wrote:
> This patch adds SMP initialisation and spinlocks implementation for
> AArch64. The spinlock support uses the new load-acquire/store-release
> instructions to avoid explicit barriers. The architecture also specifies
> that an event is automatically generated when clearing the exclusive
> monitor state to wake up processors in WFE, so there is no need for an
> explicit DSB/SEV instruction sequence. The SEVL instruction is used to
> set the exclusive monitor locally as there is no conditional WFE and a
> branch is more expensive.
>
> For the SMP booting protocol, see Documentation/arm64/booting.txt.
>
> Signed-off-by: Will Deacon <[email protected]>
> Signed-off-by: Marc Zyngier <[email protected]>
> Signed-off-by: Catalin Marinas <[email protected]>

Acked-by: Arnd Bergmann <[email protected]>

2012-08-15 13:20:10

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

On Tuesday 14 August 2012, Catalin Marinas wrote:

> +The AArch64 exception model is made up of a number of exception levels
> +(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure
> +counterpart. EL2 is the hypervisor level and exists only in non-secure
> +mode. EL3 is the highest priority level and exists only in secure mode.

I'm always confused by a description like this. It sounds like you cannot
have a hypervisor if you have code running in secure mode in EL3. What
I instead understand is that you enter non-secure mode by going from
EL3 into EL2.

> +2. Setup the device tree
> +-------------------------
> +
> +Requirement: MANDATORY
> +
> +The device tree blob (dtb) must be no bigger than 2 megabytes in size
> +and placed at a 2-megabyte boundary within the first 512 megabytes from
> +the start of the kernel image. This is to allow the kernel to map the
> +blob using a single section mapping in the initial page tables.

I've seen people put firmware for some peripherals into the device tree,
so that a device driver can grab a blob from there and load it into the
device, rather than calling request_firmware() which would fail if the
OS running on the system does not contain the blob. If such firmware is
too large, you end up violating the 2 MB limit you impose here.

Should we keep that limit and declare those use cases as invalid, or
should we try to make the boot protocol more flexible?

> diff --git a/arch/arm64/include/asm/setup.h b/arch/arm64/include/asm/setup.h
> new file mode 100644
> index 0000000..d766493
> --- /dev/null
> +++ b/arch/arm64/include/asm/setup.h
> @@ -0,0 +1,26 @@
> +#ifndef __ASM_SETUP_H
> +#define __ASM_SETUP_H
> +
> +#include <linux/types.h>
> +
> +#define COMMAND_LINE_SIZE 1024
> +
> +#endif

Is this necessary? The asm-generic version of this file allows 512 bytes,
which seems plenty.

> +unsigned int processor_id;
> +EXPORT_SYMBOL(processor_id);
> +
> +unsigned int elf_hwcap __read_mostly;
> +EXPORT_SYMBOL(elf_hwcap);

EXPORT_SYMBOL_GPL?

Neither of these looks like it should be used by drivers.

Arnd

2012-08-15 13:30:08

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 04/31] arm64: MMU definitions

On Tuesday 14 August 2012, Catalin Marinas wrote:
>
> +/*
> + * TCR flags.
> + */
> +#define TCR_TxSZ(x) (((64 - (x)) << 16) | ((64 - (x)) << 0))
> +#define TCR_IRGN_NC ((0 << 8) | (0 << 24))
> +#define TCR_IRGN_WBWA ((1 << 8) | (1 << 24))
> +#define TCR_IRGN_WT ((2 << 8) | (2 << 24))
> +#define TCR_IRGN_WBnWA ((3 << 8) | (3 << 24))
> +#define TCR_IRGN_MASK ((3 << 8) | (3 << 24))
> +#define TCR_ORGN_NC ((0 << 10) | (0 << 26))
> +#define TCR_ORGN_WBWA ((1 << 10) | (1 << 26))
> +#define TCR_ORGN_WT ((2 << 10) | (2 << 26))
> +#define TCR_ORGN_WBnWA ((3 << 10) | (3 << 26))
> +#define TCR_ORGN_MASK ((3 << 10) | (3 << 26))
> +#define TCR_SHARED ((3 << 12) | (3 << 28))
> +#define TCR_TG0_64K (1 << 14)
> +#define TCR_TG1_64K (1 << 30)
> +#define TCR_IPS_40BIT (2 << 32)
> +#define TCR_ASID16 (1 << 36)
> +

As a matter of coding style, I would much prefer tables like this to be
written as

#define TCR_IRGN_MASK 0x0000000003000300
#define TCR_IRGN_WBnWA 0x0000000003000300
#define TCR_IRGN_WT 0x0000000002000200
#define TCR_IRGN_WBWA 0x0000000001000100
#define TCR_IRGN_NC 0x0000000000000000

#define TCR_ORGN_MASK 0x000000000c000c00
#define TCR_ORGN_WBnWA 0x000000000c000c00
#define TCR_ORGN_WT 0x0000000008000800
#define TCR_ORGN_WBWA 0x0000000004000400
#define TCR_ORGN_NC 0x0000000000000000

The advantage of this is that you can visually compare the bitmasks
to a hex dump, and if you are suffering from endian-confused documentation
authors, there is no ambiguity about which end of the word is bit zero.

Arnd

2012-08-15 13:39:59

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 04/31] arm64: MMU definitions

Hi Arnd,

On Wed, Aug 15, 2012 at 02:30:01PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
> > +/*
> > + * TCR flags.
> > + */
> > +#define TCR_TxSZ(x) (((64 - (x)) << 16) | ((64 - (x)) << 0))
> > +#define TCR_IRGN_NC ((0 << 8) | (0 << 24))
> > +#define TCR_IRGN_WBWA ((1 << 8) | (1 << 24))
> > +#define TCR_IRGN_WT ((2 << 8) | (2 << 24))
> > +#define TCR_IRGN_WBnWA ((3 << 8) | (3 << 24))
> > +#define TCR_IRGN_MASK ((3 << 8) | (3 << 24))
> > +#define TCR_ORGN_NC ((0 << 10) | (0 << 26))
> > +#define TCR_ORGN_WBWA ((1 << 10) | (1 << 26))
> > +#define TCR_ORGN_WT ((2 << 10) | (2 << 26))
> > +#define TCR_ORGN_WBnWA ((3 << 10) | (3 << 26))
> > +#define TCR_ORGN_MASK ((3 << 10) | (3 << 26))
> > +#define TCR_SHARED ((3 << 12) | (3 << 28))
> > +#define TCR_TG0_64K (1 << 14)
> > +#define TCR_TG1_64K (1 << 30)
> > +#define TCR_IPS_40BIT (2 << 32)
> > +#define TCR_ASID16 (1 << 36)
> > +
>
> As a matter of coding style, I would much prefer tables like this to be
> written as
>
> #define TCR_IRGN_MASK 0x0000000003000300
> #define TCR_IRGN_WBnWA 0x0000000003000300
> #define TCR_IRGN_WT 0x0000000002000200
> #define TCR_IRGN_WBWA 0x0000000001000100
> #define TCR_IRGN_NC 0x0000000000000000
>
> #define TCR_ORGN_MASK 0x000000000c000c00
> #define TCR_ORGN_WBnWA 0x000000000c000c00
> #define TCR_ORGN_WT 0x0000000008000800
> #define TCR_ORGN_WBWA 0x0000000004000400
> #define TCR_ORGN_NC 0x0000000000000000
>
> The advantage of this is that you can visually compare the bitmasks
> to a hex dump, and if you are suffering from endian-confused documentation
> authors, there is no ambiguity about which end of the word is bit zero.

That depends on the case; in some places it's more readable like this.
In the above case, I find it easier to compare against the documentation
which, for example, has groups of 2 bits at positions 8 and 24 or 10 and
26 (for TTBR0 and TTBR1). The meaning of a group of 2 bits is described
separately as 0b00 (NC), 0b01 (WBWA) etc. The same goes for the
shareability bits (12 and 28).

So I think, at least when writing the code, it's less error-prone to use
the explicit bit positions than a magic long hex constant.

--
Catalin

2012-08-15 13:45:18

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 05/31] arm64: MMU initialisation

On Tuesday 14 August 2012, Catalin Marinas wrote:
> This patch contains the initialisation of the memory blocks, MMU
> attributes and the memory map. Only five memory types are defined:
> Device nGnRnE (equivalent to Strongly Ordered), Device nGnRE (classic
> Device memory), Device GRE, Normal Non-cacheable and Normal Cacheable.
> Cache policies are supported via the memory attributes register
> (MAIR_EL1) and only affect the Normal Cacheable mappings.

It looks like you've managed to eliminate bootmem as I suggested earlier,
very nice!

Acked-by: Arnd Bergmann <[email protected]>

Arnd

2012-08-15 13:47:09

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 06/31] arm64: MMU fault handling and page table management

On Tuesday 14 August 2012, Catalin Marinas wrote:
> +
> +pgd_t *pgd_alloc(struct mm_struct *mm)
> +{
> + pgd_t *new_pgd;
> +
> + new_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD_ORDER);
> + if (!new_pgd)
> + return NULL;
> +
> + memset(new_pgd, 0, PAGE_SIZE << PGD_ORDER);
> +
> + return new_pgd;
> +}
> +
> +void pgd_free(struct mm_struct *mm, pgd_t *pgd)
> +{
> + free_pages((unsigned long)pgd, PGD_ORDER);
> +}

According to the documentation, you should only need 8kb for the pgd on
a 64kb page system. Is it required that you use up a full page here?

Arnd

2012-08-15 13:53:08

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 07/31] arm64: Process management

On Tuesday 14 August 2012, Catalin Marinas wrote:

> +#define THREAD_SIZE_ORDER 1
> +#define THREAD_SIZE 8192
> +#define THREAD_START_SP (THREAD_SIZE - 16)

THREAD_SIZE_ORDER looks wrong for 64kb-page kernels. It also doesn't seem to
be used, so better remove it.

Arnd

2012-08-15 13:56:14

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 08/31] arm64: CPU support

On Tuesday 14 August 2012, Catalin Marinas wrote:

> diff --git a/arch/arm64/include/asm/procinfo.h b/arch/arm64/include/asm/procinfo.h
> new file mode 100644
> index 0000000..81fece9
> --- /dev/null
> +++ b/arch/arm64/include/asm/procinfo.h
> @@ -0,0 +1,44 @@
> +/*
> + * Based on arch/arm/include/asm/procinfo.h
> + *
> + * Copyright (C) 1996-1999 Russell King
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_PROCINFO_H
> +#define __ASM_PROCINFO_H
> +
> +#ifdef __KERNEL__
> +
> +/*
> + * Note! struct processor is always defined if we're
> + * using MULTI_CPU, otherwise this entry is unused,
> + * but still exists.
> + *
> + * NOTE! The following structure is defined by assembly
> + * language, NOT C code. For more information, check:
> + * arch/arm/mm/proc-*.S and arch/arm/kernel/head.S
> + */
> +struct proc_info_list {
> + unsigned int cpu_val;
> + unsigned int cpu_mask;
> + unsigned long __cpu_flush; /* used by head.S */
> + const char *cpu_name;
> +};
> +
> +#else /* __KERNEL__ */
> +#include <asm/elf.h>
> +#warning "Please include asm/elf.h instead"
> +#endif /* __KERNEL__ */
> +#endif

I think you forgot to remove this file when you removed MULTI_CPU.

Arnd

2012-08-15 14:15:46

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 16/31] arm64: ELF definitions

On Tuesday 14 August 2012, Catalin Marinas wrote:
> +
> +void elf_set_personality(int personality)
> +{
> + switch (personality & PER_MASK) {
> + case PER_LINUX:
> + clear_thread_flag(TIF_32BIT);
> + break;
> + case PER_LINUX32:
> + set_thread_flag(TIF_32BIT);
> + break;
> + default:
> + pr_warning("Process %s tried to assume unknown personality %d\n",
> + current->comm, personality);
> + return;
> + }
> +
> + current->personality = personality;
> +}
> +EXPORT_SYMBOL(elf_set_personality);

This looks wrong: PER_LINUX/PER_LINUX32 determines the output of the
uname system call, while TIF_32BIT determines the instruction set
when returning to user space. You definitely should not set the personality
to the value passed in from the ELF loader. Instead, just do

#define SET_PERSONALITY(ex) clear_thread_flag(TIF_32BIT);
#define COMPAT_SET_PERSONALITY(ex) set_thread_flag(TIF_32BIT);

I also don't see a reason to export this. You'd have trouble loading
the elf interpreter module from user space without the elf interpreter.

Arnd

2012-08-15 14:22:21

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 17/31] arm64: System calls handling

On Tuesday 14 August 2012, Catalin Marinas wrote:

> +
> +/* This matches struct stat64 in glibc2.1, hence the absolutely
> + * insane amounts of padding around dev_t's.
> + * Note: The kernel zero's the padded region because glibc might read them
> + * in the hope that the kernel has stretched to using larger sizes.
> + */
> +struct stat64 {
> + compat_u64 st_dev;
> + unsigned char __pad0[4];
> +
> +#define STAT64_HAS_BROKEN_ST_INO 1
> + compat_ulong_t __st_ino;
> + compat_uint_t st_mode;
> + compat_uint_t st_nlink;
> +
> + compat_ulong_t st_uid;
> + compat_ulong_t st_gid;
> +
> + compat_u64 st_rdev;
> + unsigned char __pad3[4];
> +
> + compat_s64 st_size;
> + compat_ulong_t st_blksize;
> + compat_u64 st_blocks; /* Number 512-byte blocks allocated. */
> +
> + compat_ulong_t st_atime;
> + compat_ulong_t st_atime_nsec;
> +
> + compat_ulong_t st_mtime;
> + compat_ulong_t st_mtime_nsec;
> +
> + compat_ulong_t st_ctime;
> + compat_ulong_t st_ctime_nsec;
> +
> + compat_u64 st_ino;
> +};

The comment above struct stat64 is completely irrelevant here. I would instead
explain why you need your own stat64 in the first place.

> +int kernel_execve(const char *filename,
> + const char *const argv[],
> + const char *const envp[])
> +{
> + struct pt_regs regs;
> + int ret;
> +
> + memset(&regs, 0, sizeof(struct pt_regs));
> + ret = do_execve(filename,
> + (const char __user *const __user *)argv,
> + (const char __user *const __user *)envp, &regs);
> + if (ret < 0)
> + goto out;
> +
> + /*
> + * Save argc to the register structure for userspace.
> + */
> + regs.regs[0] = ret;
> +
> + /*
> + * We were successful. We won't be returning to our caller, but
> + * instead to user space by manipulating the kernel stack.
> + */
> + asm( "add x0, %0, %1\n\t"
> + "mov x1, %2\n\t"
> + "mov x2, %3\n\t"
> + "bl memmove\n\t" /* copy regs to top of stack */
> + "mov x27, #0\n\t" /* not a syscall */
> + "mov x28, %0\n\t" /* thread structure */
> + "mov sp, x0\n\t" /* reposition stack pointer */
> + "b ret_to_user"
> + :
> + : "r" (current_thread_info()),
> + "Ir" (THREAD_START_SP - sizeof(regs)),
> + "r" (&regs),
> + "Ir" (sizeof(regs))
> + : "x0", "x1", "x2", "x27", "x28", "x30", "memory");
> +
> + out:
> + return ret;
> +}
> +EXPORT_SYMBOL(kernel_execve);

Al Viro was recently talking about a generic implementation of execve.
I can't find that now, but I think you should use that.

> +
> +asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
> + unsigned long prot, unsigned long flags,
> + unsigned long fd, off_t off)
> +{
> + if (offset_in_page(off) != 0)
> + return -EINVAL;
> +
> + return sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
> +}
> +
> +/*
> + * Wrappers to pass the pt_regs argument.
> + */
> +#define sys_execve sys_execve_wrapper
> +#define sys_clone sys_clone_wrapper
> +#define sys_rt_sigreturn sys_rt_sigreturn_wrapper
> +#define sys_sigaltstack sys_sigaltstack_wrapper

I think

#define sys_mmap sys_mmap_pgoff

would be more appropriate than defining your own sys_mmap function here.
We should probably make that the default in asm-generic/unistd.h and
change the architectures that have their own implementation to override
it.

Arnd

2012-08-15 14:34:17

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 21/31] arm64: 32-bit (compat) applications support

On Tuesday 14 August 2012, Catalin Marinas wrote:

> +#ifdef CONFIG_AARCH32_EMULATION
> +#include <linux/compat.h>
> +
> +#define AARCH32_KERN_SIGRET_CODE_OFFSET 0x500
> +
> +extern const compat_ulong_t aarch32_sigret_code[6];
> +
> +int compat_setup_frame(int usig, struct k_sigaction *ka, sigset_t *set,
> + struct pt_regs *regs);
> +int compat_setup_rt_frame(int usig, struct k_sigaction *ka, siginfo_t *info,
> + sigset_t *set, struct pt_regs *regs);
> +
> +void compat_setup_restart_syscall(struct pt_regs *regs);
> +#else
> +
> +static inline int compat_setup_frame(int usid, struct k_sigaction *ka,
> + sigset_t *set, struct pt_regs *regs)
> +{
> + BUG();
> +}

What good is the run-time BUG() here? Nothing should be calling these
when CONFIG_COMPAT is disabled, so I think you should just remove
the #ifdef around the declarations, and the entire #else case.


> +asmlinkage int compat_sys_sched_rr_get_interval(compat_pid_t pid,
> + struct compat_timespec __user *interval)
> +{
> + struct timespec t;
> + int ret;
> + mm_segment_t old_fs = get_fs();
> +
> + set_fs(KERNEL_DS);
> + ret = sys_sched_rr_get_interval(pid, (struct timespec __user *)&t);
> + set_fs(old_fs);
> + if (put_compat_timespec(&t, interval))
> + return -EFAULT;
> + return ret;
> +}
> +
> +asmlinkage int compat_sys_sendfile(int out_fd, int in_fd,
> + compat_off_t __user *offset, s32 count)
> +{
> + mm_segment_t old_fs = get_fs();
> + int ret;
> + off_t of;
> +
> + if (offset && get_user(of, offset))
> + return -EFAULT;
> +
> + set_fs(KERNEL_DS);
> + ret = sys_sendfile(out_fd, in_fd, offset ? (off_t __user *)&of : NULL,
> + count);
> + set_fs(old_fs);
> +
> + if (offset && put_user(of, offset))
> + return -EFAULT;
> + return ret;
> +}

I guess it's time to move these two into common code. They look like they should
be shared across most architectures that have compat support.

> +asmlinkage int compat_sys_personality(compat_ulong_t personality)
> +{
> + int ret;
> +
> + if (personality(current->personality) == PER_LINUX32 &&
> + personality == PER_LINUX)
> + personality = PER_LINUX32;
> + ret = sys_personality(personality);
> + if (ret == PER_LINUX32)
> + ret = PER_LINUX;
> + return ret;
> +}

Where did you get this from?

You should not need compat_sys_personality; just call the native function.

Arnd

2012-08-15 14:35:41

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 22/31] arm64: Floating point and SIMD

On Tuesday 14 August 2012, Catalin Marinas wrote:
> This patch adds support for FP/ASIMD register bank saving and restoring
> during context switch and FP exception handling to generate SIGFPE.
> There are 32 128-bit registers and the context switching is currently
> done non-lazily. Benchmarks on real hardware are required before
> implementing lazy FP state saving/restoring.
>
> Signed-off-by: Will Deacon <[email protected]>
> Signed-off-by: Catalin Marinas <[email protected]>

Acked-by: Arnd Bergmann <[email protected]>

2012-08-15 14:50:07

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 20/31] arm64: User access library function

On Tuesday 14 August 2012, Catalin Marinas wrote:

> +/*
> + * Single-value transfer routines. They automatically use the right
> + * size if we just have the right pointer type. Note that the functions
> + * which read from user space (*get_*) need to take care not to leak
> + * kernel data even if the calling code is buggy and fails to check
> + * the return value. This means zeroing out the destination variable
> + * or buffer on error. Normally this is done out of line by the
> + * fixup code, but there are a few places where it intrudes on the
> + * main code path. When we only write to user space, there is no
> + * problem.
> + */
> +extern long __get_user_1(void *);
> +extern long __get_user_2(void *);
> +extern long __get_user_4(void *);
> +extern long __get_user_8(void *);
> +
> +#define __get_user_x(__r2,__p,__e,__s,__i...) \
> + asm volatile( \
> + __asmeq("%0", "x0") __asmeq("%1", "x2") \
> + "bl __get_user_" #__s \
> + : "=&r" (__e), "=r" (__r2) \
> + : "0" (__p) \
> + : __i, "cc")
> +
> +#define get_user(x,p) \
> + ({ \
> + register const typeof(*(p)) __user *__p asm("x0") = (p);\
> + register unsigned long __r2 asm("x2"); \
> + register long __e asm("x0"); \
> + switch (sizeof(*(__p))) { \
> + case 1: \
> + __get_user_x(__r2, __p, __e, 1, "x30"); \
> + break; \
> + case 2: \
> + __get_user_x(__r2, __p, __e, 2, "x3", "x30"); \
> + break; \
> + case 4: \
> + __get_user_x(__r2, __p, __e, 4, "x30"); \
> + break; \
> + case 8: \
> + __get_user_x(__r2, __p, __e, 8, "x30"); \
> + break; \
> + default: __e = __get_user_bad(); break; \
> + } \
> + x = (typeof(*(p))) __r2; \
> + __e; \
> + })

It's fairly unusual to have out-of-line get_user/put_user functions.
What is the reason for this, other than copying from ARM?

> +
> +__get_user_bad:
> + mov x2, #0
> + mov x0, #-EFAULT
> + ret
> +ENDPROC(__get_user_bad)

> +__put_user_bad:
> + mov x0, #-EFAULT
> + ret
> +ENDPROC(__put_user_bad)
> +

The purpose of these symbols is to provoke a link error when you
pass the wrong data into get_user/put_user. Actually defining them
completely breaks this logic, so you should remove these!

Arnd

2012-08-15 15:07:47

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 23/31] arm64: Debugging support

On Tuesday 14 August 2012, Catalin Marinas wrote:

> +const struct user_regset_view *task_user_regset_view(struct task_struct *task)
> +{
> +#ifdef CONFIG_AARCH32_EMULATION
> + if (test_tsk_thread_flag(task, TIF_32BIT))
> + return &user_aarch32_view;
> +#endif
> + return &user_aarch64_view;
> +}

Ah, nice. So you support 64 bit debuggers debugging 32 bit processes, right?

From what I can tell, there is no support for 32 bit processes debugging
64 bit ones. Is that something you plan to add in the future, or do you
consider that out of scope? In either case, a comment would be helpful.

> +long arch_ptrace(struct task_struct *child, long request,
> + unsigned long addr, unsigned long data)
> +{
> + int ret;
> + unsigned long *datap = (unsigned long __user *)data;
> +
> + switch (request) {
> + case PTRACE_GET_THREAD_AREA:
> + ret = put_user(child->thread.tp_value, datap);
> + break;
> +
> +#ifdef CONFIG_HAVE_HW_BREAKPOINT
> + case PTRACE_GETHBPREGS:
> + ret = ptrace_gethbpregs(child, addr, datap);
> + break;
> +
> + case PTRACE_SETHBPREGS:
> + ret = ptrace_sethbpregs(child, addr, datap);
> + break;
> +#endif
> +
> + default:
> + ret = ptrace_request(child, request, addr, data);
> + break;
> + }
> +
> + return ret;
> +}

Is there a reason why these are not regsets but have their own ptrace
commands? I believe new architectures should generally not add ptrace
commands any more.

Arnd

2012-08-15 15:09:03

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 24/31] arm64: Add support for /proc/sys/debug/exception-trace

On Tuesday 14 August 2012, Catalin Marinas wrote:
>
> This patch allows setting of the show_unhandled_signals variable via
> /proc/sys/debug/exception-trace. The default value is currently 1
> showing unhandled user faults (undefined instructions, data aborts) and
> invalid signal stack frames.
>
> Signed-off-by: Catalin Marinas <[email protected]>

Acked-by: Arnd Bergmann <[email protected]>

2012-08-15 15:11:15

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 25/31] arm64: Performance counters support

On Tuesday 14 August 2012, Catalin Marinas wrote:
> From: Will Deacon <[email protected]>
>
> This patch adds support for the AArch64 performance counters.
>
> Signed-off-by: Will Deacon <[email protected]>
> Signed-off-by: Catalin Marinas <[email protected]>
> ---
> arch/arm64/include/asm/perf_event.h | 22 +
> arch/arm64/include/asm/pmu.h | 82 +++
> arch/arm64/kernel/perf_event.c | 1368 +++++++++++++++++++++++++++++++++++
> tools/perf/perf.h | 6 +

Can you explain how AArch64 performance counters differ from the 32
bit ones? Do they work for AArch32 user space under AArch64 kernels?
Is it possible to share parts of the implementation with arch/arm?

Arnd

2012-08-15 15:21:27

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 26/31] arm64: Miscellaneous library functions

On Tuesday 14 August 2012, Catalin Marinas wrote:

> +
> +/*
> + * Use compiler builtins for simple inline operations.
> + */
> +static inline unsigned long __ffs(unsigned long word)
> +{
> + return __builtin_ffsl(word) - 1;
> +}
> +
> +static inline int ffs(int x)
> +{
> + return __builtin_ffs(x);
> +}
> +
> +static inline unsigned long __fls(unsigned long word)
> +{
> + return BITS_PER_LONG - 1 - __builtin_clzl(word);
> +}
> +
> +static inline int fls(int x)
> +{
> + return x ? sizeof(x) * BITS_PER_BYTE - __builtin_clz(x) : 0;
> +}

These are all great, but I think whether to use them or not should
depend on the compiler version rather than the architecture in
general. Do we know a minimum gcc version that supports all of the
above? Then we could put that code into the generic files.

If that's not possible, we could still make the implementation
available for other architectures by moving it to

asm-generic/bitops/builtin-__ffs.h
asm-generic/bitops/builtin-ffs.h
asm-generic/bitops/builtin-__fls.h
asm-generic/bitops/builtin-fls.h
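
E.g. asm-generic/bitops/builtin-__ffs.h would then be little more than
(sketch, guard name guessed):

#ifndef _ASM_GENERIC_BITOPS_BUILTIN___FFS_H_
#define _ASM_GENERIC_BITOPS_BUILTIN___FFS_H_

static inline unsigned long __ffs(unsigned long word)
{
	return __builtin_ffsl(word) - 1;
}

#endif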

> --- /dev/null
> +++ b/arch/arm64/lib/bitops.c
> @@ -0,0 +1,25 @@
> +/*
> + * Copyright (C) 2012 ARM Limited
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/spinlock.h>
> +#include <linux/atomic.h>
> +
> +#ifdef CONFIG_SMP
> +arch_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned = {
> + [0 ... (ATOMIC_HASH_SIZE-1)] = __ARCH_SPIN_LOCK_UNLOCKED
> +};
> +#endif

What?

I suppose this is a leftover from an earlier version using the
generic bitops, right?

Arnd

2012-08-15 15:23:28

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 27/31] arm64: Loadable modules

On Tuesday 14 August 2012, Catalin Marinas wrote:
> +
> +void *module_alloc(unsigned long size)
> +{
> + return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> + GFP_KERNEL, PAGE_KERNEL_EXEC, -1,
> + __builtin_return_address(0));
> +}
> +

What is the reason for using a separate virtual address range for the
modules instead of falling back to the default module_alloc function
that uses vmalloc_exec()?

Arnd

2012-08-15 15:36:17

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 27/31] arm64: Loadable modules

On Wed, Aug 15, 2012 at 04:23:21PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
> > +
> > +void *module_alloc(unsigned long size)
> > +{
> > + return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
> > + GFP_KERNEL, PAGE_KERNEL_EXEC, -1,
> > + __builtin_return_address(0));
> > +}
> > +
>
> What is the reason for using a separate virtual address range for the
> modules instead of falling back to the default module_alloc function
> that uses vmalloc_exec()?

Primarily branch relocation: we are limited to a 128MB branch range.
The alternative would be to always compile the modules with a large
memory model, but we may lose some performance and it could make the
relocation handling even harder. What we do now is pretty much similar
to static linking, but done at module load time.

--
Catalin

2012-08-15 15:52:51

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 28/31] arm64: Generic timers support

On Tuesday 14 August 2012, Catalin Marinas wrote:
> +static void arch_timer_reg_write(int reg, u32 val)
> +{
> + switch (reg) {
> + case ARCH_TIMER_REG_CTRL:
> + asm volatile("msr cntp_ctl_el0, %0" : : "r" (val));
> + break;
> + case ARCH_TIMER_REG_TVAL:
> + asm volatile("msr cntp_tval_el0, %0" : : "r" (val));
> + break;
> + default:
> + BUG();
> + }
> +
> + isb();
> +}
> +
> +static u32 arch_timer_reg_read(int reg)
> +{
> + u32 val;
> +
> + switch (reg) {
> + case ARCH_TIMER_REG_CTRL:
> + asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
> + break;
> + case ARCH_TIMER_REG_FREQ:
> + asm volatile("mrs %0, cntfrq_el0" : "=r" (val));
> + break;
> + case ARCH_TIMER_REG_TVAL:
> + asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
> + break;
> + default:
> + BUG();
> + }
> +
> + return val;
> +}

Are the inline assemblies the only things in this driver that are
specific to AArch64?
Are you planning to use the same file for 32 bit ARM as well, e.g.
when running a 32 bit guest kernel on a 64 bit host?

Arnd

2012-08-15 15:56:53

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 29/31] arm64: Miscellaneous header files

On Tuesday 14 August 2012, Catalin Marinas wrote:

> diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
> new file mode 100644
> index 0000000..dc50de7
> --- /dev/null
> +++ b/arch/arm64/include/asm/cmpxchg.h

> + default:
> + __bad_cmpxchg(ptr, size);
> + oldval = 0;
> + }

I did not see a definition for __bad_cmpxchg but I may have missed that.
Please make sure that none exists, or just use BUILD_BUG_ON().

Arnd

2012-08-15 15:57:19

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 31/31] arm64: MAINTAINERS update

On Tuesday 14 August 2012, Catalin Marinas wrote:
>
> This patch updates the MAINTAINERS file for the AArch64 Linux kernel
> port.
>
> Signed-off-by: Catalin Marinas <[email protected]>


Acked-by: Arnd Bergmann <[email protected]>

2012-08-15 16:08:04

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 30/31] arm64: Build infrastructure

On Tuesday 14 August 2012, Catalin Marinas wrote:

> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> new file mode 100644
> index 0000000..1ce3d04
> --- /dev/null
> +++ b/arch/arm64/Kconfig
> @@ -0,0 +1,261 @@
> +config ARM64
> + def_bool y
> + select OF
> + select OF_EARLY_FLATTREE
> + select IRQ_DOMAIN
> + select HAVE_AOUT
> + select HAVE_DMA_ATTRS
> + select HAVE_DMA_API_DEBUG
> + select HAVE_IDE

Please remove HAVE_AOUT and HAVE_IDE

> + select HAVE_MEMBLOCK
> + select RTC_LIB
> + select SYS_SUPPORTS_APM_EMULATION

APM_EMULATION can probably go too

> +
> +config ARCH_PHYS_ADDR_T_64BIT
> + def_bool y
> +
> +config HAVE_PWM
> + bool

HAVE_PWM is going away soon.

> +config AARCH32_EMULATION
> + bool "Kernel support for 32-bit EL0"
> + depends on !ARM64_64K_PAGES
> + select COMPAT_BINFMT_ELF
> + help
> + This option enables support for a 32-bit EL0 running under a 64-bit
> + kernel at EL1. AArch32-specific components such as system calls,
> + the user helper functions, VFP support and the ptrace interface are
> + handled appropriately by the kernel.
> +
> + If you want to execute 32-bit userspace applications, say Y.
> +
> +config COMPAT
> + def_bool y
> + depends on AARCH32_EMULATION

As mentioned, you can just merge the two into CONFIG_COMPAT.

> +targets := Image Image.gz
> +
> +$(obj)/Image: vmlinux FORCE
> + $(call if_changed,objcopy)
> + @echo ' Kernel: $@ is ready'
> +
> +$(obj)/Image.gz: $(obj)/Image FORCE
> + $(call if_changed,gzip)
> + @echo ' Kernel: $@ is ready'

Drop the useless output, at least when building with make -s.

> +if [ -x /sbin/loadmap ]; then
> + /sbin/loadmap
> +else
> + echo "You have to install it yourself"
> +fi

What is loadmap?

> diff --git a/arch/arm64/configs/generic_defconfig b/arch/arm64/configs/generic_defconfig
> new file mode 100644
> index 0000000..d9aac95
> --- /dev/null
> +++ b/arch/arm64/configs/generic_defconfig

I think it can just be called "defconfig".

> diff --git a/arch/arm64/mm/Kconfig b/arch/arm64/mm/Kconfig
> new file mode 100644
> index 0000000..8e94e52
> --- /dev/null
> +++ b/arch/arm64/mm/Kconfig
> @@ -0,0 +1,5 @@
> +config MMU
> + def_bool y
> +
> +config CPU_64
> + def_bool y

This file can be dropped. You can unconditionally enable CONFIG_MMU,
and the CPU_64 symbol is pointless; just use CONFIG_64BIT.

Arnd

2012-08-15 16:14:06

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 13/31] arm64: Device specific operations

On Tuesday 14 August 2012, Catalin Marinas wrote:
>
> This patch adds several definitions for device communication, including
> I/O accessors and ioremap(). The __raw_* accessors are implemented as
> inline asm to avoid compiler generation of post-indexed accesses (less
> efficient to emulate in a virtualised environment).
>
> Signed-off-by: Will Deacon <[email protected]>
> Signed-off-by: Catalin Marinas <[email protected]>


Acked-by: Arnd Bergmann <[email protected]>

2012-08-15 16:16:08

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 14/31] arm64: DMA mapping API

On Tuesday 14 August 2012, Catalin Marinas wrote:
> +static struct dma_map_ops arm64_swiotlb_dma_ops = {
> + .alloc = arm64_swiotlb_alloc_coherent,
> + .free = arm64_swiotlb_free_coherent,
> + .map_page = arm64_swiotlb_map_page,
> + .unmap_page = arm64_swiotlb_unmap_page,
> + .map_sg = arm64_swiotlb_map_sg_attrs,
> + .unmap_sg = arm64_swiotlb_unmap_sg_attrs,
> + .sync_single_for_cpu = arm64_swiotlb_sync_single_for_cpu,
> + .sync_single_for_device = arm64_swiotlb_sync_single_for_device,
> + .sync_sg_for_cpu = arm64_swiotlb_sync_sg_for_cpu,
> + .sync_sg_for_device = arm64_swiotlb_sync_sg_for_device,
> + .dma_supported = swiotlb_dma_supported,
> + .mapping_error = swiotlb_dma_mapping_error,
> +};
> +
> +void __init swiotlb_init_with_default_size(size_t default_size, int verbose);
> +
> +void __init arm64_swiotlb_init(size_t max_size)
> +{
> + dma_ops = &arm64_swiotlb_dma_ops;
> + swiotlb_init_with_default_size(min((size_t)SZ_64M, max_size), 1);
> +}

Why is swiotlb the default? I would expect that most devices can in fact
use the entire 64 bit address space, so you can use a simple linear
implementation for those.

Arnd

2012-08-15 16:17:05

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 27/31] arm64: Loadable modules

On Wednesday 15 August 2012, Catalin Marinas wrote:
> Primarily branch relocation, we have a limitation to 128MB branch range.
> The alternative would be to always compile the modules with a large
> memory model but we may lose some performance and could make the
> relocation handling even harder. What we do now is pretty much similar
> to static linking but at module load time.

Ok, makes sense.

Arnd

2012-08-15 16:34:49

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v2 04/31] arm64: MMU definitions

On Wed, Aug 15, 2012 at 3:30 PM, Arnd Bergmann <[email protected]> wrote:
>> +#define TCR_IPS_40BIT (2 << 32)

By default, constants are int, i.e. 32-bit. So you must write

2ULL << 32

>> +#define TCR_ASID16 (1 << 36)

1ULL

> As a matter of coding style, I would much prefer tables like this to be
> written as
>
> #define TCR_IRGN_MASK 0x0000000003000300

0x0000000003000300ULL, to be safe
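
A standalone illustration of the promotion pitfall Geert describes (not from
the patch; any C compiler will do, and the "bad" macro is only shown, never
evaluated):

#include <stdio.h>

/* Without a suffix, 2 is a plain 32-bit int, so shifting it by 32 is
 * undefined behaviour and certainly not the 64-bit value intended. */
#define TCR_IPS_40BIT_BAD	(2 << 32)	/* undefined: int shifted by 32 */
#define TCR_IPS_40BIT_GOOD	(2ULL << 32)	/* 0x200000000, as intended */

int main(void)
{
	printf("good = %#llx\n", (unsigned long long)TCR_IPS_40BIT_GOOD);
	return 0;
}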

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2012-08-15 16:46:07

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 04/31] arm64: MMU definitions

On Wed, Aug 15, 2012 at 05:34:46PM +0100, Geert Uytterhoeven wrote:
> On Wed, Aug 15, 2012 at 3:30 PM, Arnd Bergmann <[email protected]> wrote:
> >> +#define TCR_IPS_40BIT (2 << 32)
>
> By default, constants are int, i.e. 32-bit. So you must write
>
> 2ULL << 32
>
> >> +#define TCR_ASID16 (1 << 36)
>
> 1ULL

Those higher constants are only used in assembly currently, so no
side-effects. But I agree that I should use something like:

(_AC(1, UL) << 36)

(UL is sufficient on a 64-bit system)
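
For reference, _AC() comes from <linux/const.h> and looks roughly like this
(illustrative; it exists so the same header can be included from both C and
assembly):

/* include/linux/const.h (roughly) */
#ifdef __ASSEMBLY__
#define _AC(X, Y)	X		/* assembler: drop the C suffix */
#else
#define __AC(X, Y)	(X##Y)
#define _AC(X, Y)	__AC(X, Y)	/* C: paste the suffix onto the constant */
#endif

/* usable from both worlds: */
#define TCR_ASID16	(_AC(1, UL) << 36)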

Thanks.

--
Catalin

2012-08-15 17:06:16

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

On Wed, Aug 15, 2012 at 01:20:02PM +0000, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
>
> > +The AArch64 exception model is made up of a number of exception levels
> > +(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure
> > +counterpart. EL2 is the hypervisor level and exists only in non-secure
> > +mode. EL3 is the highest priority level and exists only in secure mode.
>
> I'm always confused by a description like this. It sounds like you cannot
> have a hypervisor if you have code running in secure mode in EL3. What
> I instead understand is that you enter non-secure mode by going from
> EL3 into EL2.
>
> > +2. Setup the device tree
> > +-------------------------
> > +
> > +Requirement: MANDATORY
> > +
> > +The device tree blob (dtb) must be no bigger than 2 megabytes in size
> > +and placed at a 2-megabyte boundary within the first 512 megabytes from
> > +the start of the kernel image. This is to allow the kernel to map the
> > +blob using a single section mapping in the initial page tables.
>
> I've seen people put firmware for some peripherals into the device tree,
> so that a device driver can grab a blob from there and load it into the
> device, rather than calling request_firmware() which would fail if the
> OS running on the system does not contain the blob. If such firmware is
> too large, you end up violating the 2 MB limit you impose here.
>
> Should we keep that limit and declare those use cases as invalid, or
> should we try to make the boot protocol more flexible?
>
> > diff --git a/arch/arm64/include/asm/setup.h b/arch/arm64/include/asm/setup.h
> > new file mode 100644
> > index 0000000..d766493
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/setup.h
> > @@ -0,0 +1,26 @@
> > +#ifndef __ASM_SETUP_H
> > +#define __ASM_SETUP_H
> > +
> > +#include <linux/types.h>
> > +
> > +#define COMMAND_LINE_SIZE 1024
> > +
> > +#endif
>
> Is this necessary? The asm-generic version of this file allows 512 bytes,
> which seems plenty.

Chrome OS on my system today uses a 553-character cmdline, in particular
because some of the device-mapper arguments are in there (since we boot
without a ramdisk). It adds up quickly.

I suggest keeping it common with x86 since those limits are what people
will be used to (2048).


-Olof

2012-08-15 17:37:46

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

Hi Olof,

On Wed, Aug 15, 2012 at 12:06:45AM +0100, Olof Johansson wrote:
> On Tue, Aug 14, 2012 at 06:52:03PM +0100, Catalin Marinas wrote:
> > +Before jumping into the kernel, the following conditions must be met:
> > +
> > +- Quiesce all DMA capable devices so that memory does not get
> > + corrupted by bogus network packets or disk data. This will save
> > + you many hours of debug.
> > +
> > +- Primary CPU general-purpose register settings
> > + x0 = physical address of device tree blob (dtb) in system RAM.
> > +
> > +- CPU mode
> > + All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError,
> > + IRQ and FIQ).
> > + The CPU must be in either EL2 (RECOMMENDED in order to have access to
> > + the virtualisation extensions) or non-secure EL1.
> > +
> > +- Caches, MMUs
> > + The MMU must be off.
> > + Instruction cache may be on or off.
> > + Data cache must be off and invalidated.
> > +
> > +- Architected timers
> > + CNTFRQ must be programmed with the timer frequency.
> > + If entering the kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0)
> > + set where available.
> > +
> > +- Coherency
> > + All CPUs to be booted by the kernel must be part of the same coherency
> > + domain on entry to the kernel. This may require IMPLEMENTATION DEFINED
> > + initialisation to enable the receiving of maintenance operations on
> > + each CPU.
> > +
> > +- System registers
> > + All writable architected system registers at the exception level where
> > + the kernel image will be entered must be initialised by software at a
> > + higher exception level to prevent execution in an UNKNOWN state.
>
> Given the recent development of ARM platforms, you might want to mandate
> the state of IOMMUs as well (they should probably be off, since there
> should be no active DMA activity). Graphics would be the exception to
> this, since if you want to keep scanning out a splash screen, you'll
> have to keep doing DMA...

We'll enhance this document as we get hardware, since it's not clear whether
we can simply mandate it to be off. We may have situations with some
simple IOMMU that is previously set up by the firmware and the kernel
doesn't get access to it. One example is the System MMU from ARM that
supports stage 2 (hypervisor) translations and you just run a guest
kernel without any control of the IOMMU.

> > +- The primary CPU must jump directly to the first instruction of the
> > + kernel image. The device tree blob passed by this CPU must contain
> > + for each CPU node:
> > +
> > + 1. An 'enable-method' property. Currently, the only supported value
> > + for this field is the string "spin-table".
> > +
> > + 2. A 'cpu-release-addr' property identifying a 64-bit,
> > + zero-initialised memory location.
>
> These would be good to have documented in the
> Documentation/devicetree/bindings hierarchy as well.

OK.

> > index 0000000..d766493
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/setup.h
> > @@ -0,0 +1,26 @@
> > +/*
> > + * Based on arch/arm/include/asm/setup.h
> > + *
> > + * Copyright (C) 1997-1999 Russell King
> > + * Copyright (C) 2012 ARM Ltd.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> > + */
> > +#ifndef __ASM_SETUP_H
> > +#define __ASM_SETUP_H
> > +
> > +#include <linux/types.h>
> > +
> > +#define COMMAND_LINE_SIZE 1024
>
> Probably not a huge deal, and other architectures seem to be all over
> the map on this, but you might want to go with a larger value now rather
> than later. 2048 or 4096 perhaps?

It looks like there are many different values, including the asm-generic
one which is 512. I'm happy to follow the x86 example and change it to
2048, it doesn't really matter.

> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > new file mode 100644
> > index 0000000..34ccdc0
> > --- /dev/null
> > +++ b/arch/arm64/kernel/head.S
>
> [...]
>
> > +/*
> > + * Setup common bits before finally enabling the MMU. Essentially this is just
> > + * loading the page table pointer and vector base registers.
> > + *
> > + * On entry to this code, x0 must contain the SCTLR_EL1 value for turning on
> > + * the MMU.
> > + */
> > +__enable_mmu:
>
> ENTRY()?

__enable_mmu is not used outside this file, so no need for ENTRY().

> > + ldr x5, =vectors
> > + msr vbar_el1, x5
> > + msr ttbr0_el1, x25 // load TTBR0
> > + msr ttbr1_el1, x26 // load TTBR1
> > + isb
> > + b __turn_mmu_on
> > +ENDPROC(__enable_mmu)
>
> ...or just END()? Same for a few of the other functions below.

ENDPROC() gives us ".type @function" in addition to END(). This proved
to be useful in the past for debugging symbols and the unwind table (though
we don't have the latter on AArch64).
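
For reference, the two macros expand to roughly the following (sketch based on
include/linux/linkage.h; the type directive is spelled @function or %function
depending on the assembler):

/* include/linux/linkage.h (roughly) */
#define END(name) \
	.size name, .-name

#define ENDPROC(name) \
	.type name, @function; \
	END(name)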

> > diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> > new file mode 100644
> > index 0000000..f25186f
> > --- /dev/null
> > +++ b/arch/arm64/kernel/setup.c
>
> [...]
>
> > +static void __init setup_processor(void)
> > +{
> > + struct proc_info_list *list;
> > +
> > + /*
> > + * locate processor in the list of supported processor
> > + * types. The linker builds this table for us from the
> > + * entries in arch/arm/mm/proc.S
> > + */
>
> Probably from arch/arm64/... somewhere?

Yes, I did a grep and found a few more.

> > + printk("CPU: %s [%08x] revision %d\n",
> > + cpu_name, read_cpuid_id(), read_cpuid_id() & 15);
> > +
> > + sprintf(init_utsname()->machine, "aarch64");
>
> > + initial_boot_params = devtree;
> > + dt_root = of_get_flat_dt_root();
> > +
> > + machine_name = of_get_flat_dt_prop(dt_root, "model", NULL);
> > + if (!machine_name)
> > + machine_name = of_get_flat_dt_prop(dt_root, "compatible", NULL);
> > + if (!machine_name)
> > + machine_name = "<unknown>";
> > + pr_info("Machine: %s\n", machine_name);
>
> This property is an array of strings. It would be more valuable to print out
> the entry that was matched for a platform instead of the provided one from the
> device tree.

If we add machine_desc structure back, we could print which machine was
matched. But so far I try to keep the SoC code to a minimum and just do
the probing later in the SoC code (of_find_matching_node). Ideally we
shouldn't have any SoC code and just keep code in drivers but we'll see
how far we can get. We can discuss more details at the KS as I would
like the arm-soc team to get involved here.

Thanks.

--
Catalin

2012-08-15 19:03:29

by Olof Johansson

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

Hi,

On Wed, Aug 15, 2012 at 06:37:11PM +0100, Catalin Marinas wrote:
> Hi Olof,
>
> > Given the recent development of ARM platforms, you might want to mandate
> > the state of IOMMUs as well (they should probably be off, since there
> > should be no active DMA activity). Graphics would be the exception to
> > this, since if you want to keep scanning out a splash screen, you'll
> > have to keep doing DMA...
>
> We'll enhance this document as we get hardware as it's not clear whether
> we can simply mandate it to be off. We may have situations with some
> simple IOMMU that is previously set up by the firmware and the kernel
> doesn't get access to it. One example is the System MMU from ARM that
> supports stage 2 (hypervisor) translations and you just run a guest
> kernel without any control of the IOMMU.

Ok, fair enough.

> > > +/*
> > > + * Setup common bits before finally enabling the MMU. Essentially this is just
> > > + * loading the page table pointer and vector base registers.
> > > + *
> > > + * On entry to this code, x0 must contain the SCTLR_EL1 value for turning on
> > > + * the MMU.
> > > + */
> > > +__enable_mmu:
> >
> > ENTRY()?
>
> __enable_mmu is not used outside this file, so no need for ENTRY().
>
> > > + ldr x5, =vectors
> > > + msr vbar_el1, x5
> > > + msr ttbr0_el1, x25 // load TTBR0
> > > + msr ttbr1_el1, x26 // load TTBR1
> > > + isb
> > > + b __turn_mmu_on
> > > +ENDPROC(__enable_mmu)
> >
> > ...or just END()? Same for a few of the other functions below.
>
> ENDPROC() gives us ".type @function" in addition to END(). This proved
> to be useful in the past for debugging symbols, unwind table (though we
> don't have the latter on AArch64).

As good a reason as any, sounds good.

> > > +static void __init setup_processor(void)
> > > +{
> > > + struct proc_info_list *list;
> > > +
> > > + /*
> > > + * locate processor in the list of supported processor
> > > + * types. The linker builds this table for us from the
> > > + * entries in arch/arm/mm/proc.S
> > > + */
> >
> > Probably from arch/arm64/... somewhere?
>
> Yes, I did a grep and found a few more.

Yeah, I pointed out some other stale ARM-derived comments in other patches.

> > > + printk("CPU: %s [%08x] revision %d\n",
> > > + cpu_name, read_cpuid_id(), read_cpuid_id() & 15);
> > > +
> > > + sprintf(init_utsname()->machine, "aarch64");
> >
> > > + initial_boot_params = devtree;
> > > + dt_root = of_get_flat_dt_root();
> > > +
> > > + machine_name = of_get_flat_dt_prop(dt_root, "model", NULL);
> > > + if (!machine_name)
> > > + machine_name = of_get_flat_dt_prop(dt_root, "compatible", NULL);
> > > + if (!machine_name)
> > > + machine_name = "<unknown>";
> > > + pr_info("Machine: %s\n", machine_name);
> >
> > This property is an array of strings. It would be more valuable to print out
> > the entry that was matched for a platform instead of the provided one from the
> > device tree.
>
> If we add machine_desc structure back, we could print which machine was
> matched. But so far I try to keep the SoC code to a minimum and just do
> the probing later in the SoC code (of_find_matching_node). Ideally we
> shouldn't have any SoC code and just keep code in drivers but we'll see
> how far we can get. We can discuss more details at the KS as I would
> like the arm-soc team to get involved here.

Interesting approach. I wonder if it'll scale, in particular when it comes
to needing to do early setup and init. For device-level setup, generic
will probably work just fine. And if it doesn't, things can be changed
later. So it sounds like a good start.

Definitely something we should discuss. I suggest not doing it at KS
though, since only half of the arm-soc team is invited there. So the
ARM mini-summit or hallway around LPC is a better venue.


-Olof

2012-08-15 19:53:43

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

On 15 August 2012 20:03, Olof Johansson <[email protected]> wrote:
> On Wed, Aug 15, 2012 at 06:37:11PM +0100, Catalin Marinas wrote:
>> If we add machine_desc structure back, we could print which machine was
>> matched. But so far I try to keep the SoC code to a minimum and just do
>> the probing later in the SoC code (of_find_matching_node). Ideally we
>> shouldn't have any SoC code and just keep code in drivers but we'll see
>> how far we can get. We can discuss more details at the KS as I would
>> like the arm-soc team to get involved here.
>
> Interesting approach, I wonder if it'll scale, in particular if it comes
> to needing to do early setup and init. For device-level setup, generic
> will probably work just fine. And if it doesn't, things can be changed
> later. So it sounds like a good start.
>
> Definitely something we should discuss. I suggest not doing it at KS
> though, since only half of the arm-soc team is invited there. So the
> ARM mini-summit or hallway around LPC is a better venue.

I was indeed thinking of the ARM mini-summit or hallway discussions.
The KS has different topics and it wouldn't have been of wide interest
anyway.

--
Catalin

2012-08-16 10:06:36

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v2 03/31] arm64: Exception handling

On Wed, Aug 15, 2012 at 02:03:47PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
>
> > +#ifdef CONFIG_AARCH32_EMULATION
> > +#define compat_thumb_mode(regs) \
> > + (((regs)->pstate & COMPAT_PSR_T_BIT))
> > +#else
> > +#define compat_thumb_mode(regs) (0)
> > +#endif
>
> The symbol we use on other platforms is CONFIG_COMPAT. I don't think you
> need to have a separate CONFIG_AARCH32_EMULATION

Using COMPAT does preclude the possibility of doing something like the x32
ABI later on though. Some other architectures seem to do something similar
(MIPS32_COMPAT, IA32_EMULATION).

Will

2012-08-16 10:23:14

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v2 16/31] arm64: ELF definitions

On Wed, Aug 15, 2012 at 03:15:39PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
> > +
> > +void elf_set_personality(int personality)
> > +{
> > + switch (personality & PER_MASK) {
> > + case PER_LINUX:
> > + clear_thread_flag(TIF_32BIT);
> > + break;
> > + case PER_LINUX32:
> > + set_thread_flag(TIF_32BIT);
> > + break;
> > + default:
> > + pr_warning("Process %s tried to assume unknown personality %d\n",
> > + current->comm, personality);
> > + return;
> > + }
> > +
> > + current->personality = personality;
> > +}
> > +EXPORT_SYMBOL(elf_set_personality);
>
> This looks wrong: PER_LINUX/PER_LINUX32 decides over the output of the
> uname system call, while TIF_32BIT decides over the instruction set
> when returning to user space. You definitely should not set the personality
> to the value you pass from the elf loader. Instead, just do
>
> #define SET_PERSONALITY(ex) clear_thread_flag(TIF_32BIT);
> #defined COMPAT_SET_PERSONALITY(ex) set_thread_flag(TIF_32BIT);

In this case, won't uname be incorrect (aarch64l) for aarch32 tasks (which
expect something like armv8l)?

Will

2012-08-16 10:28:17

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v2 21/31] arm64: 32-bit (compat) applications support

On Wed, Aug 15, 2012 at 03:34:04PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
> > +asmlinkage int compat_sys_personality(compat_ulong_t personality)
> > +{
> > + int ret;
> > +
> > + if (personality(current->personality) == PER_LINUX32 &&
> > + personality == PER_LINUX)
> > + personality = PER_LINUX32;
> > + ret = sys_personality(personality);
> > + if (ret == PER_LINUX32)
> > + ret = PER_LINUX;
> > + return ret;
> > +}
>
> Where did you get this from?
>
> You should not need compat_sys_personality, just call the native function.

Hmm, but in that case an aarch32 application doing a personality(PER_LINUX)
syscall will start seeing the wrong uname.

Will

2012-08-16 10:48:01

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v2 23/31] arm64: Debugging support

On Wed, Aug 15, 2012 at 04:07:36PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
>
> > +const struct user_regset_view *task_user_regset_view(struct task_struct *task)
> > +{
> > +#ifdef CONFIG_AARCH32_EMULATION
> > + if (test_tsk_thread_flag(task, TIF_32BIT))
> > + return &user_aarch32_view;
> > +#endif
> > + return &user_aarch64_view;
> > +}
>
> Ah, nice. So you support 64 bit debuggers debugging 32 bit processes, right?

That should work if the debugger can deal with it, yes.

> From what I can tell, there is no support for 32 bit processes debugging
> 64 bit ones. Is that something you plan to add in the future, or do you
> consider that out of scope? In either case, a comment would be helpful.

That can't really work because the debugger won't be able to manipulate
child pointers properly without us adding a new ptrace interface (and then,
I still wonder about how feasible it really is). I can add a comment.

> > +long arch_ptrace(struct task_struct *child, long request,
> > + unsigned long addr, unsigned long data)
> > +{
> > + int ret;
> > + unsigned long *datap = (unsigned long __user *)data;
> > +
> > + switch (request) {
> > + case PTRACE_GET_THREAD_AREA:
> > + ret = put_user(child->thread.tp_value, datap);
> > + break;
> > +
> > +#ifdef CONFIG_HAVE_HW_BREAKPOINT
> > + case PTRACE_GETHBPREGS:
> > + ret = ptrace_gethbpregs(child, addr, datap);
> > + break;
> > +
> > + case PTRACE_SETHBPREGS:
> > + ret = ptrace_sethbpregs(child, addr, datap);
> > + break;
> > +#endif
> > +
> > + default:
> > + ret = ptrace_request(child, request, addr, data);
> > + break;
> > + }
> > +
> > + return ret;
> > +}
>
> Is there a reaons why these are not regsets but have their own ptrace
> commands? I believe new architectures should generally not add ptrace
> commands any more.

I could probably add some regset wrappers around the hbp accessors (which we
have to keep for the compat ptrace interface). I'll have a think, as it might
even make sense to have different regsets for breakpoints and watchpoints.

As for the TLS, is it worth having a regset with only one register?
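
A one-register regset is certainly small; a sketch of what the TLS .get hook
could look like (hypothetical names, using the helpers from <linux/regset.h>):

/* sketch: exposing thread.tp_value as a single-entry regset */
static int tls_get(struct task_struct *target,
		   const struct user_regset *regset,
		   unsigned int pos, unsigned int count,
		   void *kbuf, void __user *ubuf)
{
	unsigned long tls = target->thread.tp_value;

	/* copy the single register out to the tracer */
	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
				   &tls, 0, -1);
}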

Will

2012-08-16 10:51:44

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v2 25/31] arm64: Performance counters support

On Wed, Aug 15, 2012 at 04:11:11PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
> > From: Will Deacon <[email protected]>
> >
> > This patch adds support for the AArch64 performance counters.
> >
> > Signed-off-by: Will Deacon <[email protected]>
> > Signed-off-by: Catalin Marinas <[email protected]>
> > ---
> > arch/arm64/include/asm/perf_event.h | 22 +
> > arch/arm64/include/asm/pmu.h | 82 +++
> > arch/arm64/kernel/perf_event.c | 1368 +++++++++++++++++++++++++++++++++++
> > tools/perf/perf.h | 6 +
>
> Can you explain how AArch64 performance counters differ from the 32
> bit ones? Do they work for AArch32 user space under AArch64 kernels?
> Is it possible to share parts of the implementation with arch/arm?

Perf should work for compat tasks, yes. I'd like to share some of the code
with arch/arm/ and I've started reworking the arch/arm/ stuff to accommodate
this better:

git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git perf/updates

I'm not sure how well it will fit in drivers/ but I'm certainly willing to
give it a try.

Will

2012-08-16 10:57:20

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v2 26/31] arm64: Miscellaneous library functions

On Wed, Aug 15, 2012 at 04:21:14PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
>
> > +
> > +/*
> > + * Use compiler builtins for simple inline operations.
> > + */
> > +static inline unsigned long __ffs(unsigned long word)
> > +{
> > + return __builtin_ffsl(word) - 1;
> > +}
> > +
> > +static inline int ffs(int x)
> > +{
> > + return __builtin_ffs(x);
> > +}
> > +
> > +static inline unsigned long __fls(unsigned long word)
> > +{
> > + return BITS_PER_LONG - 1 - __builtin_clzl(word);
> > +}
> > +
> > +static inline int fls(int x)
> > +{
> > + return x ? sizeof(x) * BITS_PER_BYTE - __builtin_clz(x) : 0;
> > +}
>
> These are all great, but I think whether to use them or not should
> depend on the compiler version rather than the architecture in
> general. Do we know a minimum gcc version that supports all of the
> above? Then we could put that code into the generic files.

I imagine that the version of GCC that supports these builtins varies for
each architecture. For aarch64, the compiler will always support these
builtins and these particular ones are guaranteed to be inlined.

> If that's not possible, we could still make the implementation
> available for other architectures by moving it to
>
> asm-generic/bitops/builtin-__ffs.h
> asm-generic/bitops/builtin-ffs.h
> asm-generic/bitops/builtin-__fls.h
> asm-generic/bitops/builtin-fls.h

Yeah, that might be an idea. The architecture can then decide to use them if
it knows they are available and usable.
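
A sketch of one of those headers, following the layout suggested above (file
name and include guard are hypothetical):

/* asm-generic/bitops/builtin-__ffs.h (sketch) */
#ifndef _ASM_GENERIC_BITOPS_BUILTIN___FFS_H_
#define _ASM_GENERIC_BITOPS_BUILTIN___FFS_H_

/*
 * __ffs - find the index of the first set bit in word.
 * Undefined if word is 0, as with the other __ffs variants;
 * __builtin_ctzl(word) is equivalent to __builtin_ffsl(word) - 1 here.
 */
static __always_inline unsigned long __ffs(unsigned long word)
{
	return __builtin_ctzl(word);
}

#endif /* _ASM_GENERIC_BITOPS_BUILTIN___FFS_H_ */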

> > --- /dev/null
> > +++ b/arch/arm64/lib/bitops.c
> > @@ -0,0 +1,25 @@
> > +/*
> > + * Copyright (C) 2012 ARM Limited
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/atomic.h>
> > +
> > +#ifdef CONFIG_SMP
> > +arch_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned = {
> > + [0 ... (ATOMIC_HASH_SIZE-1)] = __ARCH_SPIN_LOCK_UNLOCKED
> > +};
> > +#endif
>
> What?
>
> I suppose this is a leftover from an earlier version using the
> generic bitops, right?

We currently use the generic atomic bitops (asm-generic/bitops/atomic.h)
which contains:

# define ATOMIC_HASH(a) (&(__atomic_hash[ (((unsigned long) a)/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))

so we have to provide a definition for the array. We have additional patches
containing optimised assembly implementations of the atomic bitops which we
will push later, once we've got some hardware to benchmark with.

Will

2012-08-16 12:38:07

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 16/31] arm64: ELF definitions

On Thursday 16 August 2012, Will Deacon wrote:
> > This looks wrong: PER_LINUX/PER_LINUX32 decides over the output of the
> > uname system call, while TIF_32BIT decides over the instruction set
> > when returning to user space. You definitely should not set the personality
> > to the value you pass from the elf loader. Instead, just do
> >
> > #define SET_PERSONALITY(ex) clear_thread_flag(TIF_32BIT);
> > #defined COMPAT_SET_PERSONALITY(ex) set_thread_flag(TIF_32BIT);
>
> In this case, won't uname be incorrect (aarch64l) for aarch32 tasks (which
> expect something like armv8l)?

No, the uname output is meant to tell you about the system, not the
instruction set that you are using (you already know that in compiled
code).

The main use case is to fool stuff like autoconf into assuming your
architecture is the other one, e.g. when building a 32-bit package
using a 64-bit /bin/bash to run ./configure, or vice versa.
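
For reference, the uname override keyed off PER_LINUX32 lives in kernel/sys.c
and looks roughly like this (sketch from memory; the exact form may differ):

/* kernel/sys.c (roughly): PER_LINUX32 swaps the reported machine string */
#ifdef COMPAT_UTS_MACHINE
#define override_architecture(name) \
	(personality(current->personality) == PER_LINUX32 && \
	 copy_to_user(name->machine, COMPAT_UTS_MACHINE, \
		      sizeof(COMPAT_UTS_MACHINE)))
#else
#define override_architecture(name)	0
#endif

So the architecture only has to provide COMPAT_UTS_MACHINE (something like
"armv8l") and set PER_LINUX32 where appropriate; TIF_32BIT stays purely about
which instruction set to return to.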

Arnd

2012-08-16 12:38:05

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 03/31] arm64: Exception handling

On Thursday 16 August 2012, Will Deacon wrote:
> On Wed, Aug 15, 2012 at 02:03:47PM +0100, Arnd Bergmann wrote:
> > On Tuesday 14 August 2012, Catalin Marinas wrote:
> >
> > > +#ifdef CONFIG_AARCH32_EMULATION
> > > +#define compat_thumb_mode(regs) \
> > > + (((regs)->pstate & COMPAT_PSR_T_BIT))
> > > +#else
> > > +#define compat_thumb_mode(regs) (0)
> > > +#endif
> >
> > The symbol we use on other platforms is CONFIG_COMPAT. I don't think you
> > need to have a separate CONFIG_AARCH32_EMULATION
>
> Using COMPAT does preclude the possibility of doing something like the x32
> ABI later on though. Some other architectures seem to do something similar
> (MIPS32_COMPAT, IA32_EMULATION).

Ok, fair enough.

Arnd

2012-08-16 12:39:37

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 21/31] arm64: 32-bit (compat) applications support

On Thursday 16 August 2012, Will Deacon wrote:
>
> On Wed, Aug 15, 2012 at 03:34:04PM +0100, Arnd Bergmann wrote:
> > On Tuesday 14 August 2012, Catalin Marinas wrote:
> > > +asmlinkage int compat_sys_personality(compat_ulong_t personality)
> > > +{
> > > + int ret;
> > > +
> > > + if (personality(current->personality) == PER_LINUX32 &&
> > > + personality == PER_LINUX)
> > > + personality = PER_LINUX32;
> > > + ret = sys_personality(personality);
> > > + if (ret == PER_LINUX32)
> > > + ret = PER_LINUX;
> > > + return ret;
> > > +}
> >
> > Where did you get this from?
> >
> > You should not need compat_sys_personality, just call the native function.
>
> Hmm, but in that case an aarch32 application doing a personality(PER_LINUX)
> syscall will start seeing the wrong uname.

Not the wrong uname, just the default one, which is correct.

Arnd

2012-08-16 12:40:09

by Linus Walleij

[permalink] [raw]
Subject: Re: [PATCH v2 28/31] arm64: Generic timers support

On Tue, Aug 14, 2012 at 7:52 PM, Catalin Marinas
<[email protected]> wrote:

> From: Marc Zyngier <[email protected]>
(...)
> +static void __init arch_timer_calibrate(void)

I think you wrote in the last review thread that this should be renamed
"arch_timer_get_freq()".

> +{
> + if (arch_timer_rate == 0) {
> + arch_timer_reg_write(ARCH_TIMER_REG_CTRL, 0);
> + arch_timer_rate = arch_timer_reg_read(ARCH_TIMER_REG_FREQ);
> +
> + /* Check the timer frequency. */
> + if (arch_timer_rate == 0)
> + panic("Architected timer frequency is set to zero.\n"
> + "You must set this in your .dts file\n");

This comment about the .dts file is completely out-of-place.
The error message should be about the register containing 0, right?

Also, at this very point in the code, I think you should convert that panic()
to pr_err() and insert:

#include <linux/clk.h>

struct clk *clk;
int err;

clk = clk_get_sys("arm_generic_timer", NULL);
if (IS_ERR(clk))
	panic("Not even a clk to get freq off! Giving up.\n");
err = clk_prepare_enable(clk);
if (err) {
	pr_err("arch_timer: clock failed to prepare: %d\n", err);
	clk_put(clk);
	panic("Not even a clk to get freq off! Giving up.\n");
}
arch_timer_rate = clk_get_rate(clk);

Possibly the clk should even be able to override the register value,
then it's the other way around.

> + }
> +
> + /* Cache the sched_clock multiplier to save a divide in the hot path. */
> +
> + sched_clock_mult = NSEC_PER_SEC / arch_timer_rate;
> +
> + pr_info("Architected local timer running at %u.%02uMHz.\n",
> + arch_timer_rate / 1000000, (arch_timer_rate / 10000) % 100);

This multiplier and print should be moved below the call site, since the DT
frequency overrides it; as it stands you get no print (which is sad) and an
incorrect sched_clock_mult (which is a bug).

> +}

(...)
> + /* Try to determine the frequency from the device tree or CNTFRQ */
> + if (!of_property_read_u32(np, "clock-frequency", &freq))
> + arch_timer_rate = freq;
> + arch_timer_calibrate();

Rename to arch_timer_get_freq(), and move that print here.

Yours,
Linus Walleij

2012-08-16 12:49:46

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 23/31] arm64: Debugging support

On Thursday 16 August 2012, Will Deacon wrote:
> On Wed, Aug 15, 2012 at 04:07:36PM +0100, Arnd Bergmann wrote:
> > On Tuesday 14 August 2012, Catalin Marinas wrote:

> > From what I can tell, there is no support for 32 bit processes debugging
> > 64 bit ones. Is that something you plan to add in the future, or do you
> > consider that out of scope? In either case, a comment would be helpful.
>
> That can't really work because the debugger won't be able to manipulate
> child pointers properly without us adding a new ptrace interface (and then,
> I still wonder about how feasible it really is). I can add a comment.

You can already have a 32 bit gdb that is able to do remote debugging of
64 bit processes using a gdb server process. I guess it wouldn't be
too strange to have a ptrace extension to allow the native case as well.
I agree it's not a high priority.

> > > +long arch_ptrace(struct task_struct *child, long request,
> > > + unsigned long addr, unsigned long data)
> > > +{
> > > + int ret;
> > > + unsigned long *datap = (unsigned long __user *)data;
> > > +
> > > + switch (request) {
> > > + case PTRACE_GET_THREAD_AREA:
> > > + ret = put_user(child->thread.tp_value, datap);
> > > + break;
> > > +
> > > +#ifdef CONFIG_HAVE_HW_BREAKPOINT
> > > + case PTRACE_GETHBPREGS:
> > > + ret = ptrace_gethbpregs(child, addr, datap);
> > > + break;
> > > +
> > > + case PTRACE_SETHBPREGS:
> > > + ret = ptrace_sethbpregs(child, addr, datap);
> > > + break;
> > > +#endif
> > > +
> > > + default:
> > > + ret = ptrace_request(child, request, addr, data);
> > > + break;
> > > + }
> > > +
> > > + return ret;
> > > +}
> >
> > Is there a reaons why these are not regsets but have their own ptrace
> > commands? I believe new architectures should generally not add ptrace
> > commands any more.
>
> I could probably add some regset wrappers about the hbp accessors (which we
> have to keep for the compat ptrace interface). I'll have a think as it might
> even make sense to have different regsets for breakpoints and watchpoints.
>
> As for the the tls, is it worth having a regset with only one register?

Better ask the gdb folks. I'm adding Uli to Cc, maybe he has some insight.

Arnd

2012-08-16 12:53:58

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

On Wed, Aug 15, 2012 at 02:20:02PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
>
> > +The AArch64 exception model is made up of a number of exception levels
> > +(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure
> > +counterpart. EL2 is the hypervisor level and exists only in non-secure
> > +mode. EL3 is the highest priority level and exists only in secure mode.
>
> I'm always confused by a description like this. It sounds like you cannot
> have a hypervisor if you have code running in secure mode in EL3. What
> I instead understand is that you enter non-secure mode by going from
> EL3 into EL2.

From EL3 you can drop to either EL2 (non-secure) or EL1 (secure or
non-secure); it's the highest privilege level. But we don't support the
kernel entering at EL3; the SoC firmware runs in this mode. I'll try to
make it clearer.

> > +2. Setup the device tree
> > +-------------------------
> > +
> > +Requirement: MANDATORY
> > +
> > +The device tree blob (dtb) must be no bigger than 2 megabytes in size
> > +and placed at a 2-megabyte boundary within the first 512 megabytes from
> > +the start of the kernel image. This is to allow the kernel to map the
> > +blob using a single section mapping in the initial page tables.
>
> I've seen people put firmware for some peripherals into the device tree,
> so that a device driver can grab a blob from there and load it into the
> device, rather than calling request_firmware() which would fail if the
> OS running on the system does not contain the blob. If such firmware is
> too large, you end up violating the 2 MB limit you impose here.
>
> Should we keep that limit and declare those use cases as invalid, or
> should we try to make the boot protocol more flexible?

I would be OK with allowing a larger range here, but we currently don't get
the information about the size of the dtb early enough to know how much to
map. We could make some other arbitrary choice if needed, like requiring the
blob to be in the first 16MB of RAM and mapping that whole range.

--
Catalin

2012-08-16 13:00:42

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 26/31] arm64: Miscellaneous library functions

On Thursday 16 August 2012, Will Deacon wrote:
> > > +
> > > +#include <linux/kernel.h>
> > > +#include <linux/spinlock.h>
> > > +#include <linux/atomic.h>
> > > +
> > > +#ifdef CONFIG_SMP
> > > +arch_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned = {
> > > + [0 ... (ATOMIC_HASH_SIZE-1)] = __ARCH_SPIN_LOCK_UNLOCKED
> > > +};
> > > +#endif
> >
> > What?
> >
> > I suppose this is a leftover from an earlier version using the
> > generic bitops, right?
>
> We currently use the generic atomic bitops (asm-generic/bitops/atomic.h)
> which contains:
>
> # define ATOMIC_HASH(a) (&(__atomic_hash[ (((unsigned long) a)/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
>
> so we have to provide a definition for the array. We have additional patches
> containing optimised assembly implementations of the atomic bitops which we
> will push later, once we've got some hardware to benchmark with.
>

Ah, I was confusing this with the asm/atomic.h stuff, for which you already
provide an optimized version.

The generic atomic bitops are really horrible in performance and I would
expect that there is just one obvious way to implement bitops using ldaxr/stlxr,
so I recommend just doing that even if you have no hardware for benchmarking.

The s390 version should be fairly easy to adapt.
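
For illustration, the exclusive-load/store retry loop would look roughly like
this in C with inline assembly (a sketch only, not the optimised assembly that
was dropped from the series; BIT_WORD/BIT_MASK are the usual helpers from
<linux/bitops.h>):

/* sketch of a lock-free set_bit(); the other bitops follow the same shape */
static inline void arm64_set_bit(unsigned int nr, volatile unsigned long *addr)
{
	unsigned long *p = (unsigned long *)addr + BIT_WORD(nr);
	unsigned long mask = BIT_MASK(nr);
	unsigned long tmp;
	unsigned int res;

	asm volatile(
	"1:	ldxr	%0, %2\n"	/* exclusive load of the word */
	"	orr	%0, %0, %3\n"	/* set the requested bit */
	"	stxr	%w1, %0, %2\n"	/* try to store it back */
	"	cbnz	%w1, 1b"	/* retry if the exclusive was lost */
	: "=&r" (tmp), "=&r" (res), "+Q" (*p)
	: "r" (mask));
}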

Arnd

2012-08-16 14:12:32

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 26/31] arm64: Miscellaneous library functions

On Thu, Aug 16, 2012 at 02:00:32PM +0100, Arnd Bergmann wrote:
> On Thursday 16 August 2012, Will Deacon wrote:
> > > > +
> > > > +#include <linux/kernel.h>
> > > > +#include <linux/spinlock.h>
> > > > +#include <linux/atomic.h>
> > > > +
> > > > +#ifdef CONFIG_SMP
> > > > +arch_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned = {
> > > > + [0 ... (ATOMIC_HASH_SIZE-1)] = __ARCH_SPIN_LOCK_UNLOCKED
> > > > +};
> > > > +#endif
> > >
> > > What?
> > >
> > > I suppose this is a leftover from an earlier version using the
> > > generic bitops, right?
> >
> > We currently use the generic atomic bitops (asm-generic/bitops/atomic.h)
> > which contains:
> >
> > # define ATOMIC_HASH(a) (&(__atomic_hash[ (((unsigned long) a)/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
> >
> > so we have to provide a definition for the array. We have additional patches
> > containing optimised assembly implementations of the atomic bitops which we
> > will push later, once we've got some hardware to benchmark with.
> >
>
> Ah, I was confusing this with the asm/atomic.h stuff, for which you already
> provide an optimized version.
>
> The generic atomic bitops are really horrible in performance and I would
> expect that there is just one obvious way to implement bitops using ldaxr/stlxr,
> so I recommend just doing that even if you have no hardware for benchmarking.

As Will said, we have the code already but I dropped it from the initial
set of patches to be reviewed, to keep them simpler. They will be added
later.

--
Catalin

2012-08-16 15:17:41

by Tobias Klauser

[permalink] [raw]
Subject: Re: [PATCH v2 07/31] arm64: Process management

On 2012-08-14 at 19:52:08 +0200, Catalin Marinas <[email protected]> wrote:
> +void cpu_idle(void)
> +{
> + local_fiq_enable();
> +
> + /* endless idle loop with no priority at all */
> + while (1) {
> + tick_nohz_idle_enter();
> + rcu_idle_enter();
> + while (!need_resched()) {
> + /*
> + * We need to disable interrupts here to ensure
> + * we don't miss a wakeup call.
> + */
> + local_irq_disable();
> + if (!need_resched()) {
> + stop_critical_timings();
> + pm_idle();
> + start_critical_timings();
> + /*
> + * pm_idle functions should always return
> + * with IRQs enabled.
> + */
> + WARN_ON(irqs_disabled());
> + } else {
> + local_irq_enable();
> + }
> + }
> + rcu_idle_exit();
> + tick_nohz_idle_exit();
> + preempt_enable_no_resched();
> + schedule();
> + preempt_disable();

You could use schedule_preempt_disabled() instead of the above 3 calls.
See http://lkml.kernel.org/n/[email protected]
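
For context, the helper just bundles those three calls, so the tail of the
idle loop above collapses to a single schedule_preempt_disabled() call. A
rough sketch of its definition in kernel/sched/core.c (the exact spelling may
differ between versions):

/* kernel/sched/core.c (roughly) */
void __sched schedule_preempt_disabled(void)
{
	preempt_enable_no_resched();
	schedule();
	preempt_disable();
}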

Cheers
Tobias

2012-08-16 18:59:47

by Nicolas Pitre

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

On Tue, 14 Aug 2012, Catalin Marinas wrote:

> The patch adds the kernel booting and the initial setup code.
> Documentation/arm64/booting.txt describes the booting protocol on the
> AArch64 Linux kernel. This is subject to change following the work on
> boot standardisation, ACPI.
>
> Signed-off-by: Will Deacon <[email protected]>
> Signed-off-by: Catalin Marinas <[email protected]>

A few minor comments below, otherwise...

Acked-by: Nicolas Pitre <[email protected]>

> ---
> Documentation/arm64/booting.txt | 141 +++++++++++
> arch/arm64/include/asm/setup.h | 26 ++
> arch/arm64/kernel/head.S | 521 +++++++++++++++++++++++++++++++++++++++
> arch/arm64/kernel/setup.c | 357 +++++++++++++++++++++++++++
> 4 files changed, 1045 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/arm64/booting.txt
> create mode 100644 arch/arm64/include/asm/setup.h
> create mode 100644 arch/arm64/kernel/head.S
> create mode 100644 arch/arm64/kernel/setup.c
>
> diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
> new file mode 100644
> index 0000000..3197820
> --- /dev/null
> +++ b/Documentation/arm64/booting.txt
> @@ -0,0 +1,141 @@
> + Booting AArch64 Linux
> + =====================
> +
> +Author: Will Deacon <[email protected]>
> +Date : 25 April 2012
> +
> +This document is based on the ARM booting document by Russell King and
> +is relevant to all public releases of the AArch64 Linux kernel.
> +
> +The AArch64 exception model is made up of a number of exception levels
> +(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure
> +counterpart. EL2 is the hypervisor level and exists only in non-secure
> +mode. EL3 is the highest priority level and exists only in secure mode.
> +
> +For the purposes of this document, we will use the term `boot loader'
> +simply to define all software that executes on the CPU(s) before control
> +is passed to the Linux kernel. This may include secure monitor and
> +hypervisor code, or it may just be a handful of instructions for
> +preparing a minimal boot environment.
> +
> +Essentially, the boot loader should provide (as a minimum) the
> +following:
> +
> +1. Setup and initialise the RAM
> +2. Setup the device tree
> +3. Decompress the kernel image
> +4. Call the kernel image
> +
> +
> +1. Setup and initialise RAM
> +---------------------------
> +
> +Requirement: MANDATORY
> +
> +The boot loader is expected to find and initialise all RAM that the
> +kernel will use for volatile data storage in the system. It performs
> +this in a machine dependent manner. (It may use internal algorithms
> +to automatically locate and size all RAM, or it may use knowledge of
> +the RAM in the machine, or any other method the boot loader designer
> +sees fit.)
> +
> +
> +2. Setup the device tree
> +-------------------------
> +
> +Requirement: MANDATORY
> +
> +The device tree blob (dtb) must be no bigger than 2 megabytes in size
> +and placed at a 2-megabyte boundary within the first 512 megabytes from
> +the start of the kernel image. This is to allow the kernel to map the
> +blob using a single section mapping in the initial page tables.

It might be a good idea to specify the minimum information that should
be contained in the DTB. Memory size is certainly one such item.

> +3. Decompress the kernel image
> +------------------------------
> +
> +Requirement: OPTIONAL
> +
> +The AArch64 kernel does not provide a decompressor and therefore
> +requires gzip decompression to be performed by the boot loader if the
> +default Image.gz target is used. For bootloaders that do not implement
> +this requirement, the larger Image target is available instead.

Some people will want to use bzip2 or whatever other decompressor du
jour. Maybe this shouldn't be gzip specific, or just presented as a
possible option?

> +4. Call the kernel image
> +------------------------
> +
> +Requirement: MANDATORY
> +
> +The decompressed kernel image contains a 32-byte header as follows:
> +
> + u32 magic = 0x14000008; /* branch to stext, little-endian */
> + u32 res0 = 0; /* reserved */
> + u64 text_offset; /* Image load offset */
> + u64 res1 = 0; /* reserved */
> + u64 res2 = 0; /* reserved */
> +
> +The image must be placed at the specified offset (currently 0x80000)
> +from the start of the system RAM and called there. The start of the
> +system RAM must be aligned to 2MB.
> +
> +Before jumping into the kernel, the following conditions must be met:
> +
> +- Quiesce all DMA capable devices so that memory does not get
> + corrupted by bogus network packets or disk data. This will save
> + you many hours of debug.
> +
> +- Primary CPU general-purpose register settings
> + x0 = physical address of device tree blob (dtb) in system RAM.

I think you should mandate that some additional registers be explicitly
initialized to 0 for possible future usage (and also mention this in the
corresponding code comment). We have that issue on ARM32 where it is
unclear if r2 contains a valid ATAG/DTB address or not as its content
was not defined before.

[...]


Nicolas

2012-08-17 07:06:20

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 23/31] arm64: Debugging support

On Thursday 16 August 2012, Arnd Bergmann wrote:
> On Thursday 16 August 2012, Will Deacon wrote:
> > On Wed, Aug 15, 2012 at 04:07:36PM +0100, Arnd Bergmann wrote:
> > > On Tuesday 14 August 2012, Catalin Marinas wrote:
>
> > > From what I can tell, there is no support for 32 bit processes debugging
> > > 64 bit ones. Is that something you plan to add in the future, or do you
> > > consider that out of scope? In either case, a comment would be helpful.
> >
> > That can't really work because the debugger won't be able to manipulate
> > child pointers properly without us adding a new ptrace interface (and then,
> > I still wonder about how feasible it really is). I can add a comment.
>
> You can already have a 32 bit gdb that is able to do remote debugging of
> 64 bit processes using a gdb server process. I guess it wouldn't be
> too strange to have a ptrace extension to allow the native case as well.
> I agree it's not a high priority.
>
> > > > +long arch_ptrace(struct task_struct *child, long request,
> > > > + unsigned long addr, unsigned long data)
> > > > +{
> > > > + int ret;
> > > > + unsigned long *datap = (unsigned long __user *)data;
> > > > +
> > > > + switch (request) {
> > > > + case PTRACE_GET_THREAD_AREA:
> > > > + ret = put_user(child->thread.tp_value, datap);
> > > > + break;
> > > > +
> > > > +#ifdef CONFIG_HAVE_HW_BREAKPOINT
> > > > + case PTRACE_GETHBPREGS:
> > > > + ret = ptrace_gethbpregs(child, addr, datap);
> > > > + break;
> > > > +
> > > > + case PTRACE_SETHBPREGS:
> > > > + ret = ptrace_sethbpregs(child, addr, datap);
> > > > + break;
> > > > +#endif
> > > > +
> > > > + default:
> > > > + ret = ptrace_request(child, request, addr, data);
> > > > + break;
> > > > + }
> > > > +
> > > > + return ret;
> > > > +}
> > >
> > > Is there a reaons why these are not regsets but have their own ptrace
> > > commands? I believe new architectures should generally not add ptrace
> > > commands any more.
> >
> > I could probably add some regset wrappers about the hbp accessors (which we
> > have to keep for the compat ptrace interface). I'll have a think as it might
> > even make sense to have different regsets for breakpoints and watchpoints.
> >
> > As for the the tls, is it worth having a regset with only one register?
>
> Better ask the gdb folks. I'm adding Uli to Cc, maybe he has some insight.

Sorry for the dumb question, but why do you even need PTRACE_GET_THREAD_AREA
for 64 bit tasks? I thought the thread pointer is a GPR, or is this just
for compat tasks?

Arnd

2012-08-17 08:56:48

by Tony Lindgren

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

* Catalin Marinas <[email protected]> [120814 11:00]:
> +3. Decompress the kernel image
> +------------------------------
> +
> +Requirement: OPTIONAL
> +
> +The AArch64 kernel does not provide a decompressor and therefore
> +requires gzip decompression to be performed by the boot loader if the
> +default Image.gz target is used. For bootloaders that do not implement
> +this requirement, the larger Image target is available instead.

Maybe add something here about why AArch64 does not provide a
decompressor? That's something everybody will wonder while reading
this part.

> +- Caches, MMUs
> + The MMU must be off.
> + Instruction cache may be on or off.
> + Data cache must be off and invalidated.

External caches must be configured by the bootloader but left in a disabled
state?

Other than that:

Acked-by: Tony Lindgren <[email protected]>

2012-08-17 09:05:04

by Tony Lindgren

[permalink] [raw]
Subject: Re: [PATCH v2 04/31] arm64: MMU definitions

* Catalin Marinas <[email protected]> [120814 10:57]:
> The virtual memory layout is described in
> Documentation/arm64/memory.txt. This patch adds the MMU definitions for
> the 4KB and 64KB translation table configurations. The SECTION_SIZE is
> 2MB with 4KB page and 512MB with 64KB page configuration.
>
> PHYS_OFFSET is calculated at run-time and stored in a variable (no
> run-time code patching at this stage).

Care to clarify this part a bit? Is the memory standardized somehow
now and not needed? Or do we still need to add that for various SoCs
later on?

Other than that:

Acked-by: Tony Lindgren <[email protected]>

2012-08-17 09:19:22

by Tony Lindgren

[permalink] [raw]
Subject: Re: [PATCH v2 13/31] arm64: Device specific operations

* Catalin Marinas <[email protected]> [120814 11:05]:
> --- /dev/null
> +++ b/arch/arm64/mm/ioremap.c
> +
> +void __iomem *__ioremap(phys_addr_t phys_addr, size_t size, pgprot_t prot)
> +{
> + return __ioremap_caller(phys_addr, size, prot,
> + __builtin_return_address(0));
> +}
> +EXPORT_SYMBOL(__ioremap);

From an SoC point of view, we're probably going to need __ioremap_exec()
here too, for programming clocks at runtime from SRAM etc. But that
can be added later as needed.

Acked-by: Tony Lindgren <[email protected]>

2012-08-17 09:21:41

by Tony Lindgren

[permalink] [raw]
Subject: Re: [PATCH v2 15/31] arm64: SMP support

* Catalin Marinas <[email protected]> [120814 11:05]:
> This patch adds SMP initialisation and spinlocks implementation for
> AArch64. The spinlock support uses the new load-acquire/store-release
> instructions to avoid explicit barriers. The architecture also specifies
> that an event is automatically generated when clearing the exclusive
> monitor state to wake up processors in WFE, so there is no need for an
> explicit DSB/SEV instruction sequence. The SEVL instruction is used to
> set the exclusive monitor locally as there is no conditional WFE and a
> branch is more expensive.

Do we always have SMP hardware on arm64? Or are we going to need to
again add smp_on_up support later on?

Other than that:

Acked-by: Tony Lindgren <[email protected]>

2012-08-17 09:21:50

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 04/31] arm64: MMU definitions

On Fri, Aug 17, 2012 at 10:04:52AM +0100, Tony Lindgren wrote:
> * Catalin Marinas <[email protected]> [120814 10:57]:
> > The virtual memory layout is described in
> > Documentation/arm64/memory.txt. This patch adds the MMU definitions for
> > the 4KB and 64KB translation table configurations. The SECTION_SIZE is
> > 2MB with 4KB page and 512MB with 64KB page configuration.
> >
> > PHYS_OFFSET is calculated at run-time and stored in a variable (no
> > run-time code patching at this stage).
>
> Care to clarify this part a bit? Is the memory standardized somehow
> now and not needed? Or do we still need to add that for various SoCs
> later on?

The memory is not standardised but we have FDT to fully specify it. The
PHYS_OFFSET does not need to be defined; it is automatically detected at
boot time based on the kernel load address and stored in a variable to be
used later.

> Other than that:
>
> Acked-by: Tony Lindgren <[email protected]>

Thanks.

--
Catalin

2012-08-17 09:29:14

by Tony Lindgren

[permalink] [raw]
Subject: Re: [PATCH v2 28/31] arm64: Generic timers support

* Catalin Marinas <[email protected]> [120814 11:00]:
> From: Marc Zyngier <[email protected]>
>
> This patch adds support for the ARM generic timers with A64 instructions
> for accessing the timer registers. It uses the physical counter as the
> clock source and the virtual counter as sched_clock.
>
> The timer frequency can be specified via DT or read from the CNTFRQ_EL0
> register. The physical counter is also accessible from user space
> allowing fast gettimeofday() implementation.

Can we always assume we can boot the kernel with ARM generic timers
and interrupts?

If so, maybe that should be mentioned in the booting requirements doc, as
that means we can delay SoC-specific initialization quite a bit and boot
with a generic kernel to the initramfs ;)

Other than that:

Acked-by: Tony Lindgren <[email protected]>

2012-08-17 09:32:23

by Tony Lindgren

[permalink] [raw]
Subject: Re: [PATCH v2 30/31] arm64: Build infrastructure

* Catalin Marinas <[email protected]> [120814 11:00]:
> --- /dev/null
> +++ b/arch/arm64/Kconfig
> @@ -0,0 +1,261 @@
> +config ARM64
> + def_bool y
> + select OF
> + select OF_EARLY_FLATTREE
> + select IRQ_DOMAIN
> + select HAVE_AOUT
> + select HAVE_DMA_ATTRS
> + select HAVE_DMA_API_DEBUG
> + select HAVE_IDE
> + select HAVE_MEMBLOCK
> + select RTC_LIB
> + select SYS_SUPPORTS_APM_EMULATION
> + select HAVE_GENERIC_DMA_COHERENT
> + select GENERIC_IOMAP
> + select HAVE_IRQ_WORK
> + select HAVE_PERF_EVENTS
> + select HAVE_ARCH_TRACEHOOK
> + select PERF_USE_VMALLOC
> + select HAVE_HW_BREAKPOINT if PERF_EVENTS
> + select HAVE_GENERIC_HARDIRQS
> + select GENERIC_HARDIRQS_NO_DEPRECATED
> + select HAVE_SPARSE_IRQ
> + select SPARSE_IRQ
> + select GENERIC_IRQ_SHOW
> + select GENERIC_SMP_IDLE_THREAD
> + select NO_BOOTMEM
> + help
> + ARM 64-bit (AArch64) Linux support.

Anything we should select here for ARM generic timers and
interrupts assuming we can always expect to boot using those?

Other than that:

Acked-by: Tony Lindgren <[email protected]>

2012-08-17 09:33:30

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 15/31] arm64: SMP support

On Fri, Aug 17, 2012 at 10:21:33AM +0100, Tony Lindgren wrote:
> * Catalin Marinas <[email protected]> [120814 11:05]:
> > This patch adds SMP initialisation and spinlocks implementation for
> > AArch64. The spinlock support uses the new load-acquire/store-release
> > instructions to avoid explicit barriers. The architecture also specifies
> > that an event is automatically generated when clearing the exclusive
> > monitor state to wake up processors in WFE, so there is no need for an
> > explicit DSB/SEV instruction sequence. The SEVL instruction is used to
> > set the exclusive monitor locally as there is no conditional WFE and a
> > branch is more expensive.
>
> Do we always have SMP hardware on arm64? Or are we going to need to
> again add smp_on_up support later on?

There isn't anything in the architecture specs that mandates multiple
cores but given the current trend it's very likely that we'll always
have MP.

An improvement in AArch64 is that we can use the SMP cache/TLB ops (the
inner shareable variants) even on a UP system so there is no need for
run-time code patching for correct execution.

--
Catalin

2012-08-17 09:37:05

by Tony Lindgren

[permalink] [raw]
Subject: Re: [PATCH v2 00/31] AArch64 Linux kernel port

* Catalin Marinas <[email protected]> [120814 10:54]:
> This is the 2nd version of the set of patches implementing Linux kernel
> support for the 64-bit ARM architecture (AArch64). Thanks to all who
> provided feedback on the previous version.

I've looked at these patches mostly from the point of view of supporting SoCs
and made a few minor comments. It's nice to see that, at least so far, I did
not spot the mutually exclusive #ifdef/#else stuff that's been causing major
issues supporting multiple SoCs with the current ARM ports :) So for all these
patches, feel free to add:

Acked-by: Tony Lindgren <[email protected]>

2012-08-17 09:38:26

by Tony Lindgren

[permalink] [raw]
Subject: Re: [PATCH v2 04/31] arm64: MMU definitions

* Catalin Marinas <[email protected]> [120817 02:21]:
> On Fri, Aug 17, 2012 at 10:04:52AM +0100, Tony Lindgren wrote:
> > * Catalin Marinas <[email protected]> [120814 10:57]:
> > > The virtual memory layout is described in
> > > Documentation/arm64/memory.txt. This patch adds the MMU definitions for
> > > the 4KB and 64KB translation table configurations. The SECTION_SIZE is
> > > 2MB with 4KB page and 512MB with 64KB page configuration.
> > >
> > > PHYS_OFFSET is calculated at run-time and stored in a variable (no
> > > run-time code patching at this stage).
> >
> > Care to clarify this part a bit? Is the memory standardized somehow
> > now and not needed? Or do we still need to add that for various SoCs
> > later on?
>
> The memory is not standardised but we have FDT to fully specify it.
> PHYS_OFFSET does not need to be defined; it is automatically detected at
> boot-time based on the kernel load address and stored in a variable to be
> used later.

OK nice thanks.

Tony
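
The run-time PHYS_OFFSET scheme described above can be pictured with a small
sketch (illustrative only; the actual work happens in assembly in head.S, and
compute_phys_offset() is a made-up helper name):

  /* TEXT_OFFSET is the fixed offset of the kernel image from the start
   * of RAM, as documented in the image header in head.S. */
  phys_addr_t memstart_addr;      /* this is what PHYS_OFFSET resolves to */

  void __init compute_phys_offset(phys_addr_t kernel_load_addr)
  {
          /* the image sits TEXT_OFFSET bytes above the start of RAM, so
           * the base of RAM follows directly from the load address */
          memstart_addr = kernel_load_addr - TEXT_OFFSET;
  }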

2012-08-17 09:39:19

by Tony Lindgren

[permalink] [raw]
Subject: Re: [PATCH v2 15/31] arm64: SMP support

* Catalin Marinas <[email protected]> [120817 02:33]:
> On Fri, Aug 17, 2012 at 10:21:33AM +0100, Tony Lindgren wrote:
> > * Catalin Marinas <[email protected]> [120814 11:05]:
> > > This patch adds SMP initialisation and spinlocks implementation for
> > > AArch64. The spinlock support uses the new load-acquire/store-release
> > > instructions to avoid explicit barriers. The architecture also specifies
> > > that an event is automatically generated when clearing the exclusive
> > > monitor state to wake up processors in WFE, so there is no need for an
> > > explicit DSB/SEV instruction sequence. The SEVL instruction is used to
> > > set the exclusive monitor locally as there is no conditional WFE and a
> > > branch is more expensive.
> >
> > Do we always have SMP hardware on arm64? Or are we going to need to
> > again add smp_on_up support later on?
>
> There isn't anything in the architecture specs that mandates multiple
> cores but given the current trend it's very likely that we'll always
> have MP.
>
> An improvement in AArch64 is that we can use the SMP cache/TLB ops (the
> inner shareable variants) even on a UP system so there is no need for
> run-time code patching for correct execution.

That's good to hear!

Tony

2012-08-17 09:41:21

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

On Tuesday 14 August 2012 11:22 PM, Catalin Marinas wrote:
> The patch adds the kernel booting and the initial setup code.
> Documentation/arm64/booting.txt describes the booting protocol on the
> AArch64 Linux kernel. This is subject to change following the work on
> boot standardisation, ACPI.
>
> Signed-off-by: Will Deacon<[email protected]>
> Signed-off-by: Catalin Marinas<[email protected]>
> ---
> Documentation/arm64/booting.txt | 141 +++++++++++
> arch/arm64/include/asm/setup.h | 26 ++
> arch/arm64/kernel/head.S | 521 +++++++++++++++++++++++++++++++++++++++
> arch/arm64/kernel/setup.c | 357 +++++++++++++++++++++++++++
> 4 files changed, 1045 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/arm64/booting.txt
> create mode 100644 arch/arm64/include/asm/setup.h
> create mode 100644 arch/arm64/kernel/head.S
> create mode 100644 arch/arm64/kernel/setup.c
>
> diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
> new file mode 100644
> index 0000000..3197820
> --- /dev/null
> +++ b/Documentation/arm64/booting.txt

[...]

> +
> +The boot loader is expected to enter the kernel on each CPU in the
> +following manner:
> +
> +- The primary CPU must jump directly to the first instruction of the
> + kernel image. The device tree blob passed by this CPU must contain
> + for each CPU node:
> +
> + 1. An 'enable-method' property. Currently, the only supported value
> + for this field is the string "spin-table".
> +
> + 2. A 'cpu-release-addr' property identifying a 64-bit,
> + zero-initialised memory location.
> +
> + It is expected that the bootloader will generate these device tree
> + properties and insert them into the blob prior to kernel entry.
> +
> +- Any secondary CPUs must spin outside of the kernel in a reserved area
> + of memory (communicated to the kernel by a /memreserve/ region in the
> + device tree) polling their cpu-release-addr location, which must be
> + contained in the reserved region. A wfe instruction may be inserted
> + to reduce the overhead of the busy-loop and a sev will be issued by
> + the primary CPU. When a read of the location pointed to by the
> + cpu-release-addr returns a non-zero value, the CPU must jump directly
> + to this value.

So you expect all the secondary CPUs to be awake and probably looping in
WFE, waiting for a signal from the kernel to boot. There is one issue with
this requirement though. For a large CPU system, you need to reset all the
CPUs and have them reach this waiting loop. This will lead to a large inrush
current at bootup which may not be supported. To avoid this issue, secondary
CPUs are kept in the OFF state and then woken up from the kernel one by one
whenever they need to be brought into the system. This requirement should be
considered.
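
For reference, the spin-table behaviour quoted above amounts to the
following bootloader-side loop (a sketch, not kernel code; wfe() is an
assumed wrapper around the WFE instruction):

  typedef void (*secondary_entry_fn)(void);

  static void secondary_holding_pen(volatile unsigned long *release_addr)
  {
          unsigned long entry;

          /* release_addr is the zero-initialised cpu-release-addr location
           * inside the /memreserve/ region */
          while ((entry = *release_addr) == 0)
                  wfe();          /* the primary CPU issues SEV after writing */

          /* jump directly to the released address */
          ((secondary_entry_fn)entry)();
  }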


> diff --git a/arch/arm64/include/asm/setup.h b/arch/arm64/include/asm/setup.h
> new file mode 100644
> index 0000000..d766493
> --- /dev/null
> +++ b/arch/arm64/include/asm/setup.h
> @@ -0,0 +1,26 @@
> +/*
> + * Based on arch/arm/include/asm/setup.h
> + *
> + * Copyright (C) 1997-1999 Russell King
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see<http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_SETUP_H
> +#define __ASM_SETUP_H
> +
> +#include<linux/types.h>
> +
> +#define COMMAND_LINE_SIZE 1024
> +
> +#endif
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> new file mode 100644
> index 0000000..34ccdc0
> --- /dev/null
> +++ b/arch/arm64/kernel/head.S
> @@ -0,0 +1,521 @@
> +/*
> + * Low-level CPU initialisation
> + * Based on arch/arm/kernel/head.S
> + *
> + * Copyright (C) 1994-2002 Russell King
> + * Copyright (C) 2003-2012 ARM Ltd.
> + * Authors: Catalin Marinas<[email protected]>
> + * Will Deacon<[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see<http://www.gnu.org/licenses/>.
> + */
> +

[..]

> +
> + /*
> + * DO NOT MODIFY. Image header expected by Linux boot-loaders.
> + */
> + b stext // branch to kernel start, magic
> + .long 0 // reserved
> + .quad TEXT_OFFSET // Image load offset from start of RAM
> + .quad 0 // reserved
> + .quad 0 // reserved
> +

Minor nit: avoid the C++ commenting style "//" here and in the rest of the patch.

Regards
santosh

2012-08-17 09:46:37

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 30/31] arm64: Build infrastructure

On Fri, Aug 17, 2012 at 10:32:13AM +0100, Tony Lindgren wrote:
> * Catalin Marinas <[email protected]> [120814 11:00]:
> > --- /dev/null
> > +++ b/arch/arm64/Kconfig
> > @@ -0,0 +1,261 @@
> > +config ARM64
> > + def_bool y
> > + select OF
> > + select OF_EARLY_FLATTREE
> > + select IRQ_DOMAIN
> > + select HAVE_AOUT
> > + select HAVE_DMA_ATTRS
> > + select HAVE_DMA_API_DEBUG
> > + select HAVE_IDE
> > + select HAVE_MEMBLOCK
> > + select RTC_LIB
> > + select SYS_SUPPORTS_APM_EMULATION
> > + select HAVE_GENERIC_DMA_COHERENT
> > + select GENERIC_IOMAP
> > + select HAVE_IRQ_WORK
> > + select HAVE_PERF_EVENTS
> > + select HAVE_ARCH_TRACEHOOK
> > + select PERF_USE_VMALLOC
> > + select HAVE_HW_BREAKPOINT if PERF_EVENTS
> > + select HAVE_GENERIC_HARDIRQS
> > + select GENERIC_HARDIRQS_NO_DEPRECATED
> > + select HAVE_SPARSE_IRQ
> > + select SPARSE_IRQ
> > + select GENERIC_IRQ_SHOW
> > + select GENERIC_SMP_IDLE_THREAD
> > + select NO_BOOTMEM
> > + help
> > + ARM 64-bit (AArch64) Linux support.
>
> Anything we should select here for ARM generic timers and
> interrupts assuming we can always expect to boot using those?

There is an entry in drivers/clocksource/Kconfig:

config CLKSRC_ARM_GENERIC
def_bool y if ARM64

I will have something similar for the GIC, but the model does not
currently support GICv3 so it cannot be tested yet. I'll publish a branch
with example SoC code (for the model) that adds GIC support under
drivers/irqchip/ with the address information taken from the FDT. The
per-CPU GIC initialisation is done via a CPU notifier to decouple it from
the SoC code (I think even on 32-bit ARM it could be done in the same way;
gic_secondary_init() always takes 0 as its argument).
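
A sketch of the CPU-notifier approach, using the notifier API available in
this kernel version (the CPU_STARTING hook point shown here is an
assumption; gic_secondary_init() is the per-CPU init function referred to
above):

  #include <linux/cpu.h>
  #include <linux/notifier.h>

  static int __cpuinit gic_cpu_notify(struct notifier_block *self,
                                      unsigned long action, void *hcpu)
  {
          /* runs on the CPU that is coming online */
          if ((action & ~CPU_TASKS_FROZEN) == CPU_STARTING)
                  gic_secondary_init(0);
          return NOTIFY_OK;
  }

  static struct notifier_block __cpuinitdata gic_cpu_notifier = {
          .notifier_call = gic_cpu_notify,
  };

  static void __init gic_register_cpu_notifier(void)
  {
          /* called once from the GIC probe/init code */
          register_cpu_notifier(&gic_cpu_notifier);
  }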

--
Catalin

2012-08-17 09:57:31

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [PATCH v2 09/31] arm64: Cache maintenance routines

On Tuesday 14 August 2012 11:22 PM, Catalin Marinas wrote:
> The patch adds functionality required for cache maintenance. The AArch64
> architecture mandates non-aliasing VIPT or PIPT D-cache and VIPT (may
> have aliases) or ASID-tagged VIVT I-cache. Cache maintenance operations
> are automatically broadcast in hardware between CPUs.
>
> Signed-off-by: Will Deacon<[email protected]>
> Signed-off-by: Catalin Marinas<[email protected]>
> ---
> arch/arm64/include/asm/cache.h | 32 ++++
> arch/arm64/include/asm/cacheflush.h | 209 ++++++++++++++++++++++++++
> arch/arm64/include/asm/cachetype.h | 48 ++++++
> arch/arm64/mm/cache.S | 279 +++++++++++++++++++++++++++++++++++
> arch/arm64/mm/flush.c | 132 +++++++++++++++++
> 5 files changed, 700 insertions(+), 0 deletions(-)
> create mode 100644 arch/arm64/include/asm/cache.h
> create mode 100644 arch/arm64/include/asm/cacheflush.h
> create mode 100644 arch/arm64/include/asm/cachetype.h
> create mode 100644 arch/arm64/mm/cache.S
> create mode 100644 arch/arm64/mm/flush.c
>
> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> new file mode 100644
> index 0000000..390308a
> --- /dev/null
> +++ b/arch/arm64/include/asm/cache.h
> @@ -0,0 +1,32 @@
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see<http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_CACHE_H
> +#define __ASM_CACHE_H
> +
> +#define L1_CACHE_SHIFT 6
> +#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
> +
> +/*
> + * Memory returned by kmalloc() may be used for DMA, so we must make
> + * sure that all such allocations are cache aligned. Otherwise,
> + * unrelated code may cause parts of the buffer to be read into the
> + * cache before the transfer is done, causing old data to be seen by
> + * the CPU.
> + */
> +#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
> +#define ARCH_SLAB_MINALIGN 8
> +
> +#endif
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> new file mode 100644
> index 0000000..93b5590
> --- /dev/null
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -0,0 +1,209 @@
> +/*
> + * Based on arch/arm/include/asm/cacheflush.h
> + *
> + * Copyright (C) 1999-2002 Russell King.
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see<http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_CACHEFLUSH_H
> +#define __ASM_CACHEFLUSH_H
> +
> +#include<linux/mm.h>
> +
> +/*
> + * This flag is used to indicate that the page pointed to by a pte is clean
> + * and does not require cleaning before returning it to the user.
> + */
> +#define PG_dcache_clean PG_arch_1
> +
> +/*
> + * MM Cache Management
> + * ===================
> + *
> + * The arch/arm/mm/cache-*.S and arch/arm/mm/proc-*.S files
> + * implement these methods.
> + *
> + * Start addresses are inclusive and end addresses are exclusive;
> + * start addresses should be rounded down, end addresses up.
> + *
> + * See Documentation/cachetlb.txt for more information.
> + * Please note that the implementation of these, and the required
> + * effects are cache-type (VIVT/VIPT/PIPT) specific.
> + *
> + * flush_cache_kern_all()
> + *
> + * Unconditionally clean and invalidate the entire cache.
> + *
> + * flush_cache_user_mm(mm)
> + *
> + * Clean and invalidate all user space cache entries
> + * before a change of page tables.
> + *
> + * flush_cache_user_range(start, end, flags)
> + *
> + * Clean and invalidate a range of cache entries in the
> + * specified address space before a change of page tables.
> + * - start - user start address (inclusive, page aligned)
> + * - end - user end address (exclusive, page aligned)
> + * - flags - vma->vm_flags field
> + *
> + * coherent_kern_range(start, end)
> + *
> + * Ensure coherency between the Icache and the Dcache in the
> + * region described by start, end. If you have non-snooping
> + * Harvard caches, you need to implement this function.
> + * - start - virtual start address
> + * - end - virtual end address
> + *
> + * coherent_user_range(start, end)
> + *
> + * Ensure coherency between the Icache and the Dcache in the
> + * region described by start, end. If you have non-snooping
> + * Harvard caches, you need to implement this function.
> + * - start - virtual start address
> + * - end - virtual end address
> + *
> + * flush_kern_dcache_area(kaddr, size)
> + *
> + * Ensure that the data held in page is written back.
> + * - kaddr - page address
> + * - size - region size
> + *
> + * DMA Cache Coherency
> + * ===================
> + *
> + * dma_flush_range(start, end)
> + *
> + * Clean and invalidate the specified virtual address range.
> + * - start - virtual start address
> + * - end - virtual end address
> + */
> +extern void __cpuc_flush_kern_all(void);
> +extern void __cpuc_flush_user_all(void);
> +extern void __cpuc_flush_user_range(unsigned long, unsigned long, unsigned int);
> +extern void __cpuc_coherent_kern_range(unsigned long, unsigned long);
> +extern void __cpuc_coherent_user_range(unsigned long, unsigned long);
> +extern void __cpuc_flush_dcache_area(void *, size_t);
> +
> +/*
> + * These are private to the dma-mapping API. Do not use directly.
> + * Their sole purpose is to ensure that data held in the cache
> + * is visible to DMA, or data written by DMA to system memory is
> + * visible to the CPU.
> + */
> +extern void dmac_map_area(const void *, size_t, int);
> +extern void dmac_unmap_area(const void *, size_t, int);
> +extern void dmac_flush_range(const void *, const void *);
> +
> +/*
> + * Copy user data from/to a page which is mapped into a different
> + * processes address space. Really, we want to allow our "user
> + * space" model to handle this.
> + */
> +extern void copy_to_user_page(struct vm_area_struct *, struct page *,
> + unsigned long, void *, const void *, unsigned long);
> +#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
> + do { \
> + memcpy(dst, src, len); \
> + } while (0)
> +
> +/*
> + * Convert calls to our calling convention.
> + */
> +#define flush_cache_all() __cpuc_flush_kern_all()
> +extern void flush_cache_mm(struct mm_struct *mm);
> +extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
> +extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn);
> +
> +#define flush_cache_dup_mm(mm) flush_cache_mm(mm)
> +
> +/*
> + * flush_cache_user_range is used when we want to ensure that the
> + * Harvard caches are synchronised for the user space address range.
> + * This is used for the ARM private sys_cacheflush system call.
> + */
> +#define flush_cache_user_range(start, end) \
> + __cpuc_coherent_user_range((start) & PAGE_MASK, PAGE_ALIGN(end))
> +
> +/*
> + * Perform necessary cache operations to ensure that data previously
> + * stored within this range of addresses can be executed by the CPU.
> + */
> +#define flush_icache_range(s,e) __cpuc_coherent_kern_range(s,e)
> +
> +/*
> + * flush_dcache_page is used when the kernel has written to the page
> + * cache page at virtual address page->virtual.
> + *
> + * If this page isn't mapped (ie, page_mapping == NULL), or it might
> + * have userspace mappings, then we _must_ always clean + invalidate
> + * the dcache entries associated with the kernel mapping.
> + *
> + * Otherwise we can defer the operation, and clean the cache when we are
> + * about to change to user space. This is the same method as used on SPARC64.
> + * See update_mmu_cache for the user space part.
> + */
> +#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> +extern void flush_dcache_page(struct page *);
> +
> +static inline void __flush_icache_all(void)
> +{
> + asm("ic ialluis");
> +}
> +
> +#define ARCH_HAS_FLUSH_ANON_PAGE
> +static inline void flush_anon_page(struct vm_area_struct *vma,
> + struct page *page, unsigned long vmaddr)
> +{
> + extern void __flush_anon_page(struct vm_area_struct *vma,
> + struct page *, unsigned long);
> + if (PageAnon(page))
> + __flush_anon_page(vma, page, vmaddr);
> +}
> +
> +#define flush_dcache_mmap_lock(mapping) \
> + spin_lock_irq(&(mapping)->tree_lock)
> +#define flush_dcache_mmap_unlock(mapping) \
> + spin_unlock_irq(&(mapping)->tree_lock)
> +
> +#define flush_icache_user_range(vma,page,addr,len) \
> + flush_dcache_page(page)
> +
> +/*
> + * We don't appear to need to do anything here. In fact, if we did, we'd
> + * duplicate cache flushing elsewhere performed by flush_dcache_page().
> + */
> +#define flush_icache_page(vma,page) do { } while (0)
> +
> +/*
> + * flush_cache_vmap() is used when creating mappings (eg, via vmap,
> + * vmalloc, ioremap etc) in kernel space for pages. On non-VIPT
> + * caches, since the direct-mappings of these pages may contain cached
> + * data, we need to do a full cache flush to ensure that writebacks
> + * don't corrupt data placed into these pages via the new mappings.
> + */
> +static inline void flush_cache_vmap(unsigned long start, unsigned long end)
> +{
> + /*
> + * set_pte_at() called from vmap_pte_range() does not
> + * have a DSB after cleaning the cache line.
> + */
> + dsb();
> +}
> +
> +static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
> +{
> +}
> +
> +#endif
> diff --git a/arch/arm64/include/asm/cachetype.h b/arch/arm64/include/asm/cachetype.h
> new file mode 100644
> index 0000000..85f5f51
> --- /dev/null
> +++ b/arch/arm64/include/asm/cachetype.h
> @@ -0,0 +1,48 @@
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see<http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_CACHETYPE_H
> +#define __ASM_CACHETYPE_H
> +
> +#include<asm/cputype.h>
> +
> +#define CTR_L1IP_SHIFT 14
> +#define CTR_L1IP_MASK 3
> +
> +#define ICACHE_POLICY_RESERVED 0
> +#define ICACHE_POLICY_AIVIVT 1
> +#define ICACHE_POLICY_VIPT 2
> +#define ICACHE_POLICY_PIPT 3
> +
> +static inline u32 icache_policy(void)
> +{
> + return (read_cpuid_cachetype() >> CTR_L1IP_SHIFT) & CTR_L1IP_MASK;
> +}
> +
> +/*
> + * Whilst the D-side always behaves as PIPT on AArch64, aliasing is
> + * permitted in the I-cache.
> + */
> +static inline int icache_is_aliasing(void)
> +{
> + return icache_policy() != ICACHE_POLICY_PIPT;
> +}
> +
> +static inline int icache_is_aivivt(void)
> +{
> + return icache_policy() == ICACHE_POLICY_AIVIVT;
> +}
> +
> +#endif /* __ASM_CACHETYPE_H */
> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
> new file mode 100644
> index 0000000..f4efa04
> --- /dev/null
> +++ b/arch/arm64/mm/cache.S
> @@ -0,0 +1,279 @@
> +/*
> + * Cache maintenance
> + *
> + * Copyright (C) 2001 Deep Blue Solutions Ltd.
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see<http://www.gnu.org/licenses/>.
> + */
> +
> +#include<linux/linkage.h>
> +#include<linux/init.h>
> +#include<asm/assembler.h>
> +
> +#include "proc-macros.S"
> +
> +/*
> + * __cpuc_flush_dcache_all()
> + *
> + * Flush the whole D-cache.
> + *
> + * Corrupted registers: x0-x7, x9-x11
> + */
> +ENTRY(__cpuc_flush_dcache_all)
> + dsb sy // ensure ordering with previous memory accesses
> + mrs x0, clidr_el1 // read clidr
> + and x3, x0, #0x7000000 // extract loc from clidr
> + lsr x3, x3, #23 // left align loc bit field
> + cbz x3, finished // if loc is 0, then no need to clean
> + mov x10, #0 // start clean at cache level 0
> +loop1:
> + add x2, x10, x10, lsr #1 // work out 3x current cache level
> + lsr x1, x0, x2 // extract cache type bits from clidr
> + and x1, x1, #7 // mask of the bits for current cache only
> + cmp x1, #2 // see what cache we have at this level
> + b.lt skip // skip if no cache, or just i-cache
> + save_and_disable_irqs x9 // make CSSELR and CCSIDR access atomic
> + msr csselr_el1, x10 // select current cache level in csselr
> + isb // isb to sync the new csselr & ccsidr
> + mrs x1, ccsidr_el1 // read the new ccsidr
> + restore_irqs x9
> + and x2, x1, #7 // extract the length of the cache lines
> + add x2, x2, #4 // add 4 (line length offset)
> + mov x4, #0x3ff
> + and x4, x4, x1, lsr #3 // find maximum number on the way size
> + clz x5, x4 // find bit position of way size increment
> + mov x7, #0x7fff
> + and x7, x7, x1, lsr #13 // extract max number of the index size
> +loop2:
> + mov x9, x4 // create working copy of max way size
> +loop3:
> + lsl x6, x9, x5
> + orr x11, x10, x6 // factor way and cache number into x11
> + lsl x6, x7, x2
> + orr x11, x11, x6 // factor index number into x11
> + dc cisw, x11 // clean & invalidate by set/way
> + subs x9, x9, #1 // decrement the way
> + b.ge loop3
> + subs x7, x7, #1 // decrement the index
> + b.ge loop2
> +skip:
> + add x10, x10, #2 // increment cache number
> + cmp x3, x10
> + b.gt loop1
> +finished:
> + mov x10, #0 // switch back to cache level 0
> + msr csselr_el1, x10 // select current cache level in csselr
> + dsb sy
> + isb
> + ret
> +ENDPROC(__cpuc_flush_dcache_all)
> +
>
We have discussed the need for a cache-maintenance-by-level kind of API
for ARMv7 (A15).

Shouldn't we add such an API for arm64 as well?

Regards
Santosh

2012-08-17 10:06:23

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [PATCH v2 05/31] arm64: MMU initialisation

On Tuesday 14 August 2012 11:22 PM, Catalin Marinas wrote:
> This patch contains the initialisation of the memory blocks, MMU
> attributes and the memory map. Only five memory types are defined:
> Device nGnRnE (equivalent to Strongly Ordered), Device nGnRE (classic
> Device memory), Device GRE, Normal Non-cacheable and Normal Cacheable.
> Cache policies are supported via the memory attributes register
> (MAIR_EL1) and only affect the Normal Cacheable mappings.
>
> This patch also adds the SPARSEMEM_VMEMMAP initialisation.
>
> Signed-off-by: Will Deacon<[email protected]>
> Signed-off-by: Catalin Marinas<[email protected]>
> ---

What's the difference between Device nGnRE and Device GRE?
Sorry, I haven't gone through the specs yet, hence the question.

> arch/arm64/include/asm/memblock.h | 21 ++
> arch/arm64/mm/init.c | 416 +++++++++++++++++++++++++++++++++++++
> arch/arm64/mm/mmu.c | 395 +++++++++++++++++++++++++++++++++++
> 3 files changed, 832 insertions(+), 0 deletions(-)
> create mode 100644 arch/arm64/include/asm/memblock.h
> create mode 100644 arch/arm64/mm/init.c
> create mode 100644 arch/arm64/mm/mmu.c
>
> diff --git a/arch/arm64/include/asm/memblock.h b/arch/arm64/include/asm/memblock.h
> new file mode 100644
> index 0000000..6afeed2
> --- /dev/null
> +++ b/arch/arm64/include/asm/memblock.h
> @@ -0,0 +1,21 @@
> +/*
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see<http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_MEMBLOCK_H
> +#define __ASM_MEMBLOCK_H
> +
> +extern void arm64_memblock_init(void);
> +
> +#endif
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> new file mode 100644
> index 0000000..c87583d
> --- /dev/null
> +++ b/arch/arm64/mm/init.c
> @@ -0,0 +1,416 @@
> +/*
> + * Based on arch/arm/mm/init.c
> + *
> + * Copyright (C) 1995-2005 Russell King
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see<http://www.gnu.org/licenses/>.
> + */
> +
> +#include<linux/kernel.h>
> +#include<linux/export.h>
> +#include<linux/errno.h>
> +#include<linux/swap.h>
> +#include<linux/init.h>
> +#include<linux/bootmem.h>
> +#include<linux/mman.h>
> +#include<linux/nodemask.h>
> +#include<linux/initrd.h>
> +#include<linux/gfp.h>
> +#include<linux/memblock.h>
> +#include<linux/sort.h>
> +#include<linux/of_fdt.h>
> +
> +#include<asm/prom.h>
> +#include<asm/sections.h>
> +#include<asm/setup.h>
> +#include<asm/sizes.h>
> +#include<asm/tlb.h>
> +
> +#include "mm.h"
> +
> +static unsigned long phys_initrd_start __initdata = 0;
> +static unsigned long phys_initrd_size __initdata = 0;
> +
> +phys_addr_t memstart_addr __read_mostly = 0;
> +
> +void __init early_init_dt_setup_initrd_arch(unsigned long start,
> + unsigned long end)
> +{
> + phys_initrd_start = start;
> + phys_initrd_size = end - start;
> +}
> +
> +static int __init early_initrd(char *p)
> +{
> + unsigned long start, size;
> + char *endp;
> +
> + start = memparse(p, &endp);
> + if (*endp == ',') {
> + size = memparse(endp + 1, NULL);
> +
> + phys_initrd_start = start;
> + phys_initrd_size = size;
> + }
> + return 0;
> +}
> +early_param("initrd", early_initrd);
> +
> +#define MAX_DMA32_PFN ((4UL * 1024 * 1024 * 1024) >> PAGE_SHIFT)
> +
> +static void __init zone_sizes_init(unsigned long min, unsigned long max)
> +{
> + unsigned long zone_size[MAX_NR_ZONES];
> + unsigned long max_dma32 = min;
> +
> + memset(zone_size, 0, sizeof(zone_size));
> +
> + zone_size[0] = max - min;
> +#ifdef CONFIG_ZONE_DMA32
> + /* 4GB maximum for 32-bit only capable devices */
> + max_dma32 = min(max, MAX_DMA32_PFN);
> + zone_size[ZONE_DMA32] = max_dma32 - min;
> +#endif
Do you see a need for supporting DMA32 on arm64 SoCs?

[..]

> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> new file mode 100644
> index 0000000..d2dd438
> --- /dev/null
> +++ b/arch/arm64/mm/mmu.c
> @@ -0,0 +1,395 @@
> +/*
> + * Based on arch/arm/mm/mmu.c
> + *
> + * Copyright (C) 1995-2005 Russell King
> + * Copyright (C) 2012 ARM Ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see<http://www.gnu.org/licenses/>.
> + */
> +
> +#include<linux/export.h>
> +#include<linux/kernel.h>
> +#include<linux/errno.h>
> +#include<linux/init.h>
> +#include<linux/mman.h>
> +#include<linux/nodemask.h>
> +#include<linux/memblock.h>
> +#include<linux/fs.h>
> +
> +#include<asm/cputype.h>
> +#include<asm/sections.h>
> +#include<asm/setup.h>
> +#include<asm/sizes.h>
> +#include<asm/tlb.h>
> +#include<asm/mmu_context.h>
> +
> +#include "mm.h"
> +
> +/*
> + * Empty_zero_page is a special page that is used for zero-initialized data
> + * and COW.
> + */
> +struct page *empty_zero_page;
> +EXPORT_SYMBOL(empty_zero_page);
> +
> +pgprot_t pgprot_default;
> +EXPORT_SYMBOL(pgprot_default);
> +
> +static pmdval_t prot_sect_kernel;
> +
> +struct cachepolicy {
> + const char policy[16];
> + u64 mair;
> + u64 tcr;
> +};
> +
> +static struct cachepolicy cache_policies[] __initdata = {
> + {
> + .policy = "uncached",
> + .mair = 0x44, /* inner, outer non-cacheable */
> + .tcr = TCR_IRGN_NC | TCR_ORGN_NC,
> + }, {
> + .policy = "writethrough",
> + .mair = 0xaa, /* inner, outer write-through, read-allocate */
> + .tcr = TCR_IRGN_WT | TCR_ORGN_WT,
Is WT supported on arm64?
On recent ARMv7 processors, I think WT wasn't supported.

Regards
Santosh

2012-08-17 10:06:37

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

On Fri, Aug 17, 2012 at 10:41:10AM +0100, Santosh Shilimkar wrote:
> On Tuesday 14 August 2012 11:22 PM, Catalin Marinas wrote:
> > +The boot loader is expected to enter the kernel on each CPU in the
> > +following manner:
> > +
> > +- The primary CPU must jump directly to the first instruction of the
> > + kernel image. The device tree blob passed by this CPU must contain
> > + for each CPU node:
> > +
> > + 1. An 'enable-method' property. Currently, the only supported value
> > + for this field is the string "spin-table".
> > +
> > + 2. A 'cpu-release-addr' property identifying a 64-bit,
> > + zero-initialised memory location.
> > +
> > + It is expected that the bootloader will generate these device tree
> > + properties and insert them into the blob prior to kernel entry.
> > +
> > +- Any secondary CPUs must spin outside of the kernel in a reserved area
> > + of memory (communicated to the kernel by a /memreserve/ region in the
> > + device tree) polling their cpu-release-addr location, which must be
> > + contained in the reserved region. A wfe instruction may be inserted
> > + to reduce the overhead of the busy-loop and a sev will be issued by
> > + the primary CPU. When a read of the location pointed to by the
> > + cpu-release-addr returns a non-zero value, the CPU must jump directly
> > + to this value.
>
> So you expect all the secondary CPUs to be in wakeup state and probably
> looping in WFE for a signal from kernel to boot. There is one issue
> with this requirement though. For large CPU system, you need to reset
> all the CPUs and hit this waiting loop. This will lead to large inrush
> current need at bootup which may not be supported. To avoid this
> issue, secondary CPUs are kept in OFF state and then they are woken
> up from kernel one by one whenever they need to be brought into the
> system. This requirement should be considered.

I agree, this part will be extended. That's the one method that we currently
support and it is suitable for the model.

The better method is the SMC standardisation that Charles Garcia-Tobin
has written (to be made available soon) and was presented at the last
Linaro Connect in HK. Given that the CPU power is usually controlled by
the secure side, we'll ask for an SMC to be issued for waking up
secondary CPUs, so it's up to the secure firmware to write the correct
hardware registers.

> > --- /dev/null
> > +++ b/arch/arm64/kernel/head.S
> [..]
> > + /*
> > + * DO NOT MODIFY. Image header expected by Linux boot-loaders.
> > + */
> > + b stext // branch to kernel start, magic
> > + .long 0 // reserved
> > + .quad TEXT_OFFSET // Image load offset from start of RAM
> > + .quad 0 // reserved
> > + .quad 0 // reserved
> > +
>
> Minor nit. Avoid C++ commenting style "//" here and rest of the patch.

That's not C++ comment style, it's the *official* assembly comment style
for AArch64 ('@' is no longer supported).

--
Catalin

2012-08-17 10:08:04

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 09/31] arm64: Cache maintenance routines

On Fri, Aug 17, 2012 at 10:57:20AM +0100, Santosh Shilimkar wrote:
> On Tuesday 14 August 2012 11:22 PM, Catalin Marinas wrote:
> > +ENTRY(__cpuc_flush_dcache_all)
>
> We have discussed the need of cache maintenance by
> level kind of API for ARMv7 (A15).
>
> Shouldn't we add such API for arm64 as well ?

Yes, at some point we'll probably add one, but we'll discuss it again when
it's actually needed. I wouldn't define a new API now that isn't used by any
AArch64 code.

--
Catalin

2012-08-17 10:10:38

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

On Fri, Aug 17, 2012 at 3:35 PM, Catalin Marinas
<[email protected]> wrote:
>
> On Fri, Aug 17, 2012 at 10:41:10AM +0100, Santosh Shilimkar wrote:
> > On Tuesday 14 August 2012 11:22 PM, Catalin Marinas wrote:
> > > +The boot loader is expected to enter the kernel on each CPU in the
> > > +following manner:
> > > +
> > > +- The primary CPU must jump directly to the first instruction of the
> > > + kernel image. The device tree blob passed by this CPU must contain
> > > + for each CPU node:
> > > +
> > > + 1. An 'enable-method' property. Currently, the only supported
> > > value
> > > + for this field is the string "spin-table".
> > > +
> > > + 2. A 'cpu-release-addr' property identifying a 64-bit,
> > > + zero-initialised memory location.
> > > +
> > > + It is expected that the bootloader will generate these device tree
> > > + properties and insert them into the blob prior to kernel entry.
> > > +
> > > +- Any secondary CPUs must spin outside of the kernel in a reserved
> > > area
> > > + of memory (communicated to the kernel by a /memreserve/ region in
> > > the
> > > + device tree) polling their cpu-release-addr location, which must be
> > > + contained in the reserved region. A wfe instruction may be
> > > inserted
> > > + to reduce the overhead of the busy-loop and a sev will be issued by
> > > + the primary CPU. When a read of the location pointed to by the
> > > + cpu-release-addr returns a non-zero value, the CPU must jump
> > > directly
> > > + to this value.
> >
> > So you expect all the secondary CPUs to be in wakeup state and probably
> > looping in WFE for a signal from kernel to boot. There is one issue
> > with this requirement though. For large CPU system, you need to reset
> > all the CPUs and hit this waiting loop. This will lead to large inrush
> > current need at bootup which may not be supported. To avoid this
> > issue, secondary CPUs are kept in OFF state and then they are woken
> > up from kernel one by one whenever they need to be brought into the
> > system. This requirement should be considered.
>
> I agree, this part will be extended. That's one method that we currently
> support and suitable to the model.
>
> The better method is the SMC standardisation that Charles Garcia-Tobin
> has written (to be made available soon) and was presented at the last
> Linaro Connect in HK. Given that the CPU power is usually controlled by
> the secure side, we'll ask for an SMC to be issued for waking up
> secondary CPUs, so it's up to the secure firmware to write the correct
> hardware registers.
>
Thanks for the information. SMC standardization would indeed help
overcome some of these issues. I will wait for that information before the
next set of questions.

> > > --- /dev/null
> > > +++ b/arch/arm64/kernel/head.S
> > [..]
> > > + /*
> > > + * DO NOT MODIFY. Image header expected by Linux boot-loaders.
> > > + */
> > > + b stext // branch to kernel start,
> > > magic
> > > + .long 0 // reserved
> > > + .quad TEXT_OFFSET // Image load offset from
> > > start of RAM
> > > + .quad 0 // reserved
> > > + .quad 0 // reserved
> > > +
> >
> > Minor nit. Avoid C++ commenting style "//" here and rest of the patch.
>
> That's not C++ comment style, it's the *official* assembly comment style
> for AArch64 ('@' is no longer supported).
>
Ok. Thanks for clarifying.

Regards
Santosh

2012-08-17 10:12:31

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [PATCH v2 09/31] arm64: Cache maintenance routines

On Fri, Aug 17, 2012 at 3:37 PM, Catalin Marinas
<[email protected]> wrote:
> On Fri, Aug 17, 2012 at 10:57:20AM +0100, Santosh Shilimkar wrote:
>> On Tuesday 14 August 2012 11:22 PM, Catalin Marinas wrote:
>> > +ENTRY(__cpuc_flush_dcache_all)
>>
>> We have discussed the need of cache maintenance by
>> level kind of API for ARMv7 (A15).
>>
>> Shouldn't we add such API for arm64 as well ?
>
> Yes, at some point we'll probably add but we'll discuss it again when
> actually needed. I wouldn't define new API now that's not used by any
> AArch64 code.
>
The patches are already on the list for ARMv7 and are just waiting for the
two approaches to be merged.
I agree that once this gets merged for ARMv7, the ARMv8 port can be
updated.

Regards
Santosh

2012-08-17 10:15:37

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 05/31] arm64: MMU initialisation

On Fri, Aug 17, 2012 at 11:06:11AM +0100, Santosh Shilimkar wrote:
> On Tuesday 14 August 2012 11:22 PM, Catalin Marinas wrote:
> > This patch contains the initialisation of the memory blocks, MMU
> > attributes and the memory map. Only five memory types are defined:
> > Device nGnRnE (equivalent to Strongly Ordered), Device nGnRE (classic
> > Device memory), Device GRE, Normal Non-cacheable and Normal Cacheable.
> > Cache policies are supported via the memory attributes register
> > (MAIR_EL1) and only affect the Normal Cacheable mappings.
> >
> > This patch also adds the SPARSEMEM_VMEMMAP initialisation.
> >
> > Signed-off-by: Will Deacon<[email protected]>
> > Signed-off-by: Catalin Marinas<[email protected]>
> > ---
>
> Whats the difference between Device nGnRE and Device GRE ?
> Sorry, I haven't gone through the specs yet and hence the
> question.

G - gathering (multiple reads/writes merged into one)
R - reordering (of reads/writes)
E - early acknowledgement (the write may not have hit the device before
the instruction returns).

The 'n' in front just negates the meaning.

So the Device memory as we know it on ARMv7 is equivalent to nGnRE. The
Strongly Ordered is nGnRnE. GRE is pretty much like Normal Non-cacheable
memory but with Device mapping, so there are restrictions on unaligned
accesses.
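
For reference, these attributes end up as one byte each in MAIR_EL1,
indexed by the memory-type field in the page table entry. A sketch of such
an assignment (the index numbering below is purely illustrative, not
necessarily the one used by the patches):

  #define MAIR_ATTR(attr, idx)    ((unsigned long)(attr) << ((idx) * 8))

  unsigned long mair =
          MAIR_ATTR(0x00, 0) |    /* Device-nGnRnE (Strongly Ordered)  */
          MAIR_ATTR(0x04, 1) |    /* Device-nGnRE  (classic Device)    */
          MAIR_ATTR(0x0c, 2) |    /* Device-GRE                        */
          MAIR_ATTR(0x44, 3) |    /* Normal Non-cacheable              */
          MAIR_ATTR(0xff, 4);     /* Normal Cacheable (write-back)     */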

> > +#ifdef CONFIG_ZONE_DMA32
> > + /* 4GB maximum for 32-bit only capable devices */
> > + max_dma32 = min(max, MAX_DMA32_PFN);
> > + zone_size[ZONE_DMA32] = max_dma32 - min;
> > +#endif
>
> Do you see need of supporting DMA32 on arm64 SOCs ?

I've had some questions from partners, but those devices may just be
hidden behind an IOMMU. For now I've left it in.

> > +static struct cachepolicy cache_policies[] __initdata = {
> > + {
> > + .policy = "uncached",
> > + .mair = 0x44, /* inner, outer non-cacheable */
> > + .tcr = TCR_IRGN_NC | TCR_ORGN_NC,
> > + }, {
> > + .policy = "writethrough",
> > + .mair = 0xaa, /* inner, outer write-through, read-allocate */
> > + .tcr = TCR_IRGN_WT | TCR_ORGN_WT,
>
> Is WT supported on arm64?
> On the recent ARMv7 processors, I think WT wasn't supported.

All of WB, WA, WT are just architectural hints. A CPU implementation may
or may not honour them, but in Linux we try to follow the architecture
rather than specific implementations.

--
Catalin

2012-08-17 10:21:29

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [PATCH v2 28/31] arm64: Generic timers support

On Tuesday 14 August 2012 11:22 PM, Catalin Marinas wrote:
> From: Marc Zyngier<[email protected]>
>
> This patch adds support for the ARM generic timers with A64 instructions
> for accessing the timer registers. It uses the physical counter as the
> clock source and the virtual counter as sched_clock.
>
> The timer frequency can be specified via DT or read from the CNTFRQ_EL0
> register. The physical counter is also accessible from user space
> allowing fast gettimeofday() implementation.
>
> Signed-off-by: Marc Zyngier<[email protected]>
> Signed-off-by: Will Deacon<[email protected]>
> Signed-off-by: Catalin Marinas<[email protected]>
> ---

[..]

> diff --git a/drivers/clocksource/arm_generic.c b/drivers/clocksource/arm_generic.c
> new file mode 100644
> index 0000000..05c898c
> --- /dev/null
> +++ b/drivers/clocksource/arm_generic.c

[..]

> +
> +static void __cpuinit arch_timer_setup(struct clock_event_device *clk)
> +{
> + /* Let's make sure the timer is off before doing anything else */
> + arch_timer_stop();
> +
> + clk->features = CLOCK_EVT_FEAT_ONESHOT;
Are these CPU timers wakeup-capable, or do we need a wakeup-capable
broadcast timer for low-power wakeups?
In that case C3STOP would be needed here.
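
If the answer turns out to be that the CPU-local timer stops in deep idle,
the change amounts to something like this (a sketch;
timer_stops_in_deep_idle is a hypothetical condition):

  clk->features = CLOCK_EVT_FEAT_ONESHOT;
  if (timer_stops_in_deep_idle)
          /* tell the clockevents core to fall back to a broadcast
           * timer while this CPU's timer is stopped */
          clk->features |= CLOCK_EVT_FEAT_C3STOP;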

Regards
santosh

2012-08-17 10:25:50

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [PATCH v2 05/31] arm64: MMU initialisation

On Fri, Aug 17, 2012 at 3:45 PM, Catalin Marinas
<[email protected]> wrote:
> On Fri, Aug 17, 2012 at 11:06:11AM +0100, Santosh Shilimkar wrote:
>> On Tuesday 14 August 2012 11:22 PM, Catalin Marinas wrote:
>> > This patch contains the initialisation of the memory blocks, MMU
>> > attributes and the memory map. Only five memory types are defined:
>> > Device nGnRnE (equivalent to Strongly Ordered), Device nGnRE (classic
>> > Device memory), Device GRE, Normal Non-cacheable and Normal Cacheable.
>> > Cache policies are supported via the memory attributes register
>> > (MAIR_EL1) and only affect the Normal Cacheable mappings.
>> >
>> > This patch also adds the SPARSEMEM_VMEMMAP initialisation.
>> >
>> > Signed-off-by: Will Deacon<[email protected]>
>> > Signed-off-by: Catalin Marinas<[email protected]>
>> > ---
>>
>> Whats the difference between Device nGnRE and Device GRE ?
>> Sorry, I haven't gone through the specs yet and hence the
>> question.
>
> G - gathering (multiple reads/writes into one)
> R - reordering (reads/writes)
> E - early acknowledgement (the write may have not hit the device before
> the instruction returns).
>
> The 'n' in front just negates the meaning.
>
> So the Device memory as we know it on ARMv7 is equivalent to nGnRE. The
> Strongly Ordered is nGnRnE. GRE is pretty much like Normal Non-cacheable
> memory but with Device mapping, so there are restrictions on unaligned
> accesses.
>
Thanks for explaining it so clearly.

>> > +#ifdef CONFIG_ZONE_DMA32
>> > + /* 4GB maximum for 32-bit only capable devices */
>> > + max_dma32 = min(max, MAX_DMA32_PFN);
>> > + zone_size[ZONE_DMA32] = max_dma32 - min;
>> > +#endif
>>
>> Do you see need of supporting DMA32 on arm64 SOCs ?
>
> I've got some questions from partners but those devices may just be
> hidden behind an iommu. For now I left it in.
>
ok.

>> > +static struct cachepolicy cache_policies[] __initdata = {
>> > + {
>> > + .policy = "uncached",
>> > + .mair = 0x44, /* inner, outer non-cacheable */
>> > + .tcr = TCR_IRGN_NC | TCR_ORGN_NC,
>> > + }, {
>> > + .policy = "writethrough",
>> > + .mair = 0xaa, /* inner, outer write-through, read-allocate */
>> > + .tcr = TCR_IRGN_WT | TCR_ORGN_WT,
>>
>> Is WT supported on arm64?
>> On the recent ARMv7 processors, I think WT wasn't supported.
>
> All of WB, WA, WT are just architectural hints. A CPU implementation may
> or may not ignore them but with Linux we try to follow the architecture
> rather than specific implementations.
>
Agree.

Regards
Santosh

2012-08-17 11:20:49

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

On Thursday 16 August 2012, Nicolas Pitre wrote:
> > +3. Decompress the kernel image
> > +------------------------------
> > +
> > +Requirement: OPTIONAL
> > +
> > +The AArch64 kernel does not provide a decompressor and therefore
> > +requires gzip decompression to be performed by the boot loader if the
> > +default Image.gz target is used. For bootloaders that do not implement
> > +this requirement, the larger Image target is available instead.
>
> Some people will want to use bzip2 or whatever other decompressor du
> jour. Maybe this shouldn't be gzip specific, or just presented as a
> possible option?

Good point. Whether this should be part of this document depends on
what assumptions we make about the boot loader getting the image
in the first place.

In the strict sense, we are documenting the interface between the boot
loader and the kernel here, which already specifies that the kernel
must be uncompressed by the time we enter it. If the boot loader wants
to add its own encryption or compression methods, or its own headers
in front of the binary, the boot protocol isn't really impacted.

That said, I think it's a good idea to also specify what kind of
format we want to be used, e.g. a stripped ELF Image compressed
with one of gzip/bzip2/lzo/xz and with no other headers added,
on a vfat/ext4/btrfs formatted file system. There are probably a
lot of other things one might want to specify if we go down this
route. Or we could refer to the UEFI spec and mandate that the
same format that UEFI uses should be used here independent of
what boot loader is used. I think we can still allow other ways to
get to the image for deeply embedded systems, e.g. linking the
kernel into the boot loader as a blob on tightly constrained
systems. For that case, we'd only specify the interface between
boot loader and kernel as described above.

Arnd

2012-08-17 13:14:19

by Tony Lindgren

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

* Shilimkar, Santosh <[email protected]> [120817 03:11]:
> On Fri, Aug 17, 2012 at 3:35 PM, Catalin Marinas
> <[email protected]> wrote:
> >
> > On Fri, Aug 17, 2012 at 10:41:10AM +0100, Santosh Shilimkar wrote:
> > >
> > > So you expect all the secondary CPUs to be in wakeup state and probably
> > > looping in WFE for a signal from kernel to boot. There is one issue
> > > with this requirement though. For large CPU system, you need to reset
> > > all the CPUs and hit this waiting loop. This will lead to large inrush
> > > current need at bootup which may not be supported. To avoid this
> > > issue, secondary CPUs are kept in OFF state and then they are woken
> > > up from kernel one by one whenever they need to be brought into the
> > > system. This requirement should be considered.
> >
> > I agree, this part will be extended. That's one method that we currently
> > support and suitable to the model.
> >
> > The better method is the SMC standardisation that Charles Garcia-Tobin
> > has written (to be made available soon) and was presented at the last
> > Linaro Connect in HK. Given that the CPU power is usually controlled by
> > the secure side, we'll ask for an SMC to be issued for waking up
> > secondary CPUs, so it's up to the secure firmware to write the correct
> > hardware registers.
> >
> Thanks for the information. SMC standardization would indeed help
> to overcome some of these. Will wait for that information before
> next set of questions.

Yes please. If the SMC interface is not standardized for at least most
calls, we'll end up with a horrible mess of SoC-specific calls like we
currently have. Related to that, the virtualization calls should also be
standardized so we don't end up with multiple different hypervisors with
different calling conventions.

Regards,

Tony

2012-08-17 13:46:00

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

On Fri, Aug 17, 2012 at 12:20:40PM +0100, Arnd Bergmann wrote:
> On Thursday 16 August 2012, Nicolas Pitre wrote:
> > > +3. Decompress the kernel image
> > > +------------------------------
> > > +
> > > +Requirement: OPTIONAL
> > > +
> > > +The AArch64 kernel does not provide a decompressor and therefore
> > > +requires gzip decompression to be performed by the boot loader if the
> > > +default Image.gz target is used. For bootloaders that do not implement
> > > +this requirement, the larger Image target is available instead.
> >
> > Some people will want to use bzip2 or whatever other decompressor du
> > jour. Maybe this shouldn't be gzip specific, or just presented as a
> > possible option?
>
> Good point. Whether this should be part of this document depends on
> what assumptions we make about the boot loader getting the image
> in the first place.

It ended up here because there is a Makefile target, Image.gz, though I
haven't used it at all. I'll expand the text so that it is not
restricted to gzip.

> In the strict sense, we are documenting the interface between the boot
> loader and the kernel here, which already specifies that the kernel
> must be uncompressed by the time we enter it. If the boot loader wants
> to add its own encryption or compression methods, or its own headers
> in front of the binary, the boot protocol isn't really impacted.

Yes.

> That said, I think it's a good idea to also specify what kind of
> format we want to be used, e.g. a stripped ELF Image compressed
> with one of gzip/bzip2/lzo/xz and with no other headers added,
> on a vfat/ext4/btrfs formatted file system. There are probably a
> lot of other things one might want to specify if we go down this
> route. Or we could refer to the UEFI spec and mandate that the
> same format that UEFI uses should be used here independent of
> what boot loader is used. I think we can still allow other ways to
> get to the image for deeply embedded systems, e.g. linking the
> kernel into the boot loader as a blob on tightly constrained
> systems. For that case, we'd only specify the interface between
> boot loader and kernel as described above.

At least the latter case is used on the model. For the other cases, it's
rather early to specify clearly what is needed, as we don't currently have
a boot-loader other than UEFI. We may expect GRUB as well.

--
Catalin

2012-08-17 13:49:43

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

On Fri, Aug 17, 2012 at 02:13:59PM +0100, Tony Lindgren wrote:
> * Shilimkar, Santosh <[email protected]> [120817 03:11]:
> > On Fri, Aug 17, 2012 at 3:35 PM, Catalin Marinas
> > <[email protected]> wrote:
> > >
> > > On Fri, Aug 17, 2012 at 10:41:10AM +0100, Santosh Shilimkar wrote:
> > > >
> > > > So you expect all the secondary CPUs to be in wakeup state and probably
> > > > looping in WFE for a signal from kernel to boot. There is one issue
> > > > with this requirement though. For large CPU system, you need to reset
> > > > all the CPUs and hit this waiting loop. This will lead to large inrush
> > > > current need at bootup which may not be supported. To avoid this
> > > > issue, secondary CPUs are kept in OFF state and then they are woken
> > > > up from kernel one by one whenever they need to be brought into the
> > > > system. This requirement should be considered.
> > >
> > > I agree, this part will be extended. That's one method that we currently
> > > support and suitable to the model.
> > >
> > > The better method is the SMC standardisation that Charles Garcia-Tobin
> > > has written (to be made available soon) and was presented at the last
> > > Linaro Connect in HK. Given that the CPU power is usually controlled by
> > > the secure side, we'll ask for an SMC to be issued for waking up
> > > secondary CPUs, so it's up to the secure firmware to write the correct
> > > hardware registers.
> > >
> > Thanks for the information. SMC standardization would indeed help
> > to overcome some of these. Will wait for that information before
> > next set of questions.
>
> Yes please. If the SMC is not standardized for most calls at least,
> we'll end up with a horrible mess of SoC specific calls like we
> currently have. Related to that, the virtualization calls should be
> also standardized so we don't end up with multiple different hypervisors
> with different calls.

For the hypervisor there are two kinds of API: one meant for power
management, which is pretty much the same as the SMC interface, and another
specific to the hypervisor solution (KVM, Xen etc.). The latter is specific
to the host kernel that's running, as we start it at EL2, but it's not
standardised. Standardising it would indeed be good and there are other
advantages like self-virtualisation, but it needs the KVM and Xen guys to
come to a common definition.

--
Catalin

2012-08-17 16:08:20

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 06/31] arm64: MMU fault handling and page table management

On Wed, Aug 15, 2012 at 02:47:00PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
> > +pgd_t *pgd_alloc(struct mm_struct *mm)
> > +{
> > + pgd_t *new_pgd;
> > +
> > + new_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL, PGD_ORDER);
> > + if (!new_pgd)
> > + return NULL;
> > +
> > + memset(new_pgd, 0, PAGE_SIZE << PGD_ORDER);
> > +
> > + return new_pgd;
> > +}
> > +
> > +void pgd_free(struct mm_struct *mm, pgd_t *pgd)
> > +{
> > + free_pages((unsigned long)pgd, PGD_ORDER);
> > +}
>
> According to the documentation, you should only need 8kb for the pgd on
> a 64kb page system. Is it required that you use up a full page here?

Not with the current virtual memory layout, which has a 39-bit address
space for the kernel. With 64K pages we can increase the address space to
42 bits while still using a 2-level page table, in which case a full page
is used.

But for now I'll keep the same virtual memory layout and add a check on
(PTRS_PER_PGD * sizeof(pgd_t)); the compiler will choose the right path.
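
The check could look roughly like this (a sketch; PGD_SIZE is shorthand for
PTRS_PER_PGD * sizeof(pgd_t), not necessarily the name that will be used):

  #define PGD_SIZE        (PTRS_PER_PGD * sizeof(pgd_t))

  pgd_t *pgd_alloc(struct mm_struct *mm)
  {
          if (PGD_SIZE == PAGE_SIZE)
                  /* 4KB pages: the pgd occupies exactly one page */
                  return (pgd_t *)get_zeroed_page(GFP_KERNEL);
          else
                  /* 64KB pages: the pgd is smaller than a page */
                  return kzalloc(PGD_SIZE, GFP_KERNEL);
  }

  void pgd_free(struct mm_struct *mm, pgd_t *pgd)
  {
          if (PGD_SIZE == PAGE_SIZE)
                  free_page((unsigned long)pgd);
          else
                  kfree(pgd);
  }

Since PGD_SIZE is a compile-time constant, the compiler discards the unused
branch.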

--
Catalin

2012-08-17 16:16:11

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 07/31] arm64: Process management

On Wed, Aug 15, 2012 at 02:53:01PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
>
> > +#define THREAD_SIZE_ORDER 1
> > +#define THREAD_SIZE 8192
> > +#define THREAD_START_SP (THREAD_SIZE - 16)
>
> THREAD_SIZE_ORDER looks wrong for 64kb-page kernels. It also doesn't seem to
> be used, so better remove it.

It's used in kernel/fork.c if THREAD_SIZE >= PAGE_SIZE. I'll define it
conditionally.
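Roughly like this (a sketch of the conditional definition, not the
actual patch):

#define THREAD_SIZE		8192
#define THREAD_START_SP		(THREAD_SIZE - 16)

/*
 * Only needed by kernel/fork.c when the stack spans whole pages,
 * i.e. with 4K pages; with 64K pages the stack is sub-page.
 */
#if THREAD_SIZE >= PAGE_SIZE
#define THREAD_SIZE_ORDER	1
#endif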

--
Catalin

2012-08-17 18:21:33

by Nicolas Pitre

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

On Fri, 17 Aug 2012, Arnd Bergmann wrote:

> On Thursday 16 August 2012, Nicolas Pitre wrote:
> > Some people will want to use bzip2 or whatever other decompressor du
> > jour. Maybe this shouldn't be gzip specific, or just presented as a
> > possible option?
>
> Good point. Whether this should be part of this document depends on
> what assumptions we make about the boot loader getting the image
> in the first place.
>
> In the strict sense, we are documenting the interface between the boot
> loader and the kernel here, which already specifies that the kernel
> must be uncompressed by the time we enter it. If the boot loader wants
> to add its own encryption or compression methods, or its own headers
> in front of the binary, the boot protocol isn't really impacted.

Right. And someone else will insist on wrapping the kernel into a boot
loader specific image format e.g. u-Boot. If all those variations could
be kept out of the kernel build that would be a good thing.

That means the kernel should be wrapped/compressed/scrambled at
installation time, not at build time. This way the kernel image remains
universal and flexibility in its installation is possible.

> That said, I think it's a good idea to also specify what kind of
> format we want to be used, e.g. a stripped ELF Image compressed with
> one of gzip/bzip2/lzo/xz and with no other headers added, on a
> vfat/ext4/btrfs formatted file system. There are probably a lot of
> other things one might want to specify if we go down this route. Or we
> could refer to the UEFI spec and mandate that the same format that
> UEFI uses should be used here independent of what boot loader is used.
> I think we can still allow other ways to get to the image for deeply
> embedded systems, e.g. linking the kernel into the boot loader as a
> blob on tightly constrained systems. For that case, we'd only specify
> the interface between boot loader and kernel as described above.

I don't think we'll have to concern ourselves with tightly constrained
systems that much on ARM64.


Nicolas

2012-08-20 09:08:09

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v2 23/31] arm64: Debugging support

On Fri, Aug 17, 2012 at 08:06:07AM +0100, Arnd Bergmann wrote:
> Sorry for the dumb question, but why do you even need PTRACE_GET_THREAD_AREA
> for 64 bit tasks? I thought the thread pointer is a GPR, or is this just
> for compat tasks?

The TLS is stored in a co-processor register which is read-only for
userspace. However, it's banked, so accessing the TLS of your child (the guy
you're ptracing) requires a system call.

Will

2012-08-20 09:27:59

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v2 23/31] arm64: Debugging support

On Mon, Aug 20, 2012 at 10:07:54AM +0100, Will Deacon wrote:
> On Fri, Aug 17, 2012 at 08:06:07AM +0100, Arnd Bergmann wrote:
> > Sorry for the dumb question, but why do you even need PTRACE_GET_THREAD_AREA
> > for 64 bit tasks? I thought the thread pointer is a GPR, or is this just
> > for compat tasks?
>
> The TLS is stored in a co-processor register which is read-only for
> userspace.

I should elaborate: the register is RW for AArch64 tasks, RO for aarch32
tasks (although that doesn't affect the need for the ptrace request).

Will

2012-08-20 10:53:12

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH v2 21/31] arm64: 32-bit (compat) applications support

Hi!

> This patch adds support for 32-bit applications. The vectors page is a
> binary blob mapped into the application user space at 0xffff0000 (the
> AArch64 toolchain does not support compilation of AArch32 code). Full
> compatibility with ARMv7 user space is supported. The use of deprecated
> ARMv7 functionality (SWP, CP15 barriers) has been disabled by default on
> AArch64 kernels and unaligned LDM/STM is not supported.
>
> Please note that only the ARM 32-bit EABI is supported, so no OABI
> compatibility.

> +struct compat_statfs {
> + int f_type;
> + int f_bsize;
> + int f_blocks;
> + int f_bfree;
> + int f_bavail;
> + int f_files;
> + int f_ffree;
> + compat_fsid_t f_fsid;
> + int f_namelen; /* SunOS ignores this field. */

I'm sure it does. But is it a good comment?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2012-08-20 15:58:20

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 08/31] arm64: CPU support

On Wed, Aug 15, 2012 at 01:10:43AM +0100, Olof Johansson wrote:
> On Tue, Aug 14, 2012 at 06:52:09PM +0100, Catalin Marinas wrote:
>
> > diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
> > new file mode 100644
> > index 0000000..ef54125
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/cputype.h
> > @@ -0,0 +1,49 @@
> > +#define ID_MIDR_EL1 "midr_el1"
> > +#define ID_CTR_EL0 "ctr_el0"
> > +
> > +#define ID_AA64PFR0_EL1 "id_aa64pfr0_el1"
> > +#define ID_AA64DFR0_EL1 "id_aa64dfr0_el1"
> > +#define ID_AA64AFR0_EL1 "id_aa64afr0_el1"
> > +#define ID_AA64ISAR0_EL1 "id_aa64isar0_el1"
> > +#define ID_AA64MMFR0_EL1 "id_aa64mmfr0_el1"
> > +
> > +#define read_cpuid(reg) ({ \
> > + u64 __val; \
> > + asm("mrs %0, " reg : "=r" (__val)); \
> > + __val; \
> > +})
> > +
> > +/*
> > + * The CPU ID never changes at run time, so we might as well tell the
> > + * compiler that it's constant. Use this function to read the CPU ID
> > + * rather than directly reading processor_id or read_cpuid() directly.
> > + */
> > +static inline u32 __attribute_const__ read_cpuid_id(void)
> > +{
> > + return read_cpuid(ID_MIDR_EL1);
> > +}
> > +
> > +static inline u32 __attribute_const__ read_cpuid_cachetype(void)
> > +{
> > + return read_cpuid(ID_CTR_EL0);
> > +}
>
> Is this perhaps a carry-over from arch/arm? Abstracting out read_cpuid()
> doesn't seem to buy anything here, just opencode the one-line assembly
> in each.

It doesn't buy much, but it's more readable to use read_cpuid() in
places like hw_breakpoint.c than open-coding the assembly.

I could get rid of the ID_* macros and just pass the register name
directly to read_cpuid().
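For illustration, with the argument stringified inside the macro the
callers can name the register directly (sketch only):

#define read_cpuid(reg) ({					\
	u64 __val;						\
	asm("mrs	%0, " #reg : "=r" (__val));		\
	__val;							\
})

static inline u32 __attribute_const__ read_cpuid_id(void)
{
	return read_cpuid(midr_el1);
}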

> Might as well cleanup the naming a little too while you're at it, i.e.
> read_cpu_id() and read_cpu_cachetype().

These were defined for convenience, a bit less typing, but they already
have the intended names.

> > --- /dev/null
> > +++ b/arch/arm64/mm/proc-syms.c
...
> > +EXPORT_SYMBOL(__cpuc_flush_kern_all);
> > +EXPORT_SYMBOL(__cpuc_flush_user_all);
> > +EXPORT_SYMBOL(__cpuc_flush_user_range);
> > +EXPORT_SYMBOL(__cpuc_coherent_kern_range);
> > +EXPORT_SYMBOL(__cpuc_flush_dcache_area);
>
> See comment on other email about putting function pointers in a struct
> instead.

There is no need to support multiple CPU architectures with different
implementations, so allowing these functions to be called without
indirection is better.

> > diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> > new file mode 100644
> > index 0000000..453f517
> > --- /dev/null
> > +++ b/arch/arm64/mm/proc.S
> > @@ -0,0 +1,193 @@
> > + .section ".proc.info.init", #alloc, #execinstr
> > +
> > + .type __v8_proc_info, #object
> > +__v8_proc_info:
> > + .long 0x000f0000 // Required ID value
> > + .long 0x000f0000 // Mask for ID
> > + b __cpu_setup
> > + nop
> > + .quad cpu_name
> > + .long 0
> > + .size __v8_proc_info, . - __v8_proc_info
>
> I know this is a carry-over from arch/arm, but how about moving this
> to more of a C construct similar to arch/powerpc/kernel/cputable.c
> instead? It's considerably easier to read that way, and it's convenient
> to have the definitions all in one place, making it easier to share some
> of the functions, etc.

I can do this; it would indeed be cleaner.
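Something along these lines, modelled loosely on powerpc's cputable.c
(a sketch; the struct and field names here are only illustrative):

extern unsigned long __cpu_setup(void);

struct cpu_info {
	unsigned int	cpu_id_val;
	unsigned int	cpu_id_mask;
	const char	*cpu_name;
	unsigned long	(*cpu_setup)(void);	/* e.g. errata workarounds */
};

static struct cpu_info cpu_table[] = {
	{
		.cpu_id_val	= 0x000f0000,
		.cpu_id_mask	= 0x000f0000,
		.cpu_name	= "AArch64 Processor",
		.cpu_setup	= __cpu_setup,
	},
};

The ID matching and name lookup could then be done from C code in a
single place.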

--
Catalin

2012-08-20 16:01:29

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 08/31] arm64: CPU support

On Wed, Aug 15, 2012 at 02:56:05PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/procinfo.h
...
> > +struct proc_info_list {
> > + unsigned int cpu_val;
> > + unsigned int cpu_mask;
> > + unsigned long __cpu_flush; /* used by head.S */
> > + const char *cpu_name;
> > +};
> > +
> > +#else /* __KERNEL__ */
> > +#include <asm/elf.h>
> > +#warning "Please include asm/elf.h instead"
> > +#endif /* __KERNEL__ */
> > +#endif
>
> I think you forgot to remove this file when you removed MULTI_CPU.

The proc_info_list structure is still used, just for the CPU name and
the setup function (e.g. we need to apply errata workarounds on certain
CPUs). But as Olof suggested, I'd better move all this to a cputable.c
file.

--
Catalin

2012-08-20 20:11:09

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 23/31] arm64: Debugging support

On Monday 20 August 2012, Will Deacon wrote:
> On Mon, Aug 20, 2012 at 10:07:54AM +0100, Will Deacon wrote:
> > On Fri, Aug 17, 2012 at 08:06:07AM +0100, Arnd Bergmann wrote:
> > > Sorry for the dumb question, but why do you even need PTRACE_GET_THREAD_AREA
> > > for 64 bit tasks? I thought the thread pointer is a GPR, or is this just
> > > for compat tasks?
> >
> > The TLS is stored in a co-processor register which is read-only for
> > userspace.
>
> I should elaborate: the register is RW for AArch64 tasks, RO for aarch32
> tasks (although that doesn't affect the need for the ptrace request).

So can't you just /add/ that register to the GPR regset?

Arnd

2012-08-20 20:35:13

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 21/31] arm64: 32-bit (compat) applications support

On Monday 20 August 2012, Pavel Machek wrote:
> > This patch adds support for 32-bit applications. The vectors page is a
> > binary blob mapped into the application user space at 0xffff0000 (the
> > AArch64 toolchain does not support compilation of AArch32 code). Full
> > compatibility with ARMv7 user space is supported. The use of deprecated
> > ARMv7 functionality (SWP, CP15 barriers) has been disabled by default on
> > AArch64 kernels and unaligned LDM/STM is not supported.
> >
> > Please note that only the ARM 32-bit EABI is supported, so no OABI
> > compatibility.
>
> > +struct compat_statfs {
> > + int f_type;
> > + int f_bsize;
> > + int f_blocks;
> > + int f_bfree;
> > + int f_bavail;
> > + int f_files;
> > + int f_ffree;
> > + compat_fsid_t f_fsid;
> > + int f_namelen; /* SunOS ignores this field. */
>
> I'm sure it does. But is it a good comment?

Good catch. It seems that some of the other compat platforms (x86,
sparc, powerpc) have the same thing. I guess the real solution would
be to introduce an asm-generic/compat.h file that contains a bunch
of those definitions, like

#ifndef compat_timespec
struct compat_timespec {
compat_time_t tv_sec;
s32 tv_nsec;
};
#endif

#ifndef compat_timeval
struct compat_timeval {
compat_time_t tv_sec;
s32 tv_usec;
};
#endif

#ifndef compat_sysctl
struct compat_sysctl {
unsigned int name;
int nlen;
unsigned int oldval;
unsigned int oldlenp;
unsigned int newval;
unsigned int newlen;
unsigned int __unused[4];
};
#endif

For the most part, arch/tile should have useful defaults, though not in the
case of struct statfs, because its 32 bit version does not have a statfs syscall
(it only has statfs64).

Arnd

2012-08-20 20:47:15

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 08/31] arm64: CPU support

On Monday 20 August 2012, Catalin Marinas wrote:
> > > --- /dev/null
> > > +++ b/arch/arm64/mm/proc-syms.c
> ...
> > > +EXPORT_SYMBOL(__cpuc_flush_kern_all);
> > > +EXPORT_SYMBOL(__cpuc_flush_user_all);
> > > +EXPORT_SYMBOL(__cpuc_flush_user_range);
> > > +EXPORT_SYMBOL(__cpuc_coherent_kern_range);
> > > +EXPORT_SYMBOL(__cpuc_flush_dcache_area);
> >
> > See comment on other email about putting function pointers in a struct
> > instead.
>
> There is no need to support multiple CPU architectures with different
> implementations, so allowing these functions to be called without
> indirection is better.

What is the __cpuc prefix about then? Could you just drop it?

Arnd

2012-08-21 09:03:36

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v2 23/31] arm64: Debugging support

On Mon, Aug 20, 2012 at 09:10:59PM +0100, Arnd Bergmann wrote:
> On Monday 20 August 2012, Will Deacon wrote:
> > On Mon, Aug 20, 2012 at 10:07:54AM +0100, Will Deacon wrote:
> > > On Fri, Aug 17, 2012 at 08:06:07AM +0100, Arnd Bergmann wrote:
> > > > Sorry for the dumb question, but why do you even need PTRACE_GET_THREAD_AREA
> > > > for 64 bit tasks? I thought the thread pointer is a GPR, or is this just
> > > > for compat tasks?
> > >
> > > The TLS is stored in a co-processor register which is read-only for
> > > userspace.
> >
> > I should elaborate: the register is RW for AArch64 tasks, RO for aarch32
> > tasks (although that doesn't affect the need for the ptrace request).
>
> So can't you just /add/ that register to the GPR regset?

We *could*, but it doesn't feel right to me. The TLS register:

- Can only be manipulated by msr/mrs instructions
- Is not part of the PCS
- Is not accessed explicitly by user applications
- Can not be written via ptrace

so I think it sticks out like a sore thumb if we bundle it up with the GPRs.
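(For reference, an AArch64 task reads its thread pointer with a plain
mrs, something like the sketch below, which is also why it never shows
up among the GPRs.)

static inline unsigned long read_tls(void)
{
	unsigned long tp;

	/* TPIDR_EL0: EL0 read/write software thread ID register (TLS) */
	asm("mrs	%0, tpidr_el0" : "=r" (tp));
	return tp;
}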

Will

2012-08-21 09:51:29

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 08/31] arm64: CPU support

On Mon, Aug 20, 2012 at 09:47:07PM +0100, Arnd Bergmann wrote:
> On Monday 20 August 2012, Catalin Marinas wrote:
> > > > --- /dev/null
> > > > +++ b/arch/arm64/mm/proc-syms.c
> > ...
> > > > +EXPORT_SYMBOL(__cpuc_flush_kern_all);
> > > > +EXPORT_SYMBOL(__cpuc_flush_user_all);
> > > > +EXPORT_SYMBOL(__cpuc_flush_user_range);
> > > > +EXPORT_SYMBOL(__cpuc_coherent_kern_range);
> > > > +EXPORT_SYMBOL(__cpuc_flush_dcache_area);
> > >
> > > See comment on other email about putting function pointers in a struct
> > > instead.
> >
> > There is no need to support multiple CPU architectures with different
> > implementations, so allowing these functions to be called without
> > indirection is better.
>
> What is the __cpuc prefix about then? Could you just drop it?

It can be dropped indeed.

--
Catalin

2012-08-21 10:28:26

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH v2 21/31] arm64: 32-bit (compat) applications support

On Mon 2012-08-20 20:34:54, Arnd Bergmann wrote:
> On Monday 20 August 2012, Pavel Machek wrote:
> > > This patch adds support for 32-bit applications. The vectors page is a
> > > binary blob mapped into the application user space at 0xffff0000 (the
> > > AArch64 toolchain does not support compilation of AArch32 code). Full
> > > compatibility with ARMv7 user space is supported. The use of deprecated
> > > ARMv7 functionality (SWP, CP15 barriers) has been disabled by default on
> > > AArch64 kernels and unaligned LDM/STM is not supported.
> > >
> > > Please note that only the ARM 32-bit EABI is supported, so no OABI
> > > compatibility.
> >
> > > +struct compat_statfs {
> > > + int f_type;
> > > + int f_bsize;
> > > + int f_blocks;
> > > + int f_bfree;
> > > + int f_bavail;
> > > + int f_files;
> > > + int f_ffree;
> > > + compat_fsid_t f_fsid;
> > > + int f_namelen; /* SunOS ignores this field. */
> >
> > I'm sure it does. But is it a good comment?
>
> Good catch. It seems that some of the other compat platforms (x86,
> sparc, powerpc) have the same thing. I guess the real solution would
> be to introduce an asm-generic/compat.h file that contains a bunch
> of those definitions, like
>
> #ifndef compat_timespec
> struct compat_timespec {
> compat_time_t tv_sec;
> s32 tv_nsec;
> };
> #endif
>
> #ifndef compat_timeval
> struct compat_timeval {
> compat_time_t tv_sec;
> s32 tv_usec;
> };
> #endif

Yes, I guess that would be very good.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2012-08-21 13:00:30

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 14/31] arm64: DMA mapping API

On Wed, Aug 15, 2012 at 05:16:00PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
> > +static struct dma_map_ops arm64_swiotlb_dma_ops = {
> > + .alloc = arm64_swiotlb_alloc_coherent,
> > + .free = arm64_swiotlb_free_coherent,
> > + .map_page = arm64_swiotlb_map_page,
> > + .unmap_page = arm64_swiotlb_unmap_page,
> > + .map_sg = arm64_swiotlb_map_sg_attrs,
> > + .unmap_sg = arm64_swiotlb_unmap_sg_attrs,
> > + .sync_single_for_cpu = arm64_swiotlb_sync_single_for_cpu,
> > + .sync_single_for_device = arm64_swiotlb_sync_single_for_device,
> > + .sync_sg_for_cpu = arm64_swiotlb_sync_sg_for_cpu,
> > + .sync_sg_for_device = arm64_swiotlb_sync_sg_for_device,
> > + .dma_supported = swiotlb_dma_supported,
> > + .mapping_error = swiotlb_dma_mapping_error,
> > +};
> > +
> > +void __init swiotlb_init_with_default_size(size_t default_size, int verbose);
> > +
> > +void __init arm64_swiotlb_init(size_t max_size)
> > +{
> > + dma_ops = &arm64_swiotlb_dma_ops;
> > + swiotlb_init_with_default_size(min((size_t)SZ_64M, max_size), 1);
> > +}
>
> Why is swiotlb the default? I would expect that most devices can in fact
> use the entire 64 bit address space, so you can use a simple linear
> implementation for those.

That was my worry: devices not capable of accessing the full 64-bit
address space. We can hope that those SoCs would have an IOMMU, but I
can't tell for sure at this stage.

The default implementation could be simpler. I can even drop it
altogether from the initial patchset given that no SoC makes use of it
yet.

--
Catalin

2012-08-21 13:06:21

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 14/31] arm64: DMA mapping API

On Wed, Aug 15, 2012 at 01:40:06AM +0100, Olof Johansson wrote:
> On Tue, Aug 14, 2012 at 06:52:15PM +0100, Catalin Marinas wrote:
> > +static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> > +{
> > + if (unlikely(!dev) || !dev->archdata.dma_ops)
> > + return dma_ops;
> > + else
> > + return dev->archdata.dma_ops;
> > +}
>
> Does it make sense to add the concept of a global dma ops on arm64,
> instead of requiring the dma ops pointer per device similar to how
> some other platforms do it (including powerpc)? For devices that lack
> archdata.dma_ops, dma_supported() should return 0 (and the other ops
> should return error).

If the device doesn't have archdata.dma_ops we return the default
implementation, which is currently based on swiotlb. Do you mean that
this shouldn't be the case and that we should just let the device
always set archdata.dma_ops?

--
Catalin

2012-08-21 16:07:28

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 16/31] arm64: ELF definitions

On Thu, Aug 16, 2012 at 01:37:53PM +0100, Arnd Bergmann wrote:
> On Thursday 16 August 2012, Will Deacon wrote:
> > > This looks wrong: PER_LINUX/PER_LINUX32 decides over the output of the
> > > uname system call, while TIF_32BIT decides over the instruction set
> > > when returning to user space. You definitely should not set the personality
> > > to the value you pass from the elf loader. Instead, just do
> > >
> > > #define SET_PERSONALITY(ex) clear_thread_flag(TIF_32BIT);
> > > #defined COMPAT_SET_PERSONALITY(ex) set_thread_flag(TIF_32BIT);
> >
> > In this case, won't uname be incorrect (aarch64l) for aarch32 tasks (which
> > expect something like armv8l)?
>
> No, the uname output is meant to tell you about the system, not the
> instruction set that you are using (you already know that in compiled
> code).

OK, so we assumed that compat tasks should get a uname as close as
possible to a 32-bit system, i.e. armv8l, for full compatibility. This
would allow us to run something like 32-bit Debian on an AArch64 kernel
without worrying about any scripts failing.

But I can see on x86 that it always reports x86_64 even if the task is
x86_32.

--
Catalin

2012-08-21 17:51:59

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 17/31] arm64: System calls handling

On Wed, Aug 15, 2012 at 03:22:16PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
>
> > +
> > +/* This matches struct stat64 in glibc2.1, hence the absolutely
> > + * insane amounts of padding around dev_t's.
> > + * Note: The kernel zero's the padded region because glibc might read them
> > + * in the hope that the kernel has stretched to using larger sizes.
> > + */
> > +struct stat64 {
> > + compat_u64 st_dev;
> > + unsigned char __pad0[4];
>
> The comment above struct stat64 is completely irrelevant here. I would instead
> explain why you need your own stat64 in the first place.

OK, I added a comment. It's only needed for compat.

> > +int kernel_execve(const char *filename,
> > + const char *const argv[],
> > + const char *const envp[])
>
> Al Viro was recently talking about a generic implementation of execve.
> I can't find that now, but I think you should use that.

I've seen these but I'm waiting for the generic sys_execve and
kernel_execve to get into mainline before switching arch/arm64 to them.

> > +asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
> > + unsigned long prot, unsigned long flags,
> > + unsigned long fd, off_t off)
> > +{
> > + if (offset_in_page(off) != 0)
> > + return -EINVAL;
> > +
> > + return sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
> > +}
>
> I think
>
> #define sys_mmap sys_mmap_pgoff

The last argument of sys_mmap() has slightly different semantics: it
takes a byte offset, whereas sys_mmap_pgoff() takes the offset shifted
right by PAGE_SHIFT (the same as sys_mmap2).

Looking at the other architectures, it makes sense to use a generic
sys_mmap() implementation similar to the one above (the ia64 one seems
to be the most complete).

--
Catalin

2012-08-21 18:23:01

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v2 16/31] arm64: ELF definitions

On Tue, Aug 21, 2012 at 6:06 PM, Catalin Marinas
<[email protected]> wrote:
> But I can see on x86 that it always reports x86_64 even if the task is
> x86_32.

Really?

$ uname -m
x86_64
$ linux32 uname -m
i686
$

On Ubuntu 2.6.32-42-generic

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2012-08-21 18:28:11

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 16/31] arm64: ELF definitions

On Tue, Aug 21, 2012 at 07:17:19PM +0100, Geert Uytterhoeven wrote:
> On Tue, Aug 21, 2012 at 6:06 PM, Catalin Marinas
> <[email protected]> wrote:
> > But I can see on x86 that it always reports x86_64 even if the task is
> > x86_32.
>
> Really?
>
> $ uname -m
> x86_64
> $ linux32 uname -m
> i686
> $

Well, you set the personality explicitly with linux32. What I tested was
with an x86_32 uname called directly (without linux32) and even though
the ELF was a 32-bit one, it was reporting x86_64. In this AArch64
patch, a compat task was automatically setting the linux32 personality
(which x86 does not do).

Arnd's point is that the ELF file should not affect the personality and
hence the uname value. This should only be done by an explicit call to
sys_personality().

--
Catalin

2012-08-21 18:53:21

by Mike Frysinger

[permalink] [raw]
Subject: Re: [PATCH v2 16/31] arm64: ELF definitions

On Tuesday 21 August 2012 14:27:31 Catalin Marinas wrote:
> On Tue, Aug 21, 2012 at 07:17:19PM +0100, Geert Uytterhoeven wrote:
> > On Tue, Aug 21, 2012 at 6:06 PM, Catalin Marinas wrote:
> > > But I can see on x86 that it always reports x86_64 even if the task is
> > > x86_32.
> >
> > Really?
> >
> > $ uname -m
> > x86_64
> > $ linux32 uname -m
> > i686
> > $
>
> Well, you set the personality explicitly with linux32. What I tested was
> with an x86_32 uname called directly (without linux32) and even though
> the ELF was a 32-bit one, it was reporting x86_64. In this AArch64
> patch, a compat task was automatically setting the linux32 personality
> (which x86 does not do).

i don't think any arch does this.

$ uname -m
ppc64
$ linux32 uname -m
ppc

$ uname -m
sparc64
$ linux32 uname -m
sparc

$ uname -m
x86_64
$ linux32 uname -m
i686

$ uname -m
s390x
$ linux32 uname -m
s390

> Arnd's point is that the ELF file should not affect the personality and
> hence the uname value. This should only be done by an explicit call to
> sys_personality().

correct. if someone really wants to launch their whole userland with the
adjusted personality, they could always boot the kernel with:
init=/usr/bin/linux32 /sbin/init
but the kernel shouldn't be doing this automatically.
-mike


2012-08-21 19:20:29

by Christopher Covington

[permalink] [raw]
Subject: Re: [PATCH v2 28/31] arm64: Generic timers support

On 08/14/2012 01:52 PM, Catalin Marinas wrote:
> From: Marc Zyngier <[email protected]>
>
> This patch adds support for the ARM generic timers with A64 instructions
> for accessing the timer registers. It uses the physical counter as the
> clock source and the virtual counter as sched_clock.
>
> The timer frequency can be specified via DT or read from the CNTFRQ_EL0
> register. The physical counter is also accessible from user space
> allowing fast gettimeofday() implementation.

[...]

> +++ b/drivers/clocksource/arm_generic.c

[...]

> +static void arch_timer_reg_write(int reg, u32 val)
> +{
> + switch (reg) {
> + case ARCH_TIMER_REG_CTRL:
> + asm volatile("msr cntp_ctl_el0, %0" : : "r" (val));
> + break;
> + case ARCH_TIMER_REG_TVAL:
> + asm volatile("msr cntp_tval_el0, %0" : : "r" (val));
> + break;
> + default:
> + BUG();
> + }
> +
> + isb();
> +}

Doesn't architecture-specific assembly need to go in the arch directory rather
than the drivers directory?

Christopher

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum

2012-08-21 20:14:13

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 17/31] arm64: System calls handling

On Tuesday 21 August 2012, Catalin Marinas wrote:
> > > +asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
> > > + unsigned long prot, unsigned long flags,
> > > + unsigned long fd, off_t off)
> > > +{
> > > + if (offset_in_page(off) != 0)
> > > + return -EINVAL;
> > > +
> > > + return sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
> > > +}
> >
> > I think
> >
> > #define sys_mmap sys_mmap_pgoff
>
> There are slightly different semantics with the last argument of
> sys_mmap() which takes a byte offset. The sys_mmap_pgoff() function
> takes the offset shifted by PAGE_SHIFT (which is the same as sys_mmap2).
>
> Looking at the other architectures, it makes sense to use a generic
> sys_mmap() implementation similar to the one above (or the ia-64, seems
> to be the most complete).
>

Why that? The generic sys_mmap_pgoff was specifically added so new architectures
could just use that instead of having their own wrappers, see f8b72560.

Arnd

2012-08-21 20:17:14

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 16/31] arm64: ELF definitions

On Tuesday 21 August 2012, Catalin Marinas wrote:
> On Thu, Aug 16, 2012 at 01:37:53PM +0100, Arnd Bergmann wrote:
> > On Thursday 16 August 2012, Will Deacon wrote:
> > > > This looks wrong: PER_LINUX/PER_LINUX32 decides over the output of the
> > > > uname system call, while TIF_32BIT decides over the instruction set
> > > > when returning to user space. You definitely should not set the personality
> > > > to the value you pass from the elf loader. Instead, just do
> > > >
> > > > #define SET_PERSONALITY(ex) clear_thread_flag(TIF_32BIT);
> > > > #defined COMPAT_SET_PERSONALITY(ex) set_thread_flag(TIF_32BIT);
> > >
> > > In this case, won't uname be incorrect (aarch64l) for aarch32 tasks (which
> > > expect something like armv8l)?
> >
> > No, the uname output is meant to tell you about the system, not the
> > instruction set that you are using (you already know that in compiled
> > code).
>
> OK, so we assumed that compat tasks should get a uname as close as
> possible to a 32-bit system, i.e. armv8l, for full compatibility. This
> would allow us to run something like 32-bit Debian on an AArch64 kernel
> without worrying about any scripts failing.

You can still do that, just boot with init="/sbin/setarch armv7 /sbin/init".

> But I can see on x86 that it always reports x86_64 even if the task is
> x86_32.

Not just x86, the same behavior is used on powerpc, s390, mips, sparc and
parisc. Not sure about tile though.

Arnd

2012-08-21 22:02:23

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 17/31] arm64: System calls handling

On Tue, Aug 21, 2012 at 09:14:01PM +0100, Arnd Bergmann wrote:
> On Tuesday 21 August 2012, Catalin Marinas wrote:
> > > > +asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
> > > > + unsigned long prot, unsigned long flags,
> > > > + unsigned long fd, off_t off)
> > > > +{
> > > > + if (offset_in_page(off) != 0)
> > > > + return -EINVAL;
> > > > +
> > > > + return sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
> > > > +}
> > >
> > > I think
> > >
> > > #define sys_mmap sys_mmap_pgoff
> >
> > There are slightly different semantics with the last argument of
> > sys_mmap() which takes a byte offset. The sys_mmap_pgoff() function
> > takes the offset shifted by PAGE_SHIFT (which is the same as sys_mmap2).
> >
> > Looking at the other architectures, it makes sense to use a generic
> > sys_mmap() implementation similar to the one above (or the ia-64, seems
> > to be the most complete).
>
> Why that? The generic sys_mmap_pgoff was specifically added so new architectures
> could just use that instead of having their own wrappers, see f8b72560.

As I understand it, sys_mmap_pgoff can be used instead of sys_mmap2 on
new 32-bit architectures. But on 64-bit architectures we don't have
sys_mmap2, only sys_mmap, with the difference that the last argument is
the offset in bytes (and a multiple of PAGE_SIZE) rather than in pages.
So unless we change the meaning of this last argument for sys_mmap, we
cannot just define it to sys_mmap_pgoff.

Since the other 64-bit architectures seem to have a sys_mmap wrapper
that does this:

sys_mmap_pgoff(..., off >> PAGE_SHIFT);

I think AArch64 should also use the same sys_mmap convention. We can
make this wrapper generic.
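i.e. something like a single shared definition rather than each 64-bit
architecture carrying its own copy (a sketch, reusing the wrapper
quoted earlier):

SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
		unsigned long, prot, unsigned long, flags,
		unsigned long, fd, off_t, off)
{
	if (offset_in_page(off) != 0)
		return -EINVAL;

	return sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
}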

--
Catalin

2012-08-22 07:56:40

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 17/31] arm64: System calls handling

On Tuesday 21 August 2012, Catalin Marinas wrote:
> As I understand, sys_mmap_pgoff can be used instead of sys_mmap2 on new
> 32-bit architectures. But on 64-bit architectures we don't have
> sys_mmap2, only sys_mmap with the difference that the last argument is
> the offset in bytes (and multiple of PAGE_SIZE) rather than in pages. So
> unless we change the meaning of this last argument for sys_mmap, we
> cannot just define it to sys_mmap_pgoff.
>
> Since the other 64-bit architectures seem to have a sys_mmap wrapper
> that does this:
>
> sys_mmap_pgoff(..., off >> PAGE_SHIFT);
>
> I think AArch64 should also use the same sys_mmap convention. We can
> make this wrapper generic.

But the wrapper can just as well be part of glibc, which already has
one. There is no reason for the kernel to export two generic interfaces
for mmap when one of them only works on 64 bit and the other one is
good for both 32 and 64 bit.

All the other 64 bit architectures (besides tile) were added to the
kernel before we had sys_mmap_pgoff.

Arnd

2012-08-22 10:30:32

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 17/31] arm64: System calls handling

On Wed, Aug 22, 2012 at 08:56:30AM +0100, Arnd Bergmann wrote:
> On Tuesday 21 August 2012, Catalin Marinas wrote:
> > As I understand, sys_mmap_pgoff can be used instead of sys_mmap2 on new
> > 32-bit architectures. But on 64-bit architectures we don't have
> > sys_mmap2, only sys_mmap with the difference that the last argument is
> > the offset in bytes (and multiple of PAGE_SIZE) rather than in pages. So
> > unless we change the meaning of this last argument for sys_mmap, we
> > cannot just define it to sys_mmap_pgoff.
> >
> > Since the other 64-bit architectures seem to have a sys_mmap wrapper
> > that does this:
> >
> > sys_mmap_pgoff(..., off >> PAGE_SHIFT);
> >
> > I think AArch64 should also use the same sys_mmap convention. We can
> > make this wrapper generic.
>
> But the wrapper can just as well be part of glibc, which already has
> one. There is no reason for the kernel to export two generic interfaces
> for mmap when one of them only works on 64 bit and the other one is
> good for both 32 and 64 bit.

The kernel only exports a single interface for 64-bit, that's
sys_mmap(). For compat we only export sys_mmap2() (which, of course,
would not work for 64-bit).

The generic prototypes for sys_mmap and sys_mmap2 are different with
regards to the last argument: off_t vs unsigned long. While in practice
it's the same size, off_t is used throughout the kernel as offset in
bytes rather than pages (hence the prototype change in sys_mmap2 and
sys_mmap_pgoff).

But what's more important - moving this wrapper to glibc causes issues
with the page size. We support both 4KB and 64KB pages on 64-bit systems
(the latter without compat support). The kernel is in a better position
to do the shift by a compile-time constant. Glibc would need to enquire
the actual page size to do the shift before calling sys_mmap_pgoff. If
we assume in glibc that the shift is always 12, we need another wrapper
in the kernel anyway for 64KB page configuration. So passing the offset
in bytes worked best for us.

> All the other 64 bit architectures (besides tile) were added to the
> kernel before we had sys_mmap_pgoff.

So there is no new 64-bit architecture that defines sys_mmap to
sys_mmap_pgoff. I don't think that AArch64 should introduce this, given
the restrictions I mentioned above. sys_mmap2/sys_mmap_pgoff are a way
to extend the file offset beyond 32-bit but that's not needed on a
64-bit system.

--
Catalin

2012-08-22 12:27:24

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 17/31] arm64: System calls handling

On Wednesday 22 August 2012, Catalin Marinas wrote:
> But what's more important - moving this wrapper to glibc causes issues
> with the page size. We support both 4KB and 64KB pages on 64-bit systems
> (the latter without compat support). The kernel is in a better position
> to do the shift by a compile-time constant. Glibc would need to enquire
> the actual page size to do the shift before calling sys_mmap_pgoff. If
> we assume in glibc that the shift is always 12, we need another wrapper
> in the kernel anyway for 64KB page configuration. So passing the offset
> in bytes worked best for us.

Right, the kernel interface should really be independent of the page
size, as sys_mmap2 normally is, and sys_mmap2 is not provided here.

Arnd

2012-08-22 17:13:53

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 17/31] arm64: System calls handling

On Wed, Aug 22, 2012 at 01:27:14PM +0100, Arnd Bergmann wrote:
> On Wednesday 22 August 2012, Catalin Marinas wrote:
> > But what's more important - moving this wrapper to glibc causes issues
> > with the page size. We support both 4KB and 64KB pages on 64-bit systems
> > (the latter without compat support). The kernel is in a better position
> > to do the shift by a compile-time constant. Glibc would need to enquire
> > the actual page size to do the shift before calling sys_mmap_pgoff. If
> > we assume in glibc that the shift is always 12, we need another wrapper
> > in the kernel anyway for 64KB page configuration. So passing the offset
> > in bytes worked best for us.
>
> Right, the kernel interface should really be independent of the page
> size, as sys_mmap2 normally is, and sys_mmap2 is not provided here.

sys_mmap2 is indeed independent of the page size on most architectures
assuming that the last argument represents the offset in units of 4096.
The cris and ia64 seem to differ (one being 8K, the other variable).

sys_mmap is also independent of the page size.

But using sys_mmap2 for a 64-bit architecture, especially when the page
size is not always 4K, does not bring any advantages. We end up doing a
shift by 12 in glibc and another shift by (PAGE_SHIFT - 12) in the
kernel wrapper. Unless I missed your point, I don't see the reason for
using sys_mmap2 on a 64-bit architecture, apart from it being newer (and
compat support should not have any relevance, we have different syscall
tables anyway).

--
Catalin

2012-08-23 07:40:52

by Arnd Bergmann

[permalink] [raw]
Subject: PER_LINUX32, Was: [PATCH v2 21/31] arm64: 32-bit (compat) applications support

On Thursday 16 August 2012, Will Deacon wrote:
> On Wed, Aug 15, 2012 at 03:34:04PM +0100, Arnd Bergmann wrote:
> > On Tuesday 14 August 2012, Catalin Marinas wrote:
> > > +asmlinkage int compat_sys_personality(compat_ulong_t personality)
> > > +{
> > > + int ret;
> > > +
> > > + if (personality(current->personality) == PER_LINUX32 &&
> > > + personality == PER_LINUX)
> > > + personality = PER_LINUX32;
> > > + ret = sys_personality(personality);
> > > + if (ret == PER_LINUX32)
> > > + ret = PER_LINUX;
> > > + return ret;
> > > +}
> >
> > Where did you get this from?
> >
> > You should not need compat_sys_personality, just call the native function.
>
> Hmm, but in that case an aarch32 application doing a personality(PER_LINUX)
> syscall will start seeing the wrong uname.

Coming back at this topic, I noticed another issue. Jiri Kosina
has recently posted patches to fix this function in the other architectures
in order to mask out the other personality bits, which is a correct fix,
but the above function is odd for other reasons.

* On MIPS, it is used only for compat tasks, like you have it above.
* On PA-RISC, it is used for native 32 bit tasks and for compat 32 bit tasks,
but not for native 64 bit ones.
* On IA64, it was used for compat tasks (support for which has since
been removed from the kernel), plus all 32 bit tasks would start with
PER_LINUX32.
* On PowerPC, Sparc and s390, it is used for native 64 bit tasks and for
compat 32 bit tasks, but not for native 32 bit ones.
* On Tile, it was never used.
* On x86_64, it used to be defined (copied from ia64) but not used
throughout the git history.

The semantics of the function are also interesting: The intention seems
to be that to a compat task, PER_LINUX32 would appear as PER_LINUX.
The effect is that any process can set PER_LINUX32 but it can never
be unset except by a 64 bit MIPS or PA-RISC task.

Since x86_64 does not implement this behavior at all, I suspect that
there are now lots of things depending on not having it, while all
the other architectures might also have some (even predating the
x86_64 port) use cases that depend on not being able to
observe PER_LINUX32 in 32 bit compat tasks.

I think we should try to agree on how this is all supposed to work
and use common code, either put the ppc/sparc/s390 version into
sys_personality, or remove all of them and just do what x86 and tile
do, using the regular sys_personality for all tasks.

Arnd

2012-08-23 10:43:32

by Catalin Marinas

[permalink] [raw]
Subject: Re: PER_LINUX32, Was: [PATCH v2 21/31] arm64: 32-bit (compat) applications support

On Thu, Aug 23, 2012 at 07:46:30AM +0100, Arnd Bergmann wrote:
> On Thursday 16 August 2012, Will Deacon wrote:
> > On Wed, Aug 15, 2012 at 03:34:04PM +0100, Arnd Bergmann wrote:
> > > On Tuesday 14 August 2012, Catalin Marinas wrote:
> > > > +asmlinkage int compat_sys_personality(compat_ulong_t personality)
> > > > +{
> > > > + int ret;
> > > > +
> > > > + if (personality(current->personality) == PER_LINUX32 &&
> > > > + personality == PER_LINUX)
> > > > + personality = PER_LINUX32;
> > > > + ret = sys_personality(personality);
> > > > + if (ret == PER_LINUX32)
> > > > + ret = PER_LINUX;
> > > > + return ret;
> > > > +}
> > >
> > > Where did you get this from?
> > >
> > > You should not need compat_sys_personality, just call the native function.
> >
> > Hmm, but in that case an aarch32 application doing a personality(PER_LINUX)
> > syscall will start seeing the wrong uname.
>
> Coming back at this topic, I noticed another issue. Jiri Kosina
> has recently posted patches to fix this function in the other architectures
> in order to mask out the other personality bits, which is a correct fix,
> but the above function is odd for other reasons.
>
> * On MIPS, it is used only for compat tasks, like you have it above.
> * On PA-RISC, it is used for native 32 bit tasks and for compat 32 bit tasks,
> but not for native 64 bit ones.
> * On IA64, it was used for compat tasks (support for which has since
> been removed from the kernel), plus all 32 bit tasks would start with
> PER_LINUX32.
> * On PowerPC, Sparc and s390, it is used for native 64 bit tasks and for
> compat 32 bit tasks, but not for native 32 bit ones.
> * On Tile, it was never used.
> * On x86_64, it used to be defined (copied from ia64) but not used
> throughout the git history.
>
> The semantics of the function are also interesting: The intention seems
> to be that to a compat task, PER_LINUX32 would appear as PER_LINUX.
> The effect is that any process can set PER_LINUX32 but it can never
> be unset except by a 64 bit MIPS or PA-RISC task.

IMHO, it makes sense to keep compat_sys_personality() as implemented
above. You may want to start a chroot ARMv7 environment using "linux32"
but you don't want some 32-bit app calling personality(PER_LINUX) (as
that's the default personality on an ARMv7 system) and unknowingly
changing the personality that you wanted to enforce via "linux32".

I agree with not setting the personality based on the ELF type, but
that's different from the compat_sys_personality().

> Since x86_64 does not implement this behavior at all, I suspect that
> there are now lots of things depending on not having it, while all
> the other architectures might also have some (even predating the
> x86_64 port) use cases that depend on depend on not being able to
> observe PER_LINUX32 in 32 bit compat tasks.
>
> I think we should try to agree on how this is all supposed to work
> and use common code, either put the ppc/sparc/s390 version into
> sys_personality, or remove all of them and just do what x86 and tile
> do, using the regular sys_personality for all tasks.

Late topic for the KS :).

I don't think we can move this behaviour to sys_personality. We may want
to add a generic compat_sys_personality() if we agree on the above
use-case.

--
Catalin

2012-08-24 09:50:58

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 02/31] arm64: Kernel booting and initialisation

On Fri, Aug 17, 2012 at 02:13:59PM +0100, Tony Lindgren wrote:
> * Shilimkar, Santosh <[email protected]> [120817 03:11]:
> > On Fri, Aug 17, 2012 at 3:35 PM, Catalin Marinas
> > <[email protected]> wrote:
> > >
> > > On Fri, Aug 17, 2012 at 10:41:10AM +0100, Santosh Shilimkar wrote:
> > > >
> > > > So you expect all the secondary CPUs to be in wakeup state and probably
> > > > looping in WFE for a signal from kernel to boot. There is one issue
> > > > with this requirement though. For a large CPU system, you need to reset
> > > > all the CPUs and have them hit this waiting loop. This will lead to a
> > > > large inrush current requirement at bootup which may not be supported. To
> > > > avoid this issue, secondary CPUs are kept in the OFF state and then woken
> > > > up from the kernel one by one whenever they need to be brought into the
> > > > system. This requirement should be considered.
> > >
> > > I agree, this part will be extended. That's one method that we currently
> > > support and suitable to the model.
> > >
> > > The better method is the SMC standardisation that Charles Garcia-Tobin
> > > has written (to be made available soon) and was presented at the last
> > > Linaro Connect in HK. Given that the CPU power is usually controlled by
> > > the secure side, we'll ask for an SMC to be issued for waking up
> > > secondary CPUs, so it's up to the secure firmware to write the correct
> > > hardware registers.
> > >
> > Thanks for the information. SMC standardization would indeed help
> > to overcome some of these. Will wait for that information before
> > next set of questions.
>
> Yes please. If the SMC is not standardized for most calls at least,
> we'll end up with a horrible mess of SoC specific calls like we
> currently have. Related to that, the virtualization calls should be
> also standardized so we don't end up with multiple different hypervisors
> with different calls.

The Power State Coordination Interface initial proposal has been
published here:

http://infocenter.arm.com/help/topic/com.arm.doc.den0022a/index.html

(as with other ARM documents, they are public but free registration
required).
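To give a rough idea of the mechanism, a CPU_ON request ends up as an
SMC with the function ID and arguments in the first few registers and
the return value coming back in x0. The sketch below is purely
illustrative; the prototype and the way the function ID is obtained are
not taken from the published document:

static noinline int smc_cpu_on(unsigned long fn_id, unsigned long target_cpu,
			       unsigned long entry_point)
{
	register unsigned long x0 asm("x0") = fn_id;
	register unsigned long x1 asm("x1") = target_cpu;
	register unsigned long x2 asm("x2") = entry_point;

	asm volatile("smc	#0"
		     : "+r" (x0)
		     : "r" (x1), "r" (x2)
		     : "memory");

	return (int)x0;
}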

--
Catalin

2012-08-24 10:44:16

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 21/31] arm64: 32-bit (compat) applications support

On Wed, Aug 15, 2012 at 03:34:04PM +0100, Arnd Bergmann wrote:
> On Tuesday 14 August 2012, Catalin Marinas wrote:
>
> > +#ifdef CONFIG_AARCH32_EMULATION
> > +#include <linux/compat.h>
> > +
> > +#define AARCH32_KERN_SIGRET_CODE_OFFSET 0x500
> > +
> > +extern const compat_ulong_t aarch32_sigret_code[6];
> > +
> > +int compat_setup_frame(int usig, struct k_sigaction *ka, sigset_t *set,
> > + struct pt_regs *regs);
> > +int compat_setup_rt_frame(int usig, struct k_sigaction *ka, siginfo_t *info,
> > + sigset_t *set, struct pt_regs *regs);
> > +
> > +void compat_setup_restart_syscall(struct pt_regs *regs);
> > +#else
> > +
> > +static inline int compat_setup_frame(int usid, struct k_sigaction *ka,
> > + sigset_t *set, struct pt_regs *regs)
> > +{
> > + BUG();
> > +}
>
> What good is the run-time BUG() here? Nothing should be calling these
> when CONFIG_COMPAT is disabled, so I think you should just remove
> the #ifdef around the declarations, and the entire #else case.

They are called from handle_signal(), so the stubs are there to avoid
#ifdefs inside functions. I can drop the BUG() (but keep the empty
functions) and change the checks to is_compat_task() so that the
compiler optimises the condition out when !COMPAT.
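For illustration, the call site could then be reduced to something
like this (a sketch; the helper and setup_rt_frame names are just
placeholders for whatever the native path ends up using):

static int setup_sigframe(int usig, struct k_sigaction *ka, siginfo_t *info,
			  sigset_t *set, struct pt_regs *regs)
{
	if (is_compat_task()) {
		if (ka->sa.sa_flags & SA_SIGINFO)
			return compat_setup_rt_frame(usig, ka, info, set, regs);
		return compat_setup_frame(usig, ka, set, regs);
	}

	return setup_rt_frame(usig, ka, info, set, regs);
}

With !CONFIG_COMPAT, is_compat_task() is constant 0, so the compat
branch and the empty stubs are compiled away.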

--
Catalin

2012-08-26 04:49:43

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 21/31] arm64: 32-bit (compat) applications support

On Friday 24 August 2012, Catalin Marinas wrote:
> > What good is the run-time BUG() here? Nothing should be calling these
> > when CONFIG_COMPAT is disabled, so I think you should just remove
> > the #ifdef around the declarations, and the entire #else case.
>
> They are called from handle_signal(), so that's to avoid #ifdef inside
> functions. I can drop the BUG() (but keep the empty function) and
> change the checks to is_compat_task() so that the compiler optimises the
> condition out when !COMPAT.
>

Sounds good. Note that you can turn a lot of #ifdef into
if(IS_ENABLED(CONFIG_FOO)) as well, even if there is no other runtime
check for them.

Arnd

2012-08-28 18:28:31

by Jiri Kosina

[permalink] [raw]
Subject: Re: PER_LINUX32, Was: [PATCH v2 21/31] arm64: 32-bit (compat) applications support

On Thu, 23 Aug 2012, Arnd Bergmann wrote:

> > > > +asmlinkage int compat_sys_personality(compat_ulong_t personality)
> > > > +{
> > > > + int ret;
> > > > +
> > > > + if (personality(current->personality) == PER_LINUX32 &&
> > > > + personality == PER_LINUX)
> > > > + personality = PER_LINUX32;
> > > > + ret = sys_personality(personality);
> > > > + if (ret == PER_LINUX32)
> > > > + ret = PER_LINUX;
> > > > + return ret;
> > > > +}
> > >
> > > Where did you get this from?
> > >
> > > You should not need compat_sys_personality, just call the native function.
> >
> > Hmm, but in that case an aarch32 application doing a personality(PER_LINUX)
> > syscall will start seeing the wrong uname.
>
> Coming back at this topic, I noticed another issue. Jiri Kosina
> has recently posted patches to fix this function in the other architectures

Yeah, there were quite a few broken ones, some of them since the beginning
of time.

> in order to mask out the other personality bits, which is a correct fix,
> but the above function is odd for other reasons.
>
> * On MIPS, it is used only for compat tasks, like you have it above.
> * On PA-RISC, it is used for native 32 bit tasks and for compat 32 bit tasks,
> but not for native 64 bit ones.
> * On IA64, it was used for compat tasks (support for which has since
> been removed from the kernel), plus all 32 bit tasks would start with
> PER_LINUX32.
> * On PowerPC, Sparc and s390, it is used for native 64 bit tasks and for
> compat 32 bit tasks, but not for native 32 bit ones.
> * On Tile, it was never used.
> * On x86_64, it used to be defined (copied from ia64) but not used
> throughout the git history.
>
> The semantics of the function are also interesting: The intention seems
> to be that to a compat task, PER_LINUX32 would appear as PER_LINUX.
> The effect is that any process can set PER_LINUX32 but it can never
> be unset except by a 64 bit MIPS or PA-RISC task.
>
> Since x86_64 does not implement this behavior at all, I suspect that
> there are now lots of things depending on not having it, while all
> the other architectures might also have some (even predating the
> x86_64 port) use cases that depend on not being able to
> observe PER_LINUX32 in 32 bit compat tasks.
>
> I think we should try to agree on how this is all supposed to work
> and use common code, either put the ppc/sparc/s390 version into
> sys_personality, or remove all of them and just do what x86 and tile
> do, using the regular sys_personality for all tasks.

How about instead introducing a common compat_sys_personality() and
switching the archs that use one over to it? Unifying the behavior
(the PER_LINUX / PER_LINUX32 masquerading) should be painless.

Thanks,

--
Jiri Kosina
SUSE Labs