2013-10-16 03:20:31

by Jiang Liu

Subject: [PATCH v3 0/7] Optimize jump label implementation for ARM64

From: Jiang Liu <[email protected]>

This patchset optimizes the arch-specific jump label implementation
for ARM64 by dynamically patching kernel text.

To enable this feature, your toolchain must support the "asm goto" extension
and the "%c" constraint extension. Current GCC for AArch64 doesn't support
"%c", so you need a GCC patch similar to this:
http://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/arm/arm.c?view=patch&r1=175293&r2=175565&pathrev=175565
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48637
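
For reference, both extensions are exercised by the arch_static_branch()
helper added in patch 6 of this series: "asm goto" provides the %l[l_yes]
label operand, and "%c0" emits the bare address of the static key into the
__jump_table entry.

static __always_inline bool arch_static_branch(struct static_key *key)
{
	asm goto("1:\n\t"
		 "nop\n\t"
		 ".pushsection __jump_table, \"aw\"\n\t"
		 ".align 3\n\t"
		 ".quad 1b, %l[l_yes], %c0\n\t"
		 ".popsection\n\t"
		 : : "i"(key) : : l_yes);

	return false;
l_yes:
	return true;
}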

It has been tested on the ARM Fast Model and on a real hardware platform.

Any comments are welcome!

V2->V3:
1) fix a bug in comparing signed and unsigned values
2) detect big endian by checking __AARCH64EB__

V1->V2: address review comments of V1
1) refine comments
2) add a new interface to always synchronize with stop_machine()
when patching code
3) handle endian issue when patching code

Jiang Liu (7):
arm64: introduce basic aarch64 instruction decoding helpers
arm64: introduce interfaces to hotpatch kernel and module code
arm64: move encode_insn_immediate() from module.c to insn.c
arm64: introduce aarch64_insn_gen_{nop|branch_imm}() helper functions
arm64, jump label: detect %c support for ARM64
arm64, jump label: optimize jump label implementation
jump_label: use defined macros instead of hard-coding for better
readability

arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/insn.h | 79 ++++++++++
arch/arm64/include/asm/jump_label.h | 52 +++++++
arch/arm64/kernel/Makefile | 3 +-
arch/arm64/kernel/insn.c | 285 ++++++++++++++++++++++++++++++++++++
arch/arm64/kernel/jump_label.c | 60 ++++++++
arch/arm64/kernel/module.c | 151 ++++---------------
include/linux/jump_label.h | 15 +-
scripts/gcc-goto.sh | 2 +-
9 files changed, 522 insertions(+), 126 deletions(-)
create mode 100644 arch/arm64/include/asm/insn.h
create mode 100644 arch/arm64/include/asm/jump_label.h
create mode 100644 arch/arm64/kernel/insn.c
create mode 100644 arch/arm64/kernel/jump_label.c

--
1.8.1.2


2013-10-16 03:21:07

by Jiang Liu

Subject: [PATCH v3 1/7] arm64: introduce basic aarch64 instruction decoding helpers

From: Jiang Liu <[email protected]>

Introduce basic aarch64 instruction decoding helpers
aarch64_get_insn_class() and aarch64_insn_hotpatch_safe().

Signed-off-by: Jiang Liu <[email protected]>
Cc: Jiang Liu <[email protected]>
---
arch/arm64/include/asm/insn.h | 53 ++++++++++++++++++++++++++
arch/arm64/kernel/Makefile | 2 +-
arch/arm64/kernel/insn.c | 86 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 140 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/include/asm/insn.h
create mode 100644 arch/arm64/kernel/insn.c

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
new file mode 100644
index 0000000..e7d1bc8
--- /dev/null
+++ b/arch/arm64/include/asm/insn.h
@@ -0,0 +1,53 @@
+/*
+ * Copyright (C) 2013 Huawei Ltd.
+ * Author: Jiang Liu <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef _ASM_ARM64_INSN_H
+#define _ASM_ARM64_INSN_H
+#include <linux/types.h>
+
+enum aarch64_insn_class {
+ AARCH64_INSN_CLS_UNKNOWN, /* UNALLOCATED */
+ AARCH64_INSN_CLS_DP_IMM, /* Data processing - immediate */
+ AARCH64_INSN_CLS_DP_REG, /* Data processing - register */
+ AARCH64_INSN_CLS_DP_FPSIMD, /* Data processing - SIMD and FP */
+ AARCH64_INSN_CLS_LDST, /* Loads and stores */
+ AARCH64_INSN_CLS_BR_SYS, /* Branch, exception generation and
+ * system instructions */
+};
+
+#define __AARCH64_INSN_FUNCS(abbr, mask, val) \
+static __always_inline bool aarch64_insn_is_##abbr(u32 code) \
+{ return (code & (mask)) == (val); } \
+static __always_inline u32 aarch64_insn_get_##abbr##_mask(void) \
+{ return (mask); } \
+static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
+{ return (val); }
+
+__AARCH64_INSN_FUNCS(b, 0xFC000000, 0x14000000)
+__AARCH64_INSN_FUNCS(bl, 0xFC000000, 0x94000000)
+__AARCH64_INSN_FUNCS(svc, 0xFFE0001F, 0xD4000001)
+__AARCH64_INSN_FUNCS(hvc, 0xFFE0001F, 0xD4000002)
+__AARCH64_INSN_FUNCS(smc, 0xFFE0001F, 0xD4000003)
+__AARCH64_INSN_FUNCS(brk, 0xFFE0001F, 0xD4200000)
+__AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
+
+#undef __AARCH64_INSN_FUNCS
+
+enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
+
+bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
+
+#endif /* _ASM_ARM64_INSN_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 7b4b564..9af6cb3 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -9,7 +9,7 @@ AFLAGS_head.o := -DTEXT_OFFSET=$(TEXT_OFFSET)
arm64-obj-y := cputable.o debug-monitors.o entry.o irq.o fpsimd.o \
entry-fpsimd.o process.o ptrace.o setup.o signal.o \
sys.o stacktrace.o time.o traps.o io.o vdso.o \
- hyp-stub.o psci.o
+ hyp-stub.o psci.o insn.o

arm64-obj-$(CONFIG_COMPAT) += sys32.o kuser32.o signal32.o \
sys_compat.o
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
new file mode 100644
index 0000000..1be4d11
--- /dev/null
+++ b/arch/arm64/kernel/insn.c
@@ -0,0 +1,86 @@
+/*
+ * Copyright (C) 2013 Huawei Ltd.
+ * Author: Jiang Liu <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/compiler.h>
+#include <linux/kernel.h>
+#include <asm/insn.h>
+
+/*
+ * ARM Architecture Reference Manual ARMv8, Section C3.1
+ * AARCH64 main encoding table
+ * Bit position
+ * 28 27 26 25 Encoding Group
+ * 0 0 - - Unallocated
+ * 1 0 0 - Data processing, immediate
+ * 1 0 1 - Branch, exception generation and system instructions
+ * - 1 - 0 Loads and stores
+ * - 1 0 1 Data processing - register
+ * 0 1 1 1 Data processing - SIMD and floating point
+ * 1 1 1 1 Data processing - SIMD and floating point
+ * "-" means "don't care"
+ */
+static int aarch64_insn_cls[] = {
+ AARCH64_INSN_CLS_UNKNOWN,
+ AARCH64_INSN_CLS_UNKNOWN,
+ AARCH64_INSN_CLS_UNKNOWN,
+ AARCH64_INSN_CLS_UNKNOWN,
+ AARCH64_INSN_CLS_LDST,
+ AARCH64_INSN_CLS_DP_REG,
+ AARCH64_INSN_CLS_LDST,
+ AARCH64_INSN_CLS_DP_FPSIMD,
+ AARCH64_INSN_CLS_DP_IMM,
+ AARCH64_INSN_CLS_DP_IMM,
+ AARCH64_INSN_CLS_BR_SYS,
+ AARCH64_INSN_CLS_BR_SYS,
+ AARCH64_INSN_CLS_LDST,
+ AARCH64_INSN_CLS_DP_REG,
+ AARCH64_INSN_CLS_LDST,
+ AARCH64_INSN_CLS_DP_FPSIMD,
+};
+
+enum aarch64_insn_class __kprobes aarch64_get_insn_class(u32 insn)
+{
+ return aarch64_insn_cls[(insn >> 25) & 0xf];
+}
+
+static bool __kprobes __aarch64_insn_hotpatch_safe(u32 insn)
+{
+ if (aarch64_get_insn_class(insn) != AARCH64_INSN_CLS_BR_SYS)
+ return false;
+
+ return aarch64_insn_is_b(insn) ||
+ aarch64_insn_is_bl(insn) ||
+ aarch64_insn_is_svc(insn) ||
+ aarch64_insn_is_hvc(insn) ||
+ aarch64_insn_is_smc(insn) ||
+ aarch64_insn_is_brk(insn) ||
+ aarch64_insn_is_nop(insn);
+}
+
+/*
+ * ARMv8-A Section B2.6.5:
+ * Concurrent modification and execution of instructions can lead to the
+ * resulting instruction performing any behavior that can be achieved by
+ * executing any sequence of instructions that can be executed from the
+ * same Exception level, except where the instruction before modification
+ * and the instruction after modification is a B, BL, NOP, BKPT, SVC, HVC,
+ * or SMC instruction.
+ */
+bool __kprobes aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn)
+{
+ return __aarch64_insn_hotpatch_safe(old_insn) &&
+ __aarch64_insn_hotpatch_safe(new_insn);
+}
--
1.8.1.2

2013-10-16 03:21:29

by Jiang Liu

Subject: [PATCH v3 2/7] arm64: introduce interfaces to hotpatch kernel and module code

From: Jiang Liu <[email protected]>

Introduce three interfaces to patch kernel and module code:
aarch64_insn_patch_text_nosync():
patch code without synchronization; it is the caller's responsibility
to synchronize all CPUs if needed.
aarch64_insn_patch_text_sync():
patch code and always synchronize with stop_machine()
aarch64_insn_patch_text():
patch code and synchronize with stop_machine() if needed
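
A minimal usage sketch (illustrative only; "patch_site" is a placeholder
for a real kernel or module text address): replace one instruction with a
NOP and let aarch64_insn_patch_text() pick the synchronization strategy.

	void *addr = patch_site;			/* hypothetical text address */
	u32 insn = aarch64_insn_get_nop_value();	/* NOP encoding from patch 1 */
	int ret = aarch64_insn_patch_text(&addr, &insn, 1);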

Signed-off-by: Jiang Liu <[email protected]>
Cc: Jiang Liu <[email protected]>
---
arch/arm64/include/asm/insn.h | 7 +++-
arch/arm64/kernel/insn.c | 95 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index e7d1bc8..2dfcdb4 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -47,7 +47,12 @@ __AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
#undef __AARCH64_INSN_FUNCS

enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
-
+u32 aarch64_insn_read(void *addr);
+void aarch64_insn_write(void *addr, u32 insn);
bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);

+int aarch64_insn_patch_text_nosync(void *addrs[], u32 insns[], int cnt);
+int aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt);
+int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
+
#endif /* _ASM_ARM64_INSN_H */
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 1be4d11..ad4185f 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -16,6 +16,8 @@
*/
#include <linux/compiler.h>
#include <linux/kernel.h>
+#include <linux/stop_machine.h>
+#include <asm/cacheflush.h>
#include <asm/insn.h>

/*
@@ -84,3 +86,96 @@ bool __kprobes aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn)
return __aarch64_insn_hotpatch_safe(old_insn) &&
__aarch64_insn_hotpatch_safe(new_insn);
}
+
+/*
+ * In ARMv8-A, A64 instructions have a fixed length of 32 bits and are always
+ * little-endian. On the other hand, SCTLR_EL1.EE (bit 25, Exception Endianness)
+ * flag controls endianness for EL1 explicit data accesses and stage 1
+ * translation table walks as below:
+ * 0: little-endian
+ * 1: big-endian
+ * So need to handle endianness when patching kernel code.
+ */
+u32 __kprobes aarch64_insn_read(void *addr)
+{
+ u32 insn;
+
+#ifdef __AARCH64EB__
+ insn = swab32(*(u32 *)addr);
+#else
+ insn = *(u32 *)addr;
+#endif
+
+ return insn;
+}
+
+void __kprobes aarch64_insn_write(void *addr, u32 insn)
+{
+#ifdef __AARCH64EB__
+ *(u32 *)addr = swab32(insn);
+#else
+ *(u32 *)addr = insn;
+#endif
+}
+
+int __kprobes aarch64_insn_patch_text_nosync(void *addrs[], u32 insns[],
+ int cnt)
+{
+ int i;
+ u32 *tp;
+
+ if (cnt <= 0)
+ return -EINVAL;
+
+ for (i = 0; i < cnt; i++) {
+ tp = addrs[i];
+ /* A64 instructions must be word aligned */
+ if ((uintptr_t)tp & 0x3)
+ return -EINVAL;
+ aarch64_insn_write(tp, insns[i]);
+ flush_icache_range((uintptr_t)tp, (uintptr_t)tp + sizeof(u32));
+ }
+
+ return 0;
+}
+
+struct aarch64_insn_patch {
+ void **text_addrs;
+ u32 *new_insns;
+ int insn_cnt;
+};
+
+static int __kprobes aarch64_insn_patch_text_cb(void *arg)
+{
+ struct aarch64_insn_patch *pp = arg;
+
+ return aarch64_insn_patch_text_nosync(pp->text_addrs, pp->new_insns,
+ pp->insn_cnt);
+}
+
+int __kprobes aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt)
+{
+ struct aarch64_insn_patch patch = {
+ .text_addrs = addrs,
+ .new_insns = insns,
+ .insn_cnt = cnt,
+ };
+
+ if (cnt <= 0)
+ return -EINVAL;
+
+ /*
+ * Execute __aarch64_insn_patch_text() on every online CPU,
+ * which ensure serialization among all online CPUs.
+ */
+ return stop_machine(aarch64_insn_patch_text_cb, &patch, NULL);
+}
+
+int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
+{
+ if (cnt == 1 && aarch64_insn_hotpatch_safe(aarch64_insn_read(addrs[0]),
+ insns[0]))
+ return aarch64_insn_patch_text_nosync(addrs, insns, cnt);
+ else
+ return aarch64_insn_patch_text_sync(addrs, insns, cnt);
+}
--
1.8.1.2

2013-10-16 03:22:02

by Jiang Liu

Subject: [PATCH v3 3/7] arm64: move encode_insn_immediate() from module.c to insn.c

From: Jiang Liu <[email protected]>

Function encode_insn_immediate() will be used by other instruction
manipulation functions, so move it into insn.c and rename it
to aarch64_insn_encode_immediate().

Signed-off-by: Jiang Liu <[email protected]>
Cc: Jiang Liu <[email protected]>
---
arch/arm64/include/asm/insn.h | 14 ++++
arch/arm64/kernel/insn.c | 77 +++++++++++++++++++++
arch/arm64/kernel/module.c | 151 +++++++++---------------------------------
3 files changed, 123 insertions(+), 119 deletions(-)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 2dfcdb4..8dc0a91 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -28,6 +28,18 @@ enum aarch64_insn_class {
* system instructions */
};

+enum aarch64_insn_imm_type {
+ AARCH64_INSN_IMM_MOVNZ,
+ AARCH64_INSN_IMM_MOVK,
+ AARCH64_INSN_IMM_ADR,
+ AARCH64_INSN_IMM_26,
+ AARCH64_INSN_IMM_19,
+ AARCH64_INSN_IMM_16,
+ AARCH64_INSN_IMM_14,
+ AARCH64_INSN_IMM_12,
+ AARCH64_INSN_IMM_9,
+};
+
#define __AARCH64_INSN_FUNCS(abbr, mask, val) \
static __always_inline bool aarch64_insn_is_##abbr(u32 code) \
{ return (code & (mask)) == (val); } \
@@ -47,6 +59,8 @@ __AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
#undef __AARCH64_INSN_FUNCS

enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
+u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
+ u32 insn, u64 imm);
u32 aarch64_insn_read(void *addr);
void aarch64_insn_write(void *addr, u32 insn);
bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index ad4185f..90cc312 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -179,3 +179,80 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
else
return aarch64_insn_patch_text_sync(addrs, insns, cnt);
}
+
+u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
+ u32 insn, u64 imm)
+{
+ u32 immlo, immhi, lomask, himask, mask;
+ int shift;
+
+ switch (type) {
+ case AARCH64_INSN_IMM_MOVNZ:
+ /*
+ * For signed MOVW relocations, we have to manipulate the
+ * instruction encoding depending on whether or not the
+ * immediate is less than zero.
+ */
+ insn &= ~(3 << 29);
+ if ((s64)imm >= 0) {
+ /* >=0: Set the instruction to MOVZ (opcode 10b). */
+ insn |= 2 << 29;
+ } else {
+ /*
+ * <0: Set the instruction to MOVN (opcode 00b).
+ * Since we've masked the opcode already, we
+ * don't need to do anything other than
+ * inverting the new immediate field.
+ */
+ imm = ~imm;
+ }
+ case AARCH64_INSN_IMM_MOVK:
+ mask = BIT(16) - 1;
+ shift = 5;
+ break;
+ case AARCH64_INSN_IMM_ADR:
+ lomask = 0x3;
+ himask = 0x7ffff;
+ immlo = imm & lomask;
+ imm >>= 2;
+ immhi = imm & himask;
+ imm = (immlo << 24) | (immhi);
+ mask = (lomask << 24) | (himask);
+ shift = 5;
+ break;
+ case AARCH64_INSN_IMM_26:
+ mask = BIT(26) - 1;
+ shift = 0;
+ break;
+ case AARCH64_INSN_IMM_19:
+ mask = BIT(19) - 1;
+ shift = 5;
+ break;
+ case AARCH64_INSN_IMM_16:
+ mask = BIT(16) - 1;
+ shift = 5;
+ break;
+ case AARCH64_INSN_IMM_14:
+ mask = BIT(14) - 1;
+ shift = 5;
+ break;
+ case AARCH64_INSN_IMM_12:
+ mask = BIT(12) - 1;
+ shift = 10;
+ break;
+ case AARCH64_INSN_IMM_9:
+ mask = BIT(9) - 1;
+ shift = 12;
+ break;
+ default:
+ pr_err("aarch64_insn_encode_immediate: unknown immediate encoding %d\n",
+ type);
+ return 0;
+ }
+
+ /* Update the immediate field. */
+ insn &= ~(mask << shift);
+ insn |= (imm & mask) << shift;
+
+ return insn;
+}
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index ca0e3d5..69e3c31 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -25,6 +25,7 @@
#include <linux/mm.h>
#include <linux/moduleloader.h>
#include <linux/vmalloc.h>
+#include <asm/insn.h>

void *module_alloc(unsigned long size)
{
@@ -94,96 +95,8 @@ static int reloc_data(enum aarch64_reloc_op op, void *place, u64 val, int len)
return 0;
}

-enum aarch64_imm_type {
- INSN_IMM_MOVNZ,
- INSN_IMM_MOVK,
- INSN_IMM_ADR,
- INSN_IMM_26,
- INSN_IMM_19,
- INSN_IMM_16,
- INSN_IMM_14,
- INSN_IMM_12,
- INSN_IMM_9,
-};
-
-static u32 encode_insn_immediate(enum aarch64_imm_type type, u32 insn, u64 imm)
-{
- u32 immlo, immhi, lomask, himask, mask;
- int shift;
-
- switch (type) {
- case INSN_IMM_MOVNZ:
- /*
- * For signed MOVW relocations, we have to manipulate the
- * instruction encoding depending on whether or not the
- * immediate is less than zero.
- */
- insn &= ~(3 << 29);
- if ((s64)imm >= 0) {
- /* >=0: Set the instruction to MOVZ (opcode 10b). */
- insn |= 2 << 29;
- } else {
- /*
- * <0: Set the instruction to MOVN (opcode 00b).
- * Since we've masked the opcode already, we
- * don't need to do anything other than
- * inverting the new immediate field.
- */
- imm = ~imm;
- }
- case INSN_IMM_MOVK:
- mask = BIT(16) - 1;
- shift = 5;
- break;
- case INSN_IMM_ADR:
- lomask = 0x3;
- himask = 0x7ffff;
- immlo = imm & lomask;
- imm >>= 2;
- immhi = imm & himask;
- imm = (immlo << 24) | (immhi);
- mask = (lomask << 24) | (himask);
- shift = 5;
- break;
- case INSN_IMM_26:
- mask = BIT(26) - 1;
- shift = 0;
- break;
- case INSN_IMM_19:
- mask = BIT(19) - 1;
- shift = 5;
- break;
- case INSN_IMM_16:
- mask = BIT(16) - 1;
- shift = 5;
- break;
- case INSN_IMM_14:
- mask = BIT(14) - 1;
- shift = 5;
- break;
- case INSN_IMM_12:
- mask = BIT(12) - 1;
- shift = 10;
- break;
- case INSN_IMM_9:
- mask = BIT(9) - 1;
- shift = 12;
- break;
- default:
- pr_err("encode_insn_immediate: unknown immediate encoding %d\n",
- type);
- return 0;
- }
-
- /* Update the immediate field. */
- insn &= ~(mask << shift);
- insn |= (imm & mask) << shift;
-
- return insn;
-}
-
static int reloc_insn_movw(enum aarch64_reloc_op op, void *place, u64 val,
- int lsb, enum aarch64_imm_type imm_type)
+ int lsb, enum aarch64_insn_imm_type imm_type)
{
u64 imm, limit = 0;
s64 sval;
@@ -194,7 +107,7 @@ static int reloc_insn_movw(enum aarch64_reloc_op op, void *place, u64 val,
imm = sval & 0xffff;

/* Update the instruction with the new encoding. */
- *(u32 *)place = encode_insn_immediate(imm_type, insn, imm);
+ *(u32 *)place = aarch64_insn_encode_immediate(imm_type, insn, imm);

/* Shift out the immediate field. */
sval >>= 16;
@@ -203,9 +116,9 @@ static int reloc_insn_movw(enum aarch64_reloc_op op, void *place, u64 val,
* For unsigned immediates, the overflow check is straightforward.
* For signed immediates, the sign bit is actually the bit past the
* most significant bit of the field.
- * The INSN_IMM_16 immediate type is unsigned.
+ * The AARCH64_INSN_IMM_16 immediate type is unsigned.
*/
- if (imm_type != INSN_IMM_16) {
+ if (imm_type != AARCH64_INSN_IMM_16) {
sval++;
limit++;
}
@@ -218,7 +131,7 @@ static int reloc_insn_movw(enum aarch64_reloc_op op, void *place, u64 val,
}

static int reloc_insn_imm(enum aarch64_reloc_op op, void *place, u64 val,
- int lsb, int len, enum aarch64_imm_type imm_type)
+ int lsb, int len, enum aarch64_insn_imm_type imm_type)
{
u64 imm, imm_mask;
s64 sval;
@@ -233,7 +146,7 @@ static int reloc_insn_imm(enum aarch64_reloc_op op, void *place, u64 val,
imm = sval & imm_mask;

/* Update the instruction's immediate field. */
- *(u32 *)place = encode_insn_immediate(imm_type, insn, imm);
+ *(u32 *)place = aarch64_insn_encode_immediate(imm_type, insn, imm);

/*
* Extract the upper value bits (including the sign bit) and
@@ -315,125 +228,125 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
overflow_check = false;
case R_AARCH64_MOVW_UABS_G0:
ovf = reloc_insn_movw(RELOC_OP_ABS, loc, val, 0,
- INSN_IMM_16);
+ AARCH64_INSN_IMM_16);
break;
case R_AARCH64_MOVW_UABS_G1_NC:
overflow_check = false;
case R_AARCH64_MOVW_UABS_G1:
ovf = reloc_insn_movw(RELOC_OP_ABS, loc, val, 16,
- INSN_IMM_16);
+ AARCH64_INSN_IMM_16);
break;
case R_AARCH64_MOVW_UABS_G2_NC:
overflow_check = false;
case R_AARCH64_MOVW_UABS_G2:
ovf = reloc_insn_movw(RELOC_OP_ABS, loc, val, 32,
- INSN_IMM_16);
+ AARCH64_INSN_IMM_16);
break;
case R_AARCH64_MOVW_UABS_G3:
/* We're using the top bits so we can't overflow. */
overflow_check = false;
ovf = reloc_insn_movw(RELOC_OP_ABS, loc, val, 48,
- INSN_IMM_16);
+ AARCH64_INSN_IMM_16);
break;
case R_AARCH64_MOVW_SABS_G0:
ovf = reloc_insn_movw(RELOC_OP_ABS, loc, val, 0,
- INSN_IMM_MOVNZ);
+ AARCH64_INSN_IMM_MOVNZ);
break;
case R_AARCH64_MOVW_SABS_G1:
ovf = reloc_insn_movw(RELOC_OP_ABS, loc, val, 16,
- INSN_IMM_MOVNZ);
+ AARCH64_INSN_IMM_MOVNZ);
break;
case R_AARCH64_MOVW_SABS_G2:
ovf = reloc_insn_movw(RELOC_OP_ABS, loc, val, 32,
- INSN_IMM_MOVNZ);
+ AARCH64_INSN_IMM_MOVNZ);
break;
case R_AARCH64_MOVW_PREL_G0_NC:
overflow_check = false;
ovf = reloc_insn_movw(RELOC_OP_PREL, loc, val, 0,
- INSN_IMM_MOVK);
+ AARCH64_INSN_IMM_MOVK);
break;
case R_AARCH64_MOVW_PREL_G0:
ovf = reloc_insn_movw(RELOC_OP_PREL, loc, val, 0,
- INSN_IMM_MOVNZ);
+ AARCH64_INSN_IMM_MOVNZ);
break;
case R_AARCH64_MOVW_PREL_G1_NC:
overflow_check = false;
ovf = reloc_insn_movw(RELOC_OP_PREL, loc, val, 16,
- INSN_IMM_MOVK);
+ AARCH64_INSN_IMM_MOVK);
break;
case R_AARCH64_MOVW_PREL_G1:
ovf = reloc_insn_movw(RELOC_OP_PREL, loc, val, 16,
- INSN_IMM_MOVNZ);
+ AARCH64_INSN_IMM_MOVNZ);
break;
case R_AARCH64_MOVW_PREL_G2_NC:
overflow_check = false;
ovf = reloc_insn_movw(RELOC_OP_PREL, loc, val, 32,
- INSN_IMM_MOVK);
+ AARCH64_INSN_IMM_MOVK);
break;
case R_AARCH64_MOVW_PREL_G2:
ovf = reloc_insn_movw(RELOC_OP_PREL, loc, val, 32,
- INSN_IMM_MOVNZ);
+ AARCH64_INSN_IMM_MOVNZ);
break;
case R_AARCH64_MOVW_PREL_G3:
/* We're using the top bits so we can't overflow. */
overflow_check = false;
ovf = reloc_insn_movw(RELOC_OP_PREL, loc, val, 48,
- INSN_IMM_MOVNZ);
+ AARCH64_INSN_IMM_MOVNZ);
break;

/* Immediate instruction relocations. */
case R_AARCH64_LD_PREL_LO19:
ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2, 19,
- INSN_IMM_19);
+ AARCH64_INSN_IMM_19);
break;
case R_AARCH64_ADR_PREL_LO21:
ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 0, 21,
- INSN_IMM_ADR);
+ AARCH64_INSN_IMM_ADR);
break;
case R_AARCH64_ADR_PREL_PG_HI21_NC:
overflow_check = false;
case R_AARCH64_ADR_PREL_PG_HI21:
ovf = reloc_insn_imm(RELOC_OP_PAGE, loc, val, 12, 21,
- INSN_IMM_ADR);
+ AARCH64_INSN_IMM_ADR);
break;
case R_AARCH64_ADD_ABS_LO12_NC:
case R_AARCH64_LDST8_ABS_LO12_NC:
overflow_check = false;
ovf = reloc_insn_imm(RELOC_OP_ABS, loc, val, 0, 12,
- INSN_IMM_12);
+ AARCH64_INSN_IMM_12);
break;
case R_AARCH64_LDST16_ABS_LO12_NC:
overflow_check = false;
ovf = reloc_insn_imm(RELOC_OP_ABS, loc, val, 1, 11,
- INSN_IMM_12);
+ AARCH64_INSN_IMM_12);
break;
case R_AARCH64_LDST32_ABS_LO12_NC:
overflow_check = false;
ovf = reloc_insn_imm(RELOC_OP_ABS, loc, val, 2, 10,
- INSN_IMM_12);
+ AARCH64_INSN_IMM_12);
break;
case R_AARCH64_LDST64_ABS_LO12_NC:
overflow_check = false;
ovf = reloc_insn_imm(RELOC_OP_ABS, loc, val, 3, 9,
- INSN_IMM_12);
+ AARCH64_INSN_IMM_12);
break;
case R_AARCH64_LDST128_ABS_LO12_NC:
overflow_check = false;
ovf = reloc_insn_imm(RELOC_OP_ABS, loc, val, 4, 8,
- INSN_IMM_12);
+ AARCH64_INSN_IMM_12);
break;
case R_AARCH64_TSTBR14:
ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2, 14,
- INSN_IMM_14);
+ AARCH64_INSN_IMM_14);
break;
case R_AARCH64_CONDBR19:
ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2, 19,
- INSN_IMM_19);
+ AARCH64_INSN_IMM_19);
break;
case R_AARCH64_JUMP26:
case R_AARCH64_CALL26:
ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2, 26,
- INSN_IMM_26);
+ AARCH64_INSN_IMM_26);
break;

default:
--
1.8.1.2

2013-10-16 03:22:27

by Jiang Liu

Subject: [PATCH v3 4/7] arm64: introduce aarch64_insn_gen_{nop|branch_imm}() helper functions

From: Jiang Liu <[email protected]>

Introduce aarch64_insn_gen_{nop|branch_imm}() helper functions, which
will be used to implement jump labels on ARM64.

Signed-off-by: Jiang Liu <[email protected]>
Cc: Jiang Liu <[email protected]>
---
arch/arm64/include/asm/insn.h | 7 +++++++
arch/arm64/kernel/insn.c | 27 +++++++++++++++++++++++++++
2 files changed, 34 insertions(+)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 8dc0a91..87c44b2 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -61,6 +61,13 @@ __AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
u32 insn, u64 imm);
+u32 aarch64_insn_gen_branch_imm(unsigned long pc, unsigned long addr,
+ bool link);
+static __always_inline u32 aarch64_insn_gen_nop(void)
+{
+ return aarch64_insn_get_nop_value();
+}
+
u32 aarch64_insn_read(void *addr);
void aarch64_insn_write(void *addr, u32 insn);
bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 90cc312..c63fae6 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -14,6 +14,7 @@
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
+#include <linux/bitops.h>
#include <linux/compiler.h>
#include <linux/kernel.h>
#include <linux/stop_machine.h>
@@ -256,3 +257,29 @@ u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,

return insn;
}
+
+u32 aarch64_insn_gen_branch_imm(unsigned long pc, unsigned long addr, bool link)
+{
+ u32 insn;
+ long offset;
+
+ /*
+ * PC: A 64-bit Program Counter holding the address of the current
+ * instruction. A64 instructions may be word-aligned.
+ */
+ BUG_ON((pc & 0x3) || (addr & 0x3));
+
+ /* B/BR support [-128M, 128M) offset */
+ offset = ((long)addr - (long)pc) >> 2;
+ if (abs(offset) > BIT(25) || offset == BIT(25)) {
+ WARN_ON_ONCE(1);
+ return 0;
+ }
+
+ if (link)
+ insn = aarch64_insn_get_bl_value();
+ else
+ insn = aarch64_insn_get_b_value();
+
+ return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_26, insn, offset);
+}
--
1.8.1.2

2013-10-16 03:22:37

by Jiang Liu

Subject: [PATCH v3 5/7] arm64, jump label: detect %c support for ARM64

From: Jiang Liu <[email protected]>

As with commit a9468f30b5eac6 ("ARM: 7333/2: jump label: detect %c
support for ARM"), this patch detects the same thing for ARM64,
because some ARM64 GCC versions have the same issue.

Some versions of ARM64 GCC which do support asm goto do not
support the %c specifier. Since we need %c to support jump
labels on ARM64, detect it too in the asm goto detection script
to avoid build errors with these versions.

Signed-off-by: Jiang Liu <[email protected]>
Cc: Jiang Liu <[email protected]>
---
scripts/gcc-goto.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/gcc-goto.sh b/scripts/gcc-goto.sh
index a2af2e8..c9469d3 100644
--- a/scripts/gcc-goto.sh
+++ b/scripts/gcc-goto.sh
@@ -5,7 +5,7 @@
cat << "END" | $@ -x c - -c -o /dev/null >/dev/null 2>&1 && echo "y"
int main(void)
{
-#ifdef __arm__
+#if defined(__arm__) || defined(__aarch64__)
/*
* Not related to asm goto, but used by jump label
* and broken on some ARM GCC versions (see GCC Bug 48637).
--
1.8.1.2

2013-10-16 03:23:06

by Jiang Liu

Subject: [PATCH v3 6/7] arm64, jump label: optimize jump label implementation

From: Jiang Liu <[email protected]>

Optimize jump label implementation for ARM64 by dynamically patching
kernel text.

Signed-off-by: Jiang Liu <[email protected]>
Cc: Jiang Liu <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/jump_label.h | 52 ++++++++++++++++++++++++++++++++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/jump_label.c | 60 +++++++++++++++++++++++++++++++++++++
4 files changed, 114 insertions(+)
create mode 100644 arch/arm64/include/asm/jump_label.h
create mode 100644 arch/arm64/kernel/jump_label.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c044548..da388e4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -17,6 +17,7 @@ config ARM64
select GENERIC_SMP_IDLE_THREAD
select GENERIC_TIME_VSYSCALL
select HARDIRQS_SW_RESEND
+ select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_TRACEHOOK
select HAVE_DEBUG_BUGVERBOSE
select HAVE_DEBUG_KMEMLEAK
diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
new file mode 100644
index 0000000..d268fab
--- /dev/null
+++ b/arch/arm64/include/asm/jump_label.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2013 Huawei Ltd.
+ * Author: Jiang Liu <[email protected]>
+ *
+ * Based on arch/arm/include/asm/jump_label.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef _ASM_ARM64_JUMP_LABEL_H
+#define _ASM_ARM64_JUMP_LABEL_H
+#include <linux/types.h>
+
+#ifdef __KERNEL__
+
+#define JUMP_LABEL_NOP_SIZE 4
+
+static __always_inline bool arch_static_branch(struct static_key *key)
+{
+ asm goto("1:\n\t"
+ "nop\n\t"
+ ".pushsection __jump_table, \"aw\"\n\t"
+ ".align 3\n\t"
+ ".quad 1b, %l[l_yes], %c0\n\t"
+ ".popsection\n\t"
+ : : "i"(key) : : l_yes);
+
+ return false;
+l_yes:
+ return true;
+}
+
+#endif /* __KERNEL__ */
+
+typedef u64 jump_label_t;
+
+struct jump_entry {
+ jump_label_t code;
+ jump_label_t target;
+ jump_label_t key;
+};
+
+#endif /* _ASM_ARM64_JUMP_LABEL_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 9af6cb3..b7db65e 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -18,6 +18,7 @@ arm64-obj-$(CONFIG_SMP) += smp.o smp_spin_table.o smp_psci.o
arm64-obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o
arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT)+= hw_breakpoint.o
arm64-obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
+arm64-obj-$(CONFIG_JUMP_LABEL) += jump_label.o

obj-y += $(arm64-obj-y) vdso/
obj-m += $(arm64-obj-m)
diff --git a/arch/arm64/kernel/jump_label.c b/arch/arm64/kernel/jump_label.c
new file mode 100644
index 0000000..74cbc73
--- /dev/null
+++ b/arch/arm64/kernel/jump_label.c
@@ -0,0 +1,60 @@
+/*
+ * Copyright (C) 2013 Huawei Ltd.
+ * Author: Jiang Liu <[email protected]>
+ *
+ * Based on arch/arm/kernel/jump_label.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/kernel.h>
+#include <linux/jump_label.h>
+#include <asm/jump_label.h>
+#include <asm/insn.h>
+
+#ifdef HAVE_JUMP_LABEL
+
+static void __arch_jump_label_transform(struct jump_entry *entry,
+ enum jump_label_type type,
+ bool is_static)
+{
+ void *addr = (void *)entry->code;
+ u32 insn;
+
+ if (type == JUMP_LABEL_ENABLE) {
+ /* no way out if instruction offset is out of range(+/-128M) */
+ insn = aarch64_insn_gen_branch_imm(entry->code,
+ entry->target, 0);
+ BUG_ON(!insn);
+ } else {
+ insn = aarch64_insn_gen_nop();
+ }
+
+ if (is_static)
+ aarch64_insn_patch_text_nosync(&addr, &insn, 1);
+ else
+ aarch64_insn_patch_text(&addr, &insn, 1);
+}
+
+void arch_jump_label_transform(struct jump_entry *entry,
+ enum jump_label_type type)
+{
+ __arch_jump_label_transform(entry, type, false);
+}
+
+void arch_jump_label_transform_static(struct jump_entry *entry,
+ enum jump_label_type type)
+{
+ __arch_jump_label_transform(entry, type, true);
+}
+
+#endif /* HAVE_JUMP_LABEL */
--
1.8.1.2

2013-10-16 03:23:17

by Jiang Liu

Subject: [PATCH v3 7/7] jump_label: use defined macros instead of hard-coding for better readability

From: Jiang Liu <[email protected]>

Use defined macros instead of hard-coded values for better
readability.

Signed-off-by: Jiang Liu <[email protected]>
Cc: Jiang Liu <[email protected]>
---
include/linux/jump_label.h | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index a507907..6e54029 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -74,18 +74,21 @@ struct module;
#include <linux/atomic.h>
#ifdef HAVE_JUMP_LABEL

-#define JUMP_LABEL_TRUE_BRANCH 1UL
+#define JUMP_LABEL_TYPE_FALSE_BRANCH 0UL
+#define JUMP_LABEL_TYPE_TRUE_BRANCH 1UL
+#define JUMP_LABEL_TYPE_MASK 1UL

static
inline struct jump_entry *jump_label_get_entries(struct static_key *key)
{
return (struct jump_entry *)((unsigned long)key->entries
- & ~JUMP_LABEL_TRUE_BRANCH);
+ & ~JUMP_LABEL_TYPE_MASK);
}

static inline bool jump_label_get_branch_default(struct static_key *key)
{
- if ((unsigned long)key->entries & JUMP_LABEL_TRUE_BRANCH)
+ if (((unsigned long)key->entries & JUMP_LABEL_TYPE_MASK) ==
+ JUMP_LABEL_TYPE_TRUE_BRANCH)
return true;
return false;
}
@@ -116,9 +119,11 @@ extern void static_key_slow_dec(struct static_key *key);
extern void jump_label_apply_nops(struct module *mod);

#define STATIC_KEY_INIT_TRUE ((struct static_key) \
- { .enabled = ATOMIC_INIT(1), .entries = (void *)1 })
+ { .enabled = ATOMIC_INIT(1), \
+ .entries = (void *)JUMP_LABEL_TYPE_TRUE_BRANCH })
#define STATIC_KEY_INIT_FALSE ((struct static_key) \
- { .enabled = ATOMIC_INIT(0), .entries = (void *)0 })
+ { .enabled = ATOMIC_INIT(0), \
+ .entries = (void *)JUMP_LABEL_TYPE_FALSE_BRANCH })

#else /* !HAVE_JUMP_LABEL */

--
1.8.1.2

2013-10-16 10:52:24

by Will Deacon

Subject: Re: [PATCH v3 1/7] arm64: introduce basic aarch64 instruction decoding helpers

On Wed, Oct 16, 2013 at 04:18:06AM +0100, Jiang Liu wrote:
> From: Jiang Liu <[email protected]>
>
> Introduce basic aarch64 instruction decoding helper
> aarch64_get_insn_class() and aarch64_insn_hotpatch_safe().
>
> Signed-off-by: Jiang Liu <[email protected]>
> Cc: Jiang Liu <[email protected]>
> ---
> arch/arm64/include/asm/insn.h | 53 ++++++++++++++++++++++++++
> arch/arm64/kernel/Makefile | 2 +-
> arch/arm64/kernel/insn.c | 86 +++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 140 insertions(+), 1 deletion(-)
> create mode 100644 arch/arm64/include/asm/insn.h
> create mode 100644 arch/arm64/kernel/insn.c
>
> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> new file mode 100644
> index 0000000..e7d1bc8
> --- /dev/null
> +++ b/arch/arm64/include/asm/insn.h
> @@ -0,0 +1,53 @@
> +/*
> + * Copyright (C) 2013 Huawei Ltd.
> + * Author: Jiang Liu <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef _ASM_ARM64_INSN_H
> +#define _ASM_ARM64_INSN_H
> +#include <linux/types.h>
> +
> +enum aarch64_insn_class {
> + AARCH64_INSN_CLS_UNKNOWN, /* UNALLOCATED */
> + AARCH64_INSN_CLS_DP_IMM, /* Data processing - immediate */
> + AARCH64_INSN_CLS_DP_REG, /* Data processing - register */
> + AARCH64_INSN_CLS_DP_FPSIMD, /* Data processing - SIMD and FP */
> + AARCH64_INSN_CLS_LDST, /* Loads and stores */
> + AARCH64_INSN_CLS_BR_SYS, /* Branch, exception generation and
> + * system instructions */
> +};

Strictly speaking, these are encoding groups, not instruction classes.

> +
> +#define __AARCH64_INSN_FUNCS(abbr, mask, val) \
> +static __always_inline bool aarch64_insn_is_##abbr(u32 code) \
> +{ return (code & (mask)) == (val); } \
> +static __always_inline u32 aarch64_insn_get_##abbr##_mask(void) \
> +{ return (mask); } \
> +static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
> +{ return (val); }
> +
> +__AARCH64_INSN_FUNCS(b, 0xFC000000, 0x14000000)
> +__AARCH64_INSN_FUNCS(bl, 0xFC000000, 0x94000000)
> +__AARCH64_INSN_FUNCS(svc, 0xFFE0001F, 0xD4000001)
> +__AARCH64_INSN_FUNCS(hvc, 0xFFE0001F, 0xD4000002)
> +__AARCH64_INSN_FUNCS(smc, 0xFFE0001F, 0xD4000003)
> +__AARCH64_INSN_FUNCS(brk, 0xFFE0001F, 0xD4200000)
> +__AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
> +
> +#undef __AARCH64_INSN_FUNCS
> +
> +enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
> +
> +bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
> +
> +#endif /* _ASM_ARM64_INSN_H */
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 7b4b564..9af6cb3 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -9,7 +9,7 @@ AFLAGS_head.o := -DTEXT_OFFSET=$(TEXT_OFFSET)
> arm64-obj-y := cputable.o debug-monitors.o entry.o irq.o fpsimd.o \
> entry-fpsimd.o process.o ptrace.o setup.o signal.o \
> sys.o stacktrace.o time.o traps.o io.o vdso.o \
> - hyp-stub.o psci.o
> + hyp-stub.o psci.o insn.o
>
> arm64-obj-$(CONFIG_COMPAT) += sys32.o kuser32.o signal32.o \
> sys_compat.o
> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> new file mode 100644
> index 0000000..1be4d11
> --- /dev/null
> +++ b/arch/arm64/kernel/insn.c
> @@ -0,0 +1,86 @@
> +/*
> + * Copyright (C) 2013 Huawei Ltd.
> + * Author: Jiang Liu <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +#include <linux/compiler.h>
> +#include <linux/kernel.h>
> +#include <asm/insn.h>
> +
> +/*
> + * ARM Architecture Reference Manual ARMv8, Section C3.1
> + * AARCH64 main encoding table

AArch64.

> + * Bit position
> + * 28 27 26 25 Encoding Group
> + * 0 0 - - Unallocated
> + * 1 0 0 - Data processing, immediate
> + * 1 0 1 - Branch, exception generation and system instructions
> + * - 1 - 0 Loads and stores
> + * - 1 0 1 Data processing - register
> + * 0 1 1 1 Data processing - SIMD and floating point
> + * 1 1 1 1 Data processing - SIMD and floating point
> + * "-" means "don't care"
> + */

This comment would probably be better off in the header file, along with the
mask/value pairs you use there. Failing that, the table below should at
least live alongside the enum type and mask/value stuff.

> +static int aarch64_insn_cls[] = {
> + AARCH64_INSN_CLS_UNKNOWN,
> + AARCH64_INSN_CLS_UNKNOWN,
> + AARCH64_INSN_CLS_UNKNOWN,
> + AARCH64_INSN_CLS_UNKNOWN,
> + AARCH64_INSN_CLS_LDST,
> + AARCH64_INSN_CLS_DP_REG,
> + AARCH64_INSN_CLS_LDST,
> + AARCH64_INSN_CLS_DP_FPSIMD,
> + AARCH64_INSN_CLS_DP_IMM,
> + AARCH64_INSN_CLS_DP_IMM,
> + AARCH64_INSN_CLS_BR_SYS,
> + AARCH64_INSN_CLS_BR_SYS,
> + AARCH64_INSN_CLS_LDST,
> + AARCH64_INSN_CLS_DP_REG,
> + AARCH64_INSN_CLS_LDST,
> + AARCH64_INSN_CLS_DP_FPSIMD,
> +};
> +
> +enum aarch64_insn_class __kprobes aarch64_get_insn_class(u32 insn)
> +{
> + return aarch64_insn_cls[(insn >> 25) & 0xf];
> +}
> +
> +static bool __kprobes __aarch64_insn_hotpatch_safe(u32 insn)
> +{
> + if (aarch64_get_insn_class(insn) != AARCH64_INSN_CLS_BR_SYS)
> + return false;
> +
> + return aarch64_insn_is_b(insn) ||
> + aarch64_insn_is_bl(insn) ||
> + aarch64_insn_is_svc(insn) ||
> + aarch64_insn_is_hvc(insn) ||
> + aarch64_insn_is_smc(insn) ||
> + aarch64_insn_is_brk(insn) ||
> + aarch64_insn_is_nop(insn);
> +}
> +
> +/*
> + * ARMv8-A Section B2.6.5:
> + * Concurrent modification and execution of instructions can lead to the
> + * resulting instruction performing any behavior that can be achieved by
> + * executing any sequence of instructions that can be executed from the
> + * same Exception level, except where the instruction before modification
> + * and the instruction after modification is a B, BL, NOP, BKPT, SVC, HVC,
> + * or SMC instruction.
> + */
> +bool __kprobes aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn)
> +{
> + return __aarch64_insn_hotpatch_safe(old_insn) &&
> + __aarch64_insn_hotpatch_safe(new_insn);

Take care here:

- This doesn't guarantee that CPUs exclusively see new_insn (they
may see old_insn, even after patching)

- If you patch a conditional branch and change both the target address
*and* the condition, you can get combinations of target/condition
between old_insn and new_insn.

Then again, the jump_label code should always be patching between B and NOP
instructions, right?

Will

2013-10-16 11:12:13

by Will Deacon

Subject: Re: [PATCH v3 2/7] arm64: introduce interfaces to hotpatch kernel and module code

On Wed, Oct 16, 2013 at 04:18:07AM +0100, Jiang Liu wrote:
> From: Jiang Liu <[email protected]>
>
> Introduce three interfaces to patch kernel and module code:
> aarch64_insn_patch_text_nosync():
> patch code without synchronization; it is the caller's responsibility
> to synchronize all CPUs if needed.
> aarch64_insn_patch_text_sync():
> patch code and always synchronize with stop_machine()
> aarch64_insn_patch_text():
> patch code and synchronize with stop_machine() if needed
>
> Signed-off-by: Jiang Liu <[email protected]>
> Cc: Jiang Liu <[email protected]>
> ---
> arch/arm64/include/asm/insn.h | 7 +++-
> arch/arm64/kernel/insn.c | 95 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 101 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> index e7d1bc8..2dfcdb4 100644
> --- a/arch/arm64/include/asm/insn.h
> +++ b/arch/arm64/include/asm/insn.h
> @@ -47,7 +47,12 @@ __AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
> #undef __AARCH64_INSN_FUNCS
>
> enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
> -
> +u32 aarch64_insn_read(void *addr);
> +void aarch64_insn_write(void *addr, u32 insn);
> bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
>
> +int aarch64_insn_patch_text_nosync(void *addrs[], u32 insns[], int cnt);
> +int aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt);
> +int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
> +
> #endif /* _ASM_ARM64_INSN_H */
> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> index 1be4d11..ad4185f 100644
> --- a/arch/arm64/kernel/insn.c
> +++ b/arch/arm64/kernel/insn.c
> @@ -16,6 +16,8 @@
> */
> #include <linux/compiler.h>
> #include <linux/kernel.h>
> +#include <linux/stop_machine.h>
> +#include <asm/cacheflush.h>
> #include <asm/insn.h>
>
> /*
> @@ -84,3 +86,96 @@ bool __kprobes aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn)
> return __aarch64_insn_hotpatch_safe(old_insn) &&
> __aarch64_insn_hotpatch_safe(new_insn);
> }
> +
> +/*
> + * In ARMv8-A, A64 instructions have a fixed length of 32 bits and are always
> + * little-endian. On the other hand, SCTLR_EL1.EE (bit 25, Exception Endianness)
> + * flag controls endianness for EL1 explicit data accesses and stage 1
> + * translation table walks as below:
> + * 0: little-endian
> + * 1: big-endian
> + * So need to handle endianness when patching kernel code.
> + */
> +u32 __kprobes aarch64_insn_read(void *addr)
> +{
> + u32 insn;
> +
> +#ifdef __AARCH64EB__
> + insn = swab32(*(u32 *)addr);
> +#else
> + insn = *(u32 *)addr;
> +#endif

le32_to_cpu ?

> +
> + return insn;
> +}
> +
> +void __kprobes aarch64_insn_write(void *addr, u32 insn)
> +{
> +#ifdef __AARCH64EB__
> + *(u32 *)addr = swab32(insn);
> +#else
> + *(u32 *)addr = insn;
> +#endif
> +}

cpu_to_le32 ?
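
Both helpers could then collapse to something like this sketch (keeping
the __kprobes annotations from the patch):

u32 __kprobes aarch64_insn_read(void *addr)
{
	/* A64 instructions are always little-endian in memory */
	return le32_to_cpu(*(__le32 *)addr);
}

void __kprobes aarch64_insn_write(void *addr, u32 insn)
{
	*(__le32 *)addr = cpu_to_le32(insn);
}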

> +int __kprobes aarch64_insn_patch_text_nosync(void *addrs[], u32 insns[],
> + int cnt)
> +{
> + int i;
> + u32 *tp;
> +
> + if (cnt <= 0)
> + return -EINVAL;
> +
> + for (i = 0; i < cnt; i++) {
> + tp = addrs[i];
> + /* A64 instructions must be word aligned */
> + if ((uintptr_t)tp & 0x3)
> + return -EINVAL;
> + aarch64_insn_write(tp, insns[i]);
> + flush_icache_range((uintptr_t)tp, (uintptr_t)tp + sizeof(u32));

What are you trying to achieve with this cache maintenance for the nosync
case? If you're not synchronising, then you will always have races with the
instruction patching, so I'd argue that this cache flush doesn't buy you
anything.

> + }
> +
> + return 0;
> +}
> +
> +struct aarch64_insn_patch {
> + void **text_addrs;
> + u32 *new_insns;
> + int insn_cnt;
> +};
> +
> +static int __kprobes aarch64_insn_patch_text_cb(void *arg)
> +{
> + struct aarch64_insn_patch *pp = arg;
> +
> + return aarch64_insn_patch_text_nosync(pp->text_addrs, pp->new_insns,
> + pp->insn_cnt);
> +}
> +
> +int __kprobes aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt)
> +{
> + struct aarch64_insn_patch patch = {
> + .text_addrs = addrs,
> + .new_insns = insns,
> + .insn_cnt = cnt,
> + };
> +
> + if (cnt <= 0)
> + return -EINVAL;
> +
> + /*
> + * Execute __aarch64_insn_patch_text() on every online CPU,
> + * which ensure serialization among all online CPUs.
> + */
> + return stop_machine(aarch64_insn_patch_text_cb, &patch, NULL);
> +}
> +
> +int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
> +{
> + if (cnt == 1 && aarch64_insn_hotpatch_safe(aarch64_insn_read(addrs[0]),
> + insns[0]))
> + return aarch64_insn_patch_text_nosync(addrs, insns, cnt);
> + else
> + return aarch64_insn_patch_text_sync(addrs, insns, cnt);
> +}

The other way of doing this for cnt > 1 would be to patch in a branch to
the insns array and then a branch over the original code at the end of the
array. Obviously, this relies on insns being allocated somewhere persistent
(and non-pageable!).
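
Schematically, that alternative would look something like this (sketch
only, not implemented in this series):

/*
 *   text + 0       : b  insns           <- single, hotpatch-safe write
 *   text + 4       : <old insn 1>          (skipped once the branch is live)
 *   ...
 *   text + 4*cnt   : <next insn>        <- execution continues here
 *
 *   insns[0..cnt-1] : the cnt new instructions
 *   insns[cnt]      : b  text + 4*cnt   <- branch back over the old code
 */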

I was wondering whether you could do something clever with BKPT, but
everything I've thought of so far is racy.

Will

2013-10-16 11:22:56

by Will Deacon

Subject: Re: [PATCH v3 3/7] arm64: move encode_insn_immediate() from module.c to insn.c

On Wed, Oct 16, 2013 at 04:18:08AM +0100, Jiang Liu wrote:
> From: Jiang Liu <[email protected]>
>
> Function encode_insn_immediate() will be used by other instruction
> manipulation functions, so move it into insn.c and rename it
> to aarch64_insn_encode_immediate().
>
> Signed-off-by: Jiang Liu <[email protected]>
> Cc: Jiang Liu <[email protected]>
> ---
> arch/arm64/include/asm/insn.h | 14 ++++
> arch/arm64/kernel/insn.c | 77 +++++++++++++++++++++
> arch/arm64/kernel/module.c | 151 +++++++++---------------------------------
> 3 files changed, 123 insertions(+), 119 deletions(-)
>
> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> index 2dfcdb4..8dc0a91 100644
> --- a/arch/arm64/include/asm/insn.h
> +++ b/arch/arm64/include/asm/insn.h
> @@ -28,6 +28,18 @@ enum aarch64_insn_class {
> * system instructions */
> };
>
> +enum aarch64_insn_imm_type {
> + AARCH64_INSN_IMM_MOVNZ,
> + AARCH64_INSN_IMM_MOVK,
> + AARCH64_INSN_IMM_ADR,
> + AARCH64_INSN_IMM_26,
> + AARCH64_INSN_IMM_19,
> + AARCH64_INSN_IMM_16,
> + AARCH64_INSN_IMM_14,
> + AARCH64_INSN_IMM_12,
> + AARCH64_INSN_IMM_9,
> +};
> +
> #define __AARCH64_INSN_FUNCS(abbr, mask, val) \
> static __always_inline bool aarch64_insn_is_##abbr(u32 code) \
> { return (code & (mask)) == (val); } \
> @@ -47,6 +59,8 @@ __AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
> #undef __AARCH64_INSN_FUNCS
>
> enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
> +u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
> + u32 insn, u64 imm);
> u32 aarch64_insn_read(void *addr);
> void aarch64_insn_write(void *addr, u32 insn);
> bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> index ad4185f..90cc312 100644
> --- a/arch/arm64/kernel/insn.c
> +++ b/arch/arm64/kernel/insn.c
> @@ -179,3 +179,80 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
> else
> return aarch64_insn_patch_text_sync(addrs, insns, cnt);
> }
> +
> +u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
> + u32 insn, u64 imm)
> +{
> + u32 immlo, immhi, lomask, himask, mask;
> + int shift;
> +
> + switch (type) {
> + case AARCH64_INSN_IMM_MOVNZ:
> + /*
> + * For signed MOVW relocations, we have to manipulate the
> + * instruction encoding depending on whether or not the
> + * immediate is less than zero.
> + */
> + insn &= ~(3 << 29);
> + if ((s64)imm >= 0) {
> + /* >=0: Set the instruction to MOVZ (opcode 10b). */
> + insn |= 2 << 29;
> + } else {
> + /*
> + * <0: Set the instruction to MOVN (opcode 00b).
> + * Since we've masked the opcode already, we
> + * don't need to do anything other than
> + * inverting the new immediate field.
> + */
> + imm = ~imm;
> + }

I'm really not comfortable with this. This code is performing static
relocations and re-encoding instructions as required by the AArch64 ELF
spec. That's not really what you'd expect from a generic instruction
encoder!

Will

2013-10-16 11:46:49

by Will Deacon

Subject: Re: [PATCH v3 6/7] arm64, jump label: optimize jump label implementation

On Wed, Oct 16, 2013 at 04:18:11AM +0100, Jiang Liu wrote:
> From: Jiang Liu <[email protected]>
>
> Optimize jump label implementation for ARM64 by dynamically patching
> kernel text.
>
> Signed-off-by: Jiang Liu <[email protected]>
> Cc: Jiang Liu <[email protected]>
> ---
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/jump_label.h | 52 ++++++++++++++++++++++++++++++++
> arch/arm64/kernel/Makefile | 1 +
> arch/arm64/kernel/jump_label.c | 60 +++++++++++++++++++++++++++++++++++++
> 4 files changed, 114 insertions(+)
> create mode 100644 arch/arm64/include/asm/jump_label.h
> create mode 100644 arch/arm64/kernel/jump_label.c
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c044548..da388e4 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -17,6 +17,7 @@ config ARM64
> select GENERIC_SMP_IDLE_THREAD
> select GENERIC_TIME_VSYSCALL
> select HARDIRQS_SW_RESEND
> + select HAVE_ARCH_JUMP_LABEL
> select HAVE_ARCH_TRACEHOOK
> select HAVE_DEBUG_BUGVERBOSE
> select HAVE_DEBUG_KMEMLEAK
> diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
> new file mode 100644
> index 0000000..d268fab
> --- /dev/null
> +++ b/arch/arm64/include/asm/jump_label.h
> @@ -0,0 +1,52 @@
> +/*
> + * Copyright (C) 2013 Huawei Ltd.
> + * Author: Jiang Liu <[email protected]>
> + *
> + * Based on arch/arm/include/asm/jump_label.h
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef _ASM_ARM64_JUMP_LABEL_H
> +#define _ASM_ARM64_JUMP_LABEL_H
> +#include <linux/types.h>
> +
> +#ifdef __KERNEL__
> +
> +#define JUMP_LABEL_NOP_SIZE 4
> +
> +static __always_inline bool arch_static_branch(struct static_key *key)
> +{
> + asm goto("1:\n\t"
> + "nop\n\t"
> + ".pushsection __jump_table, \"aw\"\n\t"
> + ".align 3\n\t"
> + ".quad 1b, %l[l_yes], %c0\n\t"
> + ".popsection\n\t"
> + : : "i"(key) : : l_yes);
> +
> + return false;
> +l_yes:
> + return true;
> +}
> +
> +#endif /* __KERNEL__ */
> +
> +typedef u64 jump_label_t;
> +
> +struct jump_entry {
> + jump_label_t code;
> + jump_label_t target;
> + jump_label_t key;
> +};
> +
> +#endif /* _ASM_ARM64_JUMP_LABEL_H */
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 9af6cb3..b7db65e 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -18,6 +18,7 @@ arm64-obj-$(CONFIG_SMP) += smp.o smp_spin_table.o smp_psci.o
> arm64-obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o
> arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT)+= hw_breakpoint.o
> arm64-obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
> +arm64-obj-$(CONFIG_JUMP_LABEL) += jump_label.o
>
> obj-y += $(arm64-obj-y) vdso/
> obj-m += $(arm64-obj-m)
> diff --git a/arch/arm64/kernel/jump_label.c b/arch/arm64/kernel/jump_label.c
> new file mode 100644
> index 0000000..74cbc73
> --- /dev/null
> +++ b/arch/arm64/kernel/jump_label.c
> @@ -0,0 +1,60 @@
> +/*
> + * Copyright (C) 2013 Huawei Ltd.
> + * Author: Jiang Liu <[email protected]>
> + *
> + * Based on arch/arm/kernel/jump_label.c
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +#include <linux/kernel.h>
> +#include <linux/jump_label.h>
> +#include <asm/jump_label.h>
> +#include <asm/insn.h>
> +
> +#ifdef HAVE_JUMP_LABEL
> +
> +static void __arch_jump_label_transform(struct jump_entry *entry,
> + enum jump_label_type type,
> + bool is_static)
> +{
> + void *addr = (void *)entry->code;
> + u32 insn;
> +
> + if (type == JUMP_LABEL_ENABLE) {
> + /* no way out if instruction offset is out of range(+/-128M) */
> + insn = aarch64_insn_gen_branch_imm(entry->code,
> + entry->target, 0);
> + BUG_ON(!insn);

In this case, rather than BUG_ON, it would be better (somehow) to abort
optimisation and instead patch in a branch to the actual condition check.

> + } else {
> + insn = aarch64_insn_gen_nop();

You could make the code more concise by limiting your patching ability to
branch immediates. Then a nop is simply a branch to the next instruction (I
doubt any modern CPUs will choke on this, whereas the architecture requires
a NOP to take time).
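
A sketch of that simplification in __arch_jump_label_transform():

	if (type == JUMP_LABEL_ENABLE)
		insn = aarch64_insn_gen_branch_imm(entry->code,
						   entry->target, 0);
	else
		/* b .+4 branches to the next instruction, behaving as a NOP */
		insn = aarch64_insn_gen_branch_imm(entry->code,
						   entry->code + 4, 0);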

Will

2013-10-16 15:36:36

by Jiang Liu

Subject: Re: [PATCH v3 1/7] arm64: introduce basic aarch64 instruction decoding helpers

On 10/16/2013 06:51 PM, Will Deacon wrote:
> On Wed, Oct 16, 2013 at 04:18:06AM +0100, Jiang Liu wrote:
>> From: Jiang Liu <[email protected]>
>>
>> Introduce basic aarch64 instruction decoding helper
>> aarch64_get_insn_class() and aarch64_insn_hotpatch_safe().
>>
>> Signed-off-by: Jiang Liu <[email protected]>
>> Cc: Jiang Liu <[email protected]>
>> ---
>> arch/arm64/include/asm/insn.h | 53 ++++++++++++++++++++++++++
>> arch/arm64/kernel/Makefile | 2 +-
>> arch/arm64/kernel/insn.c | 86 +++++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 140 insertions(+), 1 deletion(-)
>> create mode 100644 arch/arm64/include/asm/insn.h
>> create mode 100644 arch/arm64/kernel/insn.c
>>
>> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
>> new file mode 100644
>> index 0000000..e7d1bc8
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/insn.h
>> @@ -0,0 +1,53 @@
>> +/*
>> + * Copyright (C) 2013 Huawei Ltd.
>> + * Author: Jiang Liu <[email protected]>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +#ifndef _ASM_ARM64_INSN_H
>> +#define _ASM_ARM64_INSN_H
>> +#include <linux/types.h>
>> +
>> +enum aarch64_insn_class {
>> + AARCH64_INSN_CLS_UNKNOWN, /* UNALLOCATED */
>> + AARCH64_INSN_CLS_DP_IMM, /* Data processing - immediate */
>> + AARCH64_INSN_CLS_DP_REG, /* Data processing - register */
>> + AARCH64_INSN_CLS_DP_FPSIMD, /* Data processing - SIMD and FP */
>> + AARCH64_INSN_CLS_LDST, /* Loads and stores */
>> + AARCH64_INSN_CLS_BR_SYS, /* Branch, exception generation and
>> + * system instructions */
>> +};
>
> Strictly speaking, these are encoding groups, not instruction classes.
Hi Will,
Thanks for review.
Will change it to "aarch64_insn_encoding_class".

>
>> +
>> +#define __AARCH64_INSN_FUNCS(abbr, mask, val) \
>> +static __always_inline bool aarch64_insn_is_##abbr(u32 code) \
>> +{ return (code & (mask)) == (val); } \
>> +static __always_inline u32 aarch64_insn_get_##abbr##_mask(void) \
>> +{ return (mask); } \
>> +static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
>> +{ return (val); }
>> +
>> +__AARCH64_INSN_FUNCS(b, 0xFC000000, 0x14000000)
>> +__AARCH64_INSN_FUNCS(bl, 0xFC000000, 0x94000000)
>> +__AARCH64_INSN_FUNCS(svc, 0xFFE0001F, 0xD4000001)
>> +__AARCH64_INSN_FUNCS(hvc, 0xFFE0001F, 0xD4000002)
>> +__AARCH64_INSN_FUNCS(smc, 0xFFE0001F, 0xD4000003)
>> +__AARCH64_INSN_FUNCS(brk, 0xFFE0001F, 0xD4200000)
>> +__AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
>> +
>> +#undef __AARCH64_INSN_FUNCS
>> +
>> +enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
>> +
>> +bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
>> +
>> +#endif /* _ASM_ARM64_INSN_H */
>> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
>> index 7b4b564..9af6cb3 100644
>> --- a/arch/arm64/kernel/Makefile
>> +++ b/arch/arm64/kernel/Makefile
>> @@ -9,7 +9,7 @@ AFLAGS_head.o := -DTEXT_OFFSET=$(TEXT_OFFSET)
>> arm64-obj-y := cputable.o debug-monitors.o entry.o irq.o fpsimd.o \
>> entry-fpsimd.o process.o ptrace.o setup.o signal.o \
>> sys.o stacktrace.o time.o traps.o io.o vdso.o \
>> - hyp-stub.o psci.o
>> + hyp-stub.o psci.o insn.o
>>
>> arm64-obj-$(CONFIG_COMPAT) += sys32.o kuser32.o signal32.o \
>> sys_compat.o
>> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
>> new file mode 100644
>> index 0000000..1be4d11
>> --- /dev/null
>> +++ b/arch/arm64/kernel/insn.c
>> @@ -0,0 +1,86 @@
>> +/*
>> + * Copyright (C) 2013 Huawei Ltd.
>> + * Author: Jiang Liu <[email protected]>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +#include <linux/compiler.h>
>> +#include <linux/kernel.h>
>> +#include <asm/insn.h>
>> +
>> +/*
>> + * ARM Architecture Reference Manual ARMv8, Section C3.1
>> + * AARCH64 main encoding table
>
> AArch64.
Fix it in next version.

>
>> + * Bit position
>> + * 28 27 26 25 Encoding Group
>> + * 0 0 - - Unallocated
>> + * 1 0 0 - Data processing, immediate
>> + * 1 0 1 - Branch, exception generation and system instructions
>> + * - 1 - 0 Loads and stores
>> + * - 1 0 1 Data processing - register
>> + * 0 1 1 1 Data processing - SIMD and floating point
>> + * 1 1 1 1 Data processing - SIMD and floating point
>> + * "-" means "don't care"
>> + */
>
> This comment would probably be better off in the header file, along with the
> mask/value pairs you use there. Failing that, the table below should at
> least live alongside the enum type and mask/value stuff.
Will move it into insn.h

>
>> +static int aarch64_insn_cls[] = {
>> + AARCH64_INSN_CLS_UNKNOWN,
>> + AARCH64_INSN_CLS_UNKNOWN,
>> + AARCH64_INSN_CLS_UNKNOWN,
>> + AARCH64_INSN_CLS_UNKNOWN,
>> + AARCH64_INSN_CLS_LDST,
>> + AARCH64_INSN_CLS_DP_REG,
>> + AARCH64_INSN_CLS_LDST,
>> + AARCH64_INSN_CLS_DP_FPSIMD,
>> + AARCH64_INSN_CLS_DP_IMM,
>> + AARCH64_INSN_CLS_DP_IMM,
>> + AARCH64_INSN_CLS_BR_SYS,
>> + AARCH64_INSN_CLS_BR_SYS,
>> + AARCH64_INSN_CLS_LDST,
>> + AARCH64_INSN_CLS_DP_REG,
>> + AARCH64_INSN_CLS_LDST,
>> + AARCH64_INSN_CLS_DP_FPSIMD,
>> +};
>> +
>> +enum aarch64_insn_class __kprobes aarch64_get_insn_class(u32 insn)
>> +{
>> + return aarch64_insn_cls[(insn >> 25) & 0xf];
>> +}
>> +
>> +static bool __kprobes __aarch64_insn_hotpatch_safe(u32 insn)
>> +{
>> + if (aarch64_get_insn_class(insn) != AARCH64_INSN_CLS_BR_SYS)
>> + return false;
>> +
>> + return aarch64_insn_is_b(insn) ||
>> + aarch64_insn_is_bl(insn) ||
>> + aarch64_insn_is_svc(insn) ||
>> + aarch64_insn_is_hvc(insn) ||
>> + aarch64_insn_is_smc(insn) ||
>> + aarch64_insn_is_brk(insn) ||
>> + aarch64_insn_is_nop(insn);
>> +}
>> +
>> +/*
>> + * ARMv8-A Section B2.6.5:
>> + * Concurrent modification and execution of instructions can lead to the
>> + * resulting instruction performing any behavior that can be achieved by
>> + * executing any sequence of instructions that can be executed from the
>> + * same Exception level, except where the instruction before modification
>> + * and the instruction after modification is a B, BL, NOP, BKPT, SVC, HVC,
>> + * or SMC instruction.
>> + */
>> +bool __kprobes aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn)
>> +{
>> + return __aarch64_insn_hotpatch_safe(old_insn) &&
>> + __aarch64_insn_hotpatch_safe(new_insn);
>
> Take care here:
>
> - This doesn't guarantee that CPUs exclusively see new_insn (they
> may see old_insn, even after patching)
Good point. We still need some synchronization mechanism here; how about
kick_all_cpus_sync(), which should be lighter than stop_machine()?
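
As a sketch only (reusing the helpers from this series plus
kick_all_cpus_sync() from <linux/smp.h>; not meant as the final patch),
a hotpatch-safe single-instruction update could then look like:

	static int patch_text_kick(void *addr, u32 insn)
	{
		/* A64 instructions must be word aligned */
		if ((uintptr_t)addr & 0x3)
			return -EINVAL;

		/* only hotpatch-safe old/new instruction pairs qualify */
		if (!aarch64_insn_hotpatch_safe(aarch64_insn_read(addr), insn))
			return -EINVAL;

		aarch64_insn_write(addr, insn);
		flush_icache_range((uintptr_t)addr,
				   (uintptr_t)addr + sizeof(u32));

		/* IPI all online CPUs; the exception return resynchronizes them */
		kick_all_cpus_sync();

		return 0;
	}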

>
> - If you patch a conditional branch and change both the target address
> *and* the condition, you can get combinations of target/condition
> between old_insn and new_insn.
>
> Then again, the jump_label code should always be patching between B and NOP
> instructions, right?
Yeah, should be safe for jump label because it only uses unconditional
B and NOP.

>
> Will
>

2013-10-16 16:15:22

by Jiang Liu

[permalink] [raw]
Subject: Re: [PATCH v3 2/7] arm64: introduce interfaces to hotpatch kernel and module code

On 10/16/2013 07:11 PM, Will Deacon wrote:
> On Wed, Oct 16, 2013 at 04:18:07AM +0100, Jiang Liu wrote:
>> From: Jiang Liu <[email protected]>
>>
>> Introduce three interfaces to patch kernel and module code:
>> aarch64_insn_patch_text_nosync():
>> patch code without synchronization, it's caller's responsibility
>> to synchronize all CPUs if needed.
>> aarch64_insn_patch_text_sync():
>> patch code and always synchronize with stop_machine()
>> aarch64_insn_patch_text():
>> patch code and synchronize with stop_machine() if needed
>>
>> Signed-off-by: Jiang Liu <[email protected]>
>> Cc: Jiang Liu <[email protected]>
>> ---
>> arch/arm64/include/asm/insn.h | 7 +++-
>> arch/arm64/kernel/insn.c | 95 +++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 101 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
>> index e7d1bc8..2dfcdb4 100644
>> --- a/arch/arm64/include/asm/insn.h
>> +++ b/arch/arm64/include/asm/insn.h
>> @@ -47,7 +47,12 @@ __AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
>> #undef __AARCH64_INSN_FUNCS
>>
>> enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
>> -
>> +u32 aarch64_insn_read(void *addr);
>> +void aarch64_insn_write(void *addr, u32 insn);
>> bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
>>
>> +int aarch64_insn_patch_text_nosync(void *addrs[], u32 insns[], int cnt);
>> +int aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt);
>> +int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
>> +
>> #endif /* _ASM_ARM64_INSN_H */
>> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
>> index 1be4d11..ad4185f 100644
>> --- a/arch/arm64/kernel/insn.c
>> +++ b/arch/arm64/kernel/insn.c
>> @@ -16,6 +16,8 @@
>> */
>> #include <linux/compiler.h>
>> #include <linux/kernel.h>
>> +#include <linux/stop_machine.h>
>> +#include <asm/cacheflush.h>
>> #include <asm/insn.h>
>>
>> /*
>> @@ -84,3 +86,96 @@ bool __kprobes aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn)
>> return __aarch64_insn_hotpatch_safe(old_insn) &&
>> __aarch64_insn_hotpatch_safe(new_insn);
>> }
>> +
>> +/*
>> + * In ARMv8-A, A64 instructions have a fixed length of 32 bits and are always
>> + * little-endian. On the other hand, SCTLR_EL1.EE (bit 25, Exception Endianness)
>> + * flag controls endianness for EL1 explicit data accesses and stage 1
>> + * translation table walks as below:
>> + * 0: little-endian
>> + * 1: big-endian
>> + * So need to handle endianness when patching kernel code.
>> + */
>> +u32 __kprobes aarch64_insn_read(void *addr)
>> +{
>> + u32 insn;
>> +
>> +#ifdef __AARCH64EB__
>> + insn = swab32(*(u32 *)addr);
>> +#else
>> + insn = *(u32 *)addr;
>> +#endif
>
> le32_to_cpu ?
>
>> +
>> + return insn;
>> +}
>> +
>> +void __kprobes aarch64_insn_write(void *addr, u32 insn)
>> +{
>> +#ifdef __AARCH64EB__
>> + *(u32 *)addr = swab32(insn);
>> +#else
>> + *(u32 *)addr = insn;
>> +#endif
>> +}
>
> cpu_to_le32 ?
Good suggestion, much simpler.
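
Something like this, as a sketch of the simplified helpers (A64
instructions are always stored little-endian in memory):

	u32 __kprobes aarch64_insn_read(void *addr)
	{
		return le32_to_cpu(*(__le32 *)addr);
	}

	void __kprobes aarch64_insn_write(void *addr, u32 insn)
	{
		*(__le32 *)addr = cpu_to_le32(insn);
	}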

>
>> +int __kprobes aarch64_insn_patch_text_nosync(void *addrs[], u32 insns[],
>> + int cnt)
>> +{
>> + int i;
>> + u32 *tp;
>> +
>> + if (cnt <= 0)
>> + return -EINVAL;
>> +
>> + for (i = 0; i < cnt; i++) {
>> + tp = addrs[i];
>> + /* A64 instructions must be word aligned */
>> + if ((uintptr_t)tp & 0x3)
>> + return -EINVAL;
>> + aarch64_insn_write(tp, insns[i]);
>> + flush_icache_range((uintptr_t)tp, (uintptr_t)tp + sizeof(u32));
>
> What are you trying to achieve with this cache maintenance for the nosync
> case? If you're not synchronising, then you will always have races with the
> instruction patching, so I'd argue that this cache flush doesn't buy you
> anything.
aarch64_insn_patch_text_nosync() may be used in the following cases:
1) during early boot, with only the master CPU running
2) at runtime, with all other CPUs in a controlled state, such as under kgdb
3) when patching hotpatch-safe instructions
So flush_icache_range() is used to support case 1 and ensure the local CPU
sees the new instructions.

>
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +struct aarch64_insn_patch {
>> + void **text_addrs;
>> + u32 *new_insns;
>> + int insn_cnt;
>> +};
>> +
>> +static int __kprobes aarch64_insn_patch_text_cb(void *arg)
>> +{
>> + struct aarch64_insn_patch *pp = arg;
>> +
>> + return aarch64_insn_patch_text_nosync(pp->text_addrs, pp->new_insns,
>> + pp->insn_cnt);
>> +}
>> +
>> +int __kprobes aarch64_insn_patch_text_sync(void *addrs[], u32 insns[], int cnt)
>> +{
>> + struct aarch64_insn_patch patch = {
>> + .text_addrs = addrs,
>> + .new_insns = insns,
>> + .insn_cnt = cnt,
>> + };
>> +
>> + if (cnt <= 0)
>> + return -EINVAL;
>> +
>> + /*
>> + * Execute __aarch64_insn_patch_text() on every online CPU,
>> + * which ensure serialization among all online CPUs.
>> + */
>> + return stop_machine(aarch64_insn_patch_text_cb, &patch, NULL);
>> +}
>> +
>> +int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
>> +{
>> + if (cnt == 1 && aarch64_insn_hotpatch_safe(aarch64_insn_read(addrs[0]),
>> + insns[0]))
>> + return aarch64_insn_patch_text_nosync(addrs, insns, cnt);
>> + else
>> + return aarch64_insn_patch_text_sync(addrs, insns, cnt);
>> +}
>
> The other way of doing this for cnt > 1 would be to patch in a branch to
> the insns array and then a branch over the original code at the end of the
> array. Obviously, this relies on insns being allocated somewhere persistent
> (and non-pageable!).
Kprobes uses the optimization method described above, but it's rather
complex and depends on the instructions in the insns array.

>
> I was wondering whether you could do something clever with BKPT, but
> everything I've thought of so far is racy.
Need more time to think about this optimization.

According to my understanding, it's not safe to patch a normal
instruction (i.e. anything other than B, BL, SVC, HVC, SMC, NOP or BRK)
with BRK either. If that's true, it will be challenging to avoid
stop_machine() here.

Thanks!
Gerry
>
> Will
>

2013-10-16 16:34:06

by Jiang Liu

[permalink] [raw]
Subject: Re: [PATCH v3 3/7] arm64: move encode_insn_immediate() from module.c to insn.c

On 10/16/2013 07:22 PM, Will Deacon wrote:
> On Wed, Oct 16, 2013 at 04:18:08AM +0100, Jiang Liu wrote:
>> From: Jiang Liu <[email protected]>
>>
>> Function encode_insn_immediate() will be used by other instruction
>> manipulate related functions, so move it into insn.c and rename it
>> as aarch64_insn_encode_immediate().
>>
>> Signed-off-by: Jiang Liu <[email protected]>
>> Cc: Jiang Liu <[email protected]>
>> ---
>> arch/arm64/include/asm/insn.h | 14 ++++
>> arch/arm64/kernel/insn.c | 77 +++++++++++++++++++++
>> arch/arm64/kernel/module.c | 151 +++++++++---------------------------------
>> 3 files changed, 123 insertions(+), 119 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
>> index 2dfcdb4..8dc0a91 100644
>> --- a/arch/arm64/include/asm/insn.h
>> +++ b/arch/arm64/include/asm/insn.h
>> @@ -28,6 +28,18 @@ enum aarch64_insn_class {
>> * system instructions */
>> };
>>
>> +enum aarch64_insn_imm_type {
>> + AARCH64_INSN_IMM_MOVNZ,
>> + AARCH64_INSN_IMM_MOVK,
>> + AARCH64_INSN_IMM_ADR,
>> + AARCH64_INSN_IMM_26,
>> + AARCH64_INSN_IMM_19,
>> + AARCH64_INSN_IMM_16,
>> + AARCH64_INSN_IMM_14,
>> + AARCH64_INSN_IMM_12,
>> + AARCH64_INSN_IMM_9,
>> +};
>> +
>> #define __AARCH64_INSN_FUNCS(abbr, mask, val) \
>> static __always_inline bool aarch64_insn_is_##abbr(u32 code) \
>> { return (code & (mask)) == (val); } \
>> @@ -47,6 +59,8 @@ __AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
>> #undef __AARCH64_INSN_FUNCS
>>
>> enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
>> +u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
>> + u32 insn, u64 imm);
>> u32 aarch64_insn_read(void *addr);
>> void aarch64_insn_write(void *addr, u32 insn);
>> bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
>> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
>> index ad4185f..90cc312 100644
>> --- a/arch/arm64/kernel/insn.c
>> +++ b/arch/arm64/kernel/insn.c
>> @@ -179,3 +179,80 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
>> else
>> return aarch64_insn_patch_text_sync(addrs, insns, cnt);
>> }
>> +
>> +u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
>> + u32 insn, u64 imm)
>> +{
>> + u32 immlo, immhi, lomask, himask, mask;
>> + int shift;
>> +
>> + switch (type) {
>> + case AARCH64_INSN_IMM_MOVNZ:
>> + /*
>> + * For signed MOVW relocations, we have to manipulate the
>> + * instruction encoding depending on whether or not the
>> + * immediate is less than zero.
>> + */
>> + insn &= ~(3 << 29);
>> + if ((s64)imm >= 0) {
>> + /* >=0: Set the instruction to MOVZ (opcode 10b). */
>> + insn |= 2 << 29;
>> + } else {
>> + /*
>> + * <0: Set the instruction to MOVN (opcode 00b).
>> + * Since we've masked the opcode already, we
>> + * don't need to do anything other than
>> + * inverting the new immediate field.
>> + */
>> + imm = ~imm;
>> + }
>
> I'm really not comfortable with this. This code is performing static
> relocations and re-encoding instructions as required by the AArch64 ELF
> spec. That's not really what you'd expect from a generic instruction
> encoder!
Thanks for reminder, will move above code back into module.c.
Thanks!
Gerry

>
> Will
>

2013-10-16 17:11:54

by Jiang Liu

[permalink] [raw]
Subject: Re: [PATCH v3 6/7] arm64, jump label: optimize jump label implementation

On 10/16/2013 07:46 PM, Will Deacon wrote:
> On Wed, Oct 16, 2013 at 04:18:11AM +0100, Jiang Liu wrote:
>> From: Jiang Liu <[email protected]>
>>
>> Optimize jump label implementation for ARM64 by dynamically patching
>> kernel text.
>>
>> Signed-off-by: Jiang Liu <[email protected]>
>> Cc: Jiang Liu <[email protected]>
>> ---
>> arch/arm64/Kconfig | 1 +
>> arch/arm64/include/asm/jump_label.h | 52 ++++++++++++++++++++++++++++++++
>> arch/arm64/kernel/Makefile | 1 +
>> arch/arm64/kernel/jump_label.c | 60 +++++++++++++++++++++++++++++++++++++
>> 4 files changed, 114 insertions(+)
>> create mode 100644 arch/arm64/include/asm/jump_label.h
>> create mode 100644 arch/arm64/kernel/jump_label.c
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index c044548..da388e4 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -17,6 +17,7 @@ config ARM64
>> select GENERIC_SMP_IDLE_THREAD
>> select GENERIC_TIME_VSYSCALL
>> select HARDIRQS_SW_RESEND
>> + select HAVE_ARCH_JUMP_LABEL
>> select HAVE_ARCH_TRACEHOOK
>> select HAVE_DEBUG_BUGVERBOSE
>> select HAVE_DEBUG_KMEMLEAK
>> diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
>> new file mode 100644
>> index 0000000..d268fab
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/jump_label.h
>> @@ -0,0 +1,52 @@
>> +/*
>> + * Copyright (C) 2013 Huawei Ltd.
>> + * Author: Jiang Liu <[email protected]>
>> + *
>> + * Based on arch/arm/include/asm/jump_label.h
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +#ifndef _ASM_ARM64_JUMP_LABEL_H
>> +#define _ASM_ARM64_JUMP_LABEL_H
>> +#include <linux/types.h>
>> +
>> +#ifdef __KERNEL__
>> +
>> +#define JUMP_LABEL_NOP_SIZE 4
>> +
>> +static __always_inline bool arch_static_branch(struct static_key *key)
>> +{
>> + asm goto("1:\n\t"
>> + "nop\n\t"
>> + ".pushsection __jump_table, \"aw\"\n\t"
>> + ".align 3\n\t"
>> + ".quad 1b, %l[l_yes], %c0\n\t"
>> + ".popsection\n\t"
>> + : : "i"(key) : : l_yes);
>> +
>> + return false;
>> +l_yes:
>> + return true;
>> +}
>> +
>> +#endif /* __KERNEL__ */
>> +
>> +typedef u64 jump_label_t;
>> +
>> +struct jump_entry {
>> + jump_label_t code;
>> + jump_label_t target;
>> + jump_label_t key;
>> +};
>> +
>> +#endif /* _ASM_ARM64_JUMP_LABEL_H */
>> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
>> index 9af6cb3..b7db65e 100644
>> --- a/arch/arm64/kernel/Makefile
>> +++ b/arch/arm64/kernel/Makefile
>> @@ -18,6 +18,7 @@ arm64-obj-$(CONFIG_SMP) += smp.o smp_spin_table.o smp_psci.o
>> arm64-obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o
>> arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT)+= hw_breakpoint.o
>> arm64-obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
>> +arm64-obj-$(CONFIG_JUMP_LABEL) += jump_label.o
>>
>> obj-y += $(arm64-obj-y) vdso/
>> obj-m += $(arm64-obj-m)
>> diff --git a/arch/arm64/kernel/jump_label.c b/arch/arm64/kernel/jump_label.c
>> new file mode 100644
>> index 0000000..74cbc73
>> --- /dev/null
>> +++ b/arch/arm64/kernel/jump_label.c
>> @@ -0,0 +1,60 @@
>> +/*
>> + * Copyright (C) 2013 Huawei Ltd.
>> + * Author: Jiang Liu <[email protected]>
>> + *
>> + * Based on arch/arm/kernel/jump_label.c
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +#include <linux/kernel.h>
>> +#include <linux/jump_label.h>
>> +#include <asm/jump_label.h>
>> +#include <asm/insn.h>
>> +
>> +#ifdef HAVE_JUMP_LABEL
>> +
>> +static void __arch_jump_label_transform(struct jump_entry *entry,
>> + enum jump_label_type type,
>> + bool is_static)
>> +{
>> + void *addr = (void *)entry->code;
>> + u32 insn;
>> +
>> + if (type == JUMP_LABEL_ENABLE) {
>> + /* no way out if instruction offset is out of range(+/-128M) */
>> + insn = aarch64_insn_gen_branch_imm(entry->code,
>> + entry->target, 0);
>> + BUG_ON(!insn);
>
> In this case, rather than BUG_ON, it would be better (somehow) to abort
> optimisation and instead patch in a branch to the actual condition check.
Hi Will,
The ARM64 virtual address layout already ensures that all kernel and
module text lies within +/-128MB, otherwise even module loading would
fail. So the BUG_ON() should never trigger.

Will add more comments into aarch64_insn_gen_branch_imm().
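
For reference, the generator and its range check look roughly like the
following (a sketch only, not the exact patch code; it reuses
aarch64_insn_encode_immediate() and the B/BL value helpers from this
series):

	u32 aarch64_insn_gen_branch_imm(unsigned long pc, unsigned long addr,
					bool link)
	{
		long offset = (long)addr - (long)pc;
		u32 insn;

		/*
		 * B/BL carry a signed 26-bit word offset, i.e. +/-128MB
		 * from the branch; kernel and module text always fit.
		 */
		if (offset < -SZ_128M || offset >= SZ_128M)
			return 0;

		insn = link ? aarch64_insn_get_bl_value() :
			      aarch64_insn_get_b_value();

		return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_26,
						     insn, offset >> 2);
	}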

>
>> + } else {
>> + insn = aarch64_insn_gen_nop();
>
> You could make the code more concise by limiting your patching ability to
> branch immediates. Then a nop is simply a branch to the next instruction (I
> doubt any modern CPUs will choke on this, whereas the architecture requires
> a NOP to take time).
I guess a NOP should be more efficient than a "B #4" on real CPUs :)

>
> Will
>

2013-10-16 17:14:58

by Jiang Liu

[permalink] [raw]
Subject: Re: [PATCH v3 1/7] arm64: introduce basic aarch64 instruction decoding helpers

On 10/16/2013 11:36 PM, Jiang Liu wrote:
> On 10/16/2013 06:51 PM, Will Deacon wrote:
>> On Wed, Oct 16, 2013 at 04:18:06AM +0100, Jiang Liu wrote:
>>> From: Jiang Liu <[email protected]>
>>>
>>> Introduce basic aarch64 instruction decoding helper
>>> aarch64_get_insn_class() and aarch64_insn_hotpatch_safe().
>>>
>>> Signed-off-by: Jiang Liu <[email protected]>
>>> Cc: Jiang Liu <[email protected]>
>>> ---
>>> arch/arm64/include/asm/insn.h | 53 ++++++++++++++++++++++++++
>>> arch/arm64/kernel/Makefile | 2 +-
>>> arch/arm64/kernel/insn.c | 86 +++++++++++++++++++++++++++++++++++++++++++
>>> 3 files changed, 140 insertions(+), 1 deletion(-)
>>> create mode 100644 arch/arm64/include/asm/insn.h
>>> create mode 100644 arch/arm64/kernel/insn.c
>>>
>>> diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
>>> new file mode 100644
>>> index 0000000..e7d1bc8
>>> --- /dev/null
>>> +++ b/arch/arm64/include/asm/insn.h
>>> @@ -0,0 +1,53 @@
>>> +/*
>>> + * Copyright (C) 2013 Huawei Ltd.
>>> + * Author: Jiang Liu <[email protected]>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 2 as
>>> + * published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +#ifndef _ASM_ARM64_INSN_H
>>> +#define _ASM_ARM64_INSN_H
>>> +#include <linux/types.h>
>>> +
>>> +enum aarch64_insn_class {
>>> + AARCH64_INSN_CLS_UNKNOWN, /* UNALLOCATED */
>>> + AARCH64_INSN_CLS_DP_IMM, /* Data processing - immediate */
>>> + AARCH64_INSN_CLS_DP_REG, /* Data processing - register */
>>> + AARCH64_INSN_CLS_DP_FPSIMD, /* Data processing - SIMD and FP */
>>> + AARCH64_INSN_CLS_LDST, /* Loads and stores */
>>> + AARCH64_INSN_CLS_BR_SYS, /* Branch, exception generation and
>>> + * system instructions */
>>> +};
>>
>> Strictly speaking, these are encoding groups, not instruction classes.
> Hi Will,
> Thanks for review.
> Will change it to "aarch64_insn_encoding_class.
>
>>
>>> +
>>> +#define __AARCH64_INSN_FUNCS(abbr, mask, val) \
>>> +static __always_inline bool aarch64_insn_is_##abbr(u32 code) \
>>> +{ return (code & (mask)) == (val); } \
>>> +static __always_inline u32 aarch64_insn_get_##abbr##_mask(void) \
>>> +{ return (mask); } \
>>> +static __always_inline u32 aarch64_insn_get_##abbr##_value(void) \
>>> +{ return (val); }
>>> +
>>> +__AARCH64_INSN_FUNCS(b, 0xFC000000, 0x14000000)
>>> +__AARCH64_INSN_FUNCS(bl, 0xFC000000, 0x94000000)
>>> +__AARCH64_INSN_FUNCS(svc, 0xFFE0001F, 0xD4000001)
>>> +__AARCH64_INSN_FUNCS(hvc, 0xFFE0001F, 0xD4000002)
>>> +__AARCH64_INSN_FUNCS(smc, 0xFFE0001F, 0xD4000003)
>>> +__AARCH64_INSN_FUNCS(brk, 0xFFE0001F, 0xD4200000)
>>> +__AARCH64_INSN_FUNCS(nop, 0xFFFFFFFF, 0xD503201F)
>>> +
>>> +#undef __AARCH64_INSN_FUNCS
>>> +
>>> +enum aarch64_insn_class aarch64_get_insn_class(u32 insn);
>>> +
>>> +bool aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn);
>>> +
>>> +#endif /* _ASM_ARM64_INSN_H */
>>> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
>>> index 7b4b564..9af6cb3 100644
>>> --- a/arch/arm64/kernel/Makefile
>>> +++ b/arch/arm64/kernel/Makefile
>>> @@ -9,7 +9,7 @@ AFLAGS_head.o := -DTEXT_OFFSET=$(TEXT_OFFSET)
>>> arm64-obj-y := cputable.o debug-monitors.o entry.o irq.o fpsimd.o \
>>> entry-fpsimd.o process.o ptrace.o setup.o signal.o \
>>> sys.o stacktrace.o time.o traps.o io.o vdso.o \
>>> - hyp-stub.o psci.o
>>> + hyp-stub.o psci.o insn.o
>>>
>>> arm64-obj-$(CONFIG_COMPAT) += sys32.o kuser32.o signal32.o \
>>> sys_compat.o
>>> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
>>> new file mode 100644
>>> index 0000000..1be4d11
>>> --- /dev/null
>>> +++ b/arch/arm64/kernel/insn.c
>>> @@ -0,0 +1,86 @@
>>> +/*
>>> + * Copyright (C) 2013 Huawei Ltd.
>>> + * Author: Jiang Liu <[email protected]>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 2 as
>>> + * published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +#include <linux/compiler.h>
>>> +#include <linux/kernel.h>
>>> +#include <asm/insn.h>
>>> +
>>> +/*
>>> + * ARM Architecture Reference Manual ARMv8, Section C3.1
>>> + * AARCH64 main encoding table
>>
>> AArch64.
> Fix it in next version.
>
>>
>>> + * Bit position
>>> + * 28 27 26 25 Encoding Group
>>> + * 0 0 - - Unallocated
>>> + * 1 0 0 - Data processing, immediate
>>> + * 1 0 1 - Branch, exception generation and system instructions
>>> + * - 1 - 0 Loads and stores
>>> + * - 1 0 1 Data processing - register
>>> + * 0 1 1 1 Data processing - SIMD and floating point
>>> + * 1 1 1 1 Data processing - SIMD and floating point
>>> + * "-" means "don't care"
>>> + */
>>
>> This comment would probably be better off in the header file, along with the
>> mask/value pairs you use there. Failing that, the table below should at
>> least live alongside the enum type and mask/value stuff.
> Will move it into insn.h
>
>>
>>> +static int aarch64_insn_cls[] = {
>>> + AARCH64_INSN_CLS_UNKNOWN,
>>> + AARCH64_INSN_CLS_UNKNOWN,
>>> + AARCH64_INSN_CLS_UNKNOWN,
>>> + AARCH64_INSN_CLS_UNKNOWN,
>>> + AARCH64_INSN_CLS_LDST,
>>> + AARCH64_INSN_CLS_DP_REG,
>>> + AARCH64_INSN_CLS_LDST,
>>> + AARCH64_INSN_CLS_DP_FPSIMD,
>>> + AARCH64_INSN_CLS_DP_IMM,
>>> + AARCH64_INSN_CLS_DP_IMM,
>>> + AARCH64_INSN_CLS_BR_SYS,
>>> + AARCH64_INSN_CLS_BR_SYS,
>>> + AARCH64_INSN_CLS_LDST,
>>> + AARCH64_INSN_CLS_DP_REG,
>>> + AARCH64_INSN_CLS_LDST,
>>> + AARCH64_INSN_CLS_DP_FPSIMD,
>>> +};
>>> +
>>> +enum aarch64_insn_class __kprobes aarch64_get_insn_class(u32 insn)
>>> +{
>>> + return aarch64_insn_cls[(insn >> 25) & 0xf];
>>> +}
>>> +
>>> +static bool __kprobes __aarch64_insn_hotpatch_safe(u32 insn)
>>> +{
>>> + if (aarch64_get_insn_class(insn) != AARCH64_INSN_CLS_BR_SYS)
>>> + return false;
>>> +
>>> + return aarch64_insn_is_b(insn) ||
>>> + aarch64_insn_is_bl(insn) ||
>>> + aarch64_insn_is_svc(insn) ||
>>> + aarch64_insn_is_hvc(insn) ||
>>> + aarch64_insn_is_smc(insn) ||
>>> + aarch64_insn_is_brk(insn) ||
>>> + aarch64_insn_is_nop(insn);
>>> +}
>>> +
>>> +/*
>>> + * ARMv8-A Section B2.6.5:
>>> + * Concurrent modification and execution of instructions can lead to the
>>> + * resulting instruction performing any behavior that can be achieved by
>>> + * executing any sequence of instructions that can be executed from the
>>> + * same Exception level, except where the instruction before modification
>>> + * and the instruction after modification is a B, BL, NOP, BKPT, SVC, HVC,
>>> + * or SMC instruction.
>>> + */
>>> +bool __kprobes aarch64_insn_hotpatch_safe(u32 old_insn, u32 new_insn)
>>> +{
>>> + return __aarch64_insn_hotpatch_safe(old_insn) &&
>>> + __aarch64_insn_hotpatch_safe(new_insn);
>>
>> Take care here:
>>
>> - This doesn't guarantee that CPUs exclusively see new_insn (they
>> may see old_insn, even after patching)
> Good point. Still need some synchronization mechanism lighter than
> stop_machine(). How about kick_all_cpus_sync(), which should be lighter
> than stop_machine()?
For jump labels, x86 uses three smp_call_function() calls to replace a
stop_machine(). It seems worthwhile for us to use just one
kick_all_cpus_sync() to avoid stop_machine().

Thanks!
Gerry

>
>>
>> - If you patch a conditional branch and change both the target address
>> *and* the condition, you can get combinations of target/condition
>> between old_insn and new_insn.
>>
>> Then again, the jump_label code should always be patching between B and NOP
>> instructions, right?
> Yeah, should be safe for jump label because it only uses unconditional
> B and NOP.
>
>>
>> Will
>>
>

2013-10-17 09:40:26

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v3 6/7] arm64, jump label: optimize jump label implementation

On Wed, Oct 16, 2013 at 06:11:45PM +0100, Jiang Liu wrote:
> On 10/16/2013 07:46 PM, Will Deacon wrote:
> >
> >> + } else {
> >> + insn = aarch64_insn_gen_nop();
> >
> > You could make the code more concise by limiting your patching ability to
> > branch immediates. Then a nop is simply a branch to the next instruction (I
> > doubt any modern CPUs will choke on this, whereas the architecture requires
> > a NOP to take time).
> I guess a NOP should be more effecient than a "B #4" on real CPUs:)

Well, I was actually questioning that. A NOP *has* to take time (the
architecture prevents implementations from discarding it) whereas a static,
unconditional branch will likely be discarded early on by CPUs with even
simple branch prediction logic.

Will

2013-10-17 09:44:00

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v3 1/7] arm64: introduce basic aarch64 instruction decoding helpers

On Wed, Oct 16, 2013 at 06:14:50PM +0100, Jiang Liu wrote:
> On 10/16/2013 11:36 PM, Jiang Liu wrote:
> > On 10/16/2013 06:51 PM, Will Deacon wrote:
> >> Take care here:
> >>
> >> - This doesn't guarantee that CPUs exclusively see new_insn (they
> >> may see old_insn, even after patching)
> > Good point. Still need some synchronization mechanism lighter than
> > stop_machine(). How about kick_all_cpus_sync(), which should be lighter
> > than stop_machine()?
> For jump label, x86 uses three smp_call_function() to replace a
> stop_machine(). Seems it's valuable for us to use just one
> kick_all_cpus_sync() to avoid a stop_machine().

Yes, the exception return after handling the IPI from kick_all_cpus_sync
should be enough to synchronise the pipeline, providing that all cache
maintenance by the CPU writing the new instructions has completed (if you're
using flush_icache_range then you're fine).

Will

2013-10-17 14:40:42

by Jiang Liu

[permalink] [raw]
Subject: Re: [PATCH v3 6/7] arm64, jump label: optimize jump label implementation

On 10/17/2013 05:39 PM, Will Deacon wrote:
> On Wed, Oct 16, 2013 at 06:11:45PM +0100, Jiang Liu wrote:
>> On 10/16/2013 07:46 PM, Will Deacon wrote:
>>>
>>>> + } else {
>>>> + insn = aarch64_insn_gen_nop();
>>>
>>> You could make the code more concise by limiting your patching ability to
>>> branch immediates. Then a nop is simply a branch to the next instruction (I
>>> doubt any modern CPUs will choke on this, whereas the architecture requires
>>> a NOP to take time).
>> I guess a NOP should be more effecient than a "B #4" on real CPUs:)
>
> Well, I was actually questioning that. A NOP *has* to take time (the
> architecture prevents implementations from discaring it) whereas a static,
> unconditional branch will likely be discarded early on by CPUs with even
> simple branch prediction logic.
I naively thought "NOP" is cheaper than a "B" :(
Will use a "B #1" to replace "NOP".

Thanks!
Gerry

>
> Will
>

2013-10-17 15:27:16

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v3 6/7] arm64, jump label: optimize jump label implementation

On Thu, 17 Oct 2013 22:40:32 +0800
Jiang Liu <[email protected]> wrote:


> >>> You could make the code more concise by limiting your patching ability to
> >>> branch immediates. Then a nop is simply a branch to the next instruction (I
> >>> doubt any modern CPUs will choke on this, whereas the architecture requires
> >>> a NOP to take time).
> >> I guess a NOP should be more effecient than a "B #4" on real CPUs:)
> >
> > Well, I was actually questioning that. A NOP *has* to take time (the
> > architecture prevents implementations from discaring it) whereas a static,
> > unconditional branch will likely be discarded early on by CPUs with even
> > simple branch prediction logic.
> I naively thought "NOP" is cheaper than a "B" :(
> Will use a "B #1" to replace "NOP".
>

Really?? What's the purpose of a NOP then? It seems to me that an
architecture is broken if a NOP is slower than a static branch.

-- Steve

2013-10-18 03:34:54

by Jiang Liu (Gerry)

[permalink] [raw]
Subject: Re: [PATCH v3 6/7] arm64, jump label: optimize jump label implementation


On 2013/10/17 23:27, Steven Rostedt wrote:
> On Thu, 17 Oct 2013 22:40:32 +0800
> Jiang Liu <[email protected]> wrote:
>
>
>>>>> You could make the code more concise by limiting your patching ability to
>>>>> branch immediates. Then a nop is simply a branch to the next instruction (I
>>>>> doubt any modern CPUs will choke on this, whereas the architecture requires
>>>>> a NOP to take time).
>>>> I guess a NOP should be more effecient than a "B #4" on real CPUs:)
>>>
>>> Well, I was actually questioning that. A NOP *has* to take time (the
>>> architecture prevents implementations from discaring it) whereas a static,
>>> unconditional branch will likely be discarded early on by CPUs with even
>>> simple branch prediction logic.
>> I naively thought "NOP" is cheaper than a "B" :(
>> Will use a "B #1" to replace "NOP".
>>
>
> Really?? What's the purpose of a NOP then? It seems to me that an
> architecture is broken if a NOP is slower than a static branch.
>
> -- Steve
Hi Steve and Will,
I have discussed this with one of our chip design members.
He thinks "NOP" should be better than "B #1" because a jump instruction
is one of the most complex instructions for a microarchitecture and may
stall the pipeline, whereas NOP should be friendly enough for all
microarchitectures. So I will keep the "NOP" version.
Thanks!
Gerry

2013-10-18 10:02:55

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v3 6/7] arm64, jump label: optimize jump label implementation

Hi guys,

On Fri, Oct 18, 2013 at 04:31:22AM +0100, Jiang Liu (Gerry) wrote:
> On 2013/10/17 23:27, Steven Rostedt wrote:
> > On Thu, 17 Oct 2013 22:40:32 +0800
> > Jiang Liu <[email protected]> wrote:
> >
> >
> >>>>> You could make the code more concise by limiting your patching ability to
> >>>>> branch immediates. Then a nop is simply a branch to the next instruction (I
> >>>>> doubt any modern CPUs will choke on this, whereas the architecture requires
> >>>>> a NOP to take time).
> >>>> I guess a NOP should be more effecient than a "B #4" on real CPUs:)
> >>>
> >>> Well, I was actually questioning that. A NOP *has* to take time (the
> >>> architecture prevents implementations from discaring it) whereas a static,
> >>> unconditional branch will likely be discarded early on by CPUs with even
> >>> simple branch prediction logic.
> >> I naively thought "NOP" is cheaper than a "B" :(
> >> Will use a "B #1" to replace "NOP".
> >>
> >
> > Really?? What's the purpose of a NOP then? It seems to me that an
> > architecture is broken if a NOP is slower than a static branch.

Cheers for making me double-check this: it turns out I was mixing up my
architecture and micro-architecture. The architecture actually states:

`The timing effects of including a NOP instruction in a program are not
guaranteed. It can increase execution time, leave it unchanged, or even
reduce it. Therefore, NOP instructions are not suitable for timing loops.'

However, I know of at least one micro-architecture where a NOP is actually
more expensive than some other instructions, hence my original concerns.

> I have discussed this with one of our chip design members.
> He thinks "NOP" should be better than "B #1" because jump instruction
> is one of the most complex instructions for microarchitecture, which
> may stall the pipeline. And NOP should be friendly enough for all
> microarchitectures. So I will keep the "NOP" version.

Fine by me; we can't please all micro-architectures and NOP probably makes
more sense. However, I would rather you rework your aarch64_insn_gen_nop
function to actually generate hint instructions (since NOP is a hint alias
in AArch64), where you specify the alias as a parameter.
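
Something along these lines, as a sketch only (the hint mask/value entry
and opcode values below are assumptions based on the A64 HINT encoding,
where NOP is HINT #0 and the immediate lives in bits [11:5]):

	/* assumes: __AARCH64_INSN_FUNCS(hint, 0xFFFFF01F, 0xD503201F) */
	enum aarch64_insn_hint_op {
		AARCH64_INSN_HINT_NOP	= 0x0 << 5,
		AARCH64_INSN_HINT_YIELD	= 0x1 << 5,
		AARCH64_INSN_HINT_WFE	= 0x2 << 5,
		AARCH64_INSN_HINT_WFI	= 0x3 << 5,
		AARCH64_INSN_HINT_SEV	= 0x4 << 5,
		AARCH64_INSN_HINT_SEVL	= 0x5 << 5,
	};

	u32 aarch64_insn_gen_hint(enum aarch64_insn_hint_op op)
	{
		return aarch64_insn_get_hint_value() | op;
	}

	static inline u32 aarch64_insn_gen_nop(void)
	{
		return aarch64_insn_gen_hint(AARCH64_INSN_HINT_NOP);
	}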

In other news, the GCC guys have started pushing a patch to add the %c
output template to the AArch64 backend:

http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01314.html

Cheers, and sorry for the earlier confusion,

Will