2010-04-09 19:49:42

by Jason Baron

Subject: [PATCH 0/9] jump label v6

Hi,

Refresh of jump labeling patches against the -tip tree. For background see:
http://marc.info/?l=linux-kernel&m=125858436505941&w=2

I believe I've addressed all the reviews from v5.

Changes in v6:

* I've moved Steve Rostedt's 'ftrace_dyn_arch_init()' to alternative.c to
put it into a common area for use by both ftrace and jump labels. By
default we put a 'jmp 5' in the nop slot. Then, once we detect the best
runtime no-op, we patch over the 'jmp 5' with the appropriate nop.
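
On x86 the bytes in that slot look like this (a sketch; the second line is
the P6 nop that the runtime detection typically selects):

/* sketch: the same 5-byte slot, before and after boot-time patching */
unsigned char jmp5[5]    = { 0xe9, 0x00, 0x00, 0x00, 0x00 }; /* jmp . + 5  */
unsigned char p6_nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 }; /* 5-byte nop */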

* build-time sort of the jump label table. The jump label table is more
optimally accessed if the entries are contiguous. Sorting the table
accomplishes this. Do the sort at build time. Adds a '-j' option to
'modpost' which replaces the vmlinux with one whose jump label section is
sorted. I've tested this on x86 with a relocatable kernel and it works fine
there as well. Note that I have not sorted the jump label table in modules.
This is b/c the jump label names can be exported by the core kernel, and thus
I don't have them available at build time. This could be solved by either
finding the correct ones in the vmlinux, or by embedding the name of the jump
label in the module tables (and not just a pointer), but the module tables
tend to be smaller, and thus there is less value to this kind of change
anyway. The kernel continues to do the sort, just in case, but at least for
the vmlinux, this is just a verification b/c the jump label table has already
been sorted.

* added jump_label_text_reserved(), so that other routines that want to patch
the code can first verify that they are not stomping on jump label addresses.

thanks,

-Jason

Jason Baron (8):
jump label: base patch
jump label: x86 support
jump label: tracepoint support
jump label: add module support
jump label: move ftrace_dyn_arch_init to common code
jump label: sort jump table at build-time
jump label: initialize workqueue tracepoints *before* they are
registered
jump label: jump_label_text_reserved() to reserve our jump points

Mathieu Desnoyers (1):
jump label: notifier atomic call chain notrace

Makefile | 4 +
arch/x86/include/asm/alternative.h | 14 ++
arch/x86/include/asm/jump_label.h | 27 +++
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/alternative.c | 71 ++++++-
arch/x86/kernel/ftrace.c | 70 +------
arch/x86/kernel/jump_label.c | 42 ++++
arch/x86/kernel/kprobes.c | 3 +-
arch/x86/kernel/module.c | 3 +
arch/x86/kernel/ptrace.c | 1 +
arch/x86/kernel/setup.c | 2 +
include/asm-generic/vmlinux.lds.h | 22 ++-
include/linux/jump_label.h | 76 +++++++
include/linux/module.h | 5 +-
include/linux/tracepoint.h | 34 ++--
kernel/Makefile | 2 +-
kernel/jump_label.c | 428 ++++++++++++++++++++++++++++++++++++
kernel/kprobes.c | 3 +-
kernel/module.c | 7 +
kernel/notifier.c | 6 +-
kernel/trace/ftrace.c | 13 +-
kernel/trace/trace_workqueue.c | 10 +-
kernel/tracepoint.c | 8 +
scripts/mod/modpost.c | 69 ++++++-
scripts/mod/modpost.h | 9 +
25 files changed, 816 insertions(+), 115 deletions(-)
create mode 100644 arch/x86/include/asm/jump_label.h
create mode 100644 arch/x86/kernel/jump_label.c
create mode 100644 include/linux/jump_label.h
create mode 100644 kernel/jump_label.c


2010-04-09 19:49:52

by Jason Baron

Subject: [PATCH 1/9] jump label: notifier atomic call chain notrace

In LTTng, being able to use the atomic notifier from cpu idle entry to
ensure the tracer flushes the last events in the current subbuffer
requires the rcu read-side to be marked "notrace"; otherwise it can end
up calling back into lockdep and the tracer.

Also applies to the die notifier.

Signed-off-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Jason Baron <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
---
kernel/notifier.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/notifier.c b/kernel/notifier.c
index 2488ba7..88453a7 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -148,7 +148,7 @@ int atomic_notifier_chain_unregister(struct atomic_notifier_head *nh,
spin_lock_irqsave(&nh->lock, flags);
ret = notifier_chain_unregister(&nh->head, n);
spin_unlock_irqrestore(&nh->lock, flags);
- synchronize_rcu();
+ synchronize_sched();
return ret;
}
EXPORT_SYMBOL_GPL(atomic_notifier_chain_unregister);
@@ -178,9 +178,9 @@ int __kprobes __atomic_notifier_call_chain(struct atomic_notifier_head *nh,
{
int ret;

- rcu_read_lock();
+ rcu_read_lock_sched_notrace();
ret = notifier_call_chain(&nh->head, val, v, nr_to_call, nr_calls);
- rcu_read_unlock();
+ rcu_read_unlock_sched_notrace();
return ret;
}
EXPORT_SYMBOL_GPL(__atomic_notifier_call_chain);
--
1.7.0.1

2010-04-09 19:50:07

by Jason Baron

Subject: [PATCH 3/9] jump label: x86 support

Add x86 support for jump labels. I'm keeping this patch separate so it's
clear to arch maintainers what is required for x86 to support this new
feature. Hopefully, it won't be too painful for other arches.
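
To make the enable-side patching in arch_jump_label_transform() below
concrete, here is a minimal userspace sketch of the offset math (addresses
invented for illustration; RELATIVEJUMP_SIZE is 5):

#include <stdio.h>
#include <string.h>

int main(void)
{
        unsigned long code   = 0xffffffff81000100UL; /* the nop slot */
        unsigned long target = 0xffffffff81000150UL; /* the branch target */
        int offset = (int)(target - (code + 5));     /* rel32, from end of insn */
        unsigned char insn[5] = { 0xe9 };            /* jmp rel32 opcode */
        int i;

        memcpy(&insn[1], &offset, 4); /* x86 is little-endian */
        for (i = 0; i < 5; i++)
                printf("%02x ", insn[i]); /* prints: e9 4b 00 00 00 */
        printf("\n");
        return 0;
}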

Signed-off-by: Jason Baron <[email protected]>
---
arch/x86/include/asm/jump_label.h | 31 +++++++++++++++++++++
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/jump_label.c | 53 +++++++++++++++++++++++++++++++++++++
3 files changed, 85 insertions(+), 1 deletions(-)
create mode 100644 arch/x86/include/asm/jump_label.h
create mode 100644 arch/x86/kernel/jump_label.c

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
new file mode 100644
index 0000000..b8ebdc8
--- /dev/null
+++ b/arch/x86/include/asm/jump_label.h
@@ -0,0 +1,31 @@
+#ifndef _ASM_X86_JUMP_LABEL_H
+#define _ASM_X86_JUMP_LABEL_H
+
+#include <asm/nops.h>
+
+#if (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
+# define __HAVE_ARCH_JUMP_LABEL
+#endif
+
+#ifdef __HAVE_ARCH_JUMP_LABEL
+
+# ifdef CONFIG_X86_64
+# define JUMP_LABEL_NOP P6_NOP5
+# else
+# define JUMP_LABEL_NOP ".byte 0xe9 \n\t .long 0\n\t"
+# endif
+
+# define JUMP_LABEL(tag, label, cond) \
+ do { \
+ extern const char __jlstrtab_##tag[]; \
+ asm goto("1:" \
+ JUMP_LABEL_NOP \
+ ".pushsection __jump_table, \"a\" \n\t"\
+ _ASM_PTR "1b, %l[" #label "], %c0 \n\t" \
+ ".popsection \n\t" \
+ : : "i" (__jlstrtab_##tag) : : label);\
+ } while (0)
+
+# endif
+
+#endif
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4c58352..7cd3bf4 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -32,7 +32,7 @@ GCOV_PROFILE_paravirt.o := n
obj-y := process_$(BITS).o signal.o entry_$(BITS).o
obj-y += traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
obj-y += time.o ioport.o ldt.o dumpstack.o
-obj-y += setup.o x86_init.o i8259.o irqinit.o
+obj-y += setup.o x86_init.o i8259.o irqinit.o jump_label.o
obj-$(CONFIG_X86_VISWS) += visws_quirks.o
obj-$(CONFIG_X86_32) += probe_roms_32.o
obj-$(CONFIG_X86_32) += sys_i386_32.o i386_ksyms_32.o
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
new file mode 100644
index 0000000..7fc4f84
--- /dev/null
+++ b/arch/x86/kernel/jump_label.c
@@ -0,0 +1,53 @@
+/*
+ * jump label x86 support
+ *
+ * Copyright (C) 2009 Jason Baron <[email protected]>
+ *
+ */
+#include <linux/jump_label.h>
+#include <linux/memory.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/jhash.h>
+#include <linux/cpu.h>
+#include <asm/kprobes.h>
+
+#ifdef __HAVE_ARCH_JUMP_LABEL
+
+union jump_code_union {
+ char code[RELATIVEJUMP_SIZE];
+ struct {
+ char jump;
+ int offset;
+ } __attribute__((packed));
+};
+
+void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type)
+{
+ union jump_code_union code;
+
+ if (type == JUMP_LABEL_ENABLE) {
+ code.jump = 0xe9;
+ code.offset = entry->target - (entry->code + RELATIVEJUMP_SIZE);
+ } else {
+#ifdef CONFIG_X86_64
+ /* opcode for P6_NOP5 */
+ code.code[0] = 0x0f;
+ code.code[1] = 0x1f;
+ code.code[2] = 0x44;
+ code.code[3] = 0x00;
+ code.code[4] = 0x00;
+#else
+ code.jump = 0xe9;
+ code.offset = 0;
+#endif
+ }
+ get_online_cpus();
+ mutex_lock(&text_mutex);
+ text_poke_smp((void *)entry->code, &code, RELATIVEJUMP_SIZE);
+ mutex_unlock(&text_mutex);
+ put_online_cpus();
+}
+
+#endif
--
1.7.0.1

2010-04-09 19:50:17

by Jason Baron

Subject: [PATCH 2/9] jump label: base patch

Base patch to implement 'jump labeling'. Based on a new 'asm goto' inline
assembly gcc mechanism, we can now branch to labels from an 'asm goto'
statement. This allows us to create a 'no-op' fastpath, which can subsequently
be patched with a jump to the slowpath code. This is useful for code which
might be rarely used, but which we'd like to be able to call, if needed.
Tracepoints are the current use case that these are being implemented for.
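
For illustration, a minimal (hypothetical) consumer of the interfaces added
below might look like this ('trace_foo' and do_slowpath_work() are made up):

#include <linux/jump_label.h>

DEFINE_JUMP_LABEL(trace_foo);

static int trace_foo_state; /* only read by the !__HAVE_ARCH_JUMP_LABEL fallback */

static void foo(void)
{
        JUMP_LABEL(trace_foo, do_trace, trace_foo_state); /* 5-byte nop by default */
        return;                 /* fast path falls straight through */
do_trace:
        do_slowpath_work();     /* hypothetical rarely-run slow path */
}

/* later, at runtime, patch the nop into a jump (and back): */
/*    enable_jump_label("trace_foo");  */
/*    disable_jump_label("trace_foo"); */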

Signed-off-by: Jason Baron <[email protected]>
---
include/asm-generic/vmlinux.lds.h | 10 ++-
include/linux/jump_label.h | 59 ++++++++++++
kernel/Makefile | 2 +-
kernel/jump_label.c | 176 +++++++++++++++++++++++++++++++++++++
4 files changed, 245 insertions(+), 2 deletions(-)
create mode 100644 include/linux/jump_label.h
create mode 100644 kernel/jump_label.c

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 67e6520..83a469d 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -167,7 +167,8 @@
BRANCH_PROFILE() \
TRACE_PRINTKS() \
FTRACE_EVENTS() \
- TRACE_SYSCALLS()
+ TRACE_SYSCALLS() \
+ JUMP_TABLE() \

/*
* Data section helpers
@@ -206,6 +207,7 @@
*(__vermagic) /* Kernel version magic */ \
*(__markers_strings) /* Markers: strings */ \
*(__tracepoints_strings)/* Tracepoints: strings */ \
+ *(__jump_strings)/* Jump: strings */ \
} \
\
.rodata1 : AT(ADDR(.rodata1) - LOAD_OFFSET) { \
@@ -557,6 +559,12 @@
#define BUG_TABLE
#endif

+#define JUMP_TABLE() \
+ . = ALIGN(64); \
+ VMLINUX_SYMBOL(__start___jump_table) = .; \
+ *(__jump_table) \
+ VMLINUX_SYMBOL(__stop___jump_table) = .; \
+
#ifdef CONFIG_PM_TRACE
#define TRACEDATA \
. = ALIGN(4); \
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
new file mode 100644
index 0000000..122d441
--- /dev/null
+++ b/include/linux/jump_label.h
@@ -0,0 +1,59 @@
+#ifndef _LINUX_JUMP_LABEL_H
+#define _LINUX_JUMP_LABEL_H
+
+#include <asm/jump_label.h>
+
+struct jump_entry {
+ unsigned long code;
+ unsigned long target;
+ char *name;
+};
+
+enum jump_label_type {
+ JUMP_LABEL_ENABLE,
+ JUMP_LABEL_DISABLE
+};
+
+#ifdef __HAVE_ARCH_JUMP_LABEL
+
+extern struct jump_entry __start___jump_table[];
+extern struct jump_entry __stop___jump_table[];
+
+#define DEFINE_JUMP_LABEL(name) \
+ const char __jlstrtab_##name[] \
+ __used __attribute__((section("__jump_strings"))) = #name;
+
+extern void arch_jump_label_transform(struct jump_entry *entry,
+ enum jump_label_type type);
+
+extern void jump_label_update(const char *name, enum jump_label_type type);
+
+#define enable_jump_label(name) \
+ jump_label_update(name, JUMP_LABEL_ENABLE);
+
+#define disable_jump_label(name) \
+ jump_label_update(name, JUMP_LABEL_DISABLE);
+
+#else
+
+#define DEFINE_JUMP_LABEL(name)
+
+#define JUMP_LABEL(tag, label, cond) \
+do { \
+ if (unlikely(cond)) \
+ goto label; \
+} while (0)
+
+static inline int enable_jump_label(const char *name)
+{
+ return 0;
+}
+
+static inline int disable_jump_label(const char *name)
+{
+ return 0;
+}
+
+#endif
+
+#endif
diff --git a/kernel/Makefile b/kernel/Makefile
index d5c3006..59ff12e 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -10,7 +10,7 @@ obj-y = sched.o fork.o exec_domain.o panic.o printk.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
notifier.o ksysfs.o pm_qos_params.o sched_clock.o cred.o \
- async.o range.o
+ async.o range.o jump_label.o
obj-$(CONFIG_HAVE_EARLY_RES) += early_res.o
obj-y += groups.o

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
new file mode 100644
index 0000000..d5b7719
--- /dev/null
+++ b/kernel/jump_label.c
@@ -0,0 +1,176 @@
+/*
+ * jump label support
+ *
+ * Copyright (C) 2009 Jason Baron <[email protected]>
+ *
+ */
+#include <linux/jump_label.h>
+#include <linux/memory.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <asm/alternative.h>
+#include <linux/list.h>
+#include <linux/jhash.h>
+
+#ifdef __HAVE_ARCH_JUMP_LABEL
+
+#define JUMP_LABEL_HASH_BITS 6
+#define JUMP_LABEL_TABLE_SIZE (1 << JUMP_LABEL_HASH_BITS)
+static struct hlist_head jump_label_table[JUMP_LABEL_TABLE_SIZE];
+
+/* mutex to protect coming/going of the jump_label table */
+static DEFINE_MUTEX(jump_label_mutex);
+
+struct jump_label_entry {
+ struct hlist_node hlist;
+ struct jump_entry *table;
+ int nr_entries;
+ /* hang modules off here */
+ struct hlist_head modules;
+ const char *name;
+};
+
+static void swap_jump_label_entries(struct jump_entry *previous, struct jump_entry *next)
+{
+ struct jump_entry tmp;
+
+ tmp = *next;
+ *next = *previous;
+ *previous = tmp;
+}
+
+static void sort_jump_label_entries(struct jump_entry *start, struct jump_entry *stop)
+{
+ int swapped = 0;
+ struct jump_entry *iter;
+ struct jump_entry *iter_next;
+
+ do {
+ swapped = 0;
+ iter = start;
+ iter_next = start;
+ iter_next++;
+ for (; iter_next < stop; iter++, iter_next++) {
+ if (strcmp(iter->name, iter_next->name) > 0) {
+ swap_jump_label_entries(iter, iter_next);
+ swapped = 1;
+ }
+ }
+ } while (swapped == 1);
+}
+
+static struct jump_label_entry *get_jump_label_entry(const char *name)
+{
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct jump_label_entry *e;
+ u32 hash = jhash(name, strlen(name), 0);
+
+ head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
+ hlist_for_each_entry(e, node, head, hlist) {
+ if (!strcmp(name, e->name))
+ return e;
+ }
+ return NULL;
+}
+
+static struct jump_label_entry *add_jump_label_entry(const char *name, int nr_entries, struct jump_entry *table)
+{
+ struct hlist_head *head;
+ struct jump_label_entry *e;
+ size_t name_len;
+ u32 hash;
+
+ e = get_jump_label_entry(name);
+ if (e)
+ return ERR_PTR(-EEXIST);
+
+ e = kmalloc(sizeof(struct jump_label_entry), GFP_KERNEL);
+ if (!e)
+ return ERR_PTR(-ENOMEM);
+
+ name_len = strlen(name) + 1;
+ hash = jhash(name, name_len-1, 0);
+ head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
+ e->name = name;
+ e->table = table;
+ e->nr_entries = nr_entries;
+ INIT_HLIST_HEAD(&(e->modules));
+ hlist_add_head(&e->hlist, head);
+ return e;
+}
+
+static int build_jump_label_hashtable(struct jump_entry *start, struct jump_entry *stop)
+{
+ struct jump_entry *iter, *iter_begin;
+ struct jump_label_entry *entry;
+ int count;
+
+ sort_jump_label_entries(start, stop);
+ iter = start;
+ while (iter < stop) {
+ entry = get_jump_label_entry(iter->name);
+ if (!entry) {
+ iter_begin = iter;
+ count = 0;
+ while ((iter < stop) &&
+ (strcmp(iter->name, iter_begin->name) == 0)) {
+ iter++;
+ count++;
+ }
+ entry = add_jump_label_entry(iter_begin->name, count,
+ iter_begin);
+ if (IS_ERR(entry))
+ return PTR_ERR(entry);
+ continue;
+ }
+ WARN(1, KERN_ERR "build_jump_label_hashtable: unexpected entry!\n");
+ }
+ return 0;
+}
+
+/***
+ * jump_label_update - update jump label text
+ * @name - name of the jump label
+ * @type - enum set to JUMP_LABEL_ENABLE or JUMP_LABEL_DISABLE
+ *
+ * Will enable/disable the jump for jump label @name, depending on the
+ * value of @type.
+ *
+ */
+
+void jump_label_update(const char *name, enum jump_label_type type)
+{
+ struct jump_entry *iter;
+ struct jump_label_entry *entry;
+ struct hlist_node *module_node;
+ struct jump_label_module_entry *e_module;
+ int count;
+
+ mutex_lock(&jump_label_mutex);
+ entry = get_jump_label_entry(name);
+ if (entry) {
+ count = entry->nr_entries;
+ iter = entry->table;
+ while (count--) {
+ if (kernel_text_address(iter->code))
+ arch_jump_label_transform(iter, type);
+ iter++;
+ }
+ }
+ mutex_unlock(&jump_label_mutex);
+}
+
+static int init_jump_label(void)
+{
+ int ret;
+
+ mutex_lock(&jump_label_mutex);
+ ret = build_jump_label_hashtable(__start___jump_table,
+ __stop___jump_table);
+ mutex_unlock(&jump_label_mutex);
+ return ret;
+}
+early_initcall(init_jump_label);
+
+#endif
--
1.7.0.1

2010-04-09 19:50:26

by Jason Baron

Subject: [PATCH 6/9] jump label: move ftrace_dyn_arch_init to common code

Move Steve's code for finding the best 5-byte no-op from ftrace.c to
alternative.c. The idea is that other consumers (in this case jump label)
want to make use of that code. I've created a global, 'unsigned char
ideal_nop5[5]', that is set up during setup_arch() and can then be used.
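
Once setup_arch() has called arch_init_ideal_nop5(), consumers can patch
from the new global; a sketch (mirroring the ftrace and jump label hunks
below):

#include <asm/alternative.h>

/* sketch: rewrite a known 5-byte instruction at 'ip' with the best nop */
static void __init nop_out_site(unsigned long ip)
{
        text_poke_early((void *)ip, ideal_nop5, IDEAL_NOP_SIZE_5);
}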

Signed-off-by: Jason Baron <[email protected]>
---
arch/x86/include/asm/alternative.h | 14 +++++++
arch/x86/include/asm/jump_label.h | 10 ++----
arch/x86/kernel/alternative.c | 71 +++++++++++++++++++++++++++++++++++-
arch/x86/kernel/ftrace.c | 70 +-----------------------------------
arch/x86/kernel/jump_label.c | 15 +-------
arch/x86/kernel/module.c | 3 ++
arch/x86/kernel/setup.c | 2 +
include/linux/jump_label.h | 9 +++++
kernel/jump_label.c | 33 +++++++++++++++++
kernel/trace/ftrace.c | 13 +------
10 files changed, 137 insertions(+), 103 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index b09ec55..0218dbd 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -5,6 +5,7 @@
#include <linux/stddef.h>
#include <linux/stringify.h>
#include <asm/asm.h>
+#include <asm/jump_label.h>

/*
* Alternative inline assembly for SMP.
@@ -153,6 +154,8 @@ static inline void apply_paravirt(struct paravirt_patch_site *start,
#define __parainstructions_end NULL
#endif

+extern void *text_poke_early(void *addr, const void *opcode, size_t len);
+
/*
* Clear and restore the kernel write-protection flag on the local CPU.
* Allows the kernel to edit read-only pages.
@@ -173,4 +176,15 @@ static inline void apply_paravirt(struct paravirt_patch_site *start,
extern void *text_poke(void *addr, const void *opcode, size_t len);
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);

+#if defined(CONFIG_DYNAMIC_FTRACE) || defined(__HAVE_ARCH_JUMP_LABEL)
+#define IDEAL_NOP_SIZE_5 5
+extern unsigned char ideal_nop5[IDEAL_NOP_SIZE_5];
+extern int arch_init_ideal_nop5(void);
+#else
+static inline int arch_init_ideal_nop5(void)
+{
+ return 0;
+}
+#endif
+
#endif /* _ASM_X86_ALTERNATIVE_H */
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index b8ebdc8..e3af6ca 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -9,23 +9,19 @@

#ifdef __HAVE_ARCH_JUMP_LABEL

-# ifdef CONFIG_X86_64
-# define JUMP_LABEL_NOP P6_NOP5
-# else
-# define JUMP_LABEL_NOP ".byte 0xe9 \n\t .long 0\n\t"
-# endif
+# define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"

# define JUMP_LABEL(tag, label, cond) \
do { \
extern const char __jlstrtab_##tag[]; \
asm goto("1:" \
- JUMP_LABEL_NOP \
+ JUMP_LABEL_INITIAL_NOP \
".pushsection __jump_table, \"a\" \n\t"\
_ASM_PTR "1b, %l[" #label "], %c0 \n\t" \
".popsection \n\t" \
: : "i" (__jlstrtab_##tag) : : label);\
} while (0)

-# endif
+#endif

#endif
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 3a4bf35..083ce9d 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -194,7 +194,6 @@ static void __init_or_module add_nops(void *insns, unsigned int len)

extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
extern u8 *__smp_locks[], *__smp_locks_end[];
-static void *text_poke_early(void *addr, const void *opcode, size_t len);

/* Replace instructions with better alternatives for this CPU type.
This runs before SMP is initialized to avoid SMP problems with
@@ -513,7 +512,7 @@ void __init alternative_instructions(void)
* instructions. And on the local CPU you need to be protected again NMI or MCE
* handlers seeing an inconsistent instruction while you patch.
*/
-static void *__init_or_module text_poke_early(void *addr, const void *opcode,
+void *__init_or_module text_poke_early(void *addr, const void *opcode,
size_t len)
{
unsigned long flags;
@@ -632,3 +631,71 @@ void *__kprobes text_poke_smp(void *addr, const void *opcode, size_t len)
return addr;
}

+#if defined(CONFIG_DYNAMIC_FTRACE) || defined(__HAVE_ARCH_JUMP_LABEL)
+
+unsigned char ideal_nop5[IDEAL_NOP_SIZE_5];
+
+int __init arch_init_ideal_nop5(void)
+{
+ extern const unsigned char ftrace_test_p6nop[];
+ extern const unsigned char ftrace_test_nop5[];
+ extern const unsigned char ftrace_test_jmp[];
+ int faulted = 0;
+ unsigned long flags;
+
+ local_irq_save(flags);
+ /*
+ * There is no good nop for all x86 archs.
+ * We will default to using the P6_NOP5, but first we
+ * will test to make sure that the nop will actually
+ * work on this CPU. If it faults, we will then
+ * go to a lesser efficient 5 byte nop. If that fails
+ * we then just use a jmp as our nop. This isn't the most
+ * efficient nop, but we can not use a multi part nop
+ * since we would then risk being preempted in the middle
+ * of that nop, and if we enabled tracing then, it might
+ * cause a system crash.
+ *
+ * TODO: check the cpuid to determine the best nop.
+ */
+ asm volatile (
+ "ftrace_test_jmp:"
+ "jmp ftrace_test_p6nop\n"
+ "nop\n"
+ "nop\n"
+ "nop\n" /* 2 byte jmp + 3 bytes */
+ "ftrace_test_p6nop:"
+ P6_NOP5
+ "jmp 1f\n"
+ "ftrace_test_nop5:"
+ ".byte 0x66,0x66,0x66,0x66,0x90\n"
+ "1:"
+ ".section .fixup, \"ax\"\n"
+ "2: movl $1, %0\n"
+ " jmp ftrace_test_nop5\n"
+ "3: movl $2, %0\n"
+ " jmp 1b\n"
+ ".previous\n"
+ _ASM_EXTABLE(ftrace_test_p6nop, 2b)
+ _ASM_EXTABLE(ftrace_test_nop5, 3b)
+ : "=r"(faulted) : "0" (faulted));
+
+ switch (faulted) {
+ case 0:
+ pr_info("converting mcount calls to 0f 1f 44 00 00\n");
+ memcpy(ideal_nop5, ftrace_test_p6nop, IDEAL_NOP_SIZE_5);
+ break;
+ case 1:
+ pr_info("converting mcount calls to 66 66 66 66 90\n");
+ memcpy(ideal_nop5, ftrace_test_nop5, IDEAL_NOP_SIZE_5);
+ break;
+ case 2:
+ pr_info("converting mcount calls to jmp . + 5\n");
+ memcpy(ideal_nop5, ftrace_test_jmp, IDEAL_NOP_SIZE_5);
+ break;
+ }
+
+ local_irq_restore(flags);
+ return 0;
+}
+#endif
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index cd37469..ba2e0d9 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -257,14 +257,9 @@ do_ftrace_mod_code(unsigned long ip, void *new_code)
return mod_code_status;
}

-
-
-
-static unsigned char ftrace_nop[MCOUNT_INSN_SIZE];
-
static unsigned char *ftrace_nop_replace(void)
{
- return ftrace_nop;
+ return ideal_nop5;
}

static int
@@ -336,69 +331,6 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
return ret;
}

-int __init ftrace_dyn_arch_init(void *data)
-{
- extern const unsigned char ftrace_test_p6nop[];
- extern const unsigned char ftrace_test_nop5[];
- extern const unsigned char ftrace_test_jmp[];
- int faulted = 0;
-
- /*
- * There is no good nop for all x86 archs.
- * We will default to using the P6_NOP5, but first we
- * will test to make sure that the nop will actually
- * work on this CPU. If it faults, we will then
- * go to a lesser efficient 5 byte nop. If that fails
- * we then just use a jmp as our nop. This isn't the most
- * efficient nop, but we can not use a multi part nop
- * since we would then risk being preempted in the middle
- * of that nop, and if we enabled tracing then, it might
- * cause a system crash.
- *
- * TODO: check the cpuid to determine the best nop.
- */
- asm volatile (
- "ftrace_test_jmp:"
- "jmp ftrace_test_p6nop\n"
- "nop\n"
- "nop\n"
- "nop\n" /* 2 byte jmp + 3 bytes */
- "ftrace_test_p6nop:"
- P6_NOP5
- "jmp 1f\n"
- "ftrace_test_nop5:"
- ".byte 0x66,0x66,0x66,0x66,0x90\n"
- "1:"
- ".section .fixup, \"ax\"\n"
- "2: movl $1, %0\n"
- " jmp ftrace_test_nop5\n"
- "3: movl $2, %0\n"
- " jmp 1b\n"
- ".previous\n"
- _ASM_EXTABLE(ftrace_test_p6nop, 2b)
- _ASM_EXTABLE(ftrace_test_nop5, 3b)
- : "=r"(faulted) : "0" (faulted));
-
- switch (faulted) {
- case 0:
- pr_info("converting mcount calls to 0f 1f 44 00 00\n");
- memcpy(ftrace_nop, ftrace_test_p6nop, MCOUNT_INSN_SIZE);
- break;
- case 1:
- pr_info("converting mcount calls to 66 66 66 66 90\n");
- memcpy(ftrace_nop, ftrace_test_nop5, MCOUNT_INSN_SIZE);
- break;
- case 2:
- pr_info("converting mcount calls to jmp . + 5\n");
- memcpy(ftrace_nop, ftrace_test_jmp, MCOUNT_INSN_SIZE);
- break;
- }
-
- /* The return code is retured via data */
- *(unsigned long *)data = 0;
-
- return 0;
-}
#endif

#ifdef CONFIG_FUNCTION_GRAPH_TRACER
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 7fc4f84..8eca1b8 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -30,19 +30,8 @@ void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type ty
if (type == JUMP_LABEL_ENABLE) {
code.jump = 0xe9;
code.offset = entry->target - (entry->code + RELATIVEJUMP_SIZE);
- } else {
-#ifdef CONFIG_X86_64
- /* opcode for P6_NOP5 */
- code.code[0] = 0x0f;
- code.code[1] = 0x1f;
- code.code[2] = 0x44;
- code.code[3] = 0x00;
- code.code[4] = 0x00;
-#else
- code.jump = 0xe9;
- code.offset = 0;
-#endif
- }
+ } else
+ memcpy(&code, ideal_nop5, 5);
get_online_cpus();
mutex_lock(&text_mutex);
text_poke_smp((void *)entry->code, &code, RELATIVEJUMP_SIZE);
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index 89f386f..e47fe49 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -238,6 +238,9 @@ int module_finalize(const Elf_Ehdr *hdr,
apply_paravirt(pseg, pseg + para->sh_size);
}

+ /* make jump label nops */
+ apply_jump_label_nops(me);
+
return module_bug_finalize(hdr, sechdrs, me);
}

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 5d7ba1a..7a4577a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1040,6 +1040,8 @@ void __init setup_arch(char **cmdline_p)
x86_init.oem.banner();

mcheck_init();
+
+ arch_init_ideal_nop5();
}

#ifdef CONFIG_X86_32
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index e0f968d..9868c43 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -14,6 +14,8 @@ enum jump_label_type {
JUMP_LABEL_DISABLE
};

+struct module;
+
#ifdef __HAVE_ARCH_JUMP_LABEL

extern struct jump_entry __start___jump_table[];
@@ -29,6 +31,8 @@ extern void arch_jump_label_transform(struct jump_entry *entry,

extern void jump_label_update(const char *name, enum jump_label_type type);

+extern void apply_jump_label_nops(struct module *mod);
+
#define enable_jump_label(name) \
jump_label_update(name, JUMP_LABEL_ENABLE);

@@ -55,6 +59,11 @@ static inline int disable_jump_label(const char *name)
return 0;
}

+static inline int apply_jump_label_nops(struct module *mod)
+{
+ return 0;
+}
+
#endif

#endif
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 0714c20..7e7458b 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -182,10 +182,19 @@ void jump_label_update(const char *name, enum jump_label_type type)
static int init_jump_label(void)
{
int ret;
+ struct jump_entry *iter_start = __start___jump_table;
+ struct jump_entry *iter_stop = __stop___jump_table;
+ struct jump_entry *iter;

mutex_lock(&jump_label_mutex);
ret = build_jump_label_hashtable(__start___jump_table,
__stop___jump_table);
+ /* update with ideal nop */
+ iter = iter_start;
+ while (iter < iter_stop) {
+ text_poke_early((void *)iter->code, ideal_nop5, IDEAL_NOP_SIZE_5);
+ iter++;
+ }
mutex_unlock(&jump_label_mutex);
return ret;
}
@@ -296,6 +305,30 @@ static int jump_label_module_notify(struct notifier_block *self, unsigned long v
return ret;
}

+/***
+ * apply_jump_label_nops - patch module jump labels with ideal_nop5
+ * @mod: module to patch
+ *
+ * When a module is initially loaded, its code has 'jmp 5' instructions
+ * as nops. These are not the most optimal nops, so before the module
+ * loads, patch these with the 'ideal_nop5', which was determined at
+ * boot time.
+ */
+void apply_jump_label_nops(struct module *mod)
+{
+ struct jump_entry *iter;
+
+ /* if the module doesn't have jump label entries, just return */
+ if (!mod->num_jump_entries)
+ return;
+
+ iter = mod->jump_entries;
+ while (iter < mod->jump_entries + mod->num_jump_entries) {
+ text_poke_early((void *)iter->code, ideal_nop5, IDEAL_NOP_SIZE_5);
+ iter++;
+ }
+}
+
struct notifier_block jump_label_module_nb = {
.notifier_call = jump_label_module_notify,
.priority = 0,
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8378357..abe6eaf 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2725,20 +2725,9 @@ extern unsigned long __stop_mcount_loc[];

void __init ftrace_init(void)
{
- unsigned long count, addr, flags;
+ unsigned long count;
int ret;

- /* Keep the ftrace pointer to the stub */
- addr = (unsigned long)ftrace_stub;
-
- local_irq_save(flags);
- ftrace_dyn_arch_init(&addr);
- local_irq_restore(flags);
-
- /* ftrace_dyn_arch_init places the return code in addr */
- if (addr)
- goto failed;
-
count = __stop_mcount_loc - __start_mcount_loc;

ret = ftrace_dyn_table_alloc(count);
--
1.7.0.1

2010-04-09 19:50:33

by Jason Baron

Subject: [PATCH 8/9] jump label: initialize workqueue tracepoints *before* they are registered

Initialize the workqueue tracepoint data structures *before* the tracepoints
are registered, so that they are ready when the probe callbacks start firing.
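
The pattern, as a sketch (names here are hypothetical; the point is that a
probe may fire the instant registration succeeds, so its data must already
be valid):

static int __init my_early_init(void)
{
        /* 1) initialize everything the probe will touch... */
        init_probe_state();

        /* 2) ...only then make the probe reachable */
        return register_trace_my_event(my_probe);
}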

Signed-off-by: Jason Baron <[email protected]>
---
kernel/trace/trace_workqueue.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace_workqueue.c b/kernel/trace/trace_workqueue.c
index 40cafb0..487d821 100644
--- a/kernel/trace/trace_workqueue.c
+++ b/kernel/trace/trace_workqueue.c
@@ -258,6 +258,11 @@ int __init trace_workqueue_early_init(void)
{
int ret, cpu;

+ for_each_possible_cpu(cpu) {
+ spin_lock_init(&workqueue_cpu_stat(cpu)->lock);
+ INIT_LIST_HEAD(&workqueue_cpu_stat(cpu)->list);
+ }
+
ret = register_trace_workqueue_insertion(probe_workqueue_insertion);
if (ret)
goto out;
@@ -274,11 +279,6 @@ int __init trace_workqueue_early_init(void)
if (ret)
goto no_creation;

- for_each_possible_cpu(cpu) {
- spin_lock_init(&workqueue_cpu_stat(cpu)->lock);
- INIT_LIST_HEAD(&workqueue_cpu_stat(cpu)->list);
- }
-
return 0;

no_creation:
--
1.7.0.1

2010-04-09 19:50:52

by Jason Baron

Subject: [PATCH 9/9] jump label: jump_label_text_reserved() to reserve our jump points

Add a jump_label_text_reserved(void *start, void *end), so that other
pieces of code that want to modify kernel text can first verify that
jump label has not reserved the instruction.
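
Call sites follow the same pattern as the existing *_text_reserved() checks;
a sketch of a hypothetical patching routine:

static int patch_text_checked(void *addr, size_t len)
{
        /* refuse to patch bytes that a jump label site owns */
        if (jump_label_text_reserved(addr, (char *)addr + len - 1))
                return -EBUSY;
        /* ...safe (w.r.t. jump labels) to modify [addr, addr + len)... */
        return 0;
}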

Signed-off-by: Jason Baron <[email protected]>
---
arch/x86/kernel/kprobes.c | 3 +-
include/linux/jump_label.h | 6 +++
kernel/jump_label.c | 83 ++++++++++++++++++++++++++++++++++++++++++++
kernel/kprobes.c | 3 +-
4 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index b43bbae..87bcf63 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -1194,7 +1194,8 @@ static int __kprobes copy_optimized_instructions(u8 *dest, u8 *src)
}
/* Check whether the address range is reserved */
if (ftrace_text_reserved(src, src + len - 1) ||
- alternatives_text_reserved(src, src + len - 1))
+ alternatives_text_reserved(src, src + len - 1) ||
+ jump_label_text_reserved(src, src + len - 1))
return -EBUSY;

return len;
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 7238805..90ca4b6 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -31,6 +31,7 @@ extern void arch_jump_label_transform(struct jump_entry *entry,
enum jump_label_type type);

extern void jump_label_update(const char *name, enum jump_label_type type);
+extern int jump_label_text_reserved(void *start, void *end);

extern void apply_jump_label_nops(struct module *mod);

@@ -65,6 +66,11 @@ static inline int apply_jump_label_nops(struct module *mod)
return 0;
}

+static inline int jump_label_text_reserved(void *start, void *end)
+{
+ return 0;
+}
+
#endif

#endif
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 7e7458b..24bba61 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -179,6 +179,89 @@ void jump_label_update(const char *name, enum jump_label_type type)
mutex_unlock(&jump_label_mutex);
}

+static int addr_conflict(struct jump_entry *entry, void *start, void *end)
+{
+ if (entry->code <= (unsigned long)end &&
+ entry->code + IDEAL_NOP_SIZE_5 > (unsigned long)start)
+ return 1;
+
+ return 0;
+}
+
+#ifdef CONFIG_MODULES
+
+static int module_conflict(void *start, void *end)
+{
+ struct hlist_head *head;
+ struct hlist_node *node, *node_next, *module_node, *module_node_next;
+ struct jump_label_entry *e;
+ struct jump_label_module_entry *e_module;
+ struct jump_entry *iter;
+ int i, count;
+ int conflict = 0;
+
+ for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
+ head = &jump_label_table[i];
+ hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
+ hlist_for_each_entry_safe(e_module, module_node,
+ module_node_next,
+ &(e->modules), hlist) {
+ count = e_module->nr_entries;
+ iter = e_module->table;
+ while (count--) {
+ if (addr_conflict(iter, start, end)) {
+ conflict = 1;
+ goto out;
+ }
+ iter++;
+ }
+ }
+ }
+ }
+out:
+ return conflict;
+}
+
+#endif
+
+/***
+ * jump_label_text_reserved - check if addr range is reserved
+ * @start: start text addr
+ * @end: end text addr
+ *
+ * checks if the text addr located between @start and @end
+ * overlaps with any of the jump label patch addresses. Code
+ * that wants to modify kernel text should first verify that
+ * it does not overlap with any of the jump label addresses.
+ *
+ * returns 1 if there is an overlap, 0 otherwise
+ */
+int jump_label_text_reserved(void *start, void *end)
+{
+ struct jump_entry *iter;
+ struct jump_entry *iter_start = __start___jump_table;
+ struct jump_entry *iter_stop = __stop___jump_table;
+ int conflict = 0;
+
+ mutex_lock(&jump_label_mutex);
+ iter = iter_start;
+ while (iter < iter_stop) {
+ if (addr_conflict(iter, start, end)) {
+ conflict = 1;
+ goto out;
+ }
+ iter++;
+ }
+
+ /* now check modules */
+#ifdef CONFIG_MODULES
+ conflict = module_conflict(start, end);
+#endif
+out:
+ mutex_unlock(&jump_label_mutex);
+ return conflict;
+}
+
static int init_jump_label(void)
{
int ret;
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index fa034d2..a57755f 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1140,7 +1140,8 @@ int __kprobes register_kprobe(struct kprobe *p)
preempt_disable();
if (!kernel_text_address((unsigned long) p->addr) ||
in_kprobes_functions((unsigned long) p->addr) ||
- ftrace_text_reserved(p->addr, p->addr)) {
+ ftrace_text_reserved(p->addr, p->addr) ||
+ jump_label_text_reserved(p->addr, p->addr)) {
preempt_enable();
return -EINVAL;
}
--
1.7.0.1

2010-04-09 19:50:55

by Jason Baron

Subject: [PATCH 5/9] jump label: add module support

Add 'jump label' support for modules.
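
As a sketch, the flow added here is:

/*
 * load_module():
 *     mod->jump_entries = section_objs(..., "__jump_table", ...);
 *
 * module notifier:
 *     MODULE_STATE_COMING -> add_jump_label_module(mod)
 *                            (sort and hash the module's entries)
 *     MODULE_STATE_GOING  -> remove_jump_label_module(mod)
 *                            (drop them from the hash table)
 */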

Signed-off-by: Jason Baron <[email protected]>
---
arch/x86/kernel/ptrace.c | 1 +
include/linux/jump_label.h | 3 +-
include/linux/module.h | 5 +-
kernel/jump_label.c | 136 ++++++++++++++++++++++++++++++++++++++++++++
kernel/module.c | 7 ++
5 files changed, 150 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 2d96aab..21854d2 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -24,6 +24,7 @@
#include <linux/workqueue.h>
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>
+#include <linux/module.h>

#include <asm/uaccess.h>
#include <asm/pgtable.h>
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 122d441..e0f968d 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -21,7 +21,8 @@ extern struct jump_entry __stop___jump_table[];

#define DEFINE_JUMP_LABEL(name) \
const char __jlstrtab_##name[] \
- __used __attribute__((section("__jump_strings"))) = #name;
+ __used __attribute__((section("__jump_strings"))) = #name; \
+ EXPORT_SYMBOL_GPL(__jlstrtab_##name);

extern void arch_jump_label_transform(struct jump_entry *entry,
enum jump_label_type type);
diff --git a/include/linux/module.h b/include/linux/module.h
index dd618eb..6d7225e 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -339,7 +339,10 @@ struct module
struct tracepoint *tracepoints;
unsigned int num_tracepoints;
#endif
-
+#ifdef __HAVE_ARCH_JUMP_LABEL
+ struct jump_entry *jump_entries;
+ unsigned int num_jump_entries;
+#endif
#ifdef CONFIG_TRACING
const char **trace_bprintk_fmt_start;
unsigned int num_trace_bprintk_fmt;
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index d5b7719..0714c20 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -30,6 +30,13 @@ struct jump_label_entry {
const char *name;
};

+struct jump_label_module_entry {
+ struct hlist_node hlist;
+ struct jump_entry *table;
+ int nr_entries;
+ struct module *mod;
+};
+
static void swap_jump_label_entries(struct jump_entry *previous, struct jump_entry *next)
{
struct jump_entry tmp;
@@ -157,6 +164,17 @@ void jump_label_update(const char *name, enum jump_label_type type)
arch_jump_label_transform(iter, type);
iter++;
}
+ /* enable/disable jump labels in modules */
+ hlist_for_each_entry(e_module, module_node, &(entry->modules),
+ hlist) {
+ count = e_module->nr_entries;
+ iter = e_module->table;
+ while (count--) {
+ if (kernel_text_address(iter->code))
+ arch_jump_label_transform(iter, type);
+ iter++;
+ }
+ }
}
mutex_unlock(&jump_label_mutex);
}
@@ -173,4 +191,122 @@ static int init_jump_label(void)
}
early_initcall(init_jump_label);

+#ifdef CONFIG_MODULES
+
+static struct jump_label_module_entry *add_jump_label_module_entry(struct jump_label_entry *entry, struct jump_entry *iter_begin, int count, struct module *mod)
+{
+ struct jump_label_module_entry *e;
+
+ e = kmalloc(sizeof(struct jump_label_module_entry), GFP_KERNEL);
+ if (!e)
+ return ERR_PTR(-ENOMEM);
+ e->mod = mod;
+ e->nr_entries = count;
+ e->table = iter_begin;
+ hlist_add_head(&e->hlist, &entry->modules);
+ return e;
+}
+
+static int add_jump_label_module(struct module *mod)
+{
+ struct jump_entry *iter, *iter_begin;
+ struct jump_label_entry *entry;
+ struct jump_label_module_entry *module_entry;
+ int count;
+
+ /* if the module doesn't have jump label entries, just return */
+ if (!mod->num_jump_entries)
+ return 0;
+
+ sort_jump_label_entries(mod->jump_entries,
+ mod->jump_entries + mod->num_jump_entries);
+ iter = mod->jump_entries;
+ while (iter < mod->jump_entries + mod->num_jump_entries) {
+ entry = get_jump_label_entry(iter->name);
+ iter_begin = iter;
+ count = 0;
+ while ((iter < mod->jump_entries + mod->num_jump_entries) &&
+ (strcmp(iter->name, iter_begin->name) == 0)) {
+ iter++;
+ count++;
+ }
+ if (!entry) {
+ entry = add_jump_label_entry(iter_begin->name, 0, NULL);
+ if (IS_ERR(entry))
+ return PTR_ERR(entry);
+ }
+ module_entry = add_jump_label_module_entry(entry, iter_begin,
+ count, mod);
+ if (IS_ERR(module_entry))
+ return PTR_ERR(module_entry);
+ }
+ return 0;
+}
+
+static void remove_jump_label_module(struct module *mod)
+{
+ struct hlist_head *head;
+ struct hlist_node *node, *node_next, *module_node, *module_node_next;
+ struct jump_label_entry *e;
+ struct jump_label_module_entry *e_module;
+ int i;
+
+ /* if the module doesn't have jump label entries, just return */
+ if (!mod->num_jump_entries)
+ return;
+
+ for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
+ head = &jump_label_table[i];
+ hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
+ hlist_for_each_entry_safe(e_module, module_node,
+ module_node_next,
+ &(e->modules), hlist) {
+ if (e_module->mod == mod) {
+ hlist_del(&e_module->hlist);
+ kfree(e_module);
+ }
+ }
+ if (hlist_empty(&e->modules) && (e->nr_entries == 0)) {
+ hlist_del(&e->hlist);
+ kfree(e);
+ }
+ }
+ }
+}
+
+static int jump_label_module_notify(struct notifier_block *self, unsigned long val, void *data)
+{
+ struct module *mod = data;
+ int ret = 0;
+
+ switch (val) {
+ case MODULE_STATE_COMING:
+ mutex_lock(&jump_label_mutex);
+ ret = add_jump_label_module(mod);
+ if (ret)
+ remove_jump_label_module(mod);
+ mutex_unlock(&jump_label_mutex);
+ break;
+ case MODULE_STATE_GOING:
+ mutex_lock(&jump_label_mutex);
+ remove_jump_label_module(mod);
+ mutex_unlock(&jump_label_mutex);
+ break;
+ }
+ return ret;
+}
+
+struct notifier_block jump_label_module_nb = {
+ .notifier_call = jump_label_module_notify,
+ .priority = 0,
+};
+
+static int init_jump_label_module(void)
+{
+ return register_module_notifier(&jump_label_module_nb);
+}
+early_initcall(init_jump_label_module);
+
+#endif /* CONFIG_MODULES */
+
#endif
diff --git a/kernel/module.c b/kernel/module.c
index c968d36..a8c34a2 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -55,6 +55,7 @@
#include <linux/async.h>
#include <linux/percpu.h>
#include <linux/kmemleak.h>
+#include <linux/jump_label.h>

#define CREATE_TRACE_POINTS
#include <trace/events/module.h>
@@ -2249,6 +2250,12 @@ static noinline struct module *load_module(void __user *umod,
sizeof(*mod->tracepoints),
&mod->num_tracepoints);
#endif
+#ifdef __HAVE_ARCH_JUMP_LABEL
+ mod->jump_entries = section_objs(hdr, sechdrs, secstrings,
+ "__jump_table",
+ sizeof(*mod->jump_entries),
+ &mod->num_jump_entries);
+#endif
#ifdef CONFIG_EVENT_TRACING
mod->trace_events = section_objs(hdr, sechdrs, secstrings,
"_ftrace_events",
--
1.7.0.1

2010-04-09 19:51:38

by Jason Baron

Subject: [PATCH 4/9] jump label: tracepoint support

Make use of the jump label infrastructure for tracepoints.
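
Roughly, for a tracepoint 'foo', the DECLARE_TRACE() change below makes
trace_foo() expand to something like this sketch (proto/args shortened to a
single int for illustration):

static inline void trace_foo(int arg)
{
        /* a 5-byte nop; patched to 'jmp do_trace' when enabled */
        JUMP_LABEL(foo, do_trace, __tracepoint_foo.state);
        return;         /* fast path falls straight through */
do_trace:
        __DO_TRACE(&__tracepoint_foo, TP_PROTO(int arg), TP_ARGS(arg));
}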

Signed-off-by: Jason Baron <[email protected]>
---
include/linux/tracepoint.h | 34 +++++++++++++++++++---------------
kernel/tracepoint.c | 8 ++++++++
2 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index f59604e..c18b9c0 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -16,6 +16,7 @@

#include <linux/types.h>
#include <linux/rcupdate.h>
+#include <linux/jump_label.h>

struct module;
struct tracepoint;
@@ -63,20 +64,22 @@ struct tracepoint {
* not add unwanted padding between the beginning of the section and the
* structure. Force alignment to the same alignment as the section start.
*/
-#define DECLARE_TRACE(name, proto, args) \
- extern struct tracepoint __tracepoint_##name; \
- static inline void trace_##name(proto) \
- { \
- if (unlikely(__tracepoint_##name.state)) \
- __DO_TRACE(&__tracepoint_##name, \
- TP_PROTO(proto), TP_ARGS(args)); \
- } \
- static inline int register_trace_##name(void (*probe)(proto)) \
- { \
- return tracepoint_probe_register(#name, (void *)probe); \
- } \
- static inline int unregister_trace_##name(void (*probe)(proto)) \
- { \
+#define DECLARE_TRACE(name, proto, args) \
+ extern struct tracepoint __tracepoint_##name; \
+ static inline void trace_##name(proto) \
+ { \
+ JUMP_LABEL(name, do_trace, __tracepoint_##name.state); \
+ return; \
+do_trace: \
+ __DO_TRACE(&__tracepoint_##name, \
+ TP_PROTO(proto), TP_ARGS(args)); \
+ } \
+ static inline int register_trace_##name(void (*probe)(proto)) \
+ { \
+ return tracepoint_probe_register(#name, (void *)probe); \
+ } \
+ static inline int unregister_trace_##name(void (*probe)(proto)) \
+ { \
return tracepoint_probe_unregister(#name, (void *)probe);\
}

@@ -86,7 +89,8 @@ struct tracepoint {
__attribute__((section("__tracepoints_strings"))) = #name; \
struct tracepoint __tracepoint_##name \
__attribute__((section("__tracepoints"), aligned(32))) = \
- { __tpstrtab_##name, 0, reg, unreg, NULL }
+ { __tpstrtab_##name, 0, reg, unreg, NULL }; \
+ DEFINE_JUMP_LABEL(name);

#define DEFINE_TRACE(name) \
DEFINE_TRACE_FN(name, NULL, NULL);
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index cc89be5..8acced8 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -25,6 +25,7 @@
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/sched.h>
+#include <linux/jump_label.h>

extern struct tracepoint __start___tracepoints[];
extern struct tracepoint __stop___tracepoints[];
@@ -256,6 +257,10 @@ static void set_tracepoint(struct tracepoint_entry **entry,
* is used.
*/
rcu_assign_pointer(elem->funcs, (*entry)->funcs);
+ if (!elem->state && active)
+ enable_jump_label(elem->name);
+ if (elem->state && !active)
+ disable_jump_label(elem->name);
elem->state = active;
}

@@ -270,6 +275,9 @@ static void disable_tracepoint(struct tracepoint *elem)
if (elem->unregfunc && elem->state)
elem->unregfunc();

+ if (elem->state)
+ disable_jump_label(elem->name);
+
elem->state = 0;
rcu_assign_pointer(elem->funcs, NULL);
}
--
1.7.0.1

2010-04-09 19:51:48

by Jason Baron

Subject: [PATCH 7/9] jump label: sort jump table at build-time

The jump label table is more optimally accessed if the entries are contiguous.
Sorting the table accomplishes this. Do the sort at build time. Adds a '-j'
option to 'modpost' which replaces the vmlinux with one whose jump label
section is sorted. I've tested this on x86 with a relocatable kernel and it
works fine there as well. Note that I have not sorted the jump label table in
modules. This is b/c the jump label names can be exported by the core kernel,
and thus I don't have them available at build time. This could be solved by
either finding the correct ones in the vmlinux, or by embedding the name of
the jump label in the module tables (and not just a pointer), but the module
tables tend to be smaller, and thus there is less value to this kind of
change anyway. The kernel continues to do the sort, just in case, but at
least for the vmlinux, this is just a verification that the jump label table
has already been sorted.
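
To illustrate what the contiguity buys: once the table is sorted, all
entries for one name form a single run, so grouping them (as
build_jump_label_hashtable() does) is one linear pass. A userspace sketch
with invented names:

#include <stdio.h>
#include <string.h>

int main(void)
{
        const char *names[] = { "irq_entry", "irq_entry",
                                "sched_switch", "sched_switch" };
        int n = 4, i = 0;

        while (i < n) {
                int begin = i, count = 0;

                while (i < n && strcmp(names[i], names[begin]) == 0) {
                        i++;
                        count++;
                }
                printf("%s: %d contiguous entries\n", names[begin], count);
        }
        return 0;
}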

Signed-off-by: Jason Baron <[email protected]>
---
Makefile | 4 ++
include/asm-generic/vmlinux.lds.h | 18 ++++++++--
include/linux/jump_label.h | 1 +
scripts/mod/modpost.c | 69 +++++++++++++++++++++++++++++++++++--
scripts/mod/modpost.h | 9 +++++
5 files changed, 95 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 558ddec..f59dd57 100644
--- a/Makefile
+++ b/Makefile
@@ -845,6 +845,9 @@ define rule_vmlinux-modpost
$(Q)echo 'cmd_$@ := $(cmd_vmlinux-modpost)' > $(dot-target).cmd
endef

+quiet_cmd_sort-jump-label = SORT $@
+ cmd_sort-jump-label = $(srctree)/scripts/mod/modpost -j $@
+
# vmlinux image - including updated kernel symbols
vmlinux: $(vmlinux-lds) $(vmlinux-init) $(vmlinux-main) vmlinux.o $(kallsyms.o) FORCE
ifdef CONFIG_HEADERS_CHECK
@@ -858,6 +861,7 @@ ifdef CONFIG_BUILD_DOCSRC
endif
$(call vmlinux-modpost)
$(call if_changed_rule,vmlinux__)
+ $(call cmd,sort-jump-label)
$(Q)rm -f .old_version

# build vmlinux.o first to catch section mismatch errors early
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 83a469d..f9d8188 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -168,7 +168,6 @@
TRACE_PRINTKS() \
FTRACE_EVENTS() \
TRACE_SYSCALLS() \
- JUMP_TABLE() \

/*
* Data section helpers
@@ -207,7 +206,6 @@
*(__vermagic) /* Kernel version magic */ \
*(__markers_strings) /* Markers: strings */ \
*(__tracepoints_strings)/* Tracepoints: strings */ \
- *(__jump_strings)/* Jump: strings */ \
} \
\
.rodata1 : AT(ADDR(.rodata1) - LOAD_OFFSET) { \
@@ -216,6 +214,10 @@
\
BUG_TABLE \
\
+ JUMP_TABLE \
+ \
+ JUMP_STRINGS \
+ \
/* PCI quirks */ \
.pci_fixup : AT(ADDR(.pci_fixup) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start_pci_fixups_early) = .; \
@@ -559,11 +561,21 @@
#define BUG_TABLE
#endif

-#define JUMP_TABLE() \
+#define JUMP_TABLE \
. = ALIGN(64); \
+ __jump_table : AT(ADDR(__jump_table) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___jump_table) = .; \
*(__jump_table) \
VMLINUX_SYMBOL(__stop___jump_table) = .; \
+ }
+
+#define JUMP_STRINGS \
+ . = ALIGN(64); \
+ __jump_strings : AT(ADDR(__jump_strings) - LOAD_OFFSET) { \
+ VMLINUX_SYMBOL(__start___jump_strings) = .; \
+ *(__jump_strings) \
+ VMLINUX_SYMBOL(__stop___jump_strings) = .; \
+ }

#ifdef CONFIG_PM_TRACE
#define TRACEDATA \
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 9868c43..7238805 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -3,6 +3,7 @@

#include <asm/jump_label.h>

+/* struct jump_entry must match scripts/mod/modpost.h */
struct jump_entry {
unsigned long code;
unsigned long target;
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 2092361..1a5f543 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -41,6 +41,8 @@ static int warn_unresolved = 0;
/* How a symbol is exported */
static int sec_mismatch_count = 0;
static int sec_mismatch_verbose = 1;
+/* jump label */
+static int enable_jump_label = 0;

enum export {
export_plain, export_unused, export_gpl,
@@ -315,12 +317,18 @@ void *grab_file(const char *filename, unsigned long *size)
void *map;
int fd;

- fd = open(filename, O_RDONLY);
+ if (!enable_jump_label)
+ fd = open(filename, O_RDONLY);
+ else
+ fd = open(filename, O_RDWR);
if (fd < 0 || fstat(fd, &st) != 0)
return NULL;

*size = st.st_size;
- map = mmap(NULL, *size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
+ if (!enable_jump_label)
+ map = mmap(NULL, *size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
+ else
+ map = mmap(NULL, *size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
close(fd);

if (map == MAP_FAILED)
@@ -367,6 +375,50 @@ void release_file(void *file, unsigned long size)
munmap(file, size);
}

+static void swap_jump_label_entries(struct jump_entry *previous, struct jump_entry *next)
+{
+ struct jump_entry tmp;
+
+ tmp = *next;
+ *next = *previous;
+ *previous = tmp;
+}
+
+static void sort_jump_label_table(struct elf_info *info, Elf_Ehdr *hdr)
+{
+ int swapped = 0;
+ struct jump_entry *iter, *iter_next;
+ char *name, *next_name;
+ Elf_Shdr *sechdrs = info->sechdrs;
+ unsigned long jump_table, jump_table_end;
+ unsigned long jump_strings, jump_strings_addr;
+
+ if ((info->jump_sec == 0) && (info->jump_strings_sec == 0))
+ return;
+
+ jump_table = (unsigned long)hdr + sechdrs[info->jump_sec].sh_offset;
+ jump_table_end = jump_table + sechdrs[info->jump_sec].sh_size;
+ jump_strings = (unsigned long)hdr +
+ sechdrs[info->jump_strings_sec].sh_offset;
+ jump_strings_addr = sechdrs[info->jump_strings_sec].sh_addr;
+
+ do {
+ swapped = 0;
+ iter = iter_next = (struct jump_entry *)jump_table;
+ iter_next++;
+ for (; iter_next < (struct jump_entry *)jump_table_end;
+ iter++, iter_next++) {
+ name = jump_strings + (iter->name - jump_strings_addr);
+ next_name = jump_strings +
+ (iter_next->name - jump_strings_addr);
+ if (strcmp(name, next_name) > 0) {
+ swap_jump_label_entries(iter, iter_next);
+ swapped = 1;
+ }
+ }
+ } while (swapped == 1);
+}
+
static int parse_elf(struct elf_info *info, const char *filename)
{
unsigned int i;
@@ -460,6 +512,10 @@ static int parse_elf(struct elf_info *info, const char *filename)
info->export_unused_gpl_sec = i;
else if (strcmp(secname, "__ksymtab_gpl_future") == 0)
info->export_gpl_future_sec = i;
+ else if (strcmp(secname, "__jump_table") == 0)
+ info->jump_sec = i;
+ else if (strcmp(secname, "__jump_strings") == 0)
+ info->jump_strings_sec = i;

if (sechdrs[i].sh_type != SHT_SYMTAB)
continue;
@@ -480,6 +536,10 @@ static int parse_elf(struct elf_info *info, const char *filename)
sym->st_value = TO_NATIVE(sym->st_value);
sym->st_size = TO_NATIVE(sym->st_size);
}
+
+ if (enable_jump_label)
+ sort_jump_label_table(info, hdr);
+
return 1;
}

@@ -1941,7 +2001,7 @@ int main(int argc, char **argv)
struct ext_sym_list *extsym_iter;
struct ext_sym_list *extsym_start = NULL;

- while ((opt = getopt(argc, argv, "i:I:e:cmsSo:awM:K:")) != -1) {
+ while ((opt = getopt(argc, argv, "i:I:e:cmsSo:awM:K:j")) != -1) {
switch (opt) {
case 'i':
kernel_read = optarg;
@@ -1979,6 +2039,9 @@ int main(int argc, char **argv)
case 'w':
warn_unresolved = 1;
break;
+ case 'j':
+ enable_jump_label = 1;
+ break;
default:
exit(1);
}
diff --git a/scripts/mod/modpost.h b/scripts/mod/modpost.h
index be987a4..312b6db 100644
--- a/scripts/mod/modpost.h
+++ b/scripts/mod/modpost.h
@@ -126,11 +126,20 @@ struct elf_info {
Elf_Section export_gpl_sec;
Elf_Section export_unused_gpl_sec;
Elf_Section export_gpl_future_sec;
+ Elf_Section jump_sec;
+ Elf_Section jump_strings_sec;
const char *strtab;
char *modinfo;
unsigned int modinfo_len;
};

+/* struct jump_entry must match include/linux/jump_label.h */
+struct jump_entry {
+ unsigned long code;
+ unsigned long target;
+ char *name;
+};
+
/* file2alias.c */
extern unsigned int cross_build;
void handle_moddevtable(struct module *mod, struct elf_info *info,
--
1.7.0.1

2010-04-09 20:36:50

by Masami Hiramatsu

Subject: Re: [PATCH 0/9] jump label v6

Hi Jason,

Jason Baron wrote:
> Hi,
>
> Refresh of jump labeling patches against the -tip tree. For background see:
> http://marc.info/?l=linux-kernel&m=125858436505941&w=2
>
> I believe I've addressed all the reviews from v5.
>
> Changes in v6:
>
> * I've moved Steve Rostedt's 'ftrace_dyn_arch_init()' to alternative.c to
> put it into a common area for use by both ftrace and jump labels. By
> default we put a 'jmp 5' in the nop slot. Then, once we detect the best
> runtime no-op, we patch over the 'jmp 5' with the appropriate nop.
>
> * build-time sort of the jump label table. The jump label table is more
> optimally accessed if the entries are contiguous. Sorting the table
> accomplishes this. Do the sort at build time. Adds a '-j' option to
> 'modpost' which replaces the vmlinux with one whose jump label section is
> sorted. I've tested this on x86 with a relocatable kernel and it works fine
> there as well. Note that I have not sorted the jump label table in modules.
> This is b/c the jump label names can be exported by the core kernel, and thus
> I don't have them available at build time. This could be solved by either
> finding the correct ones in the vmlinux, or by embedding the name of the jump
> label in the module tables (and not just a pointer), but the module tables
> tend to be smaller, and thus there is less value to this kind of change
> anyway. The kernel continues to do the sort, just in case, but at least for
> the vmlinux, this is just a verification b/c the jump label table has already
> been sorted.
>
> * added jump_label_text_reserved(), so that other routines that want to patch
> the code, can first verify that they are not stomping on jump label addresses.

Good!:-)
So now, it might be a good time to integrate those text_reserved() functions.

BTW, how many jumps would you expect to modify at once?
Since text_poke_smp() uses stop_machine() for each text modification,
I found that it can cause a delay issue if it is called many times...
(e.g. a systemtap testcase sets ~5000 probes at once)

Thank you,

--
Masami Hiramatsu
e-mail: [email protected]

2010-04-09 21:10:11

by Masami Hiramatsu

Subject: Re: [PATCH 9/9] jump label: jump_label_text_reserved() to reserve our jump points

Jason Baron wrote:
> Add a jump_label_text_reserved(void *start, void *end), so that other
> pieces of code that want to modify kernel text can first verify that
> jump label has not reserved the instruction.
>
> Signed-off-by: Jason Baron <[email protected]>

Acked-by: Masami Hiramatsu <[email protected]>

At least kprobes parts. :)

Thank you,

> ---
> arch/x86/kernel/kprobes.c | 3 +-
> include/linux/jump_label.h | 6 +++
> kernel/jump_label.c | 83 ++++++++++++++++++++++++++++++++++++++++++++
> kernel/kprobes.c | 3 +-
> 4 files changed, 93 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
> index b43bbae..87bcf63 100644
> --- a/arch/x86/kernel/kprobes.c
> +++ b/arch/x86/kernel/kprobes.c
> @@ -1194,7 +1194,8 @@ static int __kprobes copy_optimized_instructions(u8 *dest, u8 *src)
> }
> /* Check whether the address range is reserved */
> if (ftrace_text_reserved(src, src + len - 1) ||
> - alternatives_text_reserved(src, src + len - 1))
> + alternatives_text_reserved(src, src + len - 1) ||
> + jump_label_text_reserved(src, src + len - 1))
> return -EBUSY;
>
> return len;
> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> index 7238805..90ca4b6 100644
> --- a/include/linux/jump_label.h
> +++ b/include/linux/jump_label.h
> @@ -31,6 +31,7 @@ extern void arch_jump_label_transform(struct jump_entry *entry,
> enum jump_label_type type);
>
> extern void jump_label_update(const char *name, enum jump_label_type type);
> +extern int jump_label_text_reserved(void *start, void *end);
>
> extern void apply_jump_label_nops(struct module *mod);
>
> @@ -65,6 +66,11 @@ static inline int apply_jump_label_nops(struct module *mod)
> return 0;
> }
>
> +static inline int jump_label_text_reserved(void *start, void *end)
> +{
> + return 0;
> +}
> +
> #endif
>
> #endif
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index 7e7458b..24bba61 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -179,6 +179,89 @@ void jump_label_update(const char *name, enum jump_label_type type)
> mutex_unlock(&jump_label_mutex);
> }
>
> +static int addr_conflict(struct jump_entry *entry, void *start, void *end)
> +{
> + if (entry->code <= (unsigned long)end &&
> + entry->code + IDEAL_NOP_SIZE_5 > (unsigned long)start)
> + return 1;
> +
> + return 0;
> +}
> +
> +#ifdef CONFIG_MODULES
> +
> +static int module_conflict(void *start, void *end)
> +{
> + struct hlist_head *head;
> + struct hlist_node *node, *node_next, *module_node, *module_node_next;
> + struct jump_label_entry *e;
> + struct jump_label_module_entry *e_module;
> + struct jump_entry *iter;
> + int i, count;
> + int conflict = 0;
> +
> + for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
> + head = &jump_label_table[i];
> + hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
> + hlist_for_each_entry_safe(e_module, module_node,
> + module_node_next,
> + &(e->modules), hlist) {
> + count = e_module->nr_entries;
> + iter = e_module->table;
> + while (count--) {
> + if (addr_conflict(iter, start, end)) {
> + conflict = 1;
> + goto out;
> + }
> + iter++;
> + }
> + }
> + }
> + }
> +out:
> + return conflict;
> +}
> +
> +#endif
> +
> +/**
> + * jump_label_text_reserved - check if addr range is reserved
> + * @start: start text addr
> + * @end: end text addr
> + *
> + * checks if the text addr located between @start and @end
> + * overlaps with any of the jump label patch addresses. Code
> + * that wants to modify kernel text should first verify that
> + * it does not overlap with any of the jump label addresses.
> + *
> + * returns 1 if there is an overlap, 0 otherwise
> + */
> +int jump_label_text_reserved(void *start, void *end)
> +{
> + struct jump_entry *iter;
> + struct jump_entry *iter_start = __start___jump_table;
> + struct jump_entry *iter_stop = __stop___jump_table;
> + int conflict = 0;
> +
> + mutex_lock(&jump_label_mutex);
> + iter = iter_start;
> + while (iter < iter_stop) {
> + if (addr_conflict(iter, start, end)) {
> + conflict = 1;
> + goto out;
> + }
> + iter++;
> + }
> +
> + /* now check modules */
> +#ifdef CONFIG_MODULES
> + conflict = module_conflict(start, end);
> +#endif
> +out:
> + mutex_unlock(&jump_label_mutex);
> + return conflict;
> +}
> +
> static int init_jump_label(void)
> {
> int ret;
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index fa034d2..a57755f 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1140,7 +1140,8 @@ int __kprobes register_kprobe(struct kprobe *p)
> preempt_disable();
> if (!kernel_text_address((unsigned long) p->addr) ||
> in_kprobes_functions((unsigned long) p->addr) ||
> - ftrace_text_reserved(p->addr, p->addr)) {
> + ftrace_text_reserved(p->addr, p->addr) ||
> + jump_label_text_reserved(p->addr, p->addr)) {
> preempt_enable();
> return -EINVAL;
> }

--
Masami Hiramatsu
e-mail: [email protected]

2010-04-09 21:24:55

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 7/9] jump label: sort jump table at build-time

On Fri, Apr 09, 2010 at 03:49:57PM -0400, Jason Baron wrote:
> The jump label table is accessed more optimally if the entries are contiguous.
> Sorting the table accomplishes this. Do the sort at build-time. Adds a '-j'
> option to 'modpost' which replaces the vmlinux with one whose jump label
> section is sorted. I've tested this on x86 with relocatable and it works fine
> there as well. Note that I have not sorted the jump label table in modules.
> This is b/c the jump label names can be exported by the core kernel, and thus
> I don't have them available at buildtime. This could be solved by either
> finding the correct ones in the vmlinux, or by embedding the name of the jump
> label in the module tables (and not just a pointer), but the module tables
> tend to be smaller, and thus there is less value to this kind of change
> anyway. The kernel continues to do the sort, just in case, but at least for
> the vmlinux, this is just a verification that the jump label table has
> already been sorted.
>
> Signed-off-by: Jason Baron <[email protected]>
> ---

[ CCing Eric ]

[..]
> +static void swap_jump_label_entries(struct jump_entry *previous, struct jump_entry *next)
> +{
> + struct jump_entry tmp;
> +
> + tmp = *next;
> + *next = *previous;
> + *previous = tmp;
> +}
> +
> +static void sort_jump_label_table(struct elf_info *info, Elf_Ehdr *hdr)
> +{
> + int swapped = 0;
> + struct jump_entry *iter, *iter_next;
> + char *name, *next_name;
> + Elf_Shdr *sechdrs = info->sechdrs;
> + unsigned long jump_table, jump_table_end;
> + unsigned long jump_strings, jump_strings_addr;
> +
> + if ((info->jump_sec == 0) || (info->jump_strings_sec == 0))
> + return;
> +
> + jump_table = (unsigned long)hdr + sechdrs[info->jump_sec].sh_offset;
> + jump_table_end = jump_table + sechdrs[info->jump_sec].sh_size;
> + jump_strings = (unsigned long)hdr +
> + sechdrs[info->jump_strings_sec].sh_offset;
> + jump_strings_addr = sechdrs[info->jump_strings_sec].sh_addr;
> +
> + do {
> + swapped = 0;
> + iter = iter_next = (struct jump_entry *)jump_table;
> + iter_next++;
> + for (; iter_next < (struct jump_entry *)jump_table_end;
> + iter++, iter_next++) {
> + name = (char *)jump_strings + (iter->name - jump_strings_addr);
> + next_name = (char *)jump_strings +
> + (iter_next->name - jump_strings_addr);
> + if (strcmp(name, next_name) > 0) {
> + swap_jump_label_entries(iter, iter_next);
> + swapped = 1;


Jason,

As we were chatting about this, it looks like you are modifying a vmlinux
section outside the knowledge of the compiler. So theoretically the
associated relocation section information is no longer valid, which can be a
problem for i386 relocatable kernels, where we read the section's relocation
information and perform the relocations at runtime.

I know you have tested this on i386 and it works for you. I guess it works
because all the entries in the section are alike and we apply the same
relocation offset to every entry, so even changing the order of the entries
has no impact.

But conceptually, changing a vmlinux section outside the knowledge of the
compiler, and assuming that we don't have to change the associated relocation
section, is probably not the best thing.

I am not sure how to fix it. Maybe fall back on boot-time sorting, or perhaps
there is a way to relink the sections after sorting. I just wanted to raise
a concern. Maybe other people (Eric, hpa) have ideas on whether it is a valid
concern and how to handle it better.

Thanks
Vivek

2010-04-09 21:33:15

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH 7/9] jump label: sort jump table at build-time

I think it just happens to come out harmless. The relocs.c extraction
stuff doesn't actually care about the exact address to be relocated; it
only needs to know that a relocated value sits at a given place in the
binary. The sorting rearranges the addresses in the text without
rearranging their corresponding relocs, but it's still the case that at
exactly each of those spots in the text there is a relocated address. The
boot-time "relocation" is just a blind adjustment to all those spots,
without reference to the original relocation details. So it Just Works.

I'm not at all sure this is how we want things to be. It's rather
non-obvious and fragile if we change any of the related magic. But I think
it is entirely reliable in today's code that it will do the right thing.
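
To make that concrete, the boot-time pass amounts to roughly this (a
simplified sketch, not the actual decompressor code; the names are
illustrative):

/* relocs[] holds the spots extracted by relocs.c at build time;
 * delta is (runtime load address - link-time address). Each spot
 * contains *some* relocated kernel address, and which address sits
 * where does not matter, so sorting the jump label section between
 * extraction and boot is harmless. */
static void adjust_relocated_spots(const u32 *relocs, int nr,
				   u32 delta, char *image)
{
	int i;

	for (i = 0; i < nr; i++) {
		u32 *spot = (u32 *)(image + relocs[i]);
		*spot += delta;
	}
}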


Thanks,
Roland

2010-04-09 21:37:53

by Jason Baron

[permalink] [raw]
Subject: Re: [PATCH 0/9] jump label v6

On Fri, Apr 09, 2010 at 04:36:24PM -0400, Masami Hiramatsu wrote:
> Hi Jason,
>
> Jason Baron wrote:
> > [...]
> >
> > * added jump_label_text_reserved(), so that other routines that want to patch
> > the code can first verify that they are not stomping on jump label addresses.
>
> Good!:-)
> So now, it might be a good time to integrate those text_reserved() functions.
>
> BTW, how many jumps would you expect to modify at once?
> Since text_poke_smp() uses stop_machine() for each text modification,
> I found that it can cause noticeable delays when it is called many times...
> (e.g. a systemtap testcase sets ~5000 probes at once)
>
> Thank you,
>

I'm counting 934 jump label locations in the vmlinux I have compiled,
675 of them being 'kmalloc'. Batch mode for text_poke_smp()?

-Jason

2010-04-09 21:58:33

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH 0/9] jump label v6

Jason Baron wrote:
> On Fri, Apr 09, 2010 at 04:36:24PM -0400, Masami Hiramatsu wrote:
>> Hi Jason,
>>
>> Jason Baron wrote:
>>> [...]
>>>
>>> * added jump_label_text_reserved(), so that other routines that want to patch
>>> the code can first verify that they are not stomping on jump label addresses.
>>
>> Good!:-)
>> So now, it might be a good time to integrate those text_reserved() functions.
>>
>> BTW, how many jumps would you expect to modify at once?
>> Since text_poke_smp() uses stop_machine() for each text modification,
>> I found that it can cause noticeable delays when it is called many times...
>> (e.g. a systemtap testcase sets ~5000 probes at once)
>>
>> Thank you,
>>
>
> I'm counting 934 jump label locations in the vmlinux, i have compiled,
> 675 of them being 'kmalloc'. Batch mode for text_poke_smp()?

Yeah :) I'm now trying to implement that along with the kprobes updates.

Thank you,

--
Masami Hiramatsu
e-mail: [email protected]

2010-04-10 06:16:14

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 0/9] jump label v6


I took a stab at implementing this on sparc64, but there are just
too many x86-specific references in kernel/jump_label.c to even
build this on non-x86 currently.

Please abstract out all of that ideal nop business so that whatever
values you need can be specified in the asm/jump_label.h header.
Probably what you want is the nop sequence size. Fine, call it
JUMP_LABEL_NOP_SIZE and define it in asm/jump_label.h.
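
In other words, something like (sketch):

/* arch/x86/include/asm/jump_label.h (sketch) */
#define JUMP_LABEL_NOP_SIZE 5	/* one 5-byte nop */

/* arch/sparc/include/asm/jump_label.h (sketch) */
#define JUMP_LABEL_NOP_SIZE 8	/* two 4-byte nops */

and then kernel/jump_label.c can use JUMP_LABEL_NOP_SIZE in places like
addr_conflict() instead of the x86-only IDEAL_NOP_SIZE_5.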

Also, kernel/jump_label.c is built unconditionally, and thus
linux/jump_label.h is included unconditionally, but that in turn
includes asm/jump_label.h, which will break the build on every
platform that hasn't implemented the header file yet.

Once you sort this out, the patch below will probably work once the
nop size defines have been added (sparc will use 2 nops, so the
sequence is 8 bytes).

Finally, please abstract out the jump label and string address
type. Even on 64-bit sparc all of the kernel addresses are 32 bits,
so we can use "u32" for all of the addresses jump label wants to
record. This will decrease the jump label section size. There is
at least one other 64-bit platform that can benefit from this,
which I think is s390x if I'm not mistaken.
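
For example (a sketch; jump_label_t is a suggested name, defined per
arch):

/* arch/sparc/include/asm/jump_label.h (sketch) */
typedef u32 jump_label_t;	/* sparc64 kernel addresses fit in 32 bits */

/* include/linux/jump_label.h (sketch) */
struct jump_entry {
	jump_label_t code;	/* address of the nop/jump site */
	jump_label_t target;	/* jump destination */
	jump_label_t name;	/* address of the label name string */
};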

Thanks.

--------------------
sparc64: Add jump_label support.

Signed-off-by: David S. Miller <[email protected]>
---
arch/sparc/include/asm/jump_label.h | 24 ++++++++++++++++++++++++
arch/sparc/kernel/Makefile | 2 ++
arch/sparc/kernel/jump_label.c | 29 +++++++++++++++++++++++++++++
arch/sparc/kernel/module.c | 6 ++++++
4 files changed, 61 insertions(+), 0 deletions(-)
create mode 100644 arch/sparc/include/asm/jump_label.h
create mode 100644 arch/sparc/kernel/jump_label.c

diff --git a/arch/sparc/include/asm/jump_label.h b/arch/sparc/include/asm/jump_label.h
new file mode 100644
index 0000000..efcd950
--- /dev/null
+++ b/arch/sparc/include/asm/jump_label.h
@@ -0,0 +1,24 @@
+#ifndef _ASM_SPARC_JUMP_LABEL_H
+#define _ASM_SPARC_JUMP_LABEL_H
+
+#if (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
+# define __HAVE_ARCH_JUMP_LABEL
+#endif
+
+#ifdef __HAVE_ARCH_JUMP_LABEL
+
+# define JUMP_LABEL(tag, label, cond) \
+ do { \
+ extern const char __jlstrtab_##tag[]; \
+ asm goto("1:\n\t" \
+ "nop\n\t" \
+ "nop\n\t" \
+ ".pushsection __jump_table, \"a\"\n\t"\
+ ".xword 1b, %l[" #label "], %c0\n\t" \
+ ".popsection \n\t" \
+ : : "i" (__jlstrtab_##tag) : : label);\
+ } while (0)
+
+# endif
+
+#endif
diff --git a/arch/sparc/kernel/Makefile b/arch/sparc/kernel/Makefile
index 0c2dc1f..599398f 100644
--- a/arch/sparc/kernel/Makefile
+++ b/arch/sparc/kernel/Makefile
@@ -119,3 +119,5 @@ obj-$(CONFIG_COMPAT) += $(audit--y)

pc--$(CONFIG_PERF_EVENTS) := perf_event.o
obj-$(CONFIG_SPARC64) += $(pc--y)
+
+obj-$(CONFIG_SPARC64) += jump_label.o
diff --git a/arch/sparc/kernel/jump_label.c b/arch/sparc/kernel/jump_label.c
new file mode 100644
index 0000000..c636ccf
--- /dev/null
+++ b/arch/sparc/kernel/jump_label.c
@@ -0,0 +1,29 @@
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/cpu.h>
+
+#include <linux/jump_label.h>
+#include <linux/memory.h>
+
+#ifdef __HAVE_ARCH_JUMP_LABEL
+void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type)
+{
+ u32 val, *insn = (u32 *) entry->code;
+
+ val = *insn;
+ if (type == JUMP_LABEL_ENABLE) {
+ s32 off = (s32)entry->target - (s32)entry->code;
+ val = 0x40000000 | ((u32) off >> 2);
+ } else {
+ val = 0x01000000;
+ }
+
+ get_online_cpus();
+ mutex_lock(&text_mutex);
+ *insn = val;
+ flushi(insn);
+ mutex_unlock(&text_mutex);
+ put_online_cpus();
+}
+#endif
diff --git a/arch/sparc/kernel/module.c b/arch/sparc/kernel/module.c
index 0ee642f..469f380 100644
--- a/arch/sparc/kernel/module.c
+++ b/arch/sparc/kernel/module.c
@@ -18,6 +18,9 @@
#include <asm/spitfire.h>

#ifdef CONFIG_SPARC64
+
+#include <linux/jump_label.h>
+
static void *module_map(unsigned long size)
{
struct vm_struct *area;
@@ -227,6 +230,9 @@ int module_finalize(const Elf_Ehdr *hdr,
const Elf_Shdr *sechdrs,
struct module *me)
{
+ /* make jump label nops */
+ apply_jump_label_nops(me);
+
/* Cheetah's I-cache is fully coherent. */
if (tlb_type == spitfire) {
unsigned long va;
--
1.6.6.1

2010-04-10 06:23:48

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 0/9] jump label v6

On 04/09/2010 11:16 PM, David Miller wrote:
>
> Finally, please abstract out the jump label and string address
> type. Even on 64-bit sparc all of the kernel addresses are 32-bits
> so we can use "u32" for all of the addresses jump label wants to
> record. This will decrease the jump label section size. There is
> at least one other 64-bit platform that can benefit from this,
> which I think is s390x if I'm not mistaken.
>

Even on x86 they're 32 bits -- but signed.
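
So an x86 version of a compressed entry would just need to sign-extend
on the way back out, e.g. (sketch):

/* arch/x86/include/asm/jump_label.h (sketch) */
typedef s32 jump_label_t;

static inline unsigned long jump_entry_code(const struct jump_entry *e)
{
	/* kernel text sits in the top 2GB of the address space, so a
	 * signed 32-bit value sign-extends to the full 64-bit address */
	return (unsigned long)(long)e->code;
}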

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-04-13 16:57:05

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/9] jump label v6

* Jason Baron ([email protected]) wrote:
> Hi,
>
> Refresh of jump labeling patches against -tip tree. For background see:
> http://marc.info/?l=linux-kernel&m=125858436505941&w=2
>
> I believe I've addressed all the reviews from v5.

Hi Jason,

I would appreciate it if you could add pointers to the Immediate Values
benchmarks (or possibly some benchmark information altogether) and a
notice that some parts of the design are inspired by Immediate Values
in the jump label code.

I recognise that you did great work on getting jump label into shape both
at the Linux kernel and gcc levels, but it's usually appropriate to
acknowledge the prior work it is based on. Only then can I justify that
Immediate Values have been useful in the whole process.

Thanks,

Mathieu



--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-04-14 19:35:10

by Jason Baron

[permalink] [raw]
Subject: Re: [PATCH 0/9] jump label v6

On Tue, Apr 13, 2010 at 12:56:59PM -0400, Mathieu Desnoyers wrote:
> * Jason Baron ([email protected]) wrote:
> > Hi,
> >
> > Refresh of jump labeling patches against -tip tree. For background see:
> > http://marc.info/?l=linux-kernel&m=125858436505941&w=2
> >
> > I believe I've addressed all the reviews from v5.
>
> Hi Jason,
>
> I would appreciate it if you could add pointers to the Immediate Values
> benchmarks (or possibly some benchmark information altogether) and a
> notice that some parts of the design are inspired by Immediate Values
> in the jump label code.
>
> I recognise that you did great work on getting jump label into shape both
> at the Linux kernel and gcc levels, but it's usually appropriate to
> acknowledge the prior work it is based on. Only then can I justify that
> Immediate Values have been useful in the whole process.
>

So I've been doing micro-benchmarks measuring the cycles involved when
the tracepoints are disabled. As quoted from the above pointer:

"As discussed in pervious mails I've seen an average improvement of 30
cycles per-tracepoint on x86_64 systems that I've tested."

I can post my test harness if you are interested.
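
The shape of it is roughly this (a sketch, not the exact harness;
trace_foo() is a stand-in for whichever disabled tracepoint is under
test):

static inline unsigned long long rdtsc_cycles(void)
{
	unsigned int lo, hi;

	asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
	return ((unsigned long long)hi << 32) | lo;
}

/* stand-in for the disabled tracepoint being measured */
static inline void trace_foo(int arg) { (void)arg; }

static unsigned long long avg_cycles_disabled(void)
{
	unsigned long long start, end;
	int i;

	start = rdtsc_cycles();
	for (i = 0; i < 1000000; i++)
		trace_foo(i);
	end = rdtsc_cycles();

	return (end - start) / 1000000;	/* average cycles per call */
}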

If there are any other benchmarks of interest please let me know.

I am planning to add a Documentation/ file for jump labels, and I can add
a note there about the previous Immediate Values work, which certainly has
been useful.

thanks,

-Jason