2020-04-05 20:14:59

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 00/15] perf/x86: Add perf text poke events

Hi

Here are patches to add a text poke event to record changes to kernel text
(i.e. self-modifying code) in order to support tracers like Intel PT
decoding through jump labels, kprobes and ftrace trampolines.

The first 8 patches make the kernel changes and the subsequent patches are
tools changes.

The next 4 patches add support for updating perf tools' data cache
with the changed bytes.

The next patch is an Intel PT specific tools change.

The final 2 patches add perf script --show-text-poke-events option

Patches also here:

git://git.infradead.org/users/ahunter/linux-perf.git text_poke

Changes in V6

perf: Add perf text poke event

Adjustments due to re-base on top of cgroup event patches

perf/x86: Add perf text poke events for kprobes

Expand commit message
Add Masami's Ack

perf tools: Add support for PERF_RECORD_TEXT_POKE

Adjustments due to re-base on top of cgroup event patches
Improvements to perf_event__fprintf_text_poke()

perf script: Add option --show-text-poke-events
perf script: Show text poke address symbol

New patches

Changes in V5

perf/x86: Add perf text poke events for kprobes

Simplify optimized kprobes events (Peter)

Changes in V4

kprobes: Add symbols for kprobe insn pages

Change "module name" from kprobe to __builtin__kprobes
Added comment about "module name" use

ftrace: Add symbols for ftrace trampolines

Change "module name" from ftrace to __builtin__ftrace
Move calls of ftrace_add_trampoline_to_kallsyms() and
ftrace_remove_trampoline_from_kallsyms() into
kernel/trace/ftrace.c
Added comment about "module name" use

ftrace: Add perf ksymbol events for ftrace trampolines

Move changes into kernel/trace/ftrace.c

ftrace: Add perf text poke events for ftrace trampolines

Move changes into kernel/trace/ftrace.c

Changes in V3:

perf: Add perf text poke event

To prevent warning, cast pointer to (unsigned long) not (u64)

kprobes: Add symbols for kprobe insn pages

Expand commit message
Remove un-needed declarations of kprobe_cache_get_kallsym() and arch_kprobe_get_kallsym() when !CONFIG_KPROBES

ftrace: Add symbols for ftrace trampolines

Expand commit message
Make ftrace_get_trampoline_kallsym() static

Changes in V2:

perf: Add perf text poke event

Separate out x86 changes
The text poke event now has old len and new len
Revised commit message

perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers

New patch containing x86 changes from original first patch

kprobes: Add symbols for kprobe insn pages
kprobes: Add perf ksymbol events for kprobe insn pages
perf/x86: Add perf text poke events for kprobes
ftrace: Add symbols for ftrace trampolines
ftrace: Add perf ksymbol events for ftrace trampolines
ftrace: Add perf text poke events for ftrace trampolines
perf kcore_copy: Fix module map when there are no modules loaded
perf evlist: Disable 'immediate' events last

New patches

perf tools: Add support for PERF_RECORD_TEXT_POKE

The text poke event now has old len and new len
Also select ksymbol events with text poke events

perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL

New patch

perf intel-pt: Add support for text poke events

The text poke event now has old len and new len
Allow for the address not having a map yet


Changes since RFC:

Dropped 'flags' from the new event. The consensus seemed to be that text
pokes should employ a scheme similar to x86's INT3 method instead.

dropped tools patches already applied.


Example:

For jump labels, the kernel needs
CONFIG_JUMP_LABEL=y
and also an easy to flip jump label is in sysctl_schedstats() which needs
CONFIG_SCHEDSTATS=y
CONFIG_PROC_SYSCTL=y
CONFIG_SCHED_DEBUG=y

Also note the 'sudo perf record' is put into the background which, as
written, needs sudo credential caching (otherwise the background task
will stop awaiting the sudo password), hence the 'sudo echo' to start.

Before:

$ sudo echo
$ sudo perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M &
[1] 1640
$ cat /proc/sys/kernel/sched_schedstats
0
$ sudo bash -c 'echo 1 > /proc/sys/kernel/sched_schedstats'
$ cat /proc/sys/kernel/sched_schedstats
1
$ sudo bash -c 'echo 0 > /proc/sys/kernel/sched_schedstats'
$ cat /proc/sys/kernel/sched_schedstats
0
$ sudo kill 1640
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 16.635 MB perf.data.before ]
$ perf script -i perf.data.before --itrace=e >/dev/null
Warning:
1946 instruction trace errors

After:

$ sudo echo
$ sudo perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
[1] 1882
$ cat /proc/sys/kernel/sched_schedstats
0
$ sudo bash -c 'echo 1 > /proc/sys/kernel/sched_schedstats'
$ cat /proc/sys/kernel/sched_schedstats
1
$ sudo bash -c 'echo 0 > /proc/sys/kernel/sched_schedstats'
$ cat /proc/sys/kernel/sched_schedstats
0
$ sudo kill 1882
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 10.893 MB perf.data.after ]
$ perf script -i perf.data.after --itrace=e
$


Adrian Hunter (15):
perf: Add perf text poke event
perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers
kprobes: Add symbols for kprobe insn pages
kprobes: Add perf ksymbol events for kprobe insn pages
perf/x86: Add perf text poke events for kprobes
ftrace: Add symbols for ftrace trampolines
ftrace: Add perf ksymbol events for ftrace trampolines
ftrace: Add perf text poke events for ftrace trampolines
perf kcore_copy: Fix module map when there are no modules loaded
perf evlist: Disable 'immediate' events last
perf tools: Add support for PERF_RECORD_TEXT_POKE
perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL
perf intel-pt: Add support for text poke events
perf script: Add option --show-text-poke-events
perf script: Show text poke address symbol

arch/x86/include/asm/kprobes.h | 2 +
arch/x86/kernel/alternative.c | 37 ++++++++++-
arch/x86/kernel/kprobes/core.c | 15 ++++-
arch/x86/kernel/kprobes/opt.c | 38 +++++++++--
include/linux/ftrace.h | 12 ++--
include/linux/kprobes.h | 15 +++++
include/linux/perf_event.h | 8 +++
include/uapi/linux/perf_event.h | 26 +++++++-
kernel/events/core.c | 90 +++++++++++++++++++++++++-
kernel/kallsyms.c | 42 +++++++++++--
kernel/kprobes.c | 57 +++++++++++++++++
kernel/trace/ftrace.c | 101 +++++++++++++++++++++++++++++-
tools/include/uapi/linux/perf_event.h | 26 +++++++-
tools/lib/perf/include/perf/event.h | 9 +++
tools/perf/Documentation/perf-script.txt | 4 ++
tools/perf/arch/x86/util/intel-pt.c | 4 ++
tools/perf/builtin-record.c | 45 +++++++++++++
tools/perf/builtin-script.c | 23 ++++++-
tools/perf/tests/perf-record.c | 4 +-
tools/perf/util/dso.c | 3 +
tools/perf/util/dso.h | 1 +
tools/perf/util/event.c | 62 +++++++++++++++++-
tools/perf/util/event.h | 9 ++-
tools/perf/util/evlist.c | 31 ++++++---
tools/perf/util/evlist.h | 1 +
tools/perf/util/evsel.c | 7 ++-
tools/perf/util/intel-pt.c | 77 ++++++++++++++++++++++-
tools/perf/util/machine.c | 49 +++++++++++++++
tools/perf/util/machine.h | 3 +
tools/perf/util/map.c | 5 ++
tools/perf/util/map.h | 3 +-
tools/perf/util/perf_event_attr_fprintf.c | 1 +
tools/perf/util/record.c | 10 +++
tools/perf/util/record.h | 1 +
tools/perf/util/session.c | 23 +++++++
tools/perf/util/symbol-elf.c | 7 +++
tools/perf/util/symbol.c | 1 +
tools/perf/util/tool.h | 3 +-
38 files changed, 815 insertions(+), 40 deletions(-)


Regards
Adrian


2020-04-05 20:14:59

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 01/15] perf: Add perf text poke event

Record (single instruction) changes to the kernel text (i.e.
self-modifying code) in order to support tracers like Intel PT and
ARM CoreSight.

A copy of the running kernel code is needed as a reference point (e.g.
from /proc/kcore). The text poke event records the old bytes and the
new bytes so that the event can be processed forwards or backwards.

The basic problem is recording the modified instruction in an
unambiguous manner given SMP instruction cache (in)coherence. That is,
when modifying an instruction concurrently any solution with one or
multiple timestamps is not sufficient:

CPU0 CPU1
0
1 write insn A
2 execute insn A
3 sync-I$
4

Due to I$, CPU1 might execute either the old or new A. No matter where
we record tracepoints on CPU0, one simply cannot tell what CPU1 will
have observed, except that at 0 it must be the old one and at 4 it
must be the new one.

To solve this, take inspiration from x86 text poking, which has to
solve this exact problem due to variable length instruction encoding
and I-fetch windows.

1) overwrite the instruction with a breakpoint and sync I$

This guarantees that that code flow will never hit the target
instruction anymore, on any CPU (or rather, it will cause an
exception).

2) issue the TEXT_POKE event

3) overwrite the breakpoint with the new instruction and sync I$

Now we know that any execution after the TEXT_POKE event will either
observe the breakpoint (and hit the exception) or the new instruction.

So by guarding the TEXT_POKE event with an exception on either side;
we can now tell, without doubt, which instruction another CPU will
have observed.

Signed-off-by: Adrian Hunter <[email protected]>
---
include/linux/perf_event.h | 8 +++
include/uapi/linux/perf_event.h | 21 +++++++-
kernel/events/core.c | 90 ++++++++++++++++++++++++++++++++-
3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 9c3e7619c929..2d6e05d601f6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1232,6 +1232,9 @@ extern void perf_event_exec(void);
extern void perf_event_comm(struct task_struct *tsk, bool exec);
extern void perf_event_namespaces(struct task_struct *tsk);
extern void perf_event_fork(struct task_struct *tsk);
+extern void perf_event_text_poke(const void *addr,
+ const void *old_bytes, size_t old_len,
+ const void *new_bytes, size_t new_len);

/* Callchains */
DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
@@ -1482,6 +1485,11 @@ static inline void perf_event_exec(void) { }
static inline void perf_event_comm(struct task_struct *tsk, bool exec) { }
static inline void perf_event_namespaces(struct task_struct *tsk) { }
static inline void perf_event_fork(struct task_struct *tsk) { }
+static inline void perf_event_text_poke(const void *addr,
+ const void *old_bytes,
+ size_t old_len,
+ const void *new_bytes,
+ size_t new_len) { }
static inline void perf_event_init(void) { }
static inline int perf_swevent_get_recursion_context(void) { return -1; }
static inline void perf_swevent_put_recursion_context(int rctx) { }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 7b2d6fc9e6ed..e5bee6c17b86 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -383,7 +383,8 @@ struct perf_event_attr {
bpf_event : 1, /* include bpf events */
aux_output : 1, /* generate AUX records instead of events */
cgroup : 1, /* include cgroup events */
- __reserved_1 : 31;
+ text_poke : 1, /* include text poke events */
+ __reserved_1 : 30;

union {
__u32 wakeup_events; /* wakeup every n events */
@@ -1024,6 +1025,24 @@ enum perf_event_type {
*/
PERF_RECORD_CGROUP = 19,

+ /*
+ * Records changes to kernel text i.e. self-modified code. 'old_len' is
+ * the number of old bytes, 'new_len' is the number of new bytes. Either
+ * 'old_len' or 'new_len' may be zero to indicate, for example, the
+ * addition or removal of a trampoline. 'bytes' contains the old bytes
+ * followed immediately by the new bytes.
+ *
+ * struct {
+ * struct perf_event_header header;
+ * u64 addr;
+ * u16 old_len;
+ * u16 new_len;
+ * u8 bytes[];
+ * struct sample_id sample_id;
+ * };
+ */
+ PERF_RECORD_TEXT_POKE = 20,
+
PERF_RECORD_MAX, /* non-ABI */
};

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1569979c8912..7ecc932f4845 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -388,6 +388,7 @@ static atomic_t nr_switch_events __read_mostly;
static atomic_t nr_ksymbol_events __read_mostly;
static atomic_t nr_bpf_events __read_mostly;
static atomic_t nr_cgroup_events __read_mostly;
+static atomic_t nr_text_poke_events __read_mostly;

static LIST_HEAD(pmus);
static DEFINE_MUTEX(pmus_lock);
@@ -4554,7 +4555,7 @@ static bool is_sb_event(struct perf_event *event)
if (attr->mmap || attr->mmap_data || attr->mmap2 ||
attr->comm || attr->comm_exec ||
attr->task || attr->ksymbol ||
- attr->context_switch ||
+ attr->context_switch || attr->text_poke ||
attr->bpf_event)
return true;
return false;
@@ -4630,6 +4631,8 @@ static void unaccount_event(struct perf_event *event)
atomic_dec(&nr_ksymbol_events);
if (event->attr.bpf_event)
atomic_dec(&nr_bpf_events);
+ if (event->attr.text_poke)
+ atomic_dec(&nr_text_poke_events);

if (dec) {
if (!atomic_add_unless(&perf_sched_count, -1, 1))
@@ -8598,6 +8601,89 @@ void perf_event_bpf_event(struct bpf_prog *prog,
perf_iterate_sb(perf_event_bpf_output, &bpf_event, NULL);
}

+struct perf_text_poke_event {
+ const void *old_bytes;
+ const void *new_bytes;
+ size_t pad;
+ u16 old_len;
+ u16 new_len;
+
+ struct {
+ struct perf_event_header header;
+
+ u64 addr;
+ } event_id;
+};
+
+static int perf_event_text_poke_match(struct perf_event *event)
+{
+ return event->attr.text_poke;
+}
+
+static void perf_event_text_poke_output(struct perf_event *event, void *data)
+{
+ struct perf_text_poke_event *text_poke_event = data;
+ struct perf_output_handle handle;
+ struct perf_sample_data sample;
+ u64 padding = 0;
+ int ret;
+
+ if (!perf_event_text_poke_match(event))
+ return;
+
+ perf_event_header__init_id(&text_poke_event->event_id.header, &sample, event);
+
+ ret = perf_output_begin(&handle, event, text_poke_event->event_id.header.size);
+ if (ret)
+ return;
+
+ perf_output_put(&handle, text_poke_event->event_id);
+ perf_output_put(&handle, text_poke_event->old_len);
+ perf_output_put(&handle, text_poke_event->new_len);
+
+ __output_copy(&handle, text_poke_event->old_bytes, text_poke_event->old_len);
+ __output_copy(&handle, text_poke_event->new_bytes, text_poke_event->new_len);
+
+ if (text_poke_event->pad)
+ __output_copy(&handle, &padding, text_poke_event->pad);
+
+ perf_event__output_id_sample(event, &handle, &sample);
+
+ perf_output_end(&handle);
+}
+
+void perf_event_text_poke(const void *addr, const void *old_bytes,
+ size_t old_len, const void *new_bytes, size_t new_len)
+{
+ struct perf_text_poke_event text_poke_event;
+ size_t tot, pad;
+
+ if (!atomic_read(&nr_text_poke_events))
+ return;
+
+ tot = sizeof(text_poke_event.old_len) + old_len;
+ tot += sizeof(text_poke_event.new_len) + new_len;
+ pad = ALIGN(tot, sizeof(u64)) - tot;
+
+ text_poke_event = (struct perf_text_poke_event){
+ .old_bytes = old_bytes,
+ .new_bytes = new_bytes,
+ .pad = pad,
+ .old_len = old_len,
+ .new_len = new_len,
+ .event_id = {
+ .header = {
+ .type = PERF_RECORD_TEXT_POKE,
+ .misc = PERF_RECORD_MISC_KERNEL,
+ .size = sizeof(text_poke_event.event_id) + tot + pad,
+ },
+ .addr = (unsigned long)addr,
+ },
+ };
+
+ perf_iterate_sb(perf_event_text_poke_output, &text_poke_event, NULL);
+}
+
void perf_event_itrace_started(struct perf_event *event)
{
event->attach_state |= PERF_ATTACH_ITRACE;
@@ -10917,6 +11003,8 @@ static void account_event(struct perf_event *event)
atomic_inc(&nr_ksymbol_events);
if (event->attr.bpf_event)
atomic_inc(&nr_bpf_events);
+ if (event->attr.text_poke)
+ atomic_inc(&nr_text_poke_events);

if (inc) {
/*
--
2.17.1

2020-04-05 20:15:04

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 02/15] perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers

Add support for perf text poke event for text_poke_bp_batch() callers. That
includes jump labels. See comments for more details.

Signed-off-by: Adrian Hunter <[email protected]>
---
arch/x86/kernel/alternative.c | 37 ++++++++++++++++++++++++++++++++++-
1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 15ac0d5f4b40..69b49521bc4b 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -3,6 +3,7 @@

#include <linux/module.h>
#include <linux/sched.h>
+#include <linux/perf_event.h>
#include <linux/mutex.h>
#include <linux/list.h>
#include <linux/stringify.h>
@@ -947,6 +948,7 @@ struct text_poke_loc {
s32 rel32;
u8 opcode;
const u8 text[POKE_MAX_OPCODE_SIZE];
+ u8 old;
};

struct bp_patching_desc {
@@ -1115,8 +1117,10 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
/*
* First step: add a int3 trap to the address that will be patched.
*/
- for (i = 0; i < nr_entries; i++)
+ for (i = 0; i < nr_entries; i++) {
+ tp[i].old = *(u8 *)text_poke_addr(&tp[i]);
text_poke(text_poke_addr(&tp[i]), &int3, INT3_INSN_SIZE);
+ }

text_poke_sync();

@@ -1124,14 +1128,45 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
* Second step: update all but the first byte of the patched range.
*/
for (do_sync = 0, i = 0; i < nr_entries; i++) {
+ u8 old[POKE_MAX_OPCODE_SIZE] = { tp[i].old, };
int len = text_opcode_size(tp[i].opcode);

if (len - INT3_INSN_SIZE > 0) {
+ memcpy(old + INT3_INSN_SIZE,
+ text_poke_addr(&tp[i]) + INT3_INSN_SIZE,
+ len - INT3_INSN_SIZE);
text_poke(text_poke_addr(&tp[i]) + INT3_INSN_SIZE,
(const char *)tp[i].text + INT3_INSN_SIZE,
len - INT3_INSN_SIZE);
do_sync++;
}
+
+ /*
+ * Emit a perf event to record the text poke, primarily to
+ * support Intel PT decoding which must walk the executable code
+ * to reconstruct the trace. The flow up to here is:
+ * - write INT3 byte
+ * - IPI-SYNC
+ * - write instruction tail
+ * At this point the actual control flow will be through the
+ * INT3 and handler and not hit the old or new instruction.
+ * Intel PT outputs FUP/TIP packets for the INT3, so the flow
+ * can still be decoded. Subsequently:
+ * - emit RECORD_TEXT_POKE with the new instruction
+ * - IPI-SYNC
+ * - write first byte
+ * - IPI-SYNC
+ * So before the text poke event timestamp, the decoder will see
+ * either the old instruction flow or FUP/TIP of INT3. After the
+ * text poke event timestamp, the decoder will see either the
+ * new instruction flow or FUP/TIP of INT3. Thus decoders can
+ * use the timestamp as the point at which to modify the
+ * executable code.
+ * The old instruction is recorded so that the event can be
+ * processed forwards or backwards.
+ */
+ perf_event_text_poke(text_poke_addr(&tp[i]), old, len,
+ tp[i].text, len);
}

if (do_sync) {
--
2.17.1

2020-04-05 20:15:15

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 03/15] kprobes: Add symbols for kprobe insn pages

Symbols are needed for tools to describe instruction addresses. Pages
allocated for kprobe's purposes need symbols to be created for them.
Add such symbols to be visible via /proc/kallsyms.

Note: kprobe insn pages are not used if ftrace is configured. To see the
effect of this patch, the kernel must be configured with:

# CONFIG_FUNCTION_TRACER is not set
CONFIG_KPROBES=y

and for optimised kprobes:

CONFIG_OPTPROBES=y

Example on x86:

# perf probe __schedule
Added new event:
probe:__schedule (on __schedule)
# cat /proc/kallsyms | grep '\[__builtin__kprobes\]'
ffffffffc00d4000 t kprobe_insn_page [__builtin__kprobes]
ffffffffc00d6000 t kprobe_optinsn_page [__builtin__kprobes]

Note: This patch adds "__builtin__kprobes" as a module name in
/proc/kallsyms for symbols for pages allocated for kprobes' purposes, even
though "__builtin__kprobes" is not a module.

Signed-off-by: Adrian Hunter <[email protected]>
---
include/linux/kprobes.h | 15 ++++++++++++++
kernel/kallsyms.c | 37 +++++++++++++++++++++++++++++----
kernel/kprobes.c | 45 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 04bdaf01112c..62d682f47b5e 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -242,6 +242,7 @@ struct kprobe_insn_cache {
struct mutex mutex;
void *(*alloc)(void); /* allocate insn page */
void (*free)(void *); /* free insn page */
+ const char *sym; /* symbol for insn pages */
struct list_head pages; /* list of kprobe_insn_page */
size_t insn_size; /* size of instruction slot */
int nr_garbage;
@@ -272,6 +273,8 @@ static inline bool is_kprobe_##__name##_slot(unsigned long addr) \
{ \
return __is_insn_slot_addr(&kprobe_##__name##_slots, addr); \
}
+#define KPROBE_INSN_PAGE_SYM "kprobe_insn_page"
+#define KPROBE_OPTINSN_PAGE_SYM "kprobe_optinsn_page"
#else /* __ARCH_WANT_KPROBES_INSN_SLOT */
#define DEFINE_INSN_CACHE_OPS(__name) \
static inline bool is_kprobe_##__name##_slot(unsigned long addr) \
@@ -373,6 +376,13 @@ void dump_kprobe(struct kprobe *kp);
void *alloc_insn_page(void);
void free_insn_page(void *page);

+int kprobe_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
+ char *sym);
+int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c, unsigned int *symnum,
+ unsigned long *value, char *type, char *sym);
+
+int arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
+ char *type, char *sym);
#else /* !CONFIG_KPROBES: */

static inline int kprobes_built_in(void)
@@ -435,6 +445,11 @@ static inline bool within_kprobe_blacklist(unsigned long addr)
{
return true;
}
+static inline int kprobe_get_kallsym(unsigned int symnum, unsigned long *value,
+ char *type, char *sym)
+{
+ return -ERANGE;
+}
#endif /* CONFIG_KPROBES */
static inline int disable_kretprobe(struct kretprobe *rp)
{
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index a9b3f660dee7..92b510aa6d79 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -24,6 +24,7 @@
#include <linux/slab.h>
#include <linux/filter.h>
#include <linux/ftrace.h>
+#include <linux/kprobes.h>
#include <linux/compiler.h>

/*
@@ -439,6 +440,7 @@ struct kallsym_iter {
loff_t pos_arch_end;
loff_t pos_mod_end;
loff_t pos_ftrace_mod_end;
+ loff_t pos_bpf_end;
unsigned long value;
unsigned int nameoff; /* If iterating in core kernel symbols. */
char type;
@@ -498,11 +500,33 @@ static int get_ksymbol_ftrace_mod(struct kallsym_iter *iter)

static int get_ksymbol_bpf(struct kallsym_iter *iter)
{
+ int ret;
+
strlcpy(iter->module_name, "bpf", MODULE_NAME_LEN);
iter->exported = 0;
- return bpf_get_kallsym(iter->pos - iter->pos_ftrace_mod_end,
- &iter->value, &iter->type,
- iter->name) < 0 ? 0 : 1;
+ ret = bpf_get_kallsym(iter->pos - iter->pos_ftrace_mod_end,
+ &iter->value, &iter->type,
+ iter->name);
+ if (ret < 0) {
+ iter->pos_bpf_end = iter->pos;
+ return 0;
+ }
+
+ return 1;
+}
+
+/*
+ * This uses "__builtin__kprobes" as a module name for symbols for pages
+ * allocated for kprobes' purposes, even though "__builtin__kprobes" is not a
+ * module.
+ */
+static int get_ksymbol_kprobe(struct kallsym_iter *iter)
+{
+ strlcpy(iter->module_name, "__builtin__kprobes", MODULE_NAME_LEN);
+ iter->exported = 0;
+ return kprobe_get_kallsym(iter->pos - iter->pos_bpf_end,
+ &iter->value, &iter->type,
+ iter->name) < 0 ? 0 : 1;
}

/* Returns space to next name. */
@@ -529,6 +553,7 @@ static void reset_iter(struct kallsym_iter *iter, loff_t new_pos)
iter->pos_arch_end = 0;
iter->pos_mod_end = 0;
iter->pos_ftrace_mod_end = 0;
+ iter->pos_bpf_end = 0;
}
}

@@ -553,7 +578,11 @@ static int update_iter_mod(struct kallsym_iter *iter, loff_t pos)
get_ksymbol_ftrace_mod(iter))
return 1;

- return get_ksymbol_bpf(iter);
+ if ((!iter->pos_bpf_end || iter->pos_bpf_end > pos) &&
+ get_ksymbol_bpf(iter))
+ return 1;
+
+ return get_ksymbol_kprobe(iter);
}

/* Returns false if pos at or past end of file. */
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2625c241ac00..229d1b596690 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -118,6 +118,7 @@ struct kprobe_insn_cache kprobe_insn_slots = {
.mutex = __MUTEX_INITIALIZER(kprobe_insn_slots.mutex),
.alloc = alloc_insn_page,
.free = free_insn_page,
+ .sym = KPROBE_INSN_PAGE_SYM,
.pages = LIST_HEAD_INIT(kprobe_insn_slots.pages),
.insn_size = MAX_INSN_SIZE,
.nr_garbage = 0,
@@ -296,6 +297,7 @@ struct kprobe_insn_cache kprobe_optinsn_slots = {
.mutex = __MUTEX_INITIALIZER(kprobe_optinsn_slots.mutex),
.alloc = alloc_insn_page,
.free = free_insn_page,
+ .sym = KPROBE_OPTINSN_PAGE_SYM,
.pages = LIST_HEAD_INIT(kprobe_optinsn_slots.pages),
/* .insn_size is initialized later */
.nr_garbage = 0,
@@ -2179,6 +2181,49 @@ int kprobe_add_area_blacklist(unsigned long start, unsigned long end)
return 0;
}

+int kprobe_cache_get_kallsym(struct kprobe_insn_cache *c, unsigned int *symnum,
+ unsigned long *value, char *type, char *sym)
+{
+ struct kprobe_insn_page *kip;
+ int ret = -ERANGE;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(kip, &c->pages, list) {
+ if ((*symnum)--)
+ continue;
+ strlcpy(sym, c->sym, KSYM_NAME_LEN);
+ *type = 't';
+ *value = (unsigned long)kip->insns;
+ ret = 0;
+ break;
+ }
+ rcu_read_unlock();
+
+ return ret;
+}
+
+int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
+ char *type, char *sym)
+{
+ return -ERANGE;
+}
+
+int kprobe_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
+ char *sym)
+{
+#ifdef __ARCH_WANT_KPROBES_INSN_SLOT
+ if (!kprobe_cache_get_kallsym(&kprobe_insn_slots, &symnum, value, type, sym))
+ return 0;
+#ifdef CONFIG_OPTPROBES
+ if (!kprobe_cache_get_kallsym(&kprobe_optinsn_slots, &symnum, value, type, sym))
+ return 0;
+#endif
+#endif
+ if (!arch_kprobe_get_kallsym(&symnum, value, type, sym))
+ return 0;
+ return -ERANGE;
+}
+
int __init __weak arch_populate_kprobe_blacklist(void)
{
return 0;
--
2.17.1

2020-04-05 20:15:19

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 04/15] kprobes: Add perf ksymbol events for kprobe insn pages

Symbols are needed for tools to describe instruction addresses. Pages
allocated for kprobe's purposes need symbols to be created for them.
Add such symbols to be visible via perf ksymbol events.

Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
---
include/uapi/linux/perf_event.h | 5 +++++
kernel/kprobes.c | 12 ++++++++++++
2 files changed, 17 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index e5bee6c17b86..e1a4179144a1 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1049,6 +1049,11 @@ enum perf_event_type {
enum perf_record_ksymbol_type {
PERF_RECORD_KSYMBOL_TYPE_UNKNOWN = 0,
PERF_RECORD_KSYMBOL_TYPE_BPF = 1,
+ /*
+ * Out of line code such as kprobe-replaced instructions or optimized
+ * kprobes.
+ */
+ PERF_RECORD_KSYMBOL_TYPE_OOL = 2,
PERF_RECORD_KSYMBOL_TYPE_MAX /* non-ABI */
};

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 229d1b596690..f880eb2189c0 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -35,6 +35,7 @@
#include <linux/ftrace.h>
#include <linux/cpu.h>
#include <linux/jump_label.h>
+#include <linux/perf_event.h>

#include <asm/sections.h>
#include <asm/cacheflush.h>
@@ -184,6 +185,10 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c)
kip->cache = c;
list_add_rcu(&kip->list, &c->pages);
slot = kip->insns;
+
+ /* Record the perf ksymbol register event after adding the page */
+ perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL, (u64)kip->insns,
+ PAGE_SIZE, false, c->sym);
out:
mutex_unlock(&c->mutex);
return slot;
@@ -202,6 +207,13 @@ static int collect_one_slot(struct kprobe_insn_page *kip, int idx)
* next time somebody inserts a probe.
*/
if (!list_is_singular(&kip->list)) {
+ /*
+ * Record perf ksymbol unregister event before removing
+ * the page.
+ */
+ perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
+ (u64)kip->insns, PAGE_SIZE, true,
+ kip->cache->sym);
list_del_rcu(&kip->list);
synchronize_rcu();
kip->cache->free(kip->insns);
--
2.17.1

2020-04-05 20:15:25

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 05/15] perf/x86: Add perf text poke events for kprobes

Add perf text poke events for kprobes. That includes:

- the replaced instruction(s) which are executed out-of-line
i.e. arch_copy_kprobe() and arch_remove_kprobe()

- the INT3 that activates the kprobe
i.e. arch_arm_kprobe() and arch_disarm_kprobe()

- optimised kprobe function
i.e. arch_prepare_optimized_kprobe() and
__arch_remove_optimized_kprobe()

- optimised kprobe
i.e. arch_optimize_kprobes() and arch_unoptimize_kprobe()

Resulting in 8 possible text_poke events:

0: NULL -> probe.ainsn.insn (if ainsn.boostable && !kp.post_handler)
arch_copy_kprobe()

1: old0 -> INT3 arch_arm_kprobe()

// boosted kprobe active

2: NULL -> optprobe_trampoline arch_prepare_optimized_kprobe()

3: INT3,old1,old2,old3,old4 -> JMP32 arch_optimize_kprobes()

// optprobe active

4: JMP32 -> INT3,old1,old2,old3,old4

// optprobe disabled and kprobe active (this sometimes goes back to 3)
arch_unoptimize_kprobe()

5: optprobe_trampoline -> NULL arch_remove_optimized_kprobe()

// boosted kprobe active

6: INT3 -> old0 arch_disarm_kprobe()

7: probe.ainsn.insn -> NULL (if ainsn.boostable && !kp.post_handler)
arch_remove_kprobe()

Signed-off-by: Adrian Hunter <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
---
arch/x86/include/asm/kprobes.h | 2 ++
arch/x86/kernel/kprobes/core.c | 15 +++++++++++++-
arch/x86/kernel/kprobes/opt.c | 38 +++++++++++++++++++++++++++++-----
3 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 073eb7ad2f56..143bc9abe99c 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -66,6 +66,8 @@ struct arch_specific_insn {
*/
bool boostable;
bool if_modifier;
+ /* Number of bytes of text poked */
+ int tp_len;
};

struct arch_optimized_insn {
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 4d7022a740ab..6d8b58c84cbc 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -33,6 +33,7 @@
#include <linux/hardirq.h>
#include <linux/preempt.h>
#include <linux/sched/debug.h>
+#include <linux/perf_event.h>
#include <linux/extable.h>
#include <linux/kdebug.h>
#include <linux/kallsyms.h>
@@ -471,6 +472,9 @@ static int arch_copy_kprobe(struct kprobe *p)
/* Also, displacement change doesn't affect the first byte */
p->opcode = buf[0];

+ p->ainsn.tp_len = len;
+ perf_event_text_poke(p->ainsn.insn, NULL, 0, buf, len);
+
/* OK, write back the instruction(s) into ROX insn buffer */
text_poke(p->ainsn.insn, buf, len);

@@ -502,12 +506,18 @@ int arch_prepare_kprobe(struct kprobe *p)

void arch_arm_kprobe(struct kprobe *p)
{
- text_poke(p->addr, ((unsigned char []){INT3_INSN_OPCODE}), 1);
+ u8 int3 = INT3_INSN_OPCODE;
+
+ text_poke(p->addr, &int3, 1);
text_poke_sync();
+ perf_event_text_poke(p->addr, &p->opcode, 1, &int3, 1);
}

void arch_disarm_kprobe(struct kprobe *p)
{
+ u8 int3 = INT3_INSN_OPCODE;
+
+ perf_event_text_poke(p->addr, &int3, 1, &p->opcode, 1);
text_poke(p->addr, &p->opcode, 1);
text_poke_sync();
}
@@ -515,6 +525,9 @@ void arch_disarm_kprobe(struct kprobe *p)
void arch_remove_kprobe(struct kprobe *p)
{
if (p->ainsn.insn) {
+ /* Record the perf event before freeing the slot */
+ perf_event_text_poke(p->ainsn.insn, p->ainsn.insn,
+ p->ainsn.tp_len, NULL, 0);
free_insn_slot(p->ainsn.insn, p->ainsn.boostable);
p->ainsn.insn = NULL;
}
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index ea13f6888284..f86be31d37cc 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -6,6 +6,7 @@
* Copyright (C) Hitachi Ltd., 2012
*/
#include <linux/kprobes.h>
+#include <linux/perf_event.h>
#include <linux/ptrace.h>
#include <linux/string.h>
#include <linux/slab.h>
@@ -354,8 +355,15 @@ int arch_within_optimized_kprobe(struct optimized_kprobe *op,
static
void __arch_remove_optimized_kprobe(struct optimized_kprobe *op, int dirty)
{
- if (op->optinsn.insn) {
- free_optinsn_slot(op->optinsn.insn, dirty);
+ u8 *slot = op->optinsn.insn;
+ if (slot) {
+ int len = TMPL_END_IDX + op->optinsn.size + JMP32_INSN_SIZE;
+
+ /* Record the perf event before freeing the slot */
+ if (dirty)
+ perf_event_text_poke(slot, slot, len, NULL, 0);
+
+ free_optinsn_slot(slot, dirty);
op->optinsn.insn = NULL;
op->optinsn.size = 0;
}
@@ -426,8 +434,15 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op,
(u8 *)op->kp.addr + op->optinsn.size);
len += JMP32_INSN_SIZE;

+ /*
+ * Note len = TMPL_END_IDX + op->optinsn.size + JMP32_INSN_SIZE is also
+ * used in __arch_remove_optimized_kprobe().
+ */
+
/* We have to use text_poke() for instruction buffer because it is RO */
+ perf_event_text_poke(slot, NULL, 0, buf, len);
text_poke(slot, buf, len);
+
ret = 0;
out:
kfree(buf);
@@ -479,10 +494,23 @@ void arch_optimize_kprobes(struct list_head *oplist)
*/
void arch_unoptimize_kprobe(struct optimized_kprobe *op)
{
- arch_arm_kprobe(&op->kp);
- text_poke(op->kp.addr + INT3_INSN_SIZE,
- op->optinsn.copied_insn, DISP32_SIZE);
+ u8 new[JMP32_INSN_SIZE] = { INT3_INSN_OPCODE, };
+ u8 old[JMP32_INSN_SIZE];
+ u8 *addr = op->kp.addr;
+
+ memcpy(old, op->kp.addr, JMP32_INSN_SIZE);
+ memcpy(new + INT3_INSN_SIZE,
+ op->optinsn.copied_insn,
+ JMP32_INSN_SIZE - INT3_INSN_SIZE);
+
+ text_poke(addr, new, INT3_INSN_SIZE);
text_poke_sync();
+ text_poke(addr + INT3_INSN_SIZE,
+ new + INT3_INSN_SIZE,
+ JMP32_INSN_SIZE - INT3_INSN_SIZE);
+ text_poke_sync();
+
+ perf_event_text_poke(op->kp.addr, old, JMP32_INSN_SIZE, new, JMP32_INSN_SIZE);
}

/*
--
2.17.1

2020-04-05 20:15:39

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 07/15] ftrace: Add perf ksymbol events for ftrace trampolines

Symbols are needed for tools to describe instruction addresses. Pages
allocated for ftrace's purposes need symbols to be created for them.
Add such symbols to be visible via perf ksymbol events.

Signed-off-by: Adrian Hunter <[email protected]>
---
include/uapi/linux/perf_event.h | 2 +-
kernel/trace/ftrace.c | 14 ++++++++++++--
2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index e1a4179144a1..52ca2093831c 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1051,7 +1051,7 @@ enum perf_record_ksymbol_type {
PERF_RECORD_KSYMBOL_TYPE_BPF = 1,
/*
* Out of line code such as kprobe-replaced instructions or optimized
- * kprobes.
+ * kprobes or ftrace trampolines.
*/
PERF_RECORD_KSYMBOL_TYPE_OOL = 2,
PERF_RECORD_KSYMBOL_TYPE_MAX /* non-ABI */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4e1c2f0bea76..398d58b8ffa5 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2791,8 +2791,13 @@ static void ftrace_remove_trampoline_from_kallsyms(struct ftrace_ops *ops)
static void ftrace_trampoline_free(struct ftrace_ops *ops)
{
if (ops && (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP) &&
- ops->trampoline)
+ ops->trampoline) {
+ perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
+ ops->trampoline, ops->trampoline_size,
+ true, FTRACE_TRAMPOLINE_SYM);
+ /* Remove from kallsyms after the perf events */
ftrace_remove_trampoline_from_kallsyms(ops);
+ }

arch_ftrace_trampoline_free(ops);
}
@@ -6805,8 +6810,13 @@ static void ftrace_update_trampoline(struct ftrace_ops *ops)

arch_ftrace_update_trampoline(ops);
if (ops->trampoline && ops->trampoline != trampoline &&
- (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
+ (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP)) {
+ /* Add to kallsyms before the perf events */
ftrace_add_trampoline_to_kallsyms(ops);
+ perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
+ ops->trampoline, ops->trampoline_size, false,
+ FTRACE_TRAMPOLINE_SYM);
+ }
}

void ftrace_init_trace_array(struct trace_array *tr)
--
2.17.1

2020-04-05 20:15:45

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 08/15] ftrace: Add perf text poke events for ftrace trampolines

Add perf text poke events for ftrace trampolines when created and when
freed.

Signed-off-by: Adrian Hunter <[email protected]>
---
kernel/trace/ftrace.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 398d58b8ffa5..3318f7eca76c 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2792,6 +2792,13 @@ static void ftrace_trampoline_free(struct ftrace_ops *ops)
{
if (ops && (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP) &&
ops->trampoline) {
+ /*
+ * Record the text poke event before the ksymbol unregister
+ * event.
+ */
+ perf_event_text_poke((void *)ops->trampoline,
+ (void *)ops->trampoline,
+ ops->trampoline_size, NULL, 0);
perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
ops->trampoline, ops->trampoline_size,
true, FTRACE_TRAMPOLINE_SYM);
@@ -6816,6 +6823,13 @@ static void ftrace_update_trampoline(struct ftrace_ops *ops)
perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
ops->trampoline, ops->trampoline_size, false,
FTRACE_TRAMPOLINE_SYM);
+ /*
+ * Record the perf text poke event after the ksymbol register
+ * event.
+ */
+ perf_event_text_poke((void *)ops->trampoline, NULL, 0,
+ (void *)ops->trampoline,
+ ops->trampoline_size);
}
}

--
2.17.1

2020-04-05 20:15:54

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 10/15] perf evlist: Disable 'immediate' events last

Events marked as 'immediate' are started before other events to ensure
that there is context at the start of the main tracing events. The same
is true at the end of tracing, so disable 'immediate' events after other
events.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/util/evlist.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 1548237b6558..06c4586065cc 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -379,22 +379,33 @@ void evlist__disable(struct evlist *evlist)
{
struct evsel *pos;
struct affinity affinity;
- int cpu, i;
+ int cpu, i, imm = 0;
+ bool has_imm = false;

if (affinity__setup(&affinity) < 0)
return;

- evlist__for_each_cpu(evlist, i, cpu) {
- affinity__set(&affinity, cpu);
-
- evlist__for_each_entry(evlist, pos) {
- if (evsel__cpu_iter_skip(pos, cpu))
- continue;
- if (pos->disabled || !perf_evsel__is_group_leader(pos) || !pos->core.fd)
- continue;
- evsel__disable_cpu(pos, pos->cpu_iter - 1);
+ /* Disable 'immediate' events last */
+ for (imm = 0; imm <= 1; imm++) {
+ evlist__for_each_cpu(evlist, i, cpu) {
+ affinity__set(&affinity, cpu);
+
+ evlist__for_each_entry(evlist, pos) {
+ if (evsel__cpu_iter_skip(pos, cpu))
+ continue;
+ if (pos->disabled || !perf_evsel__is_group_leader(pos) || !pos->core.fd)
+ continue;
+ if (pos->immediate)
+ has_imm = true;
+ if (pos->immediate != imm)
+ continue;
+ evsel__disable_cpu(pos, pos->cpu_iter - 1);
+ }
}
+ if (!has_imm)
+ break;
}
+
affinity__cleanup(&affinity);
evlist__for_each_entry(evlist, pos) {
if (!perf_evsel__is_group_leader(pos) || !pos->core.fd)
--
2.17.1

2020-04-05 20:16:11

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 12/15] perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL

PERF_RECORD_KSYMBOL_TYPE_OOL marks an executable page. Create a map backed
only by memory, which will be populated as necessary by text poke events.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 5 +++++
tools/perf/util/dso.c | 3 +++
tools/perf/util/dso.h | 1 +
tools/perf/util/machine.c | 6 ++++++
tools/perf/util/map.c | 5 +++++
tools/perf/util/map.h | 3 ++-
tools/perf/util/symbol.c | 1 +
7 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index e5bee6c17b86..52ca2093831c 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1049,6 +1049,11 @@ enum perf_event_type {
enum perf_record_ksymbol_type {
PERF_RECORD_KSYMBOL_TYPE_UNKNOWN = 0,
PERF_RECORD_KSYMBOL_TYPE_BPF = 1,
+ /*
+ * Out of line code such as kprobe-replaced instructions or optimized
+ * kprobes or ftrace trampolines.
+ */
+ PERF_RECORD_KSYMBOL_TYPE_OOL = 2,
PERF_RECORD_KSYMBOL_TYPE_MAX /* non-ABI */
};

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 91f21239608b..9eb50ff6da96 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -191,6 +191,7 @@ int dso__read_binary_type_filename(const struct dso *dso,
case DSO_BINARY_TYPE__GUEST_KALLSYMS:
case DSO_BINARY_TYPE__JAVA_JIT:
case DSO_BINARY_TYPE__BPF_PROG_INFO:
+ case DSO_BINARY_TYPE__OOL:
case DSO_BINARY_TYPE__NOT_FOUND:
ret = -1;
break;
@@ -881,6 +882,8 @@ static struct dso_cache *dso_cache__populate(struct dso *dso,

if (dso->binary_type == DSO_BINARY_TYPE__BPF_PROG_INFO)
*ret = bpf_read(dso, cache_offset, cache->data);
+ else if (dso->binary_type == DSO_BINARY_TYPE__OOL)
+ *ret = DSO__DATA_CACHE_SIZE;
else
*ret = file_read(dso, machine, cache_offset, cache->data);

diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 2db64b79617a..b482453bc3d1 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -40,6 +40,7 @@ enum dso_binary_type {
DSO_BINARY_TYPE__GUEST_KCORE,
DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO,
DSO_BINARY_TYPE__BPF_PROG_INFO,
+ DSO_BINARY_TYPE__OOL,
DSO_BINARY_TYPE__NOT_FOUND,
};

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 0fff43cf15e5..8447334ac056 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -756,6 +756,12 @@ static int machine__process_ksymbol_register(struct machine *machine,
return -ENOMEM;
}

+ if (event->ksymbol.ksym_type == PERF_RECORD_KSYMBOL_TYPE_OOL) {
+ map->dso->binary_type = DSO_BINARY_TYPE__OOL;
+ map->dso->data.file_size = event->ksymbol.len;
+ dso__set_loaded(map->dso);
+ }
+
map->start = event->ksymbol.addr;
map->end = map->start + event->ksymbol.len;
maps__insert(&machine->kmaps, map);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 53d96611e6a6..f9dc8c5493ea 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -267,6 +267,11 @@ bool __map__is_bpf_prog(const struct map *map)
return name && (strstr(name, "bpf_prog_") == name);
}

+bool __map__is_ool(const struct map *map)
+{
+ return map->dso && map->dso->binary_type == DSO_BINARY_TYPE__OOL;
+}
+
bool map__has_symbols(const struct map *map)
{
return dso__has_symbols(map->dso);
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 067036e8970c..9e312ae2d656 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -147,11 +147,12 @@ int map__set_kallsyms_ref_reloc_sym(struct map *map, const char *symbol_name,
bool __map__is_kernel(const struct map *map);
bool __map__is_extra_kernel_map(const struct map *map);
bool __map__is_bpf_prog(const struct map *map);
+bool __map__is_ool(const struct map *map);

static inline bool __map__is_kmodule(const struct map *map)
{
return !__map__is_kernel(map) && !__map__is_extra_kernel_map(map) &&
- !__map__is_bpf_prog(map);
+ !__map__is_bpf_prog(map) && !__map__is_ool(map);
}

bool map__has_symbols(const struct map *map);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 26bc6a0096ce..7d4b5ca0bd12 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1544,6 +1544,7 @@ static bool dso__is_compatible_symtab_type(struct dso *dso, bool kmod,
return true;

case DSO_BINARY_TYPE__BPF_PROG_INFO:
+ case DSO_BINARY_TYPE__OOL:
case DSO_BINARY_TYPE__NOT_FOUND:
default:
return false;
--
2.17.1

2020-04-05 20:16:28

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 06/15] ftrace: Add symbols for ftrace trampolines

Symbols are needed for tools to describe instruction addresses. Pages
allocated for ftrace's purposes need symbols to be created for them.
Add such symbols to be visible via /proc/kallsyms.

Example on x86 with CONFIG_DYNAMIC_FTRACE=y

# echo function > /sys/kernel/debug/tracing/current_tracer
# cat /proc/kallsyms | grep '\[__builtin__ftrace\]'
ffffffffc0238000 t ftrace_trampoline [__builtin__ftrace]

Note: This patch adds "__builtin__ftrace" as a module name in /proc/kallsyms for
symbols for pages allocated for ftrace's purposes, even though "__builtin__ftrace"
is not a module.

Signed-off-by: Adrian Hunter <[email protected]>
---
include/linux/ftrace.h | 12 ++++---
kernel/kallsyms.c | 5 +++
kernel/trace/ftrace.c | 77 ++++++++++++++++++++++++++++++++++++++++--
3 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index db95244a62d4..ea726ad1fa83 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -58,9 +58,6 @@ struct ftrace_direct_func;
const char *
ftrace_mod_address_lookup(unsigned long addr, unsigned long *size,
unsigned long *off, char **modname, char *sym);
-int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
- char *type, char *name,
- char *module_name, int *exported);
#else
static inline const char *
ftrace_mod_address_lookup(unsigned long addr, unsigned long *size,
@@ -68,6 +65,13 @@ ftrace_mod_address_lookup(unsigned long addr, unsigned long *size,
{
return NULL;
}
+#endif
+
+#if defined(CONFIG_FUNCTION_TRACER) && defined(CONFIG_DYNAMIC_FTRACE)
+int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
+ char *type, char *name,
+ char *module_name, int *exported);
+#else
static inline int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
char *type, char *name,
char *module_name, int *exported)
@@ -76,7 +80,6 @@ static inline int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *val
}
#endif

-
#ifdef CONFIG_FUNCTION_TRACER

extern int ftrace_enabled;
@@ -207,6 +210,7 @@ struct ftrace_ops {
struct ftrace_ops_hash old_hash;
unsigned long trampoline;
unsigned long trampoline_size;
+ struct list_head list;
#endif
};

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 92b510aa6d79..b6b8613906da 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -484,6 +484,11 @@ static int get_ksymbol_mod(struct kallsym_iter *iter)
return 1;
}

+/*
+ * ftrace_mod_get_kallsym() may also get symbols for pages allocated for ftrace
+ * purposes. In that case "__builtin__ftrace" is used as a module name, even
+ * though "__builtin__ftrace" is not a module.
+ */
static int get_ksymbol_ftrace_mod(struct kallsym_iter *iter)
{
int ret = ftrace_mod_get_kallsym(iter->pos - iter->pos_mod_end,
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index fd81c7de77a7..4e1c2f0bea76 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2765,6 +2765,38 @@ void __weak arch_ftrace_trampoline_free(struct ftrace_ops *ops)
{
}

+/* List of trace_ops that have allocated trampolines */
+static LIST_HEAD(ftrace_ops_trampoline_list);
+
+static void ftrace_add_trampoline_to_kallsyms(struct ftrace_ops *ops)
+{
+ lockdep_assert_held(&ftrace_lock);
+ list_add_rcu(&ops->list, &ftrace_ops_trampoline_list);
+}
+
+static void ftrace_remove_trampoline_from_kallsyms(struct ftrace_ops *ops)
+{
+ lockdep_assert_held(&ftrace_lock);
+ list_del_rcu(&ops->list);
+}
+
+/*
+ * "__builtin__ftrace" is used as a module name in /proc/kallsyms for symbols
+ * for pages allocated for ftrace purposes, even though "__builtin__ftrace" is
+ * not a module.
+ */
+#define FTRACE_TRAMPOLINE_MOD "__builtin__ftrace"
+#define FTRACE_TRAMPOLINE_SYM "ftrace_trampoline"
+
+static void ftrace_trampoline_free(struct ftrace_ops *ops)
+{
+ if (ops && (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP) &&
+ ops->trampoline)
+ ftrace_remove_trampoline_from_kallsyms(ops);
+
+ arch_ftrace_trampoline_free(ops);
+}
+
static void ftrace_startup_enable(int command)
{
if (saved_ftrace_func != ftrace_trace_function) {
@@ -2935,7 +2967,7 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
synchronize_rcu_tasks();

free_ops:
- arch_ftrace_trampoline_free(ops);
+ ftrace_trampoline_free(ops);
}

return 0;
@@ -6178,6 +6210,27 @@ struct ftrace_mod_map {
unsigned int num_funcs;
};

+static int ftrace_get_trampoline_kallsym(unsigned int symnum,
+ unsigned long *value, char *type,
+ char *name, char *module_name,
+ int *exported)
+{
+ struct ftrace_ops *op;
+
+ list_for_each_entry_rcu(op, &ftrace_ops_trampoline_list, list) {
+ if (!op->trampoline || symnum--)
+ continue;
+ *value = op->trampoline;
+ *type = 't';
+ strlcpy(name, FTRACE_TRAMPOLINE_SYM, KSYM_NAME_LEN);
+ strlcpy(module_name, FTRACE_TRAMPOLINE_MOD, MODULE_NAME_LEN);
+ *exported = 0;
+ return 0;
+ }
+
+ return -ERANGE;
+}
+
#ifdef CONFIG_MODULES

#define next_to_ftrace_page(p) container_of(p, struct ftrace_page, next)
@@ -6514,6 +6567,7 @@ int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
{
struct ftrace_mod_map *mod_map;
struct ftrace_mod_func *mod_func;
+ int ret;

preempt_disable();
list_for_each_entry_rcu(mod_map, &ftrace_mod_maps, list) {
@@ -6540,8 +6594,10 @@ int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
WARN_ON(1);
break;
}
+ ret = ftrace_get_trampoline_kallsym(symnum, value, type, name,
+ module_name, exported);
preempt_enable();
- return -ERANGE;
+ return ret;
}

#else
@@ -6553,6 +6609,18 @@ allocate_ftrace_mod_map(struct module *mod,
{
return NULL;
}
+int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *value,
+ char *type, char *name, char *module_name,
+ int *exported)
+{
+ int ret;
+
+ preempt_disable();
+ ret = ftrace_get_trampoline_kallsym(symnum, value, type, name,
+ module_name, exported);
+ preempt_enable();
+ return ret;
+}
#endif /* CONFIG_MODULES */

struct ftrace_init_func {
@@ -6733,7 +6801,12 @@ void __weak arch_ftrace_update_trampoline(struct ftrace_ops *ops)

static void ftrace_update_trampoline(struct ftrace_ops *ops)
{
+ unsigned long trampoline = ops->trampoline;
+
arch_ftrace_update_trampoline(ops);
+ if (ops->trampoline && ops->trampoline != trampoline &&
+ (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP))
+ ftrace_add_trampoline_to_kallsyms(ops);
}

void ftrace_init_trace_array(struct trace_array *tr)
--
2.17.1

2020-04-05 20:16:32

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 11/15] perf tools: Add support for PERF_RECORD_TEXT_POKE

Add processing for PERF_RECORD_TEXT_POKE events. When a text poke event is
processed, then the kernel dso data cache is updated with the poked bytes.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 21 +++++++++-
tools/lib/perf/include/perf/event.h | 9 +++++
tools/perf/builtin-record.c | 45 ++++++++++++++++++++++
tools/perf/util/event.c | 47 +++++++++++++++++++++++
tools/perf/util/event.h | 5 +++
tools/perf/util/evlist.h | 1 +
tools/perf/util/evsel.c | 7 +++-
tools/perf/util/machine.c | 43 +++++++++++++++++++++
tools/perf/util/machine.h | 3 ++
tools/perf/util/perf_event_attr_fprintf.c | 1 +
tools/perf/util/record.c | 10 +++++
tools/perf/util/record.h | 1 +
tools/perf/util/session.c | 23 +++++++++++
tools/perf/util/tool.h | 3 +-
14 files changed, 216 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 7b2d6fc9e6ed..e5bee6c17b86 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -383,7 +383,8 @@ struct perf_event_attr {
bpf_event : 1, /* include bpf events */
aux_output : 1, /* generate AUX records instead of events */
cgroup : 1, /* include cgroup events */
- __reserved_1 : 31;
+ text_poke : 1, /* include text poke events */
+ __reserved_1 : 30;

union {
__u32 wakeup_events; /* wakeup every n events */
@@ -1024,6 +1025,24 @@ enum perf_event_type {
*/
PERF_RECORD_CGROUP = 19,

+ /*
+ * Records changes to kernel text i.e. self-modified code. 'old_len' is
+ * the number of old bytes, 'new_len' is the number of new bytes. Either
+ * 'old_len' or 'new_len' may be zero to indicate, for example, the
+ * addition or removal of a trampoline. 'bytes' contains the old bytes
+ * followed immediately by the new bytes.
+ *
+ * struct {
+ * struct perf_event_header header;
+ * u64 addr;
+ * u16 old_len;
+ * u16 new_len;
+ * u8 bytes[];
+ * struct sample_id sample_id;
+ * };
+ */
+ PERF_RECORD_TEXT_POKE = 20,
+
PERF_RECORD_MAX, /* non-ABI */
};

diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index 69b44d2cc0f5..842028858d66 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -111,6 +111,14 @@ struct perf_record_cgroup {
char path[PATH_MAX];
};

+struct perf_record_text_poke_event {
+ struct perf_event_header header;
+ __u64 addr;
+ __u16 old_len;
+ __u16 new_len;
+ __u8 bytes[];
+};
+
struct perf_record_sample {
struct perf_event_header header;
__u64 array[];
@@ -367,6 +375,7 @@ union perf_event {
struct perf_record_sample sample;
struct perf_record_bpf_event bpf;
struct perf_record_ksymbol ksymbol;
+ struct perf_record_text_poke_event text_poke;
struct perf_record_header_attr attr;
struct perf_record_event_update event_update;
struct perf_record_header_event_type event_type;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 7d7912e121d6..839edc23242c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -723,6 +723,43 @@ static int record__auxtrace_init(struct record *rec __maybe_unused)

#endif

+static int record__config_text_poke(struct evlist *evlist)
+{
+ struct evsel *evsel;
+ int err;
+
+ /* Nothing to do if text poke is already configured */
+ evlist__for_each_entry(evlist, evsel) {
+ if (evsel->core.attr.text_poke)
+ return 0;
+ }
+
+ err = parse_events(evlist, "dummy:u", NULL);
+ if (err)
+ return err;
+
+ evsel = evlist__last(evlist);
+
+ evsel->core.attr.freq = 0;
+ evsel->core.attr.sample_period = 1;
+ evsel->core.attr.text_poke = 1;
+ evsel->core.attr.ksymbol = 1;
+
+ evsel->core.system_wide = true;
+ evsel->no_aux_samples = true;
+ evsel->immediate = true;
+
+ /* Text poke must be collected on all CPUs */
+ perf_cpu_map__put(evsel->core.own_cpus);
+ evsel->core.own_cpus = perf_cpu_map__new(NULL);
+ perf_cpu_map__put(evsel->core.cpus);
+ evsel->core.cpus = perf_cpu_map__get(evsel->core.own_cpus);
+
+ perf_evsel__set_sample_bit(evsel, TIME);
+
+ return 0;
+}
+
static bool record__kcore_readable(struct machine *machine)
{
char kcore[PATH_MAX];
@@ -2620,6 +2657,14 @@ int cmd_record(int argc, const char **argv)
if (rec->opts.full_auxtrace)
rec->buildid_all = true;

+ if (rec->opts.text_poke) {
+ err = record__config_text_poke(rec->evlist);
+ if (err) {
+ pr_err("record__config_text_poke failed, error %d\n", err);
+ goto out;
+ }
+ }
+
if (record_opts__config(&rec->opts)) {
err = -EINVAL;
goto out;
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index dc0e11214ae1..cdaf58734254 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -31,6 +31,7 @@
#include "stat.h"
#include "session.h"
#include "bpf-event.h"
+#include "print_binary.h"
#include "tool.h"
#include "../perf.h"

@@ -55,6 +56,7 @@ static const char *perf_event__names[] = {
[PERF_RECORD_KSYMBOL] = "KSYMBOL",
[PERF_RECORD_BPF_EVENT] = "BPF_EVENT",
[PERF_RECORD_CGROUP] = "CGROUP",
+ [PERF_RECORD_TEXT_POKE] = "TEXT_POKE",
[PERF_RECORD_HEADER_ATTR] = "ATTR",
[PERF_RECORD_HEADER_EVENT_TYPE] = "EVENT_TYPE",
[PERF_RECORD_HEADER_TRACING_DATA] = "TRACING_DATA",
@@ -267,6 +269,14 @@ int perf_event__process_bpf(struct perf_tool *tool __maybe_unused,
return machine__process_bpf(machine, event, sample);
}

+int perf_event__process_text_poke(struct perf_tool *tool __maybe_unused,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ return machine__process_text_poke(machine, event, sample);
+}
+
size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
{
return fprintf(fp, " %d/%d: [%#" PRI_lx64 "(%#" PRI_lx64 ") @ %#" PRI_lx64 "]: %c %s\n",
@@ -413,6 +423,40 @@ size_t perf_event__fprintf_bpf(union perf_event *event, FILE *fp)
event->bpf.type, event->bpf.flags, event->bpf.id);
}

+static int text_poke_printer(enum binary_printer_ops op, unsigned int val,
+ void *extra, FILE *fp)
+{
+ bool old = *(bool *)extra;
+
+ switch ((int)op) {
+ case BINARY_PRINT_LINE_BEGIN:
+ return fprintf(fp, " %s bytes:", old ? "Old" : "New");
+ case BINARY_PRINT_NUM_DATA:
+ return fprintf(fp, " %02x", val);
+ case BINARY_PRINT_LINE_END:
+ return fprintf(fp, "\n");
+ default:
+ return 0;
+ }
+}
+
+size_t perf_event__fprintf_text_poke(union perf_event *event, FILE *fp)
+{
+ struct perf_record_text_poke_event *tp = &event->text_poke;
+ size_t ret;
+ bool old;
+
+ ret = fprintf(fp, " %" PRI_lx64 " old len %u new len %u\n",
+ tp->addr, tp->old_len, tp->new_len);
+ old = true;
+ ret += binary__fprintf(tp->bytes, tp->old_len, 16, text_poke_printer,
+ &old, fp);
+ old = false;
+ ret += binary__fprintf(tp->bytes + tp->old_len, tp->new_len, 16,
+ text_poke_printer, &old, fp);
+ return ret;
+}
+
size_t perf_event__fprintf(union perf_event *event, FILE *fp)
{
size_t ret = fprintf(fp, "PERF_RECORD_%s",
@@ -457,6 +501,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
case PERF_RECORD_BPF_EVENT:
ret += perf_event__fprintf_bpf(event, fp);
break;
+ case PERF_RECORD_TEXT_POKE:
+ ret += perf_event__fprintf_text_poke(event, fp);
+ break;
default:
ret += fprintf(fp, "\n");
}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index b8289f160f07..dbbba6dea51d 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -351,6 +351,10 @@ int perf_event__process_bpf(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
struct machine *machine);
+int perf_event__process_text_poke(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine);
int perf_event__process(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -385,6 +389,7 @@ size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_cgroup(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_ksymbol(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_bpf(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_text_poke(union perf_event *event, FILE *fp);
size_t perf_event__fprintf(union perf_event *event, FILE *fp);

int kallsyms__get_function_start(const char *kallsyms_filename,
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index f5bd5c386df1..93e9b35e53b0 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -175,6 +175,7 @@ struct callchain_param;
void perf_evlist__set_id_pos(struct evlist *evlist);
bool perf_can_sample_identifier(void);
bool perf_can_record_switch_events(void);
+bool perf_can_record_text_poke_events(void);
bool perf_can_record_cpu_wide(void);
bool perf_can_aux_sample(void);
void perf_evlist__config(struct evlist *evlist, struct record_opts *opts,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index eb880efbce16..b617d3b1fa95 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1098,7 +1098,12 @@ void perf_evsel__config(struct evsel *evsel, struct record_opts *opts,
attr->mmap = track;
attr->mmap2 = track && !perf_missing_features.mmap2;
attr->comm = track;
- attr->ksymbol = track && !perf_missing_features.ksymbol;
+ /*
+ * ksymbol is tracked separately with text poke because it needs to be
+ * system wide and enabled immediately.
+ */
+ if (!opts->text_poke)
+ attr->ksymbol = track && !perf_missing_features.ksymbol;
attr->bpf_event = track && !opts->no_bpf_event && !perf_missing_features.bpf;

if (opts->record_namespaces)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 97142e9671be..0fff43cf15e5 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -796,6 +796,47 @@ int machine__process_ksymbol(struct machine *machine __maybe_unused,
return machine__process_ksymbol_register(machine, event, sample);
}

+int machine__process_text_poke(struct machine *machine, union perf_event *event,
+ struct perf_sample *sample __maybe_unused)
+{
+ struct map *map = maps__find(&machine->kmaps, event->text_poke.addr);
+ u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+
+ if (dump_trace)
+ perf_event__fprintf_text_poke(event, stdout);
+
+ if (!event->text_poke.new_len)
+ return 0;
+
+ if (cpumode != PERF_RECORD_MISC_KERNEL) {
+ pr_debug("%s: unsupported cpumode - ignoring\n", __func__);
+ return 0;
+ }
+
+ if (map && map->dso) {
+ u8 *new_bytes = event->text_poke.bytes + event->text_poke.old_len;
+ int ret;
+
+ /*
+ * Kernel maps might be changed when loading symbols so loading
+ * must be done prior to using kernel maps.
+ */
+ map__load(map);
+ ret = dso__data_write_cache_addr(map->dso, map, machine,
+ event->text_poke.addr,
+ new_bytes,
+ event->text_poke.new_len);
+ if (ret != event->text_poke.new_len)
+ pr_debug("Failed to write kernel text poke at %#" PRI_lx64 "\n",
+ event->text_poke.addr);
+ } else {
+ pr_debug("Failed to find kernel text poke address map for %#" PRI_lx64 "\n",
+ event->text_poke.addr);
+ }
+
+ return 0;
+}
+
static struct map *machine__addnew_module_map(struct machine *machine, u64 start,
const char *filename)
{
@@ -1918,6 +1959,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
ret = machine__process_ksymbol(machine, event, sample); break;
case PERF_RECORD_BPF_EVENT:
ret = machine__process_bpf(machine, event, sample); break;
+ case PERF_RECORD_TEXT_POKE:
+ ret = machine__process_text_poke(machine, event, sample); break;
default:
ret = -1;
break;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index fa1be9ea00fa..062c36a8433c 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -138,6 +138,9 @@ int machine__process_mmap2_event(struct machine *machine, union perf_event *even
int machine__process_ksymbol(struct machine *machine,
union perf_event *event,
struct perf_sample *sample);
+int machine__process_text_poke(struct machine *machine,
+ union perf_event *event,
+ struct perf_sample *sample);
int machine__process_event(struct machine *machine, union perf_event *event,
struct perf_sample *sample);

diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index b94fa07f5d32..e67a227c0ce7 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -147,6 +147,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
PRINT_ATTRf(aux_watermark, p_unsigned);
PRINT_ATTRf(sample_max_stack, p_unsigned);
PRINT_ATTRf(aux_sample_size, p_unsigned);
+ PRINT_ATTRf(text_poke, p_unsigned);

return ret;
}
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 7def66168503..207ba2a65008 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -97,6 +97,11 @@ static void perf_probe_context_switch(struct evsel *evsel)
evsel->core.attr.context_switch = 1;
}

+static void perf_probe_text_poke(struct evsel *evsel)
+{
+ evsel->core.attr.text_poke = 1;
+}
+
bool perf_can_sample_identifier(void)
{
return perf_probe_api(perf_probe_sample_identifier);
@@ -112,6 +117,11 @@ bool perf_can_record_switch_events(void)
return perf_probe_api(perf_probe_context_switch);
}

+bool perf_can_record_text_poke_events(void)
+{
+ return perf_probe_api(perf_probe_text_poke);
+}
+
bool perf_can_record_cpu_wide(void)
{
struct perf_event_attr attr = {
diff --git a/tools/perf/util/record.h b/tools/perf/util/record.h
index 24316458be20..d0fd99864c49 100644
--- a/tools/perf/util/record.h
+++ b/tools/perf/util/record.h
@@ -47,6 +47,7 @@ struct record_opts {
bool sample_id;
bool no_bpf_event;
bool kcore;
+ bool text_poke;
unsigned int freq;
unsigned int mmap_pages;
unsigned int auxtrace_mmap_pages;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 0b0bfe5bef17..9e011f3a83ef 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -491,6 +491,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
tool->ksymbol = perf_event__process_ksymbol;
if (tool->bpf == NULL)
tool->bpf = perf_event__process_bpf;
+ if (tool->text_poke == NULL)
+ tool->text_poke = perf_event__process_text_poke;
if (tool->read == NULL)
tool->read = process_event_sample_stub;
if (tool->throttle == NULL)
@@ -660,6 +662,24 @@ static void perf_event__switch_swap(union perf_event *event, bool sample_id_all)
swap_sample_id_all(event, &event->context_switch + 1);
}

+static void perf_event__text_poke_swap(union perf_event *event, bool sample_id_all)
+{
+ event->text_poke.addr = bswap_64(event->text_poke.addr);
+ event->text_poke.old_len = bswap_16(event->text_poke.old_len);
+ event->text_poke.new_len = bswap_16(event->text_poke.new_len);
+
+ if (sample_id_all) {
+ size_t len = sizeof(event->text_poke.old_len) +
+ sizeof(event->text_poke.new_len) +
+ event->text_poke.old_len +
+ event->text_poke.new_len;
+ void *data = &event->text_poke.old_len;
+
+ data += PERF_ALIGN(len, sizeof(u64));
+ swap_sample_id_all(event, data);
+ }
+}
+
static void perf_event__throttle_swap(union perf_event *event,
bool sample_id_all)
{
@@ -933,6 +953,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
[PERF_RECORD_SWITCH] = perf_event__switch_swap,
[PERF_RECORD_SWITCH_CPU_WIDE] = perf_event__switch_swap,
[PERF_RECORD_NAMESPACES] = perf_event__namespaces_swap,
+ [PERF_RECORD_TEXT_POKE] = perf_event__text_poke_swap,
[PERF_RECORD_HEADER_ATTR] = perf_event__hdr_attr_swap,
[PERF_RECORD_HEADER_EVENT_TYPE] = perf_event__event_type_swap,
[PERF_RECORD_HEADER_TRACING_DATA] = perf_event__tracing_data_swap,
@@ -1476,6 +1497,8 @@ static int machines__deliver_event(struct machines *machines,
return tool->ksymbol(tool, event, sample, machine);
case PERF_RECORD_BPF_EVENT:
return tool->bpf(tool, event, sample, machine);
+ case PERF_RECORD_TEXT_POKE:
+ return tool->text_poke(tool, event, sample, machine);
default:
++evlist->stats.nr_unknown_events;
return -1;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 3fb67bd31e4a..bbbc0dcd461f 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -57,7 +57,8 @@ struct perf_tool {
throttle,
unthrottle,
ksymbol,
- bpf;
+ bpf,
+ text_poke;

event_attr_op attr;
event_attr_op event_update;
--
2.17.1

2020-04-05 20:16:48

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 09/15] perf kcore_copy: Fix module map when there are no modules loaded

In the absence of any modules, no "modules" map is created, but there are
other executable pages to map, due to eBPF JIT, kprobe or ftrace. Map them
by recognizing that the first "module" symbol is not necessarily from a
module, and adjust the map accordingly.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/util/symbol-elf.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index be5b493f8284..5e43054bffea 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1458,6 +1458,7 @@ struct kcore_copy_info {
u64 first_symbol;
u64 last_symbol;
u64 first_module;
+ u64 first_module_symbol;
u64 last_module_symbol;
size_t phnum;
struct list_head phdrs;
@@ -1534,6 +1535,8 @@ static int kcore_copy__process_kallsyms(void *arg, const char *name, char type,
return 0;

if (strchr(name, '[')) {
+ if (!kci->first_module_symbol || start < kci->first_module_symbol)
+ kci->first_module_symbol = start;
if (start > kci->last_module_symbol)
kci->last_module_symbol = start;
return 0;
@@ -1731,6 +1734,10 @@ static int kcore_copy__calc_maps(struct kcore_copy_info *kci, const char *dir,
kci->etext += page_size;
}

+ if (kci->first_module_symbol &&
+ (!kci->first_module || kci->first_module_symbol < kci->first_module))
+ kci->first_module = kci->first_module_symbol;
+
kci->first_module = round_down(kci->first_module, page_size);

if (kci->last_module_symbol) {
--
2.17.1

2020-04-05 20:17:21

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 13/15] perf intel-pt: Add support for text poke events

Select text poke events when available and the kernel is being traced.
Process text poke events to invalidate entries in Intel PT's instruction
cache.

Example:

The example requires kernel config:
CONFIG_PROC_SYSCTL=y
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y

Before:

# perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M &
# cat /proc/sys/kernel/sched_schedstats
0
# echo 1 > /proc/sys/kernel/sched_schedstats
# cat /proc/sys/kernel/sched_schedstats
1
# echo 0 > /proc/sys/kernel/sched_schedstats
# cat /proc/sys/kernel/sched_schedstats
0
# kill %1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 3.341 MB perf.data.before ]
[1]+ Terminated perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M
# perf script -i perf.data.before --itrace=e >/dev/null
Warning:
474 instruction trace errors

After:

# perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
# cat /proc/sys/kernel/sched_schedstats
0
# echo 1 > /proc/sys/kernel/sched_schedstats
# cat /proc/sys/kernel/sched_schedstats
1
# echo 0 > /proc/sys/kernel/sched_schedstats
# cat /proc/sys/kernel/sched_schedstats
0
# kill %1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.646 MB perf.data.after ]
[1]+ Terminated perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
# perf script -i perf.data.after --itrace=e >/dev/null

Example:

The example requires kernel config:
# CONFIG_FUNCTION_TRACER is not set

Before:
# perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
# perf probe __schedule
Added new event:
probe:__schedule (on __schedule)

You can now use it in all perf tools, such as:

perf record -e probe:__schedule -aR sleep 1

# perf record -e probe:__schedule -aR sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.026 MB perf.data (68 samples) ]
# perf probe -d probe:__schedule
Removed event: probe:__schedule
# kill %1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 41.268 MB t1 ]
[1]+ Terminated perf record --kcore -m,64M -o t1 -a -e intel_pt//k
# perf script -i t1 --itrace=e >/dev/null
Warning:
207 instruction trace errors

After:
# perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
# perf probe __schedule
Added new event:
probe:__schedule (on __schedule)

You can now use it in all perf tools, such as:

perf record -e probe:__schedule -aR sleep 1

# perf record -e probe:__schedule -aR sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.028 MB perf.data (107 samples) ]
# perf probe -d probe:__schedule
Removed event: probe:__schedule
# kill %1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 39.978 MB t1 ]
[1]+ Terminated perf record --kcore -m,64M -o t1 -a -e intel_pt//k
# perf script -i t1 --itrace=e >/dev/null
# perf script -i t1 --no-itrace -D | grep 'POKE\|KSYMBOL'
6 565303693547 0x291f18 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc027a000 len 4096 type 2 flags 0x0 name kprobe_insn_page
6 565303697010 0x291f68 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027a000 old len 0 new len 6
6 565303838278 0x291fa8 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc027c000 len 4096 type 2 flags 0x0 name kprobe_optinsn_page
6 565303848286 0x291ff8 [0xa0]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027c000 old len 0 new len 106
6 565369336743 0x292af8 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffff88ab8890 old len 5 new len 5
7 566434327704 0x217c208 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffff88ab8890 old len 5 new len 5
6 566456313475 0x293198 [0xa0]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027c000 old len 106 new len 0
6 566456314935 0x293238 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027a000 old len 6 new len 0

Example:

The example requires kernel config:
CONFIG_FUNCTION_TRACER=y

Before:
# perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
# perf probe __kmalloc
Added new event:
probe:__kmalloc (on __kmalloc)

You can now use it in all perf tools, such as:

perf record -e probe:__kmalloc -aR sleep 1

# perf record -e probe:__kmalloc -aR sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.022 MB perf.data (6 samples) ]
# perf probe -d probe:__kmalloc
Removed event: probe:__kmalloc
# kill %1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 43.850 MB t1 ]
[1]+ Terminated perf record --kcore -m,64M -o t1 -a -e intel_pt//k
# perf script -i t1 --itrace=e >/dev/null
Warning:
8 instruction trace errors

After:
# perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
# perf probe __kmalloc
Added new event:
probe:__kmalloc (on __kmalloc)

You can now use it in all perf tools, such as:

perf record -e probe:__kmalloc -aR sleep 1

# perf record -e probe:__kmalloc -aR sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.037 MB perf.data (206 samples) ]
# perf probe -d probe:__kmalloc
Removed event: probe:__kmalloc
# kill %1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 41.442 MB t1 ]
[1]+ Terminated perf record --kcore -m,64M -o t1 -a -e intel_pt//k
# perf script -i t1 --itrace=e >/dev/null
# perf script -i t1 --no-itrace -D | grep 'POKE\|KSYMBOL'
5 312216133258 0x8bafe0 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc0360000 len 415 type 2 flags 0x0 name ftrace_trampoline
5 312216133494 0x8bb030 [0x1d8]: PERF_RECORD_TEXT_POKE addr 0xffffffffc0360000 old len 0 new len 415
5 312216229563 0x8bb208 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
5 312216239063 0x8bb248 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
5 312216727230 0x8bb288 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffabbea190 old len 5 new len 5
5 312216739322 0x8bb2c8 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
5 312216748321 0x8bb308 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
7 313287163462 0x2817430 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
7 313287174890 0x2817470 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
7 313287818979 0x28174b0 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffabbea190 old len 5 new len 5
7 313287829357 0x28174f0 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
7 313287841246 0x2817530 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/arch/x86/util/intel-pt.c | 4 ++
tools/perf/util/intel-pt.c | 75 +++++++++++++++++++++++++++++
2 files changed, 79 insertions(+)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index 1643aed8c4c8..0140b4466209 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -827,6 +827,10 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
}
}

+ if (have_timing_info && !intel_pt_evsel->core.attr.exclude_kernel &&
+ perf_can_record_text_poke_events() && perf_can_record_cpu_wide())
+ opts->text_poke = true;
+
if (intel_pt_evsel) {
/*
* To obtain the auxtrace buffer file descriptor, the auxtrace
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 23c8289c2472..470a3b413869 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -514,6 +514,17 @@ intel_pt_cache_lookup(struct dso *dso, struct machine *machine, u64 offset)
return auxtrace_cache__lookup(dso->auxtrace_cache, offset);
}

+static void intel_pt_cache_invalidate(struct dso *dso, struct machine *machine,
+ u64 offset)
+{
+ struct auxtrace_cache *c = intel_pt_cache(dso, machine);
+
+ if (!c)
+ return;
+
+ auxtrace_cache__remove(dso->auxtrace_cache, offset);
+}
+
static inline u8 intel_pt_cpumode(struct intel_pt *pt, uint64_t ip)
{
return ip >= pt->kernel_start ?
@@ -2594,6 +2605,67 @@ static int intel_pt_process_itrace_start(struct intel_pt *pt,
event->itrace_start.tid);
}

+static int intel_pt_find_map(struct thread *thread, u8 cpumode, u64 addr,
+ struct addr_location *al)
+{
+ if (!al->map || addr < al->map->start || addr >= al->map->end) {
+ if (!thread__find_map(thread, cpumode, addr, al))
+ return -1;
+ }
+
+ return 0;
+}
+
+/* Invalidate all instruction cache entries that overlap the text poke */
+static int intel_pt_text_poke(struct intel_pt *pt, union perf_event *event)
+{
+ u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+ u64 addr = event->text_poke.addr + event->text_poke.new_len - 1;
+ /* Assume text poke begins in a basic block no more than 4096 bytes */
+ int cnt = 4096 + event->text_poke.new_len;
+ struct thread *thread = pt->unknown_thread;
+ struct addr_location al = { .map = NULL };
+ struct machine *machine = pt->machine;
+ struct intel_pt_cache_entry *e;
+ u64 offset;
+
+ if (!event->text_poke.new_len)
+ return 0;
+
+ for (; cnt; cnt--, addr--) {
+ if (intel_pt_find_map(thread, cpumode, addr, &al)) {
+ if (addr < event->text_poke.addr)
+ return 0;
+ continue;
+ }
+
+ if (!al.map->dso || !al.map->dso->auxtrace_cache)
+ continue;
+
+ offset = al.map->map_ip(al.map, addr);
+
+ e = intel_pt_cache_lookup(al.map->dso, machine, offset);
+ if (!e)
+ continue;
+
+ if (addr + e->byte_cnt + e->length <= event->text_poke.addr) {
+ /*
+ * No overlap. Working backwards there cannot be another
+ * basic block that overlaps the text poke if there is a
+ * branch instruction before the text poke address.
+ */
+ if (e->branch != INTEL_PT_BR_NO_BRANCH)
+ return 0;
+ } else {
+ intel_pt_cache_invalidate(al.map->dso, machine, offset);
+ intel_pt_log("Invalidated instruction cache for %s at %#"PRIx64"\n",
+ al.map->dso->long_name, addr);
+ }
+ }
+
+ return 0;
+}
+
static int intel_pt_process_event(struct perf_session *session,
union perf_event *event,
struct perf_sample *sample,
@@ -2655,6 +2727,9 @@ static int intel_pt_process_event(struct perf_session *session,
event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
err = intel_pt_context_switch(pt, event, sample);

+ if (!err && event->header.type == PERF_RECORD_TEXT_POKE)
+ err = intel_pt_text_poke(pt, event);
+
intel_pt_log("event %u: cpu %d time %"PRIu64" tsc %#"PRIx64" ",
event->header.type, sample->cpu, sample->time, timestamp);
intel_pt_log_event(event);
--
2.17.1

2020-04-05 20:37:59

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 14/15] perf script: Add option --show-text-poke-events

Consistent with other new events, add an option to perf script to display
text poke events and ksymbol events. Both text poke events and ksymbol
events are displayed because some text pokes (e.g. ftrace trampolines) have
corresponding ksymbol events.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/Documentation/perf-script.txt | 4 ++++
tools/perf/builtin-script.c | 19 +++++++++++++++++++
2 files changed, 23 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 99a9853a11ba..8fe507d69f95 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -322,6 +322,10 @@ OPTIONS
--show-cgroup-events
Display cgroup events i.e. events of type PERF_RECORD_CGROUP.

+--show-text-poke-events
+ Display text poke events i.e. events of type PERF_RECORD_TEXT_POKE and
+ PERF_RECORD_KSYMBOL.
+
--demangle::
Demangle symbol names to human readable form. It's enabled by default,
disable with --no-demangle.
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 348fec9743a4..e934bf46269d 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1695,6 +1695,7 @@ struct perf_script {
bool show_round_events;
bool show_bpf_events;
bool show_cgroup_events;
+ bool show_text_poke_events;
bool allocated;
bool per_event_dump;
struct evswitch evswitch;
@@ -2319,6 +2320,18 @@ process_bpf_events(struct perf_tool *tool __maybe_unused,
sample->tid);
}

+static int process_text_poke_events(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ if (perf_event__process_text_poke(tool, event, sample, machine) < 0)
+ return -1;
+
+ return print_event(tool, event, sample, machine, sample->pid,
+ sample->tid);
+}
+
static void sig_handler(int sig __maybe_unused)
{
session_done = 1;
@@ -2427,6 +2440,10 @@ static int __cmd_script(struct perf_script *script)
script->tool.ksymbol = process_bpf_events;
script->tool.bpf = process_bpf_events;
}
+ if (script->show_text_poke_events) {
+ script->tool.ksymbol = process_bpf_events;
+ script->tool.text_poke = process_text_poke_events;
+ }

if (perf_script__setup_per_event_dump(script)) {
pr_err("Couldn't create the per event dump files\n");
@@ -3452,6 +3469,8 @@ int cmd_script(int argc, const char **argv)
"Show round events (if recorded)"),
OPT_BOOLEAN('\0', "show-bpf-events", &script.show_bpf_events,
"Show bpf related events (if recorded)"),
+ OPT_BOOLEAN('\0', "show-text-poke-events", &script.show_text_poke_events,
+ "Show text poke related events (if recorded)"),
OPT_BOOLEAN('\0', "per-event-dump", &script.per_event_dump,
"Dump trace output to files named by the monitored events"),
OPT_BOOLEAN('f', "force", &symbol_conf.force, "don't complain, do it"),
--
2.17.1

2020-04-05 20:40:27

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH V6 15/15] perf script: Show text poke address symbol

It is generally more useful to show the symbol with an address. In this
case, the print function requires the 'machine' which means changing
callers to provide it as a parameter. It is optional because most events
do not need it and the callers that matter can provide it.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/builtin-script.c | 4 ++--
tools/perf/tests/perf-record.c | 4 ++--
tools/perf/util/event.c | 23 ++++++++++++++++++-----
tools/perf/util/event.h | 6 ++++--
tools/perf/util/intel-pt.c | 2 +-
tools/perf/util/machine.c | 2 +-
6 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index e934bf46269d..f06470e8f07b 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2168,7 +2168,7 @@ static int print_event_with_time(struct perf_tool *tool,
event->header.type, stdout);
}

- perf_event__fprintf(event, stdout);
+ perf_event__fprintf(machine, event, stdout);

thread__put(thread);

@@ -2303,7 +2303,7 @@ process_finished_round_event(struct perf_tool *tool __maybe_unused,
struct ordered_events *oe __maybe_unused)

{
- perf_event__fprintf(event, stdout);
+ perf_event__fprintf(NULL, event, stdout);
return 0;
}

diff --git a/tools/perf/tests/perf-record.c b/tools/perf/tests/perf-record.c
index 2195fc205e72..a187a70b4a98 100644
--- a/tools/perf/tests/perf-record.c
+++ b/tools/perf/tests/perf-record.c
@@ -185,14 +185,14 @@ int test__PERF_RECORD(struct test *test __maybe_unused, int subtest __maybe_unus
err = perf_evlist__parse_sample(evlist, event, &sample);
if (err < 0) {
if (verbose > 0)
- perf_event__fprintf(event, stderr);
+ perf_event__fprintf(NULL, event, stderr);
pr_debug("Couldn't parse sample\n");
goto out_delete_evlist;
}

if (verbose > 0) {
pr_info("%" PRIu64" %d ", sample.time, sample.cpu);
- perf_event__fprintf(event, stderr);
+ perf_event__fprintf(NULL, event, stderr);
}

if (prev_time > sample.time) {
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index cdaf58734254..e151df9b28d7 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -440,14 +440,26 @@ static int text_poke_printer(enum binary_printer_ops op, unsigned int val,
}
}

-size_t perf_event__fprintf_text_poke(union perf_event *event, FILE *fp)
+size_t perf_event__fprintf_text_poke(struct machine *machine,
+ union perf_event *event, FILE *fp)
{
struct perf_record_text_poke_event *tp = &event->text_poke;
size_t ret;
bool old;

- ret = fprintf(fp, " %" PRI_lx64 " old len %u new len %u\n",
- tp->addr, tp->old_len, tp->new_len);
+ ret = fprintf(fp, " %" PRI_lx64 " ", tp->addr);
+ if (machine) {
+ struct addr_location al;
+
+ al.map = maps__find(&machine->kmaps, tp->addr);
+ if (al.map && map__load(al.map) >= 0) {
+ al.addr = al.map->map_ip(al.map, tp->addr);
+ al.sym = map__find_symbol(al.map, al.addr);
+ if (al.sym)
+ ret += symbol__fprintf_symname_offs(al.sym, &al, fp);
+ }
+ }
+ ret += fprintf(fp, " old len %u new len %u\n", tp->old_len, tp->new_len);
old = true;
ret += binary__fprintf(tp->bytes, tp->old_len, 16, text_poke_printer,
&old, fp);
@@ -457,7 +469,8 @@ size_t perf_event__fprintf_text_poke(union perf_event *event, FILE *fp)
return ret;
}

-size_t perf_event__fprintf(union perf_event *event, FILE *fp)
+size_t perf_event__fprintf(struct machine *machine, union perf_event *event,
+ FILE *fp)
{
size_t ret = fprintf(fp, "PERF_RECORD_%s",
perf_event__name(event->header.type));
@@ -502,7 +515,7 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
ret += perf_event__fprintf_bpf(event, fp);
break;
case PERF_RECORD_TEXT_POKE:
- ret += perf_event__fprintf_text_poke(event, fp);
+ ret += perf_event__fprintf_text_poke(machine, event, fp);
break;
default:
ret += fprintf(fp, "\n");
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index dbbba6dea51d..611c8e5ad9d9 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -389,8 +389,10 @@ size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_cgroup(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_ksymbol(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_bpf(union perf_event *event, FILE *fp);
-size_t perf_event__fprintf_text_poke(union perf_event *event, FILE *fp);
-size_t perf_event__fprintf(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_text_poke(struct machine *machine,
+ union perf_event *event, FILE *fp);
+size_t perf_event__fprintf(struct machine *machine, union perf_event *event,
+ FILE *fp);

int kallsyms__get_function_start(const char *kallsyms_filename,
const char *symbol_name, u64 *addr);
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 470a3b413869..902f85ded944 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -230,7 +230,7 @@ static void intel_pt_log_event(union perf_event *event)
if (!intel_pt_enable_logging || !f)
return;

- perf_event__fprintf(event, f);
+ perf_event__fprintf(NULL, event, f);
}

static void intel_pt_dump_sample(struct perf_session *session,
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 8447334ac056..4c023667a073 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -809,7 +809,7 @@ int machine__process_text_poke(struct machine *machine, union perf_event *event,
u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;

if (dump_trace)
- perf_event__fprintf_text_poke(event, stdout);
+ perf_event__fprintf_text_poke(machine, event, stdout);

if (!event->text_poke.new_len)
return 0;
--
2.17.1

2020-04-20 06:25:22

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH V6 00/15] perf/x86: Add perf text poke events

On 5/04/20 11:13 pm, Adrian Hunter wrote:
> Hi
>
> Here are patches to add a text poke event to record changes to kernel text
> (i.e. self-modifying code) in order to support tracers like Intel PT
> decoding through jump labels, kprobes and ftrace trampolines.
>
> The first 8 patches make the kernel changes and the subsequent patches are
> tools changes.
>
> The next 4 patches add support for updating perf tools' data cache
> with the changed bytes.
>
> The next patch is an Intel PT specific tools change.
>
> The final 2 patches add perf script --show-text-poke-events option
>
> Patches also here:
>
> git://git.infradead.org/users/ahunter/linux-perf.git text_poke
>
> Changes in V6
>
> perf: Add perf text poke event
>
> Adjustments due to re-base on top of cgroup event patches
>
> perf/x86: Add perf text poke events for kprobes
>
> Expand commit message
> Add Masami's Ack
>
> perf tools: Add support for PERF_RECORD_TEXT_POKE
>
> Adjustments due to re-base on top of cgroup event patches
> Improvements to perf_event__fprintf_text_poke()
>
> perf script: Add option --show-text-poke-events
> perf script: Show text poke address symbol
>
> New patches
>

Any comments?

2020-04-21 12:53:20

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH V6 13/15] perf intel-pt: Add support for text poke events

On Sun, Apr 05, 2020 at 11:13:25PM +0300, Adrian Hunter wrote:
> Select text poke events when available and the kernel is being traced.
> Process text poke events to invalidate entries in Intel PT's instruction
> cache.
>
> Example:
>
> The example requires kernel config:
> CONFIG_PROC_SYSCTL=y
> CONFIG_SCHED_DEBUG=y
> CONFIG_SCHEDSTATS=y
>
> Before:
>
> # perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M &
> # cat /proc/sys/kernel/sched_schedstats
> 0
> # echo 1 > /proc/sys/kernel/sched_schedstats
> # cat /proc/sys/kernel/sched_schedstats
> 1
> # echo 0 > /proc/sys/kernel/sched_schedstats
> # cat /proc/sys/kernel/sched_schedstats
> 0
> # kill %1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 3.341 MB perf.data.before ]
> [1]+ Terminated perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M
> # perf script -i perf.data.before --itrace=e >/dev/null
> Warning:
> 474 instruction trace errors
>
> After:
>
> # perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
> # cat /proc/sys/kernel/sched_schedstats
> 0
> # echo 1 > /proc/sys/kernel/sched_schedstats
> # cat /proc/sys/kernel/sched_schedstats
> 1
> # echo 0 > /proc/sys/kernel/sched_schedstats
> # cat /proc/sys/kernel/sched_schedstats
> 0
> # kill %1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 2.646 MB perf.data.after ]
> [1]+ Terminated perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
> # perf script -i perf.data.after --itrace=e >/dev/null


I'm still seeing some, probably I'm missing some CONFIG, will check

# ./perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
[1] 5880
# cat /proc/sys/kernel/sched_schedstats
0
# echo 1 > /proc/sys/kernel/sched_schedstats
# cat /proc/sys/kernel/sched_schedstats
1
# echo 0 > /proc/sys/kernel/sched_schedstats
# kill %1
# [ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 6.181 MB perf.data.after ]
[1]+ Terminated ./perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
# ./perf script --itrace=e -i perf.data.after > /dev/null
Warning:
18837 instruction trace errors

# ./perf script --itrace=e -i perf.data.after | head
instruction trace error type 1 time 9274.420582345 cpu 9 pid 845 tid 845 ip 0xffffffff814e6cf2 code 6: Trace doesn't match instruction
instruction trace error type 1 time 9274.422229726 cpu 39 pid 5880 tid 5880 ip 0xffffffffa030a320 code 6: Trace doesn't match instruction
instruction trace error type 1 time 9274.422231972 cpu 39 pid 5880 tid 5880 ip 0xffffffffa0315c1c code 6: Trace doesn't match instruction
instruction trace error type 1 time 9274.422236141 cpu 39 pid 5880 tid 5880 ip 0xffffffff81143263 code 6: Trace doesn't match instruction
instruction trace error type 1 time 9274.422237424 cpu 39 pid 5880 tid 5880 ip 0xffffffff8115c388 code 6: Trace doesn't match instruction
instruction trace error type 1 time 9274.422239028 cpu 39 pid 5880 tid 5880 ip 0xffffffff811428c9 code 6: Trace doesn't match instruction
instruction trace error type 1 time 9274.422239028 cpu 39 pid 5880 tid 5880 ip 0xffffffffa13279fb code 5: Failed to get instruction
instruction trace error type 1 time 9274.422242556 cpu 39 pid 5880 tid 5880 ip 0xffffffff814e9c73 code 6: Trace doesn't match instruction
instruction trace error type 1 time 9274.422258915 cpu 39 pid 5880 tid 5880 ip 0xffffffff810d7da2 code 6: Trace doesn't match instruction
instruction trace error type 1 time 9274.422258915 cpu 39 pid 5880 tid 5880 ip 0xffffffffa030a0b2 code 6: Trace doesn't match instruction

jirka

2020-04-21 13:08:14

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH V6 13/15] perf intel-pt: Add support for text poke events

On 21/04/20 3:50 pm, Jiri Olsa wrote:
> On Sun, Apr 05, 2020 at 11:13:25PM +0300, Adrian Hunter wrote:
>> Select text poke events when available and the kernel is being traced.
>> Process text poke events to invalidate entries in Intel PT's instruction
>> cache.
>>
>> Example:
>>
>> The example requires kernel config:
>> CONFIG_PROC_SYSCTL=y
>> CONFIG_SCHED_DEBUG=y
>> CONFIG_SCHEDSTATS=y
>>
>> Before:
>>
>> # perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M &
>> # cat /proc/sys/kernel/sched_schedstats
>> 0
>> # echo 1 > /proc/sys/kernel/sched_schedstats
>> # cat /proc/sys/kernel/sched_schedstats
>> 1
>> # echo 0 > /proc/sys/kernel/sched_schedstats
>> # cat /proc/sys/kernel/sched_schedstats
>> 0
>> # kill %1
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 3.341 MB perf.data.before ]
>> [1]+ Terminated perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M
>> # perf script -i perf.data.before --itrace=e >/dev/null
>> Warning:
>> 474 instruction trace errors
>>
>> After:
>>
>> # perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
>> # cat /proc/sys/kernel/sched_schedstats
>> 0
>> # echo 1 > /proc/sys/kernel/sched_schedstats
>> # cat /proc/sys/kernel/sched_schedstats
>> 1
>> # echo 0 > /proc/sys/kernel/sched_schedstats
>> # cat /proc/sys/kernel/sched_schedstats
>> 0
>> # kill %1
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 2.646 MB perf.data.after ]
>> [1]+ Terminated perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
>> # perf script -i perf.data.after --itrace=e >/dev/null
>
>
> I'm still seeing some, probably I'm missing some CONFIG, will check
>
> # ./perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
> [1] 5880
> # cat /proc/sys/kernel/sched_schedstats
> 0
> # echo 1 > /proc/sys/kernel/sched_schedstats
> # cat /proc/sys/kernel/sched_schedstats
> 1
> # echo 0 > /proc/sys/kernel/sched_schedstats
> # kill %1
> # [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 6.181 MB perf.data.after ]
> [1]+ Terminated ./perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
> # ./perf script --itrace=e -i perf.data.after > /dev/null
> Warning:
> 18837 instruction trace errors
>
> # ./perf script --itrace=e -i perf.data.after | head
> instruction trace error type 1 time 9274.420582345 cpu 9 pid 845 tid 845 ip 0xffffffff814e6cf2 code 6: Trace doesn't match instruction
> instruction trace error type 1 time 9274.422229726 cpu 39 pid 5880 tid 5880 ip 0xffffffffa030a320 code 6: Trace doesn't match instruction
> instruction trace error type 1 time 9274.422231972 cpu 39 pid 5880 tid 5880 ip 0xffffffffa0315c1c code 6: Trace doesn't match instruction
> instruction trace error type 1 time 9274.422236141 cpu 39 pid 5880 tid 5880 ip 0xffffffff81143263 code 6: Trace doesn't match instruction
> instruction trace error type 1 time 9274.422237424 cpu 39 pid 5880 tid 5880 ip 0xffffffff8115c388 code 6: Trace doesn't match instruction
> instruction trace error type 1 time 9274.422239028 cpu 39 pid 5880 tid 5880 ip 0xffffffff811428c9 code 6: Trace doesn't match instruction
> instruction trace error type 1 time 9274.422239028 cpu 39 pid 5880 tid 5880 ip 0xffffffffa13279fb code 5: Failed to get instruction
> instruction trace error type 1 time 9274.422242556 cpu 39 pid 5880 tid 5880 ip 0xffffffff814e9c73 code 6: Trace doesn't match instruction
> instruction trace error type 1 time 9274.422258915 cpu 39 pid 5880 tid 5880 ip 0xffffffff810d7da2 code 6: Trace doesn't match instruction
> instruction trace error type 1 time 9274.422258915 cpu 39 pid 5880 tid 5880 ip 0xffffffffa030a0b2 code 6: Trace doesn't match instruction

I don't think it can be CONFIG. Can you share the branch you are using?
Then I can test it.


2020-04-21 13:08:24

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH V6 13/15] perf intel-pt: Add support for text poke events

On Tue, Apr 21, 2020 at 04:02:32PM +0300, Adrian Hunter wrote:
> On 21/04/20 3:50 pm, Jiri Olsa wrote:
> > On Sun, Apr 05, 2020 at 11:13:25PM +0300, Adrian Hunter wrote:
> >> Select text poke events when available and the kernel is being traced.
> >> Process text poke events to invalidate entries in Intel PT's instruction
> >> cache.
> >>
> >> Example:
> >>
> >> The example requires kernel config:
> >> CONFIG_PROC_SYSCTL=y
> >> CONFIG_SCHED_DEBUG=y
> >> CONFIG_SCHEDSTATS=y
> >>
> >> Before:
> >>
> >> # perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M &
> >> # cat /proc/sys/kernel/sched_schedstats
> >> 0
> >> # echo 1 > /proc/sys/kernel/sched_schedstats
> >> # cat /proc/sys/kernel/sched_schedstats
> >> 1
> >> # echo 0 > /proc/sys/kernel/sched_schedstats
> >> # cat /proc/sys/kernel/sched_schedstats
> >> 0
> >> # kill %1
> >> [ perf record: Woken up 1 times to write data ]
> >> [ perf record: Captured and wrote 3.341 MB perf.data.before ]
> >> [1]+ Terminated perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M
> >> # perf script -i perf.data.before --itrace=e >/dev/null
> >> Warning:
> >> 474 instruction trace errors
> >>
> >> After:
> >>
> >> # perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
> >> # cat /proc/sys/kernel/sched_schedstats
> >> 0
> >> # echo 1 > /proc/sys/kernel/sched_schedstats
> >> # cat /proc/sys/kernel/sched_schedstats
> >> 1
> >> # echo 0 > /proc/sys/kernel/sched_schedstats
> >> # cat /proc/sys/kernel/sched_schedstats
> >> 0
> >> # kill %1
> >> [ perf record: Woken up 1 times to write data ]
> >> [ perf record: Captured and wrote 2.646 MB perf.data.after ]
> >> [1]+ Terminated perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
> >> # perf script -i perf.data.after --itrace=e >/dev/null
> >
> >
> > I'm still seeing some, probably I'm missing some CONFIG, will check
> >
> > # ./perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
> > [1] 5880
> > # cat /proc/sys/kernel/sched_schedstats
> > 0
> > # echo 1 > /proc/sys/kernel/sched_schedstats
> > # cat /proc/sys/kernel/sched_schedstats
> > 1
> > # echo 0 > /proc/sys/kernel/sched_schedstats
> > # kill %1
> > # [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 6.181 MB perf.data.after ]
> > [1]+ Terminated ./perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
> > # ./perf script --itrace=e -i perf.data.after > /dev/null
> > Warning:
> > 18837 instruction trace errors
> >
> > # ./perf script --itrace=e -i perf.data.after | head
> > instruction trace error type 1 time 9274.420582345 cpu 9 pid 845 tid 845 ip 0xffffffff814e6cf2 code 6: Trace doesn't match instruction
> > instruction trace error type 1 time 9274.422229726 cpu 39 pid 5880 tid 5880 ip 0xffffffffa030a320 code 6: Trace doesn't match instruction
> > instruction trace error type 1 time 9274.422231972 cpu 39 pid 5880 tid 5880 ip 0xffffffffa0315c1c code 6: Trace doesn't match instruction
> > instruction trace error type 1 time 9274.422236141 cpu 39 pid 5880 tid 5880 ip 0xffffffff81143263 code 6: Trace doesn't match instruction
> > instruction trace error type 1 time 9274.422237424 cpu 39 pid 5880 tid 5880 ip 0xffffffff8115c388 code 6: Trace doesn't match instruction
> > instruction trace error type 1 time 9274.422239028 cpu 39 pid 5880 tid 5880 ip 0xffffffff811428c9 code 6: Trace doesn't match instruction
> > instruction trace error type 1 time 9274.422239028 cpu 39 pid 5880 tid 5880 ip 0xffffffffa13279fb code 5: Failed to get instruction
> > instruction trace error type 1 time 9274.422242556 cpu 39 pid 5880 tid 5880 ip 0xffffffff814e9c73 code 6: Trace doesn't match instruction
> > instruction trace error type 1 time 9274.422258915 cpu 39 pid 5880 tid 5880 ip 0xffffffff810d7da2 code 6: Trace doesn't match instruction
> > instruction trace error type 1 time 9274.422258915 cpu 39 pid 5880 tid 5880 ip 0xffffffffa030a0b2 code 6: Trace doesn't match instruction
>
> I don't think it can be CONFIG. Can you share the branch you are using?
> Then I can test it.
>

I built the one you mentioned in cover email:
git://git.infradead.org/users/ahunter/linux-perf.git text_poke

jirka

2020-04-21 13:13:17

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH V6 13/15] perf intel-pt: Add support for text poke events

On 21/04/20 4:06 pm, Jiri Olsa wrote:
> On Tue, Apr 21, 2020 at 04:02:32PM +0300, Adrian Hunter wrote:
>> On 21/04/20 3:50 pm, Jiri Olsa wrote:
>>> On Sun, Apr 05, 2020 at 11:13:25PM +0300, Adrian Hunter wrote:
>>>> Select text poke events when available and the kernel is being traced.
>>>> Process text poke events to invalidate entries in Intel PT's instruction
>>>> cache.
>>>>
>>>> Example:
>>>>
>>>> The example requires kernel config:
>>>> CONFIG_PROC_SYSCTL=y
>>>> CONFIG_SCHED_DEBUG=y
>>>> CONFIG_SCHEDSTATS=y
>>>>
>>>> Before:
>>>>
>>>> # perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M &
>>>> # cat /proc/sys/kernel/sched_schedstats
>>>> 0
>>>> # echo 1 > /proc/sys/kernel/sched_schedstats
>>>> # cat /proc/sys/kernel/sched_schedstats
>>>> 1
>>>> # echo 0 > /proc/sys/kernel/sched_schedstats
>>>> # cat /proc/sys/kernel/sched_schedstats
>>>> 0
>>>> # kill %1
>>>> [ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 3.341 MB perf.data.before ]
>>>> [1]+ Terminated perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M
>>>> # perf script -i perf.data.before --itrace=e >/dev/null
>>>> Warning:
>>>> 474 instruction trace errors
>>>>
>>>> After:
>>>>
>>>> # perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
>>>> # cat /proc/sys/kernel/sched_schedstats
>>>> 0
>>>> # echo 1 > /proc/sys/kernel/sched_schedstats
>>>> # cat /proc/sys/kernel/sched_schedstats
>>>> 1
>>>> # echo 0 > /proc/sys/kernel/sched_schedstats
>>>> # cat /proc/sys/kernel/sched_schedstats
>>>> 0
>>>> # kill %1
>>>> [ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 2.646 MB perf.data.after ]
>>>> [1]+ Terminated perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
>>>> # perf script -i perf.data.after --itrace=e >/dev/null
>>>
>>>
>>> I'm still seeing some, probably I'm missing some CONFIG, will check
>>>
>>> # ./perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
>>> [1] 5880
>>> # cat /proc/sys/kernel/sched_schedstats
>>> 0
>>> # echo 1 > /proc/sys/kernel/sched_schedstats
>>> # cat /proc/sys/kernel/sched_schedstats
>>> 1
>>> # echo 0 > /proc/sys/kernel/sched_schedstats
>>> # kill %1
>>> # [ perf record: Woken up 1 times to write data ]
>>> [ perf record: Captured and wrote 6.181 MB perf.data.after ]
>>> [1]+ Terminated ./perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
>>> # ./perf script --itrace=e -i perf.data.after > /dev/null
>>> Warning:
>>> 18837 instruction trace errors
>>>
>>> # ./perf script --itrace=e -i perf.data.after | head
>>> instruction trace error type 1 time 9274.420582345 cpu 9 pid 845 tid 845 ip 0xffffffff814e6cf2 code 6: Trace doesn't match instruction
>>> instruction trace error type 1 time 9274.422229726 cpu 39 pid 5880 tid 5880 ip 0xffffffffa030a320 code 6: Trace doesn't match instruction
>>> instruction trace error type 1 time 9274.422231972 cpu 39 pid 5880 tid 5880 ip 0xffffffffa0315c1c code 6: Trace doesn't match instruction
>>> instruction trace error type 1 time 9274.422236141 cpu 39 pid 5880 tid 5880 ip 0xffffffff81143263 code 6: Trace doesn't match instruction
>>> instruction trace error type 1 time 9274.422237424 cpu 39 pid 5880 tid 5880 ip 0xffffffff8115c388 code 6: Trace doesn't match instruction
>>> instruction trace error type 1 time 9274.422239028 cpu 39 pid 5880 tid 5880 ip 0xffffffff811428c9 code 6: Trace doesn't match instruction
>>> instruction trace error type 1 time 9274.422239028 cpu 39 pid 5880 tid 5880 ip 0xffffffffa13279fb code 5: Failed to get instruction
>>> instruction trace error type 1 time 9274.422242556 cpu 39 pid 5880 tid 5880 ip 0xffffffff814e9c73 code 6: Trace doesn't match instruction
>>> instruction trace error type 1 time 9274.422258915 cpu 39 pid 5880 tid 5880 ip 0xffffffff810d7da2 code 6: Trace doesn't match instruction
>>> instruction trace error type 1 time 9274.422258915 cpu 39 pid 5880 tid 5880 ip 0xffffffffa030a0b2 code 6: Trace doesn't match instruction
>>
>> I don't think it can be CONFIG. Can you share the branch you are using?
>> Then I can test it.
>>
>
> I built the one you mentioned in cover email:
> git://git.infradead.org/users/ahunter/linux-perf.git text_poke

The tools and the kernel?

Does it successfully decode anything? e.g.
./perf script --itrace=be -i perf.data.after | head

2020-04-21 13:37:59

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH V6 13/15] perf intel-pt: Add support for text poke events

On Tue, Apr 21, 2020 at 04:10:33PM +0300, Adrian Hunter wrote:

SNIP

> >>> I'm still seeing some, probably I'm missing some CONFIG, will check
> >>>
> >>> # ./perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
> >>> [1] 5880
> >>> # cat /proc/sys/kernel/sched_schedstats
> >>> 0
> >>> # echo 1 > /proc/sys/kernel/sched_schedstats
> >>> # cat /proc/sys/kernel/sched_schedstats
> >>> 1
> >>> # echo 0 > /proc/sys/kernel/sched_schedstats
> >>> # kill %1
> >>> # [ perf record: Woken up 1 times to write data ]
> >>> [ perf record: Captured and wrote 6.181 MB perf.data.after ]
> >>> [1]+ Terminated ./perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
> >>> # ./perf script --itrace=e -i perf.data.after > /dev/null
> >>> Warning:
> >>> 18837 instruction trace errors
> >>>
> >>> # ./perf script --itrace=e -i perf.data.after | head
> >>> instruction trace error type 1 time 9274.420582345 cpu 9 pid 845 tid 845 ip 0xffffffff814e6cf2 code 6: Trace doesn't match instruction
> >>> instruction trace error type 1 time 9274.422229726 cpu 39 pid 5880 tid 5880 ip 0xffffffffa030a320 code 6: Trace doesn't match instruction
> >>> instruction trace error type 1 time 9274.422231972 cpu 39 pid 5880 tid 5880 ip 0xffffffffa0315c1c code 6: Trace doesn't match instruction
> >>> instruction trace error type 1 time 9274.422236141 cpu 39 pid 5880 tid 5880 ip 0xffffffff81143263 code 6: Trace doesn't match instruction
> >>> instruction trace error type 1 time 9274.422237424 cpu 39 pid 5880 tid 5880 ip 0xffffffff8115c388 code 6: Trace doesn't match instruction
> >>> instruction trace error type 1 time 9274.422239028 cpu 39 pid 5880 tid 5880 ip 0xffffffff811428c9 code 6: Trace doesn't match instruction
> >>> instruction trace error type 1 time 9274.422239028 cpu 39 pid 5880 tid 5880 ip 0xffffffffa13279fb code 5: Failed to get instruction
> >>> instruction trace error type 1 time 9274.422242556 cpu 39 pid 5880 tid 5880 ip 0xffffffff814e9c73 code 6: Trace doesn't match instruction
> >>> instruction trace error type 1 time 9274.422258915 cpu 39 pid 5880 tid 5880 ip 0xffffffff810d7da2 code 6: Trace doesn't match instruction
> >>> instruction trace error type 1 time 9274.422258915 cpu 39 pid 5880 tid 5880 ip 0xffffffffa030a0b2 code 6: Trace doesn't match instruction
> >>
> >> I don't think it can be CONFIG. Can you share the branch you are using?
> >> Then I can test it.
> >>
> >
> > I built the one you mentioned in cover email:
> > git://git.infradead.org/users/ahunter/linux-perf.git text_poke
>
> The tools and the kernel?

both from that branch

>
> Does it successfully decode anything? e.g.
> ./perf script --itrace=be -i perf.data.after | head
>

# ./perf script --itrace=be -i perf.data.after | head
perf 5880 [000] 9274.419884: 1 branches:k: 0 [unknown] ([unknown]) => ffffffff8106a4a6 native_write_msr+0x6 (/lib/modules/5.6.0-rc6+/build/vmlinux)
perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8106a4ab native_write_msr+0xb (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff810152f9 pt_config_start+0x59 (/lib/modules/5.6.0-rc6+/build/vmlinux)
perf 5880 [000] 9274.419884: 1 branches:k: ffffffff81015304 pt_config_start+0x64 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff81015648 pt_event_add+0x38 (/lib/modules/5.6.0-rc6+/build/vmlinux)
perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8101565d pt_event_add+0x4d (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d3ca event_sched_in.isra.0+0xea (/lib/modules/5.6.0-rc6+/build/vmlinux)
perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d3dc event_sched_in.isra.0+0xfc (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d3e3 event_sched_in.isra.0+0x103 (/lib/modules/5.6.0-rc6+/build/vmlinux)
perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d3f4 event_sched_in.isra.0+0x114 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d497 event_sched_in.isra.0+0x1b7 (/lib/modules/5.6.0-rc6+/build/vmlinux)
perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d4c1 event_sched_in.isra.0+0x1e1 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff814f9560 __list_add_valid+0x0 (/lib/modules/5.6.0-rc6+/build/vmlinux)
perf 5880 [000] 9274.419884: 1 branches:k: ffffffff814f9593 __list_add_valid+0x33 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d4c6 event_sched_in.isra.0+0x1e6 (/lib/modules/5.6.0-rc6+/build/vmlinux)
perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d4e2 event_sched_in.isra.0+0x202 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d3fa event_sched_in.isra.0+0x11a (/lib/modules/5.6.0-rc6+/build/vmlinux)
perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d401 event_sched_in.isra.0+0x121 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d40d event_sched_in.isra.0+0x12d (/lib/modules/5.6.0-rc6+/build/vmlinux)

# ./perf script --call-trace -i perf.data.after | head
perf 5880 [000] 9274.419884188: cbr: 8 freq: 798 MHz ( 36%)
perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __list_add_valid
perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) perf_pmu_enable.part.0
perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __x86_indirect_thunk_rax
perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __x86_indirect_thunk_rax
perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __x86_indirect_thunk_rax
perf 5880 [000] 9274.419884709: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __list_add_valid
perf 5880 [000] 9274.419884709: (/lib/modules/5.6.0-rc6+/build/vmlinux ) rb_next
perf 5880 [000] 9274.419884709: (/lib/modules/5.6.0-rc6+/build/vmlinux ) merge_sched_in
perf 5880 [000] 9274.419884709: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __x86_indirect_thunk_rax

jirka

2020-04-21 13:48:05

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH V6 13/15] perf intel-pt: Add support for text poke events

On 21/04/20 4:35 pm, Jiri Olsa wrote:
> On Tue, Apr 21, 2020 at 04:10:33PM +0300, Adrian Hunter wrote:
>
> SNIP
>
>>>>> I'm still seeing some, probably I'm missing some CONFIG, will check
>>>>>
>>>>> # ./perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
>>>>> [1] 5880
>>>>> # cat /proc/sys/kernel/sched_schedstats
>>>>> 0
>>>>> # echo 1 > /proc/sys/kernel/sched_schedstats
>>>>> # cat /proc/sys/kernel/sched_schedstats
>>>>> 1
>>>>> # echo 0 > /proc/sys/kernel/sched_schedstats
>>>>> # kill %1
>>>>> # [ perf record: Woken up 1 times to write data ]
>>>>> [ perf record: Captured and wrote 6.181 MB perf.data.after ]
>>>>> [1]+ Terminated ./perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
>>>>> # ./perf script --itrace=e -i perf.data.after > /dev/null
>>>>> Warning:
>>>>> 18837 instruction trace errors
>>>>>
>>>>> # ./perf script --itrace=e -i perf.data.after | head
>>>>> instruction trace error type 1 time 9274.420582345 cpu 9 pid 845 tid 845 ip 0xffffffff814e6cf2 code 6: Trace doesn't match instruction
>>>>> instruction trace error type 1 time 9274.422229726 cpu 39 pid 5880 tid 5880 ip 0xffffffffa030a320 code 6: Trace doesn't match instruction
>>>>> instruction trace error type 1 time 9274.422231972 cpu 39 pid 5880 tid 5880 ip 0xffffffffa0315c1c code 6: Trace doesn't match instruction
>>>>> instruction trace error type 1 time 9274.422236141 cpu 39 pid 5880 tid 5880 ip 0xffffffff81143263 code 6: Trace doesn't match instruction
>>>>> instruction trace error type 1 time 9274.422237424 cpu 39 pid 5880 tid 5880 ip 0xffffffff8115c388 code 6: Trace doesn't match instruction
>>>>> instruction trace error type 1 time 9274.422239028 cpu 39 pid 5880 tid 5880 ip 0xffffffff811428c9 code 6: Trace doesn't match instruction
>>>>> instruction trace error type 1 time 9274.422239028 cpu 39 pid 5880 tid 5880 ip 0xffffffffa13279fb code 5: Failed to get instruction
>>>>> instruction trace error type 1 time 9274.422242556 cpu 39 pid 5880 tid 5880 ip 0xffffffff814e9c73 code 6: Trace doesn't match instruction
>>>>> instruction trace error type 1 time 9274.422258915 cpu 39 pid 5880 tid 5880 ip 0xffffffff810d7da2 code 6: Trace doesn't match instruction
>>>>> instruction trace error type 1 time 9274.422258915 cpu 39 pid 5880 tid 5880 ip 0xffffffffa030a0b2 code 6: Trace doesn't match instruction
>>>>
>>>> I don't think it can be CONFIG. Can you share the branch you are using?
>>>> Then I can test it.
>>>>
>>>
>>> I built the one you mentioned in cover email:
>>> git://git.infradead.org/users/ahunter/linux-perf.git text_poke
>>
>> The tools and the kernel?
>
> both from that branch
>
>>
>> Does it successfully decode anything? e.g.
>> ./perf script --itrace=be -i perf.data.after | head
>>
>
> # ./perf script --itrace=be -i perf.data.after | head
> perf 5880 [000] 9274.419884: 1 branches:k: 0 [unknown] ([unknown]) => ffffffff8106a4a6 native_write_msr+0x6 (/lib/modules/5.6.0-rc6+/build/vmlinux)
> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8106a4ab native_write_msr+0xb (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff810152f9 pt_config_start+0x59 (/lib/modules/5.6.0-rc6+/build/vmlinux)
> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff81015304 pt_config_start+0x64 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff81015648 pt_event_add+0x38 (/lib/modules/5.6.0-rc6+/build/vmlinux)
> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8101565d pt_event_add+0x4d (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d3ca event_sched_in.isra.0+0xea (/lib/modules/5.6.0-rc6+/build/vmlinux)
> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d3dc event_sched_in.isra.0+0xfc (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d3e3 event_sched_in.isra.0+0x103 (/lib/modules/5.6.0-rc6+/build/vmlinux)
> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d3f4 event_sched_in.isra.0+0x114 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d497 event_sched_in.isra.0+0x1b7 (/lib/modules/5.6.0-rc6+/build/vmlinux)
> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d4c1 event_sched_in.isra.0+0x1e1 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff814f9560 __list_add_valid+0x0 (/lib/modules/5.6.0-rc6+/build/vmlinux)
> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff814f9593 __list_add_valid+0x33 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d4c6 event_sched_in.isra.0+0x1e6 (/lib/modules/5.6.0-rc6+/build/vmlinux)
> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d4e2 event_sched_in.isra.0+0x202 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d3fa event_sched_in.isra.0+0x11a (/lib/modules/5.6.0-rc6+/build/vmlinux)
> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d401 event_sched_in.isra.0+0x121 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d40d event_sched_in.isra.0+0x12d (/lib/modules/5.6.0-rc6+/build/vmlinux)
>
> # ./perf script --call-trace -i perf.data.after | head
> perf 5880 [000] 9274.419884188: cbr: 8 freq: 798 MHz ( 36%)
> perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __list_add_valid
> perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) perf_pmu_enable.part.0
> perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __x86_indirect_thunk_rax
> perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __x86_indirect_thunk_rax
> perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __x86_indirect_thunk_rax
> perf 5880 [000] 9274.419884709: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __list_add_valid
> perf 5880 [000] 9274.419884709: (/lib/modules/5.6.0-rc6+/build/vmlinux ) rb_next
> perf 5880 [000] 9274.419884709: (/lib/modules/5.6.0-rc6+/build/vmlinux ) merge_sched_in
> perf 5880 [000] 9274.419884709: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __x86_indirect_thunk_rax
>

perf is not using kcore, which explains the errors. Not sure why it prefers vmlinux. It should effectively be doing:

perf script -i perf.data.after --itrace=e --kallsyms perf.data.after/kcore_dir/kallsyms

What is in the perf.data.after directory?





2020-04-21 13:48:12

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V6 08/15] ftrace: Add perf text poke events for ftrace trampolines

On Sun, Apr 05, 2020 at 11:13:20PM +0300, Adrian Hunter wrote:
> Add perf text poke events for ftrace trampolines when created and when
> freed.

Maybe also put in a little more detail on the various events. Because
arch_ftrace_update_trampoline() can also generate text_poke_bp() events,
to update an existing trampoline.

A diagram, like with the kprobes thing perhaps.

> Signed-off-by: Adrian Hunter <[email protected]>
> ---
> kernel/trace/ftrace.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 398d58b8ffa5..3318f7eca76c 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -2792,6 +2792,13 @@ static void ftrace_trampoline_free(struct ftrace_ops *ops)
> {
> if (ops && (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP) &&
> ops->trampoline) {
> + /*
> + * Record the text poke event before the ksymbol unregister
> + * event.
> + */
> + perf_event_text_poke((void *)ops->trampoline,
> + (void *)ops->trampoline,
> + ops->trampoline_size, NULL, 0);
> perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
> ops->trampoline, ops->trampoline_size,
> true, FTRACE_TRAMPOLINE_SYM);
> @@ -6816,6 +6823,13 @@ static void ftrace_update_trampoline(struct ftrace_ops *ops)
> perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
> ops->trampoline, ops->trampoline_size, false,
> FTRACE_TRAMPOLINE_SYM);
> + /*
> + * Record the perf text poke event after the ksymbol register
> + * event.
> + */
> + perf_event_text_poke((void *)ops->trampoline, NULL, 0,
> + (void *)ops->trampoline,
> + ops->trampoline_size);
> }
> }
>
> --
> 2.17.1
>

2020-04-21 13:48:24

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V6 00/15] perf/x86: Add perf text poke events

On Sun, Apr 05, 2020 at 11:13:12PM +0300, Adrian Hunter wrote:
>
> Adrian Hunter (15):
> perf: Add perf text poke event
> perf/x86: Add support for perf text poke event for text_poke_bp_batch() callers
> kprobes: Add symbols for kprobe insn pages
> kprobes: Add perf ksymbol events for kprobe insn pages
> perf/x86: Add perf text poke events for kprobes
> ftrace: Add symbols for ftrace trampolines
> ftrace: Add perf ksymbol events for ftrace trampolines
> ftrace: Add perf text poke events for ftrace trampolines

Acked-by: Peter Zijlstra (Intel) <[email protected]>

2020-04-21 15:41:29

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH V6 13/15] perf intel-pt: Add support for text poke events

On 21/04/20 4:45 pm, Adrian Hunter wrote:
> On 21/04/20 4:35 pm, Jiri Olsa wrote:
>> On Tue, Apr 21, 2020 at 04:10:33PM +0300, Adrian Hunter wrote:
>>
>> SNIP
>>
>>>>>> I'm still seeing some, probably I'm missing some CONFIG, will check
>>>>>>
>>>>>> # ./perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
>>>>>> [1] 5880
>>>>>> # cat /proc/sys/kernel/sched_schedstats
>>>>>> 0
>>>>>> # echo 1 > /proc/sys/kernel/sched_schedstats
>>>>>> # cat /proc/sys/kernel/sched_schedstats
>>>>>> 1
>>>>>> # echo 0 > /proc/sys/kernel/sched_schedstats
>>>>>> # kill %1
>>>>>> # [ perf record: Woken up 1 times to write data ]
>>>>>> [ perf record: Captured and wrote 6.181 MB perf.data.after ]
>>>>>> [1]+ Terminated ./perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
>>>>>> # ./perf script --itrace=e -i perf.data.after > /dev/null
>>>>>> Warning:
>>>>>> 18837 instruction trace errors
>>>>>>
>>>>>> # ./perf script --itrace=e -i perf.data.after | head
>>>>>> instruction trace error type 1 time 9274.420582345 cpu 9 pid 845 tid 845 ip 0xffffffff814e6cf2 code 6: Trace doesn't match instruction
>>>>>> instruction trace error type 1 time 9274.422229726 cpu 39 pid 5880 tid 5880 ip 0xffffffffa030a320 code 6: Trace doesn't match instruction
>>>>>> instruction trace error type 1 time 9274.422231972 cpu 39 pid 5880 tid 5880 ip 0xffffffffa0315c1c code 6: Trace doesn't match instruction
>>>>>> instruction trace error type 1 time 9274.422236141 cpu 39 pid 5880 tid 5880 ip 0xffffffff81143263 code 6: Trace doesn't match instruction
>>>>>> instruction trace error type 1 time 9274.422237424 cpu 39 pid 5880 tid 5880 ip 0xffffffff8115c388 code 6: Trace doesn't match instruction
>>>>>> instruction trace error type 1 time 9274.422239028 cpu 39 pid 5880 tid 5880 ip 0xffffffff811428c9 code 6: Trace doesn't match instruction
>>>>>> instruction trace error type 1 time 9274.422239028 cpu 39 pid 5880 tid 5880 ip 0xffffffffa13279fb code 5: Failed to get instruction
>>>>>> instruction trace error type 1 time 9274.422242556 cpu 39 pid 5880 tid 5880 ip 0xffffffff814e9c73 code 6: Trace doesn't match instruction
>>>>>> instruction trace error type 1 time 9274.422258915 cpu 39 pid 5880 tid 5880 ip 0xffffffff810d7da2 code 6: Trace doesn't match instruction
>>>>>> instruction trace error type 1 time 9274.422258915 cpu 39 pid 5880 tid 5880 ip 0xffffffffa030a0b2 code 6: Trace doesn't match instruction
>>>>>
>>>>> I don't think it can be CONFIG. Can you share the branch you are using?
>>>>> Then I can test it.
>>>>>
>>>>
>>>> I built the one you mentioned in cover email:
>>>> git://git.infradead.org/users/ahunter/linux-perf.git text_poke
>>>
>>> The tools and the kernel?
>>
>> both from that branch
>>
>>>
>>> Does it successfully decode anything? e.g.
>>> ./perf script --itrace=be -i perf.data.after | head
>>>
>>
>> # ./perf script --itrace=be -i perf.data.after | head
>> perf 5880 [000] 9274.419884: 1 branches:k: 0 [unknown] ([unknown]) => ffffffff8106a4a6 native_write_msr+0x6 (/lib/modules/5.6.0-rc6+/build/vmlinux)
>> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8106a4ab native_write_msr+0xb (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff810152f9 pt_config_start+0x59 (/lib/modules/5.6.0-rc6+/build/vmlinux)
>> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff81015304 pt_config_start+0x64 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff81015648 pt_event_add+0x38 (/lib/modules/5.6.0-rc6+/build/vmlinux)
>> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8101565d pt_event_add+0x4d (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d3ca event_sched_in.isra.0+0xea (/lib/modules/5.6.0-rc6+/build/vmlinux)
>> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d3dc event_sched_in.isra.0+0xfc (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d3e3 event_sched_in.isra.0+0x103 (/lib/modules/5.6.0-rc6+/build/vmlinux)
>> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d3f4 event_sched_in.isra.0+0x114 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d497 event_sched_in.isra.0+0x1b7 (/lib/modules/5.6.0-rc6+/build/vmlinux)
>> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d4c1 event_sched_in.isra.0+0x1e1 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff814f9560 __list_add_valid+0x0 (/lib/modules/5.6.0-rc6+/build/vmlinux)
>> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff814f9593 __list_add_valid+0x33 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d4c6 event_sched_in.isra.0+0x1e6 (/lib/modules/5.6.0-rc6+/build/vmlinux)
>> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d4e2 event_sched_in.isra.0+0x202 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d3fa event_sched_in.isra.0+0x11a (/lib/modules/5.6.0-rc6+/build/vmlinux)
>> perf 5880 [000] 9274.419884: 1 branches:k: ffffffff8121d401 event_sched_in.isra.0+0x121 (/lib/modules/5.6.0-rc6+/build/vmlinux) => ffffffff8121d40d event_sched_in.isra.0+0x12d (/lib/modules/5.6.0-rc6+/build/vmlinux)
>>
>> # ./perf script --call-trace -i perf.data.after | head
>> perf 5880 [000] 9274.419884188: cbr: 8 freq: 798 MHz ( 36%)
>> perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __list_add_valid
>> perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) perf_pmu_enable.part.0
>> perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __x86_indirect_thunk_rax
>> perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __x86_indirect_thunk_rax
>> perf 5880 [000] 9274.419884389: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __x86_indirect_thunk_rax
>> perf 5880 [000] 9274.419884709: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __list_add_valid
>> perf 5880 [000] 9274.419884709: (/lib/modules/5.6.0-rc6+/build/vmlinux ) rb_next
>> perf 5880 [000] 9274.419884709: (/lib/modules/5.6.0-rc6+/build/vmlinux ) merge_sched_in
>> perf 5880 [000] 9274.419884709: (/lib/modules/5.6.0-rc6+/build/vmlinux ) __x86_indirect_thunk_rax
>>
>
> perf is not using kcore, which explains the errors. Not sure why it prefers vmlinux. It should effectively be doing:
>
> perf script -i perf.data.after --itrace=e --kallsyms perf.data.after/kcore_dir/kallsyms
>
> What is in the perf.data.after directory?

I played around but have not yet figured out why perf prefers vmlinux on
your system, but I will investigate more.


2020-04-22 07:42:04

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH V6 08/15] ftrace: Add perf text poke events for ftrace trampolines

On 21/04/20 4:45 pm, Peter Zijlstra wrote:
> On Sun, Apr 05, 2020 at 11:13:20PM +0300, Adrian Hunter wrote:
>> Add perf text poke events for ftrace trampolines when created and when
>> freed.
>
> Maybe also put in a little more detail on the various events. Because
> arch_ftrace_update_trampoline() can also generate text_poke_bp() events,
> to update an existing trampoline.
>
> A diagram, like with the kprobes thing perhaps.

How about adding this:

There can be 3 text_poke events for ftrace trampolines:

1. NULL -> trampoline
By ftrace_update_trampoline() when !ops->trampoline
Trampoline created

2. [e.g. on x86] CALL rel32 -> CALL rel32
By arch_ftrace_update_trampoline() when ops->trampoline and
ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP
[e.g. on x86] via text_poke_bp() which generates text poke events
Trampoline-called function target updated

3. trampoline -> NULL
By ftrace_trampoline_free() when ops->trampoline and
ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP
Trampoline freed

>
>> Signed-off-by: Adrian Hunter <[email protected]>
>> ---
>> kernel/trace/ftrace.c | 14 ++++++++++++++
>> 1 file changed, 14 insertions(+)
>>
>> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
>> index 398d58b8ffa5..3318f7eca76c 100644
>> --- a/kernel/trace/ftrace.c
>> +++ b/kernel/trace/ftrace.c
>> @@ -2792,6 +2792,13 @@ static void ftrace_trampoline_free(struct ftrace_ops *ops)
>> {
>> if (ops && (ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP) &&
>> ops->trampoline) {
>> + /*
>> + * Record the text poke event before the ksymbol unregister
>> + * event.
>> + */
>> + perf_event_text_poke((void *)ops->trampoline,
>> + (void *)ops->trampoline,
>> + ops->trampoline_size, NULL, 0);
>> perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
>> ops->trampoline, ops->trampoline_size,
>> true, FTRACE_TRAMPOLINE_SYM);
>> @@ -6816,6 +6823,13 @@ static void ftrace_update_trampoline(struct ftrace_ops *ops)
>> perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_OOL,
>> ops->trampoline, ops->trampoline_size, false,
>> FTRACE_TRAMPOLINE_SYM);
>> + /*
>> + * Record the perf text poke event after the ksymbol register
>> + * event.
>> + */
>> + perf_event_text_poke((void *)ops->trampoline, NULL, 0,
>> + (void *)ops->trampoline,
>> + ops->trampoline_size);
>> }
>> }
>>
>> --
>> 2.17.1
>>

2020-04-22 12:24:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V6 08/15] ftrace: Add perf text poke events for ftrace trampolines

On Wed, Apr 22, 2020 at 10:39:06AM +0300, Adrian Hunter wrote:
> On 21/04/20 4:45 pm, Peter Zijlstra wrote:
> > On Sun, Apr 05, 2020 at 11:13:20PM +0300, Adrian Hunter wrote:
> >> Add perf text poke events for ftrace trampolines when created and when
> >> freed.
> >
> > Maybe also put in a little more detail on the various events. Because
> > arch_ftrace_update_trampoline() can also generate text_poke_bp() events,
> > to update an existing trampoline.
> >
> > A diagram, like with the kprobes thing perhaps.
>
> How about adding this:
>
> There can be 3 text_poke events for ftrace trampolines:
>
> 1. NULL -> trampoline
> By ftrace_update_trampoline() when !ops->trampoline
> Trampoline created
>
> 2. [e.g. on x86] CALL rel32 -> CALL rel32
> By arch_ftrace_update_trampoline() when ops->trampoline and
> ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP
> [e.g. on x86] via text_poke_bp() which generates text poke events
> Trampoline-called function target updated
>
> 3. trampoline -> NULL
> By ftrace_trampoline_free() when ops->trampoline and
> ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP
> Trampoline freed

Yes, very nice. Thanks!

2020-04-27 07:24:50

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH V6 08/15] ftrace: Add perf text poke events for ftrace trampolines

On 22/04/20 2:46 pm, Peter Zijlstra wrote:
> On Wed, Apr 22, 2020 at 10:39:06AM +0300, Adrian Hunter wrote:
>> On 21/04/20 4:45 pm, Peter Zijlstra wrote:
>>> On Sun, Apr 05, 2020 at 11:13:20PM +0300, Adrian Hunter wrote:
>>>> Add perf text poke events for ftrace trampolines when created and when
>>>> freed.
>>>
>>> Maybe also put in a little more detail on the various events. Because
>>> arch_ftrace_update_trampoline() can also generate text_poke_bp() events,
>>> to update an existing trampoline.
>>>
>>> A diagram, like with the kprobes thing perhaps.
>>
>> How about adding this:
>>
>> There can be 3 text_poke events for ftrace trampolines:
>>
>> 1. NULL -> trampoline
>> By ftrace_update_trampoline() when !ops->trampoline
>> Trampoline created
>>
>> 2. [e.g. on x86] CALL rel32 -> CALL rel32
>> By arch_ftrace_update_trampoline() when ops->trampoline and
>> ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP
>> [e.g. on x86] via text_poke_bp() which generates text poke events
>> Trampoline-called function target updated
>>
>> 3. trampoline -> NULL
>> By ftrace_trampoline_free() when ops->trampoline and
>> ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP
>> Trampoline freed
>
> Yes, very nice. Thanks!

Arnaldo, do you have this patchset on your radar?

2020-04-27 18:36:28

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH V6 08/15] ftrace: Add perf text poke events for ftrace trampolines

On Mon, 27 Apr 2020 10:23:05 +0300
Adrian Hunter <[email protected]> wrote:

> On 22/04/20 2:46 pm, Peter Zijlstra wrote:
> > On Wed, Apr 22, 2020 at 10:39:06AM +0300, Adrian Hunter wrote:
> >> On 21/04/20 4:45 pm, Peter Zijlstra wrote:
> >>> On Sun, Apr 05, 2020 at 11:13:20PM +0300, Adrian Hunter wrote:
> >>>> Add perf text poke events for ftrace trampolines when created and when
> >>>> freed.
> >>>
> >>> Maybe also put in a little more detail on the various events. Because
> >>> arch_ftrace_update_trampoline() can also generate text_poke_bp() events,
> >>> to update an existing trampoline.
> >>>
> >>> A diagram, like with the kprobes thing perhaps.
> >>
> >> How about adding this:
> >>
> >> There can be 3 text_poke events for ftrace trampolines:
> >>
> >> 1. NULL -> trampoline
> >> By ftrace_update_trampoline() when !ops->trampoline
> >> Trampoline created
> >>
> >> 2. [e.g. on x86] CALL rel32 -> CALL rel32
> >> By arch_ftrace_update_trampoline() when ops->trampoline and
> >> ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP
> >> [e.g. on x86] via text_poke_bp() which generates text poke events
> >> Trampoline-called function target updated
> >>
> >> 3. trampoline -> NULL
> >> By ftrace_trampoline_free() when ops->trampoline and
> >> ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP
> >> Trampoline freed
> >
> > Yes, very nice. Thanks!
>
> Arnaldo, do you have this patchset on your radar?

Arnaldo deals with the userspace perf code. This looks like it needs to go
through the x86 tree.

-- Steve

2020-04-27 20:15:38

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH V6 08/15] ftrace: Add perf text poke events for ftrace trampolines

Em Mon, Apr 27, 2020 at 10:23:05AM +0300, Adrian Hunter escreveu:
> On 22/04/20 2:46 pm, Peter Zijlstra wrote:
> > On Wed, Apr 22, 2020 at 10:39:06AM +0300, Adrian Hunter wrote:
> >> On 21/04/20 4:45 pm, Peter Zijlstra wrote:
> >>> On Sun, Apr 05, 2020 at 11:13:20PM +0300, Adrian Hunter wrote:
> >>>> Add perf text poke events for ftrace trampolines when created and when
> >>>> freed.
> >>>
> >>> Maybe also put in a little more detail on the various events. Because
> >>> arch_ftrace_update_trampoline() can also generate text_poke_bp() events,
> >>> to update an existing trampoline.
> >>>
> >>> A diagram, like with the kprobes thing perhaps.
> >>
> >> How about adding this:
> >>
> >> There can be 3 text_poke events for ftrace trampolines:
> >>
> >> 1. NULL -> trampoline
> >> By ftrace_update_trampoline() when !ops->trampoline
> >> Trampoline created
> >>
> >> 2. [e.g. on x86] CALL rel32 -> CALL rel32
> >> By arch_ftrace_update_trampoline() when ops->trampoline and
> >> ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP
> >> [e.g. on x86] via text_poke_bp() which generates text poke events
> >> Trampoline-called function target updated
> >>
> >> 3. trampoline -> NULL
> >> By ftrace_trampoline_free() when ops->trampoline and
> >> ops->flags & FTRACE_OPS_FL_ALLOC_TRAMP
> >> Trampoline freed
> >
> > Yes, very nice. Thanks!
>
> Arnaldo, do you have this patchset on your radar?

Now it is, will look it into it,

- Arnaldo

2020-04-28 16:49:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V6 08/15] ftrace: Add perf text poke events for ftrace trampolines

On Mon, Apr 27, 2020 at 02:34:42PM -0400, Steven Rostedt wrote:
> Arnaldo deals with the userspace perf code. This looks like it needs to go
> through the x86 tree.

Sometimes, when a series crosses both, it'll go through either tree.

I'm fine with picking up the kernel bits I suppose, but then we need to
be careful to have them land in order or something. Whatever people
want.