2016-04-06 12:45:05

by Naveen N. Rao

Subject: [PATCH 0/2] perf probe fixes for ppc64le

This patchset fixes three issues found with perf probe on ppc64le:
1. 'perf test kallsyms' failure on ppc64le (reported by Michael
Ellerman). This was due to the symbols being fixed up during symbol
table load. This is fixed in patch 2 by delaying symbol fixup until
later.
2. perf probe function offset was being calculated from the local entry
point (LEP), which does not match user expectation when trying to look
at function disassembly output (reported by Ananth N). This is fixed for
kallsyms in patch 1 and for symbol table in patch 2.
3. perf probe failure with kretprobe when using kallsyms. This was
failing as we were specifying an offset. This is fixed in patch 1.

A few examples demonstrating the issues and the fix:

Example for issue (2):
--------------------
# objdump -d vmlinux | grep -A8 \<_do_fork\>:
c0000000000b6a00 <_do_fork>:
c0000000000b6a00: f7 00 4c 3c addis r2,r12,247
c0000000000b6a04: 00 86 42 38 addi r2,r2,-31232
c0000000000b6a08: a6 02 08 7c mflr r0
c0000000000b6a0c: d0 ff 41 fb std r26,-48(r1)
c0000000000b6a10: 26 80 90 7d mfocrf r12,8
c0000000000b6a14: d8 ff 61 fb std r27,-40(r1)
c0000000000b6a18: e0 ff 81 fb std r28,-32(r1)
c0000000000b6a1c: e8 ff a1 fb std r29,-24(r1)
# perf probe -v _do_fork+4
probe-definition(0): _do_fork+4
symbol:_do_fork file:(null) line:0 offset:4 return:0 lazy:(null)
0 arguments
Looking at the vmlinux_path (8 entries long)
Using /proc/kcore for kernel object code
Using /proc/kallsyms for symbols
Opening /sys/kernel/debug/tracing//kprobe_events write=1
Writing event: p:probe/_do_fork _text+748044
Added new event:
probe:_do_fork (on _do_fork+4)

You can now use it in all perf tools, such as:

perf record -e probe:_do_fork -aR sleep 1

# printf "%x\n" 748044
b6a0c
^^^^^
This is 4 bytes past the LEP (0xb6a08), not 4 bytes past the GEP
(0xb6a00). With this behavior, there is also no way to ever probe
between the GEP and the LEP.

With this patchset:
# perf probe -v _do_fork+4
probe-definition(0): _do_fork+4
symbol:_do_fork file:(null) line:0 offset:4 return:0 lazy:(null)
0 arguments
Looking at the vmlinux_path (8 entries long)
Using /proc/kcore for kernel object code
Using /proc/kallsyms for symbols
Opening /sys/kernel/debug/tracing//kprobe_events write=1
Writing event: p:probe/_do_fork _text+748036
Added new event:
probe:_do_fork (on _do_fork+4)

You can now use it in all perf tools, such as:

perf record -e probe:_do_fork -aR sleep 1

# perf probe -v _do_fork
probe-definition(0): _do_fork
symbol:_do_fork file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Looking at the vmlinux_path (8 entries long)
Using /proc/kcore for kernel object code
Using /proc/kallsyms for symbols
Opening /sys/kernel/debug/tracing//kprobe_events write=1
Writing event: p:probe/_do_fork _text+748040
Added new event:
probe:_do_fork (on _do_fork)

You can now use it in all perf tools, such as:

perf record -e probe:_do_fork -aR sleep 1

We only move the probe to the LEP if just the function name is
specified. If an offset is given, it is applied from the GEP.
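
As a rough illustration (not part of the patchset), the `_text+N` values
written to kprobe_events above can be decoded by hand. The sketch below
assumes that _text sits at 0xc000000000000000 on this kernel, which is
what the objdump addresses and the printf output suggest; the macro and
helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed address of _text for the example kernel in this mail. */
#define EXAMPLE_TEXT_BASE	0xc000000000000000ULL
/* GEP of _do_fork, taken from the objdump output above. */
#define EXAMPLE_DO_FORK_GEP	0xc0000000000b6a00ULL

/*
 * Hypothetical helper: convert the N in "_text+N", as written to
 * kprobe_events, into a byte offset from the _do_fork GEP.
 */
static inline uint64_t do_fork_offset_from_text(uint64_t text_offset)
{
	uint64_t addr = EXAMPLE_TEXT_BASE + text_offset;

	/* Only meaningful for addresses at or after the function start. */
	assert(addr >= EXAMPLE_DO_FORK_GEP);
	return addr - EXAMPLE_DO_FORK_GEP;
}
```

Under these assumptions, 748044 decodes to GEP+12 (that is, LEP+4, the old
behavior), 748036 decodes to GEP+4, and 748040 decodes to GEP+8 (the LEP).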

Example for issue (3):
---------------------
Before patch:
# perf probe -v _do_fork:%return
probe-definition(0): _do_fork:%return
symbol:_do_fork file:(null) line:0 offset:0 return:1 lazy:(null)
0 arguments
Looking at the vmlinux_path (8 entries long)
Using /proc/kcore for kernel object code
Using /proc/kallsyms for symbols
Opening /sys/kernel/debug/tracing//kprobe_events write=1
Writing event: r:probe/_do_fork _do_fork+8
Failed to write event: Invalid argument
Error: Failed to add events. Reason: Invalid argument (Code: -22)

After patch:
# perf probe _do_fork:%return
Added new event:
probe:_do_fork (on _do_fork%return)

You can now use it in all perf tools, such as:

perf record -e probe:_do_fork -aR sleep 1

Cc: Mark Wielaard <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>

Naveen N. Rao (2):
perf/powerpc: Fix kprobe and kretprobe handling with kallsyms
tools/perf: Fix kallsyms perf test on ppc64le

tools/perf/arch/powerpc/util/sym-handling.c | 41 ++++++++++++++++++++---------
tools/perf/util/probe-event.c | 5 ++--
tools/perf/util/probe-event.h | 3 ++-
tools/perf/util/symbol-elf.c | 7 ++---
tools/perf/util/symbol.h | 3 ++-
5 files changed, 40 insertions(+), 19 deletions(-)

--
2.7.4


2016-04-06 12:34:58

by Naveen N. Rao

Subject: [PATCH 1/2] perf/powerpc: Fix kprobe and kretprobe handling with kallsyms

So far, we have treated probe point offsets as being relative to the
LEP. However, userspace applications (objdump/readelf) always show
disassembly and offsets from the function GEP. This is confusing to the
user, as we end up probing at an address different from what the user
expects when looking at the function disassembly with readelf/objdump.
Fix this by changing how perf adjusts the probe address.

If only the function name is provided, we assume the user needs the LEP.
Otherwise, if an offset is specified, we assume that the user knows the
exact address to probe based on function disassembly, and so we just
place the probe from the GEP offset.

Finally, kretprobe was also broken with kallsyms as we were trying to
specify an offset. This patch also fixes that issue.
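
The rule described above can be sketched roughly as follows. This is a
simplified standalone illustration, not the code in the diff below: the
function name and its flattened arguments are hypothetical stand-ins for
pev->point.offset and pev->point.retprobe, and the fixed 8-byte LEP
offset is the same assumption the patch makes for kallsyms.

```c
#include <stdbool.h>
#include <stdint.h>

/* GEP-to-LEP distance assumed for kallsyms symbols (8 bytes). */
#define SKETCH_LEP_OFFSET 8

/*
 * Hypothetical helper: return the offset from the GEP at which the
 * probe would be placed for a given user request.
 */
static inline uint64_t sketch_probe_offset(uint64_t user_offset, bool retprobe)
{
	/*
	 * An explicit offset is taken as-is, relative to the GEP, and
	 * kretprobes must not get any extra offset added.
	 */
	if (user_offset || retprobe)
		return user_offset;

	/* Function name only: move the probe to the LEP. */
	return SKETCH_LEP_OFFSET;
}
```

With this sketch, a bare function name yields offset 8, `func+4` yields
offset 4, and a return probe on the bare name yields offset 0.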

Cc: Mark Wielaard <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Michael Ellerman <[email protected]>
Reported-by: Ananth N Mavinakayanahalli <[email protected]>
Signed-off-by: Naveen N. Rao <[email protected]>
---
tools/perf/arch/powerpc/util/sym-handling.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/sym-handling.c b/tools/perf/arch/powerpc/util/sym-handling.c
index bbc1a50..c5b4756 100644
--- a/tools/perf/arch/powerpc/util/sym-handling.c
+++ b/tools/perf/arch/powerpc/util/sym-handling.c
@@ -71,12 +71,21 @@ void arch__fix_tev_from_maps(struct perf_probe_event *pev,
struct probe_trace_event *tev, struct map *map)
{
/*
- * ppc64 ABIv2 local entry point is currently always 2 instructions
- * (8 bytes) after the global entry point.
+ * When probing at a function entry point, we normally always want the
+ * LEP since that catches calls to the function through both the GEP and
+ * the LEP. Hence, we would like to probe at an offset of 8 bytes if
+ * the user only specified the function entry.
+ *
+ * However, if the user specifies an offset, we fall back to using the
+ * GEP since all userspace applications (objdump/readelf) show function
+ * disassembly with offsets from the GEP.
+ *
+ * In addition, we shouldn't specify an offset for kretprobes.
*/
- if (!pev->uprobes && map->dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS) {
- tev->point.address += PPC64LE_LEP_OFFSET;
+ if (pev->point.offset || pev->point.retprobe)
+ return;
+
+ if (!pev->uprobes && map->dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS)
tev->point.offset += PPC64LE_LEP_OFFSET;
- }
}
#endif
--
2.7.4

2016-04-06 12:34:56

by Naveen N. Rao

Subject: [PATCH 2/2] tools/perf: Fix kallsyms perf test on ppc64le

ppc64le functions have a Global Entry Point (GEP) and a Local Entry
Point (LEP). While placing a probe, we always prefer the LEP since it
catches function calls through both the GEP and the LEP. In order to do
this, we fixup the function entry points during elf symbol table lookup
to point to the LEPs. This works, but breaks 'perf test kallsyms' since
the symbols loaded from the symbol table (pointing to the LEP) do not
match the symbols in kallsyms.

To fix this, we do not adjust all the symbols during symbol table load,
but only adjust the probe trace point.
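
For background, ABIv2 encodes the GEP-to-LEP distance of each function
in the top three bits of the ELF symbol's st_other field, which is why
st_other needs to be kept around until the probe trace point is fixed
up. The sketch below re-implements, under SKETCH_ names local to this
example, the decoding that PPC64_LOCAL_ENTRY_OFFSET() from elf.h is
understood to perform; treat the bit layout as an assumption and check
it against your elf.h.

```c
#include <stdint.h>

/* Assumed position of the local-entry field within st_other. */
#define SKETCH_STO_LOCAL_BIT	5
#define SKETCH_STO_LOCAL_MASK	(7u << SKETCH_STO_LOCAL_BIT)

/*
 * Hypothetical helper: decode the LEP offset (in bytes from the GEP)
 * out of an ELF symbol's st_other value.
 */
static inline unsigned int sketch_local_entry_offset(uint8_t st_other)
{
	unsigned int field =
		(st_other & SKETCH_STO_LOCAL_MASK) >> SKETCH_STO_LOCAL_BIT;

	/*
	 * Field values 0 and 1 decode to 0 (LEP == GEP); values 2..6
	 * decode to 4, 8, 16, 32 and 64 bytes respectively.
	 */
	return ((1u << field) >> 2) << 2;
}
```

A function with the usual two-instruction global entry sequence would
carry a field value of 3 (st_other 0x60 in this sketch), which decodes
to 8 bytes.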

Cc: Mark Wielaard <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Reported-by: Michael Ellerman <[email protected]>
Signed-off-by: Naveen N. Rao <[email protected]>
---
tools/perf/arch/powerpc/util/sym-handling.c | 24 ++++++++++++++++--------
tools/perf/util/probe-event.c | 5 +++--
tools/perf/util/probe-event.h | 3 ++-
tools/perf/util/symbol-elf.c | 7 ++++---
tools/perf/util/symbol.h | 3 ++-
5 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/sym-handling.c b/tools/perf/arch/powerpc/util/sym-handling.c
index c5b4756..2f72aec 100644
--- a/tools/perf/arch/powerpc/util/sym-handling.c
+++ b/tools/perf/arch/powerpc/util/sym-handling.c
@@ -19,12 +19,6 @@ bool elf__needs_adjust_symbols(GElf_Ehdr ehdr)
ehdr.e_type == ET_DYN;
}

-#if defined(_CALL_ELF) && _CALL_ELF == 2
-void arch__elf_sym_adjust(GElf_Sym *sym)
-{
- sym->st_value += PPC64_LOCAL_ENTRY_OFFSET(sym->st_other);
-}
-#endif
#endif

#if !defined(_CALL_ELF) || _CALL_ELF != 2
@@ -65,11 +59,21 @@ bool arch__prefers_symtab(void)
return true;
}

+#ifdef HAVE_LIBELF_SUPPORT
+void arch__sym_update(struct symbol *s, GElf_Sym *sym)
+{
+ s->arch_sym = sym->st_other;
+}
+#endif
+
#define PPC64LE_LEP_OFFSET 8

void arch__fix_tev_from_maps(struct perf_probe_event *pev,
- struct probe_trace_event *tev, struct map *map)
+ struct probe_trace_event *tev, struct map *map,
+ struct symbol *sym)
{
+ int lep_offset;
+
/*
* When probing at a function entry point, we normally always want the
* LEP since that catches calls to the function through both the GEP and
@@ -82,10 +86,14 @@ void arch__fix_tev_from_maps(struct perf_probe_event *pev,
*
* In addition, we shouldn't specify an offset for kretprobes.
*/
- if (pev->point.offset || pev->point.retprobe)
+ if (pev->point.offset || pev->point.retprobe || !map || !sym)
return;

+ lep_offset = PPC64_LOCAL_ENTRY_OFFSET(sym->arch_sym);
+
if (!pev->uprobes && map->dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS)
tev->point.offset += PPC64LE_LEP_OFFSET;
+ else if (lep_offset)
+ tev->point.offset += lep_offset;
}
#endif
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 8319fbb..d786a49 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -2498,7 +2498,8 @@ static int find_probe_functions(struct map *map, char *name,

void __weak arch__fix_tev_from_maps(struct perf_probe_event *pev __maybe_unused,
struct probe_trace_event *tev __maybe_unused,
- struct map *map __maybe_unused) { }
+ struct map *map __maybe_unused,
+ struct symbol *sym __maybe_unused) { }

/*
* Find probe function addresses from map.
@@ -2624,7 +2625,7 @@ static int find_probe_trace_events_from_map(struct perf_probe_event *pev,
strdup_or_goto(pev->args[i].type,
nomem_out);
}
- arch__fix_tev_from_maps(pev, tev, map);
+ arch__fix_tev_from_maps(pev, tev, map, sym);
}
if (ret == skipped) {
ret = -ENOENT;
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index e54e7b0..9bbc0c1 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -154,7 +154,8 @@ int show_available_vars(struct perf_probe_event *pevs, int npevs,
int show_available_funcs(const char *module, struct strfilter *filter, bool user);
bool arch__prefers_symtab(void);
void arch__fix_tev_from_maps(struct perf_probe_event *pev,
- struct probe_trace_event *tev, struct map *map);
+ struct probe_trace_event *tev, struct map *map,
+ struct symbol *sym);

/* If there is no space to write, returns -E2BIG. */
int e_snprintf(char *str, size_t size, const char *format, ...)
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index bc229a7..e6c032e 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -777,7 +777,8 @@ static bool want_demangle(bool is_kernel_sym)
return is_kernel_sym ? symbol_conf.demangle_kernel : symbol_conf.demangle;
}

-void __weak arch__elf_sym_adjust(GElf_Sym *sym __maybe_unused) { }
+void __weak arch__sym_update(struct symbol *s __maybe_unused,
+ GElf_Sym *sym __maybe_unused) { }

int dso__load_sym(struct dso *dso, struct map *map,
struct symsrc *syms_ss, struct symsrc *runtime_ss,
@@ -954,8 +955,6 @@ int dso__load_sym(struct dso *dso, struct map *map,
(sym.st_value & 1))
--sym.st_value;

- arch__elf_sym_adjust(&sym);
-
if (dso->kernel || kmodule) {
char dso_name[PATH_MAX];

@@ -1089,6 +1088,8 @@ new_symbol:
if (!f)
goto out_elf_end;

+ arch__sym_update(f, &sym);
+
if (filter && filter(curr_map, f))
symbol__delete(f);
else {
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index c8b7544..f0e62e8 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -55,6 +55,7 @@ struct symbol {
u16 namelen;
u8 binding;
bool ignore;
+ u8 arch_sym;
char name[0];
};

@@ -310,7 +311,7 @@ int setup_intlist(struct intlist **list, const char *list_str,

#ifdef HAVE_LIBELF_SUPPORT
bool elf__needs_adjust_symbols(GElf_Ehdr ehdr);
-void arch__elf_sym_adjust(GElf_Sym *sym);
+void arch__sym_update(struct symbol *s, GElf_Sym *sym);
#endif

#define SYMBOL_A 0
--
2.7.4

Subject: Re: [PATCH 2/2] tools/perf: Fix kallsyms perf test on ppc64le

On Wed, Apr 06, 2016 at 06:02:58PM +0530, Naveen N. Rao wrote:
> ppc64le functions have a Global Entry Point (GEP) and a Local Entry
> Point (LEP). While placing a probe, we always prefer the LEP since it
> catches function calls through both the GEP and the LEP. In order to do
> this, we fixup the function entry points during elf symbol table lookup
> to point to the LEPs. This works, but breaks 'perf test kallsyms' since
> the symbols loaded from the symbol table (pointing to the LEP) do not
> match the symbols in kallsyms.
>
> To fix this, we do not adjust all the symbols during symbol table load,
> but only adjust the probe trace point.
>
> Cc: Mark Wielaard <[email protected]>
> Cc: Thiago Jung Bauermann <[email protected]>
> Cc: Ananth N Mavinakayanahalli <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Reported-by: Michael Ellerman <[email protected]>
> Signed-off-by: Naveen N. Rao <[email protected]>

Acked-by: Ananth N Mavinakayanahalli <[email protected]>

Subject: Re: [PATCH 1/2] perf/powerpc: Fix kprobe and kretprobe handling with kallsyms

On Wed, Apr 06, 2016 at 06:02:57PM +0530, Naveen N. Rao wrote:

> + if (!pev->uprobes && map->dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS)
> tev->point.offset += PPC64LE_LEP_OFFSET;

uprobes check against kallsyms? Am I missing something here?

Ananth

2016-04-07 06:47:06

by Naveen N. Rao

Subject: Re: [PATCH 1/2] perf/powerpc: Fix kprobe and kretprobe handling with kallsyms

On 2016/04/07 10:00AM, Ananth N wrote:
> On Wed, Apr 06, 2016 at 06:02:57PM +0530, Naveen N. Rao wrote:
>
> > + if (!pev->uprobes && map->dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS)
> > tev->point.offset += PPC64LE_LEP_OFFSET;
>
> uprobes check against kallsyms? Am I missing something here?

Ah yes. That check shouldn't be necessary since symtab_type would be
different anyway. I will remove that check.

Thanks for the review!
- Naveen

2016-04-07 08:19:28

by Balbir Singh

Subject: Re: [PATCH 0/2] perf probe fixes for ppc64le


On 06/04/16 22:32, Naveen N. Rao wrote:
> This patchset fixes three issues found with perf probe on ppc64le:
> 1. 'perf test kallsyms' failure on ppc64le (reported by Michael
> Ellerman). This was due to the symbols being fixed up during symbol
> table load. This is fixed in patch 2 by delaying symbol fixup until
> later.
> 2. perf probe function offset was being calculated from the local entry
> point (LEP), which does not match user expectation when trying to look
> at function disassembly output (reported by Ananth N). This is fixed for
> kallsyms in patch 1 and for symbol table in patch 2.

I think the bit where the offset is w.r.t LEP when using a name, but w.r.t
GEP when using function+offset can be confusing. Do we really need probe
points between GEP and LEP? All the GEP does is setup r2. The use case
could be more generic, but please clarify.

> 3. perf probe failure with kretprobe when using kallsyms. This was
> failing as we were specifying an offset. This is fixed in patch 1.
>

Balbir Singh.

2016-04-07 09:29:07

by Naveen N. Rao

Subject: Re: [PATCH 0/2] perf probe fixes for ppc64le

On 2016/04/07 06:19PM, Balbir Singh wrote:
>
> On 06/04/16 22:32, Naveen N. Rao wrote:
> > This patchset fixes three issues found with perf probe on ppc64le:
> > 1. 'perf test kallsyms' failure on ppc64le (reported by Michael
> > Ellerman). This was due to the symbols being fixed up during symbol
> > table load. This is fixed in patch 2 by delaying symbol fixup until
> > later.
> > 2. perf probe function offset was being calculated from the local entry
> > point (LEP), which does not match user expectation when trying to look
> > at function disassembly output (reported by Ananth N). This is fixed for
> > kallsyms in patch 1 and for symbol table in patch 2.
>
> I think the bit where the offset is w.r.t LEP when using a name, but w.r.t
> GEP when using function+offset can be confusing.

Thanks for your review!

The rationale for this is actually from the end-user perspective. The
two use cases we are considering are:
1. User just wants to probe at function entry point:
# perf probe _do_fork

In this case, the user most definitely needs the local entry point,
without which the probe won't be hit. So, for this case, we
automatically insert the probe at the LEP.

[We really want to alter perf probe behavior in this case only, but we
were incorrectly changing the behavior of perf for the scenario below
as well.]

2. User wants to probe at a specific location. In this case, the user
most likely starts by looking at the function disassembly. For instance:
# objdump -S -d vmlinux.bak | grep -A100 \<_do_fork\>:
c0000000000b6a00 <_do_fork>:
unsigned long stack_start,
unsigned long stack_size,
int __user *parent_tidptr,
int __user *child_tidptr,
unsigned long tls)
{
c0000000000b6a00: f7 00 4c 3c addis r2,r12,247
c0000000000b6a04: 00 86 42 38 addi r2,r2,-31232
c0000000000b6a08: a6 02 08 7c mflr r0
c0000000000b6a0c: d0 ff 41 fb std r26,-48(r1)
c0000000000b6a10: 26 80 90 7d mfocrf r12,8
...<snip>...
if (!(clone_flags & CLONE_UNTRACED)) {
c0000000000b6a54: e3 4f c7 7b rldicl. r7,r30,41,63
c0000000000b6a58: 2c 00 82 40 bne c0000000000b6a84 <_do_fork+0x84>
if (clone_flags & CLONE_VFORK)
c0000000000b6a5c: e3 97 c8 7b rldicl. r8,r30,50,63
c0000000000b6a60: a0 01 82 41 beq c0000000000b6c00 <_do_fork+0x200>
c0000000000b6a64: 20 00 20 39 li r9,32
trace = PTRACE_EVENT_VFORK;
c0000000000b6a68: 02 00 80 3b li r28,2
c0000000000b6a6c: 10 02 4d e9 ld r10,528(r13)

If the user wants to probe at _do_fork+0x54, he'd do:
# perf probe _do_fork+0x54

With the earlier approach, we would insert the probe at _do_fork+0x5c
(0x54 from the LEP) instead, which is incorrect.

In reality, user would probably just use debuginfo:
# perf probe -L _do_fork
<_do_fork@/root/linus/kernel/fork.c:0>
0 long _do_fork(unsigned long clone_flags,
unsigned long stack_start,
unsigned long stack_size,
int __user *parent_tidptr,
int __user *child_tidptr,
unsigned long tls)
6 {
struct task_struct *p;
8 int trace = 0;
long nr;

/*
* Determine whether and which event to report to ptracer. When
* called from kernel_thread or CLONE_UNTRACED is explicitly
* requested, no event is reported; otherwise, report if the event
* for the type of forking is enabled.
*/
17 if (!(clone_flags & CLONE_UNTRACED)) {
18 if (clone_flags & CLONE_VFORK)
19 trace = PTRACE_EVENT_VFORK;
20 else if ((clone_flags & CSIGNAL) != SIGCHLD)
21 trace = PTRACE_EVENT_CLONE;

# perf probe _do_fork:17

In this case, perf chooses the right address based on DWARF. The current
patchset matches the behavior of perf without debuginfo with this.

> Do we really need probe
> points between GEP and LEP? All the GEP does is setup r2. The use case
> could be more generic, but please clarify.

There could be scenarios where having a probe point between GEP and LEP
is useful - for instance, if we are only interested in calls to an
in-kernel function from an external module. However, this is a secondary
consideration and the more important consideration was to be consistent
with userspace tooling (readelf/objdump) while choosing the address to
probe.

- Naveen

2016-04-08 06:57:54

by Balbir Singh

Subject: Re: [PATCH 0/2] perf probe fixes for ppc64le

On Thu, 2016-04-07 at 14:56 +0530, Naveen N. Rao wrote:
> On 2016/04/07 06:19PM, Balbir Singh wrote:
> > 
> > 
> > On 06/04/16 22:32, Naveen N. Rao wrote:
> > > 
> > > This patchset fixes three issues found with perf probe on ppc64le:
> > > 1. 'perf test kallsyms' failure on ppc64le (reported by Michael
> > > Ellerman). This was due to the symbols being fixed up during symbol
> > > table load. This is fixed in patch 2 by delaying symbol fixup until
> > > later.
> > > 2. perf probe function offset was being calculated from the local entry
> > > point (LEP), which does not match user expectation when trying to look
> > > at function disassembly output (reported by Ananth N). This is fixed for
> > > kallsyms in patch 1 and for symbol table in patch 2.
> > I think the bit where the offset is w.r.t LEP when using a name, but w.r.t
> > GEP when using function+offset can be confusing.
> Thanks for your review!

> The rationale for this is actually from the end-user perspective. The 
> two use cases we are considering are:
> 1. User just wants to probe at function entry point:
>  # perf probe _do_fork

> In this case, the user most definitely needs the local entry point, 
> without which the probe won't be hit. So, for this case, we 
> automatically insert the probe at the LEP.

> [We really only want to alter perf probe behavior in this case only, but 
> we were incorrectly changing the behavior of perf with the below 
> scenario as well.]

> 2. User wants to probe at a specific location. In this case, the user 
> most likely starts by looking at the function disassembly. For instance:
>  # objdump -S -d vmlinux.bak | grep -A100 \<_do_fork\>:
>  c0000000000b6a00 <_do_fork>:
>        unsigned long stack_start,
>        unsigned long stack_size,
>        int __user *parent_tidptr,
>        int __user *child_tidptr,
>        unsigned long tls)
>  {
>  c0000000000b6a00: f7 00 4c 3c  addis   r2,r12,247
>  c0000000000b6a04: 00 86 42 38  addi    r2,r2,-31232
>  c0000000000b6a08: a6 02 08 7c  mflr    r0
>  c0000000000b6a0c: d0 ff 41 fb  std     r26,-48(r1)
>  c0000000000b6a10: 26 80 90 7d  mfocrf  r12,8
>  ...<snip>...
>  if (!(clone_flags & CLONE_UNTRACED)) {
>  c0000000000b6a54: e3 4f c7 7b  rldicl. r7,r30,41,63
>  c0000000000b6a58: 2c 00 82 40  bne     c0000000000b6a84 <_do_fork+0x84>
>  if (clone_flags & CLONE_VFORK)
>  c0000000000b6a5c: e3 97 c8 7b  rldicl. r8,r30,50,63
>  c0000000000b6a60: a0 01 82 41  beq     c0000000000b6c00 <_do_fork+0x200>
>  c0000000000b6a64: 20 00 20 39  li      r9,32
>  trace = PTRACE_EVENT_VFORK;
>  c0000000000b6a68: 02 00 80 3b  li      r28,2
>  c0000000000b6a6c: 10 02 4d e9  ld      r10,528(r13)

> If the user wants to probe at _do_fork+0x54, he'd do:
>  # perf probe _do_fork+0x54

> With the earlier approach, we would insert the probe at _do_fork+0x5c 
> (0x54 from the LEP) instead, which is incorrect.

> In reality, user would probably just use debuginfo:
>  # perf probe -L _do_fork
>  <_do_fork@/root/linus/kernel/fork.c:0>
>        0  long _do_fork(unsigned long clone_flags,
>        unsigned long stack_start,
>        unsigned long stack_size,
>        int __user *parent_tidptr,
>        int __user *child_tidptr,
>        unsigned long tls)
>        6  {
>  struct task_struct *p;
>        8         int trace = 0;
>  long nr;
>   
>  /*
>   * Determine whether and which event to report to ptracer.  When
>   * called from kernel_thread or CLONE_UNTRACED is explicitly
>   * requested, no event is reported; otherwise, report if the event
>   * for the type of forking is enabled.
>   */
>       17         if (!(clone_flags & CLONE_UNTRACED)) {
>       18                 if (clone_flags & CLONE_VFORK)
>       19                         trace = PTRACE_EVENT_VFORK;
>       20                 else if ((clone_flags & CSIGNAL) != SIGCHLD)
>       21                         trace = PTRACE_EVENT_CLONE;

>  # perf probe _do_fork:17

> In this case, perf chooses the right address based on DWARF. The current 
> patchset matches the behavior of perf without debuginfo with this.


I agree. What worries me is that perf probe _do_fork sets a breakpoint
after the one set by perf probe _do_fork+0x4. I am not sure if there is
an easy solution to the problem.

Balbir


2016-04-09 13:44:25

by Naveen N. Rao

Subject: Re: [PATCH 0/2] perf probe fixes for ppc64le

On 2016/04/08 04:57PM, Balbir Singh wrote:
> On Thu, 2016-04-07 at 14:56 +0530, Naveen N. Rao wrote:
> > On 2016/04/07 06:19PM, Balbir Singh wrote:
> > >
> > >
> > > On 06/04/16 22:32, Naveen N. Rao wrote:
> > > >
> > > > This patchset fixes three issues found with perf probe on ppc64le:
> > > > 1. 'perf test kallsyms' failure on ppc64le (reported by Michael
> > > > Ellerman). This was due to the symbols being fixed up during symbol
> > > > table load. This is fixed in patch 2 by delaying symbol fixup until
> > > > later.
> > > > 2. perf probe function offset was being calculated from the local entry
> > > > point (LEP), which does not match user expectation when trying to look
> > > > at function disassembly output (reported by Ananth N). This is fixed for
> > > > kallsyms in patch 1 and for symbol table in patch 2.
> > > I think the bit where the offset is w.r.t LEP when using a name, but w.r.t
> > > GEP when using function+offset can be confusing.
> > Thanks for your review!
> >
> > The rationale for this is actually from the end-user perspective. The
> > two use cases we are considering are:
> > 1. User just wants to probe at function entry point:
> >  # perf probe _do_fork
> >
> > In this case, the user most definitely needs the local entry point,
> > without which the probe won't be hit. So, for this case, we
> > automatically insert the probe at the LEP.
> >
> > [We really only want to alter perf probe behavior in this case only, but
> > we were incorrectly changing the behavior of perf with the below
> > scenario as well.]
> >
> > 2. User wants to probe at a specific location. In this case, the user
> > most likely starts by looking at the function disassembly. For instance:
> >  # objdump -S -d vmlinux.bak | grep -A100 \<_do_fork\>:
> >  c0000000000b6a00 <_do_fork>:
> >        unsigned long stack_start,
> >        unsigned long stack_size,
> >        int __user *parent_tidptr,
> >        int __user *child_tidptr,
> >        unsigned long tls)
> >  {
> >  c0000000000b6a00: f7 00 4c 3c  addis   r2,r12,247
> >  c0000000000b6a04: 00 86 42 38  addi    r2,r2,-31232
> >  c0000000000b6a08: a6 02 08 7c  mflr    r0
> >  c0000000000b6a0c: d0 ff 41 fb  std     r26,-48(r1)
> >  c0000000000b6a10: 26 80 90 7d  mfocrf  r12,8
> >  ...<snip>...
> >  if (!(clone_flags & CLONE_UNTRACED)) {
> >  c0000000000b6a54: e3 4f c7 7b  rldicl. r7,r30,41,63
> >  c0000000000b6a58: 2c 00 82 40  bne     c0000000000b6a84 <_do_fork+0x84>
> >  if (clone_flags & CLONE_VFORK)
> >  c0000000000b6a5c: e3 97 c8 7b  rldicl. r8,r30,50,63
> >  c0000000000b6a60: a0 01 82 41  beq     c0000000000b6c00 <_do_fork+0x200>
> >  c0000000000b6a64: 20 00 20 39  li      r9,32
> >  trace = PTRACE_EVENT_VFORK;
> >  c0000000000b6a68: 02 00 80 3b  li      r28,2
> >  c0000000000b6a6c: 10 02 4d e9  ld      r10,528(r13)
> >
> > If the user wants to probe at _do_fork+0x54, he'd do:
> >  # perf probe _do_fork+0x54
> >
> > With the earlier approach, we would insert the probe at _do_fork+0x5c
> > (0x54 from the LEP) instead, which is incorrect.
> >
> > In reality, user would probably just use debuginfo:
> >  # perf probe -L _do_fork
> >  <_do_fork@/root/linus/kernel/fork.c:0>
> >        0  long _do_fork(unsigned long clone_flags,
> >        unsigned long stack_start,
> >        unsigned long stack_size,
> >        int __user *parent_tidptr,
> >        int __user *child_tidptr,
> >        unsigned long tls)
> >        6  {
> >  struct task_struct *p;
> >        8         int trace = 0;
> >  long nr;
> >
> >  /*
> >   * Determine whether and which event to report to ptracer.  When
> >   * called from kernel_thread or CLONE_UNTRACED is explicitly
> >   * requested, no event is reported; otherwise, report if the event
> >   * for the type of forking is enabled.
> >   */
> >       17         if (!(clone_flags & CLONE_UNTRACED)) {
> >       18                 if (clone_flags & CLONE_VFORK)
> >       19                         trace = PTRACE_EVENT_VFORK;
> >       20                 else if ((clone_flags & CSIGNAL) != SIGCHLD)
> >       21                         trace = PTRACE_EVENT_CLONE;
> >
> >  # perf probe _do_fork:17
> >
> > In this case, perf chooses the right address based on DWARF. The current
> > patchset matches the behavior of perf without debuginfo with this.
>
>
> I agree what I worry is that perf probe _do_fork sets a breakpoint after
> perf probe _do_fork+0x4. I am not sure if there is an easy solution to
> the problem.

I suppose this boils down to the quirkiness of ABIv2. Though, in
reality, I don't think most users will notice. As I stated above, users
will most likely start with the disassembly or debuginfo and this patch
ensures there are actually no surprises there.

- Naveen

2016-04-11 04:41:52

by Michael Ellerman

Subject: Re: [PATCH 0/2] perf probe fixes for ppc64le

On Sat, 2016-04-09 at 19:12 +0530, Naveen N. Rao wrote:
>
> I suppose this boils down to the quirkiness of ABIv2. Though, in
> reality, I don't think most users will notice. As I stated above, users
> will most likely start with the disassembly or debuginfo and this patch
> ensures there are actually no surprises there.

Yeah it's unfortunate that we have to handle these two cases differently.

But I think you've chosen the right trade off.

When we are just given the name we *must not* use the global entry point,
otherwise the probes will often not hit - because most calls go to the local
entry point and skip the global entry point entirely.

When we're given a name and offset, it's less confusing if we use the global
entry point as the base for the offset calculation.

So for the concept:

Acked-by: Michael Ellerman <[email protected]>

I don't really know this part of the perf code enough to give you an ack for the
actual changes, I'll leave that to the perf maintainers.

cheers

2016-04-11 13:45:24

by Naveen N. Rao

Subject: Re: [PATCH 0/2] perf probe fixes for ppc64le

On 2016/04/11 02:41PM, Michael Ellerman wrote:
> On Sat, 2016-04-09 at 19:12 +0530, Naveen N. Rao wrote:
> >
> > I suppose this boils down to the quirkiness of ABIv2. Though, in
> > reality, I don't think most users will notice. As I stated above, users
> > will most likely start with the disassembly or debuginfo and this patch
> > ensures there are actually no surprises there.
>
> Yeah it's unfortunate that we have to handle these two cases differently.
>
> But I think you've chosen the right trade off.
>
> When we are just given the name we *must not* use the global entry point,
> otherwise the probes will often not hit - because most calls go to the local
> entry point and skip the global entry point entirely.
>
> When we're given a name and offset, it's less confusing if we use the global
> entry point as the base for the offset calculation.
>
> So for the concept:
>
> Acked-by: Michael Ellerman <[email protected]>

Thanks, Michael. That helps.

>
> I don't really know this part of the perf code enough to give you an ack for the
> actual changes, I'll leave that to the perf maintainers.

Sure.

Arnaldo,
I will send a v2 soon with a bit more testing to make sure this covers
all scenarios properly (I am also trying to see if we can address
debuginfo-based probing properly).

- Naveen