2014-04-22 13:42:09

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 1/3] perf tests: Introduce perf_regs_load function on ARM64

Hi Jean,

Apologies for the delay on this.

On Tue, Mar 25, 2014 at 03:23:26PM +0000, Jean Pihet wrote:
> Hi Mark,
>
> On 21 March 2014 16:11, Mark Rutland <[email protected]> wrote:
> > Hi Jean,
> >
> > On Wed, Mar 19, 2014 at 09:42:33AM +0000, Jean Pihet wrote:
> >> Introducing perf_regs_load function, which is going
> >> to be used for dwarf unwind test in following patches.
> >>
> >> It takes single argument as a pointer to the regs dump
> >> buffer and populates it with current registers values, as
> >> expected by the perf built-in unwinding test.
> >>
> >> Signed-off-by: Jean Pihet <[email protected]>
> >> Cc: Steve Capper <[email protected]>
> >> Cc: Corey Ashford <[email protected]>
> >> Cc: Frederic Weisbecker <[email protected]>
> >> Cc: Ingo Molnar <[email protected]>
> >> Cc: Namhyung Kim <[email protected]>
> >> Cc: Paul Mackerras <[email protected]>
> >> Cc: Peter Zijlstra <[email protected]>
> >> Cc: Arnaldo Carvalho de Melo <[email protected]>
> >> Cc: David Ahern <[email protected]>
> >> Cc: Jiri Olsa <[email protected]>
> >> ---
> >> tools/perf/arch/arm64/Makefile | 1 +
> >> tools/perf/arch/arm64/include/perf_regs.h | 2 ++
> >> tools/perf/arch/arm64/tests/regs_load.S | 39 +++++++++++++++++++++++++++++++
> >> 3 files changed, 42 insertions(+)
> >> create mode 100644 tools/perf/arch/arm64/tests/regs_load.S
> >>
> >> diff --git a/tools/perf/arch/arm64/Makefile b/tools/perf/arch/arm64/Makefile
> >> index 67e9b3d..9b8f87e 100644
> >> --- a/tools/perf/arch/arm64/Makefile
> >> +++ b/tools/perf/arch/arm64/Makefile
> >> @@ -4,4 +4,5 @@ LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/dwarf-regs.o
> >> endif
> >> ifndef NO_LIBUNWIND
> >> LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/unwind-libunwind.o
> >> +LIB_OBJS += $(OUTPUT)arch/$(ARCH)/tests/regs_load.o
> >> endif
> >> diff --git a/tools/perf/arch/arm64/include/perf_regs.h b/tools/perf/arch/arm64/include/perf_regs.h
> >> index 2359546..1e052f1 100644
> >> --- a/tools/perf/arch/arm64/include/perf_regs.h
> >> +++ b/tools/perf/arch/arm64/include/perf_regs.h
> >> @@ -9,6 +9,8 @@
> >> #define PERF_REG_IP PERF_REG_ARM64_PC
> >> #define PERF_REG_SP PERF_REG_ARM64_SP
> >>
> >> +void perf_regs_load(u64 *regs);
> >> +
> >> static inline const char *perf_reg_name(int id)
> >> {
> >> switch (id) {
> >> diff --git a/tools/perf/arch/arm64/tests/regs_load.S b/tools/perf/arch/arm64/tests/regs_load.S
> >> new file mode 100644
> >> index 0000000..92ab968
> >> --- /dev/null
> >> +++ b/tools/perf/arch/arm64/tests/regs_load.S
> >> @@ -0,0 +1,39 @@
> >> +#include <linux/linkage.h>
> >> +
> >> +/*
> >> + * Implementation of void perf_regs_load(u64 *regs);
> >> + *
> >> + * This functions fills in the 'regs' buffer from the actual registers values,
> >> + * in the way the perf built-in unwinding test expects them:
> >> + * - the PC at the time at the call to this function. Since this function
> >> + * is called using a bl instruction, the PC value is taken from LR,
> >
> > Is it guaranteed that this function is always invoked with a branch with
> > link instruction, or is that just what current compiler versions are
> > doing? I couldn't see where we would get that guarantee from.
> The current compiler implements the call as a bl instruction.

While I don't think we can rely on the compiler using a bl to call the
function it shouldn't matter here if we only care about the LR value
being an address within the caller, as it doesn't look amenable to tail
call optimization.

> > If it is called with a branch with link, then the LR value will be the
> > PC at call time + 4, rather than just the exact PC at call time. If not
> > then we don't have a guaranteed relationship between the PC at call time
> > and the current LR value.
> >
> > If the only place that perf_regs_load is used is a single test which
> > doesn't care about the precise PC at the time of the call, then it's
> > probably OK to use the LR value, but we should be careful to document
> > what the faked-up PC actually is and how we expect it to be used.
> The code is only used by an unwinding test. The unwinding code
> resolves the function name from an address range found in the dwarf
> information so in principle it is ok to use the PC/LR at the time of
> the call to a function.
>
> Is the comment above OK or do you want an update of the code as well?

If we just need an (arbitrary) address within the caller, a comment
update should be fine.

> >> + * - the current SP (not touched by this function),
> >> + * - the current value of LR is merely retrieved and stored because the
> >> + * value before the call to this function is unknown at this time; it will
> >> + * be unwound from the dwarf information in unwind__get_entries.
> >> + */
> >> +
> >> +.text
> >> +.type perf_regs_load,%function
> >> +ENTRY(perf_regs_load)
> >> + stp x0, x1, [x0], #16 // store x0..x29
> >> + stp x2, x3, [x0], #16
> >> + stp x4, x5, [x0], #16
> >> + stp x6, x7, [x0], #16
> >> + stp x8, x9, [x0], #16
> >> + stp x10, x11, [x0], #16
> >> + stp x12, x13, [x0], #16
> >> + stp x14, x15, [x0], #16
> >> + stp x16, x17, [x0], #16
> >> + stp x18, x19, [x0], #16
> >> + stp x20, x21, [x0], #16
> >> + stp x22, x23, [x0], #16
> >> + stp x24, x25, [x0], #16
> >> + stp x26, x27, [x0], #16
> >> + stp x28, x29, [x0], #16
> >> + mov x1, sp
> >> + stp x30, x1, [x0], #16 // store lr and sp
> >> + str x30, [x0] // store pc as lr in order to skip the call
> >> + // to this function
> >
> > It might be better to word this a "store the lr in place of the pc". To
> > me at least the current wording implies the opposite of what the code
> > seems to be doing.
> Ok the last comment can be updated.

Ok, cheers.

With those changes I think this looks fine.

Thanks,
Mark.


2014-04-28 13:10:51

by Jean Pihet

[permalink] [raw]
Subject: Re: [PATCH 1/3] perf tests: Introduce perf_regs_load function on ARM64

On 22 April 2014 15:42, Mark Rutland <[email protected]> wrote:
> Hi Jean,
>
> Apologies for the delay on this.
>
> On Tue, Mar 25, 2014 at 03:23:26PM +0000, Jean Pihet wrote:
>> Hi Mark,
>>
>> On 21 March 2014 16:11, Mark Rutland <[email protected]> wrote:
>> > Hi Jean,
>> >
>> > On Wed, Mar 19, 2014 at 09:42:33AM +0000, Jean Pihet wrote:
>> >> Introducing perf_regs_load function, which is going
>> >> to be used for dwarf unwind test in following patches.
>> >>
>> >> It takes single argument as a pointer to the regs dump
>> >> buffer and populates it with current registers values, as
>> >> expected by the perf built-in unwinding test.
>> >>
>> >> Signed-off-by: Jean Pihet <[email protected]>
>> >> Cc: Steve Capper <[email protected]>
>> >> Cc: Corey Ashford <[email protected]>
>> >> Cc: Frederic Weisbecker <[email protected]>
>> >> Cc: Ingo Molnar <[email protected]>
>> >> Cc: Namhyung Kim <[email protected]>
>> >> Cc: Paul Mackerras <[email protected]>
>> >> Cc: Peter Zijlstra <[email protected]>
>> >> Cc: Arnaldo Carvalho de Melo <[email protected]>
>> >> Cc: David Ahern <[email protected]>
>> >> Cc: Jiri Olsa <[email protected]>
>> >> ---
>> >> tools/perf/arch/arm64/Makefile | 1 +
>> >> tools/perf/arch/arm64/include/perf_regs.h | 2 ++
>> >> tools/perf/arch/arm64/tests/regs_load.S | 39 +++++++++++++++++++++++++++++++
>> >> 3 files changed, 42 insertions(+)
>> >> create mode 100644 tools/perf/arch/arm64/tests/regs_load.S
>> >>
>> >> diff --git a/tools/perf/arch/arm64/Makefile b/tools/perf/arch/arm64/Makefile
>> >> index 67e9b3d..9b8f87e 100644
>> >> --- a/tools/perf/arch/arm64/Makefile
>> >> +++ b/tools/perf/arch/arm64/Makefile
>> >> @@ -4,4 +4,5 @@ LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/dwarf-regs.o
>> >> endif
>> >> ifndef NO_LIBUNWIND
>> >> LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/unwind-libunwind.o
>> >> +LIB_OBJS += $(OUTPUT)arch/$(ARCH)/tests/regs_load.o
>> >> endif
>> >> diff --git a/tools/perf/arch/arm64/include/perf_regs.h b/tools/perf/arch/arm64/include/perf_regs.h
>> >> index 2359546..1e052f1 100644
>> >> --- a/tools/perf/arch/arm64/include/perf_regs.h
>> >> +++ b/tools/perf/arch/arm64/include/perf_regs.h
>> >> @@ -9,6 +9,8 @@
>> >> #define PERF_REG_IP PERF_REG_ARM64_PC
>> >> #define PERF_REG_SP PERF_REG_ARM64_SP
>> >>
>> >> +void perf_regs_load(u64 *regs);
>> >> +
>> >> static inline const char *perf_reg_name(int id)
>> >> {
>> >> switch (id) {
>> >> diff --git a/tools/perf/arch/arm64/tests/regs_load.S b/tools/perf/arch/arm64/tests/regs_load.S
>> >> new file mode 100644
>> >> index 0000000..92ab968
>> >> --- /dev/null
>> >> +++ b/tools/perf/arch/arm64/tests/regs_load.S
>> >> @@ -0,0 +1,39 @@
>> >> +#include <linux/linkage.h>
>> >> +
>> >> +/*
>> >> + * Implementation of void perf_regs_load(u64 *regs);
>> >> + *
>> >> + * This functions fills in the 'regs' buffer from the actual registers values,
>> >> + * in the way the perf built-in unwinding test expects them:
>> >> + * - the PC at the time at the call to this function. Since this function
>> >> + * is called using a bl instruction, the PC value is taken from LR,
>> >
>> > Is it guaranteed that this function is always invoked with a branch with
>> > link instruction, or is that just what current compiler versions are
>> > doing? I couldn't see where we would get that guarantee from.
>> The current compiler implements the call as a bl instruction.
>
> While I don't think we can rely on the compiler using a bl to call the
> function it shouldn't matter here if we only care about the LR value
> being an address within the caller, as it doesn't look amenable to tail
> call optimization.
>
>> > If it is called with a branch with link, then the LR value will be the
>> > PC at call time + 4, rather than just the exact PC at call time. If not
>> > then we don't have a guaranteed relationship between the PC at call time
>> > and the current LR value.
>> >
>> > If the only place that perf_regs_load is used is a single test which
>> > doesn't care about the precise PC at the time of the call, then it's
>> > probably OK to use the LR value, but we should be careful to document
>> > what the faked-up PC actually is and how we expect it to be used.
>> The code is only used by an unwinding test. The unwinding code
>> resolves the function name from an address range found in the dwarf
>> information so in principle it is ok to use the PC/LR at the time of
>> the call to a function.
>>
>> Is the comment above OK or do you want an update of the code as well?
>
> If we just need an (arbitrary) address within the caller, a comment
> update should be fine.
Yes that is the idea;

>
>> >> + * - the current SP (not touched by this function),
>> >> + * - the current value of LR is merely retrieved and stored because the
>> >> + * value before the call to this function is unknown at this time; it will
>> >> + * be unwound from the dwarf information in unwind__get_entries.
>> >> + */
>> >> +
>> >> +.text
>> >> +.type perf_regs_load,%function
>> >> +ENTRY(perf_regs_load)
>> >> + stp x0, x1, [x0], #16 // store x0..x29
>> >> + stp x2, x3, [x0], #16
>> >> + stp x4, x5, [x0], #16
>> >> + stp x6, x7, [x0], #16
>> >> + stp x8, x9, [x0], #16
>> >> + stp x10, x11, [x0], #16
>> >> + stp x12, x13, [x0], #16
>> >> + stp x14, x15, [x0], #16
>> >> + stp x16, x17, [x0], #16
>> >> + stp x18, x19, [x0], #16
>> >> + stp x20, x21, [x0], #16
>> >> + stp x22, x23, [x0], #16
>> >> + stp x24, x25, [x0], #16
>> >> + stp x26, x27, [x0], #16
>> >> + stp x28, x29, [x0], #16
>> >> + mov x1, sp
>> >> + stp x30, x1, [x0], #16 // store lr and sp
>> >> + str x30, [x0] // store pc as lr in order to skip the call
>> >> + // to this function
>> >
>> > It might be better to word this a "store the lr in place of the pc". To
>> > me at least the current wording implies the opposite of what the code
>> > seems to be doing.
>> Ok the last comment can be updated.
>
> Ok, cheers.
>
> With those changes I think this looks fine.
Ok let me send a refreshed version in a bit. If the wording is Ok I
will refresh the ARM patches for the same topic and re-submit them.

>
> Thanks,
> Mark.

Thanks,
Jean

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2014-04-28 13:12:30

by Jean Pihet

[permalink] [raw]
Subject: [PATCH 1/3] perf tests: Introduce perf_regs_load function on ARM64

Introducing perf_regs_load function, which is going
to be used for dwarf unwind test in following patches.

It takes single argument as a pointer to the regs dump
buffer and populates it with current registers values, as
expected by the perf built-in unwinding test.

Signed-off-by: Jean Pihet <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
---
tools/perf/arch/arm64/Makefile | 1 +
tools/perf/arch/arm64/include/perf_regs.h | 2 ++
tools/perf/arch/arm64/tests/regs_load.S | 40 +++++++++++++++++++++++++++++++
3 files changed, 43 insertions(+)
create mode 100644 tools/perf/arch/arm64/tests/regs_load.S

diff --git a/tools/perf/arch/arm64/Makefile b/tools/perf/arch/arm64/Makefile
index 67e9b3d..9b8f87e 100644
--- a/tools/perf/arch/arm64/Makefile
+++ b/tools/perf/arch/arm64/Makefile
@@ -4,4 +4,5 @@ LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/dwarf-regs.o
endif
ifndef NO_LIBUNWIND
LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/unwind-libunwind.o
+LIB_OBJS += $(OUTPUT)arch/$(ARCH)/tests/regs_load.o
endif
diff --git a/tools/perf/arch/arm64/include/perf_regs.h b/tools/perf/arch/arm64/include/perf_regs.h
index 2359546..1e052f1 100644
--- a/tools/perf/arch/arm64/include/perf_regs.h
+++ b/tools/perf/arch/arm64/include/perf_regs.h
@@ -9,6 +9,8 @@
#define PERF_REG_IP PERF_REG_ARM64_PC
#define PERF_REG_SP PERF_REG_ARM64_SP

+void perf_regs_load(u64 *regs);
+
static inline const char *perf_reg_name(int id)
{
switch (id) {
diff --git a/tools/perf/arch/arm64/tests/regs_load.S b/tools/perf/arch/arm64/tests/regs_load.S
new file mode 100644
index 0000000..40b8b99
--- /dev/null
+++ b/tools/perf/arch/arm64/tests/regs_load.S
@@ -0,0 +1,40 @@
+#include <linux/linkage.h>
+
+/*
+ * Implementation of void perf_regs_load(u64 *regs);
+ *
+ * This functions fills in the 'regs' buffer from the actual registers values,
+ * in the way the perf built-in unwinding test expects them, e.g. an address
+ * within the caller:
+ * - the PC at the time at the call to this function. Since this function
+ * is called using a bl instruction, the PC value is taken from LR,
+ * - the current SP (not touched by this function),
+ * - the current value of LR is merely retrieved and stored because the
+ * value before the call to this function is unknown at this time; it will
+ * be unwound from the dwarf information in unwind__get_entries.
+ */
+
+.text
+.type perf_regs_load,%function
+ENTRY(perf_regs_load)
+ stp x0, x1, [x0], #16 // store x0..x29
+ stp x2, x3, [x0], #16
+ stp x4, x5, [x0], #16
+ stp x6, x7, [x0], #16
+ stp x8, x9, [x0], #16
+ stp x10, x11, [x0], #16
+ stp x12, x13, [x0], #16
+ stp x14, x15, [x0], #16
+ stp x16, x17, [x0], #16
+ stp x18, x19, [x0], #16
+ stp x20, x21, [x0], #16
+ stp x22, x23, [x0], #16
+ stp x24, x25, [x0], #16
+ stp x26, x27, [x0], #16
+ stp x28, x29, [x0], #16
+ mov x1, sp
+ stp x30, x1, [x0], #16 // store lr and sp
+ str x30, [x0] // store lr as the PC in order to skip the call
+ // to this function
+ ret
+ENDPROC(perf_regs_load)
--
1.7.11.7