2012-05-16 06:59:27

by Jiri Olsa

Subject: [PATCHv4 0/7] perf, tool: Fix endian issues

hi,
sending fixes to properly handle perf.data endianness.

David,
could you please rerun your test? I tried it on my setup and
it works fine.


v4 changes:
- fixed patch 6/7 which was broken for -fstrict-aliasing related optimization
- added patch 3/7 and 7/7

v3 changes:
- added patch 5 to fix addons bitmask handling

v2 changes:
- added patches 3 and 4 to handle sample_id_all header endianity


Attached patches:
1/7 perf, tool: Handle different endians properly during symbol load
2/7 perf, tool: Carry perf_event_attr bitfield throught different endians
3/7 perf, tool: Add union u64_swap type for swapping u64 data
4/7 perf, tool: Handle endianity swap on sample_id_all header data
5/7 perf, tool: Fix 32 bit values endianity swap for sample_id_all header
6/7 perf, tool: Fix endianity trick for adds_features bitmask
7/7 perf, tool: Fix callchain ip printf


Tested by running the following use cases:
test 1)
- origin system:
# perf record -a -- sleep 10 (any perf record will do)
# perf report > report.origin
# perf archive perf.data

- copy the perf.data, report.origin and perf.data.tar.bz2
to a target system and run:
# tar xjvf perf.data.tar.bz2 -C ~/.debug
# perf report > report.target
# diff -u report.origin report.target

- the diff should produce no output
(besides some white space stuff and possibly different
date/TZ output)

test 2)
- origin system:
# perf record -ag -fo /tmp/perf.data -- sleep 1
- mount origin system root to the target system on /mnt/origin
- target system:
# perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
--kallsyms /mnt/origin/proc/kallsyms
- complete perf.data header is displayed

test 3)
- tested with old perf.data version, worked properly

Tested the above use cases across the following architectures:
i386, x86_64, s390x, ppc64, ppc32

Big thanks to Caspar Zhang who verified this within RH QE testsuites.

thanks,
jirka

CC: Caspar Zhang <[email protected]>
---
tools/perf/util/evsel.c | 39 ++++++++-----
tools/perf/util/header.c | 21 +++++--
tools/perf/util/hist.c | 2 +-
tools/perf/util/include/linux/bitops.h | 1 +
tools/perf/util/session.c | 101 ++++++++++++++++++++++++++++----
tools/perf/util/symbol.c | 33 ++++++++++-
tools/perf/util/symbol.h | 30 +++++++++
tools/perf/util/types.h | 5 ++
8 files changed, 198 insertions(+), 34 deletions(-)


2012-05-16 06:59:36

by Jiri Olsa

Subject: [PATCH 2/7] perf, tool: Carry perf_event_attr bitfield throught different endians

When the perf data file is read across architectures, the
perf_event__attr_swap function takes care of the endianness of all the
struct fields except the bitfield flags.

The bitfield flags need to be transformed as well, since the binary
storage of a bitfield differs between the two endiannesses.

ABI says:
Bit-fields are allocated from right to left (least to most significant)
on little-endian implementations and from left to right (most to least
significant) on big-endian implementations.

The above seems to be byte specific, so we need to reverse the bits of
each byte of the bitfield. 'Internet' also says this might be
implementation specific, so we probably need a proper fix that carries
the perf_event_attr bitfield flags in a separate data file FEAT_ section.
Though this seems to work for now.
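
As a rough standalone illustration of the per-byte reversal described
above (a sketch using <stdint.h> types and made-up function names, not
the code from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order inside one byte: bit 0 <-> bit 7, bit 1 <-> bit 6, ... */
uint8_t reverse_bits8(uint8_t b)
{
	unsigned int rev = (b >> 4) | ((b & 0x0fu) << 4);  /* swap nibbles */

	rev = ((rev & 0xccu) >> 2) | ((rev & 0x33u) << 2);  /* swap bit pairs */
	rev = ((rev & 0xaau) >> 1) | ((rev & 0x55u) << 1);  /* swap neighbours */
	return (uint8_t)rev;
}

/* Reverse the bits of every byte in a buffer; the byte order is untouched. */
void reverse_bits_each_byte(uint8_t *p, size_t len)
{
	assert(p != NULL || len == 0);

	for (size_t i = 0; i < len; i++)
		p[i] = reverse_bits8(p[i]);
}
```

A flag kept in bit 0 of a byte (0x01) lands in bit 7 (0x80) after the
reversal, and applying the reversal twice gives back the original byte.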

Note, running following to test perf endianity handling:
test 1)
- origin system:
# perf record -a -- sleep 10 (any perf record will do)
# perf report > report.origin
# perf archive perf.data

- copy the perf.data, report.origin and perf.data.tar.bz2
to a target system and run:
# tar xjvf perf.data.tar.bz2 -C ~/.debug
# perf report > report.target
# diff -u report.origin report.target

- the diff should produce no output
(besides some white space stuff and possibly different
date/TZ output)

test 2)
- origin system:
# perf record -ag -fo /tmp/perf.data -- sleep 1
- mount origin system root to the target system on /mnt/origin
- target system:
# perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
--kallsyms /mnt/origin/proc/kallsyms
- complete perf.data header is displayed

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/session.c | 34 ++++++++++++++++++++++++++++++++++
1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4dcc8f3..17c9ace 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -481,6 +481,38 @@ static void perf_event__read_swap(union perf_event *event)
event->read.id = bswap_64(event->read.id);
}

+static u8 revbyte(u8 b)
+{
+ int rev = (b >> 4) | ((b & 0xf) << 4);
+ rev = ((rev & 0xcc) >> 2) | ((rev & 0x33) << 2);
+ rev = ((rev & 0xaa) >> 1) | ((rev & 0x55) << 1);
+ return (u8) rev;
+}
+
+/*
+ * XXX this is hack in attempt to carry flags bitfield
+ * throught endian village. ABI says:
+ *
+ * Bit-fields are allocated from right to left (least to most significant)
+ * on little-endian implementations and from left to right (most to least
+ * significant) on big-endian implementations.
+ *
+ * The above seems to be byte specific, so we need to reverse each
+ * byte of the bitfield. 'Internet' also says this might be implementation
+ * specific and we probably need proper fix and carry perf_event_attr
+ * bitfield flags in separate data file FEAT_ section. Thought this seems
+ * to work for now.
+ */
+static void swap_bitfield(u8 *p, unsigned len)
+{
+ unsigned i;
+
+ for (i = 0; i < len; i++) {
+ *p = revbyte(*p);
+ p++;
+ }
+}
+
/* exported for swapping attributes in file header */
void perf_event__attr_swap(struct perf_event_attr *attr)
{
@@ -494,6 +526,8 @@ void perf_event__attr_swap(struct perf_event_attr *attr)
attr->bp_type = bswap_32(attr->bp_type);
attr->bp_addr = bswap_64(attr->bp_addr);
attr->bp_len = bswap_64(attr->bp_len);
+
+ swap_bitfield((u8 *) (&attr->read_format + 1), sizeof(u64));
}

static void perf_event__hdr_attr_swap(union perf_event *event)
--
1.7.7.6

2012-05-16 06:59:40

by Jiri Olsa

Subject: [PATCH 4/7] perf, tool: Handle endianity swap on sample_id_all header data

Adding endianity swapping for the event header data attached via
sample_id_all.

Currently we don't do that, which causes wrong data to be read when
running report on an architecture with different endianity than the one
the data was recorded on.
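
To picture what has to be swapped, here is a simplified sketch for
records that carry a string before the sample_id_all data. It assumes
the string is NUL terminated and padded to a multiple of 8 bytes, and
that everything after it up to the end of the record is u64 values.
The helper names and the flat byte-buffer layout are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Byte-swap one 64-bit value without depending on <byteswap.h>. */
uint64_t swap64(uint64_t v)
{
	uint64_t r = 0;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xff);
		v >>= 8;
	}
	return r;
}

/*
 * Swap every u64 that follows the padded string starting at str_off in a
 * record of total_size bytes. Returns the number of u64 values swapped.
 */
size_t swap_trailer(unsigned char *rec, size_t total_size, size_t str_off)
{
	size_t len = strlen((const char *)rec + str_off) + 1;
	size_t off = str_off + ALIGN_UP(len, sizeof(uint64_t));
	size_t n = 0;

	/* The trailer must be a whole number of u64 values. */
	assert(off <= total_size);
	assert((total_size - off) % sizeof(uint64_t) == 0);

	for (; off < total_size; off += sizeof(uint64_t), n++) {
		uint64_t v;

		memcpy(&v, rec + off, sizeof(v));  /* avoid unaligned access */
		v = swap64(v);
		memcpy(rec + off, &v, sizeof(v));
	}
	return n;
}
```

The string itself is left alone; only the u64 values behind it are
swapped.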

Note, running following to test perf endianity handling:
test 1)
- origin system:
# perf record -a -- sleep 10 (any perf record will do)
# perf report > report.origin
# perf archive perf.data

- copy the perf.data, report.origin and perf.data.tar.bz2
to a target system and run:
# tar xjvf perf.data.tar.bz2 -C ~/.debug
# perf report > report.target
# diff -u report.origin report.target

- the diff should produce no output
(besides some white space stuff and possibly different
date/TZ output)

test 2)
- origin system:
# perf record -ag -fo /tmp/perf.data -- sleep 1
- mount origin system root to the target system on /mnt/origin
- target system:
# perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
--kallsyms /mnt/origin/proc/kallsyms
- complete perf.data header is displayed

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/session.c | 67 +++++++++++++++++++++++++++++++++++++--------
1 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 17c9ace..b083891 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -441,37 +441,65 @@ void mem_bswap_64(void *src, int byte_size)
}
}

-static void perf_event__all64_swap(union perf_event *event)
+static void swap_sample_id_all(union perf_event *event, void *data)
+{
+ void *end = (void *) event + event->header.size;
+ int size = end - data;
+
+ BUG_ON(size % sizeof(u64));
+ mem_bswap_64(data, size);
+}
+
+static void perf_event__all64_swap(union perf_event *event,
+ bool sample_id_all __used)
{
struct perf_event_header *hdr = &event->header;
mem_bswap_64(hdr + 1, event->header.size - sizeof(*hdr));
}

-static void perf_event__comm_swap(union perf_event *event)
+static void perf_event__comm_swap(union perf_event *event, bool sample_id_all)
{
event->comm.pid = bswap_32(event->comm.pid);
event->comm.tid = bswap_32(event->comm.tid);
+
+ if (sample_id_all) {
+ void *data = &event->comm.comm;
+
+ data += ALIGN(strlen(data) + 1, sizeof(u64));
+ swap_sample_id_all(event, data);
+ }
}

-static void perf_event__mmap_swap(union perf_event *event)
+static void perf_event__mmap_swap(union perf_event *event,
+ bool sample_id_all)
{
event->mmap.pid = bswap_32(event->mmap.pid);
event->mmap.tid = bswap_32(event->mmap.tid);
event->mmap.start = bswap_64(event->mmap.start);
event->mmap.len = bswap_64(event->mmap.len);
event->mmap.pgoff = bswap_64(event->mmap.pgoff);
+
+ if (sample_id_all) {
+ void *data = &event->mmap.filename;
+
+ data += ALIGN(strlen(data) + 1, sizeof(u64));
+ swap_sample_id_all(event, data);
+ }
}

-static void perf_event__task_swap(union perf_event *event)
+static void perf_event__task_swap(union perf_event *event, bool sample_id_all)
{
event->fork.pid = bswap_32(event->fork.pid);
event->fork.tid = bswap_32(event->fork.tid);
event->fork.ppid = bswap_32(event->fork.ppid);
event->fork.ptid = bswap_32(event->fork.ptid);
event->fork.time = bswap_64(event->fork.time);
+
+ if (sample_id_all)
+ swap_sample_id_all(event, &event->fork + 1);
}

-static void perf_event__read_swap(union perf_event *event)
+static void perf_event__read_swap(union perf_event *event, bool sample_id_all)
{
event->read.pid = bswap_32(event->read.pid);
event->read.tid = bswap_32(event->read.tid);
@@ -479,6 +507,9 @@ static void perf_event__read_swap(union perf_event *event)
event->read.time_enabled = bswap_64(event->read.time_enabled);
event->read.time_running = bswap_64(event->read.time_running);
event->read.id = bswap_64(event->read.id);
+
+ if (sample_id_all)
+ swap_sample_id_all(event, &event->read + 1);
}

static u8 revbyte(u8 b)
@@ -530,7 +561,8 @@ void perf_event__attr_swap(struct perf_event_attr *attr)
swap_bitfield((u8 *) (&attr->read_format + 1), sizeof(u64));
}

-static void perf_event__hdr_attr_swap(union perf_event *event)
+static void perf_event__hdr_attr_swap(union perf_event *event,
+ bool sample_id_all __used)
{
size_t size;

@@ -541,18 +573,21 @@ static void perf_event__hdr_attr_swap(union perf_event *event)
mem_bswap_64(event->attr.id, size);
}

-static void perf_event__event_type_swap(union perf_event *event)
+static void perf_event__event_type_swap(union perf_event *event,
+ bool sample_id_all __used)
{
event->event_type.event_type.event_id =
bswap_64(event->event_type.event_type.event_id);
}

-static void perf_event__tracing_data_swap(union perf_event *event)
+static void perf_event__tracing_data_swap(union perf_event *event,
+ bool sample_id_all __used)
{
event->tracing_data.size = bswap_32(event->tracing_data.size);
}

-typedef void (*perf_event__swap_op)(union perf_event *event);
+typedef void (*perf_event__swap_op)(union perf_event *event,
+ bool sample_id_all);

static perf_event__swap_op perf_event__swap_ops[] = {
[PERF_RECORD_MMAP] = perf_event__mmap_swap,
@@ -986,6 +1021,15 @@ static int perf_session__process_user_event(struct perf_session *session, union
}
}

+static void event_swap(union perf_event *event, bool sample_id_all)
+{
+ perf_event__swap_op swap;
+
+ swap = perf_event__swap_ops[event->header.type];
+ if (swap)
+ swap(event, sample_id_all);
+}
+
static int perf_session__process_event(struct perf_session *session,
union perf_event *event,
struct perf_tool *tool,
@@ -994,9 +1038,8 @@ static int perf_session__process_event(struct perf_session *session,
struct perf_sample sample;
int ret;

- if (session->header.needs_swap &&
- perf_event__swap_ops[event->header.type])
- perf_event__swap_ops[event->header.type](event);
+ if (session->header.needs_swap)
+ event_swap(event, session->sample_id_all);

if (event->header.type >= PERF_RECORD_HEADER_MAX)
return -EINVAL;
--
1.7.7.6

2012-05-16 06:59:53

by Jiri Olsa

Subject: [PATCH 6/7] perf, tool: Fix endianity trick for adds_features bitmask

The adds_features bitmask is stored as an array of unsigned long values.
The size of unsigned long is the same as the pointer size of the
architecture, so it can differ between architectures.

To handle the endianity of the adds_features bitmask, we first swap the
bitmask as u64 values and check for the HEADER_HOSTNAME bit. If it is not
set, we unswap the u64 values and swap adds_features as u32 values.

This is currently buggy, since we swap just the first 32 bits of each u64
value. Adding swap of the next 32 bits as well. Also adding and using
BITS_TO_U64 instead of BITS_TO_LONGS as the loop bound, because the size
of unsigned long differs per architecture.
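
The "unswap as u64, then swap as u32" step can be sketched in isolation
like this. It is a simplified illustration with hypothetical names; the
union mirrors the val64/val32[2] layout used in this series:

```c
#include <assert.h>
#include <stdint.h>

union u64_halves {
	uint64_t val64;
	uint32_t val32[2];
};

uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

uint64_t swap64(uint64_t v)
{
	return ((uint64_t)swap32((uint32_t)v) << 32) |
	       swap32((uint32_t)(v >> 32));
}

/*
 * Take a value that was already swapped as one 64-bit unit and redo the
 * swap as two independent 32-bit words.
 */
uint64_t reswap_as_two_u32(uint64_t swapped_as_u64)
{
	union u64_halves u;

	assert(sizeof(u) == sizeof(uint64_t));

	u.val64 = swap64(swapped_as_u64);  /* undo the 64-bit swap */
	u.val32[0] = swap32(u.val32[0]);   /* swap the first 32-bit word */
	u.val32[1] = swap32(u.val32[1]);   /* swap the second one as well */
	return u.val64;
}
```

Swapping only val32[0] would leave the second 32-bit word in the
foreign byte order, which is the bug described above.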

Note, running following to test perf endianity handling:
test 1)
- origin system:
# perf record -a -- sleep 10 (any perf record will do)
# perf report > report.origin
# perf archive perf.data

- copy the perf.data, report.origin and perf.data.tar.bz2
to a target system and run:
# tar xjvf perf.data.tar.bz2 -C ~/.debug
# perf report > report.target
# diff -u report.origin report.target

- the diff should produce no output
(besides some white space stuff and possibly different
date/TZ output)

test 2)
- origin system:
# perf record -ag -fo /tmp/perf.data -- sleep 1
- mount origin system root to the target system on /mnt/origin
- target system:
# perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
--kallsyms /mnt/origin/proc/kallsyms
- complete perf.data header is displayed

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/header.c | 21 ++++++++++++++++-----
tools/perf/util/include/linux/bitops.h | 1 +
2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 5385980..f336f0a 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1958,13 +1958,24 @@ int perf_file_header__read(struct perf_file_header *header,
* file), punt and fallback to the original behavior --
* clearing all feature bits and setting buildid.
*/
- for (i = 0; i < BITS_TO_LONGS(HEADER_FEAT_BITS); ++i)
- header->adds_features[i] = bswap_64(header->adds_features[i]);
+
+ mem_bswap_64(&header->adds_features,
+ BITS_TO_U64(HEADER_FEAT_BITS));

if (!test_bit(HEADER_HOSTNAME, header->adds_features)) {
- for (i = 0; i < BITS_TO_LONGS(HEADER_FEAT_BITS); ++i) {
- header->adds_features[i] = bswap_64(header->adds_features[i]);
- header->adds_features[i] = bswap_32(header->adds_features[i]);
+ union {
+ union u64_swap *p;
+ unsigned long *adds_features;
+ } feat;
+
+ feat.adds_features = &header->adds_features[0];
+
+ for (i = 0; i < BITS_TO_U64(HEADER_FEAT_BITS); ++i) {
+ union u64_swap *u = &feat.p[i];
+
+ u->val64 = bswap_64(u->val64);
+ u->val32[0] = bswap_32(u->val32[0]);
+ u->val32[1] = bswap_32(u->val32[1]);
}
}

diff --git a/tools/perf/util/include/linux/bitops.h b/tools/perf/util/include/linux/bitops.h
index f1584833..10096cb 100644
--- a/tools/perf/util/include/linux/bitops.h
+++ b/tools/perf/util/include/linux/bitops.h
@@ -8,6 +8,7 @@
#define BITS_PER_LONG __WORDSIZE
#define BITS_PER_BYTE 8
#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
+#define BITS_TO_U64(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(u64))

#define for_each_set_bit(bit, addr, size) \
for ((bit) = find_first_bit((addr), (size)); \
--
1.7.7.6

2012-05-16 06:59:38

by Jiri Olsa

Subject: [PATCH 5/7] perf, tool: Fix 32 bit values endianity swap for sample_id_all header

We swap the sample_id_all header through u64 pointers. Some members
of the header happen to be 32 bit values, so we need to handle them
separately.
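
A small sketch of why the blanket u64 swap is not enough for two 32 bit
values sharing one u64 slot: each value ends up with the right byte
order, but in the other half of the slot. The struct and function names
below are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct pid_tid {
	uint32_t pid;
	uint32_t tid;
};

uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

uint64_t swap64(uint64_t v)
{
	return ((uint64_t)swap32((uint32_t)v) << 32) |
	       swap32((uint32_t)(v >> 32));
}

/*
 * Read two packed 32-bit values from a u64 slot. When 'swapped' is true
 * the slot has already been byte-swapped as a single u64, so undo that
 * and swap each 32-bit half on its own.
 */
struct pid_tid read_pid_tid(uint64_t slot, bool swapped)
{
	union {
		uint64_t val64;
		uint32_t val32[2];
	} u;

	assert(sizeof(u) == sizeof(uint64_t));

	u.val64 = slot;
	if (swapped) {
		u.val64 = swap64(u.val64);
		u.val32[0] = swap32(u.val32[0]);
		u.val32[1] = swap32(u.val32[1]);
	}
	return (struct pid_tid){ u.val32[0], u.val32[1] };
}
```

Reading the halves directly after the blanket swap would return the tid
where the pid is expected and vice versa.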

Note, running following to test perf endianity handling:
test 1)
- origin system:
# perf record -a -- sleep 10 (any perf record will do)
# perf report > report.origin
# perf archive perf.data

- copy the perf.data, report.origin and perf.data.tar.bz2
to a target system and run:
# tar xjvf perf.data.tar.bz2 -C ~/.debug
# perf report > report.target
# diff -u report.origin report.target

- the diff should produce no output
(besides some white space stuff and possibly different
date/TZ output)

test 2)
- origin system:
# perf record -ag -fo /tmp/perf.data -- sleep 1
- mount origin system root to the target system on /mnt/origin
- target system:
# perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
--kallsyms /mnt/origin/proc/kallsyms
- complete perf.data header is displayed

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/evsel.c | 29 ++++++++++++++++++++++-------
1 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 4e1b44e..f78467d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -404,16 +404,24 @@ int perf_evsel__open_per_thread(struct perf_evsel *evsel,
}

static int perf_event__parse_id_sample(const union perf_event *event, u64 type,
- struct perf_sample *sample)
+ struct perf_sample *sample,
+ bool swapped)
{
const u64 *array = event->sample.array;
+ union u64_swap u;

array += ((event->header.size -
sizeof(event->header)) / sizeof(u64)) - 1;

if (type & PERF_SAMPLE_CPU) {
- u32 *p = (u32 *)array;
- sample->cpu = *p;
+ u.val64 = *array;
+ if (swapped) {
+ /* undo swap of u64, then swap on individual u32s */
+ u.val64 = bswap_64(u.val64);
+ u.val32[0] = bswap_32(u.val32[0]);
+ }
+
+ sample->cpu = u.val32[0];
array--;
}

@@ -433,9 +441,16 @@ static int perf_event__parse_id_sample(const union perf_event *event, u64 type,
}

if (type & PERF_SAMPLE_TID) {
- u32 *p = (u32 *)array;
- sample->pid = p[0];
- sample->tid = p[1];
+ u.val64 = *array;
+ if (swapped) {
+ /* undo swap of u64, then swap on individual u32s */
+ u.val64 = bswap_64(u.val64);
+ u.val32[0] = bswap_32(u.val32[0]);
+ u.val32[1] = bswap_32(u.val32[1]);
+ }
+
+ sample->pid = u.val32[0];
+ sample->tid = u.val32[1];
}

return 0;
@@ -472,7 +487,7 @@ int perf_event__parse_sample(const union perf_event *event, u64 type,
if (event->header.type != PERF_RECORD_SAMPLE) {
if (!sample_id_all)
return 0;
- return perf_event__parse_id_sample(event, type, data);
+ return perf_event__parse_id_sample(event, type, data, swapped);
}

array = event->sample.array;
--
1.7.7.6

2012-05-16 07:00:36

by Jiri Olsa

Subject: [PATCH 7/7] perf, tool: Fix callchain ip printf

The callchain address is stored as u64. Current code uses the following
format string to display the callchain address:
"%p\n", (void *)(long)chain->ip

This way we lose the upper 32 bits if we report 64 bit addresses in a
32 bit environment. Fixing this to always display the whole 64 bits.
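
For illustration, a standalone sketch of the two behaviours. The
function names are hypothetical, and the 32 bit 'long' is modelled with
an explicit uint32_t cast so the effect shows on any host:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit address in full, whatever the width of long or void *. */
int format_ip(char *buf, size_t len, uint64_t ip)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "0x%" PRIx64, ip);
}

/* What survives when the address passes through a 32-bit wide type. */
uint64_t ip_through_32bit(uint64_t ip)
{
	return (uint32_t)ip;
}
```

With an address such as 0xffffffff81000000, the first function keeps all
16 hex digits, while the second one keeps only 0x81000000.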

Note, running following to test perf endianity handling:
test 1)
- origin system:
# perf record -a -- sleep 10 (any perf record will do)
# perf report > report.origin
# perf archive perf.data

- copy the perf.data, report.origin and perf.data.tar.bz2
to a target system and run:
# tar xjvf perf.data.tar.bz2 -C ~/.debug
# perf report > report.target
# diff -u report.origin report.target

- the diff should produce no output
(besides some white space stuff and possibly different
date/TZ output)

test 2)
- origin system:
# perf record -ag -fo /tmp/perf.data -- sleep 1
- mount origin system root to the target system on /mnt/origin
- target system:
# perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
--kallsyms /mnt/origin/proc/kallsyms
- complete perf.data header is displayed

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/hist.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 9f6d630..1293b5e 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -599,7 +599,7 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
if (chain->ms.sym)
ret += fprintf(fp, "%s\n", chain->ms.sym->name);
else
- ret += fprintf(fp, "%p\n", (void *)(long)chain->ip);
+ ret += fprintf(fp, "0x%0" PRIx64 "\n", chain->ip);

return ret;
}
--
1.7.7.6

2012-05-16 06:59:35

by Jiri Olsa

Subject: [PATCH 3/7] perf, tool: Add union u64_swap type for swapping u64 data

The following union:
union {
u64 val64;
u32 val32[2];
} u;

is used in more than one place in perf code and will be used
more in upcoming patches.

Adding union u64_swap to have it defined globally so we don't need
to redefine it all the time.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/evsel.c | 10 ++--------
tools/perf/util/types.h | 5 +++++
2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 21eaab2..4e1b44e 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -462,10 +462,7 @@ int perf_event__parse_sample(const union perf_event *event, u64 type,
* used for cross-endian analysis. See git commit 65014ab3
* for why this goofiness is needed.
*/
- union {
- u64 val64;
- u32 val32[2];
- } u;
+ union u64_swap u;

memset(data, 0, sizeof(*data));
data->cpu = data->pid = data->tid = -1;
@@ -608,10 +605,7 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type,
* used for cross-endian analysis. See git commit 65014ab3
* for why this goofiness is needed.
*/
- union {
- u64 val64;
- u32 val32[2];
- } u;
+ union u64_swap u;

array = event->sample.array;

diff --git a/tools/perf/util/types.h b/tools/perf/util/types.h
index 5f3689a..c51fa6b 100644
--- a/tools/perf/util/types.h
+++ b/tools/perf/util/types.h
@@ -16,4 +16,9 @@ typedef signed short s16;
typedef unsigned char u8;
typedef signed char s8;

+union u64_swap {
+ u64 val64;
+ u32 val32[2];
+};
+
#endif /* __PERF_TYPES_H */
--
1.7.7.6

2012-05-16 07:01:11

by Jiri Olsa

Subject: [PATCH 1/7] perf, tool: Handle different endians properly during symbol load

Currently we don't care about the file object's endianness. It's possible
that we read a buildid file object from a different architecture than the
one we are currently running on. So we need to read such an object's data
with care and handle the different endianness properly.

Adding:
needs_swap DSO field
dso__swap_init function to initialize DSO's needs_swap
DSO__READ to read the data with proper swaps
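
The idea behind the three additions can be sketched in a standalone
form as follows. The enum and function names here are hypothetical
stand-ins (ENC_LITTLE/ENC_BIG play the role of ELFDATA2LSB/ELFDATA2MSB),
not the interfaces added by this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum data_encoding {
	ENC_LITTLE,  /* file data is least significant byte first */
	ENC_BIG,     /* file data is most significant byte first */
};

/* True when the host stores the least significant byte first. */
bool host_is_little_endian(void)
{
	const unsigned int probe = 1;

	return *(const unsigned char *)&probe == 1;
}

/* A swap is needed exactly when file and host byte order disagree. */
bool needs_swap(enum data_encoding file_enc)
{
	assert(file_enc == ENC_LITTLE || file_enc == ENC_BIG);
	return (file_enc == ENC_LITTLE) != host_is_little_endian();
}

/* Return a raw 64-bit value from the file, byte-swapped if requested. */
uint64_t read_u64(uint64_t raw, bool swap)
{
	uint64_t r = 0;

	if (!swap)
		return raw;

	for (int i = 0; i < 8; i++) {
		r = (r << 8) | (raw & 0xff);
		raw >>= 8;
	}
	return r;
}
```

The swap decision is made once per object and then reused for every
value read from it.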

Note, running following to test perf endianity handling:
test 1)
- origin system:
# perf record -a -- sleep 10 (any perf record will do)
# perf report > report.origin
# perf archive perf.data

- copy the perf.data, report.origin and perf.data.tar.bz2
to a target system and run:
# tar xjvf perf.data.tar.bz2 -C ~/.debug
# perf report > report.target
# diff -u report.origin report.target

- the diff should produce no output
(besides some white space stuff and possibly different
date/TZ output)

test 2)
- origin system:
# perf record -ag -fo /tmp/perf.data -- sleep 1
- mount origin system root to the target system on /mnt/origin
- target system:
# perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
--kallsyms /mnt/origin/proc/kallsyms
- complete perf.data header is displayed

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/symbol.c | 33 ++++++++++++++++++++++++++++++++-
tools/perf/util/symbol.h | 30 ++++++++++++++++++++++++++++++
2 files changed, 62 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index e2ba885..04a83c5 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -323,6 +323,7 @@ struct dso *dso__new(const char *name)
dso->sorted_by_name = 0;
dso->has_build_id = 0;
dso->kernel = DSO_TYPE_USER;
+ dso->needs_swap = DSO_SWAP__UNSET;
INIT_LIST_HEAD(&dso->node);
}

@@ -1156,6 +1157,33 @@ static size_t elf_addr_to_index(Elf *elf, GElf_Addr addr)
return -1;
}

+static int dso__swap_init(struct dso *dso, unsigned char eidata)
+{
+ static unsigned int const endian = 1;
+
+ dso->needs_swap = DSO_SWAP__NO;
+
+ switch (eidata) {
+ case ELFDATA2LSB:
+ /* We are big endian, DSO is little endian. */
+ if (*(unsigned char const *)&endian != 1)
+ dso->needs_swap = DSO_SWAP__YES;
+ break;
+
+ case ELFDATA2MSB:
+ /* We are little endian, DSO is big endian. */
+ if (*(unsigned char const *)&endian != 0)
+ dso->needs_swap = DSO_SWAP__YES;
+ break;
+
+ default:
+ pr_err("unrecognized DSO data encoding %d\n", eidata);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static int dso__load_sym(struct dso *dso, struct map *map, const char *name,
int fd, symbol_filter_t filter, int kmodule,
int want_symtab)
@@ -1187,6 +1215,9 @@ static int dso__load_sym(struct dso *dso, struct map *map, const char *name,
goto out_elf_end;
}

+ if (dso__swap_init(dso, ehdr.e_ident[EI_DATA]))
+ goto out_elf_end;
+
/* Always reject images with a mismatched build-id: */
if (dso->has_build_id) {
u8 build_id[BUILD_ID_SIZE];
@@ -1272,7 +1303,7 @@ static int dso__load_sym(struct dso *dso, struct map *map, const char *name,
if (opdsec && sym.st_shndx == opdidx) {
u32 offset = sym.st_value - opdshdr.sh_addr;
u64 *opd = opddata->d_buf + offset;
- sym.st_value = *opd;
+ sym.st_value = DSO__READ(dso, u64, *opd);
sym.st_shndx = elf_addr_to_index(elf, sym.st_value);
}

diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 5649d63..be14744 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -9,6 +9,7 @@
#include <linux/list.h>
#include <linux/rbtree.h>
#include <stdio.h>
+#include <byteswap.h>

#ifdef HAVE_CPLUS_DEMANGLE
extern char *cplus_demangle(const char *, int);
@@ -160,11 +161,18 @@ enum dso_kernel_type {
DSO_TYPE_GUEST_KERNEL
};

+enum dso_swap_type {
+ DSO_SWAP__UNSET,
+ DSO_SWAP__NO,
+ DSO_SWAP__YES,
+};
+
struct dso {
struct list_head node;
struct rb_root symbols[MAP__NR_TYPES];
struct rb_root symbol_names[MAP__NR_TYPES];
enum dso_kernel_type kernel;
+ enum dso_swap_type needs_swap;
u8 adjust_symbols:1;
u8 has_build_id:1;
u8 hit:1;
@@ -182,6 +190,28 @@ struct dso {
char name[0];
};

+#define DSO__READ(dso, type, val) \
+({ \
+ type ____r = val; \
+ BUG_ON(dso->needs_swap == DSO_SWAP__UNSET); \
+ if (dso->needs_swap == DSO_SWAP__YES) { \
+ switch (sizeof(____r)) { \
+ case 2: \
+ ____r = bswap_16(val); \
+ break; \
+ case 4: \
+ ____r = bswap_32(val); \
+ break; \
+ case 8: \
+ ____r = bswap_64(val); \
+ break; \
+ default: \
+ BUG_ON(1); \
+ } \
+ } \
+ ____r; \
+})
+
struct dso *dso__new(const char *name);
void dso__delete(struct dso *dso);

--
1.7.7.6

2012-05-21 07:40:32

by Jiri Olsa

Subject: [tip:perf/core] perf hists: Fix callchain ip printf format

Commit-ID: a0187060f4ab68cf1aa533446b906cae5b14eb48
Gitweb: http://git.kernel.org/tip/a0187060f4ab68cf1aa533446b906cae5b14eb48
Author: Jiri Olsa <[email protected]>
AuthorDate: Wed, 16 May 2012 08:59:08 +0200
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Thu, 17 May 2012 13:18:19 -0300

perf hists: Fix callchain ip printf format

The callchain address is stored as u64. Current code uses following
format string to display callchain address:

"%p\n", (void *)(long)chain->ip

This way we lose upper 32 bits if we report 64 bit addresses in 32 bit
environment. Fixing this to always display whole 64 bits.

Note, running following to test perf endianity handling:
test 1)
- origin system:
# perf record -a -- sleep 10 (any perf record will do)
# perf report > report.origin
# perf archive perf.data

- copy the perf.data, report.origin and perf.data.tar.bz2
to a target system and run:
# tar xjvf perf.data.tar.bz2 -C ~/.debug
# perf report > report.target
# diff -u report.origin report.target

- the diff should produce no output
(besides some white space stuff and possibly different
date/TZ output)

test 2)
- origin system:
# perf record -ag -fo /tmp/perf.data -- sleep 1
- mount origin system root to the target system on /mnt/origin
- target system:
# perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
--kallsyms /mnt/origin/proc/kallsyms
- complete perf.data header is displayed

Signed-off-by: Jiri Olsa <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/hist.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 9f6d630..1293b5e 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -599,7 +599,7 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_list *chain,
if (chain->ms.sym)
ret += fprintf(fp, "%s\n", chain->ms.sym->name);
else
- ret += fprintf(fp, "%p\n", (void *)(long)chain->ip);
+ ret += fprintf(fp, "0x%0" PRIx64 "\n", chain->ip);

return ret;
}

2012-05-22 03:26:47

by David Ahern

Subject: Re: [PATCH 1/7] perf, tool: Handle different endians properly during symbol load

On 5/16/12 12:59 AM, Jiri Olsa wrote:
> Currently we dont care about the file object's endianness. It's possible
> we read buildid file object from different architecture than we are
> currentlly running on. So we need to care about properly reading such
> object's data - handle different endianness properly.
>
> Adding:
> needs_swap DSO field
> dso__swap_init function to initialize DSO's needs_swap
> DSO__READ to read the data with proper swaps
>
> Note, running following to test perf endianity handling:
> test 1)
> - origin system:
> # perf record -a -- sleep 10 (any perf record will do)
> # perf report> report.origin
> # perf archive perf.data
>
> - copy the perf.data, report.origin and perf.data.tar.bz2
> to a target system and run:
> # tar xjvf perf.data.tar.bz2 -C ~/.debug
> # perf report> report.target
> # diff -u report.origin report.target
>
> - the diff should produce no output
> (besides some white space stuff and possibly different
> date/TZ output)
>
> test 1)
> - origin system:
> # perf record -ag -fo /tmp/perf.data -- sleep 1
> - mount origin system root to the target system on /mnt/origin
> - target system:
> # perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
> --kallsyms /mnt/origin/proc/kallsyms
> - complete perf.data header is displayed
>
> Signed-off-by: Jiri Olsa<[email protected]>
> ---
> tools/perf/util/symbol.c | 33 ++++++++++++++++++++++++++++++++-
> tools/perf/util/symbol.h | 30 ++++++++++++++++++++++++++++++
> 2 files changed, 62 insertions(+), 1 deletions(-)
>
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index e2ba885..04a83c5 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -323,6 +323,7 @@ struct dso *dso__new(const char *name)
>  		dso->sorted_by_name = 0;
>  		dso->has_build_id = 0;
>  		dso->kernel = DSO_TYPE_USER;
> +		dso->needs_swap = DSO_SWAP__UNSET;
>  		INIT_LIST_HEAD(&dso->node);
>  	}
>
> @@ -1156,6 +1157,33 @@ static size_t elf_addr_to_index(Elf *elf, GElf_Addr addr)
>  	return -1;
>  }
>
> +static int dso__swap_init(struct dso *dso, unsigned char eidata)
> +{
> +	static unsigned int const endian = 1;
> +
> +	dso->needs_swap = DSO_SWAP__NO;
> +
> +	switch (eidata) {
> +	case ELFDATA2LSB:
> +		/* We are big endian, DSO is little endian. */
> +		if (*(unsigned char const *)&endian != 1)
> +			dso->needs_swap = DSO_SWAP__YES;
> +		break;
> +
> +	case ELFDATA2MSB:
> +		/* We are little endian, DSO is big endian. */
> +		if (*(unsigned char const *)&endian != 0)
> +			dso->needs_swap = DSO_SWAP__YES;
> +		break;
> +
> +	default:
> +		pr_err("unrecognized DSO data encoding %d\n", eidata);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  static int dso__load_sym(struct dso *dso, struct map *map, const char *name,
>  			 int fd, symbol_filter_t filter, int kmodule,
>  			 int want_symtab)
> @@ -1187,6 +1215,9 @@ static int dso__load_sym(struct dso *dso, struct map *map, const char *name,
>  		goto out_elf_end;
>  	}
>
> +	if (dso__swap_init(dso, ehdr.e_ident[EI_DATA]))
> +		goto out_elf_end;
> +
>  	/* Always reject images with a mismatched build-id: */
>  	if (dso->has_build_id) {
>  		u8 build_id[BUILD_ID_SIZE];
> @@ -1272,7 +1303,7 @@ static int dso__load_sym(struct dso *dso, struct map *map, const char *name,
>  		if (opdsec && sym.st_shndx == opdidx) {
>  			u32 offset = sym.st_value - opdshdr.sh_addr;
>  			u64 *opd = opddata->d_buf + offset;
> -			sym.st_value = *opd;
> +			sym.st_value = DSO__READ(dso, u64, *opd);
>  			sym.st_shndx = elf_addr_to_index(elf, sym.st_value);
>  		}
>
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index 5649d63..be14744 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -9,6 +9,7 @@
>  #include <linux/list.h>
>  #include <linux/rbtree.h>
>  #include <stdio.h>
> +#include <byteswap.h>
>
>  #ifdef HAVE_CPLUS_DEMANGLE
>  extern char *cplus_demangle(const char *, int);
> @@ -160,11 +161,18 @@ enum dso_kernel_type {
>  	DSO_TYPE_GUEST_KERNEL
>  };
>
> +enum dso_swap_type {
> +	DSO_SWAP__UNSET,
> +	DSO_SWAP__NO,
> +	DSO_SWAP__YES,
> +};
> +
>  struct dso {
>  	struct list_head node;
>  	struct rb_root symbols[MAP__NR_TYPES];
>  	struct rb_root symbol_names[MAP__NR_TYPES];
>  	enum dso_kernel_type kernel;
> +	enum dso_swap_type needs_swap;
>  	u8 adjust_symbols:1;
>  	u8 has_build_id:1;
>  	u8 hit:1;
> @@ -182,6 +190,28 @@ struct dso {
>  	char name[0];
>  };
>
> +#define DSO__READ(dso, type, val) \

s/DSO__READ/DSO__SWAP/? It's swapping bytes, not reading.

David
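
For readers following the naming point above, here is a minimal sketch of the "swap only when byte orders differ" idea, assuming glibc's <byteswap.h>. The names host_is_little_endian() and read_u64() are hypothetical and only for illustration; this is not the patch's DSO__READ macro, whose body is not shown in the quoted text.

```c
#include <assert.h>
#include <stdint.h>
#include <byteswap.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
	static const unsigned int probe = 1;

	/* The first byte in memory holds the low-order byte only on LE. */
	return *(const unsigned char *)&probe == 1;
}

/*
 * Hypothetical helper: return raw unchanged when the data and the host
 * share a byte order, otherwise return it with its bytes reversed.
 */
static uint64_t read_u64(uint64_t raw, int data_is_little_endian)
{
	assert(data_is_little_endian == 0 || data_is_little_endian == 1);

	if (data_is_little_endian != host_is_little_endian())
		return bswap_64(raw);
	return raw;
}
```

The helper does perform a conditional swap rather than a read, which is the point of the rename suggestion.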

2012-05-22 03:29:08

by David Ahern

Subject: Re: [PATCH 2/7] perf, tool: Carry perf_event_attr bitfield throught different endians

On 5/16/12 12:59 AM, Jiri Olsa wrote:
> When the perf data file is read across architectures, the perf_event__attr_swap
> function takes care of the endianness of all the struct fields except the
> bitfield flags.
>
> The bitfield flags need to be transformed as well, since the bitfield
> binary storage differs between the two endiannesses.
>
> ABI says:
> Bit-fields are allocated from right to left (least to most significant)
> on little-endian implementations and from left to right (most to least
> significant) on big-endian implementations.
>
> The above seems to be byte specific, so we need to reverse each
> byte of the bitfield. 'Internet' also says this might be implementation
> specific and we probably need a proper fix that carries the perf_event_attr
> bitfield flags in a separate data file FEAT_ section. Though this seems
> to work for now.
>
> Note, running the following to test perf endianity handling:
> test 1)
> - origin system:
> # perf record -a -- sleep 10 (any perf record will do)
> # perf report > report.origin
> # perf archive perf.data
>
> - copy the perf.data, report.origin and perf.data.tar.bz2
> to a target system and run:
> # tar xjvf perf.data.tar.bz2 -C ~/.debug
> # perf report > report.target
> # diff -u report.origin report.target
>
> - the diff should produce no output
> (besides some white space stuff and possibly different
> date/TZ output)
>
> test 2)
> - origin system:
> # perf record -ag -fo /tmp/perf.data -- sleep 1
> - mount origin system root to the target system on /mnt/origin
> - target system:
> # perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
> --kallsyms /mnt/origin/proc/kallsyms
> - complete perf.data header is displayed
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/session.c | 34 ++++++++++++++++++++++++++++++++++
> 1 files changed, 34 insertions(+), 0 deletions(-)


Reviewed-by and Tested-by: David Ahern <[email protected]>

2012-05-22 03:29:16

by David Ahern

Subject: Re: [PATCH 3/7] perf, tool: Add union u64_swap type for swapping u64 data

On 5/16/12 12:59 AM, Jiri Olsa wrote:
> The following union:
> union {
> u64 val64;
> u32 val32[2];
> } u;
>
> is used in more than one place in the perf code and will be used
> more in upcoming patches.
>
> Adding union u64_swap to have it defined globally so we don't need
> to redefine it all the time.
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/evsel.c | 10 ++--------
> tools/perf/util/types.h | 5 +++++
> 2 files changed, 7 insertions(+), 8 deletions(-)
>

Reviewed-by and Tested-by: David Ahern <[email protected]>

2012-05-22 03:35:53

by David Ahern

Subject: Re: [PATCH 4/7] perf, tool: Handle endianity swap on sample_id_all header data

On 5/16/12 12:59 AM, Jiri Olsa wrote:
> Adding endianity swapping for event header attached via sample_id_all.
>
> Currently we don't do that, which causes wrong data to be read when
> running report on an architecture with different endianity than the record.
>
> Note, running the following to test perf endianity handling:
> test 1)
> - origin system:
> # perf record -a -- sleep 10 (any perf record will do)
> # perf report > report.origin
> # perf archive perf.data
>
> - copy the perf.data, report.origin and perf.data.tar.bz2
> to a target system and run:
> # tar xjvf perf.data.tar.bz2 -C ~/.debug
> # perf report > report.target
> # diff -u report.origin report.target
>
> - the diff should produce no output
> (besides some white space stuff and possibly different
> date/TZ output)
>
> test 2)
> - origin system:
> # perf record -ag -fo /tmp/perf.data -- sleep 1
> - mount origin system root to the target system on /mnt/origin
> - target system:
> # perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
> --kallsyms /mnt/origin/proc/kallsyms
> - complete perf.data header is displayed
>
> Signed-off-by: Jiri Olsa <[email protected]>

The code change is fine, but the commit message could use some
additions: for example, what does the current output look like and how
does the patch change it.

For example, perf is currently able to process 32-bit PPC samples on
32-bit and 64-bit x86 -- that's the use case I have and it works. So an
example of the effects of this patch for the commit log would be helpful.

Code wise:
Reviewed-by and Tested-by: David Ahern <[email protected]>

2012-05-22 04:38:15

by David Ahern

Subject: Re: [PATCH 5/7] perf, tool: Fix 32 bit values endianity swap for sample_id_all header

On 5/16/12 12:59 AM, Jiri Olsa wrote:
> We swap the sample_id_all header by u64 pointers. Some members
> of the header happen to be 32 bit values. We need to handle them
> separately.
>
> Note, running the following to test perf endianity handling:
> test 1)
> - origin system:
> # perf record -a -- sleep 10 (any perf record will do)
> # perf report > report.origin
> # perf archive perf.data
>
> - copy the perf.data, report.origin and perf.data.tar.bz2
> to a target system and run:
> # tar xjvf perf.data.tar.bz2 -C ~/.debug
> # perf report > report.target
> # diff -u report.origin report.target
>
> - the diff should produce no output
> (besides some white space stuff and possibly different
> date/TZ output)
>
> test 2)
> - origin system:
> # perf record -ag -fo /tmp/perf.data -- sleep 1
> - mount origin system root to the target system on /mnt/origin
> - target system:
> # perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
> --kallsyms /mnt/origin/proc/kallsyms
> - complete perf.data header is displayed
>
> Signed-off-by: Jiri Olsa <[email protected]>

Same comment as the last - the commit log could use some words about the
command line arguments you are running and how the output is affected.
As I mentioned, perf-report/script on x86 processes data files and parses
samples from 32-bit ppc just fine -- including tid, pid, comm,
filenames, symbols, etc. So is the sample_id_all path run, and how does
this patch change it?

Code wise:
Reviewed-by and Tested-by: David Ahern <[email protected]>

2012-05-22 04:38:25

by David Ahern

Subject: Re: [PATCH 6/7] perf, tool: Fix endianity trick for adds_features bitmask

On 5/16/12 12:59 AM, Jiri Olsa wrote:
> Addons bitmask is stored as an array of unsigned long values. The size
> of unsigned long is the same as the pointer size of the architecture, so it
> can differ between architectures.
>
> To handle the endianity for adds_features bitmask, we first swap the
> bitmask as u64 values and check for the HEADER_HOSTNAME bit. If it is not
> set, we want to unswap the u64 values and swap the adds_features as u32 values.
>
> This is currently buggy, since we swap just the first 32 bits of each u64
> value. Adding swap of the next 32 bits as well. Also adding and using
> BITS_TO_U64 instead of BITS_TO_LONGS as counter max due to the different
> size of unsigned longs per architecture.
>
> Note, running the following to test perf endianity handling:
> test 1)
> - origin system:
> # perf record -a -- sleep 10 (any perf record will do)
> # perf report > report.origin
> # perf archive perf.data
>
> - copy the perf.data, report.origin and perf.data.tar.bz2
> to a target system and run:
> # tar xjvf perf.data.tar.bz2 -C ~/.debug
> # perf report > report.target
> # diff -u report.origin report.target
>
> - the diff should produce no output
> (besides some white space stuff and possibly different
> date/TZ output)
>
> test 2)
> - origin system:
> # perf record -ag -fo /tmp/perf.data -- sleep 1
> - mount origin system root to the target system on /mnt/origin
> - target system:
> # perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
> --kallsyms /mnt/origin/proc/kallsyms
> - complete perf.data header is displayed
>
> Signed-off-by: Jiri Olsa <[email protected]>

32-bit ppc reading 64-bit x86 still does not work:

# ========
# captured on: Sun May 20 19:23:23 2012
# ========
#

Why not? It suggests there is still a bug in the processing of the
adds_feature bitmap.

David

2012-05-22 08:42:14

by Jiri Olsa

Subject: Re: [PATCH 6/7] perf, tool: Fix endianity trick for adds_features bitmask

On Mon, May 21, 2012 at 10:38:17PM -0600, David Ahern wrote:
> On 5/16/12 12:59 AM, Jiri Olsa wrote:
> >Addons bitmask is stored as an array of unsigned long values. The size
> >of unsigned long is the same as the pointer size of the architecture, so it
> >can differ between architectures.
> >
> >To handle the endianity for adds_features bitmask, we first swap the
> >bitmask as u64 values and check for the HEADER_HOSTNAME bit. If it is not
> >set, we want to unswap the u64 values and swap the adds_features as u32 values.
> >
> >This is currently buggy, since we swap just the first 32 bits of each u64
> >value. Adding swap of the next 32 bits as well. Also adding and using
> >BITS_TO_U64 instead of BITS_TO_LONGS as counter max due to the different
> >size of unsigned longs per architecture.
> >
> >Note, running the following to test perf endianity handling:
> >test 1)
> > - origin system:
> > # perf record -a -- sleep 10 (any perf record will do)
> > # perf report > report.origin
> > # perf archive perf.data
> >
> > - copy the perf.data, report.origin and perf.data.tar.bz2
> > to a target system and run:
> > # tar xjvf perf.data.tar.bz2 -C ~/.debug
> > # perf report > report.target
> > # diff -u report.origin report.target
> >
> > - the diff should produce no output
> > (besides some white space stuff and possibly different
> > date/TZ output)
> >
> >test 2)
> > - origin system:
> > # perf record -ag -fo /tmp/perf.data -- sleep 1
> > - mount origin system root to the target system on /mnt/origin
> > - target system:
> > # perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
> > --kallsyms /mnt/origin/proc/kallsyms
> > - complete perf.data header is displayed
> >
> >Signed-off-by: Jiri Olsa <[email protected]>
>
> 32-bit ppc reading 64-bit x86 still does not work:
>
> # ========
> # captured on: Sun May 20 19:23:23 2012
> # ========
> #
>
> Why not? It suggests there is still a bug in the processing of the
> adds_feature bitmap.

hm, any special details for the record? because I'm sure I tested this way..

I'll retest, thanks
jirka

2012-05-22 15:48:17

by David Ahern

Subject: Re: [PATCH 6/7] perf, tool: Fix endianity trick for adds_features bitmask

On 5/22/12 2:41 AM, Jiri Olsa wrote:

> hm, any special details for the record? because I'm sure I tested this way..
>
> I'll retest, thanks
> jirka


The attached fixes it.


Attachments:
perf-swap-features.patch (3.92 kB)

2012-05-22 15:54:18

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 5/7] perf, tool: Fix 32 bit values endianity swap for sample_id_all header

Em Mon, May 21, 2012 at 10:38:09PM -0600, David Ahern escreveu:
> On 5/16/12 12:59 AM, Jiri Olsa wrote:
> >We swap the sample_id_all header by u64 pointers. Some members
> >of the header happen to be 32 bit values. We need to handle them
> >separately.
> >
> >Note, running the following to test perf endianity handling:
> >test 1)
> > - origin system:
> > # perf record -a -- sleep 10 (any perf record will do)
> > # perf report > report.origin
> > # perf archive perf.data
> >
> > - copy the perf.data, report.origin and perf.data.tar.bz2
> > to a target system and run:
> > # tar xjvf perf.data.tar.bz2 -C ~/.debug
> > # perf report > report.target
> > # diff -u report.origin report.target
> >
> > - the diff should produce no output
> > (besides some white space stuff and possibly different
> > date/TZ output)
> >
> >test 2)
> > - origin system:
> > # perf record -ag -fo /tmp/perf.data -- sleep 1
> > - mount origin system root to the target system on /mnt/origin
> > - target system:
> > # perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
> > --kallsyms /mnt/origin/proc/kallsyms
> > - complete perf.data header is displayed
> >
> >Signed-off-by: Jiri Olsa <[email protected]>
>
> Same comment as the last - the commit log could use some words about
> command line arguments you are running and how the output is
> affected. As I mentioned perf-report/script on x86 processes data
> files and parses samples from 32-bit ppc just fine -- including tid,
> pid, comm, filenames, symbols, etc. So is the sample_id_all path run
> and how does this patch change it.

Agreed, Jiri, can you do it? I've applied (2,3)/7, waiting for David's
suggestions to be addressed to pick the rest.

- Arnaldo

> Code wise:
> Reviewed-by and Tested-by: David Ahern <[email protected]>

2012-05-23 15:27:14

by Jiri Olsa

Subject: [tip:perf/core] perf tools: Carry perf_event_attr bitfield throught different endians

Commit-ID: e108c66e2c458f89931189a63a67ad16880d7f51
Gitweb: http://git.kernel.org/tip/e108c66e2c458f89931189a63a67ad16880d7f51
Author: Jiri Olsa <[email protected]>
AuthorDate: Wed, 16 May 2012 08:59:03 +0200
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Tue, 22 May 2012 12:48:24 -0300

perf tools: Carry perf_event_attr bitfield throught different endians

When the perf data file is read across architectures, the
perf_event__attr_swap function takes care of the endianness of all the
struct fields except the bitfield flags.

The bitfield flags need to be transformed as well, since the bitfield
binary storage differs between the two endiannesses.

ABI says:
Bit-fields are allocated from right to left (least to most significant)
on little-endian implementations and from left to right (most to least
significant) on big-endian implementations.

The above seems to be byte specific, so we need to reverse each byte of
the bitfield. 'Internet' also says this might be implementation specific
and we probably need a proper fix that carries the perf_event_attr bitfield
flags in a separate data file FEAT_ section. Though this seems to work for now.

Note, running the following to test perf endianity handling:
test 1)
- origin system:
# perf record -a -- sleep 10 (any perf record will do)
# perf report > report.origin
# perf archive perf.data

- copy the perf.data, report.origin and perf.data.tar.bz2
to a target system and run:
# tar xjvf perf.data.tar.bz2 -C ~/.debug
# perf report > report.target
# diff -u report.origin report.target

- the diff should produce no output
(besides some white space stuff and possibly different
date/TZ output)

test 2)
- origin system:
# perf record -ag -fo /tmp/perf.data -- sleep 1
- mount origin system root to the target system on /mnt/origin
- target system:
# perf script --symfs /mnt/origin -I -i /mnt/origin/tmp/perf.data \
--kallsyms /mnt/origin/proc/kallsyms
- complete perf.data header is displayed

Signed-off-by: Jiri Olsa <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Tested-by: David Ahern <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/session.c | 34 ++++++++++++++++++++++++++++++++++
1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 4dcc8f3..17c9ace 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -481,6 +481,38 @@ static void perf_event__read_swap(union perf_event *event)
 	event->read.id = bswap_64(event->read.id);
 }

+static u8 revbyte(u8 b)
+{
+	int rev = (b >> 4) | ((b & 0xf) << 4);
+	rev = ((rev & 0xcc) >> 2) | ((rev & 0x33) << 2);
+	rev = ((rev & 0xaa) >> 1) | ((rev & 0x55) << 1);
+	return (u8) rev;
+}
+
+/*
+ * XXX this is hack in attempt to carry flags bitfield
+ * through endian village. ABI says:
+ *
+ * Bit-fields are allocated from right to left (least to most significant)
+ * on little-endian implementations and from left to right (most to least
+ * significant) on big-endian implementations.
+ *
+ * The above seems to be byte specific, so we need to reverse each
+ * byte of the bitfield. 'Internet' also says this might be implementation
+ * specific and we probably need proper fix and carry perf_event_attr
+ * bitfield flags in separate data file FEAT_ section. Though this seems
+ * to work for now.
+ */
+static void swap_bitfield(u8 *p, unsigned len)
+{
+	unsigned i;
+
+	for (i = 0; i < len; i++) {
+		*p = revbyte(*p);
+		p++;
+	}
+}
+
 /* exported for swapping attributes in file header */
 void perf_event__attr_swap(struct perf_event_attr *attr)
 {
@@ -494,6 +526,8 @@ void perf_event__attr_swap(struct perf_event_attr *attr)
 	attr->bp_type = bswap_32(attr->bp_type);
 	attr->bp_addr = bswap_64(attr->bp_addr);
 	attr->bp_len = bswap_64(attr->bp_len);
+
+	swap_bitfield((u8 *) (&attr->read_format + 1), sizeof(u64));
 }

 static void perf_event__hdr_attr_swap(union perf_event *event)
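
As a way to sanity-check the per-byte bit reversal described in the commit message, the sketch below reverses the bits of each byte with a plain loop instead of the mask-and-shift steps used in the patch. The names reverse_byte() and reverse_bits_in_bytes() are hypothetical and only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order within one byte: bit 0 <-> bit 7, bit 1 <-> bit 6, ... */
static uint8_t reverse_byte(uint8_t b)
{
	uint8_t out = 0;
	int i;

	for (i = 0; i < 8; i++) {
		/* Shift the result left and append the current low bit of b. */
		out = (uint8_t)((out << 1) | (b & 1));
		b >>= 1;
	}
	return out;
}

/* Apply the reversal to every byte of a buffer, leaving byte order alone. */
static void reverse_bits_in_bytes(uint8_t *p, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		p[i] = reverse_byte(p[i]);
}
```

For example, 0x01 becomes 0x80 and 0x12 becomes 0x48. Applying the reversal twice restores the original bytes, so the same routine works in both directions.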

2012-05-23 15:27:52

by Jiri Olsa

Subject: [tip:perf/core] perf tools: Add union u64_swap type for swapping u64 data

Commit-ID: 6a11f92ef449bfb87f93e7cc14cb2a717afc7aa3
Gitweb: http://git.kernel.org/tip/6a11f92ef449bfb87f93e7cc14cb2a717afc7aa3
Author: Jiri Olsa <[email protected]>
AuthorDate: Wed, 16 May 2012 08:59:04 +0200
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Tue, 22 May 2012 12:50:25 -0300

perf tools: Add union u64_swap type for swapping u64 data

The following union:
union {
u64 val64;
u32 val32[2];
} u;

is used in more than one place in the perf code and will be used more in
upcoming patches.

Adding union u64_swap to have it defined globally so we don't need to
redefine it all the time.

Signed-off-by: Jiri Olsa <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Tested-by: David Ahern <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/evsel.c | 10 ++--------
tools/perf/util/types.h | 5 +++++
2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 9abd8ac..57e4ce5 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -462,10 +462,7 @@ int perf_event__parse_sample(const union perf_event *event, u64 type,
 	 * used for cross-endian analysis. See git commit 65014ab3
 	 * for why this goofiness is needed.
 	 */
-	union {
-		u64 val64;
-		u32 val32[2];
-	} u;
+	union u64_swap u;

 	memset(data, 0, sizeof(*data));
 	data->cpu = data->pid = data->tid = -1;
@@ -608,10 +605,7 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type,
 	 * used for cross-endian analysis. See git commit 65014ab3
 	 * for why this goofiness is needed.
 	 */
-	union {
-		u64 val64;
-		u32 val32[2];
-	} u;
+	union u64_swap u;

 	array = event->sample.array;

diff --git a/tools/perf/util/types.h b/tools/perf/util/types.h
index 5f3689a..c51fa6b 100644
--- a/tools/perf/util/types.h
+++ b/tools/perf/util/types.h
@@ -16,4 +16,9 @@ typedef signed short s16;
 typedef unsigned char u8;
 typedef signed char s8;

+union u64_swap {
+	u64 val64;
+	u32 val32[2];
+};
+
 #endif /* __PERF_TYPES_H */
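
A minimal sketch of how such a union can be used, assuming glibc's <byteswap.h> and fixed-width types from <stdint.h> in place of perf's u64/u32. It covers the case where a 64-bit word holding two 32-bit fields has already been byte-swapped as a single u64: the 64-bit swap is undone first, then each half is swapped on its own. The helper name unpack_u32_pair() is hypothetical.

```c
#include <stdint.h>
#include <byteswap.h>

union u64_swap {
	uint64_t val64;
	uint32_t val32[2];
};

/*
 * swapped_word was byte-swapped as one 64-bit value, but it really
 * carries two independent 32-bit fields. Recover both in host order.
 */
static void unpack_u32_pair(uint64_t swapped_word,
			    uint32_t *first, uint32_t *second)
{
	union u64_swap u;

	/* Undo the 64-bit swap to get back the original byte layout. */
	u.val64 = bswap_64(swapped_word);

	/* Swap each 32-bit half individually. */
	*first = bswap_32(u.val32[0]);
	*second = bswap_32(u.val32[1]);
}
```

Swapping the word as a u64 alone would both reverse the bytes of each field and exchange the two fields, which is why the halves are handled separately.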

2012-05-23 17:59:49

by Jiri Olsa

Subject: Re: [PATCH 6/7] perf, tool: Fix endianity trick for adds_features bitmask

On Tue, May 22, 2012 at 09:48:09AM -0600, David Ahern wrote:
> On 5/22/12 2:41 AM, Jiri Olsa wrote:
>
> >hm, any special details for the record? because I'm sure I tested this way..
> >
> >I'll retest, thanks
> >jirka
>
>
> The attached fixes it.

> commit 1353676ca6551a0165df030784ada20ebea73f73
> Author: David Ahern <[email protected]>
> Date: Tue May 22 09:40:17 2012 -0600
>
> perf, tool: Fix endianity swapping for adds_features bitmask
>
> Based on Jiri's latest attempt:
> https://lkml.org/lkml/2012/5/16/61
>
> Basically, adds_features should be byte swapped assuming unsigned
> longs are either 8-bytes (u64) or 4-bytes (u32).
>
> Fixes 32-bit ppc dumping 64-bit x86 feature data:
> ========
> captured on: Sun May 20 19:23:23 2012
> hostname : nxos-vdc-dev3
> os release : 3.4.0-rc7+
> perf version : 3.4.rc4.137.g978da3
> arch : x86_64
> nrcpus online : 16
> nrcpus avail : 16
> cpudesc : Intel(R) Xeon(R) CPU E5540 @ 2.53GHz
> cpuid : GenuineIntel,6,26,5
> total memory : 24680324 kB
> ...
>
> Verified 64-bit x86 can still dump feature data for 32-bit ppc.
>
> Signed-off-by: David Ahern <[email protected]>

I got the header properly displayed with this patch, but I'm getting
the following diffs in the perf report output (ppc32 vs x86_64),
after moving the origin perf archive build-id cache to the target system:

- 0.00% perf [ext4] [k] 0x0005b318
+ 0.00% perf [ext4] [k] .cleanup_module

- 0.00% yes [kernel.kallsyms] [k] .sys_write
+ 0.00% yes [kernel.kallsyms] [k] .SyS_write

^^^ this one is particularly disturbing ;)

I guess it's unrelated to the header stuff, which I think your patch fixes
properly, but I got a small conflict rebasing this onto the current tip.

Reviewed-by: Jiri Olsa <[email protected]>

2012-05-24 15:32:20

by David Ahern

Subject: Re: [PATCH 6/7] perf, tool: Fix endianity trick for adds_features bitmask

On 5/23/12 11:59 AM, Jiri Olsa wrote:
> On Tue, May 22, 2012 at 09:48:09AM -0600, David Ahern wrote:
>> On 5/22/12 2:41 AM, Jiri Olsa wrote:
>>
>>> hm, any special details for the record? because I'm sure I tested this way..
>>>
>>> I'll retest, thanks
>>> jirka
>>
>>
>> The attached fixes it.
>
>> commit 1353676ca6551a0165df030784ada20ebea73f73
>> Author: David Ahern<[email protected]>
>> Date: Tue May 22 09:40:17 2012 -0600
>>
>> perf, tool: Fix endianity swapping for adds_features bitmask
>>
>> Based on Jiri's latest attempt:
>> https://lkml.org/lkml/2012/5/16/61
>>
>> Basically, adds_features should be byte swapped assuming unsigned
>> longs are either 8-bytes (u64) or 4-bytes (u32).
>>
>> Fixes 32-bit ppc dumping 64-bit x86 feature data:
>> ========
>> captured on: Sun May 20 19:23:23 2012
>> hostname : nxos-vdc-dev3
>> os release : 3.4.0-rc7+
>> perf version : 3.4.rc4.137.g978da3
>> arch : x86_64
>> nrcpus online : 16
>> nrcpus avail : 16
>> cpudesc : Intel(R) Xeon(R) CPU E5540 @ 2.53GHz
>> cpuid : GenuineIntel,6,26,5
>> total memory : 24680324 kB
>> ...
>>
>> Verified 64-bit x86 can still dump feature data for 32-bit ppc.
>>
>> Signed-off-by: David Ahern<[email protected]>
>
> I got the header properly displayed with this patch, but I'm getting
> following diffs in the perf report output (ppc32 vs x86_64):
> (after moving origin perf archive build-id cache to target system)
>
> - 0.00% perf [ext4] [k] 0x0005b318
> + 0.00% perf [ext4] [k] .cleanup_module
>
> - 0.00% yes [kernel.kallsyms] [k] .sys_write
> + 0.00% yes [kernel.kallsyms] [k] .SyS_write
>
> ^^^ this one is particularly disturbing ;)
>
> I guess it's unrelated to the header stuff which your patch fixes
> properly I think, but I got small conflict rebasing this to current tip
>
> Reviewed-by: Jiri Olsa<[email protected]>

That is odd... and if you are getting that much you are ahead of me.
When I analyze an x86_64 file on ppc32 all symbols show as
kernel.kallsyms dso.

The patch applies cleanly for me on latest acme/core:
$ patch -p1 < perf-swap-features.patch
patching file tools/perf/util/header.c
patching file tools/perf/util/include/linux/bitops.h
patching file tools/perf/util/session.c
patching file tools/perf/util/session.h

David

2012-05-24 19:49:13

by Jiri Olsa

Subject: Re: [PATCH 6/7] perf, tool: Fix endianity trick for adds_features bitmask

On Thu, May 24, 2012 at 09:32:10AM -0600, David Ahern wrote:
> On 5/23/12 11:59 AM, Jiri Olsa wrote:
> >On Tue, May 22, 2012 at 09:48:09AM -0600, David Ahern wrote:
> >>On 5/22/12 2:41 AM, Jiri Olsa wrote:
> >>
> >>>hm, any special details for the record? because I'm sure I tested this way..
> >>>
> >>>I'll retest, thanks
> >>>jirka
> >>
> >>
> >>The attached fixes it.
> >
> >>commit 1353676ca6551a0165df030784ada20ebea73f73
> >>Author: David Ahern<[email protected]>
> >>Date: Tue May 22 09:40:17 2012 -0600
> >>
> >> perf, tool: Fix endianity swapping for adds_features bitmask
> >>
> >> Based on Jiri's latest attempt:
> >> https://lkml.org/lkml/2012/5/16/61
> >>
> >> Basically, adds_features should be byte swapped assuming unsigned
> >> longs are either 8-bytes (u64) or 4-bytes (u32).
> >>
> >> Fixes 32-bit ppc dumping 64-bit x86 feature data:
> >> ========
> >> captured on: Sun May 20 19:23:23 2012
> >> hostname : nxos-vdc-dev3
> >> os release : 3.4.0-rc7+
> >> perf version : 3.4.rc4.137.g978da3
> >> arch : x86_64
> >> nrcpus online : 16
> >> nrcpus avail : 16
> >> cpudesc : Intel(R) Xeon(R) CPU E5540 @ 2.53GHz
> >> cpuid : GenuineIntel,6,26,5
> >> total memory : 24680324 kB
> >> ...
> >>
> >> Verified 64-bit x86 can still dump feature data for 32-bit ppc.
> >>
> >> Signed-off-by: David Ahern<[email protected]>
> >
> >I got the header properly displayed with this patch, but I'm getting
> >following diffs in the perf report output (ppc32 vs x86_64):
> >(after moving origin perf archive build-id cache to target system)
> >
> >- 0.00% perf [ext4] [k] 0x0005b318
> >+ 0.00% perf [ext4] [k] .cleanup_module
> >
> >- 0.00% yes [kernel.kallsyms] [k] .sys_write
> >+ 0.00% yes [kernel.kallsyms] [k] .SyS_write
> >
> > ^^^ this one is particularly disturbing ;)
> >
> >I guess it's unrelated to the header stuff which your patch fixes
> >properly I think, but I got small conflict rebasing this to current tip
> >
> >Reviewed-by: Jiri Olsa<[email protected]>
>
> That is odd... and if you are getting that much you are ahead of me.
> When I analyze an x86_64 file on ppc32 all symbols show as
> kernel.kallsyms dso.

I generated some load and collected system-wide data

>
> The patch applies cleanly for me on latest acme/core:
> $ patch -p1 < perf-swap-features.patch
> patching file tools/perf/util/header.c
> patching file tools/perf/util/include/linux/bitops.h
> patching file tools/perf/util/session.c
> patching file tools/perf/util/session.h

hm, I had some trouble applying this to the current tip, maybe it's ok now

I'll deal with this issue and send the rest of the patches I have (updated
with your comments) together with the new fix.

jirka

2012-06-15 19:11:36

by David Ahern

Subject: [tip:perf/urgent] perf tools: Fix endianity swapping for adds_features bitmask

Commit-ID: 80c0120a3cca30166c0ab8b24e44be67e97b79af
Gitweb: http://git.kernel.org/tip/80c0120a3cca30166c0ab8b24e44be67e97b79af
Author: David Ahern <[email protected]>
AuthorDate: Fri, 8 Jun 2012 11:47:51 -0300
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Mon, 11 Jun 2012 11:20:01 -0300

perf tools: Fix endianity swapping for adds_features bitmask

Based on Jiri's latest attempt:
https://lkml.org/lkml/2012/5/16/61

Basically, adds_features should be byte swapped assuming unsigned
longs are either 8-bytes (u64) or 4-bytes (u32).

Fixes 32-bit ppc dumping 64-bit x86 feature data:
========
captured on: Sun May 20 19:23:23 2012
hostname : nxos-vdc-dev3
os release : 3.4.0-rc7+
perf version : 3.4.rc4.137.g978da3
arch : x86_64
nrcpus online : 16
nrcpus avail : 16
cpudesc : Intel(R) Xeon(R) CPU E5540 @ 2.53GHz
cpuid : GenuineIntel,6,26,5
total memory : 24680324 kB
...

Verified 64-bit x86 can still dump feature data for 32-bit ppc.

Signed-off-by: David Ahern <[email protected]>
Reviewed-by: Jiri Olsa <[email protected]>
Cc: Corey Ashford <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/header.c | 16 +++++++++-------
tools/perf/util/include/linux/bitops.h | 2 ++
tools/perf/util/session.c | 10 ++++++++++
tools/perf/util/session.h | 1 +
4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 2dd5edf..4f9b247 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1942,7 +1942,6 @@ int perf_file_header__read(struct perf_file_header *header,
 		else
 			return -1;
 	} else if (ph->needs_swap) {
-		unsigned int i;
 		/*
 		 * feature bitmap is declared as an array of unsigned longs --
 		 * not good since its size can differ between the host that
@@ -1958,14 +1957,17 @@ int perf_file_header__read(struct perf_file_header *header,
 		 * file), punt and fallback to the original behavior --
 		 * clearing all feature bits and setting buildid.
 		 */
-		for (i = 0; i < BITS_TO_LONGS(HEADER_FEAT_BITS); ++i)
-			header->adds_features[i] = bswap_64(header->adds_features[i]);
+		mem_bswap_64(&header->adds_features,
+			    BITS_TO_U64(HEADER_FEAT_BITS));

 		if (!test_bit(HEADER_HOSTNAME, header->adds_features)) {
-			for (i = 0; i < BITS_TO_LONGS(HEADER_FEAT_BITS); ++i) {
-				header->adds_features[i] = bswap_64(header->adds_features[i]);
-				header->adds_features[i] = bswap_32(header->adds_features[i]);
-			}
+			/* unswap as u64 */
+			mem_bswap_64(&header->adds_features,
+				    BITS_TO_U64(HEADER_FEAT_BITS));
+
+			/* unswap as u32 */
+			mem_bswap_32(&header->adds_features,
+				    BITS_TO_U32(HEADER_FEAT_BITS));
 		}

 		if (!test_bit(HEADER_HOSTNAME, header->adds_features)) {
diff --git a/tools/perf/util/include/linux/bitops.h b/tools/perf/util/include/linux/bitops.h
index f1584833..587a230 100644
--- a/tools/perf/util/include/linux/bitops.h
+++ b/tools/perf/util/include/linux/bitops.h
@@ -8,6 +8,8 @@
 #define BITS_PER_LONG __WORDSIZE
 #define BITS_PER_BYTE 8
 #define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
+#define BITS_TO_U64(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(u64))
+#define BITS_TO_U32(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(u32))

 #define for_each_set_bit(bit, addr, size) \
 	for ((bit) = find_first_bit((addr), (size)); \
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 2600916..c3e399b 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -442,6 +442,16 @@ static void perf_tool__fill_defaults(struct perf_tool *tool)
 			tool->finished_round = process_finished_round_stub;
 	}
 }
+
+void mem_bswap_32(void *src, int byte_size)
+{
+	u32 *m = src;
+	while (byte_size > 0) {
+		*m = bswap_32(*m);
+		byte_size -= sizeof(u32);
+		++m;
+	}
+}

 void mem_bswap_64(void *src, int byte_size)
 {
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 7a5434c..0c702e3 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -80,6 +80,7 @@ struct branch_info *machine__resolve_bstack(struct machine *self,
 bool perf_session__has_traces(struct perf_session *self, const char *msg);

 void mem_bswap_64(void *src, int byte_size);
+void mem_bswap_32(void *src, int byte_size);
 void perf_event__attr_swap(struct perf_event_attr *attr);

 int perf_session__create_kernel_maps(struct perf_session *self);
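
To illustrate why the word size matters when swapping a bitmap, here is a small standalone sketch, assuming glibc's <byteswap.h>. It swaps a byte buffer as 64-bit words and as 32-bit words and shows that the two give different layouts. The helpers swap_words_64() and swap_words_32() are hypothetical names, take a length in bytes, and use memcpy() so the sketch does not depend on pointer aliasing or alignment; they are not the mem_bswap_*() functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <byteswap.h>

/* Byte-swap every 64-bit word in buf; byte_size must be a multiple of 8. */
static void swap_words_64(uint8_t *buf, size_t byte_size)
{
	size_t off;

	assert(byte_size % sizeof(uint64_t) == 0);
	for (off = 0; off < byte_size; off += sizeof(uint64_t)) {
		uint64_t w;

		memcpy(&w, buf + off, sizeof(w));
		w = bswap_64(w);
		memcpy(buf + off, &w, sizeof(w));
	}
}

/* Byte-swap every 32-bit word in buf; byte_size must be a multiple of 4. */
static void swap_words_32(uint8_t *buf, size_t byte_size)
{
	size_t off;

	assert(byte_size % sizeof(uint32_t) == 0);
	for (off = 0; off < byte_size; off += sizeof(uint32_t)) {
		uint32_t w;

		memcpy(&w, buf + off, sizeof(w));
		w = bswap_32(w);
		memcpy(buf + off, &w, sizeof(w));
	}
}
```

For the bytes 0..7, the 64-bit swap yields 7 6 5 4 3 2 1 0, while the 32-bit swap yields 3 2 1 0 7 6 5 4. A bitmap written as 32-bit longs therefore has to be restored and re-swapped in 32-bit units when the 64-bit attempt does not produce the expected bit.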