2012-05-12 19:54:20

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [GIT PULL 0/8] Annotation weekly ponies delivery

Hi Ingo,

Please consider pulling,

- Arnaldo

The following changes since commit 5dcefda0fd87fefa440abc9b9d3f1089229f8911:

Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2012-05-11 08:13:55 +0200)

are available in the git repository at:


git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux tags/perf-core-for-mingo

for you to fetch changes up to 54e7a4e88eed9ac423e22a259ec51a973fd59bab:

perf annotate browser: Add key bindings help window (2012-05-12 16:36:55 -0300)

----------------------------------------------------------------
Arjan & Linus Annotation Edition

More like Linus', but hey at least an attempt at implementing a suggestion by
Arjan made this time around!

. Fix indirect calls beautifier, reported by Linus.

. Use the objdump comments to nuke specificities about how access to a well
know variable is encoded, suggested by Linus.

. Show the number of places that jump to a target, requested by Arjan.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

----------------------------------------------------------------
Arnaldo Carvalho de Melo (8):
perf annotate: Use raw form for register indirect call instructions
perf annotate: Resolve symbols using objdump comment
perf annotate: Resolve symbols using objdump comment for single op ins
perf annotate: Augment lock instruction output
perf annotate: Introduce ->free() method in ins_ops
perf annotate browser: Count the numbers of jump sources to a target
perf annotate browser: Show 'jumpy' functions
perf annotate browser: Add key bindings help window

tools/perf/ui/browsers/annotate.c | 94 ++++++++++--
tools/perf/util/annotate.c | 303 +++++++++++++++++++++++++++++++++----
tools/perf/util/annotate.h | 15 +-
3 files changed, 369 insertions(+), 43 deletions(-)


2012-05-12 19:53:22

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 1/8] perf annotate: Use raw form for register indirect call instructions

From: Arnaldo Carvalho de Melo <[email protected]>

callq *0x10(%rax)

was being rendered in simplified mode as:

callq *10

I.e. hexa, but without the 0x and omitting the register. In such cases
just use the raw form.

Reported-by: Linus Torvalds <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/annotate.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 6b4146b..9a020d1 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -56,6 +56,12 @@ static int call__parse(struct ins_operands *ops)
return ops->target.name == NULL ? -1 : 0;

indirect_call:
+ tok = strchr(endptr, '(');
+ if (tok != NULL) {
+ ops->target.addr = 0;
+ return 0;
+ }
+
tok = strchr(endptr, '*');
if (tok == NULL)
return -1;
@@ -70,6 +76,9 @@ static int call__scnprintf(struct ins *ins, char *bf, size_t size,
if (ops->target.name)
return scnprintf(bf, size, "%-6.6s %s", ins->name, ops->target.name);

+ if (ops->target.addr == 0)
+ return ins__raw_scnprintf(ins, bf, size, ops);
+
return scnprintf(bf, size, "%-6.6s *%" PRIx64, ins->name, ops->target.addr);
}

--
1.7.9.2.358.g22243

2012-05-12 19:53:23

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 8/8] perf annotate browser: Add key bindings help window

From: Arnaldo Carvalho de Melo <[email protected]>

Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/ui/browsers/annotate.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 547bd35..6e0ef79 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -581,10 +581,7 @@ static int annotate_browser__run(struct annotate_browser *self, int evidx,
struct rb_node *nd = NULL;
struct map_symbol *ms = self->b.priv;
struct symbol *sym = ms->sym;
- const char *help = "<-/ESC: Exit, TAB/shift+TAB: Cycle hot lines, "
- "H: Hottest line, ->/ENTER: Line action, "
- "O: Offset view, "
- "S: Source view";
+ const char *help = "Press 'h' for help on key bindings";
int key;

if (ui_browser__show(&self->b, sym->name, help) < 0)
@@ -637,16 +634,30 @@ static int annotate_browser__run(struct annotate_browser *self, int evidx,
else
nd = self->curr_hot;
break;
- case 'H':
+ case K_F1:
case 'h':
+ ui_browser__help_window(&self->b,
+ "UP/DOWN/PGUP\n"
+ "PGDN/SPACE Navigate\n"
+ "q/ESC/CTRL+C Exit\n\n"
+ "-> Go to target\n"
+ "<- Exit\n"
+ "h Cycle thru hottest instructions\n"
+ "j Toggle showing jump to target arrows\n"
+ "J Toggle showing number of jump sources on targets\n"
+ "n Search next string\n"
+ "o Toggle disassembler output/simplified view\n"
+ "s Toggle source code view\n"
+ "/ Search string\n"
+ "? Search previous string\n");
+ continue;
+ case 'H':
nd = self->curr_hot;
break;
- case 'S':
case 's':
if (annotate_browser__toggle_source(self))
ui_helpline__puts(help);
continue;
- case 'O':
case 'o':
self->use_offset = !self->use_offset;
if (self->use_offset)
--
1.7.9.2.358.g22243

2012-05-12 19:54:30

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 3/8] perf annotate: Resolve symbols using objdump comment for single op ins

From: Arnaldo Carvalho de Melo <[email protected]>

Starting with inc, incl, dec, decl.

Requested-by: Linus Torvalds <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/annotate.c | 45 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 82c7f63..a6109dc 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -205,6 +205,47 @@ static struct ins_ops mov_ops = {
.scnprintf = mov__scnprintf,
};

+static int dec__parse(struct ins_operands *ops)
+{
+ char *target, *comment, *s, prev;
+
+ target = s = ops->raw;
+
+ while (s[0] != '\0' && !isspace(s[0]))
+ ++s;
+ prev = *s;
+ *s = '\0';
+
+ ops->target.raw = strdup(target);
+ *s = prev;
+
+ if (ops->target.raw == NULL)
+ return -1;
+
+ comment = strchr(s, '#');
+ if (comment == NULL)
+ return 0;
+
+ while (comment[0] != '\0' && isspace(comment[0]))
+ ++comment;
+
+ comment__symbol(ops->target.raw, comment, &ops->target.addr, &ops->target.name);
+
+ return 0;
+}
+
+static int dec__scnprintf(struct ins *ins, char *bf, size_t size,
+ struct ins_operands *ops)
+{
+ return scnprintf(bf, size, "%-6.6s %s", ins->name,
+ ops->target.name ?: ops->target.raw);
+}
+
+static struct ins_ops dec_ops = {
+ .parse = dec__parse,
+ .scnprintf = dec__scnprintf,
+};
+
static int nop__scnprintf(struct ins *ins __used, char *bf, size_t size,
struct ins_operands *ops __used)
{
@@ -232,7 +273,11 @@ static struct ins instructions[] = {
{ .name = "cmpq", .ops = &mov_ops, },
{ .name = "cmpw", .ops = &mov_ops, },
{ .name = "cmpxch", .ops = &mov_ops, },
+ { .name = "dec", .ops = &dec_ops, },
+ { .name = "decl", .ops = &dec_ops, },
{ .name = "imul", .ops = &mov_ops, },
+ { .name = "inc", .ops = &dec_ops, },
+ { .name = "incl", .ops = &dec_ops, },
{ .name = "ja", .ops = &jump_ops, },
{ .name = "jae", .ops = &jump_ops, },
{ .name = "jb", .ops = &jump_ops, },
--
1.7.9.2.358.g22243

2012-05-12 19:54:32

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 6/8] perf annotate browser: Count the numbers of jump sources to a target

From: Arnaldo Carvalho de Melo <[email protected]>

Instead of simply marking an offset as a jump target. So that we can
implement a new feature: showing "jumpy" targets, I.e. addresses that
lots of places jump to.

Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/ui/browsers/annotate.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 06367c1..4463626 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -16,7 +16,7 @@ struct browser_disasm_line {
double percent;
u32 idx;
int idx_asm;
- bool jump_target;
+ int jump_sources;
};

struct annotate_browser {
@@ -98,7 +98,7 @@ static void annotate_browser__write(struct ui_browser *self, void *entry, int ro
if (!ab->use_offset) {
printed = scnprintf(bf, sizeof(bf), "%" PRIx64 ": ", addr);
} else {
- if (bdl->jump_target) {
+ if (bdl->jump_sources) {
printed = scnprintf(bf, sizeof(bf), "%*" PRIx64 ": ",
ab->addr_width, addr);
} else {
@@ -707,7 +707,7 @@ static void annotate_browser__mark_jump_targets(struct annotate_browser *browser
continue;

bdlt = disasm_line__browser(dlt);
- bdlt->jump_target = true;
+ ++bdlt->jump_sources;
}

}
--
1.7.9.2.358.g22243

2012-05-12 19:54:27

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 7/8] perf annotate browser: Show 'jumpy' functions

From: Arnaldo Carvalho de Melo <[email protected]>

Just press 'J' and see how many places jump to jump targets.

The hottest jump target appears in red, targets with more than one
source have a different color than single source jump targets.

Suggested-by: Arjan van de Ven <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/ui/browsers/annotate.c | 65 ++++++++++++++++++++++++++++++++++---
1 file changed, 60 insertions(+), 5 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 4463626..547bd35 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -28,11 +28,16 @@ struct annotate_browser {
u64 start;
int nr_asm_entries;
int nr_entries;
+ int max_jump_sources;
+ int nr_jumps;
bool hide_src_code;
bool use_offset;
bool jump_arrows;
+ bool show_nr_jumps;
bool searching_backwards;
u8 addr_width;
+ u8 jumps_width;
+ u8 target_width;
u8 min_addr_width;
u8 max_addr_width;
char search_bf[128];
@@ -55,6 +60,25 @@ static bool disasm_line__filter(struct ui_browser *browser, void *entry)
return false;
}

+static int annotate_browser__jumps_percent_color(struct annotate_browser *browser,
+ int nr, bool current)
+{
+ if (current && (!browser->b.use_navkeypressed || browser->b.navkeypressed))
+ return HE_COLORSET_SELECTED;
+ if (nr == browser->max_jump_sources)
+ return HE_COLORSET_TOP;
+ if (nr > 1)
+ return HE_COLORSET_MEDIUM;
+ return HE_COLORSET_NORMAL;
+}
+
+static int annotate_browser__set_jumps_percent_color(struct annotate_browser *browser,
+ int nr, bool current)
+{
+ int color = annotate_browser__jumps_percent_color(browser, nr, current);
+ return ui_browser__set_color(&browser->b, color);
+}
+
static void annotate_browser__write(struct ui_browser *self, void *entry, int row)
{
struct annotate_browser *ab = container_of(self, struct annotate_browser, b);
@@ -99,8 +123,19 @@ static void annotate_browser__write(struct ui_browser *self, void *entry, int ro
printed = scnprintf(bf, sizeof(bf), "%" PRIx64 ": ", addr);
} else {
if (bdl->jump_sources) {
+ if (ab->show_nr_jumps) {
+ int prev;
+ printed = scnprintf(bf, sizeof(bf), "%*d ",
+ ab->jumps_width,
+ bdl->jump_sources);
+ prev = annotate_browser__set_jumps_percent_color(ab, bdl->jump_sources,
+ current_entry);
+ slsmg_write_nstring(bf, printed);
+ ui_browser__set_color(self, prev);
+ }
+
printed = scnprintf(bf, sizeof(bf), "%*" PRIx64 ": ",
- ab->addr_width, addr);
+ ab->target_width, addr);
} else {
printed = scnprintf(bf, sizeof(bf), "%*s ",
ab->addr_width, " ");
@@ -615,13 +650,20 @@ static int annotate_browser__run(struct annotate_browser *self, int evidx,
case 'o':
self->use_offset = !self->use_offset;
if (self->use_offset)
- self->addr_width = self->min_addr_width;
+ self->target_width = self->min_addr_width;
else
- self->addr_width = self->max_addr_width;
+ self->target_width = self->max_addr_width;
+update_addr_width:
+ self->addr_width = self->target_width;
+ if (self->show_nr_jumps)
+ self->addr_width += self->jumps_width + 1;
continue;
case 'j':
self->jump_arrows = !self->jump_arrows;
continue;
+ case 'J':
+ self->show_nr_jumps = !self->show_nr_jumps;
+ goto update_addr_width;
case '/':
if (annotate_browser__search(self, delay_secs)) {
show_help:
@@ -707,11 +749,23 @@ static void annotate_browser__mark_jump_targets(struct annotate_browser *browser
continue;

bdlt = disasm_line__browser(dlt);
- ++bdlt->jump_sources;
+ if (++bdlt->jump_sources > browser->max_jump_sources)
+ browser->max_jump_sources = bdlt->jump_sources;
+
+ ++browser->nr_jumps;
}

}

+static inline int width_jumps(int n)
+{
+ if (n >= 100)
+ return 5;
+ if (n / 10)
+ return 2;
+ return 1;
+}
+
int symbol__tui_annotate(struct symbol *sym, struct map *map, int evidx,
void(*timer)(void *arg), void *arg,
int delay_secs)
@@ -784,8 +838,9 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map, int evidx,

annotate_browser__mark_jump_targets(&browser, size);

- browser.addr_width = browser.min_addr_width = hex_width(size);
+ browser.addr_width = browser.target_width = browser.min_addr_width = hex_width(size);
browser.max_addr_width = hex_width(sym->end);
+ browser.jumps_width = width_jumps(browser.max_jump_sources);
browser.b.nr_entries = browser.nr_entries;
browser.b.entries = &notes->src->source,
browser.b.width += 18; /* Percentage */
--
1.7.9.2.358.g22243

2012-05-12 19:54:24

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 4/8] perf annotate: Augment lock instruction output

From: Arnaldo Carvalho de Melo <[email protected]>

It just chops off the 'lock' and uses the ins__find, etc machinery to
call instruction specific parsers/beautifiers.

Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/annotate.c | 127 ++++++++++++++++++++++++++++++++++----------
tools/perf/util/annotate.h | 16 ++++--
2 files changed, 109 insertions(+), 34 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index a6109dc..1dce098 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -18,6 +18,9 @@

const char *disassembler_style;

+static struct ins *ins__find(const char *name);
+static int disasm_line__parse(char *line, char **namep, char **rawp);
+
static int ins__raw_scnprintf(struct ins *ins, char *bf, size_t size,
struct ins_operands *ops)
{
@@ -147,6 +150,53 @@ static int comment__symbol(char *raw, char *comment, u64 *addrp, char **namep)
return 0;
}

+static int lock__parse(struct ins_operands *ops)
+{
+ char *name;
+
+ ops->locked.ops = zalloc(sizeof(*ops->locked.ops));
+ if (ops->locked.ops == NULL)
+ return 0;
+
+ if (disasm_line__parse(ops->raw, &name, &ops->locked.ops->raw) < 0)
+ goto out_free_ops;
+
+ ops->locked.ins = ins__find(name);
+ if (ops->locked.ins == NULL)
+ goto out_free_ops;
+
+ if (!ops->locked.ins->ops)
+ return 0;
+
+ if (ops->locked.ins->ops->parse)
+ ops->locked.ins->ops->parse(ops->locked.ops);
+
+ return 0;
+
+out_free_ops:
+ free(ops->locked.ops);
+ ops->locked.ops = NULL;
+ return 0;
+}
+
+static int lock__scnprintf(struct ins *ins, char *bf, size_t size,
+ struct ins_operands *ops)
+{
+ int printed;
+
+ if (ops->locked.ins == NULL)
+ return ins__raw_scnprintf(ins, bf, size, ops);
+
+ printed = scnprintf(bf, size, "%-6.6s ", ins->name);
+ return printed + ins__scnprintf(ops->locked.ins, bf + printed,
+ size - printed, ops->locked.ops);
+}
+
+static struct ins_ops lock_ops = {
+ .parse = lock__parse,
+ .scnprintf = lock__scnprintf,
+};
+
static int mov__parse(struct ins_operands *ops)
{
char *s = strchr(ops->raw, ','), *target, *comment, prev;
@@ -265,6 +315,7 @@ static struct ins instructions[] = {
{ .name = "addq", .ops = &mov_ops, },
{ .name = "addw", .ops = &mov_ops, },
{ .name = "and", .ops = &mov_ops, },
+ { .name = "bts", .ops = &mov_ops, },
{ .name = "call", .ops = &call_ops, },
{ .name = "callq", .ops = &call_ops, },
{ .name = "cmp", .ops = &mov_ops, },
@@ -314,6 +365,7 @@ static struct ins instructions[] = {
{ .name = "js", .ops = &jump_ops, },
{ .name = "jz", .ops = &jump_ops, },
{ .name = "lea", .ops = &mov_ops, },
+ { .name = "lock", .ops = &lock_ops, },
{ .name = "mov", .ops = &mov_ops, },
{ .name = "movb", .ops = &mov_ops, },
{ .name = "movdqa",.ops = &mov_ops, },
@@ -330,6 +382,7 @@ static struct ins instructions[] = {
{ .name = "test", .ops = &mov_ops, },
{ .name = "testb", .ops = &mov_ops, },
{ .name = "testl", .ops = &mov_ops, },
+ { .name = "xadd", .ops = &mov_ops, },
};

static int ins__cmp(const void *name, const void *insp)
@@ -420,6 +473,44 @@ static void disasm_line__init_ins(struct disasm_line *dl)
dl->ins->ops->parse(&dl->ops);
}

+static int disasm_line__parse(char *line, char **namep, char **rawp)
+{
+ char *name = line, tmp;
+
+ while (isspace(name[0]))
+ ++name;
+
+ if (name[0] == '\0')
+ return -1;
+
+ *rawp = name + 1;
+
+ while ((*rawp)[0] != '\0' && !isspace((*rawp)[0]))
+ ++*rawp;
+
+ tmp = (*rawp)[0];
+ (*rawp)[0] = '\0';
+ *namep = strdup(name);
+
+ if (*namep == NULL)
+ goto out_free_name;
+
+ (*rawp)[0] = tmp;
+
+ if ((*rawp)[0] != '\0') {
+ (*rawp)++;
+ while (isspace((*rawp)[0]))
+ ++(*rawp);
+ }
+
+ return 0;
+
+out_free_name:
+ free(*namep);
+ *namep = NULL;
+ return -1;
+}
+
static struct disasm_line *disasm_line__new(s64 offset, char *line, size_t privsize)
{
struct disasm_line *dl = zalloc(sizeof(*dl) + privsize);
@@ -431,35 +522,9 @@ static struct disasm_line *disasm_line__new(s64 offset, char *line, size_t privs
goto out_delete;

if (offset != -1) {
- char *name = dl->line, tmp;
-
- while (isspace(name[0]))
- ++name;
-
- if (name[0] == '\0')
- goto out_delete;
-
- dl->ops.raw = name + 1;
-
- while (dl->ops.raw[0] != '\0' &&
- !isspace(dl->ops.raw[0]))
- ++dl->ops.raw;
-
- tmp = dl->ops.raw[0];
- dl->ops.raw[0] = '\0';
- dl->name = strdup(name);
-
- if (dl->name == NULL)
+ if (disasm_line__parse(dl->line, &dl->name, &dl->ops.raw) < 0)
goto out_free_line;

- dl->ops.raw[0] = tmp;
-
- if (dl->ops.raw[0] != '\0') {
- dl->ops.raw++;
- while (isspace(dl->ops.raw[0]))
- ++dl->ops.raw;
- }
-
disasm_line__init_ins(dl);
}
}
@@ -477,8 +542,12 @@ void disasm_line__free(struct disasm_line *dl)
{
free(dl->line);
free(dl->name);
- free(dl->ops.source.raw);
- free(dl->ops.source.name);
+ if (dl->ins && dl->ins->ops == &lock_ops) {
+ free(dl->ops.locked.ops);
+ } else {
+ free(dl->ops.source.raw);
+ free(dl->ops.source.name);
+ }
free(dl->ops.target.raw);
free(dl->ops.target.name);
free(dl);
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 066d31d..b79d326 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -18,11 +18,17 @@ struct ins_operands {
u64 addr;
u64 offset;
} target;
- struct {
- char *raw;
- char *name;
- u64 addr;
- } source;
+ union {
+ struct {
+ char *raw;
+ char *name;
+ u64 addr;
+ } source;
+ struct {
+ struct ins *ins;
+ struct ins_operands *ops;
+ } locked;
+ };
};

struct ins_ops {
--
1.7.9.2.358.g22243

2012-05-12 19:54:22

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 5/8] perf annotate: Introduce ->free() method in ins_ops

From: Arnaldo Carvalho de Melo <[email protected]>

So that we don't special case disasm_line__free, allowing each
instruction class to provide an specialized destructor, like is needed
for 'lock'.

Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/annotate.c | 28 ++++++++++++++++++++--------
tools/perf/util/annotate.h | 1 +
2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 1dce098..8069dfb 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -21,6 +21,14 @@ const char *disassembler_style;
static struct ins *ins__find(const char *name);
static int disasm_line__parse(char *line, char **namep, char **rawp);

+static void ins__delete(struct ins_operands *ops)
+{
+ free(ops->source.raw);
+ free(ops->source.name);
+ free(ops->target.raw);
+ free(ops->target.name);
+}
+
static int ins__raw_scnprintf(struct ins *ins, char *bf, size_t size,
struct ins_operands *ops)
{
@@ -192,7 +200,15 @@ static int lock__scnprintf(struct ins *ins, char *bf, size_t size,
size - printed, ops->locked.ops);
}

+static void lock__delete(struct ins_operands *ops)
+{
+ free(ops->locked.ops);
+ free(ops->target.raw);
+ free(ops->target.name);
+}
+
static struct ins_ops lock_ops = {
+ .free = lock__delete,
.parse = lock__parse,
.scnprintf = lock__scnprintf,
};
@@ -542,14 +558,10 @@ void disasm_line__free(struct disasm_line *dl)
{
free(dl->line);
free(dl->name);
- if (dl->ins && dl->ins->ops == &lock_ops) {
- free(dl->ops.locked.ops);
- } else {
- free(dl->ops.source.raw);
- free(dl->ops.source.name);
- }
- free(dl->ops.target.raw);
- free(dl->ops.target.name);
+ if (dl->ins && dl->ins->ops->free)
+ dl->ins->ops->free(&dl->ops);
+ else
+ ins__delete(&dl->ops);
free(dl);
}

diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index b79d326..78a5692 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -32,6 +32,7 @@ struct ins_operands {
};

struct ins_ops {
+ void (*free)(struct ins_operands *ops);
int (*parse)(struct ins_operands *ops);
int (*scnprintf)(struct ins *ins, char *bf, size_t size,
struct ins_operands *ops);
--
1.7.9.2.358.g22243

2012-05-12 19:54:17

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: [PATCH 2/8] perf annotate: Resolve symbols using objdump comment

From: Arnaldo Carvalho de Melo <[email protected]>

This:

mov 0x95bbb6(%rip),%ecx # ffffffff81ae8d04 <d_hash_shift>

Becomes:

mov d_hash_shift,%ecx

Ditto for many more instructions that take two operands.

Requested-by: Linus Torvalds <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/annotate.c | 112 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/annotate.h | 8 +++-
2 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 9a020d1..82c7f63 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -122,6 +122,89 @@ bool ins__is_jump(const struct ins *ins)
return ins->ops == &jump_ops;
}

+static int comment__symbol(char *raw, char *comment, u64 *addrp, char **namep)
+{
+ char *endptr, *name, *t;
+
+ if (strstr(raw, "(%rip)") == NULL)
+ return 0;
+
+ *addrp = strtoull(comment, &endptr, 16);
+ name = strchr(endptr, '<');
+ if (name == NULL)
+ return -1;
+
+ name++;
+
+ t = strchr(name, '>');
+ if (t == NULL)
+ return 0;
+
+ *t = '\0';
+ *namep = strdup(name);
+ *t = '>';
+
+ return 0;
+}
+
+static int mov__parse(struct ins_operands *ops)
+{
+ char *s = strchr(ops->raw, ','), *target, *comment, prev;
+
+ if (s == NULL)
+ return -1;
+
+ *s = '\0';
+ ops->source.raw = strdup(ops->raw);
+ *s = ',';
+
+ if (ops->source.raw == NULL)
+ return -1;
+
+ target = ++s;
+
+ while (s[0] != '\0' && !isspace(s[0]))
+ ++s;
+ prev = *s;
+ *s = '\0';
+
+ ops->target.raw = strdup(target);
+ *s = prev;
+
+ if (ops->target.raw == NULL)
+ goto out_free_source;
+
+ comment = strchr(s, '#');
+ if (comment == NULL)
+ return 0;
+
+ while (comment[0] != '\0' && isspace(comment[0]))
+ ++comment;
+
+ comment__symbol(ops->source.raw, comment, &ops->source.addr, &ops->source.name);
+ comment__symbol(ops->target.raw, comment, &ops->target.addr, &ops->target.name);
+
+ return 0;
+
+out_free_source:
+ free(ops->source.raw);
+ ops->source.raw = NULL;
+ return -1;
+}
+
+static int mov__scnprintf(struct ins *ins, char *bf, size_t size,
+ struct ins_operands *ops)
+{
+ return scnprintf(bf, size, "%-6.6s %s,%s", ins->name,
+ ops->source.name ?: ops->source.raw,
+ ops->target.name ?: ops->target.raw);
+}
+
+static struct ins_ops mov_ops = {
+ .parse = mov__parse,
+ .scnprintf = mov__scnprintf,
+};
+
static int nop__scnprintf(struct ins *ins __used, char *bf, size_t size,
struct ins_operands *ops __used)
{
@@ -136,8 +219,20 @@ static struct ins_ops nop_ops = {
* Must be sorted by name!
*/
static struct ins instructions[] = {
+ { .name = "add", .ops = &mov_ops, },
+ { .name = "addl", .ops = &mov_ops, },
+ { .name = "addq", .ops = &mov_ops, },
+ { .name = "addw", .ops = &mov_ops, },
+ { .name = "and", .ops = &mov_ops, },
{ .name = "call", .ops = &call_ops, },
{ .name = "callq", .ops = &call_ops, },
+ { .name = "cmp", .ops = &mov_ops, },
+ { .name = "cmpb", .ops = &mov_ops, },
+ { .name = "cmpl", .ops = &mov_ops, },
+ { .name = "cmpq", .ops = &mov_ops, },
+ { .name = "cmpw", .ops = &mov_ops, },
+ { .name = "cmpxch", .ops = &mov_ops, },
+ { .name = "imul", .ops = &mov_ops, },
{ .name = "ja", .ops = &jump_ops, },
{ .name = "jae", .ops = &jump_ops, },
{ .name = "jb", .ops = &jump_ops, },
@@ -173,9 +268,23 @@ static struct ins instructions[] = {
{ .name = "jrcxz", .ops = &jump_ops, },
{ .name = "js", .ops = &jump_ops, },
{ .name = "jz", .ops = &jump_ops, },
+ { .name = "lea", .ops = &mov_ops, },
+ { .name = "mov", .ops = &mov_ops, },
+ { .name = "movb", .ops = &mov_ops, },
+ { .name = "movdqa",.ops = &mov_ops, },
+ { .name = "movl", .ops = &mov_ops, },
+ { .name = "movq", .ops = &mov_ops, },
+ { .name = "movslq", .ops = &mov_ops, },
+ { .name = "movzbl", .ops = &mov_ops, },
+ { .name = "movzwl", .ops = &mov_ops, },
{ .name = "nop", .ops = &nop_ops, },
{ .name = "nopl", .ops = &nop_ops, },
{ .name = "nopw", .ops = &nop_ops, },
+ { .name = "or", .ops = &mov_ops, },
+ { .name = "orl", .ops = &mov_ops, },
+ { .name = "test", .ops = &mov_ops, },
+ { .name = "testb", .ops = &mov_ops, },
+ { .name = "testl", .ops = &mov_ops, },
};

static int ins__cmp(const void *name, const void *insp)
@@ -323,6 +432,9 @@ void disasm_line__free(struct disasm_line *dl)
{
free(dl->line);
free(dl->name);
+ free(dl->ops.source.raw);
+ free(dl->ops.source.name);
+ free(dl->ops.target.raw);
free(dl->ops.target.name);
free(dl);
}
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index bb0a9f2..066d31d 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -13,10 +13,16 @@ struct ins;
struct ins_operands {
char *raw;
struct {
+ char *raw;
char *name;
- u64 offset;
u64 addr;
+ u64 offset;
} target;
+ struct {
+ char *raw;
+ char *name;
+ u64 addr;
+ } source;
};

struct ins_ops {
--
1.7.9.2.358.g22243

2012-05-12 20:40:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL 0/8] Annotation weekly ponies delivery

On Sat, May 12, 2012 at 12:52 PM, Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> Arjan & Linus Annotation Edition

I pulled this into my current git tree just for testing.

It's horribly broken, but it's not the annotation that is broken.

The whole "-e cycles:pp" doesn't work any more. I don't get any nice
PEBS information, I get the totally useless irq-based profiling.

The difference for a "make -j" profile is quite stunning:

Doing "perf record -f -e cycles:pp -F 20000 make -j"

- my current git:

[ perf record: Woken up 47 times to write data ]
[ perf record: Captured and wrote 11.890 MB perf.data (~519462 samples) ]

- with the above tree pulled into my current git tree (but compiling
the same old tree):

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.031 MB perf.data (~1375 samples) ]

and not suprisingly, the resulting profiles are crap due to lack of
reasonable sample coverage.

I don't think this has anything to do with your changes, but I thought
I'd report it anyway. Something in 'perf' land is broken.

Linus

>
> More like Linus', but hey at least an attempt at implementing a suggestion by
> Arjan made this time around!
>
> . Fix indirect calls beautifier, reported by Linus.
>
> . Use the objdump comments to nuke specificities about how access to a well
> ?know variable is encoded, suggested by Linus.
>
> . Show the number of places that jump to a target, requested by Arjan.
>
> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
>
> ----------------------------------------------------------------
> Arnaldo Carvalho de Melo (8):
> ? ? ?perf annotate: Use raw form for register indirect call instructions
> ? ? ?perf annotate: Resolve symbols using objdump comment
> ? ? ?perf annotate: Resolve symbols using objdump comment for single op ins
> ? ? ?perf annotate: Augment lock instruction output
> ? ? ?perf annotate: Introduce ->free() method in ins_ops
> ? ? ?perf annotate browser: Count the numbers of jump sources to a target
> ? ? ?perf annotate browser: Show 'jumpy' functions
> ? ? ?perf annotate browser: Add key bindings help window
>
> ?tools/perf/ui/browsers/annotate.c | ? 94 ++++++++++--
> ?tools/perf/util/annotate.c ? ? ? ?| ?303 +++++++++++++++++++++++++++++++++----
> ?tools/perf/util/annotate.h ? ? ? ?| ? 15 +-
> ?3 files changed, 369 insertions(+), 43 deletions(-)

2012-05-12 21:25:42

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [GIT PULL 0/8] Annotation weekly ponies delivery

Em Sat, May 12, 2012 at 01:40:07PM -0700, Linus Torvalds escreveu:
> On Sat, May 12, 2012 at 12:52 PM, Arnaldo Carvalho de Melo
> <[email protected]> wrote:
> >
> > Arjan & Linus Annotation Edition
>
> I pulled this into my current git tree just for testing.
>
> It's horribly broken, but it's not the annotation that is broken.
>
> The whole "-e cycles:pp" doesn't work any more. I don't get any nice
> PEBS information, I get the totally useless irq-based profiling.

> I don't think this has anything to do with your changes, but I thought
> I'd report it anyway. Something in 'perf' land is broken.

Didn't notice it, but then I'm not updating the kernel that frequently,
will check on this Sandy Bridge nobo (i7-2920XM).

- Arnaldo

2012-05-12 22:29:56

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL 0/8] Annotation weekly ponies delivery

On Sat, May 12, 2012 at 2:25 PM, Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> Didn't notice it, but then I'm not updating the kernel that frequently,
> will check on this Sandy Bridge nobo (i7-2920XM).

I think SNB has problems with PEBS and cycles anyway. I have a
westmere machine that I use for profiling for that reason.

Linus

2012-05-14 07:05:14

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL 0/8] Annotation weekly ponies delivery


* Linus Torvalds <[email protected]> wrote:

> On Sat, May 12, 2012 at 2:25 PM, Arnaldo Carvalho de Melo
> <[email protected]> wrote:
> >
> > Didn't notice it, but then I'm not updating the kernel that frequently,
> > will check on this Sandy Bridge nobo (i7-2920XM).
>
> I think SNB has problems with PEBS and cycles anyway. I have a
> westmere machine that I use for profiling for that reason.

I have a pre-westmere Nehalem that I use most for perf testing -
that one appears to be working well both with default cycles and
with cycles:pp. Will test on a Westmere.

btw., Arnaldo, could we make cycles:pp the default event on perf
record and perf top? We have the new AMD IBS code in perf/core
queued up for v3.5, which is roughly equivalent to cycles:p on
Intel CPUs, so if we standardize on cycles:pp we'll get improved
skidless (and on Intel pre-SB CPUs, precise) profiling output by
default.

While at it I'd also suggest increasing the default sampling
frequency, from 1000 Hz per CPU to at least 4Khz auto-freq or so
- this should work well all across the board I think. CPUs are
getting faster and command/app run times are getting shorter,
1Khz is a bit low IMO.

Thanks,

Ingo

2012-05-14 11:13:31

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [GIT PULL 0/8] Annotation weekly ponies delivery

On Sat, 2012-05-12 at 13:40 -0700, Linus Torvalds wrote:
> The whole "-e cycles:pp" doesn't work any more. I don't get any nice
> PEBS information, I get the totally useless irq-based profiling.
>
> The difference for a "make -j" profile is quite stunning:
>
> Doing "perf record -f -e cycles:pp -F 20000 make -j"
>
> - my current git:
>
> [ perf record: Woken up 47 times to write data ]
> [ perf record: Captured and wrote 11.890 MB perf.data (~519462 samples) ]
>
> - with the above tree pulled into my current git tree (but compiling
> the same old tree):
>
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.031 MB perf.data (~1375 samples) ]

The output simply suggests we're not getting enough samples not that
PEBS isn't working, in fact I can could reproduce without using PEBS.

This bisected to the below commit, the code has since been changed again
and all that evlist stuff gives me a head-ache. Acme, Namhyung ?


55261f46702cec96911a81aacfb3cba13434d304 is the first bad commit
commit 55261f46702cec96911a81aacfb3cba13434d304
Author: Namhyung Kim <[email protected]>
Date: Mon May 7 14:08:59 2012 +0900

perf evlist: Fix creation of cpu map

Currently, 'perf record -- sleep 1' creates a cpu map for all online
cpus since it turns out calling cpu_map__new(NULL). Fix it.

Also it is guaranteed that cpu_list is NULL if PID/TID is given by
calling perf_target__validate(), so we can make the conditional bit
simpler.

This also fixes perf test 7 (Validate) failure on my 6 core machine:

$ cat /sys/devices/system/cpu/online
0-11
$ ./perf test -v 7
7: Validate PERF_RECORD_* events & perf_sample fields:
--- start ---
perf_evlist__mmap: Operation not permitted
---- end ----
Validate PERF_RECORD_* events & perf_sample fields: FAILED!

Signed-off-by: Namhyung Kim <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

2012-05-14 11:55:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL 0/8] Annotation weekly ponies delivery


* Peter Zijlstra <[email protected]> wrote:

> On Sat, 2012-05-12 at 13:40 -0700, Linus Torvalds wrote:
> > The whole "-e cycles:pp" doesn't work any more. I don't get any nice
> > PEBS information, I get the totally useless irq-based profiling.
> >
> > The difference for a "make -j" profile is quite stunning:
> >
> > Doing "perf record -f -e cycles:pp -F 20000 make -j"
> >
> > - my current git:
> >
> > [ perf record: Woken up 47 times to write data ]
> > [ perf record: Captured and wrote 11.890 MB perf.data (~519462 samples) ]
> >
> > - with the above tree pulled into my current git tree (but compiling
> > the same old tree):
> >
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.031 MB perf.data (~1375 samples) ]
>
> The output simply suggests we're not getting enough samples
> not that PEBS isn't working, in fact I can could reproduce
> without using PEBS.
>
> This bisected to the below commit, the code has since been
> changed again and all that evlist stuff gives me a head-ache.
> Acme, Namhyung ?
>
> 55261f46702cec96911a81aacfb3cba13434d304 is the first bad commit
> commit 55261f46702cec96911a81aacfb3cba13434d304
> Author: Namhyung Kim <[email protected]>
> Date: Mon May 7 14:08:59 2012 +0900
>
> perf evlist: Fix creation of cpu map

Another detail seems to be that the bug takes per-task-inherited
profiling. Doing:

perf record -a -e cycles:pp make -j64 bzImage

produces the expected number of events. Without the '-a' the bug
Linus found triggers.

Similarly, 'perf top' does not trigger the bug - because it's
using per CPU, not per task (inherited) profiling.

Thanks,

Ingo

2012-05-14 15:06:36

by Namhyung Kim

[permalink] [raw]
Subject: Re: [GIT PULL 0/8] Annotation weekly ponies delivery

2012-05-14 (월), 13:55 +0200, Ingo Molnar:
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Sat, 2012-05-12 at 13:40 -0700, Linus Torvalds wrote:
> > > The whole "-e cycles:pp" doesn't work any more. I don't get any nice
> > > PEBS information, I get the totally useless irq-based profiling.
> > >
> > > The difference for a "make -j" profile is quite stunning:
> > >
> > > Doing "perf record -f -e cycles:pp -F 20000 make -j"
> > >
> > > - my current git:
> > >
> > > [ perf record: Woken up 47 times to write data ]
> > > [ perf record: Captured and wrote 11.890 MB perf.data (~519462 samples) ]
> > >
> > > - with the above tree pulled into my current git tree (but compiling
> > > the same old tree):
> > >
> > > [ perf record: Woken up 1 times to write data ]
> > > [ perf record: Captured and wrote 0.031 MB perf.data (~1375 samples) ]
> >
> > The output simply suggests we're not getting enough samples
> > not that PEBS isn't working, in fact I can could reproduce
> > without using PEBS.
> >
> > This bisected to the below commit, the code has since been
> > changed again and all that evlist stuff gives me a head-ache.
> > Acme, Namhyung ?
> >
> > 55261f46702cec96911a81aacfb3cba13434d304 is the first bad commit
> > commit 55261f46702cec96911a81aacfb3cba13434d304
> > Author: Namhyung Kim <[email protected]>
> > Date: Mon May 7 14:08:59 2012 +0900
> >
> > perf evlist: Fix creation of cpu map
>
> Another detail seems to be that the bug takes per-task-inherited
> profiling. Doing:
>
> perf record -a -e cycles:pp make -j64 bzImage
>
> produces the expected number of events. Without the '-a' the bug
> Linus found triggers.
>
> Similarly, 'perf top' does not trigger the bug - because it's
> using per CPU, not per task (inherited) profiling.
>

perf_evlist__config_attrs() has this:

if (evlist->cpus->map[0] < 0)
opts->no_inherit = true;

meaning that per task profiling won't enable event inheritance. I don't
know why it's needed though.


--
Regards,
Namhyung Kim

2012-05-15 04:50:29

by Namhyung Kim

[permalink] [raw]
Subject: Re: [GIT PULL 0/8] Annotation weekly ponies delivery

Hi,

On Tue, 15 May 2012 00:06:25 +0900, Namhyung Kim wrote:
> 2012-05-14 (월), 13:55 +0200, Ingo Molnar:
>> Another detail seems to be that the bug takes per-task-inherited
>> profiling. Doing:
>>
>> perf record -a -e cycles:pp make -j64 bzImage
>>
>> produces the expected number of events. Without the '-a' the bug
>> Linus found triggers.
>>
>> Similarly, 'perf top' does not trigger the bug - because it's
>> using per CPU, not per task (inherited) profiling.
>>
>
> perf_evlist__config_attrs() has this:
>
> if (evlist->cpus->map[0] < 0)
> opts->no_inherit = true;
>
> meaning that per task profiling won't enable event inheritance. I don't
> know why it's needed though.

I found this on commit 5d2cd90922c7 ("perf evsel: Fix use of inherit"):

- /*
- * Don't allow mmap() of inherited per-task counters. This
- * would create a performance issue due to all children writing
- * to the same buffer.
- *
- * FIXME:
- * Proper fix is not to pass 'inherit' to perf_evsel__open*,
- * but a 'flags' parameter, with 'group' folded there as well,
- * then introduce a PERF_O_{MMAP,GROUP,INHERIT} enum, and if
- * O_MMAP is set, emit a warning if cpu < 0 and O_INHERIT is
- * set. Lets go for the minimal fix first tho.
- */
- evsel->attr.inherit = (cpus->map[cpu] >= 0) && inherit;

Hmm.. still don't know what to do then.

Thanks,
Namhyung

2012-05-15 10:33:05

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [GIT PULL 0/8] Annotation weekly ponies delivery

On Tue, 2012-05-15 at 00:06 +0900, Namhyung Kim wrote:

> perf_evlist__config_attrs() has this:
>
> if (evlist->cpus->map[0] < 0)
> opts->no_inherit = true;
>
> meaning that per task profiling won't enable event inheritance. I don't
> know why it's needed though.

Because you cannot have inherited per-task counters. It only works for
per-task-per-cpu counters. Otherwise you'll have a scalability
nightmare.

2012-05-15 14:45:03

by Namhyung Kim

[permalink] [raw]
Subject: Re: [GIT PULL 0/8] Annotation weekly ponies delivery

2012-05-15 (화), 12:32 +0200, Peter Zijlstra:
> On Tue, 2012-05-15 at 00:06 +0900, Namhyung Kim wrote:
>
> > perf_evlist__config_attrs() has this:
> >
> > if (evlist->cpus->map[0] < 0)
> > opts->no_inherit = true;
> >
> > meaning that per task profiling won't enable event inheritance. I don't
> > know why it's needed though.
>
> Because you cannot have inherited per-task counters. It only works for
> per-task-per-cpu counters. Otherwise you'll have a scalability
> nightmare.
>

Got it. So it means that we do need to create an event for each cpu
in order to profile a task (and its children), right? (Originally, I
thought it's a bug :-p)

If so, yes, the commit 55261f46702c ("perf evlist: Fix creation of
cpu map") should be reverted like below (note that target->cpu_list
check no longer needed since perf_target__validate() will handle
those cases). If it looks ok to you guys, I'll send a formal patch
with name changes (to avoid the double negation suggested by Ingo):


diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 1201daf71719..f6979ba391d1 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -609,10 +609,10 @@ int perf_evlist__create_maps(struct perf_evlist
if (evlist->threads == NULL)
return -1;

- if (!perf_target__no_cpu(target))
- evlist->cpus = cpu_map__new(target->cpu_list);
- else
+ if (!perf_target__no_task(target))
evlist->cpus = cpu_map__dummy_new();
+ else
+ evlist->cpus = cpu_map__new(target->cpu_list);

if (evlist->cpus == NULL)
goto out_delete_threads;



--
Regards,
Namhyung Kim

2012-05-15 17:07:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [GIT PULL 0/8] Annotation weekly ponies delivery

On Tue, 2012-05-15 at 23:44 +0900, Namhyung Kim wrote:

> Got it. So it means that we do need to create an event for each cpu
> in order to profile a task (and its children), right? (Originally, I
> thought it's a bug :-p)

Almost, to profile a task you can get away with a per-task event, but to
add the 'and its children', you need per-task-per-cpu.

This is because while one task will only ever run on one cpu at the time
(spare scheduler bugs :-), multiple tasks can run on multiple cpus. So
if you have one event all output will need to go to the one buffer
associated with it, causing multiple cpus to write to the one buffer.

This creates resource contention to the point that a 64-cpu machine
would appear deadlocked (the original perf had it this way and made some
people very unhappy).

> If so, yes, the commit 55261f46702c ("perf evlist: Fix creation of
> cpu map") should be reverted like below (note that target->cpu_list
> check no longer needed since perf_target__validate() will handle
> those cases). If it looks ok to you guys, I'll send a formal patch
> with name changes (to avoid the double negation suggested by Ingo):
>
>
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 1201daf71719..f6979ba391d1 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -609,10 +609,10 @@ int perf_evlist__create_maps(struct perf_evlist
> if (evlist->threads == NULL)
> return -1;
>
> - if (!perf_target__no_cpu(target))
> - evlist->cpus = cpu_map__new(target->cpu_list);
> - else
> + if (!perf_target__no_task(target))
> evlist->cpus = cpu_map__dummy_new();
> + else
> + evlist->cpus = cpu_map__new(target->cpu_list);
>
> if (evlist->cpus == NULL)
> goto out_delete_threads;


This does indeed make it work again:

pre patch:

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.005 MB perf.data (~212 samples) ]

post patch:

[ perf record: Woken up 209 times to write data ]
[ perf record: Captured and wrote 54.931 MB perf.data (~2399974 samples) ]

So please do send as a proper patch, and you can add:

Acked-and-tested-by: Peter Zijlstra <[email protected]>

2012-05-16 05:59:47

by Namhyung Kim

[permalink] [raw]
Subject: Re: [GIT PULL 0/8] Annotation weekly ponies delivery

Hi,

On Tue, 15 May 2012 19:07:13 +0200, Peter Zijlstra wrote:
> This does indeed make it work again:
>
> pre patch:
>
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.005 MB perf.data (~212 samples) ]
>
> post patch:
>
> [ perf record: Woken up 209 times to write data ]
> [ perf record: Captured and wrote 54.931 MB perf.data (~2399974 samples) ]
>
> So please do send as a proper patch, and you can add:
>
> Acked-and-tested-by: Peter Zijlstra <[email protected]>

Hmm.. but it seems this breaks my 'perf stat -g' - it doesn't
respond any more (even for ^C):

namhyung@sejong:perf$ ./perf stat -g sleep 1
^C^C^C^C^\Quit (core dumped)

namhyung@sejong:perf$ gdb ./perf core.11964
[SNIP]
Core was generated by `./perf stat -g sleep 1'.
Program terminated with signal 3, Quit.
#0 0x0000003fd800e090 in __read_nocancel () from /lib64/libpthread.so.0
[SNIP]
(gdb) bt
#0 0x0000003fd800e090 in __read_nocancel () from /lib64/libpthread.so.0
#1 0x000000000047fa00 in read (__nbytes=24, __buf=0x7fffc5e1d0b0, __fd=0) at /usr/include/bits/unistd.h:45
#2 readn (fd=0, buf=<optimized out>, buf@entry=0x7fffc5e1d0b0, n=n@entry=24) at util/util.c:140
#3 0x000000000043e4bb in __perf_evsel__read (evsel=evsel@entry=0xc7e700, ncpus=<optimized out>, nthreads=1, scale=true)
at util/evsel.c:267
#4 0x0000000000422777 in read_counter_aggr (counter=0xc7e700) at builtin-stat.c:360
#5 run_perf_stat (argv=0x7fffc5e1d510, argc=2) at builtin-stat.c:528
#6 cmd_stat (argc=2, argv=0x7fffc5e1d510, prefix=<optimized out>) at builtin-stat.c:1251
#7 0x0000000000414723 in run_builtin (p=p@entry=0x77e1d8, argc=argc@entry=4, argv=argv@entry=0x7fffc5e1d510) at perf.c:273
#8 0x0000000000413f3b in handle_internal_command (argv=0x7fffc5e1d510, argc=4) at perf.c:345
#9 run_argv (argv=0x7fffc5e1d300, argcp=0x7fffc5e1d30c) at perf.c:389
#10 main (argc=4, argv=0x7fffc5e1d510) at perf.c:487


In fact, my perf stat -g was already broken - it cannot count at
all on my SNB box. I'll investigate it more.

Any thoughts?

Namhyung

2012-05-16 08:21:23

by Namhyung Kim

[permalink] [raw]
Subject: Re: [GIT PULL 0/8] Annotation weekly ponies delivery

Hi,

On Wed, 16 May 2012 14:57:49 +0900, Namhyung Kim wrote:
> Hmm.. but it seems this breaks my 'perf stat -g' - it doesn't
> respond any more (even for ^C):
>

I've found that it's a problem of whole perf stat, not only for -g
option. It's because the perf stat should use per-task events for some
reason as it's not use mmap on the events.

The commit 77a6f014e9ae ("perf stat: Use perf_evlist__create_maps")
introduced the issue and it seems following patch works well for me:

---[ cut here ]---

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index d19058a7b84c..185e59a4715d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -754,6 +754,9 @@ static struct perf_record record = {
.user_freq = UINT_MAX,
.user_interval = ULLONG_MAX,
.freq = 1000,
+ .target = {
+ .need_mmap = true,
+ },
},
.write_mode = WRITE_FORCE,
.file_new = true,
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 553560a8b1be..ebcd15883ab8 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1162,6 +1162,9 @@ int cmd_top(int argc, const char **argv, const char *prefix __used)
.freq = 1000, /* 1 KHz */
.mmap_pages = 128,
.sym_pcnt_filter = 5,
+ .target = {
+ .need_mmap = true,
+ },
};
char callchain_default_opt[] = "fractal,0.5,callee";
const struct option options[] = {
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 87889f325678..ba43c3a4046c 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -611,6 +611,8 @@ int perf_evlist__create_maps(struct perf_evlist *evlist,

if (perf_target__has_task(target))
evlist->cpus = cpu_map__dummy_new();
+ else if (!perf_target__has_cpu(target) && !target->need_mmap)
+ evlist->cpus = cpu_map__dummy_new();
else
evlist->cpus = cpu_map__new(target->cpu_list);

diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h
index c43f632955fa..ffa247d3ede3 100644
--- a/tools/perf/util/target.h
+++ b/tools/perf/util/target.h
@@ -11,6 +11,7 @@ struct perf_target {
const char *uid_str;
uid_t uid;
bool system_wide;
+ bool need_mmap;
};

enum perf_target_errno {


--
Thanks,
Namhyung

2012-05-17 00:15:39

by Namhyung Kim

[permalink] [raw]
Subject: Re: [GIT PULL 0/8] Annotation weekly ponies delivery

Hi,

On Wed, 16 May 2012 14:57:49 +0900, Namhyung Kim wrote:
> In fact, my perf stat -g was already broken - it cannot count at
> all on my SNB box. I'll investigate it more.
>

It's like a matter of the number of the hw counters. The perf stat -g
uses 6 hw events, but I can use up to 5 hw events on my SNB machine:

$ ./perf stat -g -e cycles,cache-references,cache-misses,instructions,branches sleep 1

Performance counter stats for 'sleep 1':

999,008 cycles # 0.000 GHz
10,963 cache-references
2,630 cache-misses # 23.990 % of all cache refs
635,225 instructions # 0.64 insns per cycle
128,014 branches

1.001466421 seconds time elapsed


But I adding branch-misses event outputs:

Performance counter stats for 'sleep 1':

<not counted> cycles
<not counted> cache-references
<not counted> cache-misses
<not counted> instructions
<not counted> branches
<not counted> branch-misses

1.000742533 seconds time elapsed


However adding ref-cycles is fine (Is it related to fixed counters?):

Performance counter stats for 'sleep 1':

1,027,263 cycles # 0.000 GHz
12,435 cache-references
2,747 cache-misses # 22.091 % of all cache refs
650,100 instructions # 0.63 insns per cycle
130,765 branches
2,367,488 ref-cycles

1.001557088 seconds time elapsed


BTW, dmesg says I have 7, but a counter was taken by NMI watchdog:

$ dmesg | grep -A32 Performance
[ 0.137227] Performance Events: PEBS fmt1+, 16-deep LBR, SandyBridge events, Intel PMU driver.
[ 0.137232] ... version: 3
[ 0.137233] ... bit width: 48
[ 0.137234] ... generic registers: 4
[ 0.137235] ... value mask: 0000ffffffffffff
[ 0.137236] ... max period: 000000007fffffff
[ 0.137237] ... fixed-purpose events: 3
[ 0.137238] ... event mask: 000000070000000f
[ 0.143371] NMI watchdog: enabled, takes one hw-pmu counter.
[ 0.149251] Booting Node 0, Processors #1
[ 0.162396] NMI watchdog: enabled, takes one hw-pmu counter.
[ 0.166265] #2
[ 0.179401] NMI watchdog: enabled, takes one hw-pmu counter.
[ 0.183263] #3
[ 0.196399] NMI watchdog: enabled, takes one hw-pmu counter.
[ 0.200263] #4
[ 0.213398] NMI watchdog: enabled, takes one hw-pmu counter.
[ 0.217268] #5
[ 0.230404] NMI watchdog: enabled, takes one hw-pmu counter.
[ 0.234263] #6
[ 0.247402] NMI watchdog: enabled, takes one hw-pmu counter.
[ 0.251268] #7
[ 0.264405] NMI watchdog: enabled, takes one hw-pmu counter.
[ 0.268266] #8
[ 0.281403] NMI watchdog: enabled, takes one hw-pmu counter.
[ 0.285268] #9
[ 0.298405] NMI watchdog: enabled, takes one hw-pmu counter.
[ 0.302264] #10
[ 0.315401] NMI watchdog: enabled, takes one hw-pmu counter.
[ 0.319266] #11 Ok.
[ 0.332404] NMI watchdog: enabled, takes one hw-pmu counter.
[ 0.332435] Brought up 12 CPUs
[ 0.332437] Total of 12 processors activated (76799.44 BogoMIPS).


Thanks,
Namhyung