From: Kan Liang <[email protected]>
perf_top__mmap_read has a severe performance issue on
Knights Landing/Mill when monitoring a heavily loaded system. It can
take several minutes to finish, which is unacceptable.
Currently, perf top uses non-overwrite mode. In non-overwrite mode, it
tries to read everything in the ring buffer and doesn't pause the
ring buffer. Once lots of samples are delivered persistently,
the processing time can be very long. Also, the latest samples can
be lost when the ring buffer is full.
It's better to change it to overwrite mode, which takes a snapshot of
the system by pausing the ring buffer and can significantly reduce
the processing time (from several minutes to several seconds).
Overwrite mode also always keeps the latest samples.
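For reference, the overwrite-mode snapshot flow that patches 8-12 wire
into perf top looks roughly like this (a condensed sketch of the
patches below, not the literal code; error handling omitted):

        static void snapshot(struct perf_top *top)
        {
                struct perf_evlist *evlist = top->evlist;
                int i;

                /* pause: the kernel stops writing, so a consistent snapshot is read */
                perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_DATA_PENDING);

                /* drain each map with perf_mmap__read_init/_event/_done */
                for (i = 0; i < evlist->nr_mmaps; i++)
                        perf_top__mmap_read_idx(top, i);

                /* mark the maps empty, then resume writing */
                perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_EMPTY);
                perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_RUNNING);
        }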
Patch 1-7: Introduce new interfaces for generic code to support
overwrite mode for one-by-one event reads.
Discard stale interfaces.
These patches can be merged separately.
Patch 8-12: Add overwrite support to perf top.
perf top should support only one mode at a time, either overwrite
or non-overwrite, for all events.
Switch the default mode to overwrite mode.
If the kernel doesn't support overwrite mode, fall back to
non-overwrite mode.
Changes since V2:
- Move the duplicate 'map->prev' out of perf_mmap__read. Modify
perf_mmap__read_event accordingly.
- Introduce a new interface, perf_mmap__read_init, to calculate the
ring buffer position
- Check perf_missing_features.write_backward
- Discard stale interfaces perf_mmap__read_backward and
perf_mmap__read_catchup
Changes since V1:
- New patches 4-6
- Support both overwrite mode and non-overwrite mode.
If the kernel doesn't support the default overwrite mode, fall back
to non-overwrite mode.
Kan Liang (12):
perf evlist: remove stale mmap read for backward
perf mmap: factor out function to find ringbuffer position
perf mmap: discard 'prev' in perf_mmap__read
perf mmap: introduce perf_mmap__read_done
perf mmap: introduce perf_mmap__read_event
perf test: update mmap read functions for backward-ring-buffer test
perf mmap: discard legacy interface for mmap read
perf top: check per event overwrite term
perf evsel: expose perf_missing_features.write_backward
perf top: add overwrite fall back
perf top: switch default mode to overwrite mode
perf top: check the latency of perf_top__mmap_read
tools/perf/builtin-top.c | 139 +++++++++++++++++++++++++---
tools/perf/tests/backward-ring-buffer.c | 7 +-
tools/perf/ui/browsers/hists.c | 12 ++-
tools/perf/util/evlist.c | 17 ----
tools/perf/util/evlist.h | 4 -
tools/perf/util/evsel.c | 5 +
tools/perf/util/evsel.h | 2 +
tools/perf/util/mmap.c | 158 ++++++++++++++++++--------------
tools/perf/util/mmap.h | 10 +-
9 files changed, 244 insertions(+), 110 deletions(-)
--
2.5.5
From: Kan Liang <[email protected]>
perf_evlist__mmap_read_catchup and perf_evlist__mmap_read_backward are
only for overwrite mode.
But they read the evlist->mmap buffer, which is for non-overwrite mode.
This has not caused any serious problem yet, because nothing uses them.
Remove the unused interfaces.
Acked-by: Wang Nan <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/evlist.c | 17 -----------------
tools/perf/util/evlist.h | 4 ----
2 files changed, 21 deletions(-)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index f0a5e09..cf0fd35 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -714,28 +714,11 @@ union perf_event *perf_evlist__mmap_read_forward(struct perf_evlist *evlist, int
return perf_mmap__read_forward(md);
}
-union perf_event *perf_evlist__mmap_read_backward(struct perf_evlist *evlist, int idx)
-{
- struct perf_mmap *md = &evlist->mmap[idx];
-
- /*
- * No need to check messup for backward ring buffer:
- * We can always read arbitrary long data from a backward
- * ring buffer unless we forget to pause it before reading.
- */
- return perf_mmap__read_backward(md);
-}
-
union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx)
{
return perf_evlist__mmap_read_forward(evlist, idx);
}
-void perf_evlist__mmap_read_catchup(struct perf_evlist *evlist, int idx)
-{
- perf_mmap__read_catchup(&evlist->mmap[idx]);
-}
-
void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx)
{
perf_mmap__consume(&evlist->mmap[idx], false);
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 7516066..a80fd47 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -132,10 +132,6 @@ union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx);
union perf_event *perf_evlist__mmap_read_forward(struct perf_evlist *evlist,
int idx);
-union perf_event *perf_evlist__mmap_read_backward(struct perf_evlist *evlist,
- int idx);
-void perf_evlist__mmap_read_catchup(struct perf_evlist *evlist, int idx);
-
void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
int perf_evlist__open(struct perf_evlist *evlist);
--
2.5.5
From: Kan Liang <[email protected]>
The latency of perf_top__mmap_read should be lower than the refresh
time. If it is not, print some hints about how to reduce the latency.
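For example, any of the hinted mitigations (illustrative command lines
with made-up values, not taken from this patch) lowers the sample rate
and hence the read latency:

        perf top -c 100000      # increase the period
        perf top -F 99          # decrease the frequency
        perf top -C 0-3         # limit the number of CPUs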
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/builtin-top.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 3c14333..8fb007b 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -877,8 +877,10 @@ static void perf_top__mmap_read(struct perf_top *top)
{
bool overwrite = top->record_opts.overwrite;
struct perf_evlist *evlist = top->evlist;
+ unsigned long long start, end;
int i;
+ start = rdclock();
if (overwrite)
perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_DATA_PENDING);
@@ -889,6 +891,13 @@ static void perf_top__mmap_read(struct perf_top *top)
perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_EMPTY);
perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_RUNNING);
}
+ end = rdclock();
+
+ if ((end - start) > (unsigned long long)top->delay_secs * NSEC_PER_SEC)
+ ui__warning("Too slow to read ring buffer.\n"
+ "Please try increasing the period (-c) or\n"
+ "decreasing the freq (-F) or\n"
+ "limiting the number of CPUs (-C)\n");
}
/*
--
2.5.5
From: Kan Liang <[email protected]>
Switch to non-overwrite mode if the kernel does not support the
overwrite ring buffer.
This only takes effect once overwrite mode is supported.
There is no change to the current behavior.
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/builtin-top.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 4b85e7b..8d19ef7 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -931,6 +931,27 @@ static int perf_top_overwrite_check(struct perf_top *top)
return 0;
}
+static int perf_top_overwrite_fallback(struct perf_top *top,
+ struct perf_evsel *evsel)
+{
+ struct record_opts *opts = &top->record_opts;
+ struct perf_evlist *evlist = top->evlist;
+ struct perf_evsel *counter;
+
+ if (!opts->overwrite)
+ return 0;
+
+ /* only fall back when first event fails */
+ if (evsel != perf_evlist__first(evlist))
+ return 0;
+
+ evlist__for_each_entry(evlist, counter)
+ counter->attr.write_backward = false;
+ opts->overwrite = false;
+ ui__warning("fall back to non-overwrite mode\n");
+ return 1;
+}
+
static int perf_top__start_counters(struct perf_top *top)
{
char msg[BUFSIZ];
@@ -954,6 +975,21 @@ static int perf_top__start_counters(struct perf_top *top)
try_again:
if (perf_evsel__open(counter, top->evlist->cpus,
top->evlist->threads) < 0) {
+
+ /*
+ * Specially handle the overwrite fall back.
+ * Because perf top is the only tool which has
+ * overwrite mode by default, supports
+ * both overwrite and non-overwrite mode, and
+ * requires a consistent mode for all events.
+ *
+ * This may be moved to generic code once more
+ * tools have a similar attribute.
+ */
+ if (is_write_backward_fail() &&
+ perf_top_overwrite_fallback(top, counter))
+ goto try_again;
+
if (perf_evsel__fallback(counter, errno, msg, sizeof(msg))) {
if (verbose > 0)
ui__warning("%s\n", msg);
--
2.5.5
From: Kan Liang <[email protected]>
The per-event overwrite term is not forbidden in perf top, which can
bring problems, because perf top only supports non-overwrite mode.
Check for and forbid inconsistent per-event overwrite terms in the
evlist.
This makes it possible to support either non-overwrite or overwrite
mode.
Overwrite mode is still forbidden here; the restriction will be
removed when overwrite mode is supported later.
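For instance, assuming the usual event-term syntax (these exact command
lines are illustrative and not part of this patch), a consistent
specification is accepted while a mixed one is rejected:

        perf top -e cycles/overwrite/ -e instructions/overwrite/   # consistent: ok
        perf top -e cycles/overwrite/ -e instructions/             # inconsistent: error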
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/builtin-top.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index c6ccda5..4b85e7b 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -881,6 +881,56 @@ static void perf_top__mmap_read(struct perf_top *top)
perf_top__mmap_read_idx(top, i);
}
+/*
+ * Check per event overwrite term.
+ * perf top supports consistent mode for all events.
+ * Return -1 if the per event terms set but not consistent.
+ */
+static int perf_top_overwrite_check(struct perf_top *top)
+{
+ struct record_opts *opts = &top->record_opts;
+ struct perf_evlist *evlist = top->evlist;
+ struct perf_evsel_config_term *term;
+ struct list_head *config_terms;
+ struct perf_evsel *evsel;
+ int set, overwrite = -1;
+
+ evlist__for_each_entry(evlist, evsel) {
+ set = -1;
+ config_terms = &evsel->config_terms;
+ list_for_each_entry(term, config_terms, list) {
+ if (term->type == PERF_EVSEL__CONFIG_TERM_OVERWRITE)
+ set = term->val.overwrite ? 1 : 0;
+ }
+
+ /* no term for current and previous event (likely) */
+ if ((overwrite < 0) && (set < 0))
+ continue;
+
+ /* has term for both current and previous event, compare */
+ if ((overwrite >= 0) && (set >= 0) && (overwrite != set))
+ return -1;
+
+ /* no term for current event but has term for previous one */
+ if ((overwrite >= 0) && (set < 0))
+ return -1;
+
+ /* has term for current event */
+ if ((overwrite < 0) && (set >= 0)) {
+ /* if it's first event, set overwrite */
+ if (evsel == perf_evlist__first(evlist))
+ overwrite = set;
+ else
+ return -1;
+ }
+ }
+
+ if ((overwrite >= 0) && (opts->overwrite != overwrite))
+ opts->overwrite = overwrite;
+
+ return 0;
+}
+
static int perf_top__start_counters(struct perf_top *top)
{
char msg[BUFSIZ];
@@ -890,6 +940,16 @@ static int perf_top__start_counters(struct perf_top *top)
perf_evlist__config(evlist, opts, &callchain_param);
+ if (perf_top_overwrite_check(top)) {
+ ui__error("perf top support consistent mode for all events\n");
+ goto out_err;
+ }
+
+ if (opts->overwrite) {
+ ui__error("not support overwrite mode yet\n");
+ goto out_err;
+ }
+
evlist__for_each_entry(evlist, counter) {
try_again:
if (perf_evsel__open(counter, top->evlist->cpus,
--
2.5.5
From: Kan Liang <[email protected]>
perf_top__mmap_read has a severe performance issue on
Knights Landing/Mill when monitoring a heavily loaded system. It can
take several minutes to finish, which is unacceptable.
Currently, perf top uses non-overwrite mode. In non-overwrite mode, it
tries to read everything in the ring buffer and doesn't pause the
ring buffer. Once lots of samples are delivered persistently,
the processing time can be very long. Also, the latest samples can
be lost when the ring buffer is full.
In overwrite mode, a snapshot of the system is taken by pausing the
ring buffer, which can significantly reduce the processing time.
Overwrite mode also always keeps the latest samples.
Considering the real-time requirement of perf top, overwrite mode is
the more suitable choice for it.
Actually, perf top originally used overwrite mode. It was changed to
non-overwrite mode by commit 93fc64f14472 ("perf top: Switch to non
overwrite mode"). It's better to change it back to overwrite mode by
default.
On kernels which don't support overwrite mode, it will fall back to
non-overwrite mode.
Some records may be lost in overwrite mode because of pausing the
ring buffer. This has little impact on the accuracy of the snapshot
and can be tolerated.
The lost-event check is removed.
For overwrite mode, unconditionally wait 100 ms before each snapshot.
This also reduces the overhead caused by pausing the ring buffer,
especially on lightly loaded systems.
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/builtin-top.c | 44 ++++++++++++++++++++++++------------------
tools/perf/ui/browsers/hists.c | 12 +++++++++---
2 files changed, 34 insertions(+), 22 deletions(-)
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 8d19ef7..3c14333 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -283,16 +283,6 @@ static void perf_top__print_sym_table(struct perf_top *top)
printf("%-*.*s\n", win_width, win_width, graph_dotted_line);
- if (hists->stats.nr_lost_warned !=
- hists->stats.nr_events[PERF_RECORD_LOST]) {
- hists->stats.nr_lost_warned =
- hists->stats.nr_events[PERF_RECORD_LOST];
- color_fprintf(stdout, PERF_COLOR_RED,
- "WARNING: LOST %d chunks, Check IO/CPU overload",
- hists->stats.nr_lost_warned);
- ++printed;
- }
-
if (top->sym_filter_entry) {
perf_top__show_details(top);
return;
@@ -807,15 +797,23 @@ static void perf_event__process_sample(struct perf_tool *tool,
static void perf_top__mmap_read_idx(struct perf_top *top, int idx)
{
+ struct record_opts *opts = &top->record_opts;
+ struct perf_evlist *evlist = top->evlist;
struct perf_sample sample;
struct perf_evsel *evsel;
+ struct perf_mmap *md;
struct perf_session *session = top->session;
union perf_event *event;
struct machine *machine;
+ u64 end, start;
int ret;
- while ((event = perf_evlist__mmap_read(top->evlist, idx)) != NULL) {
- ret = perf_evlist__parse_sample(top->evlist, event, &sample);
+ md = opts->overwrite ? &evlist->overwrite_mmap[idx] : &evlist->mmap[idx];
+ if (perf_mmap__read_init(md, opts->overwrite, &start, &end) < 0)
+ return;
+
+ while ((event = perf_mmap__read_event(md, opts->overwrite, &start, end)) != NULL) {
+ ret = perf_evlist__parse_sample(evlist, event, &sample);
if (ret) {
pr_err("Can't parse sample, err = %d\n", ret);
goto next_event;
@@ -869,16 +867,28 @@ static void perf_top__mmap_read_idx(struct perf_top *top, int idx)
} else
++session->evlist->stats.nr_unknown_events;
next_event:
- perf_evlist__mmap_consume(top->evlist, idx);
+ perf_mmap__consume(md, opts->overwrite);
}
+
+ perf_mmap__read_done(md);
}
static void perf_top__mmap_read(struct perf_top *top)
{
+ bool overwrite = top->record_opts.overwrite;
+ struct perf_evlist *evlist = top->evlist;
int i;
+ if (overwrite)
+ perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_DATA_PENDING);
+
for (i = 0; i < top->evlist->nr_mmaps; i++)
perf_top__mmap_read_idx(top, i);
+
+ if (overwrite) {
+ perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_EMPTY);
+ perf_evlist__toggle_bkw_mmap(evlist, BKW_MMAP_RUNNING);
+ }
}
/*
@@ -966,11 +976,6 @@ static int perf_top__start_counters(struct perf_top *top)
goto out_err;
}
- if (opts->overwrite) {
- ui__error("not support overwrite mode yet\n");
- goto out_err;
- }
-
evlist__for_each_entry(evlist, counter) {
try_again:
if (perf_evsel__open(counter, top->evlist->cpus,
@@ -1129,7 +1134,7 @@ static int __cmd_top(struct perf_top *top)
perf_top__mmap_read(top);
- if (hits == top->samples)
+ if (opts->overwrite || (hits == top->samples))
ret = perf_evlist__poll(top->evlist, 100);
if (resize) {
@@ -1223,6 +1228,7 @@ int cmd_top(int argc, const char **argv)
.uses_mmap = true,
},
.proc_map_timeout = 500,
+ .overwrite = 1,
},
.max_stack = sysctl_perf_event_max_stack,
.sym_pcnt_filter = 5,
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 68146f4..56023e4 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -638,8 +638,13 @@ int hist_browser__run(struct hist_browser *browser, const char *help)
nr_entries = hist_browser__nr_entries(browser);
ui_browser__update_nr_entries(&browser->b, nr_entries);
- if (browser->hists->stats.nr_lost_warned !=
- browser->hists->stats.nr_events[PERF_RECORD_LOST]) {
+ /*
+ * Don't print the lost events warning for perf top,
+ * because it uses overwrite mode.
+ * perf top is the only tool which has the hbt timer.
+ */
+ if ((browser->hists->stats.nr_lost_warned !=
+ browser->hists->stats.nr_events[PERF_RECORD_LOST]) && !hbt) {
browser->hists->stats.nr_lost_warned =
browser->hists->stats.nr_events[PERF_RECORD_LOST];
ui_browser__warn_lost_events(&browser->b);
@@ -3203,7 +3208,8 @@ static int perf_evsel_menu__run(struct perf_evsel_menu *menu,
case K_TIMER:
hbt->timer(hbt->arg);
- if (!menu->lost_events_warned && menu->lost_events) {
+ if (!menu->lost_events_warned &&
+ menu->lost_events && !hbt) {
ui_browser__warn_lost_events(&menu->b);
menu->lost_events_warned = true;
}
--
2.5.5
From: Kan Liang <[email protected]>
perf top will need it to handle the overwrite fallback later.
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/evsel.c | 5 +++++
tools/perf/util/evsel.h | 2 ++
2 files changed, 7 insertions(+)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a4d256e..be67149 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1669,6 +1669,11 @@ static bool ignore_missing_thread(struct perf_evsel *evsel,
return true;
}
+bool is_write_backward_fail(void)
+{
+ return perf_missing_features.write_backward;
+}
+
int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
struct thread_map *threads)
{
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 846e416..1f46728 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -448,4 +448,6 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
struct perf_env *perf_evsel__env(struct perf_evsel *evsel);
+bool is_write_backward_fail(void);
+
#endif /* __PERF_EVSEL_H */
--
2.5.5
From: Kan Liang <[email protected]>
Discard perf_mmap__read_backward and perf_mmap__read_catchup. No tools
use them.
Some tools still use perf_mmap__read_forward. Keep it, but add a
comment pointing to the new interface for future use.
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/mmap.c | 50 ++++----------------------------------------------
tools/perf/util/mmap.h | 3 ---
2 files changed, 4 insertions(+), 49 deletions(-)
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index d0ca3ba..650e0a7 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -63,6 +63,10 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
return event;
}
+/*
+ * legacy interface for mmap read.
+ * Don't use it. Use perf_mmap__read_event().
+ */
union perf_event *perf_mmap__read_forward(struct perf_mmap *map)
{
u64 head;
@@ -78,41 +82,6 @@ union perf_event *perf_mmap__read_forward(struct perf_mmap *map)
return perf_mmap__read(map, &map->prev, head);
}
-union perf_event *perf_mmap__read_backward(struct perf_mmap *map)
-{
- u64 head, end;
-
- /*
- * Check if event was unmapped due to a POLLHUP/POLLERR.
- */
- if (!refcount_read(&map->refcnt))
- return NULL;
-
- head = perf_mmap__read_head(map);
- if (!head)
- return NULL;
-
- /*
- * 'head' pointer starts from 0. Kernel minus sizeof(record) form
- * it each time when kernel writes to it, so in fact 'head' is
- * negative. 'end' pointer is made manually by adding the size of
- * the ring buffer to 'head' pointer, means the validate data can
- * read is the whole ring buffer. If 'end' is positive, the ring
- * buffer has not fully filled, so we must adjust 'end' to 0.
- *
- * However, since both 'head' and 'end' is unsigned, we can't
- * simply compare 'end' against 0. Here we compare '-head' and
- * the size of the ring buffer, where -head is the number of bytes
- * kernel write to the ring buffer.
- */
- if (-head < (u64)(map->mask + 1))
- end = 0;
- else
- end = head + map->mask + 1;
-
- return perf_mmap__read(map, &map->prev, end);
-}
-
/*
* Read event from ring buffer one by one.
* Return one event for each call.
@@ -152,17 +121,6 @@ union perf_event *perf_mmap__read_event(struct perf_mmap *map,
return event;
}
-void perf_mmap__read_catchup(struct perf_mmap *map)
-{
- u64 head;
-
- if (!refcount_read(&map->refcnt))
- return;
-
- head = perf_mmap__read_head(map);
- map->prev = head;
-}
-
static bool perf_mmap__empty(struct perf_mmap *map)
{
return perf_mmap__read_head(map) == map->prev && !map->auxtrace_mmap.base;
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index ed3b0d2..a99152e 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -65,8 +65,6 @@ void perf_mmap__put(struct perf_mmap *map);
void perf_mmap__consume(struct perf_mmap *map, bool overwrite);
-void perf_mmap__read_catchup(struct perf_mmap *md);
-
static inline u64 perf_mmap__read_head(struct perf_mmap *mm)
{
struct perf_event_mmap_page *pc = mm->base;
@@ -87,7 +85,6 @@ static inline void perf_mmap__write_tail(struct perf_mmap *md, u64 tail)
}
union perf_event *perf_mmap__read_forward(struct perf_mmap *map);
-union perf_event *perf_mmap__read_backward(struct perf_mmap *map);
union perf_event *perf_mmap__read_event(struct perf_mmap *map,
bool overwrite,
--
2.5.5
From: Kan Liang <[email protected]>
The direction of overwrite mode is backward. The last
perf_mmap__read_event will set the tail to map->prev. map->prev needs
to be corrected to the head, which is the end of the next read.
It will be used later.
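In other words, for an overwrite (backward) ring buffer that is paused
during the read, the intended sequence is roughly (a sketch based on
the interfaces in this series, not literal tool code):

        u64 start, end;

        perf_mmap__read_init(map, true, &start, &end); /* start = head, end = old map->prev */
        /* perf_mmap__read_event() then walks 'start' toward 'end' */
        perf_mmap__read_done(map);                     /* map->prev = head: the next read stops here */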
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/mmap.c | 11 +++++++++++
tools/perf/util/mmap.h | 1 +
2 files changed, 12 insertions(+)
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index a844a2f..4aaeb64 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -343,3 +343,14 @@ int perf_mmap__push(struct perf_mmap *md, bool overwrite,
out:
return rc;
}
+
+/*
+ * Mandatory for overwrite mode
+ * The direction of overwrite mode is backward.
+ * The last mmap__read_event will set tail to map->prev.
+ * Need to correct the map->prev to head which is the end of next read.
+ */
+void perf_mmap__read_done(struct perf_mmap *map)
+{
+ map->prev = perf_mmap__read_head(map);
+}
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index abe9b9f..2df27c1 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -96,4 +96,5 @@ size_t perf_mmap__mmap_len(struct perf_mmap *map);
int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
u64 *start, u64 *end);
+void perf_mmap__read_done(struct perf_mmap *map);
#endif /*__PERF_MMAP_H */
--
2.5.5
From: Kan Liang <[email protected]>
Use the new perf_mmap__read_* interfaces for the overwrite ring buffer test.
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/tests/backward-ring-buffer.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
index 4035d43..e0b1b41 100644
--- a/tools/perf/tests/backward-ring-buffer.c
+++ b/tools/perf/tests/backward-ring-buffer.c
@@ -31,10 +31,12 @@ static int count_samples(struct perf_evlist *evlist, int *sample_count,
int i;
for (i = 0; i < evlist->nr_mmaps; i++) {
+ struct perf_mmap *map = &evlist->overwrite_mmap[i];
union perf_event *event;
+ u64 start, end;
- perf_mmap__read_catchup(&evlist->overwrite_mmap[i]);
- while ((event = perf_mmap__read_backward(&evlist->overwrite_mmap[i])) != NULL) {
+ perf_mmap__read_init(map, true, &start, &end);
+ while ((event = perf_mmap__read_event(map, true, &start, end)) != NULL) {
const u32 type = event->header.type;
switch (type) {
@@ -49,6 +51,7 @@ static int count_samples(struct perf_evlist *evlist, int *sample_count,
return TEST_FAIL;
}
}
+ perf_mmap__read_done(map);
}
return TEST_OK;
}
--
2.5.5
From: Kan Liang <[email protected]>
'start' and 'prev' are duplicates in perf_mmap__read().
Use 'map->prev' to replace 'start' in perf_mmap__read_*().
Suggested-by: Wang Nan <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/mmap.c | 28 ++++++++++------------------
1 file changed, 10 insertions(+), 18 deletions(-)
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 3fd4f3c..a844a2f 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -22,29 +22,27 @@ size_t perf_mmap__mmap_len(struct perf_mmap *map)
/* When check_messup is true, 'end' must points to a good entry */
static union perf_event *perf_mmap__read(struct perf_mmap *map,
- u64 start, u64 end, u64 *prev)
+ u64 *start, u64 end)
{
unsigned char *data = map->base + page_size;
union perf_event *event = NULL;
- int diff = end - start;
+ int diff = end - *start;
if (diff >= (int)sizeof(event->header)) {
size_t size;
- event = (union perf_event *)&data[start & map->mask];
+ event = (union perf_event *)&data[*start & map->mask];
size = event->header.size;
- if (size < sizeof(event->header) || diff < (int)size) {
- event = NULL;
- goto broken_event;
- }
+ if (size < sizeof(event->header) || diff < (int)size)
+ return NULL;
/*
* Event straddles the mmap boundary -- header should always
* be inside due to u64 alignment of output.
*/
- if ((start & map->mask) + size != ((start + size) & map->mask)) {
- unsigned int offset = start;
+ if ((*start & map->mask) + size != ((*start + size) & map->mask)) {
+ unsigned int offset = *start;
unsigned int len = min(sizeof(*event), size), cpy;
void *dst = map->event_copy;
@@ -59,20 +57,15 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
event = (union perf_event *)map->event_copy;
}
- start += size;
+ *start += size;
}
-broken_event:
- if (prev)
- *prev = start;
-
return event;
}
union perf_event *perf_mmap__read_forward(struct perf_mmap *map)
{
u64 head;
- u64 old = map->prev;
/*
* Check if event was unmapped due to a POLLHUP/POLLERR.
@@ -82,13 +75,12 @@ union perf_event *perf_mmap__read_forward(struct perf_mmap *map)
head = perf_mmap__read_head(map);
- return perf_mmap__read(map, old, head, &map->prev);
+ return perf_mmap__read(map, &map->prev, head);
}
union perf_event *perf_mmap__read_backward(struct perf_mmap *map)
{
u64 head, end;
- u64 start = map->prev;
/*
* Check if event was unmapped due to a POLLHUP/POLLERR.
@@ -118,7 +110,7 @@ union perf_event *perf_mmap__read_backward(struct perf_mmap *map)
else
end = head + map->mask + 1;
- return perf_mmap__read(map, start, end, &map->prev);
+ return perf_mmap__read(map, &map->prev, end);
}
void perf_mmap__read_catchup(struct perf_mmap *map)
--
2.5.5
From: Kan Liang <[email protected]>
perf record has specific code to calculate the ring buffer position
for both overwrite and non-overwrite mode.
perf top will support both modes later, so it is useful to make this
specific code generic.
Introduce a new interface, perf_mmap__read_init(), to find the ring
buffer position.
Add a check for map->refcnt in perf_mmap__read_init().
'size' is needed by both perf_mmap__read_init() and perf_mmap__push(),
so it has to be calculated in each function.
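Callers are expected to use it the way perf_mmap__push() does after
this patch; as a sketch:

        u64 start, end;
        int rc;

        rc = perf_mmap__read_init(map, overwrite, &start, &end);
        if (rc < 0)                             /* -EAGAIN: no data, -EINVAL: unmapped/broken */
                return (rc == -EAGAIN) ? 0 : -1;
        /* [start, end) now bounds the readable data */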
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/mmap.c | 62 ++++++++++++++++++++++++++++++++++----------------
tools/perf/util/mmap.h | 2 ++
2 files changed, 45 insertions(+), 19 deletions(-)
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 05076e6..3fd4f3c 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -267,41 +267,65 @@ static int overwrite_rb_find_range(void *buf, int mask, u64 head, u64 *start, u6
return -1;
}
-int perf_mmap__push(struct perf_mmap *md, bool overwrite,
- void *to, int push(void *to, void *buf, size_t size))
+/*
+ * Report the start and end of the available data in ringbuffer
+ */
+int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
+ u64 *start, u64 *end)
{
- u64 head = perf_mmap__read_head(md);
- u64 old = md->prev;
- u64 end = head, start = old;
- unsigned char *data = md->base + page_size;
+ u64 head = perf_mmap__read_head(map);
+ u64 old = map->prev;
+ unsigned char *data = map->base + page_size;
unsigned long size;
- void *buf;
- int rc = 0;
- start = overwrite ? head : old;
- end = overwrite ? old : head;
+ /*
+ * Check if event was unmapped due to a POLLHUP/POLLERR.
+ */
+ if (!refcount_read(&map->refcnt))
+ return -EINVAL;
- if (start == end)
- return 0;
+ *start = overwrite ? head : old;
+ *end = overwrite ? old : head;
- size = end - start;
- if (size > (unsigned long)(md->mask) + 1) {
+ if (*start == *end)
+ return -EAGAIN;
+
+ size = *end - *start;
+ if (size > (unsigned long)(map->mask) + 1) {
if (!overwrite) {
WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
- md->prev = head;
- perf_mmap__consume(md, overwrite);
- return 0;
+ map->prev = head;
+ perf_mmap__consume(map, overwrite);
+ return -EAGAIN;
}
/*
* Backward ring buffer is full. We still have a chance to read
* most of data from it.
*/
- if (overwrite_rb_find_range(data, md->mask, head, &start, &end))
- return -1;
+ if (overwrite_rb_find_range(data, map->mask, head, start, end))
+ return -EINVAL;
}
+ return 0;
+}
+
+int perf_mmap__push(struct perf_mmap *md, bool overwrite,
+ void *to, int push(void *to, void *buf, size_t size))
+{
+ u64 head = perf_mmap__read_head(md);
+ u64 end, start;
+ unsigned char *data = md->base + page_size;
+ unsigned long size;
+ void *buf;
+ int rc;
+
+ rc = perf_mmap__read_init(md, overwrite, &start, &end);
+ if (rc < 0)
+ return (rc == -EAGAIN) ? 0 : -1;
+
+ size = end - start;
if ((start & md->mask) + size != (end & md->mask)) {
buf = &data[start & md->mask];
size = md->mask + 1 - (start & md->mask);
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index d640273..abe9b9f 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -94,4 +94,6 @@ int perf_mmap__push(struct perf_mmap *md, bool backward,
size_t perf_mmap__mmap_len(struct perf_mmap *map);
+int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
+ u64 *start, u64 *end);
#endif /*__PERF_MMAP_H */
--
2.5.5
From: Kan Liang <[email protected]>
Except for perf record, the perf tools read events one by one from the
ring buffer via perf_mmap__read_forward, which only supports
non-overwrite mode.
Introduce perf_mmap__read_event to support both non-overwrite and
overwrite mode.
Usage:
perf_mmap__read_init()
while(event = perf_mmap__read_event()) {
//process the event
perf_mmap__consume()
}
perf_mmap__read_done()
perf_mmap__read_backward cannot be used here, because it always reads
the stale buffer which has already been processed. Furthermore, the
forward and backward concepts have been removed;
perf_mmap__read_backward will be replaced and discarded later.
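A slightly more concrete sketch of the usage above (a hypothetical
caller; the real callers are the backward-ring-buffer test and perf
top later in this series):

        static void drain_map(struct perf_mmap *map, bool overwrite)
        {
                union perf_event *event;
                u64 start, end;

                /* locate the available data; bail out if empty or unmapped */
                if (perf_mmap__read_init(map, overwrite, &start, &end) < 0)
                        return;

                while ((event = perf_mmap__read_event(map, overwrite, &start, end)) != NULL) {
                        /* ... process 'event' here ... */
                        perf_mmap__consume(map, overwrite);
                }

                /* for overwrite mode, reset map->prev to the head for the next read */
                perf_mmap__read_done(map);
        }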
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/mmap.c | 39 +++++++++++++++++++++++++++++++++++++++
tools/perf/util/mmap.h | 4 ++++
2 files changed, 43 insertions(+)
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 4aaeb64..d0ca3ba 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -113,6 +113,45 @@ union perf_event *perf_mmap__read_backward(struct perf_mmap *map)
return perf_mmap__read(map, &map->prev, end);
}
+/*
+ * Read event from ring buffer one by one.
+ * Return one event for each call.
+ *
+ * Usage:
+ * perf_mmap__read_init()
+ * while(event = perf_mmap__read_event()) {
+ * //process the event
+ * perf_mmap__consume()
+ * }
+ * perf_mmap__read_done()
+ */
+union perf_event *perf_mmap__read_event(struct perf_mmap *map,
+ bool overwrite,
+ u64 *start, u64 end)
+{
+ union perf_event *event;
+
+ /*
+ * Check if event was unmapped due to a POLLHUP/POLLERR.
+ */
+ if (!refcount_read(&map->refcnt))
+ return NULL;
+
+ if (start == NULL)
+ return NULL;
+
+ /* non-overwrite doesn't pause the ringbuffer */
+ if (!overwrite)
+ end = perf_mmap__read_head(map);
+
+ event = perf_mmap__read(map, start, end);
+
+ if (!overwrite)
+ map->prev = *start;
+
+ return event;
+}
+
void perf_mmap__read_catchup(struct perf_mmap *map)
{
u64 head;
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index 2df27c1..ed3b0d2 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -89,6 +89,10 @@ static inline void perf_mmap__write_tail(struct perf_mmap *md, u64 tail)
union perf_event *perf_mmap__read_forward(struct perf_mmap *map);
union perf_event *perf_mmap__read_backward(struct perf_mmap *map);
+union perf_event *perf_mmap__read_event(struct perf_mmap *map,
+ bool overwrite,
+ u64 *start, u64 end);
+
int perf_mmap__push(struct perf_mmap *md, bool backward,
void *to, int push(void *to, void *buf, size_t size));
--
2.5.5
Hello,
On Thu, Dec 21, 2017 at 10:08:46AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> The direction of overwrite mode is backward. The last mmap__read_event
> will set tail to map->prev. Need to correct the map->prev to head which
> is the end of next read.
Why do you update the map->prev needlessly then? I think we don't
need it for overwrite/backward mode, right?
Also I guess the current code might miss some events since the head
can be different between _read_init() and _read_done(), no?
Thanks,
Namhyung
>
> It will be used later.
>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/util/mmap.c | 11 +++++++++++
> tools/perf/util/mmap.h | 1 +
> 2 files changed, 12 insertions(+)
>
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index a844a2f..4aaeb64 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -343,3 +343,14 @@ int perf_mmap__push(struct perf_mmap *md, bool overwrite,
> out:
> return rc;
> }
> +
> +/*
> + * Mandatory for overwrite mode
> + * The direction of overwrite mode is backward.
> + * The last mmap__read_event will set tail to map->prev.
> + * Need to correct the map->prev to head which is the end of next read.
> + */
> +void perf_mmap__read_done(struct perf_mmap *map)
> +{
> + map->prev = perf_mmap__read_head(map);
> +}
> diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
> index abe9b9f..2df27c1 100644
> --- a/tools/perf/util/mmap.h
> +++ b/tools/perf/util/mmap.h
> @@ -96,4 +96,5 @@ size_t perf_mmap__mmap_len(struct perf_mmap *map);
>
> int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
> u64 *start, u64 *end);
> +void perf_mmap__read_done(struct perf_mmap *map);
> #endif /*__PERF_MMAP_H */
> --
> 2.5.5
>
> Hello,
>
> On Thu, Dec 21, 2017 at 10:08:46AM -0800, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > The direction of overwrite mode is backward. The last mmap__read_event
> > will set tail to map->prev. Need to correct the map->prev to head
> > which is the end of next read.
>
> Why do you update the map->prev needlessly then? I think we don't need it
> for overwrite/backward mode, right?
The map->prev is needless only when the overwrite does really happen in ringbuffer.
In a light load system or with big ringbuffer, the unprocessed data will not be
overwritten. So it's necessary to keep a pointer to indicate the last position.
Overwrite mode is backward, but the event processing is always forward.
So map->prev has to be updated in __read_done().
>
> Also I guess the current code might miss some events since the head can be
> different between _read_init() and _read_done(), no?
>
The overwrite mode requires the ring buffer to be paused during processing.
The head is unchanged between __read_init() and __read_done().
The event during the pause should be missed. But I think it has little impact on
the accuracy of the snapshot and can be tolerated for perf top.
I mentioned it in the change log of patch 11/12.
I also removed the lost events checking for perf top.
Thanks,
Kan
> Thanks,
> Namhyung
>
>
> >
> > It will be used later.
> >
> > Signed-off-by: Kan Liang <[email protected]>
> > ---
> > tools/perf/util/mmap.c | 11 +++++++++++
> > tools/perf/util/mmap.h | 1 +
> > 2 files changed, 12 insertions(+)
> >
> > diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> > index a844a2f..4aaeb64 100644
> > --- a/tools/perf/util/mmap.c
> > +++ b/tools/perf/util/mmap.c
> > @@ -343,3 +343,14 @@ int perf_mmap__push(struct perf_mmap *md, bool overwrite,
> > out:
> > return rc;
> > }
> > +
> > +/*
> > + * Mandatory for overwrite mode
> > + * The direction of overwrite mode is backward.
> > + * The last mmap__read_event will set tail to map->prev.
> > + * Need to correct the map->prev to head which is the end of next read.
> > + */
> > +void perf_mmap__read_done(struct perf_mmap *map)
> > +{
> > + map->prev = perf_mmap__read_head(map);
> > +}
> > diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
> > index abe9b9f..2df27c1 100644
> > --- a/tools/perf/util/mmap.h
> > +++ b/tools/perf/util/mmap.h
> > @@ -96,4 +96,5 @@ size_t perf_mmap__mmap_len(struct perf_mmap *map);
> >
> > int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
> > u64 *start, u64 *end);
> > +void perf_mmap__read_done(struct perf_mmap *map);
> > #endif /*__PERF_MMAP_H */
> > --
> > 2.5.5
> >
Hi Kan,
On Wed, Jan 03, 2018 at 02:15:38PM +0000, Liang, Kan wrote:
> > Hello,
> >
> > On Thu, Dec 21, 2017 at 10:08:46AM -0800, [email protected] wrote:
> > > From: Kan Liang <[email protected]>
> > >
> > > The direction of overwrite mode is backward. The last mmap__read_event
> > > will set tail to map->prev. Need to correct the map->prev to head
> > > which is the end of next read.
> >
> > Why do you update the map->prev needlessly then? I think we don't need it
> > for overwrite/backward mode, right?
>
> The map->prev is needless only when the overwrite does really happen in ringbuffer.
> In a light load system or with big ringbuffer, the unprocessed data will not be
> overwritten. So it's necessary to keep a pointer to indicate the last position.
>
> Overwrite mode is backward, but the event processing is always forward.
> So map->prev has to be updated in __read_done().
Yep, I meant that updating map->prev in every perf_mmap__read_event()
is unnecessary for the overwrite mode. It only needs to be set in
perf_mmap__read_done(), right?
>
> >
> > Also I guess the current code might miss some events since the head can be
> > different between _read_init() and _read_done(), no?
> >
>
> The overwrite mode requires the ring buffer to be paused during processing.
> The head is unchanged between __read_init() and __read_done().
Ah, ok then. Maybe we could read the head once, and use it during processing.
Thanks,
Namhyung
>
> The event during the pause should be missed. But I think it has little impact on
> the accuracy of the snapshot and can be tolerated for perf top.
> I mentioned it in the change log of patch 11/12.
> I also removed the lost events checking for perf top.
> Hi Kan,
>
> On Wed, Jan 03, 2018 at 02:15:38PM +0000, Liang, Kan wrote:
> > > Hello,
> > >
> > > On Thu, Dec 21, 2017 at 10:08:46AM -0800, [email protected] wrote:
> > > > From: Kan Liang <[email protected]>
> > > >
> > > > The direction of overwrite mode is backward. The last mmap__read_event
> > > > will set tail to map->prev. Need to correct the map->prev to head
> > > > which is the end of next read.
> > >
> > > Why do you update the map->prev needlessly then? I think we don't need it
> > > for overwrite/backward mode, right?
> >
> > The map->prev is needless only when the overwrite does really happen in
> > ringbuffer.
> > In a light load system or with big ringbuffer, the unprocessed data will not
> > be overwritten. So it's necessary to keep a pointer to indicate the last
> > position.
> >
> > Overwrite mode is backward, but the event processing is always forward.
> > So map->prev has to be updated in __read_done().
>
> Yep, I meant that updating map->prev in every perf_mmap__read_event()
> is unnecessary for the overwrite mode. It only needs to be set in
> perf_mmap__read_done(), right?
Right, for overwrite, only updating the map->prev in perf_mmap__read_done()
is enough.
But for non-overwrite, we have to update map->prev.
It will be used by perf_mmap__consume() later to write the ring buffer tail.
So I specially handle the non-overwrite as below in patch 5/12.
+ event = perf_mmap__read(map, start, end);
+
+ if (!overwrite)
+ map->prev = *start;
> >
> > >
> > > > Also I guess the current code might miss some events since the head can be
> > > > different between _read_init() and _read_done(), no?
> > >
> >
> > > The overwrite mode requires the ring buffer to be paused during
> > > processing.
> > The head is unchanged between __read_init() and __read_done().
>
> Ah, ok then. Maybe we could read the head once, and use it during
> processing.
Yes, it only needs to read head once for overwrite mode.
But for non-overwrite, we have to read the head in every
perf_mmap__read_event(). Because the head is floating.
The non-overwrite is specially handled in patch 5/12 as well.
+ /* non-overwrite doesn't pause the ringbuffer */
+ if (!overwrite)
+ end = perf_mmap__read_head(map);
Thanks,
Kan
>
> Thanks,
> Namhyung
>
>
> >
> > The event during the pause should be missed. But I think it has little impact on
> > the accuracy of the snapshot and can be tolerated for perf top.
> > I mentioned it in the change log of patch 11/12.
> > I also removed the lost events checking for perf top.
Hi,
On Thu, Jan 04, 2018 at 02:00:03PM +0000, Liang, Kan wrote:
> > Hi Kan,
> >
> > On Wed, Jan 03, 2018 at 02:15:38PM +0000, Liang, Kan wrote:
> > > > Hello,
> > > >
> > > > On Thu, Dec 21, 2017 at 10:08:46AM -0800, [email protected] wrote:
> > > > > From: Kan Liang <[email protected]>
> > > > >
> > > > > The direction of overwrite mode is backward. The last mmap__read_event
> > > > > will set tail to map->prev. Need to correct the map->prev to head
> > > > > which is the end of next read.
> > > >
> > > > Why do you update the map->prev needlessly then? I think we don't need it
> > > > for overwrite/backward mode, right?
> > >
> > > The map->prev is needless only when the overwrite does really happen in
> > > ringbuffer.
> > > In a light load system or with big ringbuffer, the unprocessed data will not
> > > be overwritten. So it's necessary to keep a pointer to indicate the last
> > > position.
> > >
> > > Overwrite mode is backward, but the event processing is always forward.
> > > So map->prev has to be updated in __read_done().
> >
> > Yep, I meant that updating map->prev in every perf_mmap__read_event()
> > is unnecessary for the overwrite mode. It only needs to be set in
> > perf_mmap__read_done(), right?
>
> Right, for overwrite, only updating the map->prev in perf_mmap__read_done()
> is enough.
>
> But for non-overwrite, we have to update map->prev.
> It will be used by perf_mmap__consume() later to write the ring buffer tail.
> So I specially handle the non-overwrite as below in patch 5/12.
Oh I can see it in the patch 5, thanks.
> + event = perf_mmap__read(map, start, end);
> +
> + if (!overwrite)
> + map->prev = *start;
>
> > >
> > > >
> > > > Also I guess the current code might miss some events since the head can be
> > > > different between _read_init() and _read_done(), no?
> > > >
> > >
> > > The overwrite mode requires the ring buffer to be paused during
> > > processing.
> > > The head is unchanged between __read_init() and __read_done().
> >
> > Ah, ok then. Maybe we could read the head once, and use it during
> > processing.
>
> Yes, it only needs to read head once for overwrite mode.
> But for non-overwrite, we have to read the head in every
> perf_mmap__read_event(). Because the head is floating.
> The non-overwrite is specially handled in patch 5/12 as well.
Right, I understand it for the non-overwrite mode.
But, for the overwrite mode, my concern was that it might be possible
that it reads a stale head in __read_init() (even after it paused the
ring buffer) and reads an update head in __read_done(). Then it's
gonna miss some records. I'm not sure whether it reads the same head
in __read_init() and __read_done() by the pause.
Thanks,
Namhyung
> + /* non-overwrite doesn't pause the ringbuffer */
> + if (!overwrite)
> + end = perf_mmap__read_head(map);
> > > > >
> > > > > Also I guess the current code might miss some events since the head can
> > > > > be different between _read_init() and _read_done(), no?
> > > > >
> > > >
> > > > The overwrite mode requires the ring buffer to be paused during
> > > > processing.
> > > > The head is unchanged between __read_init() and __read_done().
> > >
> > > Ah, ok then. Maybe we could read the head once, and use it during
> > > processing.
> >
> > Yes, it only needs to read head once for overwrite mode.
> > But for non-overwrite, we have to read the head in every
> > perf_mmap__read_event(). Because the head is floating.
> > The non-overwrite is specially handled in patch 5/12 as well.
>
> Right, I understand it for the non-overwrite mode.
>
> But, for the overwrite mode, my concern was that it might be possible
> that it reads a stale head in __read_init() (even after it paused the
> ring buffer) and reads an update head in __read_done(). Then it's
> gonna miss some records. I'm not sure whether it reads the same head
> in __read_init() and __read_done() by the pause.
>
The only scenario which may cause a different 'head' is as below.
The 'rb->head' is updated in __perf_output_begin(), but hasn't been
assigned to 'pc->data_head' for the perf tool. During this period, the 'paused'
is set and __read_init() reads head.
But this scenario never happens because of the ringbuffer lock.
Otherwise, I cannot imagine any other scenarios which may cause the
different 'head' in __read_init() and __read_done() with ringbuffer
paused. Please let me know if there is an example.
There would be some records missed. But it's only because the ringbuffer
is paused. The head should stay the same.
I also did some tests and dumped the 'head' in __read_init() and
__read_done() with ringbuffer paused. I didn't see any difference either.
Thanks,
Kan
> Thanks,
> Namhyung
>
>
> > + /* non-overwrite doesn't pause the ringbuffer */
> > + if (!overwrite)
> > + end = perf_mmap__read_head(map);
On Thu, Dec 21, 2017 at 10:08:44AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> The perf record has specific codes to calculate the ringbuffer position
> for both overwrite and non-overwrite mode.
> The perf top will support both modes later.
> It is useful to make the specific codes generic.
>
> Introduce a new interface perf_mmap__read_init() to find ringbuffer
> position.
> Add a check for map->refcnt in perf_mmap__read_init().
> 'size' is needed for both perf_mmap__read_init() and perf_mmap__push().
> Have to calculate in each function.
it's 2 separate changes then plus 1 not mentioned in changelog..
could you please split this into separate patches:
- Introduce a new interface perf_mmap__read_init ...
- Add a check for map->refcnt in perf_mmap__read_init
- add new EAGAIN return value logic
thanks,
jirka
>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/util/mmap.c | 62 ++++++++++++++++++++++++++++++++++----------------
> tools/perf/util/mmap.h | 2 ++
> 2 files changed, 45 insertions(+), 19 deletions(-)
>
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index 05076e6..3fd4f3c 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -267,41 +267,65 @@ static int overwrite_rb_find_range(void *buf, int mask, u64 head, u64 *start, u6
> return -1;
> }
>
> -int perf_mmap__push(struct perf_mmap *md, bool overwrite,
> - void *to, int push(void *to, void *buf, size_t size))
> +/*
> + * Report the start and end of the available data in ringbuffer
> + */
> +int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
> + u64 *start, u64 *end)
> {
> - u64 head = perf_mmap__read_head(md);
> - u64 old = md->prev;
> - u64 end = head, start = old;
> - unsigned char *data = md->base + page_size;
> + u64 head = perf_mmap__read_head(map);
> + u64 old = map->prev;
> + unsigned char *data = map->base + page_size;
> unsigned long size;
> - void *buf;
> - int rc = 0;
>
> - start = overwrite ? head : old;
> - end = overwrite ? old : head;
> + /*
> + * Check if event was unmapped due to a POLLHUP/POLLERR.
> + */
> + if (!refcount_read(&map->refcnt))
> + return -EINVAL;
>
> - if (start == end)
> - return 0;
> + *start = overwrite ? head : old;
> + *end = overwrite ? old : head;
>
> - size = end - start;
> - if (size > (unsigned long)(md->mask) + 1) {
> + if (*start == *end)
> + return -EAGAIN;
> +
> + size = *end - *start;
> + if (size > (unsigned long)(map->mask) + 1) {
> if (!overwrite) {
> WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
>
> - md->prev = head;
> - perf_mmap__consume(md, overwrite);
> - return 0;
> + map->prev = head;
> + perf_mmap__consume(map, overwrite);
> + return -EAGAIN;
> }
>
> /*
> * Backward ring buffer is full. We still have a chance to read
> * most of data from it.
> */
> - if (overwrite_rb_find_range(data, md->mask, head, &start, &end))
> - return -1;
> + if (overwrite_rb_find_range(data, map->mask, head, start, end))
> + return -EINVAL;
> }
>
> + return 0;
> +}
> +
> +int perf_mmap__push(struct perf_mmap *md, bool overwrite,
> + void *to, int push(void *to, void *buf, size_t size))
> +{
> + u64 head = perf_mmap__read_head(md);
> + u64 end, start;
> + unsigned char *data = md->base + page_size;
> + unsigned long size;
> + void *buf;
> + int rc;
> +
> + rc = perf_mmap__read_init(md, overwrite, &start, &end);
> + if (rc < 0)
> + return (rc == -EAGAIN) ? 0 : -1;
> +
> + size = end - start;
> if ((start & md->mask) + size != (end & md->mask)) {
> buf = &data[start & md->mask];
> size = md->mask + 1 - (start & md->mask);
> diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
> index d640273..abe9b9f 100644
> --- a/tools/perf/util/mmap.h
> +++ b/tools/perf/util/mmap.h
> @@ -94,4 +94,6 @@ int perf_mmap__push(struct perf_mmap *md, bool backward,
>
> size_t perf_mmap__mmap_len(struct perf_mmap *map);
>
> +int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
> + u64 *start, u64 *end);
> #endif /*__PERF_MMAP_H */
> --
> 2.5.5
>
> On Thu, Dec 21, 2017 at 10:08:44AM -0800, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > The perf record has specific codes to calculate the ringbuffer
> > position for both overwrite and non-overwrite mode.
> > The perf top will support both modes later.
> > It is useful to make the specific codes generic.
> >
> > Introduce a new interface perf_mmap__read_init() to find ringbuffer
> > position.
> > Add a check for map->refcnt in perf_mmap__read_init().
> > 'size' is needed for both perf_mmap__read_init() and perf_mmap__push().
> > Have to calculate in each function.
>
> it's 2 separate changes then plus 1 not mentioned in changelog..
> could you please split this into separate patches:
>
> - Introduce a new interface perf_mmap__read_init ...
> - Add a check for map->refcnt in perf_mmap__read_init
> - add new EAGAIN return value logic
>
Thanks for the comments, Jirka.
I will split the patch accordingly in V4.
Thanks,
Kan
> thanks,
> jirka
>
> >
> > Signed-off-by: Kan Liang <[email protected]>
> > ---
> > tools/perf/util/mmap.c | 62 ++++++++++++++++++++++++++++++++++----------------
> > tools/perf/util/mmap.h | 2 ++
> > 2 files changed, 45 insertions(+), 19 deletions(-)
> >
> > diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> > index 05076e6..3fd4f3c 100644
> > --- a/tools/perf/util/mmap.c
> > +++ b/tools/perf/util/mmap.c
> > @@ -267,41 +267,65 @@ static int overwrite_rb_find_range(void *buf, int mask, u64 head, u64 *start, u6
> > return -1;
> > }
> >
> > -int perf_mmap__push(struct perf_mmap *md, bool overwrite,
> > - void *to, int push(void *to, void *buf, size_t size))
> > +/*
> > + * Report the start and end of the available data in ringbuffer
> > + */
> > +int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
> > + u64 *start, u64 *end)
> > {
> > - u64 head = perf_mmap__read_head(md);
> > - u64 old = md->prev;
> > - u64 end = head, start = old;
> > - unsigned char *data = md->base + page_size;
> > + u64 head = perf_mmap__read_head(map);
> > + u64 old = map->prev;
> > + unsigned char *data = map->base + page_size;
> > unsigned long size;
> > - void *buf;
> > - int rc = 0;
> >
> > - start = overwrite ? head : old;
> > - end = overwrite ? old : head;
> > + /*
> > + * Check if event was unmapped due to a POLLHUP/POLLERR.
> > + */
> > + if (!refcount_read(&map->refcnt))
> > + return -EINVAL;
> >
> > - if (start == end)
> > - return 0;
> > + *start = overwrite ? head : old;
> > + *end = overwrite ? old : head;
> >
> > - size = end - start;
> > - if (size > (unsigned long)(md->mask) + 1) {
> > + if (*start == *end)
> > + return -EAGAIN;
> > +
> > + size = *end - *start;
> > + if (size > (unsigned long)(map->mask) + 1) {
> > if (!overwrite) {
> > WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
> >
> > - md->prev = head;
> > - perf_mmap__consume(md, overwrite);
> > - return 0;
> > + map->prev = head;
> > + perf_mmap__consume(map, overwrite);
> > + return -EAGAIN;
> > }
> >
> > /*
> > * Backward ring buffer is full. We still have a chance to read
> > * most of data from it.
> > */
> > - if (overwrite_rb_find_range(data, md->mask, head, &start, &end))
> > - return -1;
> > + if (overwrite_rb_find_range(data, map->mask, head, start, end))
> > + return -EINVAL;
> > }
> >
> > + return 0;
> > +}
> > +
> > +int perf_mmap__push(struct perf_mmap *md, bool overwrite,
> > + void *to, int push(void *to, void *buf, size_t size))
> > +{
> > + u64 head = perf_mmap__read_head(md);
> > + u64 end, start;
> > + unsigned char *data = md->base + page_size;
> > + unsigned long size;
> > + void *buf;
> > + int rc;
> > +
> > + rc = perf_mmap__read_init(md, overwrite, &start, &end);
> > + if (rc < 0)
> > + return (rc == -EAGAIN) ? 0 : -1;
> > +
> > + size = end - start;
> > if ((start & md->mask) + size != (end & md->mask)) {
> > buf = &data[start & md->mask];
> > size = md->mask + 1 - (start & md->mask);
> > diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
> > index d640273..abe9b9f 100644
> > --- a/tools/perf/util/mmap.h
> > +++ b/tools/perf/util/mmap.h
> > @@ -94,4 +94,6 @@ int perf_mmap__push(struct perf_mmap *md, bool backward,
> >
> > size_t perf_mmap__mmap_len(struct perf_mmap *map);
> >
> > +int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
> > + u64 *start, u64 *end);
> > #endif /*__PERF_MMAP_H */
> > --
> > 2.5.5
> >
Hi,
On Tue, Jan 09, 2018 at 03:12:28PM +0000, Liang, Kan wrote:
> > > > > >
> > > > > > Also I guess the current code might miss some events since the head
> > > > > > can be different between _read_init() and _read_done(), no?
> > > > > >
> > > > >
> > > > > The overwrite mode requires the ring buffer to be paused during
> > > > > processing.
> > > > > The head is unchanged between __read_init() and __read_done().
> > > >
> > > > Ah, ok then. Maybe we could read the head once, and use it during
> > > > processing.
> > >
> > > Yes, it only needs to read head once for overwrite mode.
> > > But for non-overwrite, we have to read the head in every
> > > perf_mmap__read_event(). Because the head is floating.
> > > The non-overwrite is specially handled in patch 5/12 as well.
> >
> > Right, I understand it for the non-overwrite mode.
> >
> > But, for the overwrite mode, my concern was that it might be possible
> > that it reads a stale head in __read_init() (even after it paused the
> > ring buffer) and reads an update head in __read_done(). Then it's
> > gonna miss some records. I'm not sure whether it reads the same head
> > in __read_init() and __read_done() by the pause.
> >
>
> The only scenario which may cause a different 'head' is as below.
> The 'rb->head' is updated in __perf_output_begin(), but hasn't been
> assigned to 'pc->data_head' for the perf tool. During this period, the 'paused'
> is set and __read_init() reads head.
> But this scenario never happens because of the ringbuffer lock.
Which lock did you say?
>
> Otherwise, I cannot imagine any other scenarios which may cause the
> different 'head' in __read_init() and __read_done() with ringbuffer
> paused. Please let me know if there is an example.
Maybe I'm missing something, but I don't know what guarantees that we
see the updated data_head written by another CPU before the pause.
>
> There would be some records missed. But it's only because the ringbuffer
> is paused. The head should stay the same.
Hmm.. yes. It's gonna miss some records anyway, then I don't care
about it anymore.
>
> I also did some tests and dumped the 'head' in __read_init() and
> __read_done() with ringbuffer paused. I didn't see any difference either.
Maybe on a weak-ordered machine? Not sure though.
But as I said, I don't care anymore since we cannot avoid some missing
records.
Thanks,
Namhyung
On Thu, Dec 21, 2017 at 10:08:44AM -0800, [email protected] wrote:
SNIP
> +/*
> + * Report the start and end of the available data in ringbuffer
> + */
> +int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
> + u64 *start, u64 *end)
> {
> - u64 head = perf_mmap__read_head(md);
> - u64 old = md->prev;
> - u64 end = head, start = old;
> - unsigned char *data = md->base + page_size;
> + u64 head = perf_mmap__read_head(map);
> + u64 old = map->prev;
> + unsigned char *data = map->base + page_size;
> unsigned long size;
> - void *buf;
> - int rc = 0;
>
> - start = overwrite ? head : old;
> - end = overwrite ? old : head;
> + /*
> + * Check if event was unmapped due to a POLLHUP/POLLERR.
> + */
> + if (!refcount_read(&map->refcnt))
> + return -EINVAL;
>
> - if (start == end)
> - return 0;
> + *start = overwrite ? head : old;
> + *end = overwrite ? old : head;
>
> - size = end - start;
> - if (size > (unsigned long)(md->mask) + 1) {
> + if (*start == *end)
> + return -EAGAIN;
> +
> + size = *end - *start;
> + if (size > (unsigned long)(map->mask) + 1) {
> if (!overwrite) {
> WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
>
I know you did not change this, but is this branch even possible
in !overwrite mode? I think the kernel will throw away the data,
keep the head and wait for the tail to be read by the user..
jirka
> - md->prev = head;
> - perf_mmap__consume(md, overwrite);
> - return 0;
> + map->prev = head;
> + perf_mmap__consume(map, overwrite);
> + return -EAGAIN;
SNIP
On Thu, Dec 21, 2017 at 10:08:45AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> 'start' and 'prev' are duplicate in perf_mmap__read()
>
> Use 'map->prev' to replace 'start' in perf_mmap__read_*().
>
> Suggested-by: Wang Nan <[email protected]>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/util/mmap.c | 28 ++++++++++------------------
> 1 file changed, 10 insertions(+), 18 deletions(-)
>
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index 3fd4f3c..a844a2f 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -22,29 +22,27 @@ size_t perf_mmap__mmap_len(struct perf_mmap *map)
>
> /* When check_messup is true, 'end' must point to a good entry */
> static union perf_event *perf_mmap__read(struct perf_mmap *map,
> - u64 start, u64 end, u64 *prev)
> + u64 *start, u64 end)
> {
> unsigned char *data = map->base + page_size;
> union perf_event *event = NULL;
> - int diff = end - start;
> + int diff = end - *start;
I'd like it more if the arg was a pointer, like 'u64 *startp', and you
kept using a 'u64 start' value in the body.. but I can't see any
technical benefit behind forcing it ;-)
jirka
>
> if (diff >= (int)sizeof(event->header)) {
> size_t size;
>
> - event = (union perf_event *)&data[start & map->mask];
> + event = (union perf_event *)&data[*start & map->mask];
> size = event->header.size;
>
> - if (size < sizeof(event->header) || diff < (int)size) {
> - event = NULL;
> - goto broken_event;
> - }
> + if (size < sizeof(event->header) || diff < (int)size)
> + return NULL;
>
> /*
> * Event straddles the mmap boundary -- header should always
> * be inside due to u64 alignment of output.
> */
> - if ((start & map->mask) + size != ((start + size) & map->mask)) {
> - unsigned int offset = start;
> + if ((*start & map->mask) + size != ((*start + size) & map->mask)) {
> + unsigned int offset = *start;
> unsigned int len = min(sizeof(*event), size), cpy;
> void *dst = map->event_copy;
>
> @@ -59,20 +57,15 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
> event = (union perf_event *)map->event_copy;
> }
>
> - start += size;
> + *start += size;
> }
>
> -broken_event:
> - if (prev)
> - *prev = start;
> -
> return event;
> }
SNIP
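Patch 5/12 (perf_mmap__read_event) is referenced in this discussion but not
quoted in full. Given the behavior described above -- overwrite mode keeps the
'end' captured once by __read_init(), while non-overwrite mode must re-read
the floating head on every call -- it presumably looks roughly like the sketch
below; treat it as an illustration of the described semantics rather than the
literal patch:

union perf_event *perf_mmap__read_event(struct perf_mmap *map,
					bool overwrite,
					u64 *startp, u64 end)
{
	union perf_event *event;

	/* the ring buffer may have been unmapped on POLLHUP/POLLERR */
	if (!refcount_read(&map->refcnt))
		return NULL;

	/*
	 * In non-overwrite mode the kernel keeps appending, so the head
	 * is floating and has to be re-read on every call.
	 */
	if (!overwrite)
		end = perf_mmap__read_head(map);

	event = perf_mmap__read(map, startp, end);

	/* non-overwrite mode advances the consumer position as it reads */
	if (!overwrite)
		map->prev = *startp;

	return event;
}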
On Thu, Dec 21, 2017 at 10:08:46AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> The direction of overwrite mode is backward. The last mmap__read_event
> will set the tail to map->prev. map->prev needs to be corrected to head,
> which is the end of the next read.
>
> It will be used later.
>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/util/mmap.c | 11 +++++++++++
> tools/perf/util/mmap.h | 1 +
> 2 files changed, 12 insertions(+)
>
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index a844a2f..4aaeb64 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -343,3 +343,14 @@ int perf_mmap__push(struct perf_mmap *md, bool overwrite,
> out:
> return rc;
> }
> +
> +/*
> + * Mandatory for overwrite mode
> + * The direction of overwrite mode is backward.
> + * The last mmap__read_event will set tail to map->prev.
you mean perf_mmap__read function in this comment, right?
jirka
> + * Need to correct the map->prev to head which is the end of next read.
> + */
> +void perf_mmap__read_done(struct perf_mmap *map)
> +{
> + map->prev = perf_mmap__read_head(map);
> +}
> diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
> index abe9b9f..2df27c1 100644
> --- a/tools/perf/util/mmap.h
> +++ b/tools/perf/util/mmap.h
> @@ -96,4 +96,5 @@ size_t perf_mmap__mmap_len(struct perf_mmap *map);
>
> int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
> u64 *start, u64 *end);
> +void perf_mmap__read_done(struct perf_mmap *map);
> #endif /*__PERF_MMAP_H */
> --
> 2.5.5
>
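Taken together, the new interfaces give tools a read loop along the following
lines (an illustrative sketch of the intended usage per the descriptions
above; process_event() stands in for whatever the tool does with each record):

	union perf_event *event;
	u64 start, end;

	/* snapshot the ring buffer positions; the head is read once here */
	if (perf_mmap__read_init(map, overwrite, &start, &end) < 0)
		return;	/* -EAGAIN: no data, -EINVAL: unmapped */

	while ((event = perf_mmap__read_event(map, overwrite, &start, end)) != NULL) {
		process_event(event);
		perf_mmap__consume(map, overwrite);
	}

	/* overwrite mode reads backward; reset map->prev for the next pass */
	perf_mmap__read_done(map);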
On Thu, Dec 21, 2017 at 10:08:50AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> The per event overwrite term is not forbidden in perf top, which can bring
> problems, because perf top only supports non-overwrite mode.
>
> Check and forbid inconsistent per event overwrite terms in the evlist.
> Make it possible to support either non-overwrite or overwrite mode.
> The overwrite mode is forbidden for now; this restriction will be removed
> once overwrite mode is supported.
>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/builtin-top.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 60 insertions(+)
>
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index c6ccda5..4b85e7b 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -881,6 +881,56 @@ static void perf_top__mmap_read(struct perf_top *top)
> perf_top__mmap_read_idx(top, i);
> }
>
> +/*
> + * Check per event overwrite term.
> + * perf top supports consistent mode for all events.
> + * Return -1 if the per event terms set but not consistent.
please list the rules for overwrite in the comment
are they just top specific?
SNIP
> + if (evsel == perf_evlist__first(evlist))
> + overwrite = set;
> + else
> + return -1;
> + }
> + }
> +
> + if ((overwrite >= 0) && (opts->overwrite != overwrite))
> + opts->overwrite = overwrite;
> +
> + return 0;
> +}
> +
> static int perf_top__start_counters(struct perf_top *top)
> {
> char msg[BUFSIZ];
> @@ -890,6 +940,16 @@ static int perf_top__start_counters(struct perf_top *top)
>
> perf_evlist__config(evlist, opts, &callchain_param);
so perf_evlist__config uses opts->overwrite, which you set
in your perf_top_overwrite_check call.. I'd think you need
to call it sooner
jirka
On Thu, Dec 21, 2017 at 10:08:49AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> Discards perf_mmap__read_backward and perf_mmap__read_catchup. No tools
> use them.
>
> There are tools that still use perf_mmap__read_forward. Keep it, but add
> comments pointing to the new interface for future use.
>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/util/mmap.c | 50 ++++----------------------------------------------
> tools/perf/util/mmap.h | 3 ---
> 2 files changed, 4 insertions(+), 49 deletions(-)
>
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index d0ca3ba..650e0a7 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -63,6 +63,10 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
> return event;
> }
>
> +/*
> + * legacy interface for mmap read.
> + * Don't use it. Use perf_mmap__read_event().
> + */
could we get rid of it then? looks like it's not much work,
seems it's used only in:
perf_evlist__mmap_read
perf_evlist__mmap_read_forward
it'd prove the new interface works correctly for both cases
thanks,
jirka
> union perf_event *perf_mmap__read_forward(struct perf_mmap *map)
> {
> u64 head;
> @@ -78,41 +82,6 @@ union perf_event *perf_mmap__read_forward(struct perf_mmap *map)
> return perf_mmap__read(map, &map->prev, head);
> }
SNIP
On Thu, Dec 21, 2017 at 10:08:53AM -0800, [email protected] wrote:
SNIP
> .max_stack = sysctl_perf_event_max_stack,
> .sym_pcnt_filter = 5,
> diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
> index 68146f4..56023e4 100644
> --- a/tools/perf/ui/browsers/hists.c
> +++ b/tools/perf/ui/browsers/hists.c
> @@ -638,8 +638,13 @@ int hist_browser__run(struct hist_browser *browser, const char *help)
> nr_entries = hist_browser__nr_entries(browser);
> ui_browser__update_nr_entries(&browser->b, nr_entries);
>
> - if (browser->hists->stats.nr_lost_warned !=
> - browser->hists->stats.nr_events[PERF_RECORD_LOST]) {
> + /*
> + * Don't print lost events warning for perf top,
> + * because it is overwrite mode.
> + * Perf top is the only tool which has hbt timer.
> + */
I hate that warning, but still, should it be omitted only for the overwrite case?
also please separate this change from the rest of the patch
jirka
> + if ((browser->hists->stats.nr_lost_warned !=
> + browser->hists->stats.nr_events[PERF_RECORD_LOST]) && !hbt) {
> browser->hists->stats.nr_lost_warned =
> browser->hists->stats.nr_events[PERF_RECORD_LOST];
> ui_browser__warn_lost_events(&browser->b);
> @@ -3203,7 +3208,8 @@ static int perf_evsel_menu__run(struct perf_evsel_menu *menu,
> case K_TIMER:
> hbt->timer(hbt->arg);
>
> - if (!menu->lost_events_warned && menu->lost_events) {
> + if (!menu->lost_events_warned &&
> + menu->lost_events && !hbt) {
> ui_browser__warn_lost_events(&menu->b);
> menu->lost_events_warned = true;
> }
> --
> 2.5.5
>
On Thu, Dec 21, 2017 at 10:08:53AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> perf_top__mmap_read has severe performance issue in
> Knights Landing/Mill, when monitoring in heavy load system. It costs
> several minutes to finish, which is unacceptable.
>
> Currently, perf top is in non-overwrite mode. In non-overwrite mode, it
> tries to read everything in the ringbuffer and doesn't pause the
> ringbuffer. Once lots of samples are delivered persistently,
> the processing time can be very long. Also, the latest samples can
> be lost when the ringbuffer is full.
>
> Overwrite mode takes a snapshot of the system by pausing the
> ringbuffer, which can significantly reduce the processing time.
> Also, overwrite mode always keeps the latest samples.
> Considering the real-time requirement of perf top, overwrite mode is
> the more suitable choice.
>
> Actually, perf top used to be in overwrite mode. It was changed to non
> overwrite mode by commit 93fc64f14472 ("perf top: Switch to non overwrite
> mode"). It's better to change it back to overwrite mode by default.
>
> For kernels which don't support overwrite mode, it will fall back to
> non-overwrite mode.
>
> Some records would be lost in overwrite mode because of pausing the
> ringbuffer, but this has little impact on the accuracy of the snapshot
> and can be tolerated.
> The lost events check is removed.
>
> For overwrite mode, unconditionally wait 100 ms before each snapshot.
> This also reduces the overhead caused by pausing the ringbuffer,
> especially on a lightly loaded system.
>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/builtin-top.c | 44 ++++++++++++++++++++++++------------------
> tools/perf/ui/browsers/hists.c | 12 +++++++++---
> 2 files changed, 34 insertions(+), 22 deletions(-)
>
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 8d19ef7..3c14333 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -283,16 +283,6 @@ static void perf_top__print_sym_table(struct perf_top *top)
>
> printf("%-*.*s\n", win_width, win_width, graph_dotted_line);
>
> - if (hists->stats.nr_lost_warned !=
> - hists->stats.nr_events[PERF_RECORD_LOST]) {
> - hists->stats.nr_lost_warned =
> - hists->stats.nr_events[PERF_RECORD_LOST];
> - color_fprintf(stdout, PERF_COLOR_RED,
> - "WARNING: LOST %d chunks, Check IO/CPU overload",
> - hists->stats.nr_lost_warned);
> - ++printed;
> - }
> -
should this stay for non-overwrite mode?
jirka
On Thu, Dec 21, 2017 at 10:08:52AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> Switch to non-overwrite mode if the kernel does not support the
> overwrite ringbuffer.
>
> This only takes effect when overwrite mode is in use.
> No change to current behavior.
>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/builtin-top.c | 36 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 36 insertions(+)
>
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 4b85e7b..8d19ef7 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -931,6 +931,27 @@ static int perf_top_overwrite_check(struct perf_top *top)
> return 0;
> }
>
> +static int perf_top_overwrite_fallback(struct perf_top *top,
> + struct perf_evsel *evsel)
> +{
> + struct record_opts *opts = &top->record_opts;
> + struct perf_evlist *evlist = top->evlist;
> + struct perf_evsel *counter;
> +
> + if (!opts->overwrite)
> + return 0;
> +
> + /* only fall back when first event fails */
> + if (evsel != perf_evlist__first(evlist))
> + return 0;
> +
> + evlist__for_each_entry(evlist, counter)
> + counter->attr.write_backward = false;
> + opts->overwrite = false;
> + ui__warning("fall back to non-overwrite mode\n");
> + return 1;
> +}
we already do that for the evsel in perf_evsel__open.. could we keep
this fallback in one place only?
checking the code, why don't we follow perf_missing_features.write_backward
like we do for other features in perf_evsel__open and set
write_backward accordingly, like in the attached patch.
jirka
---
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index dbd80268de54..89253f4ff9ee 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1683,9 +1683,6 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
int pid = -1, err;
enum { NO_CHANGE, SET_TO_MAX, INCREASED_MAX } set_rlimit = NO_CHANGE;
- if (perf_missing_features.write_backward && evsel->attr.write_backward)
- return -EINVAL;
-
if (cpus == NULL) {
static struct cpu_map *empty_cpu_map;
@@ -1725,6 +1722,8 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
}
fallback_missing_features:
+ if (perf_missing_features.write_backward)
+ evsel->attr.write_backward = false;
if (perf_missing_features.clockid_wrong)
evsel->attr.clockid = CLOCK_MONOTONIC; /* should always work */
if (perf_missing_features.clockid) {
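For context, the fallback from patch 10 is meant to be triggered from the
open-counters error path in builtin-top.c; per the changelog ("only fall back
when first event fails"), the wiring is presumably something like this sketch
(error-code filtering and the usual evsel fallback handling are omitted):

try_again:
	evlist__for_each_entry(evlist, counter) {
		if (perf_evsel__open(counter, top->evlist->cpus,
				     top->evlist->threads) < 0) {
			/*
			 * If overwrite mode was requested but the kernel
			 * lacks write_backward support, clear it on every
			 * event and retry the whole evlist once.
			 */
			if (perf_top_overwrite_fallback(top, counter))
				goto try_again;

			/* ... the usual evsel fallback / error handling ... */
		}
	}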
> Hi,
>
> On Tue, Jan 09, 2018 at 03:12:28PM +0000, Liang, Kan wrote:
> > > > > > >
> > > > > > > Also I guess the current code might miss some events since the head
> > > > > > > can be different between _read_init() and _read_done(), no?
> > > > > > >
> > > > > >
> > > > > > The overwrite mode requires the ring buffer to be paused during
> > > > > processing.
> > > > > > The head is unchanged between __read_init() and __read_done().
> > > > >
> > > > > Ah, ok then. Maybe we could read the head once, and use it during
> > > > > processing.
> > > >
> > > > Yes, it only needs to read head once for overwrite mode.
> > > > But for non-overwrite, we have to read the head in every
> > > > perf_mmap__read_event(). Because the head is floating.
> > > > The non-overwrite is specially handled in patch 5/12 as well.
> > >
> > > Right, I understand it for the non-overwrite mode.
> > >
> > > But, for the overwrite mode, my concern was that it might be possible
> > > that it reads a stale head in __read_init() (even after it paused the
> > > ring buffer) and reads an updated head in __read_done(). Then it's
> > > gonna miss some records. I'm not sure whether it reads the same head
> > > in __read_init() and __read_done() by the pause.
> > >
> >
> > The only scenario which may cause a different 'head' is as below.
> > The 'rb->head' is updated in __perf_output_begin(), but has not yet been
> > assigned to 'pc->data_head' for the perf tool. During this period, 'paused'
> > is set and __read_init() reads the head.
> > But this scenario never happens because of the ringbuffer lock.
>
> Which lock did you say?
>
The RCU lock.
>
> >
> > Otherwise, I cannot imagine any other scenario which may cause a
> > different 'head' in __read_init() and __read_done() with the ringbuffer
> > paused. Please let me know if there is an example.
>
> Maybe I'm missing something, but I don't know what guarantees that we
> see the updated data_head written by another CPU before the pause.
>
I think it should be the kernel's responsibility. It's described in the
changelog of commit 86e7972f690c ("perf/ring_buffer: Introduce new ioctl
options to pause and resume the ring-buffer").
>
> >
> > Some records would be missed, but only because the ringbuffer
> > is paused. The head should stay the same.
>
> Hmm.. yes. It's gonna miss some records anyway, then I don't care
> about it anymore.
>
OK.
Thanks for the review.
Thanks,
Kan
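For readers without that commit at hand: 86e7972f690c added the
PERF_EVENT_IOC_PAUSE_OUTPUT ioctl, so the snapshot discipline discussed above
boils down to something like the sketch below, with 'fd' being a perf_event
file descriptor backing the ring buffer (the surrounding read loop is elided):

#include <sys/ioctl.h>
#include <linux/perf_event.h>

static int snapshot_ring_buffer(int fd)
{
	/* pause: the kernel stops writing and data_head becomes stable */
	if (ioctl(fd, PERF_EVENT_IOC_PAUSE_OUTPUT, 1) < 0)
		return -1;

	/* ... perf_mmap__read_init() / read loop / read_done() here ... */

	/* resume: new records may be written (and old ones overwritten) */
	return ioctl(fd, PERF_EVENT_IOC_PAUSE_OUTPUT, 0);
}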
> On Thu, Dec 21, 2017 at 10:08:44AM -0800, [email protected] wrote:
>
> SNIP
>
> > +/*
> > + * Report the start and end of the available data in ringbuffer
> > + */
> > +int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
> > + u64 *start, u64 *end)
> > {
> > - u64 head = perf_mmap__read_head(md);
> > - u64 old = md->prev;
> > - u64 end = head, start = old;
> > - unsigned char *data = md->base + page_size;
> > + u64 head = perf_mmap__read_head(map);
> > + u64 old = map->prev;
> > + unsigned char *data = map->base + page_size;
> > unsigned long size;
> > - void *buf;
> > - int rc = 0;
> >
> > - start = overwrite ? head : old;
> > - end = overwrite ? old : head;
> > + /*
> > + * Check if event was unmapped due to a POLLHUP/POLLERR.
> > + */
> > + if (!refcount_read(&map->refcnt))
> > + return -EINVAL;
> >
> > - if (start == end)
> > - return 0;
> > + *start = overwrite ? head : old;
> > + *end = overwrite ? old : head;
> >
> > - size = end - start;
> > - if (size > (unsigned long)(md->mask) + 1) {
> > + if (*start == *end)
> > + return -EAGAIN;
> > +
> > + size = *end - *start;
> > + if (size > (unsigned long)(map->mask) + 1) {
> > if (!overwrite) {
> > WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
> >
>
> I know you did not change this, but is this branch even possible
> in !overwrite mode? I think the kernel will throw away the data,
> keep the head and wait for the tail to be read by the user..
Right, it should not happen in !overwrite mode. I guess it's just a
sanity check.
It should not bring any problems.
I will keep it for V4 if there is no objection.
Thanks,
Kan
>
> jirka
>
> > - md->prev = head;
> > - perf_mmap__consume(md, overwrite);
> > - return 0;
> > + map->prev = head;
> > + perf_mmap__consume(map, overwrite);
> > + return -EAGAIN;
>
> SNIP
> On Thu, Dec 21, 2017 at 10:08:45AM -0800, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > 'start' and 'prev' are duplicate in perf_mmap__read()
> >
> > Use 'map->prev' to replace 'start' in perf_mmap__read_*().
> >
> > Suggested-by: Wang Nan <[email protected]>
> > Signed-off-by: Kan Liang <[email protected]>
> > ---
> > tools/perf/util/mmap.c | 28 ++++++++++------------------
> > 1 file changed, 10 insertions(+), 18 deletions(-)
> >
> > diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> > index 3fd4f3c..a844a2f 100644
> > --- a/tools/perf/util/mmap.c
> > +++ b/tools/perf/util/mmap.c
> > @@ -22,29 +22,27 @@ size_t perf_mmap__mmap_len(struct perf_mmap *map)
> >
> > /* When check_messup is true, 'end' must point to a good entry */
> > static union perf_event *perf_mmap__read(struct perf_mmap *map,
> > - u64 start, u64 end, u64 *prev)
> > + u64 *start, u64 end)
> > {
> > unsigned char *data = map->base + page_size;
> > union perf_event *event = NULL;
> > - int diff = end - start;
> > + int diff = end - *start;
>
> I'd like it more if the arg was a pointer, like 'u64 *startp', and you
> kept using a 'u64 start' value in the body.. but I can't see any
> technical benefit behind forcing it ;-)
>
I will rename it to startp in V4.
Thanks,
Kan
> jirka
>
> >
> > if (diff >= (int)sizeof(event->header)) {
> > size_t size;
> >
> > - event = (union perf_event *)&data[start & map->mask];
> > + event = (union perf_event *)&data[*start & map->mask];
> > size = event->header.size;
> >
> > - if (size < sizeof(event->header) || diff < (int)size) {
> > - event = NULL;
> > - goto broken_event;
> > - }
> > + if (size < sizeof(event->header) || diff < (int)size)
> > + return NULL;
> >
> > /*
> > * Event straddles the mmap boundary -- header should always
> > * be inside due to u64 alignment of output.
> > */
> > - if ((start & map->mask) + size != ((start + size) & map->mask)) {
> > - unsigned int offset = start;
> > + if ((*start & map->mask) + size != ((*start + size) & map->mask)) {
> > + unsigned int offset = *start;
> > unsigned int len = min(sizeof(*event), size), cpy;
> > void *dst = map->event_copy;
> >
> > @@ -59,20 +57,15 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
> > event = (union perf_event *)map->event_copy;
> > }
> >
> > - start += size;
> > + *start += size;
> > }
> >
> > -broken_event:
> > - if (prev)
> > - *prev = start;
> > -
> > return event;
> > }
>
> SNIP
> On Thu, Dec 21, 2017 at 10:08:46AM -0800, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > The direction of overwrite mode is backward. The last mmap__read_event
> > will set the tail to map->prev. map->prev needs to be corrected to head,
> > which is the end of the next read.
> >
> > It will be used later.
> >
> > Signed-off-by: Kan Liang <[email protected]>
> > ---
> > tools/perf/util/mmap.c | 11 +++++++++++
> > tools/perf/util/mmap.h | 1 +
> > 2 files changed, 12 insertions(+)
> >
> > diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> > index a844a2f..4aaeb64 100644
> > --- a/tools/perf/util/mmap.c
> > +++ b/tools/perf/util/mmap.c
> > @@ -343,3 +343,14 @@ int perf_mmap__push(struct perf_mmap *md, bool overwrite,
> > out:
> > return rc;
> > }
> > +
> > +/*
> > + * Mandatory for overwrite mode
> > + * The direction of overwrite mode is backward.
> > + * The last mmap__read_event will set tail to map->prev.
>
> you mean perf_mmap__read function in this comment, right?
>
Right.
Will fix it in V4.
Thanks,
Kan
> jirka
>
> > + * Need to correct the map->prev to head which is the end of next read.
> > + */
> > +void perf_mmap__read_done(struct perf_mmap *map)
> > +{
> > + map->prev = perf_mmap__read_head(map);
> > +}
> > diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
> > index abe9b9f..2df27c1 100644
> > --- a/tools/perf/util/mmap.h
> > +++ b/tools/perf/util/mmap.h
> > @@ -96,4 +96,5 @@ size_t perf_mmap__mmap_len(struct perf_mmap *map);
> >
> > int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
> > u64 *start, u64 *end);
> > +void perf_mmap__read_done(struct perf_mmap *map);
> > #endif /*__PERF_MMAP_H */
> > --
> > 2.5.5
> >
> On Thu, Dec 21, 2017 at 10:08:49AM -0800, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > Discards perf_mmap__read_backward and perf_mmap__read_catchup. No tools
> > use them.
> >
> > There are tools that still use perf_mmap__read_forward. Keep it, but add
> > comments pointing to the new interface for future use.
> >
> > Signed-off-by: Kan Liang <[email protected]>
> > ---
> > tools/perf/util/mmap.c | 50 ++++----------------------------------------------
> > tools/perf/util/mmap.h | 3 ---
> > 2 files changed, 4 insertions(+), 49 deletions(-)
> >
> > diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> > index d0ca3ba..650e0a7 100644
> > --- a/tools/perf/util/mmap.c
> > +++ b/tools/perf/util/mmap.c
> > @@ -63,6 +63,10 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
> > return event;
> > }
> >
> > +/*
> > + * legacy interface for mmap read.
> > + * Don't use it. Use perf_mmap__read_event().
> > + */
>
> could we get rid of it then? looks like it's not much work,
> seems it's used only in:
>
To get rid of it, we would have to introduce the whole
perf_mmap__read_init/_read_event/_read_done flow everywhere.
Besides perf top, kvm and trace need to be changed.
There are also 11 perf test cases which need to be changed.
I think that would make the current patch series too big.
I can submit a separate patch series later to get rid of
the legacy interface. Is that OK?
Thanks,
Kan
> perf_evlist__mmap_read
> perf_evlist__mmap_read_forward
>
> it'd prove the new interface works correctly for both cases
>
> thanks,
> jirka
>
> > union perf_event *perf_mmap__read_forward(struct perf_mmap *map)
> > {
> > u64 head;
> > @@ -78,41 +82,6 @@ union perf_event *perf_mmap__read_forward(struct perf_mmap *map)
> > return perf_mmap__read(map, &map->prev, head);
> > }
>
> SNIP
> On Thu, Dec 21, 2017 at 10:08:50AM -0800, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > The per event overwrite term is not forbidden in perf top, which can bring
> > problems, because perf top only supports non-overwrite mode.
> >
> > Check and forbid inconsistent per event overwrite terms in the evlist.
> > Make it possible to support either non-overwrite or overwrite mode.
> > The overwrite mode is forbidden for now; this restriction will be removed
> > once overwrite mode is supported.
> >
> > Signed-off-by: Kan Liang <[email protected]>
> > ---
> > tools/perf/builtin-top.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 60 insertions(+)
> >
> > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> > index c6ccda5..4b85e7b 100644
> > --- a/tools/perf/builtin-top.c
> > +++ b/tools/perf/builtin-top.c
> > @@ -881,6 +881,56 @@ static void perf_top__mmap_read(struct perf_top *top)
> > perf_top__mmap_read_idx(top, i);
> > }
> >
> > +/*
> > + * Check per event overwrite term.
> > + * perf top supports consistent mode for all events.
> > + * Return -1 if the per event terms set but not consistent.
>
> please list the rules for overwrite in the comment
> are they just top specific?
Yes, it's only for perf top.
It doesn't support a mix where some events are in overwrite mode and other
events are in non-overwrite mode.
I will refine the comments.
>
> SNIP
>
> > + if (evsel == perf_evlist__first(evlist))
> > + overwrite = set;
> > + else
> > + return -1;
> > + }
> > + }
> > +
> > + if ((overwrite >= 0) && (opts->overwrite != overwrite))
> > + opts->overwrite = overwrite;
> > +
> > + return 0;
> > +}
> > +
> > static int perf_top__start_counters(struct perf_top *top)
> > {
> > char msg[BUFSIZ];
> > @@ -890,6 +940,16 @@ static int perf_top__start_counters(struct perf_top *top)
> >
> > perf_evlist__config(evlist, opts, &callchain_param);
>
> so perf_evlist__config uses opts->overwrite, which you set
> in your perf_top_overwrite_check call.. I'd think you need
> to call it sooner
>
The user may set a per-event mode, which is processed in perf_evlist__config.
It's possible that the per-event mode is not the same as opts->overwrite.
If so, perf_evlist__config actually uses the per-event mode.
perf_top_overwrite_check will do the check. If the per-event mode is
inconsistent with opts->overwrite, it updates opts->overwrite.
It has to be called after perf_evlist__config.
Thanks,
Kan
> On Thu, Dec 21, 2017 at 10:08:52AM -0800, [email protected] wrote:
> > From: Kan Liang <[email protected]>
> >
> > Switch to non-overwrite mode if the kernel does not support the
> > overwrite ringbuffer.
> >
> > This only takes effect when overwrite mode is in use.
> > No change to current behavior.
> >
> > Signed-off-by: Kan Liang <[email protected]>
> > ---
> > tools/perf/builtin-top.c | 36 ++++++++++++++++++++++++++++++++++++
> > 1 file changed, 36 insertions(+)
> >
> > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> > index 4b85e7b..8d19ef7 100644
> > --- a/tools/perf/builtin-top.c
> > +++ b/tools/perf/builtin-top.c
> > @@ -931,6 +931,27 @@ static int perf_top_overwrite_check(struct perf_top *top)
> > return 0;
> > }
> >
> > +static int perf_top_overwrite_fallback(struct perf_top *top,
> > + struct perf_evsel *evsel)
> > +{
> > + struct record_opts *opts = &top->record_opts;
> > + struct perf_evlist *evlist = top->evlist;
> > + struct perf_evsel *counter;
> > +
> > + if (!opts->overwrite)
> > + return 0;
> > +
> > + /* only fall back when first event fails */
> > + if (evsel != perf_evlist__first(evlist))
> > + return 0;
> > +
> > + evlist__for_each_entry(evlist, counter)
> > + counter->attr.write_backward = false;
> > + opts->overwrite = false;
> > + ui__warning("fall back to non-overwrite mode\n");
> > + return 1;
> > +}
>
> > we already do that for the evsel in perf_evsel__open.. could we keep
> > this fallback in one place only?
> >
> > checking the code, why don't we follow
> > perf_missing_features.write_backward
> > like we do for other features in perf_evsel__open and set
> > write_backward accordingly, like in the attached patch.
>
No, the per-event fallback is explicitly disabled in perf_evsel__open.
You may refer to commit 32a951b4fd6b ("perf evlist: Drop redundant
evsel->overwrite indicator").
Now, only perf top supports the whole fallback, so it is handled specially.
Thanks,
Kan
> jirka
>
>
> ---
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index dbd80268de54..89253f4ff9ee 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1683,9 +1683,6 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
> int pid = -1, err;
> enum { NO_CHANGE, SET_TO_MAX, INCREASED_MAX } set_rlimit = NO_CHANGE;
>
> - if (perf_missing_features.write_backward && evsel->attr.write_backward)
> - return -EINVAL;
> -
> if (cpus == NULL) {
> static struct cpu_map *empty_cpu_map;
>
> @@ -1725,6 +1722,8 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
> }
>
> fallback_missing_features:
> + if (perf_missing_features.write_backward)
> + evsel->attr.write_backward = false;
> if (perf_missing_features.clockid_wrong)
> evsel->attr.clockid = CLOCK_MONOTONIC; /* should always work */
> if (perf_missing_features.clockid) {
> SNIP
>
> > .max_stack = sysctl_perf_event_max_stack,
> > .sym_pcnt_filter = 5,
> > diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
> > index 68146f4..56023e4 100644
> > --- a/tools/perf/ui/browsers/hists.c
> > +++ b/tools/perf/ui/browsers/hists.c
> > @@ -638,8 +638,13 @@ int hist_browser__run(struct hist_browser *browser, const char *help)
> > nr_entries = hist_browser__nr_entries(browser);
> > ui_browser__update_nr_entries(&browser->b, nr_entries);
> >
> > - if (browser->hists->stats.nr_lost_warned !=
> > - browser->hists->stats.nr_events[PERF_RECORD_LOST]) {
> > + /*
> > + * Don't print lost events warning for perf top,
> > + * because it is overwrite mode.
> > + * Perf top is the only tool which has hbt timer.
> > + */
>
> I hate that warning, but still, should it be omitted only for the overwrite case?
> also please separate this change from the rest of the patch
Sure.
Will do it.
Thanks,
Kan
>
> jirka
>
> > + if ((browser->hists->stats.nr_lost_warned !=
> > + browser->hists->stats.nr_events[PERF_RECORD_LOST]) && !hbt) {
> > browser->hists->stats.nr_lost_warned =
> > browser->hists->stats.nr_events[PERF_RECORD_LOST];
> > ui_browser__warn_lost_events(&browser->b);
> > @@ -3203,7 +3208,8 @@ static int perf_evsel_menu__run(struct perf_evsel_menu *menu,
> > case K_TIMER:
> > hbt->timer(hbt->arg);
> >
> > - if (!menu->lost_events_warned && menu->lost_events) {
> > + if (!menu->lost_events_warned &&
> > + menu->lost_events && !hbt) {
> > ui_browser__warn_lost_events(&menu->b);
> > menu->lost_events_warned = true;
> > }
> > --
> > 2.5.5
> >
> On Thu, Dec 21, 2017 at 10:08:53AM -0800, [email protected] wrote:
>
> SNIP
>
> > .max_stack = sysctl_perf_event_max_stack,
> > .sym_pcnt_filter = 5,
> > diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
> > index 68146f4..56023e4 100644
> > --- a/tools/perf/ui/browsers/hists.c
> > +++ b/tools/perf/ui/browsers/hists.c
> > @@ -638,8 +638,13 @@ int hist_browser__run(struct hist_browser *browser, const char *help)
> > nr_entries = hist_browser__nr_entries(browser);
> > ui_browser__update_nr_entries(&browser->b, nr_entries);
> >
> > - if (browser->hists->stats.nr_lost_warned !=
> > - browser->hists->stats.nr_events[PERF_RECORD_LOST]) {
> > + /*
> > + * Don't print lost events warning for perf top,
> > + * because it is overwrite mode.
> > + * Perf top is the only tool which has hbt timer.
> > + */
>
> I hate that warning, but still, should it be omitted only for the overwrite case?
> also please separate this change from the rest of the patch
>
OK. Will do it.
Thanks,
Kan
> jirka
>
> > + if ((browser->hists->stats.nr_lost_warned !=
> > + browser->hists->stats.nr_events[PERF_RECORD_LOST]) && !hbt) {
> > browser->hists->stats.nr_lost_warned =
> > browser->hists->stats.nr_events[PERF_RECORD_LOST];
> > ui_browser__warn_lost_events(&browser->b);
> > @@ -3203,7 +3208,8 @@ static int perf_evsel_menu__run(struct perf_evsel_menu *menu,
> > case K_TIMER:
> > hbt->timer(hbt->arg);
> >
> > - if (!menu->lost_events_warned && menu->lost_events) {
> > + if (!menu->lost_events_warned &&
> > + menu->lost_events && !hbt) {
> > ui_browser__warn_lost_events(&menu->b);
> > menu->lost_events_warned = true;
> > }
> > --
> > 2.5.5
> >
On Thu, Jan 11, 2018 at 09:26:28PM +0000, Liang, Kan wrote:
>
> > On Thu, Dec 21, 2017 at 10:08:44AM -0800, [email protected] wrote:
> >
> > SNIP
> >
> > > +/*
> > > + * Report the start and end of the available data in ringbuffer
> > > + */
> > > +int perf_mmap__read_init(struct perf_mmap *map, bool overwrite,
> > > + u64 *start, u64 *end)
> > > {
> > > - u64 head = perf_mmap__read_head(md);
> > > - u64 old = md->prev;
> > > - u64 end = head, start = old;
> > > - unsigned char *data = md->base + page_size;
> > > + u64 head = perf_mmap__read_head(map);
> > > + u64 old = map->prev;
> > > + unsigned char *data = map->base + page_size;
> > > unsigned long size;
> > > - void *buf;
> > > - int rc = 0;
> > >
> > > - start = overwrite ? head : old;
> > > - end = overwrite ? old : head;
> > > + /*
> > > + * Check if event was unmapped due to a POLLHUP/POLLERR.
> > > + */
> > > + if (!refcount_read(&map->refcnt))
> > > + return -EINVAL;
> > >
> > > - if (start == end)
> > > - return 0;
> > > + *start = overwrite ? head : old;
> > > + *end = overwrite ? old : head;
> > >
> > > - size = end - start;
> > > - if (size > (unsigned long)(md->mask) + 1) {
> > > + if (*start == *end)
> > > + return -EAGAIN;
> > > +
> > > + size = *end - *start;
> > > + if (size > (unsigned long)(map->mask) + 1) {
> > > if (!overwrite) {
> > > WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
> > >
> >
> > I know you did not change this, but is this branch even possible
> > in !overwrite mode? I think the kernel will throw away the data,
> > keep the head and wait for the tail to be read by the user..
>
> Right, it should not happen in !overwrite mode. I guess it's just a
> sanity check.
> It should not bring any problems.
> I will keep it for V4 if there is no objection.
ok
jirka
On Thu, Jan 11, 2018 at 09:28:07PM +0000, Liang, Kan wrote:
> > On Thu, Dec 21, 2017 at 10:08:49AM -0800, [email protected] wrote:
> > > From: Kan Liang <[email protected]>
> > >
> > > Discards perf_mmap__read_backward and perf_mmap__read_catchup. No tools
> > > use them.
> > >
> > > There are tools still use perf_mmap__read_forward. Keep it, but add
> > > comments to point to the new interface for future use.
> > >
> > > Signed-off-by: Kan Liang <[email protected]>
> > > ---
> > > tools/perf/util/mmap.c | 50 ++++----------------------------------------------
> > > tools/perf/util/mmap.h | 3 ---
> > > 2 files changed, 4 insertions(+), 49 deletions(-)
> > >
> > > diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> > > index d0ca3ba..650e0a7 100644
> > > --- a/tools/perf/util/mmap.c
> > > +++ b/tools/perf/util/mmap.c
> > > @@ -63,6 +63,10 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
> > > return event;
> > > }
> > >
> > > +/*
> > > + * legacy interface for mmap read.
> > > + * Don't use it. Use perf_mmap__read_event().
> > > + */
> >
> > could we get rid of it then? looks like it's not much work,
> > seems it's used only in:
> >
>
> To get rid of it, we would have to introduce the whole
> perf_mmap__read_init/_read_event/_read_done flow everywhere.
>
> Besides perf top, kvm and trace need to be changed.
> There are also 11 perf test cases which need to be changed.
>
> I think that would make the current patch series too big.
> I can submit a separate patch series later to get rid of
> the legacy interface. Is that OK?
ok
jirka
On Thu, Jan 11, 2018 at 09:29:21PM +0000, Liang, Kan wrote:
> > On Thu, Dec 21, 2017 at 10:08:50AM -0800, [email protected] wrote:
> > > From: Kan Liang <[email protected]>
> > >
> > > Per event overwrite term is not forbidden in perf top, which can bring
> > > problems. Because perf top only support non-overwrite mode.
> > >
> > > Check and forbid inconsistent per event overwrite term in the evlist.
> > > Make it possible to support either non-overwrite or overwrite mode.
> > > The overwrite mode is forbidden now, which will be removed when the
> > > overwrite mode is supported later.
> > >
> > > Signed-off-by: Kan Liang <[email protected]>
> > > ---
> > > tools/perf/builtin-top.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 60 insertions(+)
> > >
> > > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> > > index c6ccda5..4b85e7b 100644
> > > --- a/tools/perf/builtin-top.c
> > > +++ b/tools/perf/builtin-top.c
> > > @@ -881,6 +881,56 @@ static void perf_top__mmap_read(struct perf_top *top)
> > > perf_top__mmap_read_idx(top, i);
> > > }
> > >
> > > +/*
> > > + * Check per event overwrite term.
> > > + * perf top supports consistent mode for all events.
> > > + * Return -1 if the per event terms set but not consistent.
> >
> > please list the rules for overwrite in the comment
> > are they just top specific?
>
> Yes, it's only for perf top.
> It doesn't support a mix where some events are in overwrite mode and
> other events are in non-overwrite mode.
> I will refine the comments.
>
> >
> > SNIP
> >
> > > + if (evsel == perf_evlist__first(evlist))
> > > + overwrite = set;
> > > + else
> > > + return -1;
> > > + }
> > > + }
> > > +
> > > + if ((overwrite >= 0) && (opts->overwrite != overwrite))
> > > + opts->overwrite = overwrite;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > static int perf_top__start_counters(struct perf_top *top)
> > > {
> > > char msg[BUFSIZ];
> > > @@ -890,6 +940,16 @@ static int perf_top__start_counters(struct
> > perf_top *top)
> > >
> > > perf_evlist__config(evlist, opts, &callchain_param);
> >
> > so perf_evlist__config uses opts->overwrite, which you set
> > in your perf_top_overwrite_check call.. I'd think you need
> > to call it sooner
> >
>
> The user may set a per-event mode, which is processed in perf_evlist__config.
> It's possible that the per-event mode is not the same as opts->overwrite.
> If so, perf_evlist__config actually uses the per-event mode.
>
> perf_top_overwrite_check will do the check. If the per-event mode is
> inconsistent with opts->overwrite, it updates opts->overwrite.
>
> It has to be called after perf_evlist__config.
let's see the rules first then, also wrt opts->overwrite,
which I assume is used later on
thanks,
jirka
On Thu, Jan 11, 2018 at 09:30:20PM +0000, Liang, Kan wrote:
> > On Thu, Dec 21, 2017 at 10:08:52AM -0800, [email protected] wrote:
> > > From: Kan Liang <[email protected]>
> > >
> > > Switch to non-overwrite mode if kernel doesnot support overwrite
> > > ringbuffer.
> > >
> > > It's only effect when overwrite mode is supported.
> > > No change to current behavior.
> > >
> > > Signed-off-by: Kan Liang <[email protected]>
> > > ---
> > > tools/perf/builtin-top.c | 36 ++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 36 insertions(+)
> > >
> > > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> > > index 4b85e7b..8d19ef7 100644
> > > --- a/tools/perf/builtin-top.c
> > > +++ b/tools/perf/builtin-top.c
> > > @@ -931,6 +931,27 @@ static int perf_top_overwrite_check(struct perf_top *top)
> > > return 0;
> > > }
> > >
> > > +static int perf_top_overwrite_fallback(struct perf_top *top,
> > > + struct perf_evsel *evsel)
> > > +{
> > > + struct record_opts *opts = &top->record_opts;
> > > + struct perf_evlist *evlist = top->evlist;
> > > + struct perf_evsel *counter;
> > > +
> > > + if (!opts->overwrite)
> > > + return 0;
> > > +
> > > + /* only fall back when first event fails */
> > > + if (evsel != perf_evlist__first(evlist))
> > > + return 0;
> > > +
> > > + evlist__for_each_entry(evlist, counter)
> > > + counter->attr.write_backward = false;
> > > + opts->overwrite = false;
> > > + ui__warning("fall back to non-overwrite mode\n");
> > > + return 1;
> > > +}
> >
> > we already do that for evsel in perf_evsel__open.. could we make
> > this fallback on one place only?
> >
> > checking the code, why don't we follow the
> > perf_missing_features.write_backward
> > like we do for other features in the perf_evsel__open and set
> > write_backward accordingly, like in attached patch.
> >
>
> No, the per-event fallback is explicitly disabled in perf_evsel__open.
> You may refer to commit 32a951b4fd6b ("perf evlist: Drop redundant
> evsel->overwrite indicator").
>
> Now, only perf top supports the whole fallback, so it is handled specially.
ok
jirka
> On Thu, Jan 11, 2018 at 09:29:21PM +0000, Liang, Kan wrote:
> > > On Thu, Dec 21, 2017 at 10:08:50AM -0800, [email protected] wrote:
> > > > From: Kan Liang <[email protected]>
> > > >
> > > > Per event overwrite term is not forbidden in perf top, which can
> > > > bring problems. Because perf top only support non-overwrite mode.
> > > >
> > > > Check and forbid inconsistent per event overwrite term in the evlist.
> > > > Make it possible to support either non-overwrite or overwrite mode.
> > > > The overwrite mode is forbidden now, which will be removed when
> > > > the overwrite mode is supported later.
> > > >
> > > > Signed-off-by: Kan Liang <[email protected]>
> > > > ---
> > > > tools/perf/builtin-top.c | 60
> > > ++++++++++++++++++++++++++++++++++++++++++++++++
> > > > 1 file changed, 60 insertions(+)
> > > >
> > > > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> > > > index c6ccda5..4b85e7b 100644
> > > > --- a/tools/perf/builtin-top.c
> > > > +++ b/tools/perf/builtin-top.c
> > > > @@ -881,6 +881,56 @@ static void perf_top__mmap_read(struct perf_top *top)
> > > > perf_top__mmap_read_idx(top, i); }
> > > >
> > > > +/*
> > > > + * Check per event overwrite term.
> > > > + * perf top supports consistent mode for all events.
> > > > + * Return -1 if the per event terms set but not consistent.
> > >
> > > please list the rules for overwrite in the comment are they just top
> > > specific?
> >
> > Yes, it's only for perf top.
> > It doesn't support a mix where some events are in overwrite mode and
> > other events are in non-overwrite mode.
> > I will refine the comments.
> >
> > >
> > > SNIP
> > >
> > > > + if (evsel == perf_evlist__first(evlist))
> > > > + overwrite = set;
> > > > + else
> > > > + return -1;
> > > > + }
> > > > + }
> > > > +
> > > > + if ((overwrite >= 0) && (opts->overwrite != overwrite))
> > > > + opts->overwrite = overwrite;
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > static int perf_top__start_counters(struct perf_top *top)
> > > > {
> > > > char msg[BUFSIZ];
> > > > @@ -890,6 +940,16 @@ static int perf_top__start_counters(struct perf_top *top)
> > > >
> > > > perf_evlist__config(evlist, opts, &callchain_param);
> > >
> > > so perf_evlist__config uses opts->overwrite, which you set
> > > in your perf_top_overwrite_check call.. I'd think you need
> > > to call it sooner
> > >
> >
> > The user may set a per-event mode, which is processed in perf_evlist__config.
> > It's possible that the per-event mode is not the same as opts->overwrite.
> > If so, perf_evlist__config actually uses the per-event mode.
> >
> > perf_top_overwrite_check will do the check. If the per-event mode is
> > inconsistent with opts->overwrite, it updates opts->overwrite.
> >
> > It has to be called after perf_evlist__config.
>
> let's see the rules first then, also wrt opts->overwrite,
> which I assume is used later on
>
The rules for perf_top_overwrite_check are quite simple.
- All events have the same per-event mode.
E.g. "cpu/cpu-cycles,no-overwrite/,cpu/instructions,no-overwrite/"
Use the per-event setting to replace opts->overwrite if they differ.
- Events have different per-event modes.
E.g. "cpu/cpu-cycles,overwrite/,cpu/instructions,no-overwrite/"
Error out.
- Some of the events set a per-event mode, but some do not.
E.g. "cpu/cpu-cycles/,cpu/instructions,no-overwrite/"
Error out.
There is no --overwrite option for perf top.
We have a default opts->overwrite.
In perf_evlist__config, attr->write_backward will be set according to
opts->overwrite at first. But if there is a per-event mode, the
attr->write_backward will be updated accordingly.
There may be an inconsistency between opts->overwrite and
attr->write_backward.
perf_top_overwrite_check will be called right after perf_evlist__config.
It can fix the inconsistency by updating opts->overwrite, or error out.
opts->overwrite will be used by the fallback and _mmap_read() later.
Thanks,
Kan
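Spelled out as code, the check presumably walks each evsel's config_terms
along these lines. This is a sketch reconstructed from the fragments quoted
in this thread; in particular the PERF_EVSEL__CONFIG_TERM_OVERWRITE term type
is an assumption, not something quoted above:

static int perf_top_overwrite_check(struct perf_top *top)
{
	struct record_opts *opts = &top->record_opts;
	struct perf_evlist *evlist = top->evlist;
	struct perf_evsel_config_term *term;
	struct perf_evsel *evsel;
	int set, overwrite = -1;

	evlist__for_each_entry(evlist, evsel) {
		set = -1;
		list_for_each_entry(term, &evsel->config_terms, list) {
			if (term->type == PERF_EVSEL__CONFIG_TERM_OVERWRITE)
				set = term->val.overwrite ? 1 : 0;
		}

		if (set == -1) {
			/* no term here: error out if earlier events had one */
			if (overwrite != -1)
				return -1;
		} else if (overwrite == -1) {
			/* the first term seen must be on the first event */
			if (evsel == perf_evlist__first(evlist))
				overwrite = set;
			else
				return -1;
		} else if (overwrite != set) {
			/* conflicting per-event terms */
			return -1;
		}
	}

	if ((overwrite >= 0) && (opts->overwrite != overwrite))
		opts->overwrite = overwrite;

	return 0;
}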
> > On Thu, Jan 11, 2018 at 09:29:21PM +0000, Liang, Kan wrote:
> > > > On Thu, Dec 21, 2017 at 10:08:50AM -0800, [email protected] wrote:
> > > > > From: Kan Liang <[email protected]>
> > > > >
> > > > > Per event overwrite term is not forbidden in perf top, which can
> > > > > bring problems. Because perf top only support non-overwrite mode.
> > > > >
> > > > > Check and forbid inconsistent per event overwrite term in the evlist.
> > > > > Make it possible to support either non-overwrite or overwrite mode.
> > > > > The overwrite mode is forbidden now, which will be removed when
> > > > > the overwrite mode is supported later.
> > > > >
> > > > > Signed-off-by: Kan Liang <[email protected]>
> > > > > ---
> > > > > tools/perf/builtin-top.c | 60
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > 1 file changed, 60 insertions(+)
> > > > >
> > > > > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> > > > > index c6ccda5..4b85e7b 100644
> > > > > --- a/tools/perf/builtin-top.c
> > > > > +++ b/tools/perf/builtin-top.c
> > > > > @@ -881,6 +881,56 @@ static void perf_top__mmap_read(struct perf_top *top)
> > > > > perf_top__mmap_read_idx(top, i); }
> > > > >
> > > > > +/*
> > > > > + * Check per event overwrite term.
> > > > > + * perf top supports consistent mode for all events.
> > > > > + * Return -1 if the per event terms set but not consistent.
> > > >
> > > > please list the rules for overwrite in the comment are they just
> > > > top specific?
> > >
> > > Yes, it's only for perf top.
> > > It doesn't support that some events are overwrite mode and other
> > > events are non-overwrite mode.
> > > I will refine the comments.
> > >
> > > >
> > > > SNIP
> > > >
> > > > > + if (evsel == perf_evlist__first(evlist))
> > > > > + overwrite = set;
> > > > > + else
> > > > > + return -1;
> > > > > + }
> > > > > + }
> > > > > +
> > > > > + if ((overwrite >= 0) && (opts->overwrite != overwrite))
> > > > > + opts->overwrite = overwrite;
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > static int perf_top__start_counters(struct perf_top *top)
> > > > > {
> > > > > char msg[BUFSIZ];
> > > > > @@ -890,6 +940,16 @@ static int perf_top__start_counters(struct perf_top *top)
> > > > >
> > > > > perf_evlist__config(evlist, opts, &callchain_param);
> > > >
> > > > so perf_evlist__config uses opts->overwrite, which you set in your
> > > > perf_top_overwrite_check call.. I'd think you need to call it
> > > > sooner
> > > >
> > >
> > > The user may set a per-event mode, which is processed in
> > > perf_evlist__config. It's possible that the per-event mode is not the
> > > same as opts->overwrite.
> > > If so, perf_evlist__config actually uses the per-event mode.
> > >
> > > perf_top_overwrite_check will do the check. If the per-event mode is
> > > inconsistent with opts->overwrite, it updates opts->overwrite.
> > >
> > > It has to be called after perf_evlist__config.
> >
> > let's see the rules first then, also wrt opts->overwrite, which I
> > assume is used later on
> >
>
> The rules for perf_top_overwrite_check are quite simple.
> - All events have the same per-event mode.
> E.g. "cpu/cpu-cycles,no-overwrite/,cpu/instructions,no-overwrite/"
> Use the per-event setting to replace opts->overwrite if they differ.
> - Events have different per-event modes.
> E.g. "cpu/cpu-cycles,overwrite/,cpu/instructions,no-overwrite/"
> Error out.
> - Some of the events set a per-event mode, but some do not.
> E.g. "cpu/cpu-cycles/,cpu/instructions,no-overwrite/"
> Error out.
>
> There is no --overwrite option for perf top.
> We have a default opts->overwrite.
> In perf_evlist__config, attr->write_backward will be set according to
> opts->overwrite at first. But if there is a per-event mode, the
> attr->write_backward will be updated accordingly.
> There may be an inconsistency between opts->overwrite and
> attr->write_backward.
>
> perf_top_overwrite_check will be called right after perf_evlist__config.
> It can fix the inconsistency by updating opts->overwrite, or error out.
> opts->overwrite will be used by the fallback and _mmap_read() later.
>
I checked the code again.
perf_top_overwrite_check() relies on evsel->config_terms to check the
terms, which are already parsed before perf_evlist__config().
You are right, we can call it earlier.
I will move the code to right before perf_evlist__config() in V4.
Thanks,
Kan