2015-05-11 02:25:43

by Liang, Kan

Subject: [PATCH V9 0/8] large PEBS interrupt threshold

This patch series implements a large PEBS interrupt threshold.
Currently, the PEBS threshold is always set to one. A larger PEBS
interrupt threshold can significantly reduce the sampling overhead,
especially for frequently occurring events (like cycles, branches or
loads/stores) with a small sampling period.
For example, when recording the cycles event with perf record while
running kernbench with a sampling period of 10003, the elapsed time
drops from 32.7 seconds to 16.5 seconds, which is about 2X faster.
For more details, please refer to patch 4's description.

Limitations:
- It cannot supply a callgraph.
- It requires setting a fixed period.
- It cannot supply a time stamp.
- To supply a TID it requires flushing on context switch.
If an event does not meet the above requirements, the threshold is set
to one.

Discard samples:
When PEBS events happen close to each other, the records for the events
could be mixed up. Demuxing the records is hard because of a hardware
deficiency. As a result, we have to drop some PEBS records.
A new RECORD type, PERF_RECORD_LOST_SAMPLES, is introduced to record
the number of possible discards, and make sure the user is not left
in the dark about such discards.
For details about sample discards, please refer to patch 3's description.

changes since v1:
- drop patch 'perf, core: Add all PMUs to pmu_idr'
- add comments for case that multiple counters overflow simultaneously
changes since v2:
- rename perf_sched_cb_{enable,disable} to perf_sched_cb_user_{inc,dec}
- use flag to indicate auto reload mechanism
- move the code that sets up PEBS sample data to a separate function
- output the PEBS records in batch
- enable this for All (PEBS capable) hardware
- more description for the multiplex
changes since v3:
- ignore conflicting PEBS record
changes since v4:
- Do more tests for collision and update comments
changes since v5:
- Move autoreload and large PEBS available check to intel_pmu_hw_config
- make AUTO_RELOAD conditional on large PEBS
- !PEBS bug fix
- coherent story about what is collision and how we handle it
- Remove extra state pebs_sched_cb_enabled
changes since v6:
- new flag PERF_X86_EVENT_FREERUNNING to indicate large PEBS available
- patch reorder and changelog changes for patch 1 and 3
- An easy way to clear !PEBS bit
- Log collision to PERF_RECORD_SAMPLES_LOST
changes since v7:
- Introduce PERF_RECORD_LOST_SAMPLES to record the number of discards
- Remove entire for_each_set_bit() loop
- Minor changes on comments and changelog
changes since v8:
- Record 'lost' events to all set bits
- dropped the @id field from the lost samples record
- Print lost samples event nr in perf report --stdio output

Yan, Zheng (6):
perf, x86: use the PEBS auto reload mechanism when possible
perf, x86: introduce setup_pebs_sample_data()
perf, x86: handle multiple records in PEBS buffer
perf, x86: large PEBS interrupt threshold
perf, x86: drain PEBS buffer during context switch
perf, x86: enlarge PEBS buffer

Kan Liang (2):
perf, x86: introduce PERF_RECORD_LOST_SAMPLES
perf tools: handle PERF_RECORD_LOST_SAMPLES

arch/x86/kernel/cpu/perf_event.c | 15 +-
arch/x86/kernel/cpu/perf_event.h | 16 ++
arch/x86/kernel/cpu/perf_event_intel.c | 22 +-
arch/x86/kernel/cpu/perf_event_intel_ds.c | 311 +++++++++++++++++++++--------
arch/x86/kernel/cpu/perf_event_intel_lbr.c | 3 -
include/linux/perf_event.h | 16 ++
include/uapi/linux/perf_event.h | 12 ++
kernel/events/core.c | 39 +++-
kernel/events/internal.h | 9 -
tools/perf/builtin-report.c | 1 +
tools/perf/util/event.c | 9 +
tools/perf/util/event.h | 17 ++
tools/perf/util/machine.c | 10 +
tools/perf/util/machine.h | 2 +
tools/perf/util/session.c | 19 ++
tools/perf/util/tool.h | 1 +
16 files changed, 392 insertions(+), 110 deletions(-)

--
1.8.3.1


2015-05-11 02:27:55

by Liang, Kan

Subject: [PATCH V9 1/8] perf, x86: use the PEBS auto reload mechanism when possible

From: Yan, Zheng <[email protected]>

When a fixed period is specified, this patch makes perf use the PEBS
auto-reload mechanism. This makes normal profiling faster, because
it avoids one costly MSR write in the PMI handler.
However, the reset value is then loaded by the hardware assist, which
adds a small delay compared to the previous non-auto-reload mechanism.
The delay is arbitrary, but very small. The assist costs 400-800
cycles, assuming the common case with everything cached. The minimum
period the patch currently uses is 10000. In that extreme case the
delay can be ~10% of the period if cycles are used.
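
As an illustration only (not part of the patch), here is a minimal
user-space sketch of how the auto-reload value relates to the sample
period. It assumes 48-bit counters. CNTVAL_MASK, pebs_reset_value() and
events_until_overflow() are hypothetical names; they stand in for
x86_pmu.cntval_mask and for the expression the patch stores in
ds->pebs_event_reset[].

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 48-bit counters. The kernel reads the real width from
 * x86_pmu.cntval_mask; this constant is only a stand-in. */
#define CNTVAL_MASK ((1ULL << 48) - 1)

/*
 * Value the hardware reloads into the counter after each PEBS record.
 * The counter counts up, so starting at -period (truncated to the
 * counter width) yields one overflow every `period` events.
 */
static uint64_t pebs_reset_value(uint64_t period)
{
	assert(period > 0 && period <= CNTVAL_MASK);
	return (0ULL - period) & CNTVAL_MASK;
}

/* Number of events left before a counter holding `value` wraps. */
static uint64_t events_until_overflow(uint64_t value)
{
	return (CNTVAL_MASK + 1 - value) & CNTVAL_MASK;
}
```

With a period of 10000 the reload value is 2^48 - 10000, and the counter
wraps again after exactly 10000 events without any MSR write from the
PMI handler.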

Signed-off-by: Yan, Zheng <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 15 +++++++++------
arch/x86/kernel/cpu/perf_event.h | 1 +
arch/x86/kernel/cpu/perf_event_intel.c | 8 ++++++--
arch/x86/kernel/cpu/perf_event_intel_ds.c | 7 +++++++
4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 87848eb..8cc1153 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1058,13 +1058,16 @@ int x86_perf_event_set_period(struct perf_event *event)

per_cpu(pmc_prev_left[idx], smp_processor_id()) = left;

- /*
- * The hw event starts counting from this event offset,
- * mark it to be able to extra future deltas:
- */
- local64_set(&hwc->prev_count, (u64)-left);
+ if (!(hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) ||
+ local64_read(&hwc->prev_count) != (u64)-left) {
+ /*
+ * The hw event starts counting from this event offset,
+ * mark it to be able to extra future deltas:
+ */
+ local64_set(&hwc->prev_count, (u64)-left);

- wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);
+ wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);
+ }

/*
* Due to erratum on certan cpu we need
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 6ac5cb7..1cb5859 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -74,6 +74,7 @@ struct event_constraint {
#define PERF_X86_EVENT_EXCL 0x0040 /* HT exclusivity on counter */
#define PERF_X86_EVENT_DYNAMIC 0x0080 /* dynamic alloc'd constraint */
#define PERF_X86_EVENT_RDPMC_ALLOWED 0x0100 /* grant rdpmc permission */
+#define PERF_X86_EVENT_AUTO_RELOAD 0x0200 /* use PEBS auto-reload */


struct amd_nb {
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 960e85d..3119071 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2305,8 +2305,12 @@ static int intel_pmu_hw_config(struct perf_event *event)
if (ret)
return ret;

- if (event->attr.precise_ip && x86_pmu.pebs_aliases)
- x86_pmu.pebs_aliases(event);
+ if (event->attr.precise_ip) {
+ if (!event->attr.freq)
+ event->hw.flags |= PERF_X86_EVENT_AUTO_RELOAD;
+ if (x86_pmu.pebs_aliases)
+ x86_pmu.pebs_aliases(event);
+ }

if (needs_branch_stack(event)) {
ret = intel_pmu_setup_lbr_filter(event);
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 813f75d..f856f73 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -688,6 +688,7 @@ void intel_pmu_pebs_enable(struct perf_event *event)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
+ struct debug_store *ds = cpuc->ds;

hwc->config &= ~ARCH_PERFMON_EVENTSEL_INT;

@@ -697,6 +698,12 @@ void intel_pmu_pebs_enable(struct perf_event *event)
cpuc->pebs_enabled |= 1ULL << (hwc->idx + 32);
else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
cpuc->pebs_enabled |= 1ULL << 63;
+
+ /* Use auto-reload if possible to save a MSR write in the PMI */
+ if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
+ ds->pebs_event_reset[hwc->idx] =
+ (u64)(-hwc->sample_period) & x86_pmu.cntval_mask;
+ }
}

void intel_pmu_pebs_disable(struct perf_event *event)
--
1.8.3.1

2015-05-11 02:27:57

by Liang, Kan

Subject: [PATCH V9 2/8] perf, x86: introduce setup_pebs_sample_data()

From: Yan, Zheng <[email protected]>

Move the code that sets up the PEBS sample data into a separate function.

Signed-off-by: Yan, Zheng <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_ds.c | 95 +++++++++++++++++--------------
1 file changed, 52 insertions(+), 43 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index f856f73..f26c8b4 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -853,8 +853,10 @@ static inline u64 intel_hsw_transaction(struct pebs_record_hsw *pebs)
return txn;
}

-static void __intel_pmu_pebs_event(struct perf_event *event,
- struct pt_regs *iregs, void *__pebs)
+static void setup_pebs_sample_data(struct perf_event *event,
+ struct pt_regs *iregs, void *__pebs,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
{
#define PERF_X86_EVENT_PEBS_HSW_PREC \
(PERF_X86_EVENT_PEBS_ST_HSW | \
@@ -866,30 +868,25 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
*/
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct pebs_record_hsw *pebs = __pebs;
- struct perf_sample_data data;
- struct pt_regs regs;
u64 sample_type;
int fll, fst, dsrc;
int fl = event->hw.flags;

- if (!intel_pmu_save_and_restart(event))
- return;
-
sample_type = event->attr.sample_type;
dsrc = sample_type & PERF_SAMPLE_DATA_SRC;

fll = fl & PERF_X86_EVENT_PEBS_LDLAT;
fst = fl & (PERF_X86_EVENT_PEBS_ST | PERF_X86_EVENT_PEBS_HSW_PREC);

- perf_sample_data_init(&data, 0, event->hw.last_period);
+ perf_sample_data_init(data, 0, event->hw.last_period);

- data.period = event->hw.last_period;
+ data->period = event->hw.last_period;

/*
* Use latency for weight (only avail with PEBS-LL)
*/
if (fll && (sample_type & PERF_SAMPLE_WEIGHT))
- data.weight = pebs->lat;
+ data->weight = pebs->lat;

/*
* data.data_src encodes the data source
@@ -902,7 +899,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
val = precise_datala_hsw(event, pebs->dse);
else if (fst)
val = precise_store_data(pebs->dse);
- data.data_src.val = val;
+ data->data_src.val = val;
}

/*
@@ -915,58 +912,70 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
* PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly.
* A possible PERF_SAMPLE_REGS will have to transfer all regs.
*/
- regs = *iregs;
- regs.flags = pebs->flags;
- set_linear_ip(&regs, pebs->ip);
- regs.bp = pebs->bp;
- regs.sp = pebs->sp;
+ *regs = *iregs;
+ regs->flags = pebs->flags;
+ set_linear_ip(regs, pebs->ip);
+ regs->bp = pebs->bp;
+ regs->sp = pebs->sp;

if (sample_type & PERF_SAMPLE_REGS_INTR) {
- regs.ax = pebs->ax;
- regs.bx = pebs->bx;
- regs.cx = pebs->cx;
- regs.dx = pebs->dx;
- regs.si = pebs->si;
- regs.di = pebs->di;
- regs.bp = pebs->bp;
- regs.sp = pebs->sp;
-
- regs.flags = pebs->flags;
+ regs->ax = pebs->ax;
+ regs->bx = pebs->bx;
+ regs->cx = pebs->cx;
+ regs->dx = pebs->dx;
+ regs->si = pebs->si;
+ regs->di = pebs->di;
+ regs->bp = pebs->bp;
+ regs->sp = pebs->sp;
+
+ regs->flags = pebs->flags;
#ifndef CONFIG_X86_32
- regs.r8 = pebs->r8;
- regs.r9 = pebs->r9;
- regs.r10 = pebs->r10;
- regs.r11 = pebs->r11;
- regs.r12 = pebs->r12;
- regs.r13 = pebs->r13;
- regs.r14 = pebs->r14;
- regs.r15 = pebs->r15;
+ regs->r8 = pebs->r8;
+ regs->r9 = pebs->r9;
+ regs->r10 = pebs->r10;
+ regs->r11 = pebs->r11;
+ regs->r12 = pebs->r12;
+ regs->r13 = pebs->r13;
+ regs->r14 = pebs->r14;
+ regs->r15 = pebs->r15;
#endif
}

if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format >= 2) {
- regs.ip = pebs->real_ip;
- regs.flags |= PERF_EFLAGS_EXACT;
- } else if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip(&regs))
- regs.flags |= PERF_EFLAGS_EXACT;
+ regs->ip = pebs->real_ip;
+ regs->flags |= PERF_EFLAGS_EXACT;
+ } else if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip(regs))
+ regs->flags |= PERF_EFLAGS_EXACT;
else
- regs.flags &= ~PERF_EFLAGS_EXACT;
+ regs->flags &= ~PERF_EFLAGS_EXACT;

if ((sample_type & PERF_SAMPLE_ADDR) &&
x86_pmu.intel_cap.pebs_format >= 1)
- data.addr = pebs->dla;
+ data->addr = pebs->dla;

if (x86_pmu.intel_cap.pebs_format >= 2) {
/* Only set the TSX weight when no memory weight. */
if ((sample_type & PERF_SAMPLE_WEIGHT) && !fll)
- data.weight = intel_hsw_weight(pebs);
+ data->weight = intel_hsw_weight(pebs);

if (sample_type & PERF_SAMPLE_TRANSACTION)
- data.txn = intel_hsw_transaction(pebs);
+ data->txn = intel_hsw_transaction(pebs);
}

if (has_branch_stack(event))
- data.br_stack = &cpuc->lbr_stack;
+ data->br_stack = &cpuc->lbr_stack;
+}
+
+static void __intel_pmu_pebs_event(struct perf_event *event,
+ struct pt_regs *iregs, void *__pebs)
+{
+ struct perf_sample_data data;
+ struct pt_regs regs;
+
+ if (!intel_pmu_save_and_restart(event))
+ return;
+
+ setup_pebs_sample_data(event, iregs, __pebs, &data, &regs);

if (perf_event_overflow(event, &data, &regs))
x86_pmu_stop(event, 0);
--
1.8.3.1

2015-05-11 02:25:49

by Liang, Kan

Subject: [PATCH V9 3/8] perf, x86: handle multiple records in PEBS buffer

From: Yan, Zheng <[email protected]>

When the PEBS interrupt threshold is larger than one record and the
machine supports multiple PEBS events, the records of these events are
mixed up and we need to demultiplex them.

Demuxing the records is hard because the hardware is deficient. The
hardware has two issues that, when combined, create impossible scenarios
to demux. (The deficiency has been fixed in Skylake. The two issues here
are for pre-SKL platforms.)

The first issue is that the 'status' field of the PEBS record is a copy
of the GLOBAL_STATUS MSR at PEBS assist time. To see why this is a
problem let us first describe the regular PEBS cycle:

A) the CTRn value reaches 0:
- the corresponding bit in GLOBAL_STATUS gets set
- we start arming the hardware assist
< some unspecified amount of time later -- this could cover multiple
events of interest >

B) the hardware assist is armed, any next event will trigger it

C) a matching event happens:
- the hardware assist triggers and generates a PEBS record
this includes a copy of GLOBAL_STATUS at this moment
- if we auto-reload we (re)set CTRn
- we clear the relevant bit in GLOBAL_STATUS

Now consider the following chain of events:

A0, B0, A1, C0

The event generated for counter 0 will include a status with counter 1
set, even though it's not at all related to the record. A similar thing
can happen with a !PEBS event if it just happens to overflow at the
right moment.

The second issue is that the hardware will only emit one record for two
or more counters if the event that triggers the assist is 'close'. The
'close' can be several cycles. In some cases even the complete assist,
if the event is something that doesn't need retirement.

For instance, consider this chain of events:
A0, B0, A1, B1, C01

Where C01 is an event that triggers both hardware assists, we will
generate but a single record, but again with both counters listed in the
status field.

This time the record pertains to both events.

Note that these two cases are different but indistinguishable with the
data as generated. Therefore demuxing records with multiple PEBS bits
(we can safely ignore status bits for !PEBS counters) is impossible.

Furthermore we cannot emit the record to both events because that might
cause a data leak -- the events might not have the same privileges -- so
what this patch does is discard such events.

The assumption/hope is that such discards will be rare.

Some possible ways to get a high discard rate:
- You count the same thing multiple times. But that is not a useful
configuration.
- You measure with a userspace-only PEBS event along with either a
kernel or unrestricted PEBS event. Imagine the event triggering and
setting the overflow flag right before entering the kernel. Then all
kernel-side events will end up with multiple bits set.
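
The discard rule described above can be sketched in user space as
follows. This is a simplified model, not the control flow of the patch:
pebs_record_owner(), count_pebs_records() and the two verdict constants
are hypothetical names, and `status` stands for the GLOBAL_STATUS copy
stored in each PEBS record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PEBS_EVENTS 8

/* Hypothetical verdicts for records that cannot be attributed. */
enum { PEBS_NO_OWNER = -1, PEBS_COLLISION = -2 };

/*
 * Return the counter index a record belongs to, or a negative verdict.
 * Status bits of !PEBS counters are masked out first; if more than one
 * PEBS bit remains, the record is a collision and must be dropped.
 */
static int pebs_record_owner(uint64_t status, uint64_t pebs_enabled)
{
	uint64_t pebs_bits = status & pebs_enabled &
			     ((1ULL << MAX_PEBS_EVENTS) - 1);
	int bit = 0;

	if (pebs_bits == 0)
		return PEBS_NO_OWNER;
	if (pebs_bits & (pebs_bits - 1))	/* more than one bit set */
		return PEBS_COLLISION;
	while (!(pebs_bits & (1ULL << bit)))
		bit++;
	return bit;
}

/* Count records per counter; return how many collisions were dropped. */
static unsigned int count_pebs_records(const uint64_t *status, size_t n,
				       uint64_t pebs_enabled,
				       unsigned int counts[MAX_PEBS_EVENTS])
{
	unsigned int discarded = 0;
	size_t i;

	assert(status != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int owner = pebs_record_owner(status[i], pebs_enabled);

		if (owner >= 0)
			counts[owner]++;
		else if (owner == PEBS_COLLISION)
			discarded++;
	}
	return discarded;
}
```

For example, a status of 0x3 with only counter 0 PEBS-enabled is
attributed to counter 0, while the same status with counters 0 and 1
both PEBS-enabled is a collision.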

Signed-off-by: Yan, Zheng <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_ds.c | 144 +++++++++++++++++++++---------
include/linux/perf_event.h | 13 +++
kernel/events/core.c | 6 +-
kernel/events/internal.h | 9 --
4 files changed, 118 insertions(+), 54 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index f26c8b4..971c77e 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -872,6 +872,9 @@ static void setup_pebs_sample_data(struct perf_event *event,
int fll, fst, dsrc;
int fl = event->hw.flags;

+ if (pebs == NULL)
+ return;
+
sample_type = event->attr.sample_type;
dsrc = sample_type & PERF_SAMPLE_DATA_SRC;

@@ -966,19 +969,68 @@ static void setup_pebs_sample_data(struct perf_event *event,
data->br_stack = &cpuc->lbr_stack;
}

+static inline void *
+get_next_pebs_record_by_bit(void *base, void *top, int bit)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+ void *at;
+ u64 pebs_status;
+
+ if (base == NULL)
+ return NULL;
+
+ for (at = base; at < top; at += x86_pmu.pebs_record_size) {
+ struct pebs_record_nhm *p = at;
+
+ if (test_bit(bit, (unsigned long *)&p->status)) {
+
+ if (p->status == (1 << bit))
+ return at;
+
+ /* clear non-PEBS bit and re-check */
+ pebs_status = p->status & cpuc->pebs_enabled;
+ pebs_status &= (1ULL << MAX_PEBS_EVENTS) - 1;
+ if (pebs_status == (1 << bit))
+ return at;
+ }
+ }
+ return NULL;
+}
+
static void __intel_pmu_pebs_event(struct perf_event *event,
- struct pt_regs *iregs, void *__pebs)
+ struct pt_regs *iregs,
+ void *base, void *top,
+ int bit, int count)
{
struct perf_sample_data data;
struct pt_regs regs;
+ int i;
+ void *at = get_next_pebs_record_by_bit(base, top, bit);

- if (!intel_pmu_save_and_restart(event))
+ if (!intel_pmu_save_and_restart(event) &&
+ !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
return;

- setup_pebs_sample_data(event, iregs, __pebs, &data, &regs);
+ if (count > 1) {
+ for (i = 0; i < count - 1; i++) {
+ setup_pebs_sample_data(event, iregs, at, &data, &regs);
+ perf_event_output(event, &data, &regs);
+ at += x86_pmu.pebs_record_size;
+ at = get_next_pebs_record_by_bit(at, top, bit);
+ }
+ }
+
+ setup_pebs_sample_data(event, iregs, at, &data, &regs);

- if (perf_event_overflow(event, &data, &regs))
+ /*
+ * All but the last records are processed.
+ * The last one is left to be able to call the overflow handler.
+ */
+ if (perf_event_overflow(event, &data, &regs)) {
x86_pmu_stop(event, 0);
+ return;
+ }
+
}

static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
@@ -1008,72 +1060,80 @@ static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
if (!event->attr.precise_ip)
return;

- n = top - at;
+ n = (top - at) / x86_pmu.pebs_record_size;
if (n <= 0)
return;

- /*
- * Should not happen, we program the threshold at 1 and do not
- * set a reset value.
- */
- WARN_ONCE(n > 1, "bad leftover pebs %d\n", n);
- at += n - 1;
-
- __intel_pmu_pebs_event(event, iregs, at);
+ __intel_pmu_pebs_event(event, iregs, at,
+ top, 0, n);
}

static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct debug_store *ds = cpuc->ds;
- struct perf_event *event = NULL;
- void *at, *top;
- u64 status = 0;
+ struct perf_event *event;
+ void *base, *at, *top;
int bit;
+ short counts[MAX_PEBS_EVENTS] = {};

if (!x86_pmu.pebs_active)
return;

- at = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
+ base = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index;

ds->pebs_index = ds->pebs_buffer_base;

- if (unlikely(at > top))
+ if (unlikely(base >= top))
return;

- /*
- * Should not happen, we program the threshold at 1 and do not
- * set a reset value.
- */
- WARN_ONCE(top - at > x86_pmu.max_pebs_events * x86_pmu.pebs_record_size,
- "Unexpected number of pebs records %ld\n",
- (long)(top - at) / x86_pmu.pebs_record_size);
-
- for (; at < top; at += x86_pmu.pebs_record_size) {
+ for (at = base; at < top; at += x86_pmu.pebs_record_size) {
struct pebs_record_nhm *p = at;

- for_each_set_bit(bit, (unsigned long *)&p->status,
- x86_pmu.max_pebs_events) {
- event = cpuc->events[bit];
- if (!test_bit(bit, cpuc->active_mask))
- continue;
-
- WARN_ON_ONCE(!event);
-
- if (!event->attr.precise_ip)
- continue;
+ bit = find_first_bit((unsigned long *)&p->status,
+ x86_pmu.max_pebs_events);
+ if (bit >= x86_pmu.max_pebs_events)
+ continue;
+ if (!test_bit(bit, cpuc->active_mask))
+ continue;
+ /*
+ * The PEBS hardware does not deal well with the situation
+ * when events happen near to each other and multiple bits
+ * are set. But it should happen rarely.
+ *
+ * If these events include one PEBS and multiple non-PEBS
+ * events, it doesn't impact PEBS record. The record will
+ * be handled normally. (slow path)
+ *
+ * If these events include two or more PEBS events, the
+ * records for the events can be collapsed into a single
+ * one, and it's not possible to reconstruct all events
+ * that caused the PEBS record. It's called collision.
+ * If collision happened, the record will be dropped.
+ *
+ */
+ if (p->status != (1 << bit)) {
+ u64 pebs_status;

- if (__test_and_set_bit(bit, (unsigned long *)&status))
+ /* slow path */
+ pebs_status = p->status & cpuc->pebs_enabled;
+ pebs_status &= (1ULL << MAX_PEBS_EVENTS) - 1;
+ if (pebs_status != (1 << bit))
continue;
-
- break;
}
+ counts[bit]++;
+ }

- if (!event || bit >= x86_pmu.max_pebs_events)
+ for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
+ if (counts[bit] == 0)
continue;
+ event = cpuc->events[bit];
+ WARN_ON_ONCE(!event);
+ WARN_ON_ONCE(!event->attr.precise_ip);

- __intel_pmu_pebs_event(event, iregs, at);
+ __intel_pmu_pebs_event(event, iregs, base,
+ top, bit, counts[bit]);
}
}

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 61992cf..bed1b6f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -734,6 +734,19 @@ extern int perf_event_overflow(struct perf_event *event,
struct perf_sample_data *data,
struct pt_regs *regs);

+extern void perf_event_output(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs);
+
+extern void
+perf_event_header__init_id(struct perf_event_header *header,
+ struct perf_sample_data *data,
+ struct perf_event *event);
+extern void
+perf_event__output_id_sample(struct perf_event *event,
+ struct perf_output_handle *handle,
+ struct perf_sample_data *sample);
+
static inline bool is_sampling_event(struct perf_event *event)
{
return event->attr.sample_period != 0;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f528829..4d221a4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5333,9 +5333,9 @@ void perf_prepare_sample(struct perf_event_header *header,
}
}

-static void perf_event_output(struct perf_event *event,
- struct perf_sample_data *data,
- struct pt_regs *regs)
+void perf_event_output(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
{
struct perf_output_handle handle;
struct perf_event_header header;
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 9f6ce9b..2deb24c 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -72,15 +72,6 @@ static inline bool rb_has_aux(struct ring_buffer *rb)
void perf_event_aux_event(struct perf_event *event, unsigned long head,
unsigned long size, u64 flags);

-extern void
-perf_event_header__init_id(struct perf_event_header *header,
- struct perf_sample_data *data,
- struct perf_event *event);
-extern void
-perf_event__output_id_sample(struct perf_event *event,
- struct perf_output_handle *handle,
- struct perf_sample_data *sample);
-
extern struct page *
perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff);

--
1.8.3.1

2015-05-11 02:25:53

by Liang, Kan

Subject: [PATCH V9 4/8] perf, x86: large PEBS interrupt threshold

From: Yan, Zheng <[email protected]>

PEBS always had the capability to log samples to its buffers without
an interrupt. Traditionally perf has not used this but always set the
PEBS threshold to one.

For frequently occurring events (like cycles or branches or load/store)
this in turn requires using a relatively high sampling period to avoid
overloading the system with nothing but PMI processing. This in turn
increases sampling error.

For the common cases we still need to use the PMI because the PEBS
hardware has various limitations. The biggest one is that it cannot
supply a callgraph. It also requires setting a fixed period, as the
hardware does not support adaptive periods. Another issue is that it
cannot supply a time stamp and some other options. To supply a TID it
requires flushing on context switch. It can however supply the IP, the
load/store address, TSX information, registers, and some other things.

So we can make PEBS work for some specific cases, basically as long as
you can do without a callgraph and can set the period you can use this
new PEBS mode.

The main benefit is the ability to support much lower sampling period
(down to -c 1000) without extensive overhead.

One use case is, for example, to increase the resolution of the c2c tool.
Another is double checking when you suspect the standard sampling has
too much sampling error.

Some numbers on the overhead, using cycle soak, comparing the elapsed
time from "kernbench -M -H" between plain (threshold set to one) and
multi (large threshold).
The test command for plain:
"perf record --time -e cycles:p -c $period -- kernbench -M -H"
The test command for multi:
"perf record --no-time -e cycles:p -c $period -- kernbench -M -H"
(The only difference between the two test commands is the time stamp
option. Since time stamps are not supported with a large PEBS
threshold, the option serves as a switch that decides whether the large
threshold is enabled during the test.)

period    plain(Sec)  multi(Sec)  Delta
10003     32.7        16.5        16.2
20003     30.2        16.2        14.0
40003     18.6        14.1         4.5
80003     16.8        14.6         2.2
100003    16.9        14.1         2.8
800003    15.4        15.7        -0.3
1000003   15.3        15.2         0.2
2000003   15.3        15.1         0.1

With periods below 100003, plain (threshold one) causes much more
overhead. With a sampling period of 10003, the elapsed time for multi
is even 2X faster than for plain.
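
A rough user-space sketch of the two decisions this patch makes: whether
an event may run with a large threshold, and where the threshold is
placed in the buffer. The S_* bits are hypothetical and are NOT the real
PERF_SAMPLE_* values, FREERUNNING_OK is a reduced stand-in for
PEBS_FREERUNNING_FLAGS, and can_freerun() and pebs_threshold() are
illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sample-type bits; not the real PERF_SAMPLE_* values. */
enum {
	S_IP        = 1u << 0,
	S_ADDR      = 1u << 1,
	S_TIME      = 1u << 2,
	S_CALLCHAIN = 1u << 3,
};

/* Reduced stand-in for the set PEBS can fill in without a PMI. */
#define FREERUNNING_OK (S_IP | S_ADDR)

/* Large threshold needs a fixed period and only PEBS-fillable fields. */
static bool can_freerun(uint32_t sample_type, bool freq_mode)
{
	return !freq_mode && !(sample_type & ~FREERUNNING_OK);
}

/*
 * Interrupt threshold as an address inside the PEBS buffer.
 * Free-running: leave room for max_events records below the absolute
 * maximum. Otherwise: interrupt after a single record, as before.
 */
static uint64_t pebs_threshold(uint64_t base, uint64_t abs_max,
			       uint64_t record_size, unsigned int max_events,
			       bool freerun)
{
	assert(abs_max >= base + (uint64_t)max_events * record_size);
	if (freerun)
		return abs_max - (uint64_t)max_events * record_size;
	return base + record_size;
}
```

In this model, requesting a time stamp or a callchain, or using
frequency mode, makes the event fall back to a threshold of one record.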

Signed-off-by: Yan, Zheng <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/kernel/cpu/perf_event.h | 11 +++++++++++
arch/x86/kernel/cpu/perf_event_intel.c | 5 ++++-
arch/x86/kernel/cpu/perf_event_intel_ds.c | 27 +++++++++++++++++++++++----
3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 1cb5859..626ded3 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -75,6 +75,7 @@ struct event_constraint {
#define PERF_X86_EVENT_DYNAMIC 0x0080 /* dynamic alloc'd constraint */
#define PERF_X86_EVENT_RDPMC_ALLOWED 0x0100 /* grant rdpmc permission */
#define PERF_X86_EVENT_AUTO_RELOAD 0x0200 /* use PEBS auto-reload */
+#define PERF_X86_EVENT_FREERUNNING 0x0400 /* use freerunning PEBS */


struct amd_nb {
@@ -88,6 +89,16 @@ struct amd_nb {
#define MAX_PEBS_EVENTS 8

/*
+ * Flags PEBS can handle without an PMI.
+ *
+ */
+#define PEBS_FREERUNNING_FLAGS \
+ (PERF_SAMPLE_IP | PERF_SAMPLE_ADDR | \
+ PERF_SAMPLE_ID | PERF_SAMPLE_CPU | PERF_SAMPLE_STREAM_ID | \
+ PERF_SAMPLE_DATA_SRC | PERF_SAMPLE_IDENTIFIER | \
+ PERF_SAMPLE_TRANSACTION)
+
+/*
* A debug store configuration.
*
* We only support architectures that use 64bit fields.
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 3119071..fdf818a 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2306,8 +2306,11 @@ static int intel_pmu_hw_config(struct perf_event *event)
return ret;

if (event->attr.precise_ip) {
- if (!event->attr.freq)
+ if (!event->attr.freq) {
event->hw.flags |= PERF_X86_EVENT_AUTO_RELOAD;
+ if (!(event->attr.sample_type & ~PEBS_FREERUNNING_FLAGS))
+ event->hw.flags |= PERF_X86_EVENT_FREERUNNING;
+ }
if (x86_pmu.pebs_aliases)
x86_pmu.pebs_aliases(event);
}
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 971c77e..0b727d1 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -250,7 +250,7 @@ static int alloc_pebs_buffer(int cpu)
{
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
int node = cpu_to_node(cpu);
- int max, thresh = 1; /* always use a single PEBS record */
+ int max;
void *buffer, *ibuffer;

if (!x86_pmu.pebs)
@@ -280,9 +280,6 @@ static int alloc_pebs_buffer(int cpu)
ds->pebs_absolute_maximum = ds->pebs_buffer_base +
max * x86_pmu.pebs_record_size;

- ds->pebs_interrupt_threshold = ds->pebs_buffer_base +
- thresh * x86_pmu.pebs_record_size;
-
return 0;
}

@@ -684,14 +681,22 @@ struct event_constraint *intel_pebs_constraints(struct perf_event *event)
return &emptyconstraint;
}

+static inline bool pebs_is_enabled(struct cpu_hw_events *cpuc)
+{
+ return (cpuc->pebs_enabled & ((1ULL << MAX_PEBS_EVENTS) - 1));
+}
+
void intel_pmu_pebs_enable(struct perf_event *event)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
struct debug_store *ds = cpuc->ds;
+ bool first_pebs;
+ u64 threshold;

hwc->config &= ~ARCH_PERFMON_EVENTSEL_INT;

+ first_pebs = !pebs_is_enabled(cpuc);
cpuc->pebs_enabled |= 1ULL << hwc->idx;

if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
@@ -699,11 +704,25 @@ void intel_pmu_pebs_enable(struct perf_event *event)
else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
cpuc->pebs_enabled |= 1ULL << 63;

+ /*
+ * When the event is constrained enough we can use a larger
+ * threshold and run the event with less frequent PMI.
+ */
+ if (hwc->flags & PERF_X86_EVENT_FREERUNNING) {
+ threshold = ds->pebs_absolute_maximum -
+ x86_pmu.max_pebs_events * x86_pmu.pebs_record_size;
+ } else {
+ threshold = ds->pebs_buffer_base + x86_pmu.pebs_record_size;
+ }
+
/* Use auto-reload if possible to save a MSR write in the PMI */
if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
ds->pebs_event_reset[hwc->idx] =
(u64)(-hwc->sample_period) & x86_pmu.cntval_mask;
}
+
+ if (first_pebs || ds->pebs_interrupt_threshold > threshold)
+ ds->pebs_interrupt_threshold = threshold;
}

void intel_pmu_pebs_disable(struct perf_event *event)
--
1.8.3.1

2015-05-11 02:27:34

by Liang, Kan

Subject: [PATCH V9 5/8] perf, x86: drain PEBS buffer during context switch

From: Yan, Zheng <[email protected]>

Flush the PEBS buffer during a context switch if the PEBS interrupt
threshold is larger than one. This allows perf to supply the TID in
the sample output.
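
To see why the flush matters, consider this toy user-space model. It
assumes the TID is attached when a record is drained rather than when
the hardware writes it. All names (struct toy_pebs, toy_record(),
toy_drain(), toy_context_switch()) are hypothetical and exist only for
this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_CAP 16

struct toy_pebs {
	int current_tid;	/* task currently running on this CPU */
	size_t pending;		/* records buffered since the last drain */
	int out_tid[BUF_CAP];	/* TID attached to each drained sample */
	size_t out_len;
};

/* The hardware logs a record; no TID is stored with it. */
static void toy_record(struct toy_pebs *p)
{
	assert(p->pending < BUF_CAP);
	p->pending++;
}

/* Drain the buffer: every record gets the TID that is current now. */
static void toy_drain(struct toy_pebs *p)
{
	while (p->pending > 0) {
		assert(p->out_len < BUF_CAP);
		p->out_tid[p->out_len++] = p->current_tid;
		p->pending--;
	}
}

/* Switch tasks, optionally draining while the old task is still current. */
static void toy_context_switch(struct toy_pebs *p, int next_tid,
			       int drain_on_switch)
{
	if (drain_on_switch)
		toy_drain(p);
	p->current_tid = next_tid;
}
```

Without the drain on switch, records logged by the outgoing task would
be attributed to whichever task is running when the buffer is finally
drained.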

Signed-off-by: Yan, Zheng <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/kernel/cpu/perf_event.h | 6 +++++-
arch/x86/kernel/cpu/perf_event_intel.c | 11 +++++++++-
arch/x86/kernel/cpu/perf_event_intel_ds.c | 32 ++++++++++++++++++++++++++++++
arch/x86/kernel/cpu/perf_event_intel_lbr.c | 3 ---
4 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 626ded3..8746c61 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -91,9 +91,11 @@ struct amd_nb {
/*
* Flags PEBS can handle without an PMI.
*
+ * TID can only be handled by flushing at context switch.
+ *
*/
#define PEBS_FREERUNNING_FLAGS \
- (PERF_SAMPLE_IP | PERF_SAMPLE_ADDR | \
+ (PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_ADDR | \
PERF_SAMPLE_ID | PERF_SAMPLE_CPU | PERF_SAMPLE_STREAM_ID | \
PERF_SAMPLE_DATA_SRC | PERF_SAMPLE_IDENTIFIER | \
PERF_SAMPLE_TRANSACTION)
@@ -872,6 +874,8 @@ void intel_pmu_pebs_enable_all(void);

void intel_pmu_pebs_disable_all(void);

+void intel_pmu_pebs_sched_task(struct perf_event_context *ctx, bool sched_in);
+
void intel_ds_init(void);

void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index fdf818a..c4c5e1f 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2702,6 +2702,15 @@ static void intel_pmu_cpu_dying(int cpu)
fini_debug_store_on_cpu(cpu);
}

+static void intel_pmu_sched_task(struct perf_event_context *ctx,
+ bool sched_in)
+{
+ if (x86_pmu.pebs_active)
+ intel_pmu_pebs_sched_task(ctx, sched_in);
+ if (x86_pmu.lbr_nr)
+ intel_pmu_lbr_sched_task(ctx, sched_in);
+}
+
PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");

PMU_FORMAT_ATTR(ldlat, "config1:0-15");
@@ -2791,7 +2800,7 @@ static __initconst const struct x86_pmu intel_pmu = {
.cpu_starting = intel_pmu_cpu_starting,
.cpu_dying = intel_pmu_cpu_dying,
.guest_get_msrs = intel_guest_get_msrs,
- .sched_task = intel_pmu_lbr_sched_task,
+ .sched_task = intel_pmu_sched_task,
};

static __init void intel_clovertown_quirk(void)
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 0b727d1..ba02783 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -546,6 +546,19 @@ int intel_pmu_drain_bts_buffer(void)
return 1;
}

+static inline void intel_pmu_drain_pebs_buffer(void)
+{
+ struct pt_regs regs;
+
+ x86_pmu.drain_pebs(&regs);
+}
+
+void intel_pmu_pebs_sched_task(struct perf_event_context *ctx, bool sched_in)
+{
+ if (!sched_in)
+ intel_pmu_drain_pebs_buffer();
+}
+
/*
* PEBS
*/
@@ -711,8 +724,19 @@ void intel_pmu_pebs_enable(struct perf_event *event)
if (hwc->flags & PERF_X86_EVENT_FREERUNNING) {
threshold = ds->pebs_absolute_maximum -
x86_pmu.max_pebs_events * x86_pmu.pebs_record_size;
+
+ if (first_pebs)
+ perf_sched_cb_inc(event->ctx->pmu);
} else {
threshold = ds->pebs_buffer_base + x86_pmu.pebs_record_size;
+
+ /*
+ * If not all events can use larger buffer,
+ * roll back to threshold = 1
+ */
+ if (!first_pebs &&
+ (ds->pebs_interrupt_threshold > threshold))
+ perf_sched_cb_dec(event->ctx->pmu);
}

/* Use auto-reload if possible to save a MSR write in the PMI */
@@ -729,6 +753,7 @@ void intel_pmu_pebs_disable(struct perf_event *event)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
+ struct debug_store *ds = cpuc->ds;

cpuc->pebs_enabled &= ~(1ULL << hwc->idx);

@@ -737,6 +762,13 @@ void intel_pmu_pebs_disable(struct perf_event *event)
else if (event->hw.constraint->flags & PERF_X86_EVENT_PEBS_ST)
cpuc->pebs_enabled &= ~(1ULL << 63);

+ if (ds->pebs_interrupt_threshold >
+ ds->pebs_buffer_base + x86_pmu.pebs_record_size) {
+ intel_pmu_drain_pebs_buffer();
+ if (!pebs_is_enabled(cpuc))
+ perf_sched_cb_dec(event->ctx->pmu);
+ }
+
if (cpuc->enabled)
wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 94e5b50..c8a72cc 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -262,9 +262,6 @@ void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct x86_perf_task_context *task_ctx;

- if (!x86_pmu.lbr_nr)
- return;
-
/*
* If LBR callstack feature is enabled and the stack was saved when
* the task was scheduled out, restore the stack. Otherwise flush
--
1.8.3.1
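
Why draining at context switch is enough to get a correct TID can be shown with a toy model. This is a sketch under stated assumptions, not the kernel implementation: `struct pebs_model` and the `model_*` helpers are hypothetical, and the model assumes that a buffered record carries no TID of its own, so it is stamped with whichever task is current when the buffer is drained.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BUF_RECORDS 8

/* Hypothetical model of one CPU's PEBS buffer and its emitted samples. */
struct pebs_model {
	int current_tid;			/* task running on this CPU */
	size_t pending;				/* records waiting in the buffer */
	int out_tid[MODEL_BUF_RECORDS];		/* TID stamped on each sample */
	size_t out_len;
};

/* Hardware writes a record; nothing is known about the task yet. */
static void model_record(struct pebs_model *m)
{
	assert(m->pending < MODEL_BUF_RECORDS);
	m->pending++;
}

/* Draining stamps every pending record with the task that is current now. */
static void model_drain(struct pebs_model *m)
{
	while (m->pending > 0 && m->out_len < MODEL_BUF_RECORDS) {
		m->out_tid[m->out_len++] = m->current_tid;
		m->pending--;
	}
}

/* Context switch: optionally drain on sched-out, then change the task. */
static void model_switch(struct pebs_model *m, int next_tid,
			 int drain_on_switch)
{
	if (drain_on_switch)
		model_drain(m);
	m->current_tid = next_tid;
}
```

In this model, records taken while task 100 was running are attributed to task 200 if the switch happens without a drain, which is the misattribution the flush on sched-out avoids.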

2015-05-11 02:27:03

by Liang, Kan

Subject: [PATCH V9 6/8] perf, x86: enlarge PEBS buffer

From: Yan, Zheng <[email protected]>

Currently the PEBS buffer size is 4k, so it can hold only about 21
PEBS records. This patch enlarges the PEBS buffer to 64k (the same
size as the BTS buffer), which can hold about 330 PEBS records.
This significantly reduces the number of PMIs when a large PEBS
interrupt threshold is used.

Signed-off-by: Yan, Zheng <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_ds.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index ba02783..328b10c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -11,7 +11,7 @@
#define BTS_RECORD_SIZE 24

#define BTS_BUFFER_SIZE (PAGE_SIZE << 4)
-#define PEBS_BUFFER_SIZE PAGE_SIZE
+#define PEBS_BUFFER_SIZE (PAGE_SIZE << 4)
#define PEBS_FIXUP_SIZE PAGE_SIZE

/*
--
1.8.3.1
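
The record counts quoted in the description follow from a simple division. The sketch below is illustrative only: `pebs_records_per_buffer()` is a hypothetical helper, and the 192-byte record size used in the note after it is an assumption, since the patch does not state the record size and it differs between CPU generations.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole records that fit in a buffer of the given size. */
static size_t pebs_records_per_buffer(size_t buffer_size, size_t record_size)
{
	assert(record_size > 0);
	return buffer_size / record_size;
}
```

With the assumed 192-byte record, a 4096-byte buffer holds 21 records and a 65536-byte buffer (PAGE_SIZE << 4 with 4k pages) holds 341, in the same range as the "about 330" quoted above.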

2015-05-11 02:26:46

by Liang, Kan

Subject: [PATCH V9 7/8] perf, x86: introduce PERF_RECORD_LOST_SAMPLES

From: Kan Liang <[email protected]>

After enlarging the PEBS interrupt threshold, some PEBS samples may get
mixed up and are then discarded by the kernel. This patch makes the
kernel emit a PERF_RECORD_LOST_SAMPLES record with the number of possible
discards when it is impossible to demux the samples, so that the user is
not left in the dark about such discards.

Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_ds.c | 20 +++++++++++++++----
include/linux/perf_event.h | 3 +++
include/uapi/linux/perf_event.h | 12 +++++++++++
kernel/events/core.c | 33 +++++++++++++++++++++++++++++++
4 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 328b10c..18afea0b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -1127,6 +1127,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
void *base, *at, *top;
int bit;
short counts[MAX_PEBS_EVENTS] = {};
+ short error[MAX_PEBS_EVENTS] = {};

if (!x86_pmu.pebs_active)
return;
@@ -1170,21 +1171,32 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
/* slow path */
pebs_status = p->status & cpuc->pebs_enabled;
pebs_status &= (1ULL << MAX_PEBS_EVENTS) - 1;
- if (pebs_status != (1 << bit))
+ if (pebs_status != (1 << bit)) {
+ u8 i;
+
+ for_each_set_bit(i, (unsigned long *)&pebs_status,
+ MAX_PEBS_EVENTS)
+ error[i]++;
continue;
+ }
}
counts[bit]++;
}

for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
- if (counts[bit] == 0)
+ if ((counts[bit] == 0) && (error[bit] == 0))
continue;
event = cpuc->events[bit];
WARN_ON_ONCE(!event);
WARN_ON_ONCE(!event->attr.precise_ip);

- __intel_pmu_pebs_event(event, iregs, base,
- top, bit, counts[bit]);
+ /* log dropped samples number */
+ if (error[bit])
+ perf_log_lost_samples(event, error[bit]);
+
+ if (counts[bit])
+ __intel_pmu_pebs_event(event, iregs, base,
+ top, bit, counts[bit]);
}
}

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index bed1b6f..d47d792 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -747,6 +747,9 @@ perf_event__output_id_sample(struct perf_event *event,
struct perf_output_handle *handle,
struct perf_sample_data *sample);

+extern void
+perf_log_lost_samples(struct perf_event *event, u64 lost);
+
static inline bool is_sampling_event(struct perf_event *event)
{
return event->attr.sample_period != 0;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 309211b..bab1938 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -800,6 +800,18 @@ enum perf_event_type {
*/
PERF_RECORD_ITRACE_START = 12,

+ /*
+ * Records the dropped/lost sample number.
+ *
+ * struct {
+ * struct perf_event_header header;
+ *
+ * u64 lost;
+ * struct sample_id sample_id;
+ * };
+ */
+ PERF_RECORD_LOST_SAMPLES = 13,
+
PERF_RECORD_MAX, /* non-ABI */
};

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4d221a4..42f82c5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5927,6 +5927,39 @@ void perf_event_aux_event(struct perf_event *event, unsigned long head,
}

/*
+ * Lost/dropped samples logging
+ */
+void perf_log_lost_samples(struct perf_event *event, u64 lost)
+{
+ struct perf_output_handle handle;
+ struct perf_sample_data sample;
+ int ret;
+
+ struct {
+ struct perf_event_header header;
+ u64 lost;
+ } lost_samples_event = {
+ .header = {
+ .type = PERF_RECORD_LOST_SAMPLES,
+ .misc = 0,
+ .size = sizeof(lost_samples_event),
+ },
+ .lost = lost,
+ };
+
+ perf_event_header__init_id(&lost_samples_event.header, &sample, event);
+
+ ret = perf_output_begin(&handle, event,
+ lost_samples_event.header.size);
+ if (ret)
+ return;
+
+ perf_output_put(&handle, lost_samples_event);
+ perf_event__output_id_sample(event, &handle, &sample);
+ perf_output_end(&handle);
+}
+
+/*
* IRQ throttle logging
*/

--
1.8.3.1
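
The counting done in the drain loop above can be sketched outside the kernel. This is a simplified model, not the kernel code: `classify_records()` and `MODEL_MAX_PEBS_EVENTS` are hypothetical, the fast path and the active-mask check are left out, and each record is reduced to its status bitmask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_PEBS_EVENTS 8

/*
 * A record whose masked status has exactly one bit set belongs to that
 * counter. A record with several bits set cannot be demuxed, so one
 * possible loss is charged to every counter named in the status.
 */
static void classify_records(const uint64_t *status, size_t n,
			     uint64_t enabled,
			     unsigned counts[MODEL_MAX_PEBS_EVENTS],
			     unsigned error[MODEL_MAX_PEBS_EVENTS])
{
	const uint64_t mask = (1ULL << MODEL_MAX_PEBS_EVENTS) - 1;

	assert(status != NULL || n == 0);

	for (size_t r = 0; r < n; r++) {
		uint64_t s = status[r] & enabled & mask;
		int single = s != 0 && (s & (s - 1)) == 0;

		for (unsigned bit = 0; bit < MODEL_MAX_PEBS_EVENTS; bit++) {
			if (!(s & (1ULL << bit)))
				continue;
			if (single)
				counts[bit]++;	/* cleanly attributed */
			else
				error[bit]++;	/* collision, possible loss */
		}
	}
}
```

Since a collided record is charged to every counter involved, the sum of the reported losses can exceed the number of records actually dropped, which matches the "number of possible discards" wording in the description.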

2015-05-11 02:26:12

by Liang, Kan

Subject: [PATCH V9 8/8] perf tools: handle PERF_RECORD_LOST_SAMPLES

From: Kan Liang <[email protected]>

This patch modifies the perf tool to handle the new record type,
PERF_RECORD_LOST_SAMPLES.
The number of lost-samples events is stored in
.nr_events[PERF_RECORD_LOST_SAMPLES], while the exact number of samples
that the kernel dropped is stored in total_lost_samples.
When more than 5% of the samples are dropped, a warning is printed.

Here are some examples:

Eg 1, Recording different frequently-occurring events is safe with the
patch. Only a very low drop rate is associated with such actions.

$ perf record -e '{cycles:p,instructions:p}' -c 20003 --no-time ~/tchain
~/tchain
[perf record: Woken up 148 times to write data]
[perf record: Captured and wrote 36.922 MB perf.data (1206322 samples)]

$ perf report -D | tail
SAMPLE events: 120243
MMAP2 events: 5
LOST_SAMPLES events: 24
FINISHED_ROUND events: 15
cycles:p stats:
TOTAL events: 59348
SAMPLE events: 59348
instructions:p stats:
TOTAL events: 60895
SAMPLE events: 60895

$ perf report --stdio --group
# To display the perf.data header info, please use
--header/--header-only options.
#
#
# Total Lost Samples: 24
#
# Samples: 120K of event 'anon group { cycles:p, instructions:p }'
# Event count (approx.): 24048600000
#
# Overhead Command Shared Object Symbol
# ................ ........... ................
..................................
#
99.74% 99.86% tchain_edit tchain_edit [.] f3
0.09% 0.02% tchain_edit tchain_edit [.] f2
0.04% 0.00% tchain_edit [kernel.vmlinux] [k] ixgbe_read_reg

Eg 2, Recording the same thing multiple times can lead to high drop
rate, but it is not a useful configuration.

$ perf record -e '{cycles:p,cycles:p}' -c 20003 --no-time ~/tchain
[perf record: Woken up 1 times to write data]
Warning:
Processed 600592 samples and lost 99.73% samples!
[perf record: Captured and wrote 0.121 MB perf.data (1629 samples)]

Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/builtin-report.c | 1 +
tools/perf/util/event.c | 9 +++++++++
tools/perf/util/event.h | 17 +++++++++++++++++
tools/perf/util/machine.c | 10 ++++++++++
tools/perf/util/machine.h | 2 ++
tools/perf/util/session.c | 19 +++++++++++++++++++
tools/perf/util/tool.h | 1 +
7 files changed, 59 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 7c73ae5..485b7e9 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -318,6 +318,7 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
{
struct perf_evsel *pos;

+ fprintf(stdout, "#\n# Total Lost Samples: %" PRIu64 "\n#\n", evlist->stats.total_lost_samples);
evlist__for_each(evlist, pos) {
struct hists *hists = evsel__hists(pos);
const char *evname = perf_evsel__name(pos);
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index db52609..2daadc8 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -25,6 +25,7 @@ static const char *perf_event__names[] = {
[PERF_RECORD_SAMPLE] = "SAMPLE",
[PERF_RECORD_AUX] = "AUX",
[PERF_RECORD_ITRACE_START] = "ITRACE_START",
+ [PERF_RECORD_LOST_SAMPLES] = "LOST_SAMPLES",
[PERF_RECORD_HEADER_ATTR] = "ATTR",
[PERF_RECORD_HEADER_EVENT_TYPE] = "EVENT_TYPE",
[PERF_RECORD_HEADER_TRACING_DATA] = "TRACING_DATA",
@@ -713,6 +714,14 @@ int perf_event__process_itrace_start(struct perf_tool *tool __maybe_unused,
return machine__process_itrace_start_event(machine, event);
}

+int perf_event__process_lost_samples(struct perf_tool *tool __maybe_unused,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ return machine__process_lost_samples_event(machine, event, sample);
+}
+
size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
{
return fprintf(fp, " %d/%d: [%#" PRIx64 "(%#" PRIx64 ") @ %#" PRIx64 "]: %c %s\n",
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 7eecd5e..e02996a 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -52,6 +52,11 @@ struct lost_event {
u64 lost;
};

+struct lost_samples_event {
+ struct perf_event_header header;
+ u64 lost;
+};
+
/*
* PERF_FORMAT_ENABLED | PERF_FORMAT_RUNNING | PERF_FORMAT_ID
*/
@@ -235,6 +240,12 @@ enum auxtrace_error_type {
* total_lost tells exactly how many events the kernel in fact lost, i.e. it is
* the sum of all struct lost_event.lost fields reported.
*
+ * The kernel discards mixed up samples and sends the number in a
+ * PERF_RECORD_LOST_SAMPLES event. The number of lost-samples events is stored
+ * in .nr_events[PERF_RECORD_LOST_SAMPLES] while total_lost_samples tells
+ * exactly how many samples the kernel in fact dropped, i.e. it is the sum of
+ * all struct lost_samples_event.lost fields reported.
+ *
* The total_period is needed because by default auto-freq is used, so
* multipling nr_events[PERF_EVENT_SAMPLE] by a frequency isn't possible to get
* the total number of low level events, it is necessary to to sum all struct
@@ -244,6 +255,7 @@ struct events_stats {
u64 total_period;
u64 total_non_filtered_period;
u64 total_lost;
+ u64 total_lost_samples;
u64 total_invalid_chains;
u32 nr_events[PERF_RECORD_HEADER_MAX];
u32 nr_non_filtered_samples;
@@ -342,6 +354,7 @@ union perf_event {
struct comm_event comm;
struct fork_event fork;
struct lost_event lost;
+ struct lost_samples_event lost_samples;
struct read_event read;
struct throttle_event throttle;
struct sample_event sample;
@@ -390,6 +403,10 @@ int perf_event__process_lost(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
struct machine *machine);
+int perf_event__process_lost_samples(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine);
int perf_event__process_aux(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 2f47110..991a342 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -458,6 +458,14 @@ int machine__process_lost_event(struct machine *machine __maybe_unused,
return 0;
}

+int machine__process_lost_samples_event(struct machine *machine __maybe_unused,
+ union perf_event *event, struct perf_sample *sample)
+{
+ dump_printf(": id:%" PRIu64 ": lost samples :%" PRIu64 "\n",
+ sample->id, event->lost_samples.lost);
+ return 0;
+}
+
static struct dso*
machine__module_dso(struct machine *machine, struct kmod_path *m,
const char *filename)
@@ -1351,6 +1359,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
ret = machine__process_aux_event(machine, event); break;
case PERF_RECORD_ITRACE_START:
ret = machine__process_itrace_start_event(machine, event);
break;
+ case PERF_RECORD_LOST_SAMPLES:
+ ret = machine__process_lost_samples_event(machine, event, sample); break;
default:
ret = -1;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 1d99296..7ba5e8f 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -81,6 +81,8 @@ int machine__process_fork_event(struct machine *machine, union perf_event *event
struct perf_sample *sample);
int machine__process_lost_event(struct machine *machine, union perf_event *event,
struct perf_sample *sample);
+int machine__process_lost_samples_event(struct machine *machine, union perf_event *event,
+ struct perf_sample *sample);
int machine__process_aux_event(struct machine *machine,
union perf_event *event);
int machine__process_itrace_start_event(struct machine *machine,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index e722107..4a5a609 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -325,6 +325,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
tool->exit = process_event_stub;
if (tool->lost == NULL)
tool->lost = perf_event__process_lost;
+ if (tool->lost_samples == NULL)
+ tool->lost_samples = perf_event__process_lost_samples;
if (tool->aux == NULL)
tool->aux = perf_event__process_aux;
if (tool->itrace_start == NULL)
@@ -606,6 +608,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
[PERF_RECORD_SAMPLE] = perf_event__all64_swap,
[PERF_RECORD_AUX] = perf_event__aux_swap,
[PERF_RECORD_ITRACE_START] = perf_event__itrace_start_swap,
+ [PERF_RECORD_LOST_SAMPLES] = perf_event__all64_swap,
[PERF_RECORD_HEADER_ATTR] = perf_event__hdr_attr_swap,
[PERF_RECORD_HEADER_EVENT_TYPE] = perf_event__event_type_swap,
[PERF_RECORD_HEADER_TRACING_DATA] = perf_event__tracing_data_swap,
@@ -1049,6 +1052,10 @@ static int machines__deliver_event(struct machines *machines,
if (tool->lost == perf_event__process_lost)
evlist->stats.total_lost += event->lost.lost;
return tool->lost(tool, event, sample, machine);
+ case PERF_RECORD_LOST_SAMPLES:
+ if (tool->lost_samples == perf_event__process_lost_samples)
+ evlist->stats.total_lost_samples += event->lost_samples.lost;
+ return tool->lost_samples(tool, event, sample, machine);
case PERF_RECORD_READ:
return tool->read(tool, event, sample, evsel, machine);
case PERF_RECORD_THROTTLE:
@@ -1286,6 +1293,18 @@ static void perf_session__warn_about_errors(const struct perf_session *session)
stats->nr_events[PERF_RECORD_LOST]);
}

+ if (session->tool->lost_samples == perf_event__process_lost_samples) {
+ double drop_rate;
+
+ drop_rate = (double)stats->total_lost_samples /
+ (double) (stats->nr_events[PERF_RECORD_SAMPLE] + stats->total_lost_samples);
+ if (drop_rate > 0.05) {
+ ui__warning("Processed %" PRIu64 " samples and lost %3.2f%% samples!\n\n",
+ stats->nr_events[PERF_RECORD_SAMPLE] + stats->total_lost_samples,
+ drop_rate * 100.0);
+ }
+ }
+
if (stats->nr_unknown_events != 0) {
ui__warning("Found %u unknown events!\n\n"
"Is this an older tool processing a perf.data "
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 7f282ad..c307dd4 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -43,6 +43,7 @@ struct perf_tool {
fork,
exit,
lost,
+ lost_samples,
aux,
itrace_start,
throttle,
--
1.8.3.1
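
The 5% warning rule can be checked against the two examples in the description with a few lines of C. This is a sketch, not the tool's code: `lost_sample_rate()` and `should_warn_lost_samples()` are hypothetical names, and the explicit zero check is an addition of this sketch that the patch itself does not have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fraction of samples dropped: lost / (delivered + lost). */
static double lost_sample_rate(uint64_t nr_samples, uint64_t total_lost_samples)
{
	uint64_t total = nr_samples + total_lost_samples;

	assert(total >= nr_samples);	/* the sum must not wrap around */
	if (total == 0)
		return 0.0;		/* nothing processed, nothing lost */
	return (double)total_lost_samples / (double)total;
}

/* Warn only when more than 5% of the samples were dropped. */
static bool should_warn_lost_samples(uint64_t nr_samples,
				     uint64_t total_lost_samples)
{
	return lost_sample_rate(nr_samples, total_lost_samples) > 0.05;
}
```

For Eg 1 (120243 samples, 24 lost) the rate is about 0.02%, so no warning is printed. For Eg 2, taking the lost count as 600592 - 1629 = 598963 (derived from the output shown, not reported directly), the rate is about 99.73% and the warning fires.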

2015-05-11 16:06:34

by Peter Zijlstra

Subject: Re: [PATCH V9 0/8] large PEBS interrupt threshold

On Sun, May 10, 2015 at 03:13:07PM -0400, Kan Liang wrote:
> changes since v8:
> - Record 'lost' events to all set bits
> - dropped the @id field from the lost samples record
> - Print lost samples event nr in perf report --stdio output

Only the last two patches changed, right? Double checking because I
already had the others queued in a test branch.

2015-05-11 16:09:30

by Liang, Kan

Subject: RE: [PATCH V9 0/8] large PEBS interrupt threshold



>
> On Sun, May 10, 2015 at 03:13:07PM -0400, Kan Liang wrote:
> > changes since v8:
> > - Record 'lost' events to all set bits
> > - dropped the @id field from the lost samples record
> > - Print lost samples event nr in perf report --stdio output
>
> Only the last two patches changed, right? Double checking because I
> already had the others queued in a test branch.

Right. Only the last two patches changed.

2015-05-11 19:06:54

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH V9 7/8] perf, x86: introduce PERF_RECORD_LOST_SAMPLES

Em Sun, May 10, 2015 at 03:13:14PM -0400, Kan Liang escreveu:
> From: Kan Liang <[email protected]>
>
> After enlarging the PEBS interrupt threshold, some PEBS samples may get
> mixed up and are then discarded by the kernel. This patch makes the
> kernel emit a PERF_RECORD_LOST_SAMPLES record with the number of possible
> discards when it is impossible to demux the samples, so that the user is
> not left in the dark about such discards.

ok, but would be nice to spell out what the tooling needs to do here,
i.e. when more than one event is mapping to the same mmap ring buffer,
the user has to use perf_event_attr.sample_id_all and have
PERF_SAMPLE_ID in its perf_event_attr.sample_type, if disambiguating the
event is desired. I.e. the discarded stuff is what is in the
PERF_SAMPLE_ID payload, when present.

Probably is what you did when using this in the tooling, lemme see...
;-)

- Arnaldo
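
The attribute setup described above might look like the following on the user side. This is a sketch under assumptions: `make_pebs_attr()` is a hypothetical helper, the choice of the cycles event and `precise_ip = 1` only mirrors the cycles:p examples in this thread, and the fields used are the ones from the uapi header `<linux/perf_event.h>`.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Build an attr whose non-sample records (including
 * PERF_RECORD_LOST_SAMPLES) carry a sample_id trailer with the event ID,
 * so that events sharing one ring buffer can be told apart.
 */
static struct perf_event_attr make_pebs_attr(unsigned long long period)
{
	struct perf_event_attr attr;

	assert(period > 0);
	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_period = period;		/* fixed period, no auto-freq */
	attr.precise_ip = 1;			/* request PEBS, as in cycles:p */
	attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ID;
	attr.sample_id_all = 1;			/* ID on non-sample records too */
	return attr;
}
```

The returned attr would then be passed to perf_event_open(); that call is left out here because it needs privileges and PEBS-capable hardware.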

> Signed-off-by: Kan Liang <[email protected]>
> ---
> arch/x86/kernel/cpu/perf_event_intel_ds.c | 20 +++++++++++++++----
> include/linux/perf_event.h | 3 +++
> include/uapi/linux/perf_event.h | 12 +++++++++++
> kernel/events/core.c | 33 +++++++++++++++++++++++++++++++
> 4 files changed, 64 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> index 328b10c..18afea0b 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> @@ -1127,6 +1127,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
> void *base, *at, *top;
> int bit;
> short counts[MAX_PEBS_EVENTS] = {};
> + short error[MAX_PEBS_EVENTS] = {};
>
> if (!x86_pmu.pebs_active)
> return;
> @@ -1170,21 +1171,32 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
> /* slow path */
> pebs_status = p->status & cpuc->pebs_enabled;
> pebs_status &= (1ULL << MAX_PEBS_EVENTS) - 1;
> - if (pebs_status != (1 << bit))
> + if (pebs_status != (1 << bit)) {
> + u8 i;
> +
> + for_each_set_bit(i, (unsigned long *)&pebs_status,
> + MAX_PEBS_EVENTS)
> + error[i]++;
> continue;
> + }
> }
> counts[bit]++;
> }
>
> for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
> - if (counts[bit] == 0)
> + if ((counts[bit] == 0) && (error[bit] == 0))
> continue;
> event = cpuc->events[bit];
> WARN_ON_ONCE(!event);
> WARN_ON_ONCE(!event->attr.precise_ip);
>
> - __intel_pmu_pebs_event(event, iregs, base,
> - top, bit, counts[bit]);
> + /* log dropped samples number */
> + if (error[bit])
> + perf_log_lost_samples(event, error[bit]);
> +
> + if (counts[bit])
> + __intel_pmu_pebs_event(event, iregs, base,
> + top, bit, counts[bit]);
> }
> }
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index bed1b6f..d47d792 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -747,6 +747,9 @@ perf_event__output_id_sample(struct perf_event *event,
> struct perf_output_handle *handle,
> struct perf_sample_data *sample);
>
> +extern void
> +perf_log_lost_samples(struct perf_event *event, u64 lost);
> +
> static inline bool is_sampling_event(struct perf_event *event)
> {
> return event->attr.sample_period != 0;
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 309211b..bab1938 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -800,6 +800,18 @@ enum perf_event_type {
> */
> PERF_RECORD_ITRACE_START = 12,
>
> + /*
> + * Records the dropped/lost sample number.
> + *
> + * struct {
> + * struct perf_event_header header;
> + *
> + * u64 lost;
> + * struct sample_id sample_id;
> + * };
> + */
> + PERF_RECORD_LOST_SAMPLES = 13,
> +
> PERF_RECORD_MAX, /* non-ABI */
> };
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 4d221a4..42f82c5 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5927,6 +5927,39 @@ void perf_event_aux_event(struct perf_event *event, unsigned long head,
> }
>
> /*
> + * Lost/dropped samples logging
> + */
> +void perf_log_lost_samples(struct perf_event *event, u64 lost)
> +{
> + struct perf_output_handle handle;
> + struct perf_sample_data sample;
> + int ret;
> +
> + struct {
> + struct perf_event_header header;
> + u64 lost;
> + } lost_samples_event = {
> + .header = {
> + .type = PERF_RECORD_LOST_SAMPLES,
> + .misc = 0,
> + .size = sizeof(lost_samples_event),
> + },
> + .lost = lost,
> + };
> +
> + perf_event_header__init_id(&lost_samples_event.header, &sample, event);
> +
> + ret = perf_output_begin(&handle, event,
> + lost_samples_event.header.size);
> + if (ret)
> + return;
> +
> + perf_output_put(&handle, lost_samples_event);
> + perf_event__output_id_sample(event, &handle, &sample);
> + perf_output_end(&handle);
> +}
> +
> +/*
> * IRQ throttle logging
> */
>
> --
> 1.8.3.1

2015-05-11 19:22:45

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH V9 8/8] perf tools: handle PERF_RECORD_LOST_SAMPLES

Em Sun, May 10, 2015 at 03:13:15PM -0400, Kan Liang escreveu:
> From: Kan Liang <[email protected]>
>
> This patch modifies the perf tool to handle the new record type,
> PERF_RECORD_LOST_SAMPLES.
> The number of lost-samples events is stored in
> .nr_events[PERF_RECORD_LOST_SAMPLES], while the exact number of samples
> that the kernel dropped is stored in total_lost_samples.
> When more than 5% of the samples are dropped, a warning is printed.
>
> Here are some examples:
>
> Eg 1, Recording different frequently-occurring events is safe with the
> patch. Only a very low drop rate is associated with such actions.
>
> $ perf record -e '{cycles:p,instructions:p}' -c 20003 --no-time ~/tchain
> ~/tchain
> [perf record: Woken up 148 times to write data]
> [perf record: Captured and wrote 36.922 MB perf.data (1206322 samples)]
>
> $ perf report -D | tail
> SAMPLE events: 120243
> MMAP2 events: 5
> LOST_SAMPLES events: 24
> FINISHED_ROUND events: 15
> cycles:p stats:
> TOTAL events: 59348
> SAMPLE events: 59348
> instructions:p stats:
> TOTAL events: 60895
> SAMPLE events: 60895

The example doesn't show which of cycles:p or instructions:p got lost,
isn't that possible? Guess not from the patch, but should, no? I.e. what
is PERF_SAMPLE_ID for then?

- Arnaldo

>
> $ perf report --stdio --group
> # To display the perf.data header info, please use
> --header/--header-only options.
> #
> #
> # Total Lost Samples: 24
> #
> # Samples: 120K of event 'anon group { cycles:p, instructions:p }'
> # Event count (approx.): 24048600000
> #
> # Overhead Command Shared Object Symbol
> # ................ ........... ................
> ..................................
> #
> 99.74% 99.86% tchain_edit tchain_edit [.] f3
> 0.09% 0.02% tchain_edit tchain_edit [.] f2
> 0.04% 0.00% tchain_edit [kernel.vmlinux] [k] ixgbe_read_reg
>
> Eg 2, Recording the same thing multiple times can lead to high drop
> rate, but it is not a useful configuration.
>
> $ perf record -e '{cycles:p,cycles:p}' -c 20003 --no-time ~/tchain
> [perf record: Woken up 1 times to write data]
> Warning:
> Processed 600592 samples and lost 99.73% samples!
> [perf record: Captured and wrote 0.121 MB perf.data (1629 samples)]
>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/builtin-report.c | 1 +
> tools/perf/util/event.c | 9 +++++++++
> tools/perf/util/event.h | 17 +++++++++++++++++
> tools/perf/util/machine.c | 10 ++++++++++
> tools/perf/util/machine.h | 2 ++
> tools/perf/util/session.c | 19 +++++++++++++++++++
> tools/perf/util/tool.h | 1 +
> 7 files changed, 59 insertions(+)
>
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 7c73ae5..485b7e9 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -318,6 +318,7 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
> {
> struct perf_evsel *pos;
>
> + fprintf(stdout, "#\n# Total Lost Samples: %" PRIu64 "\n#\n", evlist->stats.total_lost_samples);
> evlist__for_each(evlist, pos) {
> struct hists *hists = evsel__hists(pos);
> const char *evname = perf_evsel__name(pos);
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index db52609..2daadc8 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -25,6 +25,7 @@ static const char *perf_event__names[] = {
> [PERF_RECORD_SAMPLE] = "SAMPLE",
> [PERF_RECORD_AUX] = "AUX",
> [PERF_RECORD_ITRACE_START] = "ITRACE_START",
> + [PERF_RECORD_LOST_SAMPLES] = "LOST_SAMPLES",
> [PERF_RECORD_HEADER_ATTR] = "ATTR",
> [PERF_RECORD_HEADER_EVENT_TYPE] = "EVENT_TYPE",
> [PERF_RECORD_HEADER_TRACING_DATA] = "TRACING_DATA",
> @@ -713,6 +714,14 @@ int perf_event__process_itrace_start(struct perf_tool *tool __maybe_unused,
> return machine__process_itrace_start_event(machine, event);
> }
>
> +int perf_event__process_lost_samples(struct perf_tool *tool __maybe_unused,
> + union perf_event *event,
> + struct perf_sample *sample,
> + struct machine *machine)
> +{
> + return machine__process_lost_samples_event(machine, event, sample);
> +}
> +
> size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
> {
> return fprintf(fp, " %d/%d: [%#" PRIx64 "(%#" PRIx64 ") @ %#" PRIx64 "]: %c %s\n",
> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> index 7eecd5e..e02996a 100644
> --- a/tools/perf/util/event.h
> +++ b/tools/perf/util/event.h
> @@ -52,6 +52,11 @@ struct lost_event {
> u64 lost;
> };
>
> +struct lost_samples_event {
> + struct perf_event_header header;
> + u64 lost;
> +};
> +
> /*
> * PERF_FORMAT_ENABLED | PERF_FORMAT_RUNNING | PERF_FORMAT_ID
> */
> @@ -235,6 +240,12 @@ enum auxtrace_error_type {
> * total_lost tells exactly how many events the kernel in fact lost, i.e. it is
> * the sum of all struct lost_event.lost fields reported.
> *
> + * The kernel discards mixed up samples and sends the number in a
> + * PERF_RECORD_LOST_SAMPLES event. The number of lost-samples events is stored
> + * in .nr_events[PERF_RECORD_LOST_SAMPLES] while total_lost_samples tells
> + * exactly how many samples the kernel in fact dropped, i.e. it is the sum of
> + * all struct lost_samples_event.lost fields reported.
> + *
> * The total_period is needed because by default auto-freq is used, so
> * multipling nr_events[PERF_EVENT_SAMPLE] by a frequency isn't possible to get
> * the total number of low level events, it is necessary to to sum all struct
> @@ -244,6 +255,7 @@ struct events_stats {
> u64 total_period;
> u64 total_non_filtered_period;
> u64 total_lost;
> + u64 total_lost_samples;
> u64 total_invalid_chains;
> u32 nr_events[PERF_RECORD_HEADER_MAX];
> u32 nr_non_filtered_samples;
> @@ -342,6 +354,7 @@ union perf_event {
> struct comm_event comm;
> struct fork_event fork;
> struct lost_event lost;
> + struct lost_samples_event lost_samples;
> struct read_event read;
> struct throttle_event throttle;
> struct sample_event sample;
> @@ -390,6 +403,10 @@ int perf_event__process_lost(struct perf_tool *tool,
> union perf_event *event,
> struct perf_sample *sample,
> struct machine *machine);
> +int perf_event__process_lost_samples(struct perf_tool *tool,
> + union perf_event *event,
> + struct perf_sample *sample,
> + struct machine *machine);
> int perf_event__process_aux(struct perf_tool *tool,
> union perf_event *event,
> struct perf_sample *sample,
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 2f47110..991a342 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -458,6 +458,14 @@ int machine__process_lost_event(struct machine *machine __maybe_unused,
> return 0;
> }
>
> +int machine__process_lost_samples_event(struct machine *machine __maybe_unused,
> + union perf_event *event, struct perf_sample *sample)
> +{
> + dump_printf(": id:%" PRIu64 ": lost samples :%" PRIu64 "\n",
> + sample->id, event->lost_samples.lost);
> + return 0;
> +}
> +
> static struct dso*
> machine__module_dso(struct machine *machine, struct kmod_path *m,
> const char *filename)
> @@ -1351,6 +1359,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
> ret = machine__process_aux_event(machine, event); break;
> case PERF_RECORD_ITRACE_START:
> ret = machine__process_itrace_start_event(machine, event);
> + case PERF_RECORD_LOST_SAMPLES:
> + ret = machine__process_lost_samples_event(machine, event, sample); break;
> break;
> default:
> ret = -1;
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index 1d99296..7ba5e8f 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -81,6 +81,8 @@ int machine__process_fork_event(struct machine *machine, union perf_event *event
> struct perf_sample *sample);
> int machine__process_lost_event(struct machine *machine, union perf_event *event,
> struct perf_sample *sample);
> +int machine__process_lost_samples_event(struct machine *machine, union perf_event *event,
> + struct perf_sample *sample);
> int machine__process_aux_event(struct machine *machine,
> union perf_event *event);
> int machine__process_itrace_start_event(struct machine *machine,
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index e722107..4a5a609 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -325,6 +325,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
> tool->exit = process_event_stub;
> if (tool->lost == NULL)
> tool->lost = perf_event__process_lost;
> + if (tool->lost_samples == NULL)
> + tool->lost_samples = perf_event__process_lost_samples;
> if (tool->aux == NULL)
> tool->aux = perf_event__process_aux;
> if (tool->itrace_start == NULL)
> @@ -606,6 +608,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
> [PERF_RECORD_SAMPLE] = perf_event__all64_swap,
> [PERF_RECORD_AUX] = perf_event__aux_swap,
> [PERF_RECORD_ITRACE_START] = perf_event__itrace_start_swap,
> + [PERF_RECORD_LOST_SAMPLES] = perf_event__all64_swap,
> [PERF_RECORD_HEADER_ATTR] = perf_event__hdr_attr_swap,
> [PERF_RECORD_HEADER_EVENT_TYPE] = perf_event__event_type_swap,
> [PERF_RECORD_HEADER_TRACING_DATA] = perf_event__tracing_data_swap,
> @@ -1049,6 +1052,10 @@ static int machines__deliver_event(struct machines *machines,
> if (tool->lost == perf_event__process_lost)
> evlist->stats.total_lost += event->lost.lost;
> return tool->lost(tool, event, sample, machine);
> + case PERF_RECORD_LOST_SAMPLES:
> + if (tool->lost_samples == perf_event__process_lost_samples)
> + evlist->stats.total_lost_samples += event->lost_samples.lost;
> + return tool->lost_samples(tool, event, sample, machine);
> case PERF_RECORD_READ:
> return tool->read(tool, event, sample, evsel, machine);
> case PERF_RECORD_THROTTLE:
> @@ -1286,6 +1293,18 @@ static void perf_session__warn_about_errors(const struct perf_session *session)
> stats->nr_events[PERF_RECORD_LOST]);
> }
>
> + if (session->tool->lost_samples == perf_event__process_lost_samples) {
> + double drop_rate;
> +
> + drop_rate = (double)stats->total_lost_samples /
> + (double) (stats->nr_events[PERF_RECORD_SAMPLE] + stats->total_lost_samples);
> + if (drop_rate > 0.05) {
> + ui__warning("Processed %lu samples and lost %3.2f%% samples!\n\n",
> + stats->nr_events[PERF_RECORD_SAMPLE] + stats->total_lost_samples,
> + drop_rate * 100.0);
> + }
> + }
> +
> if (stats->nr_unknown_events != 0) {
> ui__warning("Found %u unknown events!\n\n"
> "Is this an older tool processing a perf.data "
> diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
> index 7f282ad..c307dd4 100644
> --- a/tools/perf/util/tool.h
> +++ b/tools/perf/util/tool.h
> @@ -43,6 +43,7 @@ struct perf_tool {
> fork,
> exit,
> lost,
> + lost_samples,
> aux,
> itrace_start,
> throttle,
> --
> 1.8.3.1

2015-05-11 20:41:22

by Liang, Kan

Subject: RE: [PATCH V9 8/8] perf tools: handle PERF_RECORD_LOST_SAMPLES


>
> Em Sun, May 10, 2015 at 03:13:15PM -0400, Kan Liang escreveu:
> > From: Kan Liang <[email protected]>
> >
> > This patch modified the perf tool to handle the new RECORD type,
> > PERF_RECORD_LOST_SAMPLES.
> > The number of lost-sample events is stored in
> > .nr_events[PERF_RECORD_LOST_SAMPLES]. While the exact number of
> > samples which the kernel dropped is stored in total_lost_samples.
> > When the percentage of dropped samples is greater than 5%, a warning
> > will be sent out.
> >
> > Here are some examples:
> >
> > Eg 1, Recording different frequently-occurring events is safe with the
> > patch. Only a very low drop rate is associated with such actions.
> >
> > $ perf record -e '{cycles:p,instructions:p}' -c 20003 --no-time ~/tchain ~/tchain
> > [perf record: Woken up 148 times to write data]
> > [perf record: Captured and wrote 36.922 MB perf.data (1206322 samples)]
> >
> > $ perf report -D | tail
> > SAMPLE events: 120243
> > MMAP2 events: 5
> > LOST_SAMPLES events: 24
> > FINISHED_ROUND events: 15
> > cycles:p stats:
> > TOTAL events: 59348
> > SAMPLE events: 59348
> > instructions:p stats:
> > TOTAL events: 60895
> > SAMPLE events: 60895
>
> The example doesn't show which of cycles:p or instructions:p got lost, isn't
> that possible? Guess not from the patch, but should, no? I.e. what is
> PERF_SAMPLE_ID for then?


Yes, it's possible to get the number of lost samples for cycles:p or
instructions:p separately, but I didn't implement that in the summary of
perf report -D. I think a total lost_samples number is enough for the user;
what they really care about should be the total sample drop rate.
(If they really want to know how many samples each event lost, they can
search for LOST_SAMPLES in perf report -D. sample->id is dumped along with
the lost count.)

Thanks,
Kan


2015-05-11 21:28:05

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH V9 8/8] perf tools: handle PERF_RECORD_LOST_SAMPLES

Em Mon, May 11, 2015 at 08:40:56PM +0000, Liang, Kan escreveu:
> > Em Sun, May 10, 2015 at 03:13:15PM -0400, Kan Liang escreveu:
> > > $ perf record -e '{cycles:p,instructions:p}' -c 20003 --no-time ~/tchain ~/tchain
> > > [perf record: Woken up 148 times to write data]
> > > [perf record: Captured and wrote 36.922 MB perf.data (1206322 samples)]

> > > $ perf report -D | tail
> > > SAMPLE events: 120243
> > > MMAP2 events: 5
> > > LOST_SAMPLES events: 24
> > > FINISHED_ROUND events: 15
> > > cycles:p stats:
> > > TOTAL events: 59348
> > > SAMPLE events: 59348
> > > instructions:p stats:
> > > TOTAL events: 60895
> > > SAMPLE events: 60895

> > The example doesn't show which of cycles:p or instructions:p got lost, isn't
> > that possible? Guess not from the patch, but should, no? I.e. what is
> > PERF_SAMPLE_ID for then?

> Yes, it's possible to know the lost samples number for cycles:p or
> instructions:p. But I didn't implement it in the summary of perf report -D.
> I think a total lost_samples number is enough for user. What they really
> care about should be the total samples drop rate.
> (If they really want to know the number of which event got lost, they can
> search LOST_SAMPLES in perf report -D. sample->id is dumped with lost
> number.)

I disagree, since the support is there, we need to have it in
hists->events_stats[PERF_RECORD_LOST_SAMPLES].

But that can be done in a follow up patch.

It just came quickly to my attention because of all the discussion about
where to store something (PERF_SAMPLE_ID via sample_type + sample_id_all)
that doesn't get used in the patch that introduces it :-)

I'll try to test this all tomorrow and will try to do the needed wiring
to hists_evsel->hists->events_stats.

All working I can push this all via my perf/core event, if PeterZ
agrees and is ok with the kernel specific bits.

- Arnaldo
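
The per-event accounting discussed above could look roughly like the sketch below. It is only an illustration of the idea, not the perf tool's code: `struct evsel_stats` and `account_lost_samples()` are hypothetical names, and the real wiring would go through hists->events_stats as described in the message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one event's statistics (not perf's struct hists). */
struct evsel_stats {
	uint64_t id;              /* sample id that identifies the event */
	uint64_t nr_lost_samples; /* running per-event total of dropped samples */
};

/*
 * Add `lost` to the entry whose id matches `sample_id`.
 * Returns 0 on success, -1 if no event carries that id.
 */
static int account_lost_samples(struct evsel_stats *stats, size_t nr,
				uint64_t sample_id, uint64_t lost)
{
	for (size_t i = 0; i < nr; i++) {
		if (stats[i].id == sample_id) {
			stats[i].nr_lost_samples += lost;
			return 0;
		}
	}
	return -1;
}
```

With one entry per event in the group, each PERF_RECORD_LOST_SAMPLES record would bump only the entry matching its sample->id, so a report could show the lost count next to cycles:p and instructions:p individually.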

2015-05-12 12:44:00

by Peter Zijlstra

Subject: Re: [PATCH V9 8/8] perf tools: handle PERF_RECORD_LOST_SAMPLES

On Mon, May 11, 2015 at 06:27:58PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, May 11, 2015 at 08:40:56PM +0000, Liang, Kan escreveu:
> > > Em Sun, May 10, 2015 at 03:13:15PM -0400, Kan Liang escreveu:
> > > > $ perf record -e '{cycles:p,instructions:p}' -c 20003 --no-time ~/tchain ~/tchain
> > > > [perf record: Woken up 148 times to write data]
> > > > [perf record: Captured and wrote 36.922 MB perf.data (1206322 samples)]
>
> > > > $ perf report -D | tail
> > > > SAMPLE events: 120243
> > > > MMAP2 events: 5
> > > > LOST_SAMPLES events: 24
> > > > FINISHED_ROUND events: 15
> > > > cycles:p stats:
> > > > TOTAL events: 59348
> > > > SAMPLE events: 59348
> > > > instructions:p stats:
> > > > TOTAL events: 60895
> > > > SAMPLE events: 60895
>
> > > The example doesn't show which of cycles:p or instructions:p got lost, isn't
> > > that possible? Guess not from the patch, but should, no? I.e. what is
> > > PERF_SAMPLE_ID for then?
>
> > Yes, it's possible to know the lost samples number for cycles:p or
> > instructions:p. But I didn't implement it in the summary of perf report -D.
> > I think a total lost_samples number is enough for user. What they really
> > care about should be the total samples drop rate.
> > (If they really want to know the number of which event got lost, they can
> > search LOST_SAMPLES in perf report -D. sample->id is dumped with lost
> > number.)
>
> I disagree, since the support is there, we need to have it in
> hists->events_stats[PERF_RECORD_LOST_SAMPLES].
>
> But that can be done in a follow up patch.

Agreed, it would be good to know of which event the samples got lost.

> It just came quickly to my attention because of all the discussion about
> where to store something (PERF_SAMPLE_ID via sample_type + sample_id_all)
> that doesn't get used in the patch that introduces it :-)
>
> I'll try to test this all tomorrow and will try to do the needed wiring
> to hists_evsel->hists->events_stats.
>
> All working I can push this all via my perf/core event, if PeterZ
> agrees and is ok with the kernel specific bits.

I would like to carry these as there are some conflicts with other patches I
have.

2015-05-12 13:04:29

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH V9 8/8] perf tools: handle PERF_RECORD_LOST_SAMPLES

Em Tue, May 12, 2015 at 02:43:44PM +0200, Peter Zijlstra escreveu:
> On Mon, May 11, 2015 at 06:27:58PM -0300, Arnaldo Carvalho de Melo wrote:
> > I disagree, since the support is there, we need to have it in
> > hists->events_stats[PERF_RECORD_LOST_SAMPLES].

> > But that can be done in a follow up patch.

> Agreed, it would be good to know of which event the samples got lost.

> > It just came quickly to my attention because of all the discussion about
> > where to store something (PERF_SAMPLE_ID via sample_type + sample_id_all)
> > that doesn't get used in the patch that introduces it :-)
> >
> > I'll try to test this all tomorrow and will try to do the needed wiring
> > to hists_evsel->hists->events_stats.
> >
> > All working I can push this all via my perf/core event, if PeterZ
> > agrees and is ok with the kernel specific bits.
>
> I would like to carry these as there some conflicts with other patches I
> have.

So go ahead and do it, when it lands on tip I'll check and try to
continue what was discussed here, thanks!

- Arnaldo

2015-05-12 13:26:28

by Peter Zijlstra

Subject: [RFC][PATCH] perf, pebs: Add PEBS v3 record decoding



So seeing how I have both this series and Andi's SKL patches, I did the
below on top of them both.

Could someone try that?

---
Subject: perf, pebs: Add PEBS v3 record decoding
From: Peter Zijlstra <[email protected]>
Date: Tue May 12 15:18:18 CEST 2015

PEBS v3 as present on Skylake fixed the long standing issue of the
status bits. They now really reflect the events that generated the
record.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_ds.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -1034,6 +1034,9 @@ get_next_pebs_record_by_bit(void *base,
struct pebs_record_nhm *p = at;

if (test_bit(bit, (unsigned long *)&p->status)) {
+ /* PEBS v3 has accurate status bits */
+ if (x86_pmu.intel_cap.pebs_format >= 3)
+ return at;

if (p->status == (1 << bit))
return at;
@@ -1055,20 +1058,18 @@ static void __intel_pmu_pebs_event(struc
{
struct perf_sample_data data;
struct pt_regs regs;
- int i;
void *at = get_next_pebs_record_by_bit(base, top, bit);

if (!intel_pmu_save_and_restart(event) &&
!(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
return;

- if (count > 1) {
- for (i = 0; i < count - 1; i++) {
- setup_pebs_sample_data(event, iregs, at, &data, &regs);
- perf_event_output(event, &data, &regs);
- at += x86_pmu.pebs_record_size;
- at = get_next_pebs_record_by_bit(at, top, bit);
- }
+ while (count > 1) {
+ setup_pebs_sample_data(event, iregs, at, &data, &regs);
+ perf_event_output(event, &data, &regs);
+ at += x86_pmu.pebs_record_size;
+ at = get_next_pebs_record_by_bit(at, top, bit);
+ count--;
}

setup_pebs_sample_data(event, iregs, at, &data, &regs);
@@ -1124,9 +1125,9 @@ static void intel_pmu_drain_pebs_nhm(str
struct debug_store *ds = cpuc->ds;
struct perf_event *event;
void *base, *at, *top;
- int bit;
short counts[MAX_PEBS_EVENTS] = {};
short error[MAX_PEBS_EVENTS] = {};
+ int bit, i;

if (!x86_pmu.pebs_active)
return;
@@ -1142,6 +1143,15 @@ static void intel_pmu_drain_pebs_nhm(str
for (at = base; at < top; at += x86_pmu.pebs_record_size) {
struct pebs_record_nhm *p = at;

+ /* PEBS v3 has accurate status bits */
+ if (x86_pmu.intel_cap.pebs_format >= 3) {
+ for_each_set_bit(bit, (unsigned long *)&p->status,
+ MAX_PEBS_EVENTS)
+ counts[bit]++;
+
+ continue;
+ }
+
bit = find_first_bit((unsigned long *)&p->status,
x86_pmu.max_pebs_events);
if (bit >= x86_pmu.max_pebs_events)
@@ -1171,8 +1181,6 @@ static void intel_pmu_drain_pebs_nhm(str
pebs_status = p->status & cpuc->pebs_enabled;
pebs_status &= (1ULL << MAX_PEBS_EVENTS) - 1;
if (pebs_status != (1 << bit)) {
- u8 i;
-
for_each_set_bit(i, (unsigned long *)&pebs_status,
MAX_PEBS_EVENTS)
error[i]++;
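
To make the difference between the two decoding paths concrete, here is a simplified user-space model of the per-record accounting. It is a sketch under assumptions, not the kernel code: `count_pebs_records()` is a hypothetical helper, MAX_PEBS_EVENTS is taken to be 8, and the masking against cpuc->pebs_enabled and the active_mask check are left out.

```c
#include <stdint.h>

#define MAX_PEBS_EVENTS 8 /* assumed value for this sketch */

/*
 * status[r] is the status word of PEBS record r.
 * counts[bit]: records attributed to counter `bit`.
 * error[bit]:  records dropped because several bits were set (pre-v3 only).
 */
static void count_pebs_records(const uint64_t *status, int nr_records,
			       int pebs_format,
			       short counts[MAX_PEBS_EVENTS],
			       short error[MAX_PEBS_EVENTS])
{
	for (int r = 0; r < nr_records; r++) {
		uint64_t s = status[r] & ((1ULL << MAX_PEBS_EVENTS) - 1);
		int single = s && !(s & (s - 1)); /* exactly one bit set */

		if (!s)
			continue;

		for (int bit = 0; bit < MAX_PEBS_EVENTS; bit++) {
			if (!(s & (1ULL << bit)))
				continue;
			/* v3 status bits are trusted; older formats only when unambiguous */
			if (pebs_format >= 3 || single)
				counts[bit]++;
			else
				error[bit]++;
		}
	}
}
```

For the status words 0x1, 0x3, 0x2, the pre-v3 path counts one record per counter and flags the 0x3 record as a collision on both, while the v3 path attributes the 0x3 record to both counters and reports no errors.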

2015-05-12 18:08:40

by Andi Kleen

Subject: Re: [RFC][PATCH] perf, pebs: Add PEBS v3 record decoding

On Tue, May 12, 2015 at 03:25:57PM +0200, Peter Zijlstra wrote:
>
>
> So seeing how I have both this series and Andi's SKL patches, I did the
> below on top of them both.
>
> Could someone try that?

I did a quick test and didn't see any problems.

Tested-by: Andi Kleen <[email protected]>

-Andi

2015-05-13 01:31:36

by Liang, Kan

Subject: RE: [RFC][PATCH] perf, pebs: Add PEBS v3 record decoding


I did some tests on an HSX platform. It works well.

Tested-by: Kan Liang <[email protected]>

Kan

>
> On Tue, May 12, 2015 at 03:25:57PM +0200, Peter Zijlstra wrote:
> >
> >
> > So seeing how I have both this series and Andi's SKL patches, I did
> > the below on top of them both.
> >
> > Could someone try that?
>
> I did a quick test and didn't see any problems.
>
> Tested-by: Andi Kleen <[email protected]>
>
> -Andi

Subject: [tip:perf/core] perf/x86/intel: Introduce PERF_RECORD_LOST_SAMPLES

Commit-ID: f38b0dbb491a6987e198aa6b428db8692a6480f8
Gitweb: http://git.kernel.org/tip/f38b0dbb491a6987e198aa6b428db8692a6480f8
Author: Kan Liang <[email protected]>
AuthorDate: Sun, 10 May 2015 15:13:14 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Sun, 7 Jun 2015 16:09:02 +0200

perf/x86/intel: Introduce PERF_RECORD_LOST_SAMPLES

After enlarging the PEBS interrupt threshold, there may be some mixed up
PEBS samples which are discarded by the kernel.

This patch makes the kernel emit a PERF_RECORD_LOST_SAMPLES record with
the number of possible discarded records when it is impossible to demux
the samples.

It makes sure the user is not left in the dark about such discards.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_ds.c | 20 ++++++++++++++++---
include/linux/perf_event.h | 3 +++
include/uapi/linux/perf_event.h | 12 +++++++++++
kernel/events/core.c | 33 +++++++++++++++++++++++++++++++
4 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 266079a..34d0c48 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -1126,6 +1126,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
void *base, *at, *top;
int bit;
short counts[MAX_PEBS_EVENTS] = {};
+ short error[MAX_PEBS_EVENTS] = {};

if (!x86_pmu.pebs_active)
return;
@@ -1169,20 +1170,33 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
/* slow path */
pebs_status = p->status & cpuc->pebs_enabled;
pebs_status &= (1ULL << MAX_PEBS_EVENTS) - 1;
- if (pebs_status != (1 << bit))
+ if (pebs_status != (1 << bit)) {
+ u8 i;
+
+ for_each_set_bit(i, (unsigned long *)&pebs_status,
+ MAX_PEBS_EVENTS)
+ error[i]++;
continue;
+ }
}
counts[bit]++;
}

for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
- if (counts[bit] == 0)
+ if ((counts[bit] == 0) && (error[bit] == 0))
continue;
event = cpuc->events[bit];
WARN_ON_ONCE(!event);
WARN_ON_ONCE(!event->attr.precise_ip);

- __intel_pmu_pebs_event(event, iregs, base, top, bit, counts[bit]);
+ /* log dropped samples number */
+ if (error[bit])
+ perf_log_lost_samples(event, error[bit]);
+
+ if (counts[bit]) {
+ __intel_pmu_pebs_event(event, iregs, base,
+ top, bit, counts[bit]);
+ }
}
}

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 5f192e1..a204d52 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -743,6 +743,9 @@ perf_event__output_id_sample(struct perf_event *event,
struct perf_output_handle *handle,
struct perf_sample_data *sample);

+extern void
+perf_log_lost_samples(struct perf_event *event, u64 lost);
+
static inline bool is_sampling_event(struct perf_event *event)
{
return event->attr.sample_period != 0;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c4622f1..613ed9a 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -802,6 +802,18 @@ enum perf_event_type {
*/
PERF_RECORD_ITRACE_START = 12,

+ /*
+ * Records the dropped/lost sample number.
+ *
+ * struct {
+ * struct perf_event_header header;
+ *
+ * u64 lost;
+ * struct sample_id sample_id;
+ * };
+ */
+ PERF_RECORD_LOST_SAMPLES = 13,
+
PERF_RECORD_MAX, /* non-ABI */
};

diff --git a/kernel/events/core.c b/kernel/events/core.c
index e499b4e..9e0773d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5975,6 +5975,39 @@ void perf_event_aux_event(struct perf_event *event, unsigned long head,
}

/*
+ * Lost/dropped samples logging
+ */
+void perf_log_lost_samples(struct perf_event *event, u64 lost)
+{
+ struct perf_output_handle handle;
+ struct perf_sample_data sample;
+ int ret;
+
+ struct {
+ struct perf_event_header header;
+ u64 lost;
+ } lost_samples_event = {
+ .header = {
+ .type = PERF_RECORD_LOST_SAMPLES,
+ .misc = 0,
+ .size = sizeof(lost_samples_event),
+ },
+ .lost = lost,
+ };
+
+ perf_event_header__init_id(&lost_samples_event.header, &sample, event);
+
+ ret = perf_output_begin(&handle, event,
+ lost_samples_event.header.size);
+ if (ret)
+ return;
+
+ perf_output_put(&handle, lost_samples_event);
+ perf_event__output_id_sample(event, &handle, &sample);
+ perf_output_end(&handle);
+}
+
+/*
* IRQ throttle logging
*/

Subject: [tip:perf/core] perf tools: handle PERF_RECORD_LOST_SAMPLES

Commit-ID: c4937a91ea56b546234b0608a413ebad90536d26
Gitweb: http://git.kernel.org/tip/c4937a91ea56b546234b0608a413ebad90536d26
Author: Kan Liang <[email protected]>
AuthorDate: Sun, 10 May 2015 15:13:15 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Sun, 7 Jun 2015 16:09:06 +0200

perf tools: handle PERF_RECORD_LOST_SAMPLES

This patch modifies the perf tool to handle the new RECORD type,
PERF_RECORD_LOST_SAMPLES.

The number of lost-sample events is stored in
.nr_events[PERF_RECORD_LOST_SAMPLES]. The exact number of samples
which the kernel dropped is stored in total_lost_samples.

When the percentage of dropped samples is greater than 5%, a warning
is printed.

Here are some examples:

Eg 1, Recording different frequently-occurring events is safe with the
patch. Only a very low drop rate is associated with such actions.

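$ perf record -e '{cycles:p,instructions:p}' -c 20003 --no-time ~/tchain ~/tchain
[perf record: Woken up 148 times to write data]
[perf record: Captured and wrote 36.922 MB perf.data (1206322 samples)]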

$ perf report -D | tail
SAMPLE events: 120243
MMAP2 events: 5
LOST_SAMPLES events: 24
FINISHED_ROUND events: 15
cycles:p stats:
TOTAL events: 59348
SAMPLE events: 59348
instructions:p stats:
TOTAL events: 60895
SAMPLE events: 60895

$ perf report --stdio --group
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 24
#
# Samples: 120K of event 'anon group { cycles:p, instructions:p }'
# Event count (approx.): 24048600000
#
# Overhead Command Shared Object Symbol
# ................ ........... ................ ..................................
#
99.74% 99.86% tchain_edit tchain_edit [.] f3
0.09% 0.02% tchain_edit tchain_edit [.] f2
0.04% 0.00% tchain_edit [kernel.vmlinux] [k] ixgbe_read_reg

Eg 2, Recording the same thing multiple times can lead to high drop
rate, but it is not a useful configuration.

$ perf record -e '{cycles:p,cycles:p}' -c 20003 --no-time ~/tchain
[perf record: Woken up 1 times to write data]
Warning:
Processed 600592 samples and lost 99.73% samples!
[perf record: Captured and wrote 0.121 MB perf.data (1629 samples)]
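
The 5% check can be reproduced with the numbers from the two examples. The sketch below mirrors the drop-rate arithmetic of the session.c hunk; the function names are hypothetical, and the guard against an empty session is an addition that the patch itself does not have. For Eg 2 it assumes that the 1629 captured samples are the surviving SAMPLE events, which makes 600592 - 1629 = 598963 lost samples.

```c
#include <stdint.h>

/* Fraction of samples dropped: lost / (recorded + lost). */
static double lost_samples_drop_rate(uint64_t nr_samples,
				     uint64_t total_lost_samples)
{
	uint64_t processed = nr_samples + total_lost_samples;

	if (processed == 0) /* guard added for this sketch only */
		return 0.0;

	return (double)total_lost_samples / (double)processed;
}

/* Non-zero when the drop rate exceeds the 5% warning threshold. */
static int should_warn_lost_samples(uint64_t nr_samples,
				    uint64_t total_lost_samples)
{
	return lost_samples_drop_rate(nr_samples, total_lost_samples) > 0.05;
}
```

Eg 1 gives 24 / (120243 + 24), about 0.02%, so no warning is printed. Eg 2 gives 598963 / 600592, about 99.73%, which matches the warning text.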

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
tools/perf/builtin-report.c | 1 +
tools/perf/util/event.c | 9 +++++++++
tools/perf/util/event.h | 17 +++++++++++++++++
tools/perf/util/machine.c | 10 ++++++++++
tools/perf/util/machine.h | 2 ++
tools/perf/util/session.c | 19 +++++++++++++++++++
tools/perf/util/tool.h | 1 +
7 files changed, 59 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 56025d9..628090b 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -320,6 +320,7 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
{
struct perf_evsel *pos;

+ fprintf(stdout, "#\n# Total Lost Samples: %" PRIu64 "\n#\n", evlist->stats.total_lost_samples);
evlist__for_each(evlist, pos) {
struct hists *hists = evsel__hists(pos);
const char *evname = perf_evsel__name(pos);
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index c192596..793b150 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -25,6 +25,7 @@ static const char *perf_event__names[] = {
[PERF_RECORD_SAMPLE] = "SAMPLE",
[PERF_RECORD_AUX] = "AUX",
[PERF_RECORD_ITRACE_START] = "ITRACE_START",
+ [PERF_RECORD_LOST_SAMPLES] = "LOST_SAMPLES",
[PERF_RECORD_HEADER_ATTR] = "ATTR",
[PERF_RECORD_HEADER_EVENT_TYPE] = "EVENT_TYPE",
[PERF_RECORD_HEADER_TRACING_DATA] = "TRACING_DATA",
@@ -712,6 +713,14 @@ int perf_event__process_itrace_start(struct perf_tool *tool __maybe_unused,
return machine__process_itrace_start_event(machine, event);
}

+int perf_event__process_lost_samples(struct perf_tool *tool __maybe_unused,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ return machine__process_lost_samples_event(machine, event, sample);
+}
+
size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
{
return fprintf(fp, " %d/%d: [%#" PRIx64 "(%#" PRIx64 ") @ %#" PRIx64 "]: %c %s\n",
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 97179ab..5dc51ad 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -52,6 +52,11 @@ struct lost_event {
u64 lost;
};

+struct lost_samples_event {
+ struct perf_event_header header;
+ u64 lost;
+};
+
/*
* PERF_FORMAT_ENABLED | PERF_FORMAT_RUNNING | PERF_FORMAT_ID
*/
@@ -235,6 +240,12 @@ enum auxtrace_error_type {
* total_lost tells exactly how many events the kernel in fact lost, i.e. it is
* the sum of all struct lost_event.lost fields reported.
*
+ * The kernel discards mixed up samples and sends the number in a
+ * PERF_RECORD_LOST_SAMPLES event. The number of lost-samples events is stored
+ * in .nr_events[PERF_RECORD_LOST_SAMPLES] while total_lost_samples tells
+ * exactly how many samples the kernel in fact dropped, i.e. it is the sum of
+ * all struct lost_samples_event.lost fields reported.
+ *
* The total_period is needed because by default auto-freq is used, so
* multipling nr_events[PERF_EVENT_SAMPLE] by a frequency isn't possible to get
* the total number of low level events, it is necessary to to sum all struct
@@ -244,6 +255,7 @@ struct events_stats {
u64 total_period;
u64 total_non_filtered_period;
u64 total_lost;
+ u64 total_lost_samples;
u64 total_invalid_chains;
u32 nr_events[PERF_RECORD_HEADER_MAX];
u32 nr_non_filtered_samples;
@@ -342,6 +354,7 @@ union perf_event {
struct comm_event comm;
struct fork_event fork;
struct lost_event lost;
+ struct lost_samples_event lost_samples;
struct read_event read;
struct throttle_event throttle;
struct sample_event sample;
@@ -390,6 +403,10 @@ int perf_event__process_lost(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
struct machine *machine);
+int perf_event__process_lost_samples(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine);
int perf_event__process_aux(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 9e02c86..f15ed24 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -482,6 +482,14 @@ int machine__process_lost_event(struct machine *machine __maybe_unused,
return 0;
}

+int machine__process_lost_samples_event(struct machine *machine __maybe_unused,
+ union perf_event *event, struct perf_sample *sample)
+{
+ dump_printf(": id:%" PRIu64 ": lost samples :%" PRIu64 "\n",
+ sample->id, event->lost_samples.lost);
+ return 0;
+}
+
static struct dso*
machine__module_dso(struct machine *machine, struct kmod_path *m,
const char *filename)
@@ -1419,6 +1427,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
ret = machine__process_aux_event(machine, event); break;
case PERF_RECORD_ITRACE_START:
ret = machine__process_itrace_start_event(machine, event);
break;
+ case PERF_RECORD_LOST_SAMPLES:
+ ret = machine__process_lost_samples_event(machine, event, sample); break;
default:
ret = -1;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 39a0ca0..8e1f796 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -81,6 +81,8 @@ int machine__process_fork_event(struct machine *machine, union perf_event *event
struct perf_sample *sample);
int machine__process_lost_event(struct machine *machine, union perf_event *event,
struct perf_sample *sample);
+int machine__process_lost_samples_event(struct machine *machine, union perf_event *event,
+ struct perf_sample *sample);
int machine__process_aux_event(struct machine *machine,
union perf_event *event);
int machine__process_itrace_start_event(struct machine *machine,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 39fe09d..88d87bf 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -325,6 +325,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
tool->exit = process_event_stub;
if (tool->lost == NULL)
tool->lost = perf_event__process_lost;
+ if (tool->lost_samples == NULL)
+ tool->lost_samples = perf_event__process_lost_samples;
if (tool->aux == NULL)
tool->aux = perf_event__process_aux;
if (tool->itrace_start == NULL)
@@ -606,6 +608,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
[PERF_RECORD_SAMPLE] = perf_event__all64_swap,
[PERF_RECORD_AUX] = perf_event__aux_swap,
[PERF_RECORD_ITRACE_START] = perf_event__itrace_start_swap,
+ [PERF_RECORD_LOST_SAMPLES] = perf_event__all64_swap,
[PERF_RECORD_HEADER_ATTR] = perf_event__hdr_attr_swap,
[PERF_RECORD_HEADER_EVENT_TYPE] = perf_event__event_type_swap,
[PERF_RECORD_HEADER_TRACING_DATA] = perf_event__tracing_data_swap,
@@ -1049,6 +1052,10 @@ static int machines__deliver_event(struct machines *machines,
if (tool->lost == perf_event__process_lost)
evlist->stats.total_lost += event->lost.lost;
return tool->lost(tool, event, sample, machine);
+ case PERF_RECORD_LOST_SAMPLES:
+ if (tool->lost_samples == perf_event__process_lost_samples)
+ evlist->stats.total_lost_samples += event->lost_samples.lost;
+ return tool->lost_samples(tool, event, sample, machine);
case PERF_RECORD_READ:
return tool->read(tool, event, sample, evsel, machine);
case PERF_RECORD_THROTTLE:
@@ -1286,6 +1293,18 @@ static void perf_session__warn_about_errors(const struct perf_session *session)
stats->nr_events[PERF_RECORD_LOST]);
}

+ if (session->tool->lost_samples == perf_event__process_lost_samples) {
+ double drop_rate;
+
+ drop_rate = (double)stats->total_lost_samples /
+ (double) (stats->nr_events[PERF_RECORD_SAMPLE] + stats->total_lost_samples);
+ if (drop_rate > 0.05) {
+ ui__warning("Processed %" PRIu64 " samples and lost %3.2f%% samples!\n\n",
+ stats->nr_events[PERF_RECORD_SAMPLE] + stats->total_lost_samples,
+ drop_rate * 100.0);
+ }
+ }
+
if (stats->nr_unknown_events != 0) {
ui__warning("Found %u unknown events!\n\n"
"Is this an older tool processing a perf.data "
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 7f282ad..c307dd4 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -43,6 +43,7 @@ struct perf_tool {
fork,
exit,
lost,
+ lost_samples,
aux,
itrace_start,
throttle,